Last updated: May 29, 2026
Application No. 17/761,872
METHOD AND ELECTRONIC DEVICE FOR RECOGNIZING SONG, AND STORAGE MEDIUM

Non-Final OA (§101, §103)
Filed: Mar 18, 2022
Priority: Sep 19, 2019 (CN 201910887630.8, +1 more)
Examiner: RONES, CHARLES
Art Unit: 2168
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Tencent Music Entertainment Technology (Shenzhen) Co. Ltd.
OA Round: 2 (Non-Final)

Examiner Intelligence: RONES, CHARLES
This examiner grants 22% of cases (career allowance rate: 10 granted / 46 resolved; -33.3% vs. Tech Center average), based on 46 resolved cases, 2023–2026.
Interview lift: +28.7% for resolved cases with an interview. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Typical timeline: 3y 6m average prosecution; 4 currently pending.
Statute-specific performance (vs. Tech Center average estimate, based on career data from 46 resolved cases):
§101: 1.9% (-38.1% vs. TC avg)
§103: 73.0% (+33.0% vs. TC avg)
§102: 15.1% (-24.9% vs. TC avg)
§112: 6.3% (-33.7% vs. TC avg)
Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims
Claims 1-4, 6-8, 14-17 and 19-22 are pending, of which claims 1 and 14 are in independent form.  Claims 1-4, 6-8, 14-17 and 19-22 are rejected under 35 U.S.C. 101 and 35 U.S.C. 103.

Response to Claim Amendments and Arguments
Applicant’s Argument:
On pages 8-11 of the remarks filed on 14 February 2025, with respect to the 35 U.S.C. 101 rejections of the claims, Applicant argues that the independent claims are not directed towards a simple mathematical calculation or a subjective mental process, and that the additional elements amount to more than mere data gathering and generic computer components performing generic functions. Applicant further points to improvements to the state of the art which, according to Applicant, support the position that, even assuming arguendo that the claims are directed towards a judicial exception, the additional elements integrate the claims into a practical application and amount to significantly more than the judicial exception itself.

Examiner’s Response:
Examiner is of the position that the claim limitations reciting transforming a song segment into a spectrum map to be input into data models used to create a feature vector, and calculating the similarity of said feature vector to stored feature vectors to determine a match, as recited in the independent claims, are directed towards a judicial exception, namely an abstract idea falling within the enumerated groupings of mathematical concepts and mental processes. The additional elements of the independent claims identified by the Examiner amount to insignificant extra-solution activity. They therefore neither integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception itself, such that the independent claims as a whole would be patent eligible, as detailed below.

Applicant’s Argument:
On pages 13-14 of the remarks, Applicant argues that while the Bokar reference discloses song recognition that involves acquiring an audio input, dividing the input into frames, generating a spectrum map, and inputting the frames into a neural network that outputs a vector representing multiple features, Bokar does not disclose using an exponential linear unit activation function in the CNN. Applicant further argues that Bokar does not disclose a dividing-and-encoding network specifically tailored to transform high-dimensional audio into a low-dimensional feature vector.

Examiner’s Response:
Examiner has reviewed the cited prior art, considered the claim amendments and arguments, conducted an updated prior art search, and applied a new reference to address the claims as amended, as detailed below.

Claim Rejections – 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-4, 6-8, 14-17 and 19-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-patentable subject matter. The claims are directed to an abstract idea without significantly more.
Independent claims 1 and 14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below, in accordance with the “2019 Revised Patent Subject Matter Eligibility Guidance” (published on 1/7/2019 in Fed. Register, Vol. 84, No. 4 at pgs. 50-57, hereinafter referred to as the “2019 PEG”).
Step 1. In accordance with Step 1 of the eligibility inquiry (as explained in MPEP 2106), it is first noted that the method of claim 1 [i.e., process] and the device of claim 14 [i.e., apparatus] are both directed towards one of the eligible categories of subject matter, and therefore satisfy Step 1.
Step 2A. In accordance with Step 2A Prong One of the 2019 PEG, it is noted that the independent claims recite an abstract idea falling within the mathematical concepts and mental processes enumerated groupings of abstract ideas set forth in the 2019 PEG. Examiner is of the position that independent claims 1 and 14 are directed towards the Mathematical Concepts Grouping of Abstract Ideas, including mathematical relationships, mathematical formulas or equations, and mathematical calculations, and towards the Mental Processes Grouping of Abstract Ideas, including concepts performed in the human mind (including an observation, evaluation, judgment, opinion). More specifically, the independent claims recite the following limitations directed towards Mathematical Concepts and Mental Processes:

“transforming the target song segment to generate a corresponding first spectrum map”, as drafted recites a mathematical relationship as illustrated in dependent claim 2 and on page 8, paragraph 3 of the Specification reciting a mathematical transform;

“performing a convolution operation in a convolutional neural network of the preset neural network model to generate a feature tensor, and encoding the feature tensor according to a dividing-and-encoding network of the preset neural network model to generate a multi-dimensional first feature vector, wherein an activation function of the convolutional neural network is exponential linear unit and the first feature vector represents information contained the target song fragment”, as drafted recites a mathematical relationship as illustrated on page 8, paragraph 1 of the Specification reciting performing a convolution operation;
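For illustration only, the convolution-plus-ELU step recited in this limitation can be sketched in plain Python as below. Chen at paragraph [0043] describes the same kind of step: 1×7 temporal filters followed by exponential linear activation. The filter weights, the "valid" (no padding) boundary handling, and the helper names `elu`, `conv1d_elu` and `conv_layer` are assumptions made for this sketch. Neither the claim nor the cited art, as quoted, fixes them, and this is not the Applicant's or Chen's implementation.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def conv1d_elu(signal: list[float], kernel: list[float], bias: float = 0.0) -> list[float]:
    """'Valid' 1-D convolution followed by ELU.

    Like most CNN libraries, this computes a cross-correlation (the kernel is not flipped).
    """
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        acc = bias + sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(elu(acc))
    return out


def conv_layer(signal: list[float], kernels: list[list[float]]) -> list[list[float]]:
    """Apply several filters and stack the results into a (filters x time) feature tensor."""
    return [conv1d_elu(signal, kernel) for kernel in kernels]
```

Stacking the outputs of several filters gives a (filters × time) array. That is one possible reading of the claimed "feature tensor". For example, 40 filters of length 7 applied to a 10-sample input give a 40 × 4 array.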

“acquiring, by a memory of the electronic device, second feature vectors of pre-stored songs, wherein one pre-stored song is divided into a plurality of pre-stored song segments, one pre-stored song segment corresponds to one second feature vector, the second feature vector representing information contained in the pre-stored song segment, and the first feature vector and the second feature vectors have the same number of dimensions”, as drafted recites a mathematical relationship as illustrated on page 9, paragraph 3 of the Specification reciting dividing a song of duration 240s into 24 pre-stored song segments of a preset duration of 10s;
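The segmentation arithmetic in the cited Specification passage can be sketched as follows: a 240 s song divided into 10 s segments gives 240 / 10 = 24 segments. The claim does not state how a trailing remainder shorter than the preset duration is handled. Dropping it is an assumption of this sketch, as is the helper name `split_into_segments`.

```python
def split_into_segments(samples: list[float], sample_rate: int,
                        segment_seconds: float) -> list[list[float]]:
    """Divide a song into consecutive, non-overlapping segments of a preset duration.

    A trailing remainder shorter than one segment is dropped (an assumption).
    """
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_full = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]
```

At a 16 kHz sampling rate, for example, each 10 s segment holds 160,000 samples and the 240 s song holds 3,840,000.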

“calculating, by a processor of the electronic device, similarities between the first feature vector and the second feature vectors, and determining a maximum similarity”, as drafted recites a mathematical relationship; and

“determining, by the processor of the electronic device, that the target song segment and a pre-stored song corresponding to the maximum similarity are different versions of the same song in response to the maximum similarity being greater than a preset threshold”, as drafted recites a mental process of determining based on a comparison or evaluation to a threshold as illustrated by the claim language.
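Below is a minimal sketch of the "maximum similarity greater than a preset threshold" decision. It assumes the similarities have already been computed for every pre-stored song segment. The `(song_id, similarity)` pair layout and the name `best_match` are hypothetical. The strict `>` comparison follows the claim wording "greater than".

```python
from typing import Optional


def best_match(scored_segments: list[tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the song owning the best-scoring segment if that score exceeds the threshold.

    `scored_segments` holds one (song_id, similarity) pair per pre-stored song segment.
    """
    if not scored_segments:
        return None
    song_id, max_similarity = max(scored_segments, key=lambda pair: pair[1])
    # Strictly "greater than" the preset threshold, following the claim wording.
    return song_id if max_similarity > threshold else None
```

Each pre-stored song contributes several segments. The song returned is therefore the one that owns the single best-scoring segment.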

With respect to Step 2A Prong Two of the 2019 PEG, the independent claims’ additional elements, as interpreted by the Examiner, are identified below:

“acquiring a target song segment and transforming the target song segment to generate a corresponding first spectrum map”, as drafted recites insignificant extra solution activity that amounts to mere data gathering (See MPEP 2106.05(g) Examples of activity that courts have found to be insignificant extra-solution activity: Mere Data Gathering: example iii); and

“inputting the first spectrum map into a preset neural network model”, as drafted recites insignificant extra solution activity that amounts to mere data gathering (See MPEP 2106.05(g) Examples of activity that courts have found to be insignificant extra-solution activity: Mere Data Gathering: example iii).

The additional elements identified above fail to integrate the abstract idea into a practical application because the additional elements of acquiring data, identified above, amount to insignificant extra-solution activity. See MPEP 2106.05(g), which lists three considerations for determining whether additional elements are insignificant extra-solution activity.
Step 2B. Similar to the analysis under Step 2A Prong Two, because the additional elements of the independent claims amount to insignificant extra-solution activity, they do not add significantly more to the judicial exception such that the independent claims as a whole would be patent eligible.

Therefore, independent claims 1 and 14 are rejected under 35 U.S.C. 101.  Additionally, dependent claim 22 is rejected under the same rationale as claim 1 provided above.

Dependent claims 2-4, 6-8, 15-17 and 19-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The judicial exception is not integrated into a practical application. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The eligibility analysis in support of these findings is provided below, in accordance with the “2019 Revised Patent Subject Matter Eligibility Guidance” (published on 1/7/2019 in Fed. Register, Vol. 84, No. 4 at pgs. 50-57, hereinafter referred to as the “2019 PEG”).
Step 1. Analyzed above.
Step 2A. In accordance with Step 2A Prong One of the 2019 PEG, it is noted that the dependent claims all recite an abstract idea falling within the mathematical concepts enumerated grouping of abstract ideas set forth in the 2019 PEG. Examiner is of the position that dependent claims 2-4, 6-8, 15-17 and 19-21 are directed towards the Mathematical Concepts Grouping of Abstract Ideas, including mathematical relationships, mathematical formulas or equations, and mathematical calculations. More specifically, the dependent claims recite the following limitations directed towards Mathematical Concepts:

“Claim 2…wherein said transforming the target song segment to generate the corresponding first spectrum map comprises: performing a short-time Fourier transform on the target song segment to generate a corresponding first spectrum map.” as drafted recites a mathematical relationship;
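Claim 2 recites a short-time Fourier transform. Below is a dependency-free sketch of an STFT magnitude spectrogram, which is one of several possible "spectrum maps". The Hann window, frame length and hop size are assumed parameters that the claim does not specify. The direct O(N²) DFT is used only to keep the example self-contained; a practical system would normally use an FFT.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(samples: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Magnitude spectrogram: window each frame, then take a direct DFT of it.

    Returns one row per frame, each holding the frame_len // 2 + 1 non-negative
    frequency bins.
    """
    window = hann(frame_len)
    spectrogram = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                        for t in range(frame_len))
            row.append(abs(coeff))
        spectrogram.append(row)
    return spectrogram
```

A cosine that completes 4 cycles per 32-sample frame produces its largest magnitude in bin 4 of each row. The rows stacked over time form the two-dimensional map.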

“Claim 3...wherein said transforming the target song segment to generate the corresponding first spectrum map comprises: down-sampling the target song segment at a preset sampling rate; and transforming the down-sampled target song segment to generate a corresponding first spectrum map.”, as drafted recites a mathematical relationship as illustrated at page 7, last paragraph through page 8, paragraph 3 of the Specification reciting down-sampling to 16 kHz;
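Down-sampling to a preset rate can be sketched with linear interpolation, as below. The Specification passage cited above gives 16 kHz as that rate. This sketch rests on assumptions: a production resampler would first apply a low-pass (anti-aliasing) filter, which is omitted here, and the function name `resample_linear` is hypothetical.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample from src_rate to dst_rate by linear interpolation.

    No anti-aliasing low-pass filter is applied; this is an illustrative shortcut.
    """
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # fractional position in the input signal
        left = int(pos)
        frac = pos - left
        right = min(left + 1, len(samples) - 1)
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For instance, one second of 44.1 kHz audio (44,100 samples) becomes 16,000 samples at 16 kHz.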

“Claim 4... wherein said down-sampling the target song segment at the preset sampling rate comprises: determining whether a duration of the target song segment is greater than a preset duration; if yes, adjusting the duration of the target song segment to the preset duration; and down-sampling, at a preset sampling rate, the target song segment of the preset duration.”, as drafted recites a mathematical relationship as illustrated at page 7, last paragraph through page 8, paragraph 3 of the Specification reciting down-sampling to 16 kHz;
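Claim 4 orders the steps as follows: compare the segment's duration with a preset duration, trim the segment if it is longer, then down-sample. The sketch below makes several assumptions, and all of its function names are hypothetical:

- "adjusting" the duration means keeping the leading portion of the segment;
- only integer down-sampling factors are handled, to keep the sketch self-contained;
- a crude block-averaging step stands in for a proper low-pass filter.

```python
def clip_to_duration(samples: list[float], sample_rate: int, preset_seconds: float) -> list[float]:
    """If the segment is longer than the preset duration, keep only its leading portion."""
    max_len = int(round(preset_seconds * sample_rate))
    return samples[:max_len] if len(samples) > max_len else samples


def downsample_by_factor(samples: list[float], factor: int) -> list[float]:
    """Average each block of `factor` samples (a crude low-pass) and keep one value per block."""
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


def prepare_segment(samples: list[float], src_rate: int, dst_rate: int,
                    preset_seconds: float) -> list[float]:
    """Claim 4 order of operations: duration check and trim first, then down-sample."""
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer down-sampling factors")
    clipped = clip_to_duration(samples, src_rate, preset_seconds)
    return downsample_by_factor(clipped, src_rate // dst_rate)
```

Under these assumptions, consider a 15 s segment captured at 48 kHz with a 10 s preset duration. It would be trimmed to 480,000 samples and then reduced by a factor of 3 to 160,000 samples at 16 kHz.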

“Claim 6...transforming the feature tensor into one-dimensional data by the input layer…dividing the one-dimensional data into n parts by the data segmentation layer…”, as drafted recites a mathematical relationship as illustrated on page 8, paragraph 3 of the Specification reciting a mathematical transform;

“Claim 7...dividing the down-sampled pre-stored song into a plurality of pre-stored song segments having a preset duration; performing a short-time Fourier transform on the pre-stored song segments to generate corresponding second spectrum maps; and generating second feature vectors according to the second spectrum maps and the preset neural network model, associating the second feature vectors with the pre-stored song segments and the pre-stored song…”, as drafted recites a mathematical relationship as illustrated on page 9, paragraph 3 of the Specification reciting dividing a song of duration 240s into 24 pre-stored song segments of a preset duration of 10s and on page 8, paragraph 3 of the Specification reciting a mathematical transform; and 
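The pre-stored-song pipeline of claim 7 can be outlined as follows: divide each song into preset-duration segments, compute one second feature vector per segment, and store the vectors associated with their segments and songs. Down-sampling and the STFT are folded into the `embed` callable. `embed` is a hypothetical stand-in for "short-time Fourier transform plus the preset neural network model". The in-memory list of `(vector, song_id, segment_index)` tuples is likewise an assumed storage layout.

```python
from typing import Callable

Vector = list[float]


def build_song_index(songs: dict[str, list[float]], sample_rate: int, segment_seconds: float,
                     embed: Callable[[list[float]], Vector]) -> list[tuple[Vector, str, int]]:
    """Divide each pre-stored song into preset-duration segments, compute one
    second feature vector per segment with `embed`, and keep each vector associated
    with its song and segment index.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    index = []
    for song_id, samples in songs.items():
        for seg_no in range(len(samples) // seg_len):
            segment = samples[seg_no * seg_len:(seg_no + 1) * seg_len]
            index.append((embed(segment), song_id, seg_no))
    return index
```

With a 240 s song and 10 s segments, this index would hold 24 entries for that song, one per second feature vector.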

“Claim 8...wherein said calculating the similarities between the first feature vector and the second feature vectors comprises: calculating Euclidean distances between the first feature vector and the second feature vectors, and determining similarities between the first feature vector and the second feature vectors according to the Euclidean distances, wherein the smaller the Euclidean distance is, the greater the similarity is.”, as drafted recites a mathematical relationship.
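Claim 8 only requires that similarity decrease as Euclidean distance increases. The sketch below uses 1 / (1 + d) as that mapping. This particular formula is an assumption; any strictly decreasing function of d would satisfy the claimed relation.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two feature vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a: list[float], b: list[float]) -> float:
    """Assumed mapping: 1 / (1 + d), which is 1.0 for identical vectors and
    decreases strictly as the Euclidean distance d grows."""
    return 1.0 / (1.0 + euclidean_distance(a, b))
```

For example, vectors at distance 5 score 1/6, while identical vectors score 1.0.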

Dependent claims 15-17 and 19-21 are similar in scope to dependent claims 2-4 and 6-8 mapped above.

With respect to Step 2A Prong Two of the 2019 PEG, the dependent claims’ additional elements, as interpreted by the Examiner, are identified below:

“Claim 6…wherein the dividing-and-encoding network comprises an input layer, a data segmentation layer, a fully-connected layer and an output layer, and said encoding the feature tensor according to the dividing-and-encoding network to generate the multi-dimensional first feature vector comprises: inputting the feature tensor into the dividing-and-encoding network…and inputting the one-dimensional data into the data segmentation layer…and connecting each part to the fully-connected layer…outputting n eigenvalues by the output layer, wherein the n eigenvalues constitute an n-dimensional first feature vector and n is a positive integer greater than 1.”, as drafted recites generic computer components performing generic computer functions [i.e., the dividing-and-encoding network…comprising layers…, a convolutional neural network and a dividing-and-encoding network]. See MPEP 2106.05(f)(2), Example iii. That example is among those in which courts have found additional elements to be mere instructions to apply an exception, because they do no more than merely invoke computers or machinery as a tool to perform an existing process. The limitation also recites insignificant extra-solution activity [i.e., inputting…connecting layers…outputting] (See MPEP 2106.05(g) Examples of activity that courts have found to be insignificant extra-solution activity: Mere Data Gathering: example iii); and

“Claim 7…further comprising: acquiring a pre-stored song and down-sampling the pre-stored song at a preset sampling rate…and storing the second feature vectors, pre-stored song segments and pre-stored song which are associated in a pre-stored song set.”, as drafted recites insignificant extra solution activity that amounts to mere data gathering (See MPEP 2106.05(g) Examples of activity that courts have found to be insignificant extra-solution activity: Mere Data Gathering: example iii).

The additional elements identified above fail to integrate the abstract idea into a practical application for two reasons. First, the additional elements claiming computer components recite generic computer components performing generic computer functions. Second, the additional elements of inputting, connecting and outputting data amount to insignificant extra-solution activity. See MPEP 2106.05(g), which lists three considerations for determining whether additional elements are insignificant extra-solution activity.
Step 2B. Similar to the analysis under Step 2A Prong Two, because the additional elements of the dependent claims amount to generic computer components performing generic computer functions and insignificant extra-solution activity, they do not add significantly more to the judicial exception such that the dependent claims as a whole would be patent eligible.
Therefore, dependent claims 2-4, 6-8, 15-17 and 19-21 are rejected under 35 U.S.C. 101.

Examiner Note
The Chen reference cited below and referenced in the attached PTO-892 form claims priority to an earlier filed provisional application which supports the paragraphs of Chen cited below.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6-8, 14-17 and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Bokar U.S. Pub. No. 2019/0164557 (hereinafter “Bokar”) in view of Chen et al. U.S. Pub. No. 2020/0105398 (hereinafter “Chen”) in view of Henry U.S. Pub. No. 2018/0174037 (hereinafter “Henry”) in further view of Ellis et al. U.S. Pub. No. 2013/0226957 (hereinafter “Ellis”).
Regarding independent claim 1, Bokar discloses:
A method for recognizing a song, applied to a real-time recognition system, and the method comprising: acquiring, by a voice component of an electronic device a target song segment and transforming the target song segment to generate a corresponding first spectrum map (Bokar at paragraph [0006] discloses receiving an audio input that is divisible into a plurality of audio frames [i.e., target song segment].   Further, Bokar at paragraph [0029] discloses generating a spectrum map of an audio signal or input.  Lastly, Bokar at paragraph [0053] discloses an electronic device for implementing the disclosure.)

inputting the first spectrum map into a preset neural network model (Bokar at paragraph [0030] discloses inputting an audio frame into a neural network [i.e., preset neural network model] and outputting a vector representative of a plurality of features.)

acquiring, by a memory of the electronic device, second feature vectors of pre-stored songs, wherein one pre-stored song is divided into a plurality of pre-stored song segments, one pre-stored song segment corresponds to one second feature vector, the second feature vector representing information contained in the pre-stored song segment, and the first feature vector and the second feature vectors have the same number of dimensions (Bokar at paragraphs [0021] and [0027] discloses training a model with a training dataset that comprises audio samples.  Additionally, Bokar at paragraph [0006] discloses dividing an audio input into a plurality of frames and extracting features from the audio frames and generating feature vectors from the extracted features.  Lastly, Bokar at [0035] discloses for each audio frame generating a matrix of ‘n’ by ‘D’ where ‘D’ is the number of dimensions [i.e., the first feature vector and the second feature vectors have the same number of dimensions…].)

calculating, by a processor of the electronic device, similarities between the first feature vector and the second feature vectors, and determining a maximum similarity (Bokar at paragraph [0021] discloses calculating a maximum likelihood estimation [i.e., maximum similarity] using a trained model and generated vectors.) 

While Bokar at paragraphs [0028]-[0030] and [0035] discloses inputting audio frames into a neural network and using the neural network to generate a matrix of size ‘n’ by ‘D’ where ‘D’ is the number of dimensions [i.e., generate a feature tensor] and spectrum data to generate a vector representative of a plurality of features, Bokar does not disclose:
performing a convolution operation in a convolutional neural network of the preset neural network model to generate a feature tensor…wherein an activation function of the convolutional neural network is exponential linear unit and the first feature vector represents information contained the target song fragment.
However, Chen, at paragraph [0019], teaches performing a convolution operation in a convolutional neural network of the preset neural network model (i.e., “Two of the most widely used types of deep learning architectures are convolutional neural networks (CNNs) and recurrent neural networks (RNNs). We developed a DNN architecture that integrates a CNN with a RNN in a sequential structure (FIG. 4a)…The convolution operation consists of 40 one-dimensional (1D) temporal filters (vectors) of size 1×7, followed by exponential linear activation” Para. [0043]) to generate a feature tensor…wherein an activation function of the convolutional neural network is exponential linear unit and the first feature vector represents information contained the target song fragment (i.e., “The convolution operation consists of 40 one-dimensional (1D) temporal filters (vectors) of size 1×7, followed by exponential linear activation” Para. [0043] and “We used exponential linear units (ELUs) as the activation function for all the CNN units due to its demonstrated faster learning time and higher accuracy compared to other types of nonlinearity.” Para. [0071] and “The deep neural network may comprise a convolutional neural network (CNN) followed sequentially by a recurrent neural network (RNN). For example, the CNN may be used to learn various features from the set of reference data and the features may be further passed to the RNN to discover temporal patterns within the CNN features.” Para. [0019])
Both the Bokar reference and the Chen reference, in the sections cited by the Examiner, are in the field of endeavor of using neural networks to identify data. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the use of a neural network to generate a feature vector as disclosed in Bokar with the use of a CNN and an exponential linear unit activation function to identify data features as taught in Chen, in order to increase performance and accuracy in identifying features in data (See Chen at paragraph [0071]).

While Bokar at paragraphs [0028]-[0030] and [0035] discloses inputting audio frames into a neural network and using the neural network to generate a matrix of size ‘n’ by ‘D’ where ‘D’ is the number of dimensions and spectrum data to generate a vector representative of a plurality of features, Bokar does not disclose:
and encoding the feature tensor according to a dividing-and-encoding network of the preset neural network model to generate a multi-dimensional first feature vector.
However, Henry at paragraph [0059] teaches using a convolution neural network to generate a context vector. Additionally, Henry teaches encoding the feature tensor according to a dividing-and-encoding network of the preset neural network model to generate a multi-dimensional first feature vector (i.e., “encoding component 840 may perform a divide-and-encode operation so that the output has a smaller dimension than the input and that the output is an approximately binary vector (many of the elements of the output vector are close to 0 or 1).” Para. [0063])
Both the Bokar reference and the Henry reference, in the sections cited by the Examiner, are in the field of endeavor of using neural networks to identify data. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the use of a neural network to generate a feature vector as disclosed in Bokar with the use of a divide-and-encode operation as taught in Henry, so that the output has a smaller dimension than the input (See Henry at paragraph [0063]).

While Bokar at paragraphs [0006] and [0021] discloses using thresholds in similarity calculations and calculating similarity between audio data using a maximum likelihood estimation of vectors generated using audio data, Bokar does not disclose:
determining, by the processor of the electronic device, that the target song segment and a pre-stored song corresponding to the maximum similarity are different versions of the same song in response to the maximum similarity being greater than a preset threshold.
In other words, Bokar does not disclose identifying audio data as a different version of the same song.
However, Ellis at paragraph [0080] teaches identifying and storing different versions of a reference song by comparing a reference song vector with a query song vector.
Both the Bokar reference and the Ellis reference, in the sections cited by the Examiner, are in the field of endeavor of identifying audio data. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the similarity calculations between audio data to identify said audio data as disclosed in Bokar with the comparing and identifying of audio data from different versions of the same song as taught in Ellis, in order to automatically identify similar songs (See Ellis at paragraphs [0004]-[0005]).

Regarding dependent claim 2, all of the particulars of claim 1 have been addressed above.  Additionally, Bokar discloses:
wherein said transforming the target song segment to generate the corresponding first spectrum map comprises: performing a short-time Fourier transform on the target song segment to generate a corresponding first spectrum map. (Bokar at paragraphs [0028]-[0029] discloses performing Fourier transformations on audio frames.)

Regarding dependent claim 3, all of the particulars of claim 1 have been addressed above. While Bokar at paragraphs [0006], [0021] and [0029] discloses using audio samples of an audio input and generating a spectrum map to identify matching audio frames, Bokar does not disclose:
wherein said transforming the target song segment to generate the corresponding first spectrum map comprises: down-sampling the target song segment at a preset sampling rate; and transforming the down-sampled target song segment to generate a corresponding first spectrum map.
In other words, Bokar does not disclose, down-sampling the target song segment at a preset sampling rate.
However, Ellis at paragraphs [0041] and [0131] teaches identifying similar songs by creating vectors by sampling and resampling audio frames using different rates [i.e., down-sampling the target song segment at a preset sampling rate] and generating a spectrogram of Fourier transforms.  
Both the Bokar reference and the Ellis reference, in the sections cited by the Examiner, are in the field of endeavor of identifying audio data. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the similarity calculations between audio data and audio frames to identify said audio data as disclosed in Bokar with the sampling and resampling of audio data at different rates and comparing and identifying said audio data as taught in Ellis, in order to automatically identify similar songs (See Ellis at paragraphs [0004]-[0005]).

Regarding dependent claim 4, all of the particulars of claims 1 and 3 have been addressed above.  Additionally, Bokar as modified with Ellis discloses:
wherein said down-sampling the target song segment at the preset sampling rate comprises: determining whether a duration of the target song segment is greater than a preset duration; if yes, adjusting the duration of the target song segment to the preset duration; and down-sampling, at a preset sampling rate, the target song segment of the preset duration (Examiner is relying on Ellis at paragraphs [0041] and [0131] to teach down-sampling the target song segment at a preset sampling rate. Additionally, Bokar at paragraph [0029] discloses feature extraction using a specified time window duration, such as a 50 millisecond time window. The motivation to combine the Bokar reference and the Ellis reference, provided above in the rejection of dependent claim 3, is applicable to dependent claim 4 as well.)

Regarding dependent claim 6, all of the particulars of claim 1 have been addressed above. Bokar at paragraphs [0021]-[0026] discloses estimating parameters [i.e., eigenvalues] which are a best match of the distribution of training feature matrices. Bokar at paragraphs [0028]-[0030] and [0035] further discloses inputting audio frames into a neural network and using the neural network to generate a matrix of size ‘n’ by ‘D’, where ‘D’ is the number of dimensions [i.e., generate a feature tensor], and using spectrum data to generate a vector representative of a plurality of features. However, Bokar does not disclose:
wherein the dividing-and-encoding network comprises an input layer, a data segmentation layer, a fully-connected layer and an output layer, and said encoding the feature tensor according to the dividing-and-encoding network to generate the multi-dimensional first feature vector comprises: inputting the feature tensor into the dividing-and-encoding network, transforming the feature tensor into one-dimensional data by the input layer, and inputting the one-dimensional data into the data segmentation layer; dividing the one-dimensional data into n parts by the data segmentation layer and connecting each part to the fully-connected layer; and after an operation in the fully-connected layer, outputting n eigenvalues by the output layer, wherein the n eigenvalues constitute an n-dimensional first feature vector and n is a positive integer greater than 1.
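To make the claimed layer sequence concrete, the sketch below walks through the four layers:

- input layer: flatten the feature tensor into one-dimensional data;
- data segmentation layer: split that data into n equal parts;
- fully-connected step: map each part to a single value with its own weights;
- output layer: return the n values as the n-dimensional first feature vector.

Several details are assumptions of this sketch rather than features recited in the claim. These include the two-dimensional tensor, the requirement that n divide the flattened length evenly, and the per-part weighting used to realize "connecting each part to the fully-connected layer". The optional sigmoid (`squash`) is also an assumption. It is included only to show one way of obtaining the "approximately binary" output Henry describes at paragraph [0063].

```python
import math


def divide_and_encode(feature_tensor: list[list[float]], weights: list[list[float]],
                      biases: list[float], squash: bool = False) -> list[float]:
    """Flatten, split into n parts, reduce each part to one value, return n values."""
    flat = [value for row in feature_tensor for value in row]  # input layer: 1-D data
    n = len(weights)
    if n < 2 or len(biases) != n or len(flat) % n != 0:
        raise ValueError("need n > 1 weight rows and biases, and n must divide the data length")
    part_len = len(flat) // n
    vector = []
    for i in range(n):
        part = flat[i * part_len:(i + 1) * part_len]  # data segmentation layer: part i of n
        if len(weights[i]) != part_len:
            raise ValueError("each weight row must match the part length")
        z = biases[i] + sum(w * x for w, x in zip(weights[i], part))  # fully-connected step
        vector.append(1.0 / (1.0 + math.exp(-z)) if squash else z)   # output layer value
    return vector
```

For example, a 2 × 2 tensor [[1, 2], [3, 4]] with n = 2 and all-ones weights yields the 2-dimensional vector [3.0, 7.0].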
However, Henry at paragraph [0058] teaches a multi-layer neural network to create embedded vectors. Examiner is of the position that the claim limitations specifying how data is transferred between the layers of a CNN, and what type of data is transferred, recite a specific use case and are given little patentable weight. The motivation to combine the Bokar reference with the Henry reference, provided in the rejection of claim 1, is applicable to dependent claim 6 as well.

Regarding dependent claim 7, all of the particulars of claim 1 have been addressed above.  Additionally, Bokar as modified with Ellis discloses:
acquiring a pre-stored song and down-sampling the pre-stored song at a preset sampling rate; dividing the down-sampled pre-stored song into a plurality of pre-stored song segments having a preset duration; performing a short-time Fourier transform on the pre-stored song segments to generate corresponding second spectrum maps; and generating second feature vectors according to the second spectrum maps and the preset neural network model, associating the second feature vectors with the pre-stored song segments and the pre-stored song, and storing the second feature vectors, pre-stored song segments and pre-stored song which are associated in a pre-stored song set (Bokar at paragraphs [0006], [0021] and [0029] discloses dividing an audio input into audio frames of a specified duration and generating vectors and a spectrum map of Fourier transforms to identify matching audio frames.  Additionally, Bokar at paragraphs [0021], [0027] and [0030] discloses training a model with a training dataset of audio samples and utilizing a neural network in feature extraction.  Lastly, Examiner is relying on Ellis at paragraphs [0041] and [0131] to teach down-sampling…audio data…at a preset sampling rate.  The motivation to combine statement provided above in the rejection of dependent claim 3, combining the Bokar reference and the Ellis reference, is applicable to dependent claim 7.)
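For illustration only, the claim 7 pipeline (down-sample at a preset rate, cut into segments of a preset duration, apply a short-time Fourier transform to obtain spectrum maps, then store each segment's feature vector together with its song and segment) can be sketched in plain Python as follows. Several simplifications are assumptions and are not drawn from the claim, Bokar, or Ellis: down-sampling is naive integer decimation without an anti-aliasing filter, the STFT uses a periodic Hann window with a naive O(N^2) DFT, and the "preset neural network model" is replaced by a hypothetical `embed` callable supplied by the caller. All function names are hypothetical.

```python
import cmath
import math
from typing import Callable, Dict, List, Tuple


def downsample(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive decimation: keep every k-th sample (no anti-aliasing low-pass filter)."""
    if dst_rate <= 0 or src_rate % dst_rate != 0:
        raise ValueError("sketch supports integer decimation factors only")
    return samples[::src_rate // dst_rate]


def split_segments(samples: List[float], rate: int, seconds: float) -> List[List[float]]:
    """Cut audio into consecutive, non-overlapping segments of a preset duration."""
    seg_len = int(rate * seconds)
    if seg_len <= 0:
        raise ValueError("segment duration is shorter than one sample")
    # Trailing samples shorter than one full segment are dropped.
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def stft_magnitude(segment: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Spectrum map: periodic Hann window plus a naive DFT for each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    spectrum_map = []
    for start in range(0, len(segment) - frame_len + 1, hop):
        frame = [segment[start + i] * window[i] for i in range(frame_len)]
        # Magnitudes of the non-negative frequency bins 0 .. frame_len // 2.
        row = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                       for t in range(frame_len)))
               for k in range(frame_len // 2 + 1)]
        spectrum_map.append(row)
    return spectrum_map


def build_prestored_set(songs: Dict[str, Tuple[List[float], int]],
                        target_rate: int, seconds: float,
                        frame_len: int, hop: int,
                        embed: Callable[[List[List[float]]], List[float]]) -> List[dict]:
    """Associate each segment's feature vector with its song and segment."""
    store = []
    for song_id, (samples, rate) in songs.items():
        low = downsample(samples, rate, target_rate)
        for idx, seg in enumerate(split_segments(low, target_rate, seconds)):
            spectrum_map = stft_magnitude(seg, frame_len, hop)
            store.append({"song": song_id, "segment": idx,
                          "samples": seg, "vector": embed(spectrum_map)})
    return store


# Usage with a toy stand-in "model": one summed-magnitude value per STFT frame.
songs = {"demo": ([float(i % 4) for i in range(32)], 16)}
store = build_prestored_set(songs, target_rate=8, seconds=1.0, frame_len=8, hop=8,
                            embed=lambda spec: [sum(row) for row in spec])
print(len(store), [entry["segment"] for entry in store])  # 2 [0, 1]
```

A production system would low-pass filter before decimating and use an FFT; the naive forms are kept here only so that each step is visible.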

Regarding dependent claim 8, all of the particulars of claim 1 have been addressed above.  While Bokar at paragraph [0021] discloses calculating a maximum likelihood estimation using a trained model and generated vectors, Bokar does not disclose:
wherein said calculating the similarities between the first feature vector and the second feature vectors comprises: calculating Euclidean distances between the first feature vector and the second feature vectors, and determining similarities between the first feature vector and the second feature vectors according to the Euclidean distances, wherein the smaller the Euclidean distance is, the greater the similarity is.
In other words, Bokar does not disclose using Euclidean distance in calculating similarities between data.
However, Ellis at paragraphs [0006] and [0015] teaches using a Euclidean distance between vectors to identify similar songs.
Both the Bokar reference and the Ellis reference, in the sections cited by the Examiner, are in the field of endeavor of identifying audio data.  Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the similarity calculations between audio data and audio frames to identify said audio data as disclosed in Bokar with the use of Euclidean distances in calculating similarities between data as taught in Ellis to facilitate automatically identifying similar songs (See Ellis at paragraphs [0004]-[0005]).
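For illustration only, the claim 8 limitation (similarity determined from the Euclidean distance, with a smaller distance meaning a greater similarity) can be sketched as below. The mapping `1 / (1 + distance)` is an assumption chosen only because it is monotonically decreasing in the distance; neither the claim nor Ellis is relied upon here for that particular formula, and ranking candidates by ascending distance would produce the same order. The function names and the dictionary of stored vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed monotone mapping: smaller distance gives larger similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def rank_candidates(query: Sequence[float],
                    candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort stored (second) feature vectors from most to least similar to the query."""
    scored = [(key, similarity(query, vec)) for key, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Usage: the identical vector ranks first and the farthest vector ranks last.
ranking = rank_candidates([0.0, 0.0], {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 0.0]})
print([key for key, _ in ranking])  # ['c', 'b', 'a']
```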

Regarding independent claim 14, while independent claim 14, a device claim, and independent claim 1, a method claim, are directed towards different statutory classes, they are similar in scope.  Therefore, claim 14 is rejected under the same rationale as claim 1.  With respect to the hardware limitations specified in the claim, specifically, a memory and a processor… Bokar at paragraphs [0048]-[0050] discloses similar hardware implementations.

Regarding dependent claim 15, all of the particulars of claim 14 have been addressed above.  Additionally, claim 15 is rejected under the same rationale as claim 2 provided above.

Regarding dependent claim 16, all of the particulars of claim 14 have been addressed above.  Additionally, claim 16 is rejected under the same rationale as claim 3 provided above.

Regarding dependent claim 17, all of the particulars of claims 14 and 16 have been addressed above.  Additionally, claim 17 is rejected under the same rationale as claim 4 provided above.

Regarding dependent claim 19, all of the particulars of claim 14 have been addressed above.  Additionally, claim 19 is rejected under the same rationale as claim 6 provided above.

Regarding dependent claim 20, all of the particulars of claim 14 have been addressed above.  Additionally, claim 20 is rejected under the same rationale as claim 7 provided above.

Regarding dependent claim 21, all of the particulars of claim 14 have been addressed above.  Additionally, claim 21 is rejected under the same rationale as claim 8 provided above.

Regarding dependent claim 22, while dependent claim 22, a non-transitory storage medium claim, and independent claim 1, a method claim, are directed towards different statutory classes, they are similar in scope.  Therefore, claim 22 is rejected under the same rationale as claim 1.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANTHONY G GEMIGNANI whose telephone number is (571)272-1018. The examiner can normally be reached M-F 8-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amy Ng can be reached at 571-270-1698. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/A.G.G./Examiner, Art Unit 2164
/AMY NG/Supervisory Patent Examiner, Art Unit 2164
Prosecution Timeline

Mar 18, 2022: Application Filed
Nov 15, 2024: Non-Final Rejection mailed (§101, §103)
Feb 14, 2025: Response Filed
Jun 11, 2025: Final Rejection mailed (§101, §103)
Aug 11, 2025: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/118,404 (Patent 12585564): METHODS FOR CONFIGURING SPAN OF CONTROL UNDER VARYING TEMPERATURE. 5y 3m to grant; granted Mar 24, 2026.
15/480,966 (Patent 10996865): APPLICATION-SPECIFIC MEMORY SCALING IN MULTI-DEVICE SYSTEMS. 4y 0m to grant; granted May 04, 2021.
15/282,878 (Patent 10990284): ALERT CONFIGURATION FOR DATA PROTECTION. 4y 7m to grant; granted Apr 27, 2021.
15/462,397 (Patent 10978169): PAD DETECTION THROUGH PATTERN ANALYSIS. 4y 0m to grant; granted Apr 13, 2021.
15/463,750 (Patent 10971241): PERFORMANCE BASED METHOD AND SYSTEM FOR PATROLLING READ DISTURB ERRORS IN A MEMORY UNIT. 4y 0m to grant; granted Apr 06, 2021.
Based on this examiner's 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 22%
With Interview (+28.7%): 50%
Median Time to Grant: 3y 6m (~0m remaining)
PTA Risk: Moderate
Based on 46 resolved cases by this examiner. Grant probability derived from career allowance rate.