Prosecution Insights
Last updated: April 19, 2026
Application No. 17/841,322

SYSTEMS AND METHODS FOR CLASSIFYING MUSIC FROM HETEROGENOUS AUDIO SOURCES

Non-Final OA: §101, §103, §112
Filed: Jun 15, 2022
Examiner: DUONG, HIEN LUONGVAN
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: Netflix Inc.
OA Round: 3 (Non-Final)
Grant Probability: 75% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 1m
Grant Probability With Interview: 98%

Examiner Intelligence

Career Allow Rate: 75% (480 granted / 643 resolved; +19.7% vs TC avg) — above average
Interview Lift: strong, +22.8% across resolved cases with an interview
Typical Timeline: 3y 1m average prosecution; 42 applications currently pending
Career History: 685 total applications across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 51.5% (+11.5% vs TC avg)
§102: 18.5% (-21.5% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Based on career data from 643 resolved cases; comparisons are against Tech Center average estimates.

Office Action

DETAILED ACTION

Remarks

This Office Action is issued in response to the communication filed on 2/12/2026. Claims 1-17 and 20-22 are pending in this Office Action.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed on 2/12/2026 with respect to the rejection of claims under 35 U.S.C. § 103 have been considered but are moot in view of the new ground of rejection. Applicant's arguments filed on 2/12/2026 with respect to the rejection of claims under 35 U.S.C. § 101 have been considered but are not persuasive. The examiner respectfully traverses Applicant's arguments.

Applicant argues: "Claim 1 recites certain features that cannot be performed practically by a human mind. For example, claim 1 recites inter alia (1) 'dividing, by circuitry, an audio stream with heterogeneous audio content that includes both music and non-music audio into a plurality of frames,' (2) 'identifying, by the circuitry, which of the plurality of frames contain music,' (3) 'labeling, by the circuitry, the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music,' and (4) 'refraining from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not containing music.'" (Applicant's arguments at page 11)

The examiner respectfully disagrees. Apart from the "by circuitry" language, there is nothing in claim 1 that prevents the limitations from being performed in the human mind. Claim 1 recites "dividing, by circuitry, an audio stream with heterogeneous audio content into a plurality of frames." Apart from the "by circuitry" language, the dividing step encompasses a user who, with the help of pen and paper, marks down different timestamps for different portions of the audio stream.
Similarly, apart from the "by the circuitry" language, "identifying, by the circuitry, which of the plurality of frames contain music" encompasses the user determining which portions correspond to music and non-music, respectively. "Labeling, by the circuitry, the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music" and "refraining from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not containing music" encompass the user naming or not naming the audio frames. Claim 1 therefore recites a judicial exception under Step 2A-Prong 1.

Applicant argues: "claim 1 recites subject matter that effectuates a Diehr-type transformation of audio data into labeled audio segments, which at the very least integrates the alleged abstract idea into a practical application" (Applicant's arguments at page 13)

The examiner respectfully disagrees. Claim 1 as recited does not transform the audio in any way, shape, or form. The end result of claim 1 is the labeling of the frames, which, as indicated above, can also be performed in the human mind and therefore does not integrate the judicial exception into a practical application under Step 2A-Prong 2.

Applicant argues: "Moreover, claim 1 as a whole reflects improvements to both computer technology and the field of automated audio content analysis… The USPTO's guidance indicates that the claim merely need to reflect an improvement in the relevant technology, as opposed to needing to recite the improvement itself." (Applicant's arguments at page 14)

The examiner respectfully disagrees.
The July 2024 Guidance Update on Patent Subject Matter Eligibility, Including on Artificial Intelligence (hereinafter the "July24 guidance"), states: "A key point of distinction to be made for AI inventions is between a claim that reflects an improvement to a computer or other technology described in the specification (which is eligible) and a claim in which the additional elements amount to no more than (1) a recitation of the words 'apply it' (or an equivalent) or are no more than instructions to implement a judicial exception on a computer, or (2) a general linking of the use of a judicial exception to a particular technological environment or field of use (which is ineligible). An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. AI inventions may provide a particular way to achieve a desired outcome when they claim, for example, a specific application of AI to a particular technological field (i.e., a particular solution to a problem). In these situations, the claim is not merely to the idea of a solution or outcome and amounts to more than merely 'applying' the judicial exception or generally linking the judicial exception to a field of use or technological environment. In other words, the claim reflects an improvement in a computer or other technology."

As clearly stated in the July24 guidance, claim 1 as recited does not reflect an improvement to a computer or other technology as described in the specification.
The additional elements of "providing, by the circuitry, the first and second spectrogram patches as inputs to a convolutional neural network classifier; receiving, from the convolutional neural network classifier, a first classification of music based at least in part on the first spectrogram patch and a second classification of music based at least in part on the second spectrogram patch" are no more than instructions to implement a judicial exception on a computer and therefore do not reflect an improvement to a computer or other technology.

Applicant argues: "Therefore, because the subject matter of claim 1 is 'far from routine and conventional,' claim 1 reflects and/or constitutes an inventive concept under step 2B of the Test. Finally, because claim 1 reflects and/or constitutes an inventive concept under step 2B of the Test, claim 1 recites additional elements that amount to significantly more than the alleged abstract idea itself, which thus brings claim 1 into eligibility under § 101. For at least these reasons, even if claim 1 were directed to an abstract idea (which Applicant contests), claim 1 also recites additional elements that amount to significantly more than the alleged abstract idea itself, thereby satisfying step 2B of the Test. Due to the satisfaction of step 2B of the Test, claim 1 qualifies as eligible subject matter under § 101." (Applicant's arguments at page 17)

The examiner respectfully disagrees. As indicated above, claim 1 does not reflect an improvement to a computer or other technology. The additional elements are reevaluated under Step 2B. Using a convolutional neural network to classify spectrogram patches is at best the equivalent of merely adding the words "apply it" to the judicial exception. Even when considered in combination, the additional elements do not provide an inventive concept; claim 1 therefore is ineligible.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-17 and 20-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1:

Step 1: Statutory Category?: Yes. Claim 1 recites a method (i.e., a "process"), which is a statutory category.

Step 2A-Prong 1: Judicial Exception Recited?: Yes. The limitations "dividing, by circuitry, an audio stream with heterogenous audio content that includes both music and non-music audio into plurality of frames; identifying, by the circuitry, which of the plurality of frames contain music; labeling, by the circuitry, the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music; refraining from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not containing music" are mental processes that can be performed in the human mind using observation, evaluation, judgment, and opinion, including with the help of pen and paper. Apart from the "by circuitry" language, the dividing step encompasses a user who, with the help of pen and paper, marks down different timestamps for different portions of the audio stream. Similarly, apart from the "by circuitry" language, "identifying, by the circuitry, which of the plurality of frames contain music" encompasses the user recognizing which portion is music and which portion is not music, and "labeling, by the circuitry, the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music" encompasses the user naming or not naming the audio frames.
Claim 1 also recites the limitation "generating, by the circuitry, a first spectrogram patch based at least in part on a first frame included in the plurality of frames and a second spectrogram patch based at least in part on a second frame included in the plurality of frames," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas.

Step 2A-Prong 2: Integrated into a practical application? No. Claim 1 recites the additional elements of "providing, by the circuitry, the first and second spectrogram patches as inputs to a convolutional neural network classifier; receiving, from the convolutional neural network classifier, a first classification of music based at least in part on the first spectrogram patch and a second classification of music based at least in part on the second spectrogram patch." The convolutional neural network classifier is described at a very high level of generality, such that it amounts to using a computer with a generic convolutional neural network classifier to apply the abstract idea. The additional element of "circuitry" is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component.

Step 2B: Recites additional elements that amount to significantly more than the judicial exception? No. Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As indicated above, the additional elements of the convolutional neural network classifier and the circuitry are at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore do not amount to significantly more than the judicial exception. Even when considered in combination, the additional elements do not provide an inventive concept; claim 1 therefore is ineligible.
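For orientation, the overall shape of the claim 1 pipeline at issue (frame division, music detection, per-frame spectrogram patch, classification, selective labeling) can be sketched in a few lines. This is purely an illustrative sketch, not code from the application: the energy-threshold detector and dominant-bin "classifier" are stand-in assumptions for the claimed music identifier and CNN, and the frame length is arbitrarily small.

```python
import numpy as np

FRAME_LEN = 8  # samples per frame (tiny for illustration; real systems use far longer windows)

def divide_into_frames(stream):
    """Divide the audio stream into fixed-length, non-overlapping frames."""
    n = len(stream) // FRAME_LEN
    return [stream[i * FRAME_LEN:(i + 1) * FRAME_LEN] for i in range(n)]

def contains_music(frame):
    """Stand-in music detector: treat frames with non-trivial energy as music."""
    return float(np.mean(frame ** 2)) > 0.1

def classify_patch(patch):
    """Stand-in for the CNN's per-patch classification (here keyed on the dominant bin)."""
    dominant_bin = int(np.argmax(patch[1:])) + 1  # skip the DC bin
    return "energetic" if dominant_bin >= 3 else "calm"

def label_frames(stream):
    """Label music frames with a classification; refrain from labeling non-music frames."""
    labels = []
    for frame in divide_into_frames(stream):
        if not contains_music(frame):
            labels.append(None)  # refrain from labeling: frame contains no music
            continue
        patch = np.abs(np.fft.rfft(frame))  # crude one-frame "spectrogram patch"
        labels.append(classify_patch(patch))
    return labels
```

On a stream concatenating a low-frequency tone frame, a high-frequency tone frame, and a silent frame, the sketch labels the first two and refrains from labeling the third.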
Claim 2 recites the limitation "wherein the classification of music comprises a classification of a musical mood," which is part of the CNN classifier of claim 1. This additional limitation does not integrate the abstract idea into a practical application in Step 2A-Prong 2 and does not amount to significantly more than the judicial exception in Step 2B. Claim 2 is not patent eligible.

Claim 3 recites the limitation "wherein the classification of music comprises a classification of at least one of: a musical genre; a musical style; or a musical tempo," which is part of the CNN classifier of claim 1. This additional limitation does not integrate the abstract idea into a practical application in Step 2A-Prong 2 and does not amount to significantly more than the judicial exception in Step 2B. Claim 3 is not patent eligible.

Claim 4 recites the limitation "wherein the plurality of spectrogram patches comprises a plurality of mel spectrogram patches," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas. Claim 4 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 4 is not patent eligible.

Claim 5 recites the limitation "wherein the plurality of spectrogram patches comprises a plurality of log-scaled mel spectrogram patches," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas. Claim 5 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 5 is not patent eligible.

Claim 6 recites the limitation "identifying, across a plurality of frames, a subset of consecutive frames with a common classification," which is a mental process.
Claim 6 further recites the additional limitation "applying the common classification as a label to an integral segment of music comprising the subset of consecutive frames," which amounts to using a computer to apply the abstract idea and therefore does not integrate the abstract idea into a practical application in Step 2A-Prong 2, and is at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore does not amount to significantly more than the judicial exception in Step 2B. Claim 6 is not patent eligible.

Claim 7 recites the limitation "wherein identifying the subset of consecutive frames comprises applying a temporal smoothing function to classifications corresponding to the plurality of frames," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas. Claim 7 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 7 is not patent eligible.

Claim 8 recites the limitations "recording, in a data store, the audio stream as containing music with the common classification; and recording, in the data store, at least one timestamp indicating a location of the subset of consecutive frames," which are insignificant extra-solution activities and therefore do not integrate the abstract idea into a practical application in Step 2A-Prong 2, and which are well-understood, routine, conventional activities previously known to the industry in Step 2B and therefore do not amount to significantly more than the judicial exception. (See MPEP 2106.05(d) and 2106.07(a)(III).) Claim 8 is not patent eligible.

Claim 9 recites the limitation "identifying at least one additional segment of music adjacent to the subset of consecutive frames with a different classification from the common classification," which is a mental process.
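The temporal smoothing and consecutive-frame grouping recited in claims 6-7 (and their counterparts 16-17) can be illustrated concretely. The sketch below is an assumption about one common realization, not the application's actual smoothing function: a majority-vote sliding window over per-frame labels, followed by run-length grouping into labeled segments.

```python
from collections import Counter

def smooth(labels, window=3):
    """Majority-vote temporal smoothing of per-frame classifications (odd window size)."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half):i + half + 1]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed

def segments(labels):
    """Group consecutive frames sharing a common classification into (label, start, end) runs."""
    runs = []
    for i, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1][2] = i  # extend the current run
        else:
            runs.append([lab, i, i])  # start a new run
    return [tuple(r) for r in runs]
```

Smoothing suppresses an isolated outlier frame, so `segments(smooth(raw))` yields contiguous labeled segments rather than single-frame flips, e.g. `["jazz", "jazz", "rock", "jazz", "jazz", "rock", "rock"]` collapses to a jazz run followed by a rock run.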
Claim 9 further recites the additional limitation "applying the common classification and the different classification as labels to a larger segment of music comprising the integral segment of music and the at least one additional segment of music," which amounts to using a computer to apply the abstract idea and therefore does not integrate the abstract idea into a practical application in Step 2A-Prong 2, and is at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore does not amount to significantly more than the judicial exception in Step 2B. Claim 9 is not patent eligible.

Claim 10 recites the limitation "identifying a corpus of frames having predetermined music-based classifications," which is a mental process. Claim 10 further recites the additional limitation "training the convolutional neural network classifier with the corpus of frames and the predetermined music-based classifications," which amounts to using a computer to apply the abstract idea and therefore does not integrate the abstract idea into a practical application in Step 2A-Prong 2, and is at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore does not amount to significantly more than the judicial exception in Step 2B. Claim 10 is not patent eligible.

Claim 11:

Step 1: Statutory Category?: Yes. Claim 11 recites a system (i.e., a "machine"), which is a statutory category.

Step 2A-Prong 1: Judicial Exception Recited?: Yes.
The limitations of claim 11, "divide an audio stream with heterogenous audio content that includes both music and non-music audio into a plurality of frames; identify which of the plurality of frames contain music; label the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music; refraining from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not containing music," are mental processes that can be performed in the human mind using observation, evaluation, judgment, and opinion, including with the help of pen and paper. The divide step encompasses a user who, with the help of pen and paper, marks down different timestamps for different portions of the audio stream. The identify step encompasses the user recognizing which portion is music and which portion is not music. Similarly, the label and refrain-from-labeling steps encompass the user naming or not naming the audio frames.

Claim 11 also recites the limitation "generate a first spectrogram patch based at least in part on a first frame included in the plurality of frames and a second spectrogram patch based at least in part on a second frame included in the plurality of frames," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas.

Step 2A-Prong 2: Integrated into a practical application? No. Claim 11 recites the additional elements of "provide the first and second spectrogram patches as inputs to a convolutional neural network classifier; receive, from the convolutional neural network classifier, a first classification of music based at least in part on the first spectrogram patch and a second classification of music based at least in part on the second spectrogram patch."
The convolutional neural network classifier is described at a very high level of generality, such that it amounts to using a computer with a generic convolutional neural network classifier to apply the abstract idea. Claim 11 also recites the additional elements of "at least one physical processor and physical memory." The processor and memory are recited at a high level of generality and amount to no more than mere instructions to apply the exception using generic computer components.

Step 2B: Recites additional elements that amount to significantly more than the judicial exception? No. Claim 11 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As indicated above, the additional elements of the convolutional neural network classifier, memory, and processor are at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore do not amount to significantly more than the judicial exception. Even when considered in combination, the additional elements do not provide an inventive concept; claim 11 therefore is ineligible.

Claim 12 recites the limitation "wherein the classification of music comprises a classification of a musical mood," which is part of the CNN classifier of claim 11. This additional limitation does not integrate the abstract idea into a practical application in Step 2A-Prong 2 and does not amount to significantly more than the judicial exception in Step 2B. Claim 12 is not patent eligible.

Claim 13 recites the limitation "wherein the classification of music comprises a classification of at least one of: a musical genre; a musical style; or a musical tempo," which is part of the CNN classifier of claim 11. This additional limitation does not integrate the abstract idea into a practical application in Step 2A-Prong 2 and does not amount to significantly more than the judicial exception in Step 2B. Claim 13 is not patent eligible.
Claim 14 recites the limitation "wherein the plurality of spectrogram patches comprises a plurality of mel spectrogram patches," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas. Claim 14 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 14 is not patent eligible.

Claim 15 recites the limitation "wherein the plurality of spectrogram patches comprises a plurality of log-scaled mel spectrogram patches," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas. Claim 15 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 15 is not patent eligible.

Claim 16 recites the limitation "identifying, across a plurality of frames, a subset of consecutive frames with a common classification," which is a mental process. Claim 16 further recites the additional limitation "applying the common classification as a label to an integral segment of music comprising the subset of consecutive frames," which amounts to using a computer to apply the abstract idea and therefore does not integrate the abstract idea into a practical application in Step 2A-Prong 2, and is at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore does not amount to significantly more than the judicial exception in Step 2B. Claim 16 is not patent eligible.

Claim 17 recites the limitation "wherein identifying the subset of consecutive frames comprises applying a temporal smoothing function to classifications corresponding to the plurality of frames," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas.
Claim 17 does not include any additional limitation that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 17 is not patent eligible.

Claim 20:

Step 1: Statutory Category?: Yes. Claim 20 recites a non-transitory computer-readable medium (i.e., an article of manufacture), which is a statutory category.

Step 2A-Prong 1: Judicial Exception Recited?: Yes. The limitations "divide an audio stream with heterogenous audio content that includes both music and non-music audio into a plurality of frames; identify which of the plurality of frames contain music; label the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music; refraining from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not containing music" are mental processes that can be performed in the human mind using observation, evaluation, judgment, and opinion, including with the help of pen and paper. The divide step encompasses a user who, with the help of pen and paper, marks down different timestamps for different portions of the audio stream. The identify step encompasses the user recognizing which portion is music and which portion is not music. Similarly, the label and refrain-from-labeling steps encompass the user naming or not naming the audio frames.

Claim 20 also recites the limitation "generate a first spectrogram patch based at least in part on a first frame included in the plurality of frames and a second spectrogram patch based at least in part on a second frame included in the plurality of frames," which is a mathematical calculation that falls within the "mathematical concepts" grouping of abstract ideas.

Step 2A-Prong 2: Integrated into a practical application? No.
Claim 20 recites the additional elements of "provide the first and second spectrogram patches as inputs to a convolutional neural network classifier; receive, from the convolutional neural network classifier, a first classification of music based at least in part on the first spectrogram patch and a second classification of music based at least in part on the second spectrogram patch." The convolutional neural network classifier is described at a very high level of generality, such that it amounts to using a computer with a generic convolutional neural network classifier to apply the abstract idea. Claim 20 also recites the additional element of a "non-transitory computer readable medium." The non-transitory computer-readable medium is recited at a high level of generality and amounts to no more than mere instructions to apply the exception using a generic computer component.

Step 2B: Recites additional elements that amount to significantly more than the judicial exception? No. Claim 20 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As indicated above, the additional elements of the convolutional neural network classifier and the non-transitory computer-readable medium are at best the equivalent of merely adding the words "apply it" to the judicial exception and therefore do not amount to significantly more than the judicial exception. Even when considered in combination, the additional elements do not provide an inventive concept; claim 20 therefore is ineligible.

Claim 21 recites the limitation "providing the labels for use in a system," which is simply data gathering and therefore an insignificant extra-solution activity. (See MPEP 2106.05(g).) The data gathering is a well-understood, routine, conventional activity previously known to the industry and therefore does not amount to significantly more than the judicial exception. (See MPEP 2106.05(d), Subsection II.)
Even when considered in combination, the additional elements do not provide an inventive concept; claim 21 therefore is ineligible.

Claim 22 recites the limitation "wherein providing the labels for use in the system comprises enabling the system to: find the first frame based at least in part on a search term corresponding to the first classification of music; and return the first frame to a user of the system in connection with the search term," which is a mental process that can be performed in the human mind using observation, evaluation, judgment, and opinion. Claim 22 does not include any additional element that integrates the abstract idea into a practical application in Step 2A-Prong 2 or amounts to significantly more than the judicial exception in Step 2B. Claim 22 is not patent eligible.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a): (a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112: The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-17 and 20-22 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement.
The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 1: Claim 1 now recites "refrain from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not contain music." Despite reviewing the specification of the present invention, the examiner cannot find support for the cited claim limitation. At best, paragraph [0050] of the specification discloses only the classification of audio content as music and non-music, and paragraph [0051] discloses only the use of the output from the music/non-music classifier to further classify by particular attribute. Nowhere in paragraphs [0050]-[0051] or the rest of the specification is "refrain from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not contain music" disclosed as recited in claim 1. Therefore, Applicant is obligated to respond by explaining where/how the specification provides support for each of these limitations. See In re Alton, 76 F.3d 1168, 1175 [37 USPQ2d 1578] (Fed. Cir. 1996); see also Hyatt v. Doll, 91 USPQ2d 1865 (Fed. Cir. 2009).

Independent claims 11 and 20 recite the same limitations and therefore have the same problem. Due at least to their dependency upon claims 1, 11, or 20, dependent claims 2-10, 12-17, and 21-22 also recite new matter.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office Action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10, 11-14, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Barkan et al. (US Patent Application Publication 2019/0050716 A1, hereinafter "Barkan") in view of Wold et al. (US Patent Application Publication 2023/0244710 A1, hereinafter "Wold").

As to claim 1, Barkan teaches a computer-implemented method comprising:

dividing, by circuitry, an audio stream with heterogenous audio content that includes both music and non-music audio into a plurality of frames (Barkan par [0022] teaches segmenting the audio track 114 into segments of audio);

identifying, by the circuitry, which of the plurality of frames contain music; generating, by the circuitry, a first spectrogram patch based at least in part on a first frame included in the plurality of frames and a second spectrogram patch based at least in part on a second frame included in the plurality of frames (Barkan par [0039] teaches that spectrogram generator 502 receives an audio signal and then transforms it into a representation such as a spectrogram matrix or mel spectrogram. For example,
Each segment may be transformed into a spectrogram matrix); providing, by the circuitry, the first and second spectrogram patches as inputs to a convolutional neural network classifier; (Barkan par [0040] teaches the spectogram may then be input into convolution layer and max pooling layer 504); receiving, from the convolutional neural network classifier, a first classification of music based at least in part on the first spectrogram patch and a second classification of music based at least in part on the second spectrogram patch (Barkan par [0043] teaches for a segment, output layer 508 may predict output values at nodes that correspond to respective genre and mood combinations) and labeling, by the circuitry, the first frame based at least in part on the first classification of music and the second frame based at least in part on the second classification of music.(Barkan par [0051] teaches the use of histogram to tag the audio track 114 with different genre and mood classification labels to allow a user to select one of the labels and play the portion of the audio track corresponding to the genre and mood combinations labels) refrain from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not contain music. Barkan fails to expressly teach an audio stream with heterogenous audio content that includes both music and non-music audio ; Identifying , by the circuitry, which of the plurality of frames contain music and refrain from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not contain music . 
However, Wold teaches an audio stream with heterogenous audio content that includes both music and non-music audio and Identifying , by the circuitry, which of the plurality of frames contain music; and refrain from labeling, by the circuitry, a third frame included in the plurality of frames due at least in part to the third frame not contain music (Wold’s abstract teaches audio features of a plurality of media content items are processed by one or more machine learning model to classify each of the media content items as containing music or not. Wold par [0080] teaches no additional analysis may be performed if the audio media content item is classified as not containing music) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention was made to combine the teaching of Barkan and Wold to achieve the claimed invention. One would have been motivated to make such combination to reduce bandwidth and/or processor utilization.(Wold par [0080]) As to claim 2, Barkan and Wold teach the computer-implemented method of claim 1, wherein the classification of music comprises a classification of a musical mood. (Barkan par [0043] teaches for a segment, output layer 508 may predict output values at nodes that correspond to respective genre and mood combinations) As to claim 3, Barkan and Wold teach the computer-implemented method of claim 1, wherein the classification of music comprises a classification of at least one of: a musical genre; a musical style; or a musical tempo. (Barkan par [0018] teaches audio classification network is configured to classify audio track with genre and mood combinations) As to claim 4, Barkan and Wold teach the computer-implemented method of claim 1, wherein the plurality of spectrogram patches comprises a plurality of mel spectrogram patches. 
(Barkan par. [0039] teaches that spectrogram generator 502 receives an audio signal and transforms it into a representation such as a spectrogram matrix or mel spectrogram.)

As to claim 10, Barkan and Wold teach the computer-implemented method of claim 1, further comprising: identifying a corpus of frames having predetermined music-based classifications; and training the convolutional neural network classifier with the corpus of frames and the predetermined music-based classifications (Barkan par. [0023] teaches that, to perform the prediction for genre and mood combinations, audio classification network 108 may be trained on a labeled dataset of pairs in the form of (audio track segment, label)).

Claims 11-14 merely recite a system to perform the method of claims 1-4, respectively. Accordingly, Barkan and Wold teach every limitation of claims 11-14 as indicated in the above rejection of claims 1-4, respectively.

Claim 20 merely recites a non-transitory computer-readable medium comprising one or more computer instructions that, when executed by a processor, perform the method of claim 1. Accordingly, Barkan and Wold teach every limitation of claim 20 as indicated in the above rejection of claim 1.

As to claim 21, Barkan and Wold teach the computer-implemented method of claim 1, further comprising providing the labels for use in a system (Barkan par. [0019] teaches that once the audio tracks 114 are labeled with genre and mood combinations, the audio service manager may provide services based on the output values for the genre and mood combinations).

As to claim 22, Barkan and Wold teach the computer-implemented method of claim 21, wherein providing the labels for use in the system comprises enabling the system to: find the first frame based at least in part on a search term corresponding to the first classification of music; and return the first frame to a user of the system in connection with the search term (Barkan par. [0019] teaches that the audio service manager may allow users to select a genre and mood combination label and then automatically start playback of the audio track using the segment that is associated with the genre and mood combination).

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Barkan and Wold, and further in view of Lee et al. (US Patent Application Publication 2021/0294840 A1, hereinafter "Lee").

As to claim 5, Barkan and Wold teach the computer-implemented method of claim 4, but fail to teach wherein the plurality of spectrogram patches comprises a plurality of log-scaled mel spectrogram patches. However, Lee teaches wherein the plurality of spectrogram patches comprises a plurality of log-scaled mel spectrogram patches (Lee par. [0068] teaches a log-scaled mel spectrogram). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Barkan, Wold, and Lee to achieve the claimed invention. One would have been motivated to make such a combination to enable searching for music files that are perceptually similar to a query music file according to one or more attributes, rather than being limited to a global similarity search along a single dimension of music similarity (Lee par. [0228]).

As to claim 15, see the above rejection of claim 5.
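For orientation, the frame-division and log-scaled mel spectrogram patches at issue in the claim 1 and claim 5 rejections can be illustrated with a short sketch. This is a minimal NumPy illustration of the general technique only, not code from Barkan, Wold, or Lee; all function names and parameter choices (16 kHz sample rate, 512-point FFT, 40 mel bands, one-second frames) are illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    # Standard mel-scale conversion (HTK-style formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters centered at points evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def divide_into_frames(audio, sr, frame_sec=1.0):
    # "Dividing ... an audio stream ... into a plurality of frames":
    # fixed-length, non-overlapping one-second frames here.
    n = int(sr * frame_sec)
    return [audio[i:i + n] for i in range(0, len(audio) - n + 1, n)]

def log_mel_patch(frame, sr, n_fft=512, hop=256, n_mels=40):
    # Windowed power spectra across the frame, projected onto the mel
    # filterbank; the log makes this a "log-scaled mel spectrogram patch".
    window = np.hanning(n_fft)
    cols = [np.abs(np.fft.rfft(frame[s:s + n_fft] * window)) ** 2
            for s in range(0, len(frame) - n_fft + 1, hop)]
    power = np.array(cols).T                      # (n_fft//2 + 1, time)
    return np.log(mel_filterbank(n_mels, n_fft, sr) @ power + 1e-10)

# Toy usage: a 2-second test tone yields two frames, hence two patches,
# each of which would be fed to the convolutional neural network classifier.
sr = 16000
audio = np.sin(2 * np.pi * 440.0 * np.arange(2 * sr) / sr)
frames = divide_into_frames(audio, sr)
patches = [log_mel_patch(f, sr) for f in frames]
```

In a full system, patches with predetermined labels would also serve as the training corpus for the classifier, as in the claim 10 limitation; the CNN itself is omitted here.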
Claims 6, 8-9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Barkan and Wold, and further in view of Ikezoye et al. (US Patent Application Publication 2022/0027407 A1, hereinafter "Ikezoye").

As to claim 6, Barkan and Wold teach the computer-implemented method of claim 1 but fail to teach further comprising: identifying, across a plurality of frames, a subset of consecutive frames with a common classification; and applying the common classification as a label to an integral segment of music comprising the subset of consecutive frames. However, Ikezoye teaches identifying, across a plurality of frames, a subset of consecutive frames with a common classification; and applying the common classification as a label to an integral segment of music comprising the subset of consecutive frames (Ikezoye par. [0056] teaches that the classifier may identify minutes 0-15 as containing music). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Barkan, Wold, and Ikezoye to achieve the claimed invention. One would have been motivated to make such a combination to improve music search efficiency (Ikezoye par. [0009]).

As to claim 8, Barkan, Wold, and Ikezoye teach the computer-implemented method of claim 6, further comprising: recording, in a data store, the audio stream as containing music with the common classification; and recording, in the data store, at least one timestamp indicating a location of the subset of consecutive frames (Barkan par. [0035] teaches table 412, which includes a first column that lists the genre and mood combination labels, a second column 416 that lists the average output value, and a column 418 that links to a time or segment in the audio track).

As to claim 9, Barkan, Wold, and Ikezoye teach the computer-implemented method of claim 6, further comprising: identifying at least one additional segment of music adjacent to the subset of consecutive frames with a different classification from the common classification; and applying the common classification and the different classification as labels to a larger segment of music comprising the integral segment of music and the at least one additional segment of music (Ikezoye par. [0053] teaches that the media classifier may be further configured to determine additional classifications for unidentified media content items or portions or segments thereof that contain music).

As to claim 16, see the above rejection of claim 6.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Barkan, Wold, and Ikezoye, and further in view of Lu et al., "A Robust Audio Classification and Segmentation Method," MULTIMEDIA '01: Proceedings of the Ninth ACM International Conference on Multimedia, pages 203-211, October 2001 (hereinafter "Lu").

As to claim 7, Barkan, Wold, and Ikezoye teach the computer-implemented method of claim 6 but fail to teach wherein identifying the subset of consecutive frames comprises applying a temporal smoothing function to classifications corresponding to the plurality of frames. However, Lu teaches applying a temporal smoothing function to classifications corresponding to the plurality of frames (Lu Section 3.3 teaches that if a pattern of consecutive one-second windows like "speech-music-speech" is detected, the sequence is most likely all speech and will hence be segmented entirely as speech; this smoothing process can also further prevent some misclassification). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Barkan, Wold, Ikezoye, and Lu to achieve the claimed invention. One would have been motivated to make such a combination to prevent misclassification (Lu Section 3.3).

As to claim 17, see the above rejection of claim 7.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HIEN DUONG, whose telephone number is (571) 270-7335. The examiner can normally be reached Monday-Friday, 8:00 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Viker Lamardo, can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HIEN L DUONG/
Primary Examiner, Art Unit 2147
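Two techniques central to the Ikezoye and Lu rejections, merging consecutive commonly classified frames into labeled segments and temporally smoothing per-frame classifications, can be sketched briefly. This is a hypothetical illustration of the general approach, not an implementation from any cited reference; the function names and the neighbor-agreement smoothing rule are assumptions made for clarity.

```python
def smooth_labels(labels):
    """Relabel an isolated frame whose two neighbors agree, so a
    'speech-music-speech' run collapses to all speech, matching the
    pattern described in the Lu rejection (Section 3.3)."""
    out = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i - 1] == labels[i + 1] != labels[i]:
            out[i] = labels[i - 1]
    return out

def merge_segments(labels):
    """Collapse runs of consecutive frames that share a classification
    into (label, start_frame, end_frame_exclusive) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments

# Per-frame classifier output; None marks a frame classified as
# non-music, which simply receives no music label (the "refraining
# from labeling" behavior).
raw = ["rock", "jazz", "rock", "rock", None, None, "jazz"]
smoothed = smooth_labels(raw)
segments = [s for s in merge_segments(smoothed) if s[0] is not None]
# segments now pairs each music label with its frame range.
```

A single-pass neighbor vote is only one possible smoothing function; windowed majority voting or HMM-based smoothing would also satisfy the claim 7 language as characterized in the rejection.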

Prosecution Timeline

Jun 15, 2022: Application Filed
May 16, 2025: Non-Final Rejection — §101, §103, §112
Aug 20, 2025: Response Filed
Nov 20, 2025: Final Rejection — §101, §103, §112
Feb 12, 2026: Request for Continued Examination
Feb 23, 2026: Response after Non-Final Action
Mar 06, 2026: Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597925: SUPERCONDUCTING CURRENT CONTROL SYSTEM (granted Apr 07, 2026; 2y 5m to grant)
Patent 12566940: METHOD AND APPARATUS FOR QUANTIZING PARAMETERS OF NEURAL NETWORK (granted Mar 03, 2026; 2y 5m to grant)
Patent 12566815: METHOD, ELECTRONIC DEVICE, AND COMPUTER-READABLE STORAGE MEDIUM FOR PERFORMING IDENTIFICATION BASED ON MULTI-MODAL DATA (granted Mar 03, 2026; 2y 5m to grant)
Patent 12554798: FINDING OUTLIERS IN SIMILAR TIME SERIES SAMPLES (granted Feb 17, 2026; 2y 5m to grant)
Patent 12547430: MODEL-BASED ELEMENT CONFIGURATION IN A USER INTERFACE (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 75%
With Interview (+22.8%): 98%
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 643 resolved cases by this examiner. Grant probability derived from career allow rate.
