Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/12/25 has been entered.
DETAILED ACTION
The instant application, Application No. 17/604,780, has a total of 20 claims pending.
Claim Rejections – 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: Claim 1 is a process-type claim, claim 12 is a machine-type claim, and claim 14 is a manufacture-type claim. Therefore, claims 1-20 are each directed to a process, machine, manufacture, or composition of matter.
As per claim 1,
2A Prong 1:
“Monitoring … when an audio session becomes active” A user mentally or with pencil and paper monitors when an audio session starts.
“Executing … a hierarchical classification model to classify the audio session” The user mentally or with pencil and paper uses a set of models to determine what type of audio is being started.
“Determine, at a first classification stage, whether the audio session is classifiable with predetermined criteria” The user mentally or with pencil and paper listens to or looks at audio data and classifies the data if possible.
“in response to determining that the audio session is classifiable at a first classification stage, classifying the audio session at the first classification stage based on the predetermined criteria” The user mentally or with pencil and paper classifies the incoming data using predetermined criteria.
“in response to determining that the audio session is not classifiable at the first classification stage, classifying the audio session at a second classification stage based on a … analysis of metadata” The user mentally or with pencil and paper looks for additional detail, such as in metadata, to classify the audio if the previous attempt fails.
“Implementing an audio reproduction setting, based on the classification from the first classification stage or the second classification stage…” The user mentally or with pencil and paper determines the appropriate settings for the audio.
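For context, the two-stage flow recited in claim 1 can be paraphrased in a short illustrative sketch (a minimal Python outline; the criteria table, feature names, and model interface below are hypothetical placeholders, not language from the claims or the specification):

    # Hypothetical sketch of the claimed hierarchy; all names are illustrative.
    PREDETERMINED_CRITERIA = {"movie_app": "movie", "music_app": "music"}

    def classify_first_stage(session):
        # First classification stage: classify with predetermined criteria
        # (here, a simple lookup keyed on the session's source).
        return PREDETERMINED_CRITERIA.get(session.get("source"))

    def classify_second_stage(session, model):
        # Second classification stage: machine learning analysis of metadata.
        metadata = [session.get("duration", 0.0), session.get("channels", 2)]
        return model.predict([metadata])[0]

    def classify_audio_session(session, model):
        label = classify_first_stage(session)
        if label is None:
            # Not classifiable at the first stage; fall back to the second.
            label = classify_second_stage(session, model)
        return label

Each step in this sketch corresponds to an act that could be performed mentally or with pencil and paper, which is the basis of the analysis above.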
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
a processor (mere instructions to apply the exception using a generic computer component);
“a machine learning analysis” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning analysis that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session; and presenting the audio session using the implemented audio reproduction setting to a device to output the audio session” (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g)).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
A processor, a memory (mere instructions to apply the exception using a generic computer component)
“a machine learning analysis” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning analysis that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session; and presenting the audio session using the implemented audio reproduction setting to a device to output the audio session” (MPEP 2106.05(d)(II) indicates that merely “receiving or transmitting data” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thus, a conclusion that the claimed presenting step is well-understood, routine, conventional activity is supported under Berkheimer).
As per claims 2, 4-8, and 10-11, these claims contain additional mental steps beyond those of claim 1 and are rejected for similar reasons.
As per claim 3, this claim contains additional mental steps beyond those of claim 1 and is rejected for similar reasons.
As per claim 9, this claim contains additional mental steps and a generic machine learning analysis, and is rejected for similar reasons to claim 1.
As per claim 20, this claim contains additional mental steps and generic hardware similar to claim 1, and is rejected for similar reasons.
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
“an application programming interface” (mere instructions to apply the exception using a generic computer component);
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
“an application programming interface” (mere instructions to apply the exception using a generic computer component)
As per claim 12,
2A Prong 1:
“Detect activation of an audio session” The user mentally or with pencil and paper listens to or looks at audio data.
“execute a hierarchical classification model to classify the audio session, including to” The user mentally or with pencil and paper uses a set of models to determine what type of audio is being started.
“determine whether the audio session is classifiable with predetermined criteria” The user mentally or with pencil and paper listens to or looks at audio data and classifies the data if possible.
“extract metadata corresponding to the audio session” The user mentally or with pencil and paper determines the various metadata for the audio.
“provide the metadata to a … model to classify the audio session in response to determining that the audio session is not classifiable with predetermined criteria” The user mentally or with pencil and paper attempts to classify the audio with easier data, and when it fails, goes on to more complex methods to classify the data.
“implement an audio reproduction setting, based on the classification” The user mentally or with pencil and paper determines the appropriate settings for the audio.
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
A memory, a processor (mere instructions to apply the exception using a generic computer component);
“a machine learning model” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning model that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session; and present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g)).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
A processor, a memory (mere instructions to apply the exception using a generic computer component)
“a machine learning model” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning model that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session; and present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (MPEP 2106.05(d)(II) indicates that merely “receiving or transmitting data” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thus, a conclusion that the claimed presenting step is well-understood, routine, conventional activity is supported under Berkheimer).
As per claims 13, 17, and 19, these claims contain additional mental steps and generic machine learning analysis, and are rejected for similar reasons to claim 12.
As per claim 16, this claim contains mental steps and generic computer hardware similar to those of claim 12, and is rejected for similar reasons.
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
“the memory stores a file” (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g)).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
“the memory stores a file” (MPEP 2106.05(d)(II) indicates that merely “storing and retrieving data from memory” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thus, a conclusion that the claimed storing step is well-understood, routine, conventional activity is supported under Berkheimer).
As per claim 18, this claim contains additional mental steps and generic machine learning analysis, and is rejected for similar reasons to claim 12.
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
“an application programming interface” (mere instructions to apply the exception using a generic computer component);
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
“an application programming interface” (mere instructions to apply the exception using a generic computer component)
As per claim 14,
2A Prong 1:
“Monitor when an audio session becomes active” A user mentally or with pencil and paper monitors when an audio session starts.
“Classify the audio session in accordance with a hierarchy, wherein the hierarchy comprises a first classification stage to classify in accordance with predetermined criteria and a second classification stage to classify using a … model based on metadata corresponding to the audio session if the audio session is not classifiable at the first classification stage” The user mentally or with pencil and paper attempts to classify the audio with easier data, and when it fails, goes on to more complex methods to classify the data.
“implement an audio reproduction setting, based on the classification from the first classification stage or the second classification stage” The user mentally or with pencil and paper determines the appropriate settings for the audio.
2A Prong 2: This judicial exception is not integrated into a practical application.
Additional elements:
A non-transitory tangible computer readable medium, a processor (mere instructions to apply the exception using a generic computer component);
“a machine learning model” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning model that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session and present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g)).
2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Additional elements:
A non-transitory tangible computer readable medium, a processor (mere instructions to apply the exception using a generic computer component)
“a machine learning analysis” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f). Examiner’s note: a generic machine learning model that is no more than a generic, off-the-shelf machine learning algorithm.)
“Present the audio session and present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (MPEP 2106.05(d)(II) indicates that merely “receiving or transmitting data” is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thus, a conclusion that the claimed presenting step is well-understood, routine, conventional activity is supported under Berkheimer).
As per claim 15, this claim contains additional mental steps and is rejected for similar reasons to claim 14.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-5, 8-9, 12-16, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bharitkar et al. (WO 2018/199997) in view of Guo et al. (“Boosting for Content-based Audio Classification and Retrieval: An Evaluation”).
As per claim 1, Bharitkar discloses, “A method, comprising: monitoring, by a processor” (pg.19-20, particularly paragraph 0073; EN: this denotes the hardware for the system). “when an audio session becomes active” (pg.4, particularly paragraph 0022; EN: this denotes the system receiving an audio signal).
“executing, by the processor, a hierarchical classification model to classify the audio session, including” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“determining, at a first classification stage, whether the audio session is classifiable …” (pg.7, particularly paragraph 0029; EN: this describes the first classification stage).
“in response to determining that the audio session is classifiable at a first classification stage, classifying the audio session at the first classification stage…” (pg.7-8, particularly paragraph 0031; EN: This denotes the system using the classification from the first classifier unless it is found to be unreliable).
“in response to determining that the audio session is not classifiable at the first classification stage” (pg.7-8, particularly paragraph 0031; EN: This denotes the system using the classification from the first classifier unless it is found to be unreliable. Only when the first classifier is found “unreliable” (i.e. not classifiable at that stage) is the second classifier “switched on.”). “classifying the audio at a second classification stage based on a machine learning analysis …” (pg.8, particularly paragraph 0032; EN: this denotes the second stage, which performs a different type of classification).
“implementing an audio reproduction setting, based on the classification from the first classification stage or the second classification stage, to present the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
“presenting the audio session using the implemented audio reproduction setting to a device to output the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
However, Bharitkar fails to explicitly disclose, “… with predetermined criteria”, and “the audio session based on a machine learning analysis of metadata.”
Guo discloses, “… with predetermined criteria”, and “the audio session based on a machine learning analysis of metadata” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
Bharitkar and Guo are analogous art because both involve audio classification.
Before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
The motivation for doing so would be to “combine a collection of weak classification functions (weak learner) to form a stronger classifier. AdaBoost is an adaptive algorithm to boost a sequence of classifiers, in that the weights are updated dynamically according to the errors in previous learning” (Guo, Pg.1201, Section 3, first paragraph), or, in the case of Bharitkar, to allow the system to focus on certain data in the first round and change to different data in the second round when the first round of classification is not successful.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
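For illustration, the weight-updating behavior Guo describes above can be outlined as follows (a generic AdaBoost sketch in Python, not code from Guo; the weak learners are assumed to be callables returning +1 or -1):

    import math

    def adaboost(samples, labels, weak_learners, rounds):
        # labels and weak-learner outputs are +1 or -1.
        n = len(samples)
        weights = [1.0 / n] * n                    # start with uniform weights
        ensemble = []                              # (alpha, hypothesis) pairs
        for _ in range(rounds):
            # Choose the weak learner with the lowest weighted error.
            best, err = None, 0.5
            for h in weak_learners:
                e = sum(w for x, y, w in zip(samples, labels, weights) if h(x) != y)
                if e < err:
                    best, err = h, e
            if best is None:                       # nothing beats chance; stop
                break
            alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-12))
            ensemble.append((alpha, best))
            # Weights are updated according to the errors in this round:
            # misclassified samples gain weight, so the next classifier in the
            # sequence focuses on the previous classifier's mistakes.
            weights = [w * math.exp(alpha if best(x) != y else -alpha)
                       for x, y, w in zip(samples, labels, weights)]
            total = sum(weights)
            weights = [w / total for w in weights]
        return ensemble

    def boosted_predict(ensemble, x):
        # The strong classifier is a weighted vote of the weak classifiers.
        return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1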
Further, the Examiner cites MPEP 2144.04 to show that a mere “rearrangement of parts” is not enough to make a claim novel over the prior art (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)). Here the Bharitkar reference clearly discloses using a first model, a neural network, which makes use of metadata to classify (see Bharitkar, pg.4, paragraph 0022) and then proceeds to use other criteria in the second stage of classification using specific data from the audio (Bharitkar, pg.8, paragraph 0032). Merely switching the order of these two algorithms as in the claims would be a mere rearrangement of parts, and therefore obvious to one of ordinary skill in the art at the time of filing (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)).
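The asserted rearrangement can be made concrete with a brief hypothetical sketch; the stage functions are placeholder parameters, not code from either reference:

    # Bharitkar's order: metadata-driven model first, then a content-driven
    # model when the first result is unreliable (pg.7-8, paragraph 0031).
    def bharitkar_order(signal, metadata, metadata_model, content_model, reliable):
        result = metadata_model(metadata)
        return result if reliable(result) else content_model(signal)

    # The claimed order: predetermined criteria first, then a metadata-driven
    # machine learning analysis when the first stage cannot classify.
    def claimed_order(signal, metadata, criteria_check, metadata_model):
        result = criteria_check(signal)
        return result if result is not None else metadata_model(metadata)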
As per claim 2, Bharitkar discloses, “loading a file with the …” (pg.5, particularly paragraph 0024; EN: this denotes the data used to train the model) “wherein the file indicates the classification based on a source of content for the audio session” (pg.4, particularly paragraph 0022; EN: this denotes the sources of the file, such as MPEG-2 transport stream, MPEG4 audio video file containers, etc.).
Guo discloses, “predetermined criteria” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
As per claim 4, Bharitkar discloses, “wherein determining whether the audio session is classifiable at the first classification stage comprises determining whether … indicate a classification for a source of the audio session” (pg.4, particularly paragraph 0022; EN: this denotes the sources of the file, such as MPEG-2 transport stream, MPEG4 audio video file containers, etc., which is part of the classification at the first stage).
Guo discloses, “predetermined criteria” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
As per claim 5, Bharitkar discloses, “Wherein, in response to determining that a second audio session is classifiable at the first classification stage, the method comprises classifying the second audio session based on …” (pg.8, particularly paragraph 0032; EN: this denotes the second stage, which performs a different type of classification when the first is not successful).
Guo discloses, “predetermined criteria” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
As per claim 8, Bharitkar discloses, “wherein classifying the audio session based on machine learning analysis comprises classifying the audio session as surround content, stereo content, or monophonic content” (pg.4, particularly paragraph 0022; EN: this denotes part of the classifying being the channel count, 1 being monophonic, 2 being stereo, more being surround).
As per claim 9, Bharitkar discloses, “wherein the machine learning analysis is performed using machine learning model that is trained with content duration metadata” (pg.4, particularly paragraph 0022; EN: this denotes part of the classifying including duration).
As per claim 12, Bharitkar discloses, “An apparatus, comprising: a memory; and” (pg.19-20, particularly paragraph 0073; EN: this denotes the hardware for the system).
“a processor coupled to the memory, wherein the processor is to:” (pg.19-20, particularly paragraph 0073; EN: this denotes the hardware for the system).
“detect activation of an audio session” (pg.4, particularly paragraph 0022; EN: this denotes the system receiving an audio signal).
“execute a hierarchical classification model to classify the audio session including to:” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“determining whether the audio session is classifiable …” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“extract metadata corresponding to the audio session” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“provide the … data to a machine learning model to classify the audio session in response to determining that the audio session is not classifiable…” (pg.8, particularly paragraph 0032; EN: this denotes the second stage, which performs a different type of classification).
“implement an audio reproduction setting based on the classification to present the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
“Present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
However, Bharitkar fails to explicitly disclose, “with predetermined criteria” and “the metadata to a machine learning model.”
Guo discloses, “with predetermined criteria” and “the metadata to a machine learning model” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
Bharitkar and Guo are analogous art because both involve audio classification.
Before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
The motivation for doing so would be to “combine a collection of weak classification functions (weak learner) to form a stronger classifier. AdaBoost is an adaptive algorithm to boost a sequence of classifiers, in that the weights are updated dynamically according to the errors in previous learning” (Guo, Pg.1201, Section 3, first paragraph), or, in the case of Bharitkar, to allow the system to focus on certain data in the first round and change to different data in the second round when the first round of classification is not successful.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
Further, the Examiner cites MPEP 2144.04 to show that a mere “rearrangement of parts” is not enough to make a claim novel over the prior art (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)). Here the Bharitkar reference clearly discloses using a first model, a neural network, which makes use of metadata to classify (see Bharitkar, pg.4, paragraph 0022) and then proceeds to use other criteria in the second stage of classification using specific data from the audio (Bharitkar, pg.8, paragraph 0032). Merely switching the order of these two algorithms as in the claims would be a mere rearrangement of parts, and therefore obvious to one of ordinary skill in the art at the time of filing (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)).
As per claim 13, Bharitkar discloses, “Wherein the machine learning model is trained using data indicating content duration, sample rate, video presence, bit depth, or number of channels” (pg.4, particularly paragraph 0022; EN: this denotes duration as an aspect to classifying).
As per claim 14, Bharitkar discloses, “A non-transitory tangible computer readable medium storing executable code executable by a processor to:” (pg.19-20, particularly paragraph 0073; EN: this denotes the hardware for the system).
“monitor when an audio session becomes active” (pg.4, particularly paragraph 0022; EN: this denotes the system receiving an audio signal).
“Execute a hierarchical classification model to classify the audio session, including to:” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“determining whether the audio session is classifiable …” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“extract metadata corresponding to the audio session” (Figure 3 and associated paragraphs, particularly pg.7-8, paragraph 0031; EN: this denotes two models, one which works on metadata, and another which is “switched on” when the metadata is contradictory).
“provide the … data to a machine learning model to classify the audio session in response to determining that the audio session is not classifiable…” (pg.8, particularly paragraph 0032; EN: this denotes the second stage, which performs a different type of classification).
“implement an audio reproduction setting, based on the classification from the first classification stage or the second classification stage, to present the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
“Present the audio session, with the implemented audio reproduction setting, to a device to output the audio session” (pg.4, particularly paragraph 0020; EN: this denotes reproducing the audio content using speakers with optimal audio presets).
However, Bharitkar fails to explicitly disclose, “with predetermined criteria” and “provide the metadata to a machine learning model.”
Guo discloses, “with predetermined criteria” and “provide the metadata to a machine learning model” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
Bharitkar and Guo are analogous art because both involve audio classification.
Before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
The motivation for doing so would be to “combine a collection of weak classification functions (weak learner) to form a stronger classifier. AdaBoost is an adaptive algorithm to boost a sequence of classifiers, in that the weights are updated dynamically according to the errors in previous learning” (Guo, Pg.1201, Section 3, first paragraph), or, in the case of Bharitkar, to allow the system to focus on certain data in the first round and change to different data in the second round when the first round of classification is not successful.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio classification to combine the work of Guo and Bharitkar in order to select different data to examine at different stages of classification.
Further, the Examiner cites MPEP 2144.04 to show that a mere “rearrangement of parts” is not enough to make a claim novel over the prior art (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)). Here the Bharitkar reference clearly discloses using a first model, a neural network, which makes use of metadata to classify (see Bharitkar, pg.4, paragraph 0022) and then proceeds to use other criteria in the second stage of classification using specific data from the audio (Bharitkar, pg.8, paragraph 0032). Merely switching the order of these two algorithms as in the claims would be a mere rearrangement of parts, and therefore obvious to one of ordinary skill in the art at the time of filing (see In re Japikse, 181 F.2d 1019, 86 USPQ 70 (CCPA 1950)).
As per claim 15, Bharitkar discloses, “wherein the second classification stage is after the first classification stage” (pg.8, particularly paragraph 0032; EN: this denotes the second stage, which performs a different type of classification).
As per claim 16, Bharitkar discloses, “wherein the memory stores a file with the …” (pg.5, particularly paragraph 0024; EN: this denotes the data used to train the model) “wherein the file indicates the classification based on a source of content for the audio session” (pg.4, particularly paragraph 0022; EN: this denotes the sources of the file, such as MPEG-2 transport stream, MPEG4 audio video file containers, etc.).
Guo discloses, “predetermined criteria” (Pg.369, particularly C1, second to last paragraph; EN: this describes boosting algorithms, which focus on different aspects of the training data when the previous classifier fails to properly classify the data. When combined with the Bharitkar reference, this denotes changing the criteria at each stage of the classification as needed to optimize the classification).
As per claim 18, Bharitkar discloses, “wherein the classification includes one of the following: movie, music, or voice” (Pg.3, particularly paragraph 0018; EN: this denotes classifying as movie, music, or voice).
Claim Rejections - 35 USC § 103
Claims 3, 10-11, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bharitkar et al. (WO 2018/199997) in view of Guo et al. (“Boosting for Content-based Audio Classification and Retrieval: An Evaluation”) as applied to claim 1 above, and further in view of Rajapakse (US 2016/0328396 A1).
As per claim 3, Bharitkar discloses, “further comprising monitoring audio session activity to determine when the audio session becomes active” (pg.4, particularly paragraph 0020; EN: this denotes monitoring the audio and changing the settings as it is classified).
Bharitkar fails to explicitly disclose, “further comprising monitoring audio session activity …using an application programming interface.”
Rajapakse discloses, “further comprising monitoring audio session activity …using an application programming interface.” (pg.3, particularly paragraph 0031; EN: this denotes using an API to integrate with audio sources).
Bharitkar and Rajapakse are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of APIs for audio data.
The motivation for doing so would be for “integrating with various additional products or services to enhance operation of a system 120, for example a software application programming interface (API) may enable integration with media playback software such as ITUNES for making a user’s media library available to a media indexing server 122 for indexing” (Rajapakse, Pg. 3, paragraph 0031) or in the case of Bharitkar allow the system to draw media files from wherever needed via API.
Therefore, before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of APIs for audio data.
As per claim 10, Bharitkar fails to explicitly disclose, “Further comprising implementing the audio reproduction setting by using a surround sound setting in response to classifying the audio session is classified as surround content.”
Rajapakse discloses, “Further comprising implementing the audio reproduction setting by using a surround sound setting in response to classifying the audio session is classified as surround content” (pg.5-6, particularly paragraph 0041; EN: this denotes the system setting up playback with the appropriate speaker versions for the audio file).
Bharitkar and Rajapakse are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
The motivation for doing so would be for “insuring the correct speaker version is streamed to the correct playback devices” (Rajapakse, Pg.5-6, paragraph 0041).
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
As per claim 11, Bharitkar fails to explicitly disclose, “Further comprising implementing the audio reproduction setting by using a stereo sound setting in response to classifying the audio session is classified as stereo content.”
Rajapakse discloses, “Further comprising implementing the audio reproduction setting by using a stereo sound setting in response to classifying the audio session is classified as stereo content” (pg.5-6, particularly paragraph 0041; EN: this denotes the system setting up playback with the appropriate speaker versions for the audio file).
Bharitkar and Rajapakse are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
The motivation for doing so would be for “insuring the correct speaker version is streamed to the correct playback devices” (Rajapakse, Pg.5-6, paragraph 0041).
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
As per claim 17, Bharitkar discloses, “wherein the processor is to monitor the audio session activation…” (pg.4, particularly paragraph 0020; EN: this denotes monitoring the audio and changing the settings as it is classified).
Bharitkar fails to explicitly disclose, “…using an application programming interface.”
Rajapakse discloses, “…using an application programming interface” (pg.3, particularly paragraph 0031; EN: this denotes using an API to integrate with audio sources).
Bharitkar and Rajapakse are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of APIs for audio data.
The motivation for doing so would be for “integrating with various additional products or services to enhance operation of a system 120, for example a software application programming interface (API) may enable integration with media playback software such as ITUNES for making a user’s media library available to a media indexing server 122 for indexing” (Rajapakse, Pg. 3, paragraph 0031) or in the case of Bharitkar allow the system to draw media files from wherever needed via API.
Therefore, before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of APIs for audio data.
As per claim 19, Bharitkar discloses, “wherein to implement the audio reproduction setting includes: to implement a … setting when the classification is movie” (pg.3, particularly paragraph 0018; EN: this denotes different presets for movie, voice, and music).
“to implement a … setting when the classification is music” (pg.3, particularly paragraph 0018; EN: this denotes different presets for movie, voice, and music).
“to implement a monophonic setting when the classification is voice” (pg.3, particularly paragraph 0018; EN: this denotes different presets for movie, voice, and music).
However, Bharitkar fails to explicitly disclose, “surround sound setting”, “stereo setting”, and “monophonic setting.”
Rajapakse discloses, “surround sound setting”, “stereo setting”, and “monophonic setting” (pg.5-6, particularly paragraph 0041; EN: this denotes the system setting up playback with the appropriate speaker versions for the audio file).
Bharitkar and Rajapakse are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
The motivation for doing so would be for “insuring the correct speaker version is streamed to the correct playback devices” (Rajapakse, Pg.5-6, paragraph 0041).
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Rajapakse in order to make use of the appropriate sound configuration for a file.
Claim Rejections - 35 USC § 103
Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Bharitkar et al. (WO 2018/199997) in view of Guo et al. (“Boosting for Content-based Audio Classification and Retrieval: An Evaluation”) as applied to claim 1 above, and further in view of Gudorf et al. (US 2014/0150023 A1).
As per claim 6, Bharitkar fails to explicitly disclose, “wherein, in response to determining that the audio session is not classifiable at the first classification stage, the method comprises determining whether the audio session corresponds to a supported browser process.”
Gudorf discloses, “wherein, in response to determining that the audio session is not classifiable at the first classification stage, the method comprises determining whether the audio session corresponds to a supported browser process” (pg.4, particularly paragraph 0046; EN: this denotes various browsers which can be used to deal with audio data and determining the proper format based upon that data. When combined with the Bharitkar reference, this data can be useful when classifying audio data as the format can provide information on what type of data is being looked at).
Bharitkar and Gudorf are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Gudorf in order to identify the source of audio including associated browsers.
The motivation for doing so would be to “determine what program mode will be required to playback a media asset or media service” (Gudorf, Pg.4, paragraph 0048) or in the case of Bharitkar, allow the system to consider the source when analyzing audio data in order to improve classification of that audio data.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Gudorf in order to identify the source of audio including associated browsers.
As per claim 7, Bharitkar fails to explicitly disclose, “in response to determining that the audio session does not correspond to a supported browser process, the method comprises determining a media file handle corresponding to the audio session.”
Gudorf discloses, “in response to determining that the audio session does not correspond to a supported browser process, the method comprises determining a media file handle corresponding to the audio session” (pg.4-5, particularly paragraphs 0047-0048 and table 1; EN: this denotes looking at the files for additional information regardless of the type of browser).
Bharitkar and Gudorf are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Gudorf in order to identify the source of audio including associated browsers.
The motivation for doing so would be to “determine what program mode will be required to playback a media asset or media service” (Gudorf, Pg.4, paragraph 0048) or in the case of Bharitkar, allow the system to consider the source when analyzing audio data in order to improve classification of that audio data.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Gudorf in order to identify the source of audio including associated browsers.
Claim Rejections - 35 USC § 103
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bharitkar et al. (WO 2018/199997) in view of Guo et al. (“Boosting for Content-based Audio Classification and Retrieval: An Evaluation”) as applied to claim 1 above, and further in view of Baumgartner et al. (US 5,642,171).
As per claim 20, Bharitkar discloses, “wherein monitoring, by the processor, when the audio session becomes active includes determining when an application initiates streaming audio” (pg.4, particularly paragraph 0022; EN: this denotes streaming data as part of the system).
However, Bharitkar fails to explicitly disclose, “to a sound card.”
Baumgartner discloses, “to a sound card” (C1, particularly L32-58; EN: this denotes the use of sound cards to control speakers).
Bharitkar and Baumgartner are analogous art because both involve audio processing.
Before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Baumgartner in order to use a sound card for audio processing.
The motivation for doing so would be to “provide audio output to speakers” (Baumgartner, C1, L31-49) or in the case of Bharitkar, allow the system to use a sound card to control the reproduction of the audio as needed.
Therefore before the effective filing date it would have been obvious to one skilled in the art of audio processing to combine the work of Bharitkar and Baumgartner in order to use a sound card for audio processing.
Response to Arguments
On pg.7, the Applicant argues, with regard to the rejection of the independent claims under 35 U.S.C. 101:
In particular, claims 1, 12, and 14 recite implementing an audio reproduction setting, based on the classification from the first classification stage or the second classification stage, to present the audio session, and presenting the audio session using the implemented audio reproduction setting to a device to output the audio session. More specifically, systems and methods comprise a processor to monitor when an audio session becomes active and execute a hierarchical classification model to classify the audio session, wherein in the hierarchical classification model comprises determining, at a first classification stage, whether the audio session is classifiable with predetermined criteria, and in response to determining that the audio session is classifiable at a first classification stage, classifying the audio session at the first classification stage based on the predetermined criteria; in response to determining that the audio session is not classifiable at the first classification stage, classifying the audio session at a second classification stage based on a machine learning analysis of metadata, and implementing an audio reproduction setting, based on the classification from the first classification stage or the second classification stage, to present the audio session; and presenting the audio session using the implemented audio reproduction setting, to a device to output the audio session.
In response, the Examiner maintains the rejection as shown above. The Applicant appears to be arguing that the use of generic hardware such as a “device to output the audio session” and the use of generic computer hardware or machine learning algorithms causes the claim to be significantly more than the abstract idea. However, merely outputting audio with generic settings on a generic device, using generic machine learning models, and generic processors is not enough to make the claims significantly more than the abstract idea. As the claims contain no details about the device, machine learning model, or processor that amount to more than generic hardware, they are not enough to be significantly more than the abstract idea, and therefore the rejection is maintained as shown above.
On pg.7-8, the Applicant further argues, with regard to the rejection of the independent claims under 35 U.S.C. 101:
On pages 2-11 of the Office Action, the Office alleges that claims 1, 12, and 14 recite "a memory, a non-transitory tangible computer readable medium, a processor (mere instructions to apply the exception using a generic computer component) a generic machine learning model that is no more than a generic, off the shelf machine learning algorithm." However, as noted above, claims 1, 12, and 14 further recite an audio session, a hierarchical classification model, a first classification stage, a second classification stage, an audio reproduction setting, an audio session, and presenting the audio session to a device to output the audio session. At least these elements of claims 1, 12, and 14 are not parts of a generic computer nor are they elements for mere data gathering. As such, the claim recites more than merely applying a machine learning analysis, as alleged by the Office. Rather, the claims call for implementing an audio reproduction setting and outputting the corresponding audio session using the classification executed by the processor of the claimed systems and methods. In this way, the claim as a whole uses any alleged judicial exceptions in conjunction with a particular machine that is integral to the claim. That is, the processes carried out by the hardware components recited in the claims ultimately generate an output that is an audio session based on the audio reproduction setting.
In response, the Examiner maintains the rejection as shown above. The Applicant appears to argue that the claims use not generic hardware or machine learning models, but specific ones. However, there is no evidence for this assertion. Merely repeating claim limitations such as “hierarchical classification model”, “a first classification stage”, and “a second classification stage” does not make these anything different from generic machine learning models. The claims contain no details of these “stages” or of what the model contains other than generic components. The Applicant also argues that “an audio session”, “an audio reproduction setting”, and “presenting the audio session to a device to output the audio session” are not generic. However, any computer having any type of sound-based output device has “audio sessions”, has “an audio reproduction setting” (volume, for example), and can “present the audio session” by playing sounds using a generic computer. Nothing here is anything more than generic, off-the-shelf computer hardware or machine learning models, and therefore the rejection is maintained as shown above.
On pg.8, the Applicant argues, with regard to the rejection under 35 U.S.C. 101:
Accordingly, the claims integrate any alleged judicial exceptions into a practical application because the claims as a whole are tied to the implementation of an audio reproduction setting for outputting an audio session. For instance, the interplay between monitoring an audio session becoming active and executing a hierarchical classification model to classify the audio session by a processor, determining and classifying the audio session with a first and a second classification stage, implementing an audio reproduction setting based on the classification to present the audio session, and presenting the audio session using the implemented audio reproduction setting to a device to output the audio session integrates any alleged judicial exceptions in conjunction with a particular machine, into a practical application. Particularly, the claims require the additional, specific steps enumerated above, which preclude the claims from generally monopolizing any underlying judicial exceptions.
On pg.8-9, the Applicant further argues, with regard to the rejection of the independent claims under 35 U.S.C. 101:
Furthermore, claims 1, 12, and 14 are integrated into a practical application because they recite improvements to the functioning of a computer and technical field. More specifically, MPEP 2106.04(d)(I) provides example considerations of limitations that indicate a practical application of an abstract idea, such as improvements to the functioning of a computer or to any other technology or technical field. MPEP 2106.04(d)(I). Here, advantageously, use of the hierarchical classification recited in claims 1, 12, and 14 enables automatic selection of audio reproduction setting tailored to the received and classified audio session. See Present Application, ¶¶ [0008], [0020], [0028]-[0030]. This can improve classification accuracy and efficiency of the audio content, particularly for applications involving mixed audios. Thus, the claims do not broadly cover a generic machine learning model but, rather, are directed to a specific improvement of this technological field of precise classification of media content. Furthermore, the use of machine learning analysis of metadata in claims 1, 12, and 14 increase classification accuracy and reduce processing resource usage. See Present Application, ¶ [0014]. Accordingly, the pending claims not only provide an improvement to the technical field of media content classification but also improve the functioning of computer-based audio classification by reducing processing resource usage.
In response, the Examiner maintains the rejection as shown above. The Applicant appears to be arguing that the claims improve a technology and/or technological field of “automatic selection of audio reproduction setting tailored to the received and classified audio session.” However, this is not a technology. A human being can identify what they are listening to and change the settings accordingly. Making this step “automatic” is an improvement to the abstract idea of determining optimal audio settings for one's audio data, not an improvement to a technology. The Applicant then argues a technology of “classification of media content.” But once again, this is an abstract idea, as classifying media content is not a technology. Finally, the Applicant argues that increasing classification accuracy and reducing processor usage via the machine learning analysis is an improvement to the computer. However, this is once again an improvement to the abstract idea of identifying media content. Using more or less data, or particular data, does not improve the processor or the machine learning model; it improves the abstract idea. The processor and/or machine learning algorithm remains the same regardless of the data put into it, and therefore there is no improvement to the hardware or machine learning algorithm, and the rejection is maintained as shown above.
On pg.10, the Applicant argues, with regard to the rejection of claim 1 under 35 U.S.C. 103:
As will be articulated below, Bharitkar and Guo, whether alone or in combination, fail to teach or suggest implementing an audio reproduction setting based on a hierarchical classification approach, as recited in claim 1. In particular, the claimed hierarchical classification model includes a first classification stage and a second classification stage. When the audio session is not classifiable with the first classification stage, the audio session is passed on to the second classification stage, which performs machine learning analysis based on metadata to classify the audio session. See also, Present Application, ¶¶ [0019]-[0021]. Bharitkar, on the other hand, does not teach or suggest such a hierarchical classification model with a two-step approach (e.g., moving to the second stage using machine learning if the audio session is not classifiable at the first stage). Instead, Bharitkar describes a dual-model classifier including separate machine learning algorithms 104, 154 that classify audio signal based on metadata from audio signal and decoded audio frames from audio signal, respectively. Bharitkar, ¶¶ [0031]-[0032]. Bharitkar is silent on any determination of whether an audio session is "classifiable," let alone "determining, at a first classification stage, whether the audio session is classifiable with predetermined criteria; in response to determining that the audio session is classifiable at a first classification stage, classifying the audio session at the first classification stage based on the predetermined criteria; in response to determining that the audio session is not classifiable at the first classification stage, classifying the audio session at a second classification stage based on a machine learning analysis of metadata," as recited in claim 1. Therefore, Bharitkar does not teach or suggest classification using the second classification stage when the audio session is not classifiable at the first classification stage. Thus, Bharitkar fails to disclose an explicit determination of classifiability and a hierarchical classification model recited in claim 1.
In response, the Examiner maintains the rejection as shown above. The Bharitkar reference explicitly states that the secondary classification model is implemented when the first classification is deemed unreliable:
“In addition to trained machine learning model 104, audio signal classifier 100 includes a trained deep learning model 154 which is employed or ‘switched in’ when audio signal 106 is determined by a feature evaluator 140 to have confounding or invalid metadata (e.g., metadata is missing, there is contradictory metadata, the metadata has abnormal values, etc.) or when output class values CV generated by trained machine learning model 104 are determined by a reliability evaluator 142 to be unreliable” (Bharitkar reference, pg.7-8, paragraph 0031).
Clearly this denotes only using the secondary model when the first model is not sufficient to properly classify the data. If the data IS reliable, that classification is kept and used by the system, and the second model is never used. The determination of “classifiable” is based on whether or not the output of the model is reliable: if it is reliable, it is classifiable; if it is not, then it is not classifiable by that model. Since this meets the broadest reasonable interpretation of the claims, the Examiner maintains the rejection as shown above.
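The quoted switching logic can be paraphrased in a short illustrative sketch (the helper names and threshold are hypothetical stand-ins for Bharitkar's feature evaluator 140 and reliability evaluator 142, not code from the reference):

    def metadata_is_valid(metadata):
        # Stand-in for the feature evaluator: metadata must be present and
        # free of missing values (paragraph 0031's confounding/invalid check).
        return metadata is not None and all(v is not None for v in metadata.values())

    def is_reliable(class_values, threshold=0.8):
        # Stand-in for the reliability evaluator: require a confident score.
        return max(class_values) >= threshold

    def classify(audio_frames, metadata, ml_model, deep_model):
        if metadata_is_valid(metadata):
            class_values = ml_model(metadata)
            if is_reliable(class_values):
                return class_values        # first model's output is kept
        # The deep learning model is "switched in" only when the metadata is
        # confounding/invalid or the first model's output is unreliable.
        return deep_model(audio_frames)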
On pg.11, the Applicant further argues, with regard to the rejection of claim 1 under 35 U.S.C. 103:
Moreover, when the second machine learning algorithm 154 of Bharitkar is used, the outputs of its first machine learning algorithm and the second machine learning algorithm are both considered by a global decision model 150 to determine an audio class. See Bharitkar, ¶ [0039]. More specifically, the first machine learning algorithm 104 always provides output class values (CV), and the decision model 150 may use only those output class values (acting as a "pass-thru") or may consider both sets of output class values (CV, CV) from both machine learning algorithms 104, 154. Accordingly, Bharitkar's approach is fundamentally different from the hierarchical classification model as recited in claim 1. Thus, Bharitkar fails to teach or suggest "executing, by the processor, a hierarchical classification model to classify the audio session, including: determining, at a first classification stage, whether the audio session is classifiable with predetermined criteria; in response to determining that the audio session is classifiable at a first classification stage, classifying the audio session at the first classification stage based on the predetermined criteria; in response to determining that the audio session is not classifiable at the first classification stage, classifying the audio session at a second classification stage based on a machine learning analysis of metadata," as recited in claim 1.
In response, the Examiner maintains the rejection as shown above. The claim at no time requires that only a single classification be used to determine the final classification. The claim limitation merely requires that a second classifier be used if the first classifier's result is deemed “not classifiable.” As discussed above, “not classifiable” corresponds to the unreliable indicator of the first classification, with the second classifier then used. Simply because both output values can be used to determine a final value does not mean these steps are not performed. Since the reference meets the broadest reasonable interpretation of the claims, the Examiner maintains the rejection as shown above.
Applicant's remaining arguments with respect to claims 1-20 have been considered but are either repeats of the above arguments or are moot in view of the new ground(s) of rejection.
Conclusion
The examiner requests, in response to this Office action, support be shown for language added to any original claims on amendment and any new claims. That is, indicate support for newly added claim language by specifically pointing to page(s) and line no(s) in the specification and/or drawing figure(s). This will assist the examiner in prosecuting the application.
When responding to this Office action, Applicant is advised to clearly point out the patentable novelty which he or she thinks the claims present, in view of the state of the art disclosed by the references cited or the objections made. He or she must also show how the amendments avoid such references or objections. See 37 CFR 1.111(c).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEN M RIFKIN whose telephone number is (571)272-9768. The examiner can normally be reached Monday-Friday 9 am - 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571) 270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BEN M RIFKIN/Primary Examiner, Art Unit 2123