DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 18 October 2024, 07 February 2025, and 17 December 2025 are being considered by the examiner.
Claim Objections
Claims 17 and 18 are objected to because of the following informalities:
Regarding claim 17, the phrase “a given context parameter characterizing” at line 4 should read as “a given context parameter of the one or more context parameters characterizing”.
Regarding claim 18, the phrase “A system comprising, comprising” at line 1 should read as “A system comprising”.
Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 7 and 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 7, the alternating use of the words “value”/“values” lacks clarity. Claim 7 recites that the system tracks “values of the audio parameter on a parametric space” at line 4. However, claim 1 establishes “a value of the audio parameter” at line 7. As “values” is the plural of “value,” claim 7 establishes two separate elements using the same name, while providing no clear connection between the two elements. Is the applicant attempting to modify the limitations recited in claim 1 (e.g., indicating that the updated value of the audio parameter in claim 1 is modified to refer only to modifications in a parametric space)? In the alternative, is this a separate claim part which incorporates the “value of the audio parameter” of claim 1 (e.g., a plurality of values, of which the “value of the audio parameter” is a constituent)? In yet another alternative, are the parts entirely unrelated?
Further, claim 7 then recites updating “the value of the audio parameter on the parametric space according to the description of the audio parameter” at lines 5-6. However, “the value of the audio parameter on the parametric space” at line 5 relies on the plurality of values recited in “values of the audio parameter on a parametric space” at line 4 for antecedent basis. In light of the lack of clarity regarding the relationship to claim 1 described above, it is further unclear which value of the values is being updated at the updating step in line 5 and/or what relationship this updating bears to the updating previously recited in claim 1. Therefore, claim 7 lacks clarity and is rejected.
Regarding claim 8, the phrase “over instructions transmitted…” is ambiguous. Claim 8 recites the phrase “over instructions transmitted…” at line 3. However, the limitation fails to include the necessary context for the phrase “instructions transmitted” such that “over” has a clear meaning. Specifically, the claim does not recite in what way the “change in the values of the audio parameter” is understood to be “over” the “instructions transmitted.”
It is noted that applicant provides a clarifying alternative explanation in the specification. In [0117], applicant explains “That is, the adaptive control system 150 updates the value of the audio parameter on the parametric space at each instruction transmitted to the audio rendering system, and those updated values are included in the tracked values of the audio parameter used as input to the machine learning model.” Though this sentence provides context for interpretation, it does not act as an express definition of the indicated phrase. Further, the above interpretations or others not listed here are neither excluded nor required by this clarification. As such, claim 8 lacks clarity and is rejected.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. §101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1-20 are rejected under 35 U.S.C. §101 because the claimed invention is directed to an abstract idea without significantly more.
To determine subject matter eligibility for each of the claims rejected above, we turn to the subject matter eligibility test, also referred to as the Alice/Mayo test, described in MPEP 2106. Regarding Step 1 of the subject matter eligibility test, we first determine whether the claims are directed to a statutory category. Independent claim 1, and mutatis mutandis claims 18 and 20, recites “receive a natural language instruction referencing at least an audio parameter of an audio rendering system and a description of the audio parameter; determine a value of the audio parameter using a machine learning model trained to, based on the natural language instruction, determine the audio parameter; and transmit an instruction to the audio rendering system to update the value of the audio parameter according to the description of the audio parameter.” As the claims recite at least a process, the claims are directed to one of the statutory categories under Step 1 of the subject matter eligibility test.
In Step 2A of the test, which is a Two Prong analysis, we then determine if the claim is directed to a judicial exception. For Step 2A, Prong One, we first ask if the claim recites an abstract idea, Law of Nature, or Natural Phenomena. Regarding claim(s) 1, 18, and 20, the limitations of “receive…”, “determine…”, and “transmit…” as drafted cover managing personal behavior or relationships or interactions between people, which is a method of organizing human activity. More specifically, a first person receives a natural language instruction related to audio parameters from a second person (e.g., “The audio doesn’t sound right. Can you make it warmer?”). The first person then determines one or more parameters which can be adjusted to fulfill the instruction (e.g., determining based on a mix of the instruction and knowledge in the field one or more parameters which can be adjusted to make the audio “warmer”). The first person then interacts with the audio system to achieve those results, which correspond to the second person’s instruction (e.g., making actual changes in the audio corresponding to the audio parameters, using a mix of known hardware and software, to achieve the desired warmth of the audio). Therefore, the claims are directed to human activity, and, thus, directed to an abstract idea which is a judicial exception.
In Step 2A, Prong Two of the analysis, we next determine if the claim recites additional elements which integrate the judicial exception into a practical application. The judicial exception recited in claims 1, 18, and 20 is not integrated into a practical application. In particular, claim(s) 1, 18, and 20 recite additional elements of “transmit an instruction to the audio rendering system to update the value of the audio parameter”. However, the claim does not effect a physical transformation or improve the functioning of a computer. The final step is merely to “transmit an instruction… to update the value”. Outputting or transmitting the result of the abstract mental process is considered an extra-solution activity, which does not confer patentability on the abstract idea. Accordingly, the additional elements fail to integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Regarding Step 2B of the analysis, we next determine if the claim recites additional elements which amount to significantly more than the judicial exception. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional element of using a “processor,” “computer readable storage medium,” or the “machine learning model” to perform the audio modification described with regards to the human activity amounts to no more than mere instructions to apply the exception using generic computer components. The machine learning model recited is a generic computer component, recited at a high level of generality, which is trained to achieve the human activity result (i.e., determine the audio parameter). It fails to show a specific technical solution to a technical problem or a physical transformation. It does not claim a specific hardware configuration or a non-conventional machine learning architecture that improves the functioning of the computer itself. Mere instructions to apply an exception using a generic computing device or general purpose computer component cannot provide an inventive concept. (See Alice Corp. v. CLS Bank Int’l, 573 U.S. 208, 221, 223, 110 USPQ2d 1976, 1982-84 (2014) (quoting Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66, 72, 101 USPQ2d 1961, 1965 (2012)).)
As well, the “processor” and the “computer readable storage medium” are general-purpose computer components with no provisions for the practical application of the abstract idea. The “computer readable storage medium” and the “processor” are not meaningfully integrated into the practical application of the abstract ideas recited in claim(s) 1, 18, and 20. The system is described in the context of a processor performing a generic function in light of instructions stored in the memory. The processor and computer readable storage medium, as integrated in and implemented through the system, are recited at a high level of generality (i.e., as a generic processor of a computing device performing a generic computer function based on instructions and a machine learning model stored in a generic memory, the function being determining audio parameters to modify in response to a natural language instruction) such that they amount to no more than mere instructions to apply the exception using a generic computing device and/or a generic computer component. As well, the remaining claim limitations are well-known, routine, and conventional such as to not qualify as an inventive concept. Specifically, transmission of data over a network is well known, as is evidenced by OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015). The court has consistently held that “[s]teps that do nothing more than spell out what it means to ‘apply it on a computer’ cannot confer patent-eligibility.” Intellectual Ventures I LLC v. Capital One Bank (USA), 792 F.3d 1363, 1371-72 (Fed. Cir. 2015) (citing Alice, 134 S.Ct. at 2359 (warning against a § 101 analysis that turns on the draftsman's art (citing Parker v. Flook, 437 U.S. 584, 593, 98 S.Ct. 2522, 57 L.Ed.2d 451 (1978))).
Therefore, and in light of the preceding analysis, the claims do not amount to significantly more than the judicial exception. For these reasons, claims 1, 18, and 20 are not patent eligible.
With respect to claim(s) 2 and 19, the claims relate to requesting user feedback and updating a determination process based on that feedback. As performed by a person, these steps appear to refer to the administrative and mental processes of asking the second person if they are satisfied with the adjustment and remembering their preference for future sessions. No additional limitation is present.
With respect to claim(s) 3, the claim relates to receiving an instruction which references the audio parameter indirectly using descriptive keywords. As performed by a person, these steps appear to refer to the mental process of interpreting an indirect adjective and correlating it to a specific technical audio control. No additional limitation is present.
With respect to claim(s) 4, the claim relates to receiving the instruction via spoken words. As performed by a person, these steps appear to refer to the mental process of simply listening to a person speak a request. No additional limitation is present.
With respect to claim(s) 5, the claim relates to limiting the adjustment to specific types of audio parameters. As performed by a person, these steps appear to refer to the mental process of selecting a specific type of audio knob or fader to adjust on a mixing board, each of which represents a generic data category. No additional limitation is present.
With respect to claim(s) 6, the claim relates to identifying an active application and selecting a specific translation model based on that application. As performed by a person, these steps appear to refer to the mental process of applying commands and rules based on the limits and constraints of the system. No additional limitation is present.
With respect to claim(s) 7, the claim relates to tracking/updating the status of an audio parameter on a coordinate system. As performed by a person, these steps appear to refer to the clerical and mathematical process of writing down current settings (e.g., of a dial) on the mixing board and calculating the next position. No additional limitation is present.
With respect to claim(s) 8, the claim relates to maintaining a historical log of changes to the audio parameters. As performed by a person, these steps appear to refer to the clerical process of keeping a ledger of the various changes over time. No additional limitation is present.
With respect to claim(s) 9, the claim relates to mapping an instruction to a predetermined fixed value for the audio parameter. As performed by a person, these steps appear to refer to the mental process of consulting a lookup table in response to a specific request. No additional limitation is present.
With respect to claim(s) 10, the claim relates to mapping an instruction to the absolute upper or lower limit of the audio parameter. As performed by a person, these steps appear to refer to the mental process of understanding an utterance as requesting a maximum or minimum (e.g., “Not warm enough. Make it hot!”) and responding accordingly. No additional limitation is present.
With respect to claim(s) 11, the claim relates to associating the instruction with a specific degree of change. As performed by a person, these steps appear to refer to the mental process of determining how much to change the parameter (e.g., “a lot” vs. “a little”) based on the language used. No additional limitation is present.
With respect to claim(s) 12, the claim relates to scaling the degree of change based on the normalized mathematical range. As performed by a person, these steps appear to refer to the mathematical process of converting a relative abstract value to a value within an established range. No additional limitation is present.
With respect to claim(s) 13, the claim relates to using a trained model to determine a specific amount of adjustment. As performed by a person, these steps appear to refer to the mental process of using past experience to determine an amount of change desired in a user’s instructions, as presented in the context of a generic description of how an ordinary machine learning model learns. No additional limitation is present.
With respect to claim(s) 14, the claim relates to using device information to determine the audio parameter and degree of change. As performed by a person, these steps appear to refer to the mental process of considering the physical capabilities and limitations of the device to determine the appropriate change in response to the user’s request. No additional limitation is present.
With respect to claim(s) 15, the claim relates to compiling a training dataset bounded by specific limits and training the model with it. As performed by a person, these steps appear to refer to the administrative and mathematical process of applying the limits of the system to the amount of change prescribed in response to the instruction. No additional limitation is present.
With respect to claim(s) 16, the claim relates to determining multiple audio parameters from a single instruction. As performed by a person, these steps appear to refer to the mental process of determining that multiple audio parameter changes are needed to achieve a single audio output result (e.g., make it sound like a stadium). No additional limitation is present.
With respect to claim(s) 17, the claim relates to evaluating the environment context to determine a suggested audio adjustment and prompting the user. As performed by a person, these steps appear to refer to the mental and administrative process of recognizing a factor in the user’s environment (e.g., heavy rain outside is creating noise, which is likely distorting the sound of the music), selecting an audio improvement to compensate, and providing the recommendation to the user for approval. No additional limitation is present.
These claims do not remedy the failure to integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception. As such, for the same reasons as described above with reference to independent claim(s) 1, 18, and 20, dependent claim(s) 2-17 and 19 are not patent eligible.
Appropriate correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-5, 7-13, 15-16, 18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by non-patent literature to Stasis (Stasis, S., 2020. Audio equalisation using natural language. Doctoral dissertation, Birmingham City University; hereinafter Stasis).
Regarding claim 1, Stasis discloses A non-transitory computer readable storage medium storing executable instructions that, when executed by one or more processors, cause the one or more processors to (Systems and methods disclosed with relation to “use of descriptive language as a medium for controlling equalisation parameters” for “mixing and mastering tasks in the digital domain” as implemented using “a computer and speakers” where a computer is well known in the relevant art to include both a processor and computer readable storage media containing instructions to perform the desired tasks; Stasis, ¶ pg. ii, lines 3-6; pg. 70, lines 6-7): receive a natural language instruction (“The model of stacked autoencoders is then used to create a novel audio production interface, by which users are able to control equalisation parameters based on descriptive language,” which is also referred to as “user input”; Stasis, ¶ pg. i, lines 26-29; p.51, lines 17-22) referencing at least an audio parameter of an audio rendering system (Discloses “control[ling] equalisation parameters,” which does reference said equalization parameters.; Stasis, ¶ pg. i, lines 26-29) and a description of the audio parameter (The equalization parameters are controlled “based on descriptive language” where the descriptive language includes words such as warmth to describe the “warmth of the signal” which corresponds to “altering the timbral quality of the input sound” {description of the audio parameter}; Stasis, ¶ pg. i, lines 26-29, pg. 4, lines 8-17); determine a value of the audio parameter using a machine learning model (discloses a “smoothing process” for “retain[ing] all salient information” from the natural language input, followed by generating “the smoothed EQ parameters” using “the sAE model”, which is a stacked autoencoder (sAE) deep neural network.; Stasis, ¶ pg. 117, lines 4-23) trained to, based on the natural language instruction, determine the audio parameter (discloses “training the systems through user input, and mapping new parameters based on the nearest neighbour technique” where “The model is trained to encode a given input x into a representation c(x)” using n hidden layers (e.g., 3) where the “output of the [previous] hidden layer is then used as the input for the next hidden layer, and it is trained as in step (1)”, and the final representation c(x) is provided as input to the n decoder layers, where the “process can then be reversed in the decoder part of the system, by transposing the weights (Wi) of the hidden layers to retrieve the reconstructed dataset (c(x))” and the reconstructed dataset is the equalization parameters.; Stasis, ¶ pg. 119-120 (inclusive); pg. 146, lines 9-11); and transmit an instruction to the audio rendering system to update the value of the audio parameter according to the description of the audio parameter (“Finally, after the system has been trained, new user input can be passed through its functions, which will be unweighted and rescaled to account for the original transformations, and produce new high-dimensional parameters... for controlling the EQ”; Stasis, ¶ p.51, lines 17-22).
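For clarity of the record, the following is a minimal illustrative sketch of the greedy stacked-autoencoder scheme summarized above (examiner-supplied Python; the layer sizes, function names, and data are hypothetical, and the code is not reproduced from Stasis):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_tied_layer(X, n_hidden, lr=0.5, epochs=500, seed=0):
    """Train one autoencoder layer whose decoder is the transpose of its
    encoder weights, mirroring the "transposing the weights (Wi)" step."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)             # encode
        Y = sigmoid(H @ W.T)           # decode via transposed weights
        dY = (Y - X) * Y * (1.0 - Y)   # gradient at the reconstruction
        dH = (dY @ W) * H * (1.0 - H)  # gradient pushed back to the code
        W -= lr * (X.T @ dH + dY.T @ H) / len(X)
    return W

def train_stack(X, layer_sizes=(8, 4, 2)):
    """Greedy layer-wise training: each trained hidden layer's output is
    used as the input for the next layer, as in step (1)."""
    weights, A = [], X
    for h in layer_sizes:
        W = train_tied_layer(A, h)
        weights.append(W)
        A = sigmoid(A @ W)
    return weights

def encode(x, weights):
    for W in weights:
        x = sigmoid(x @ W)
    return x  # low-dimensional representation c(x), e.g., a 2-D timbral space

def decode(c, weights):
    for W in reversed(weights):
        c = sigmoid(c @ W.T)  # reverse the process with transposed weights
    return c  # reconstructed high-dimensional EQ parameters

# Hypothetical usage: 800 annotated EQ settings normalized to [0, 1],
# reduced to a 2-D space and reconstructed back into EQ parameters.
X = np.random.default_rng(1).uniform(size=(800, 13))
weights = train_stack(X)
eq_params = decode(encode(X[:1], weights), weights)
```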
Regarding claim 3, Stasis discloses wherein the natural language instruction references the audio parameter indirectly using one or more descriptive keywords (“new user input can be passed through… and produce new high-dimensional parameters... for controlling the EQ through a timbral space (two dimensional plane) rather than the original, technical parameters (i.e. high dimensional space)” based on the principles of semantic equalization, while “taking into account information inherent to the input audio (audio features).”; Stasis, ¶ pg. 51, lines 5-22).
Regarding claim 4, Stasis discloses wherein the natural language instruction is a user utterance (the “new user input” as applied to “produce new high-dimensional parameters... for controlling the EQ” is a user utterance.; Stasis, ¶ pg. 51, lines 5-22).
Regarding claim 5, Stasis discloses wherein the audio parameter is one of spatial processing side gain, low-frequency processing compression ratio, low-frequency processing makeup gain, mid-frequency processing gain, high-frequency processing gain, or voice processing gain (“The first warm sub-representation presents a boost on the low-end that takes the shape of a shelving filter and therefore will be called low-shelf boost (LSB) warm. The second warm sub-representation is depicted with a boost on the low-mid range, and will be called low-mid boost (LMB) warm. Finally, the third warm sub-representation displays a higher cut-off in the low range, while most energy is concentrated on the mid and high-mid frequencies. This will be called high-mid boost (HMB) warm.”; Stasis, ¶ pg. 136, lines 7-14).
Regarding claim 7, Stasis discloses wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: track values of the audio parameter on a parametric space (Discloses “map the high-dimensional parameters by achieving a minimal reconstruction error, given a new set of (x; y) coordinates”; Stasis, ¶ pg. 152, lines 7-8); and update the value of the audio parameter on the parametric space according to the description of the audio parameter (“Finally, after the system has been trained, new user input can be passed through its functions, which will be unweighted and rescaled to account for the original transformations, and produce new high-dimensional parameters,” thus passing the user’s chosen input coordinate from the two-dimensional space through a reconstruction function to generate and update the equalisation parameters; Stasis, ¶ p.51, lines 17-22).
Regarding claim 8, Stasis discloses wherein the tracked values of the audio parameter comprise a change in the values of the audio parameter over instructions transmitted to the audio rendering system (Teaches capturing the changes in parameter values as the user actively transmits continuous movement instructions via a “slider interface” to process the audio; Stasis, ¶ pg. 170, lines 6-8).
Regarding claim 9, Stasis discloses wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: determine, based on the natural language instruction, a predefined value of the audio parameter on a parametric space (Discloses an example embodiment for a dataset, where “The dataset for training the model comprises 800 semantically annotated EQ parameter settings.”; Stasis, ¶ pg. 147, lines 23-29), wherein the predefined value corresponds to the updated value of the audio parameter (Discloses that the semantic class centroids from the low dimensional space correlate directly with the reconstructed high-dimensional parameters utilized to generate the updated EQ curves.; Stasis, ¶ pg. 172, lines 1-9; Table. 6.).
Regarding claim 10, Stasis discloses wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: determine, based on the natural language instruction, one of a maximum value or a minimum value of the audio parameter on a parametric space (Establishes a parametric space and explicitly defines “the pmin and pmax” which during scaling “represent the minimum and maximum values for each parameter, while qmin and qmax represent the target range”; Stasis, ¶ pg. 153, lines 17-21; Equation 6. See also Lam, cited in the Conclusion below, which discloses using “dialogflow” interpretations of a natural language statement to seek “corresponding digital signal processing parameters 42”, such as from a “database 40”, thus mapping any value to a specified database value, which, in the case of extreme natural language modifiers (e.g., “maximum warmth” or “remove brightness”), would be understood by one having ordinary skill in the art to resolve to the respective directly defined maximum and minimum values, as disclosed in Stasis; Lam, ¶ [0021]-[0023]), wherein the maximum value or the minimum value corresponds to the updated value of the audio parameter (Further explains that the “rescaling process needs to take place before the parameter values are passed to the EQ... in order to ensure that the parameters will be of the required range to appropriately alter the EQ characteristics,” thus the rescaled value for any value in the range, inclusive of the maximum and/or minimum value, corresponds to the updated value of the audio parameter.; Stasis, ¶ pg. 153, lines 13-21; Equation 6.).
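For completeness, the min-max rescaling relied upon in the preceding mapping takes the standard form below (an examiner-supplied reconstruction in the notation of the cited passage; Equation 6 of Stasis is not reproduced here):

```latex
q_n = \frac{(p_n - p_{\min})\,(q_{\max} - q_{\min})}{p_{\max} - p_{\min}} + q_{\min}
```

so that each parameter value p_n within [p_min, p_max] is mapped into the target range [q_min, q_max] (e.g., a gain range of -20 up to +20 dB, consistent with the treatment of claim 12 below).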
Regarding claim 11, Stasis discloses wherein the description of the audio parameter is associated with a degree of change to adjust the audio parameter (Specifically discloses associating descriptive semantic terms with a continuous degree of change, which is achieved by grading terms on semantic scales and by providing users with a slider that allows them to continuously modulate the degree of application between different semantic descriptors.; Stasis, ¶ pg. i, lines 26-29; pg. 13, lines 15-21).
Regarding claim 12, Stasis discloses wherein the degree of change is within a normalized range of values between -1 and 1 (discloses that “all the parameters/dimensions of the dataset are converted to a range of 0 < pn < 1. This ensures that the system will not be altered due to the existence of different ranges in the dataset” and to avoid bias “towards the dimension displaying the highest variance.”; Stasis, ¶ pg. 153, lines 10-17), and wherein the normalized range corresponds to a range of decibel values (Teaches that the normalized range is rescaled and applied directly to the equalisation filter’s gain, which operates in dB. (e.g., “a range of -20 up to +20 dB”); Stasis, ¶ pg. 84, lines 9-11, pg. 174 Figure 6.1).
Regarding claim 13, Stasis discloses wherein the machine learning model is further trained to, based on the natural language instruction and tracked values of the audio parameter on a parametric space, determine the degree of change (Discloses training the stacked autoencoder {machine learning model} to correlate semantic annotation {natural language instructions} with “(x,y) coordinates in a Cartesian space {parametric space}.” The user provides input via this space and the “decoder layers of the sAE” use those coordinates to produce a “reconstruction” and determine the specific degree of change for the actual equalizer parameters.; Stasis, ¶ pg. 174, lines 1-15).
Regarding claim 15, Stasis discloses wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive parametric space boundaries limiting the value of audio parameters on the parametric space (Establishes a parametric space and explicitly defines “the pmin and pmax” which during scaling “represent the minimum and maximum values for each parameter, while qmin and qmax represent the target range” to limit the values of the parameters within the space before passing them into the neural network to prevent bias and ensure the system operates within defined limits.; Stasis, ¶ pg. 153, lines 17-21; Equation 6.); create a training set comprising natural language instructions labeled with one of an audio parameter or degree of change (the “data for performing semantic transformations” of the descriptors is collected, where, in one example, the “dataset for training the model comprises 800 semantically annotated EQ parameter settings” generated based on “40 participants” being “asked to equalise 10 musical instrument samples to achieve the two timbral adjectives.”; Stasis, ¶ pg. 147, lines 22-29); and train the machine learning model using the parametric space boundaries and the training set (“in this instance the system is trained to reconstruct parameters of two distinct timbral adjectives. As depicted in Figure 6., apart from the data gathering and preparation (scaling), a weighting process is applied to make the system input dependent. Following that process the system incorporates dimensionality reduction and parameter reconstruction in order to find connections between the high-dimensional parameter space and its two-dimensional representation.”; Stasis, ¶ pg. 151, lines 12-18).
Regarding claim 16, Stasis discloses wherein the natural language instruction further references another audio parameter of the audio rendering system and another description of the other audio parameter (Teaches a machine learning model which translates a single natural language instruction (e.g., “warmth”) into simultaneous adjustments across multiple distinct audio parameters (e.g., adjusting both a low shelf filter gain and a peak filter gain at the same time), as the machine learning model reconstructs the entire high-dimensional parameter space required to set the 5-band EQ from a single natural language input.; Stasis, ¶ pg. 83, lines 1-10, Table 4.; pg. 147 (inclusive)), and wherein the machine learning model is further trained to determine the other audio parameter (The model is specifically trained to output a multi-dimensional parameter vector rather than a single parameter.; Stasis, ¶ pg. 152, line 14 - pg. 153, line 4).
Regarding claim 18, Stasis discloses A system comprising, comprising: one or more processors; and a non-transitory computer readable storage medium storing executable instructions that, when executed by the one or more processors, cause the one or more processors to (Systems and methods disclosed with relation to “use of descriptive language as a medium for controlling equalisation parameters” for “mixing and mastering tasks in the digital domain” as implemented using “a computer and speakers” where a computer is well known in the relevant art to include both a processor and computer readable storage media containing instructions to perform the desired tasks; Stasis, ¶ pg. ii, lines 3-6; pg. 70, lines 6-7): receive a natural language instruction (“The model of stacked autoencoders is then used to create a novel audio production interface, by which users are able to control equalisation parameters based on descriptive language,” which is also referred to as “user input”; Stasis, ¶ pg. i, lines 26-29; p.51, lines 17-22) referencing at least an audio parameter of an audio rendering system (Discloses “control[ling] equalisation parameters,” which does reference said equalization parameters.; Stasis, ¶ pg. i, lines 26-29) and a description of the audio parameter (The equalization parameters are controlled “based on descriptive language” where the descriptive language includes words such as warmth to describe the “warmth of the signal” which corresponds to “altering the timbral quality of the input sound” {description of the audio parameter}; Stasis, ¶ pg. i, lines 26-29, pg. 4, lines 8-17); determine a value of the audio parameter using a machine learning model (discloses a “smoothing process” for “retain[ing] all salient information” from the natural language input, followed by generating “the smoothed EQ parameters” using “the sAE model”, which is a stacked autoencoder (sAE) deep neural network.; Stasis, ¶ pg. 117, lines 4-23) trained to, based on the natural language instruction, determine the audio parameter (discloses “training the systems through user input, and mapping new parameters based on the nearest neighbour technique” where “The model is trained to encode a given input x into a representation c(x)” using n hidden layers (e.g., 3) where the “output of the [previous] hidden layer is then used as the input for the next hidden layer, and it is trained as in step (1)”, and the final representation c(x) is provided as input to the n decoder layers, where the “process can then be reversed in the decoder part of the system, by transposing the weights (Wi) of the hidden layers to retrieve the reconstructed dataset (c(x))” and the reconstructed dataset is the equalization parameters.; Stasis, ¶ pg. 119-120 (inclusive); pg. 146, lines 9-11); and transmit an instruction to the audio rendering system to update the value of the audio parameter according to the description of the audio parameter (“Finally, after the system has been trained, new user input can be passed through its functions, which will be unweighted and rescaled to account for the original transformations, and produce new high-dimensional parameters... for controlling the EQ”; Stasis, ¶ p.51, lines 17-22).
Regarding claim 20, Stasis discloses A method comprising (Systems and methods disclosed with relation to “use of descriptive language as a medium for controlling equalisation parameters”; Stasis, ¶ pg. ii, lines 3-6): receiving a natural language instruction (“The model of stacked autoencoders is then used to create a novel audio production interface, by which users are able to control equalisation parameters based on descriptive language,” which is also referred to as “user input”; Stasis, ¶ pg. i, lines 26-29; p.51, lines 17-22) referencing at least an audio parameter of an audio rendering system (Discloses “control[ling] equalisation parameters,” which does reference said equalization parameters.; Stasis, ¶ pg. i, lines 26-29) and a description of the audio parameter (The equalization parameters are controlled “based on descriptive language” where the descriptive language includes words such as warmth to describe the “warmth of the signal” which corresponds to “altering the timbral quality of the input sound” {description of the audio parameter}; Stasis, ¶ pg. i, lines 26-29, pg. 4, lines 8-17); determining a value of the audio parameter using a machine learning model (discloses a “smoothing process” for “retain[ing] all salient information” from the natural language input, followed by generating “the smoothed EQ parameters” using “the sAE model”, which is a stacked autoencoder (sAE) deep neural network.; Stasis, ¶ pg. 117, lines 4-23) trained to, based on the natural language instruction, determine the audio parameter (discloses “training the systems through user input, and mapping new parameters based on the nearest neighbour technique” where “The model is trained to encode a given input x into a representation c(x)” using n hidden layers (e.g., 3) where the “output of the [previous] hidden layer is then used as the input for the next hidden layer, and it is trained as in step (1)”, and the final representation c(x) is provided as input to the n decoder layers, where the “process can then be reversed in the decoder part of the system, by transposing the weights (Wi) of the hidden layers to retrieve the reconstructed dataset (c(x))” and the reconstructed dataset is the equalization parameters.; Stasis, ¶ pg. 119-120 (inclusive); pg. 146, lines 9-11); and transmitting an instruction to the audio rendering system to update the value of the audio parameter according to the description of the audio parameter (“Finally, after the system has been trained, new user input can be passed through its functions, which will be unweighted and rescaled to account for the original transformations, and produce new high-dimensional parameters... for controlling the EQ”; Stasis, ¶ p.51, lines 17-22).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 14, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Stasis as applied to claims 1 and 13 above, and further in view of Meacham (U.S. Pat. No. 9,886,954, hereinafter Meacham).
Regarding claim 2, the rejection of claim 1 is incorporated. Stasis discloses all of the elements of the current invention as stated above. However, Stasis fails to expressly recite wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: generate a natural language prompt requesting feedback from a user regarding the updated value of the audio parameter, wherein the transmitted instruction further causes the audio rendering system to output the natural language prompt; receive the feedback from the user; and re-train the machine learning model based on the feedback.
Meacham teaches systems and methods for a “context aware hearing optimization engine.” (Meacham, ¶ Col. 2, lines 27-28). Regarding claim 2, Meacham teaches wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: generate a natural language prompt requesting feedback from a user regarding the updated value of the audio parameter, ("At 608, feedback may be received from a user of the personal audio system. The feedback can be positive feedback. The personal audio system may provide a voice or audio prompt asking if the determined action is the correct action to perform and receive an input (verbal or through a user interface) from the user on whether the determined action is correct."; Meacham, ¶ Col. 23, lines 56-62) wherein the transmitted instruction further causes the audio rendering system to output the natural language prompt (The prompt, which is output as part of the transmitted instruction, for "asking if the determined action is correct" may be a "voice or audio prompt"; Meacham, ¶ Col. 23, lines 56-62); receive the feedback from the user (The system can then "receive an input (verbal or through a user interface) from the user on whether the determined action is correct."; Meacham, ¶ Col. 23, lines 56-62); and re-train the machine learning model based on the feedback ("At 610, the feedback is provided to CAHOE and may be used to adjust at least one of the machine learning models of the hierarchical machine learning environment."; Meacham, ¶ Col. 23, line 64 - Col. 24, line 3).
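The prompt/feedback/re-train loop mapped above can be summarized by the following examiner-supplied sketch (illustrative Python only; every name is hypothetical, and nothing below is code from Meacham):

```python
from typing import Callable, List, Tuple

def feedback_round(
    render_prompt: Callable[[str], None],  # outputs a voice/audio prompt
    get_user_reply: Callable[[], bool],    # verbal or UI confirmation
    adjust_model: Callable[[List[Tuple[str, bool]]], None],  # re-train hook
    instruction: str,
) -> None:
    """Ask whether the applied adjustment was correct, collect the answer,
    and feed it back to adjust the machine learning model."""
    render_prompt(f"Was this the correct adjustment for: '{instruction}'?")
    approved = get_user_reply()
    adjust_model([(instruction, approved)])
```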
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the natural language audio equalization systems of Stasis to incorporate the teachings of Meacham to include wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: generate a natural language prompt requesting feedback from a user regarding the updated value of the audio parameter, wherein the transmitted instruction further causes the audio rendering system to output the natural language prompt; receive the feedback from the user; and re-train the machine learning model based on the feedback. Stasis teaches the use of machine learning to map natural language onto audio parameters, but provides a largely passive interface where the user must manually recognize the need for an adjustment. Meacham teaches a context aware hearing optimization engine which cures the limitations of Stasis in this regard, by utilizing hierarchical machine learning models to actively monitor the environment, determine recommended audio adjustments, and generate conversational prompts requesting user confirmation, which improves the user experience by anticipating user needs and desires before receipt of the request, while still leaving room for user motivated and controlled modifications, as recognized by Meacham. (Meacham, ¶ Col. 2, lines 27-54).
Regarding claim 14, the rejection of claim 13 is incorporated. Stasis discloses all of the elements of the current invention as stated above. However, Stasis fails to expressly recite wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive device information from the audio rendering system, wherein the machine learning model is further trained to determine the audio parameter and the degree of change based on the device information.
The relevance of Meacham is described above with relation to claim 2. Regarding claim 14, Meacham teaches wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive device information from the audio rendering system, (Discloses a machine learning model 512 which "is configured to receive the output data from machine learning model 510, and a combination of current contextual state 522, one or more stored local contextual states 524, and/or one or more stored global contextual states 526," where the "Machine learning model 512 can select an action to perform with respect to the personal audio system based in part on... local contextual states 524", which includes "a state of the user’s personal audio system at a particular moment in time and an associated action to perform with respect to the user’s personal audio system"; Meacham, ¶ Col. 15, lines 3-12; Col. 21, lines 45-58) wherein the machine learning model is further trained to determine the audio parameter and the degree of change based on the device information ("Machine learning model 512 can be trained with...one or more local contextual states 524" as part of being "trained to determine an action to select that is specific to the user" which is understood to include the output of personal sound, defined as "sound that has been processed, modified, or tailored in accordance with a user’s personal preferences," and where a user’s personal preferences include user input (e.g., feedback).; Meacham, ¶ Col. 5, lines 41-56; Col. 22, lines 22-38).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the natural language audio equalization systems of Stasis to incorporate the teachings of Meacham to include wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive device information from the audio rendering system, wherein the machine learning model is further trained to determine the audio parameter and the degree of change based on the device information. Stasis teaches the use of machine learning to map natural language onto audio parameters, but provides a largely passive interface where the user must manually recognize the need for an adjustment. Meacham teaches a context aware hearing optimization engine which cures the limitations of Stasis in this regard, by utilizing hierarchical machine learning models to actively monitor the environment, determine recommended audio adjustments, and generate conversational prompts requesting user confirmation, which improves the user experience by anticipating user needs and desires before receipt of the request, while still leaving room for user motivated and controlled modifications, as recognized by Meacham. (Meacham, ¶ Col. 2, lines 27-54).
Regarding claim 17, the rejection of claim 1 is incorporated. Stasis discloses all of the elements of the current invention as stated above. However, Stasis fails to expressly recite wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: determine one or more context parameters, a given context parameter characterizing a context in which a user consumes audio from the audio rendering system; determine a recommended audio parameter and a recommended degree of change to adjust the recommended audio parameter using a second machine learning model trained to, based on the one or more context parameters, determine the recommended audio parameter and the recommended degree of change; and generate a natural language prompt recommending that the user apply an audio adjustment to the audio rendering system based on the recommended audio parameter and the recommended degree of change.
The relevance of Meacham is described above with relation to claim 2. Regarding claim 17, Meacham teaches wherein the machine learning model is a first machine learning model, wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: determine one or more context parameters (Discloses a hierarchical machine learning environment including a plurality of machine learning models receiving "context aware processing parameters"; Meacham, ¶ Col. 21, lines 45-55), a given context parameter characterizing a context in which a user consumes audio from the audio rendering system ("The one or more context aware processing parameters can include current contextual state, one or more stored local contextual states, and/or one or more stored global contextual states" where the current contextual state "can include at least one of: location information, time-based data, activity data, device settings, situation data, conversation data, application information and/or sensor data" where at least location information and situation data, each regarding a current contextual state for "selecting a set of audio processing parameters that change the manner in which a user experiences his or her ambient sound environment" as received from a "personal audio system 140" is characterizing a context in which a user consumes audio from the audio rendering system.; Meacham, ¶ Col. 5, lines 25-40; Col. 21, lines 45-55); determine a recommended audio parameter and a recommended degree of change to adjust the recommended audio parameter (Discloses a hierarchical machine learning environment, where the topmost model (Machine Learning Model 512) is trained to receive the "current contextual state" and determine an "action set" which can include a processing parameter set, where the "processing parameter set may define the type and degree of one or more processes to be performed on the ambient audio stream... may include numerical parameters, filter models, software instructions, and other information" and, in some examples, "may define filtering by a low pass filter with a particular cut-off frequency (the frequency at which the filter start to attenuate) and slope (the rate of change of attenuation with frequency) and/or compression using a particular function (e.g. logarithmic)" {to adjust the recommended audio parameter} which may be with respect to a "previous contextual state of the user and the user’s device with an action set" {...a recommended degree of change}; Meacham, ¶ Col. 12, lines 15-32; Col. 21, lines 45-55; Col. 22, lines 43-59) using a second machine learning model trained to, based on the one or more context parameters, determine the recommended audio parameter and the recommended degree of change ("Machine learning model 512 can be trained with different combinations of real characteristics, current contextual state 522, one or more local contextual states 524, and/or one or more global contextual states 526" as part of being "trained to determine an action set to be selected based on the real characteristics of the ambient audio stream and one or more indicators that include an ambient sound profile, context information, and/or a strength of recommendation" where the action set includes the processing parameter set, which, as explained previously, includes the recommended audio parameter and the recommended degree of change.; Meacham, ¶ Col. 5, lines 41-56; Col. 12, lines 15-32; Col. 21, lines 45-55; Col. 22, lines 22-38, and 43-59); and generate a natural language prompt recommending that the user apply an audio adjustment to the audio rendering system ("A voice prompt {a natural language prompt} requesting verification from the user of the personal audio system to perform the command may be provided," where a command can include the "process[ing], modif[ying], or tailor[ing]" of "sound... in accordance with a user’s personal preferences," applying an audio adjustment, and where a recommendation includes seeking confirmation of a suggested course of action.; Meacham, ¶ Col. 5, lines 53-56; Col. 23, lines 36-55) based on the recommended audio parameter and the recommended degree of change (Requests verification for performing the command, where the command corresponds to a set of steps to achieve the recommended degree of change for the recommended audio parameter.; Meacham, ¶ Col. 23, lines 36-55).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the natural language audio equalization systems of Stasis to incorporate the teachings of Meacham to include wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: determine one or more context parameters, a given context parameter characterizing a context in which a user consumes audio from the audio rendering system; determine a recommended audio parameter and a recommended degree of change to adjust the recommended audio parameter using a second machine learning model trained to, based on the one or more context parameters, determine the recommended audio parameter and the recommended degree of change; and generate a natural language prompt recommending that the user apply an audio adjustment to the audio rendering system based on the recommended audio parameter and the recommended degree of change. Stasis teaches the use of machine learning to map natural language onto audio parameters, but provides a largely passive interface where the user must manually recognize the need for an adjustment. Meacham teaches a context aware hearing optimization engine which cures the limitations of Stasis in this regard, by utilizing hierarchical machine learning models to actively monitor the environment, determine recommended audio adjustments, and generate conversational prompts requesting user confirmation, which improves the user experience by anticipating user needs and desires before receipt of the request, while still leaving room for user motivated and controlled modifications, as recognized by Meacham. (Meacham, ¶ Col. 2, lines 27-54).
Regarding claim 19, the rejection of claim 18 is incorporated. Claim 19 is substantially the same as claim 2 and is therefore rejected under the same rationale as above.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Stasis as applied to claim 1 above, and further in view of Elders (U.S. Pat. No. 11,693,622, hereinafter Elders).
Regarding claim 6, the rejection of claim 1 is incorporated. Stasis discloses all of the elements of the current invention as stated above. However, Stasis fails to expressly recite wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive, from the audio rendering system, device information comprising an application presently used by a user to consume audio on the audio rendering system; and select, based on the application, the machine learning model from a plurality of machine learning models.
Elders teaches “a system for configurable keywords that … can execute different functions depending on the operating context of the system.” (Elders, ¶ Col. 3, lines 43-48). Regarding claim 6, Elders teaches wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive, from the audio rendering system, device information comprising an application (Describes the detection of a change of the application state on the device, where, in the context of multiple devices, "if a speech controlled device 110a is operating at the same time as a tablet computer 110b, and the system 100 is capturing audio through speech controlled device 110a, but a first application is operating on tablet computer 110b, if a keyword is detected from audio captured by speech controlled device 110a, the function for the keyword may be determined based on the first application operating on tablet computer 110b."; Elders, ¶ Col. 12, lines 59-67) presently used by a user to consume audio on the audio rendering system (The application, in the above example, is "operating," thus is being presently used by the user, and where the application may be a "music application {used by the user to consume audio on the audio rendering system}"; Elders, ¶ Col. 4, lines 29-35; Col. 12, lines 59-67); and select, based on the application, the machine learning model from a plurality of machine learning models (Discloses running select models based on the application context, explaining that if the system "discontinues operating a first application but initiates operation of a second application, the server 120 may send the local device 110 an indication to disable detection of keyword(s) associated with the first application and an indication (which may be the same indication or a different indication) to enable detection of keyword(s) associated with the second application."; Elders, ¶ Col. 15, lines 8-15).
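The application-keyed model selection mapped above can be illustrated as follows (examiner-supplied Python; the application names, keys, and placeholder models are hypothetical and not drawn from Elders):

```python
from typing import Callable, Dict

# A "model" here is any callable mapping a natural language instruction
# to audio parameter updates.
Model = Callable[[str], dict]

MODELS: Dict[str, Model] = {
    "music_player": lambda text: {"mid_frequency_gain_db": 2.0},   # placeholder
    "voice_call": lambda text: {"voice_processing_gain_db": 3.0},  # placeholder
}
DEFAULT_APP = "music_player"

def select_model(device_info: dict) -> Model:
    """Select the translation model based on the application presently
    being used to consume audio, per the received device information."""
    app = device_info.get("active_application", DEFAULT_APP)
    return MODELS.get(app, MODELS[DEFAULT_APP])

model = select_model({"active_application": "voice_call"})
updates = model("make my voice louder")  # -> {"voice_processing_gain_db": 3.0}
```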
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the natural language audio equalization systems of Stasis to incorporate the teachings of Elders to include wherein the instructions that, when executed by one or more processors, further cause the one or more processors to: receive, from the audio rendering system, device information comprising an application presently used by a user to consume audio on the audio rendering system; and select, based on the application, the machine learning model from a plurality of machine learning models. Stasis teaches a machine learning architecture capable of translating semantic natural language instructions into specific high dimensional audio equalization parameters, but is silent on using the device’s operational state to alter those calculations. Elders teaches a voice control system that monitors device information to dynamically alter the executed function of a spoken keyword, which provides the known benefit of executing the proper instruction in the context of multiple different applications, such that the user’s desired results are achieved across numerous audio platforms, as recognized by Elders. (Elders, ¶ Col. 12, lines 3-31).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lam (U.S. Pat. App. Pub. No. 2019/0385603) discloses an equalizer adjustment method and computer readable storage medium, and in particular a method of acoustically controlling an equalizer using natural language.
Steinmetz (U.S. Pat. App. Pub. No. 2023/0352058) discloses techniques for automated multitrack mixing in the waveform domain using machine-learning models or systems, and to frameworks for training such machine-learning models or systems.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Sean E Serraguard/Primary Examiner, Art Unit 2657