Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
Continued Examination
A request for continued examination under 37 C.F.R. § 1.114, including the fee set forth in 37 C.F.R. § 1.17(e), was filed in this Application on 23 February 2026 after Final Rejection (25 November 2025). Since this Application is eligible for continued examination under 37 C.F.R. § 1.114, and the fee set forth in 37 C.F.R. § 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 C.F.R. § 1.114. Applicant's submission filed on 13 February 2026 has been entered.
Art Rejections
Anticipation
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1–3, 8–10 and 21–24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Marco A. Martinez Ramirez and Joshua D. Reiss, Deep Learning and Intelligent Audio Mixing, Proc. of the 3d Workshop on Intelligent Music Production (Salford, UK) (15 September 2017) (“Ramirez”).
Claim 1 is drawn to “a sound editing device.” The following table illustrates the correspondence between the claimed device and the Ramirez reference.
Claim 1
The Ramirez Reference
“1. A sound editing device comprising:
The Ramirez reference similarly describes a device for stem audio mixing. Ramirez at Abstract.
“at least one processor configured to execute
Ramirez describes a system that receives input, trains an autoencoder and transforms a raw audio input into a mixed stem. Id. at § 4, ¶¶ 1, 4–6. These actions inherently require the use of a signal processor.
“a first receiving unit configured to receive a first audio signal representing sounds performed by a user,
“a second receiving unit configured to receive a second audio signal,
Ramirez describes a device to receive raw audio signals performed by a user that are to be mixed, such as bass, guitar, vocal and keys. Id. at § 4, ¶¶ 1, 2, Table 1. Audio signals from any of those sources correspond to the first and second audio signals.
“an estimation unit configured to estimate effect information that reflects an effect to be applied to the first audio signal, from the first and second audio signals, by using a trained model indicating an input-output relationship between first and second input audio signals and output effect information that reflects an effect to be applied to the first input audio signal, and
Ramirez describes an autoencoder that implements a training feature (i.e., an estimation unit) to model an input-output relationship between raw inputs (i.e., first input audio signals) and mixed stems (i.e., second input audio signals). Id. at § 4, ¶¶ 1, 5–7. Given that Ramirez uses an autoencoder, subsequent raw inputs (i.e., the first audio signal) will be encoded into the autoencoder’s model space and then decoded according to estimated weights (i.e., estimated effect information) to produce a version that approximates a professionally mixed stem version of the raw input. See id.
“an effect application unit configured to obtain the first audio signal from the first receiving unit and the effect information from the estimation unit, and
“configured to generate [[an]] a fourth audio signal by applying the effect reflected by the effect information that has been obtained from the estimation unit to the first audio signal that has been obtained from the first receiving unit.”
Ramirez’s autoencoder also implements an encoding/decoding function (i.e., an effect application unit) to receive a raw audio input (i.e., the first audio signal) and the weights produced during training (i.e., the effect information). Id. The autoencoder then applies the weights to the raw input to produce a stem (i.e., a fourth audio signal). Id.
Table 1
For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
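For illustration only, the encode/decode mapping discussed in Table 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the Ramirez reference: the function names, the four-sample frame and the fixed linear weights are all hypothetical stand-ins for weights that would be learned during training.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a vector (one linear layer, no bias)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def apply_autoencoder(raw_frame: Vector, encoder: Matrix, decoder: Matrix) -> Vector:
    """Encode a raw frame into a latent space, then decode it with fixed weights."""
    latent = matvec(encoder, raw_frame)   # raw input -> model space
    return matvec(decoder, latent)        # model space -> processed output


# Hypothetical weights standing in for the result of training (assumption).
ENCODER: Matrix = [[0.5, 0.5, 0.0, 0.0],
                   [0.0, 0.0, 0.5, 0.5]]
DECODER: Matrix = [[1.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 1.0]]

raw = [1.0, 3.0, 2.0, 4.0]                      # toy "raw audio" frame
stem = apply_autoencoder(raw, ENCODER, DECODER)  # toy "mixed stem" frame
```

In this toy example the weights remain fixed after training and are applied to each new raw frame, which is the sense in which the weights are treated above as information that determines the effect applied to the input.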
Claim 2 depends on claim 1, and further requires the following:
“wherein the effect information includes parameters for generating the first audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder includes trained weights (i.e., effect information) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 3 depends on claim 2, and further requires the following:
“wherein the trained model is generated by learning of output parameters as the output effect information based on the first and second input audio signals, and the output parameters are parameters for generating the first input audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder is trained based on raw inputs and corresponding stems to create weights (i.e., effect information representing output parameters, like frequency content of a corresponding stem) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 21 depends on claim 1, and further requires the following:
“wherein the sounds performed by the user includes sounds produced by a musical instrument.”
Ramirez describes receiving sounds produced by multiple users performing with multiple different types of instruments, such as a bass, guitar and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 22 depends on claim 1, and further requires the following:
“wherein the second audio signal represents sounds performed by another user, or sounds generated around the user.”
Ramirez describes receiving sounds produced by multiple users in an ensemble performing together with multiple different types of instruments, such as a bass, guitar, vocalist and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 8 is drawn to “a sound editing method.” The following table illustrates the correspondence between the claimed method and the Ramirez reference.
Claim 8
The Ramirez Reference
“8. A sound editing method executed by a computer, the sound editing method comprising:
The Ramirez reference similarly describes a device for stem audio mixing. Ramirez at Abstract.
“receiving a first audio signal representing sounds performed by a user;
“receiving a second audio signal;
Ramirez describes a device to receive raw audio signals performed by a user that are to be mixed, such as bass, guitar, vocal and keys. Id. at § 4, ¶¶ 1, 2, Table 1. Audio signals from any of those sources correspond to the first and second audio signals.
“estimating effect information that reflects an effect to be applied to the first audio signal, from the first and second audio signals, by using a trained model indicating an input-output relationship between first and second input audio signals and output effect information that reflects an effect to be applied to the first input audio signal;
Ramirez describes an autoencoder that implements a training feature (i.e., an estimation unit) to model an input-output relationship between raw inputs (i.e., first input audio signals) and mixed stems (i.e., second input audio signals). Id. at § 4, ¶¶ 1, 5–7. Given that Ramirez uses an autoencoder, subsequent raw inputs (i.e., the first audio signal) will be encoded into the autoencoder’s model space and then decoded according to estimated weights (i.e., estimated effect information) to produce a version that approximates a professionally mixed stem version of the raw input. See id.
“obtaining the first audio signal and the effect information, and generating [[an]] a fourth audio signal by applying the effect reflected by the effect information that has been obtained to the first audio signal that has been obtained.”
Ramirez’s autoencoder also implements an encoding/decoding function (i.e., an effect application unit) to receive a raw audio input (i.e., the first audio signal) and the weights produced during training (i.e., the effect information). Id. The autoencoder then applies the weights to the raw input to produce a stem (i.e., a fourth audio signal). Id.
Table 2
For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 9 depends on claim 8, and further requires the following:
“wherein the effect information includes parameters for generating the first audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder includes trained weights (i.e., effect information) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 10 depends on claim 9, and further requires the following:
“wherein the trained model is generated by learning of output parameters as the output effect information based on the first and second input audio signals, and the output parameters are parameters for generating the first input audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder is trained based on raw inputs and corresponding stems to create weights (i.e., effect information representing output parameters, like frequency content of a corresponding stem) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 23 depends on claim 8, and further requires the following:
“wherein the sounds performed by the user includes sounds produced by a musical instrument.”
Ramirez describes receiving sounds produced by multiple users performing with multiple different types of instruments, such as a bass, guitar and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Claim 24 depends on claim 8, and further requires the following:
“wherein the second audio signal represents sounds performed by another user, or sounds generated around the user.”
Ramirez describes receiving sounds produced by multiple users in an ensemble performing together with multiple different types of instruments, such as a bass, guitar, vocalist and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference anticipates all limitations of the claim.
Obviousness
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 15–17, 25 and 26 are rejected under 35 U.S.C. § 103 as being unpatentable over Ramirez.
Claims 6, 7, 13, 14 and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ramirez and Victor Kitov, Real-Time Style Transfer with Strength Control, arXiv:1904.08643v1, https://arxiv.org/abs/1904.08643 (last accessed 16 April 2026) (19 April 2019) (“Kitov”).
Claim 6 depends on claim 1, and further requires the following:
“further comprising a user operable adjustment input configured to adjust a degree of the effect to be applied to the first audio signal, wherein the estimation unit is configured to estimate the effect information that reflects the effect to be applied to the first audio signal at the degree, by using the trained model.”
Claim 7 depends on claim 6, and further requires the following:
“wherein a plurality of trained models including the trained model, which correspond to degrees of the effect to be applied to the first input audio signal, are generated.”
Claims 6 and 7 are treated together. The Ramirez reference describes providing and training an autoencoder to convert an input raw audio signal into a professionally-mixed stem. Ramirez, however, does not describe any mechanism for adjusting the degree of the effect provided, and does not describe the use of multiple trained models for achieving different degrees of effect.
The Kitov reference, however, teaches and suggests that when transferring a style to an input, particularly in real-time, it is known to separately train a series of models that reflect different strengths of an effect. Kitov at Abs., § 1, ¶¶ 2–3. A user will then simply select a desired strength through an interface, which will cause a corresponding model to apply the effect at the desired strength. See id. Kitov is drawn primarily to applying a style transfer to images, but one of ordinary skill would have reasonably expected the same technique to apply equally to any type of input, including audio, since the heart of the process remains the same whether the input is audio or images: a set of models is trained for providing a desired degree of effects. See id. Kitov further teaches techniques for avoiding the need to maintain multiple models. Id. While Kitov’s additional teachings might represent an improvement over the use of multiple trained models, Kitov’s further teachings do not indicate in any regard that the use of multiple trained models tied to different levels of effect strength would be inappropriate for providing a user-desired amount of style transfer strength. To the contrary, Kitov uses the multiple model approach as a baseline for comparison to show that similar results may be achieved with an alternative technique. Id. at §§ 4.1, 4.2. Kitov further recognizes that the baseline approach is superior in some respects. Id. at § 5. Accordingly, it would have been obvious for one of ordinary skill in the art at the time this Application was effectively filed to apply Kitov’s teachings to Ramirez’s system and method in order to provide user control over the strength of mixing effects applied to raw inputs in order to produce stems. For the foregoing reasons, the combination of the Ramirez and the Kitov references makes obvious all limitations of the claims.
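For illustration only, the multiple-model approach discussed above, in which a user-selected strength determines which separately trained model applies the effect, can be sketched as follows. This is a minimal sketch under stated assumptions and is not Kitov’s implementation: the names are hypothetical, and each "model" is a toy blend function standing in for a separately trained network.

```python
from typing import Callable, Dict, List

Signal = List[float]
Model = Callable[[Signal], Signal]


def make_toy_model(strength: float) -> Model:
    """Return a stand-in for a model trained at one effect strength (assumption).

    The toy effect halves each sample; strength blends between the
    unprocessed input (0.0) and the fully processed output (1.0).
    """
    def model(signal: Signal) -> Signal:
        return [(1.0 - strength) * s + strength * (0.5 * s) for s in signal]
    return model


# One model per supported strength, as in a multiple-model baseline.
MODELS: Dict[float, Model] = {s: make_toy_model(s) for s in (0.0, 0.5, 1.0)}


def select_model(requested_strength: float) -> Model:
    """Pick the model whose trained strength is nearest to the user's request."""
    nearest = min(MODELS, key=lambda s: abs(s - requested_strength))
    return MODELS[nearest]


def apply_effect(signal: Signal, requested_strength: float) -> Signal:
    """Apply the effect at (approximately) the user-selected strength."""
    return select_model(requested_strength)(signal)
```

For example, under these assumptions `apply_effect([2.0, 4.0], 1.0)` uses the full-strength model, while a request of 0.1 falls back to the nearest available model, the one for strength 0.0.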
Claim 13 depends on claim 8, and further requires the following:
“further comprising adjusting a degree of the effect to be applied to the first audio signal, and the estimating of the effect information is performed by estimating the effect information that reflects the effect to be applied to the first audio signal at the degree, based on the trained model.”
Claim 14 depends on claim 13, and further requires the following:
“wherein a plurality of trained models including the trained model, which correspond to degrees of the effect to be applied to the first input audio signal, are generated.”
Claims 13 and 14 are treated together. The Ramirez reference describes providing and training an autoencoder to convert an input raw audio signal into a professionally-mixed stem. Ramirez, however, does not describe any mechanism for adjusting the degree of the effect provided, and does not describe the use of multiple trained models for achieving different degrees of effect.
The Kitov reference, however, teaches and suggests that when transferring a style to an input, particularly in real-time, it is known to separately train a series of models that reflect different strengths of an effect. Kitov at Abs., § 1, ¶¶ 2–3. A user will then simply select a desired strength through an interface, which will cause a corresponding model to apply the effect at the desired strength. See id. Kitov is drawn primarily to applying a style transfer to images, but one of ordinary skill would have reasonably expected the same technique to apply equally to any type of input, including audio, since the heart of the process remains the same whether the input is audio or images: a set of models is trained for providing a desired degree of effects. See id. Kitov further teaches techniques for avoiding the need to maintain multiple models. Id. While Kitov’s additional teachings might represent an improvement over the use of multiple trained models, Kitov’s further teachings do not indicate in any regard that the use of multiple trained models tied to different levels of effect strength would be inappropriate for providing a user-desired amount of style transfer strength. To the contrary, Kitov uses the multiple model approach as a baseline for comparison to show that similar results may be achieved with an alternative technique. Id. at §§ 4.1, 4.2. Kitov further recognizes that the baseline approach is superior in some respects. Id. at § 5. Accordingly, it would have been obvious for one of ordinary skill in the art at the time this Application was effectively filed to apply Kitov’s teachings to Ramirez’s system and method in order to provide user control over the strength of mixing effects applied to raw inputs in order to produce stems. For the foregoing reasons, the combination of the Ramirez and the Kitov references makes obvious all limitations of the claims.
Claim 15 is drawn to “a non-transitory computer-readable medium storing a sound editing program that causes a computer to execute a sound editing method.” The following table illustrates the correspondence between the claimed medium and the Ramirez reference.
Claim 15
The Ramirez Reference
“15. A non-transitory computer-readable medium storing a sound editing program that causes a computer to execute a sound editing method, the sound editing method comprising:
The Ramirez reference similarly describes a device for stem audio mixing. Ramirez at Abstract. Ramirez describes a system that receives input, trains an autoencoder and transforms a raw audio input into a mixed stem. Id. at § 4, ¶¶ 1, 4–6. These actions inherently require the use of a signal processor. However, Ramirez does not describe implementing the signal processor with a computer that executes a sound editing program stored in a non-transitory computer-readable medium.
“receiving a first audio signal representing sounds performed by a user;
“receiving a second audio signal;
Ramirez describes a device to receive raw audio signals performed by a user that are to be mixed, such as bass, guitar, vocal and keys. Id. at § 4, ¶¶ 1, 2, Table 1. Audio signals from any of those sources correspond to the first and second audio signals.
“estimating effect information that reflects an effect to be applied to the first audio signal, from the first and second audio signals, by using a trained model indicating an input-output relationship between first and second input audio signals and output effect information that reflects an effect to be applied to the first input audio signal;
Ramirez describes an autoencoder that implements a training feature (i.e., an estimation unit) to model an input-output relationship between raw inputs (i.e., first input audio signals) and mixed stems (i.e., second input audio signals). Id. at § 4, ¶¶ 1, 5–7. Given that Ramirez uses an autoencoder, subsequent raw inputs (i.e., the first audio signal) will be encoded into the autoencoder’s model space and then decoded according to estimated weights (i.e., estimated effect information) to produce a version that approximates a professionally mixed stem version of the raw input. See id.
“obtaining the first audio signal and the effect information, and generating [[an]] a fourth audio signal by applying the effect reflected by the effect information that has been obtained to the first audio signal that has been obtained.”
Ramirez’s autoencoder also implements an encoding/decoding function (i.e., an effect application unit) to receive a raw audio input (i.e., the first audio signal) and the weights produced during training (i.e., the effect information). Id. The autoencoder then applies the weights to the raw input to produce a stem (i.e., a fourth audio signal). Id.
Table 3
The table above shows that the Ramirez reference describes a sound editing method that corresponds closely to the one claimed. However, Ramirez does not describe performing that method with a computer that executes a sound editing program stored in a non-transitory computer-readable medium.
The differences between the claimed invention and the Ramirez reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art at the time this Application was effectively filed. Ramirez performs a method for editing a sound by applying an autoencoder to a set of raw audio inputs to produce audio stems. Ramirez, however, does not describe the use of a computer that executes a sound editing program stored in a non-transitory computer-readable medium. The Examiner takes Official notice of the use of a computer and a non-transitory computer-readable medium as conventional means for executing sound editing, particularly processing with trained autoencoders. It would have been a simple matter for one of ordinary skill in the art at the time this Application was effectively filed to have used such common and conventional means, already known to be used for audio editing, to implement the sound editing method described in the Ramirez reference. For the foregoing reasons, the Ramirez reference makes obvious all limitations of the claim.
Claim 16 depends on claim 15, and further requires the following:
“wherein the effect information includes parameters for generating the first audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder includes trained weights (i.e., effect information) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference makes obvious all limitations of the claim.
Claim 17 depends on claim 16, and further requires the following:
“wherein the trained model is generated by learning of output parameters as the output effect information based on the first and second input audio signals, and the output parameters are parameters for generating the first input audio signal to which the effect to be applied has been applied.”
Likewise, Ramirez’s autoencoder is trained based on raw inputs and corresponding stems to create weights (i.e., effect information representing output parameters, like frequency content of a corresponding stem) for generating a stem from a raw input. Ramirez at § 4, ¶¶ 1, 5–7. For the foregoing reasons, the Ramirez reference makes obvious all limitations of the claim.
Claim 20 depends on claim 15, and further requires the following:
“wherein the sound editing method further comprises adjusting a degree of the effect to be applied to the first audio signal, and the estimating of the effect information is performed by estimating the effect information that reflects the effect to be applied to the first audio signal at the degree, based on the trained model.”
The Ramirez reference describes providing and training an autoencoder to convert an input raw audio signal into a professionally-mixed stem. Ramirez, however, does not describe any mechanism for adjusting the degree of the effect provided.
The Kitov reference, however, teaches and suggests that when transferring a style to an input, particularly in real-time, it is known to separately train a series of models that reflect different strengths of an effect. Kitov at Abs., § 1, ¶¶ 2–3. A user will then simply select a desired strength through an interface, which will cause a corresponding model to apply the effect at the desired strength. See id. Kitov is drawn primarily to applying a style transfer to images, but one of ordinary skill would have reasonably expected the same technique to apply equally to any type of input, including audio, since the heart of the process remains the same whether the input is audio or images: a set of models is trained for providing a desired degree of effects. See id. Kitov further teaches techniques for avoiding the need to maintain multiple models. Id. While Kitov’s additional teachings might represent an improvement over the use of multiple trained models, Kitov’s further teachings do not indicate in any regard that the use of multiple trained models tied to different levels of effect strength would be inappropriate for providing a user-desired amount of style transfer strength. To the contrary, Kitov uses the multiple model approach as a baseline for comparison to show that similar results may be achieved with an alternative technique. Id. at §§ 4.1, 4.2. Kitov further recognizes that the baseline approach is superior in some respects. Id. at § 5. Accordingly, it would have been obvious for one of ordinary skill in the art at the time this Application was effectively filed to apply Kitov’s teachings to Ramirez’s system and method in order to provide user control over the strength of mixing effects applied to raw inputs in order to produce stems. For the foregoing reasons, the combination of the Ramirez and the Kitov references makes obvious all limitations of the claim.
Claim 25 depends on claim 15, and further requires the following:
“wherein the sounds performed by the user includes sounds produced by a musical instrument.”
Ramirez describes receiving sounds produced by multiple users performing with multiple different types of instruments, such as a bass, guitar and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference makes obvious all limitations of the claim.
Claim 26 depends on claim 15, and further requires the following:
“wherein the second audio signal represents sounds performed by another user, or sounds generated around the user.”
Ramirez describes receiving sounds produced by multiple users in an ensemble performing together with multiple different types of instruments, such as a bass, guitar, vocalist and keys. Ramirez at § 4, ¶ 1, Table 1. For the foregoing reasons, the Ramirez reference makes obvious all limitations of the claim.
Summary
Claims 1–3, 6–10, 13–17 and 20–26 are rejected under at least one of 35 U.S.C. §§ 102 and 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
Response to Applicant’s Argument
Applicant’s Reply (13 February 2026) has substantively amended all the claims. This Office action has been updated accordingly.
Applicant’s Reply further includes comments pertaining to the rejections presented in the previous Office action (25 November 2025). Those comments have been considered, but are moot in light of the new grounds of rejection introduced in this Office action.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALTER F BRINEY III whose telephone number is (571)272-7513. The examiner can normally be reached M-F 8 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn Edwards can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Walter F Briney III/
Walter F Briney III
Primary Examiner
Art Unit 2692
4/16/2026