DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 01/06/2023 and 10/24/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Objections
Claim 6 is objected to because of the following informality: In line 3, "performed" should read, "performs." Appropriate correction is required.
Claim 7 is objected to because of the following informality: In line 1, "according to claim 1" should read, "according to claim 6." Appropriate correction is required.
Claim 7 is objected to because of the following informality: In lines 1-2, "wherein further comprising" should read, "further comprising." Appropriate correction is required.
Claim 8 is objected to because of the following informality: In line 3, "performed" should read, "performs." Appropriate correction is required.
Claim 8 is objected to because of the following informality: In line 5, "melodies of and e" should read, "melodies and one channel or more." Appropriate correction is required.
Claim 9 is objected to because of the following informality: In line 4, "performed" should read, "performs." Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 6 and 8-9 are rejected under 35 U.S.C. 103 as unpatentable over Akama (US 20210358461 A1, filed 10/10/2019), hereinafter Akama, in view of Brunner et al. ("MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer," published 09/20/2018, retrieved from the instant application file wrapper), hereinafter Brunner.
Regarding claim 6, Akama teaches a music processing system (Akama ¶0220: "Information devices such as the information processing apparatus 100 according to each embodiment described above are implemented by, for example, a computer 1000"), comprising: musical-piece generating means which generates a musical piece (Akama ¶0042: "The learned model according to the embodiment has an encoder that extracts a feature quantity from data constituting content, and a decoder that reconstitutes the content from the extracted feature quantity.") by using a learning model which performs machine learning (Akama ¶0042: "For example, the information processing apparatus 100 learns an encoder by unsupervised learning such as a variational auto encoder (VAE) and generative adversarial networks (GANs).") on the basis of input data including musical piece data in which a musical score of a musical piece (Akama ¶0049: "First, the information processing apparatus 100 acquires a song 30 as an example of the learning data (step S1).") constituted by one channel or more of melodies (Akama ¶0049: "The song 30 is constituted by, for example, a symbol string (digital data) indicating a pitch, a sound length, and a rest.") and one channel or more of chords is described (Akama ¶0049: "Further, the data indicating the song 30 may include information such as… a chord at certain timing.") and configuration information indicating attributes of elements constituting the musical piece of the musical piece data (Akama ¶0092: "The 'pitch information' indicates information on a pitch (scale) of a sound included in the partial data. The 'sound length rest information' indicates a length of sound (reproduction time or reproduced beat) included in the partial data or a length or timing of the rest. The 'chord information' indicates a type of chords included in the partial data, the constituent sound of the chord, the switching of the chords in the bar, and the like. 
The 'rhythm information' indicates a beat or a tempo of a bar, a position of a strong beat, a position of a weak beat, and the like."); and the musical-piece generating means has a decoder (Akama ¶0067: "The decoder 60 is a decoder that is learned to reconstitute the content based on the feature quantity extracted by the encoder. In the example of FIG. 1, the decoder 60 outputs the data x2.") which outputs output data in the same format as that of input data (Akama ¶0068: "The data x2 has the same format as the data x1 that is the data of the first content. That is, the data x2 may mean data (symbol string) for reproducing the song 35 having the same format as the song 30."); the musical-piece generating means accepts an input of an operation parameter for operating a nature of a musical piece to be generated together with the input data (Akama ¶0155: "For example, the information processing apparatus 100 can change the song 65 to images of the entire song illustrated in the graph 64 according to the user's request. As described above, the information processing apparatus 100 can generate new content so as to adjust a blend ratio of the feature quantity."); and the latent-variable (Akama ¶0045: "a feature quantity vector (in other words, a latent space indicating the feature quantity)") processing means causes a noise according to the operation parameter to be mixed in the latent variable (Akama ¶0156: "That is, the information processing apparatus 100 can generate new content or variations not only by fixing a rhythm or a degree of modulation of a song, a scale, and the like, but also by controlling the degree of change. As a specific method, the variation of the feature quantity can be generated by obtaining two noises and adding each noise to the two feature quantities z1 and z2. At this time, when the noise is scaled, the degree of variation can be controlled for each of the two feature quantities z1 and z2. 
For example, when there are two methods for obtaining noise, there is a method for (1) obtaining noise from a certain fixed distribution such as a normal distribution, and a method for (2) learning an encoder using VAE and using noise output from the encoder. In addition, the information processing apparatus 100 can perform a flexible generation process, such as generating new content by exchanging features of certain two songs.").
Akama does not explicitly disclose that the musical-piece generating means has: an encoder which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data; latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector; and a decoder which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model; and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable.
However, Brunner suggests that the musical-piece generating means has: an encoder (Brunner § 3.2: "A VAE consists of an encoder qθ(z|x), a decoder pφ(x|z) and a latent variable z, where q and p are usually implemented as neural networks parameterized by θ and φ.") which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data (Brunner § 3.2: "The output of the three encoders is concatenated and passed through several fully connected layers, which then predict σz and µz, the parameters of the approximate posterior qθ(z|x) = N (µz,σz)," where µz and σz correspond to average and distribution vectors, respectively.); latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and a decoder (Brunner § 3.2: "A VAE consists of… a decoder pφ(x|z)") which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model (Brunner § 3.2: "This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls.").
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the music processing system of Akama by adding the average vector and distribution vector processing of Brunner to more effectively perform a neural style transfer to complete musical compositions (Brunner abstract).
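For context, the noise-mixing mechanism on which the combination relies can be sketched as follows. This is an illustrative reparameterization in the style of Brunner § 3.2, not code from either reference; the names mu, sigma, alpha, and sigma_eps are hypothetical stand-ins for the average vector, distribution vector, operation parameter, and noise hyperparameter, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, sigma, alpha, sigma_eps=1.0):
    """Illustrative reparameterization-style sampling.

    mu:        average vector predicted by the encoder
    sigma:     distribution (scale) vector predicted by the encoder
    alpha:     operation parameter scaling how much noise is mixed in
    sigma_eps: standard deviation of the injected isotropic noise
    """
    # Noise drawn from an isotropic Gaussian N(0, sigma_eps * I).
    eps = rng.normal(0.0, sigma_eps, size=np.shape(mu))
    # Latent variable: mean plus noise scaled elementwise by both the
    # distribution vector and the operation parameter.
    return mu + sigma * (alpha * eps)

mu = np.zeros(4)
sigma = np.ones(4)
z_deterministic = sample_latent(mu, sigma, alpha=0.0)  # no noise mixed in
z_varied = sample_latent(mu, sigma, alpha=1.0)         # full noise mixed in
```

Under this sketch, setting the operation parameter to zero reproduces the encoder mean exactly, while larger values increase the variation of the generated output, consistent with the degree-of-change control described in Akama ¶0156.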
Regarding claim 8, Akama teaches a music processing program (Akama ¶0221: "The CPU 1100 is operated based on a program stored in the ROM 1300 or the HDD 1400, and controls each unit.") characterized by causing a computer to function (Akama ¶0220: "Information devices such as the information processing apparatus 100 according to each embodiment described above are implemented by, for example, a computer 1000") as: musical-piece generating means which generates a musical piece (Akama ¶0042: "The learned model according to the embodiment has an encoder that extracts a feature quantity from data constituting content, and a decoder that reconstitutes the content from the extracted feature quantity.") by using a learning model which performs machine learning (Akama ¶0042: "For example, the information processing apparatus 100 learns an encoder by unsupervised learning such as a variational auto encoder (VAE) and generative adversarial networks (GANs).") on the basis of input data including musical piece data in which a musical score of a musical piece (Akama ¶0049: "First, the information processing apparatus 100 acquires a song 30 as an example of the learning data (step S1).") constituted by one channel or more of melodies (Akama ¶0049: "The song 30 is constituted by, for example, a symbol string (digital data) indicating a pitch, a sound length, and a rest.") and one channel or more of chords is described (Akama ¶0049: "Further, the data indicating the song 30 may include information such as… a chord at certain timing."); and the musical-piece generating means has a decoder (Akama ¶0067: "The decoder 60 is a decoder that is learned to reconstitute the content based on the feature quantity extracted by the encoder. In the example of FIG. 1, the decoder 60 outputs the data x2.") which outputs output data in the same format as that of input data (Akama ¶0068: "The data x2 has the same format as the data x1 that is the data of the first content. 
That is, the data x2 may mean data (symbol string) for reproducing the song 35 having the same format as the song 30."); the musical-piece generating means accepts an input of an operation parameter for operating a nature of a musical piece to be generated together with the input data (Akama ¶0155: "For example, the information processing apparatus 100 can change the song 65 to images of the entire song illustrated in the graph 64 according to the user's request. As described above, the information processing apparatus 100 can generate new content so as to adjust a blend ratio of the feature quantity."); and the latent-variable (Akama ¶0045: "a feature quantity vector (in other words, a latent space indicating the feature quantity)") processing means causes a noise according to the operation parameter to be mixed in the latent variable (Akama ¶0156: "That is, the information processing apparatus 100 can generate new content or variations not only by fixing a rhythm or a degree of modulation of a song, a scale, and the like, but also by controlling the degree of change. As a specific method, the variation of the feature quantity can be generated by obtaining two noises and adding each noise to the two feature quantities z1 and z2. At this time, when the noise is scaled, the degree of variation can be controlled for each of the two feature quantities z1 and z2. For example, when there are two methods for obtaining noise, there is a method for (1) obtaining noise from a certain fixed distribution such as a normal distribution, and a method for (2) learning an encoder using VAE and using noise output from the encoder. In addition, the information processing apparatus 100 can perform a flexible generation process, such as generating new content by exchanging features of certain two songs.").
Akama does not explicitly disclose that the musical-piece generating means has: an encoder which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data; latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector; and a decoder which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model; and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable.
However, Brunner suggests that the musical-piece generating means has: an encoder (Brunner § 3.2: "A VAE consists of an encoder qθ(z|x), a decoder pφ(x|z) and a latent variable z, where q and p are usually implemented as neural networks parameterized by θ and φ.") which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data (Brunner § 3.2: "The output of the three encoders is concatenated and passed through several fully connected layers, which then predict σz and µz, the parameters of the approximate posterior qθ(z|x) = N (µz,σz)," where µz and σz correspond to average and distribution vectors, respectively.); latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and a decoder (Brunner § 3.2: "A VAE consists of… a decoder pφ(x|z)") which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model (Brunner § 3.2: "This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls.").
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the music processing program of Akama by adding the average vector and distribution vector processing of Brunner to more effectively perform a neural style transfer to complete musical compositions (Brunner abstract).
Regarding claim 9, Akama teaches a music processing method performed by a music processing system (Akama ¶0220: "Information devices such as the information processing apparatus 100 according to each embodiment described above are implemented by, for example, a computer 1000"), characterized in that the music processing system includes musical-piece generating means (Akama ¶0042: "The learned model according to the embodiment has an encoder that extracts a feature quantity from data constituting content, and a decoder that reconstitutes the content from the extracted feature quantity."); the musical-piece generating means generates a musical piece (Akama ¶0042: "The learned model according to the embodiment has an encoder that extracts a feature quantity from data constituting content, and a decoder that reconstitutes the content from the extracted feature quantity.") by using a learning model which performs machine learning (Akama ¶0042: "For example, the information processing apparatus 100 learns an encoder by unsupervised learning such as a variational auto encoder (VAE) and generative adversarial networks (GANs).") on the basis of learning data having musical piece data for learning in which a musical score of a musical piece (Akama ¶0049: "First, the information processing apparatus 100 acquires a song 30 as an example of the learning data (step S1).") constituted by one channel or more of melodies (Akama ¶0049: "The song 30 is constituted by, for example, a symbol string (digital data) indicating a pitch, a sound length, and a rest.") and one channel or more of chords is described (Akama ¶0049: "Further, the data indicating the song 30 may include information such as… a chord at certain timing."); and the musical-piece generating means has a decoder (Akama ¶0067: "The decoder 60 is a decoder that is learned to reconstitute the content based on the feature quantity extracted by the encoder. In the example of FIG. 
1, the decoder 60 outputs the data x2.") which outputs output data in the same format as that of input data (Akama ¶0068: "The data x2 has the same format as the data x1 that is the data of the first content. That is, the data x2 may mean data (symbol string) for reproducing the song 35 having the same format as the song 30."); the musical-piece generating means accepts an input of an operation parameter for operating a nature of a musical piece to be generated together with the input data (Akama ¶0155: "For example, the information processing apparatus 100 can change the song 65 to images of the entire song illustrated in the graph 64 according to the user's request. As described above, the information processing apparatus 100 can generate new content so as to adjust a blend ratio of the feature quantity."); and the latent-variable (Akama ¶0045: "a feature quantity vector (in other words, a latent space indicating the feature quantity)") processing means causes a noise according to the operation parameter to be mixed in the latent variable (Akama ¶0156: "That is, the information processing apparatus 100 can generate new content or variations not only by fixing a rhythm or a degree of modulation of a song, a scale, and the like, but also by controlling the degree of change. As a specific method, the variation of the feature quantity can be generated by obtaining two noises and adding each noise to the two feature quantities z1 and z2. At this time, when the noise is scaled, the degree of variation can be controlled for each of the two feature quantities z1 and z2. For example, when there are two methods for obtaining noise, there is a method for (1) obtaining noise from a certain fixed distribution such as a normal distribution, and a method for (2) learning an encoder using VAE and using noise output from the encoder. 
In addition, the information processing apparatus 100 can perform a flexible generation process, such as generating new content by exchanging features of certain two songs.").
Akama does not explicitly disclose that the musical-piece generating means has: an encoder which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data; latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector; and a decoder which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model; and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable.
However, Brunner suggests that the musical-piece generating means has: an encoder (Brunner § 3.2: "A VAE consists of an encoder qθ(z|x), a decoder pφ(x|z) and a latent variable z, where q and p are usually implemented as neural networks parameterized by θ and φ.") which outputs an average vector and a distribution vector of a latent variable corresponding to input data by using the learning model on the basis of input data (Brunner § 3.2: "The output of the three encoders is concatenated and passed through several fully connected layers, which then predict σz and µz, the parameters of the approximate posterior qθ(z|x) = N (µz,σz)," where µz and σz correspond to average and distribution vectors, respectively.); latent-variable processing means which generates a latent variable by processing the average vector and the distribution vector (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and a decoder (Brunner § 3.2: "A VAE consists of… a decoder pφ(x|z)") which outputs output data according to the latent variable generated by the latent-variable processing means by using the learning model (Brunner § 3.2: "This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls."); and the latent-variable processing means causes a noise according to a combination of the distribution vector and the operation parameter to be mixed in the latent variable (Brunner § 3.2 describes obtaining a latent variable (z) by reparameterization of µz and σz with injected noise ε and hyperparameter σε: "Using the reparameterization trick [20], a latent vector z is sampled from this distribution as z ∼ N (µz,σz ∗ ε) where ∗ stands for element-wise multiplication… ε is sampled from an isotropic Gaussian distribution N (0, σε ∗ I), where we treat σε as a hyperparameter (see Section 4.2 for more details). This shared latent vector is then fed into three parallel fully connected layers, from which the three decoders try to reconstruct the pitch, velocity and instrument rolls.").
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the music processing method of Akama by adding the average vector and distribution vector processing of Brunner to more effectively perform a neural style transfer to complete musical compositions (Brunner abstract).
Claim 7 is rejected under 35 U.S.C. 103 as unpatentable over Akama in view of Brunner, and further in view of Aoki et al. (JP 2002202779 A, 07/19/2002), hereinafter Aoki.
Regarding claim 7, Akama (in view of Brunner) teaches a music processing system comprising the features of claim 6.
Akama (in view of Brunner) does not explicitly disclose shaping means which shapes the generated musical piece generated by the musical-piece generating means to a musically harmonized content.
However, Aoki suggests shaping means which shapes the generated musical piece generated by the musical-piece generating means (Aoki ¶0009: "the generated melody is evaluated, and the melody of the melody is evaluated based on the evaluation result. It is characterized by comprising melody correcting means for appropriately correcting the rhythm or the pitch.") to a musically harmonized content (Aoki ¶0008: "the melody generating means corrects the pitch obtained by the pitch calculating means to a note on a scale. It is characterized by comprising high-correction means.").
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the music processing system of Akama (as modified by Brunner) by adding the shaping of Aoki to automatically correct a generated melody (Aoki ¶0004).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PHILIP SCOLES whose telephone number is (703)756-1831. The examiner can normally be reached Monday-Friday 8:30-4:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dedei Hammond can be reached on 571-270-7938. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHILIP G SCOLES/
Examiner, Art Unit 2837
/JEFFREY DONELS/Primary Examiner, Art Unit 2837