DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the claims filed 7/12/2023.
Claims 1-15 are presented for examination.
Information Disclosure Statement
The information disclosure statement (IDS) submitted 7/12/2023 has been considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: “VARIATIONAL AUTOENCODER TRAINING METHOD USING BIMODAL NOISE SAMPLING”.
Reasoning: the current title is overly generic and fails to describe the invention's contribution.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Independent claim 1, in line 9 of the claim limitations, recites “calculating the latent variable”. It is unclear what the scope of “the latent variable” is, as there is a lack of antecedent basis for this limitation, and it is unclear if/how this “latent variable” is related to the calculated average using “latent variables”. For the purposes of examination, this limitation is interpreted as “calculating a latent variable”.
Dependent claims 2-5 do not cure the deficiencies of base claim 1 and thus claims 2-5 are also rejected under 35 U.S.C. 112(b) for at least being dependent on the rejected base claim 1.
Independent claims 6 and 11 recite substantially the same limitations as claim 1, including the same unclear limitation “calculating the latent variable”, and this deficiency is not cured within dependent claims 7-10 or dependent claims 12-15. Thus, claims 6-15 are also rejected under 35 U.S.C. 112(b) for the same reasons as claims 1-5, and the deficient claim limitation within independent claims 6 and 11 is interpreted for the purposes of examination in the same way as in independent claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 6-8 and 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kingma et al. (hereinafter Kingma), “Auto-Encoding Variational Bayes” (2013) pages 1-9, in view of Petersen, “Learning with Differentiable Algorithms” (September 1, 2022) pages 1-111.
Regarding independent claim 1, Kingma teaches a non-transitory computer-readable recording medium storing a machine learning program causing a computer to execute a process comprising (Kingma page 7 Figure 1 caption “…Computation took around 20 minutes per million training samples with a dated quad-core Xeon CPU” describes performing computations on a Xeon CPU to execute (causing a computer to execute a process comprising) the described training algorithm, e.g. page 4 Algorithm 1 Pseudocode, (a machine learning program) which necessarily means the instructions must be stored on a computer readable medium, e.g. RAM, hard disk (a non-transitory computer-readable recording medium storing)):
calculating an average of latent variables by inputting input data to an encoder (Kingma page 2 last paragraph describes a variational encoder or recognition model where an encoder qϕ(z∣x) maps input data x to latent variables (by inputting input data to an encoder), pages 5-6 Section 3 Example describing using a neural network encoder to calculate the parameters of the latent distribution, including the mean μ (calculating an average) and standard deviation σ from input data x in Eq. 10 and "Let the mean μ(i) and variance σ(i) ... be the following nonlinear function of x ... μ=W4h+b4" (of latent variables));
sampling a noise, based on a probability distribution of the noise (Kingma Section 2.3 Eq. 3 and "z = gϕ(ϵ,x) with ϵ∼p(ϵ)" teaching sampling a noise ϵ from a probability distribution p(ϵ) to perform the "reparameterization trick");
calculating the latent variable by adding the noise to the average (interpreted as “calculating a latent variable by adding the noise to the average” per the 35 U.S.C. 112(b) rejection set forth above; taught by Kingma pages 5-6 Section 3 Equation 9 “…our estimator of the lower bound is... z(i,l)=μ(i)+σ(i)⊙ϵ(l)” describing the reparameterization trick where the latent variable z is calculated by adding the scaled noise to the average: z=μ+σ⊙ϵ);
calculating output data by inputting the calculated latent variable to a decoder (Kingma pages 5-6 Section 3 Equation 7 “Let pθ(x∣z) (the decoder) be a multivariate Bernoulli whose probabilities are computed from z...” describing a decoder pθ(x∣z) that takes the latent variable z as input to produce output data); and
training the encoder and the decoder in accordance with a loss function (Kingma pages 3-4 suggests training a model, e.g. with the encoder qϕ(z∣x) and the decoder pθ(x∣z), by optimizing a variational lower bound (loss function)), the loss function including encoding information and an error between the input data and the output data, the encoding information being information of a probability distribution of the calculated latent variable and a prior distribution of the latent variable (Kingma pages 3-4 describes the loss including a KL divergence term "D_KL(qϕ(z∣x(i))∣∣pθ(z∣x(i)))" or regularization terms "log qϕ(z∣x)" and "log pθ(z)" (the loss function including the encoding information). This corresponds to information of the probability distribution of the calculated latent variable z~qϕ and a prior distribution pθ (the encoding information being information of a probability distribution of the calculated latent variable and a prior distribution of the latent variable), wherein the first part (log pθ(x(i)∣z(i,l))) can be interpreted as the negative reconstruction error corresponding to the error between the input and output data (and an error between the input data and the output data)).
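For illustration only, the cited reparameterization step and KL regularization term can be sketched in a few lines of numpy. This is a minimal sketch assuming a diagonal-Gaussian posterior and a standard-normal prior; the variable names and example values are not from the cited reference.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with noise
    eps sampled from a standard normal distribution p(eps)."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)  # noise ~ N(0, I)
    return mu + sigma * eps

def gaussian_kl(mu, log_var):
    """KL(q(z|x) || p(z)) for a diagonal-Gaussian posterior and a
    standard-normal prior: the 'encoding information' term of the loss."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_var = np.zeros(2)          # sigma = 1 in both dimensions
z = reparameterize(mu, log_var, rng)
loss_reg = gaussian_kl(mu, log_var)  # regularization term of the loss
```

In a full training loop, `loss_reg` would be added to a reconstruction-error term computed from the decoder output, matching the two-part loss structure discussed above.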
Kingma does not expressly teach in which a probability is decreased as the probability approaches to a center of the probability distribution from a predetermined position in the probability distribution.
However, Petersen teaches a method for modifying probability distributions used in differentiable algorithms (pages 36-37 Section 3.3.2 "Activation Replacement Trick (ART)") and transforming a standard unimodal Gaussian distribution into a bimodal distribution to address issues with values close to a zero value center (pages 36-37 Section 3.3.2 "Comparing values with large differences causes vanishing gradients, while comparing values with very small differences can modify, i.e., blur, values... Thus, it is desirable to avoid ∣aj−ai∣≈0". To solve this, Petersen proposes a transformation ϕ that converts the distribution into one where the probability density is low around 0 at the center and peaks elsewhere. Figure 3.4 illustrates a probability density function p(ϕ(x)) where the probability decreases as it approaches the center from a peak position, e.g. near -1 or 1).
Because Kingma and Petersen address the issue of operations within the domain of gradient-based optimization of probabilistic models, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of a method for modifying probability distributions used in differentiable algorithms and transforming a standard unimodal Gaussian distribution into a bimodal distribution to address issues with values close to a zero value center, as suggested by Petersen, to modify the noise distribution p(ϵ) of Kingma using the bimodal distribution teachings of Petersen to teach in which a probability is decreased as the probability approaches to a center of the probability distribution from a predetermined position in the probability distribution. This modification would have been motivated by the desire to solve the problem of vanishing gradients associated with using a standard Gaussian distribution in a differentiable framework and improve training stability by pushing sampled noise values away from the uninformative center (Petersen pages 36-37 Section 3.3.2).
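For illustration only, the proposed modification (pushing sampled noise away from the uninformative center) can be sketched as follows. The fixed-offset shift below is an illustrative stand-in for a center-avoiding transformation, not Petersen's exact ϕ.

```python
import numpy as np

def bimodal_noise(size, offset=1.0, rng=None):
    """Sample standard-normal noise, then shift each sample away from the
    origin by a fixed offset. The resulting density has a valley at 0 and
    peaks near -offset and +offset (illustrative stand-in only)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(size)
    return eps + offset * np.sign(eps)

samples = bimodal_noise(1000, offset=1.0, rng=np.random.default_rng(1))
```

Every sample's magnitude is at least `offset`, so no noise value falls in the uninformative region around zero.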
Regarding dependent claim 2, Kingma, in view of Petersen, teach the non-transitory computer-readable recording medium according to claim 1, wherein in the sampling (see Kingma Section 2.3 Eq. 3 and "z = gϕ(ϵ,x) with ϵ∼p(ϵ)" teaching sampling a noise ϵ from a probability distribution p(ϵ) to perform the "reparameterization trick"), the noise is sampled based on a bimodal distribution of an origin target (see Petersen Section 3.3.2 teaching modifying the probability distribution of Kingma’s sampling of noise to solve vanishing gradients by transforming “a unimodal Gaussian distribution into a bimodal distribution” to avoid the center (based on a bimodal distribution of an origin target)).
Regarding dependent claim 3, Kingma, in view of Petersen, teach the non-transitory computer-readable recording medium according to claim 2, wherein in the sampling, the noise is sampled (see Kingma Section 2.3 Eq. 3 and "z = gϕ(ϵ,x) with ϵ∼p(ϵ)" teaching sampling a noise ϵ from a probability distribution p(ϵ) to perform the "reparameterization trick") based on a bimodal mixed normal distribution of an origin target (see Petersen Section 3.3.2, which teaches starting with a standard normal, e.g. Gaussian, distribution and applying a transformation as shown in Figure 3.4 to split it into two peaks, creating a bimodal distribution that is functionally equivalent; it would have been obvious to one of ordinary skill in the art to approximate Petersen’s “transformed normal” bimodal shape using a “mixed normal” distribution representing bimodal data. This would have been motivated by the desire to push probability mass away from the center zero to improve gradient flow).
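For illustration only, a “mixed normal” bimodal noise distribution of the kind discussed above can be sketched as an equal-weight mixture of two Gaussians centered away from the origin. The mode locations and scale below are illustrative assumptions, not parameters from the cited references.

```python
import numpy as np

def mixed_normal_noise(size, modes=(-1.0, 1.0), scale=0.3, rng=None):
    """Equal-weight mixture of two normal distributions centered at the
    given modes, giving a bimodal density with a valley near the origin."""
    rng = np.random.default_rng() if rng is None else rng
    centers = rng.choice(modes, size=size)          # pick a mode per sample
    return centers + scale * rng.standard_normal(size)

s = mixed_normal_noise(2000, rng=np.random.default_rng(2))
```

With symmetric modes the mixture remains zero-mean, while most of the probability mass sits away from the center.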
Regarding claims 6-8, these are machine learning method claims that are substantially the same as the non-transitory computer-readable recording medium claims of claims 1-3, respectively. Thus, claims 6-8 are rejected for the same reasons as claims 1-3.
Regarding claims 11-13, these are information processing apparatus claims that are substantially the same as the non-transitory computer-readable recording medium claims of claims 1-3, respectively. Thus, claims 11-13 are rejected for the same reasons as claims 1-3. In addition, Kingma teaches an information processing apparatus comprising: a memory; and a processor coupled to the memory and configured to (Kingma page 7 Figure 1 caption “…Computation took around 20 minutes per million training samples with a dated quad-core Xeon CPU” describes performing computations on a Xeon CPU to execute the described training algorithm, which necessarily means the instructions must be stored on a computer readable medium, e.g. RAM, hard disk, and further suggests a computer system that contains the computer readable medium and the Xeon CPU, wherein the Xeon CPU is coupled to the memory to read/execute the described training algorithm).
Claims 4, 9 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Kingma in view of Petersen, as applied in the rejections of claims 1, 6, and 11 above, and further in view of Diaz.
Regarding dependent claim 4, Kingma, in view of Petersen, teach the non-transitory computer-readable recording medium according to claim 2, wherein in the sampling, the noise is sampled (see Kingma Section 2.3 Eq. 3 and "z = gϕ(ϵ,x) with ϵ∼p(ϵ)" teaching sampling a noise ϵ from a probability distribution p(ϵ) to perform the "reparameterization trick").
Kingma and Petersen do not expressly teach the noise is sampled based on a bimodal rectangular distribution of an origin target.
However, Diaz teaches sampling based on a bimodal rectangular distribution of an origin target (Section IV. Comparison of Models A. Underlying Distributions discusses underlying probability density functions, pdfs, for autoassociative mappings (sampling based on), including a bimodal uniform distribution (a bimodal rectangular distribution) with -1 <= x <= 1 that includes the origin (of an origin target)).
Because Kingma, in view of Petersen, and Diaz address sampling probability distribution functions that comprise an origin target at 0, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of mapping noise based on a bimodal uniform distribution with -1 <= x <= 1 that includes the origin, as suggested by Diaz, into Kingma and Petersen’s non-transitory computer-readable medium stored computer executable process to teach the noise is sampled based on a bimodal rectangular distribution of an origin target. This modification would have been motivated by the desire to affect the geometrical information on the support of the data (Diaz Section IV. Comparison of Models A. Underlying Distributions).
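For illustration only, a bimodal rectangular (uniform) noise density of the kind relied upon can be sketched as two uniform blocks placed symmetrically about the origin. The inner/outer support bounds below are illustrative assumptions, not Diaz's exact parameters.

```python
import numpy as np

def bimodal_rect_noise(size, inner=0.25, outer=1.0, rng=None):
    """Uniform density on [-outer, -inner] and [inner, outer]: a bimodal
    rectangular distribution with a gap (valley) around the origin."""
    rng = np.random.default_rng() if rng is None else rng
    mag = rng.uniform(inner, outer, size)            # magnitude in a block
    sign = rng.choice([-1.0, 1.0], size=size)        # symmetric about 0
    return sign * mag

s = bimodal_rect_noise(1000, rng=np.random.default_rng(3))
```

Setting `inner=0.0` recovers a single uniform block on [-outer, outer] that includes the origin.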
Regarding claim 9, this is a machine learning method claim that is substantially the same as the non-transitory computer-readable recording medium of claim 4. Thus, claim 9 is rejected for the same reason as claim 4.
Regarding claim 14, this is an information processing apparatus claim that is substantially the same as the non-transitory computer-readable recording medium of claim 4. Thus, claim 14 is rejected for the same reason as claim 4.
Claims 5, 10 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Kingma in view of Petersen, as applied in the rejections of claims 1, 6, and 11 above, and further in view of CHIEN et al. (hereinafter CHIEN), “SYNTHETIC BUSINESS MICRODATA: AN AUSTRALIAN EXAMPLE” (2020) pages 1-21.
Regarding dependent claim 5, Kingma, in view of Petersen, teach the non-transitory computer-readable recording medium according to claim 2, wherein in the sampling, the noise is sampled (see Kingma Section 2.3 Eq. 3 and "z = gϕ(ϵ,x) with ϵ∼p(ϵ)" teaching sampling a noise ϵ from a probability distribution p(ϵ) to perform the "reparameterization trick").
Kingma and Petersen do not expressly teach noise is sampled based on a bimodal triangular distribution of an origin target.
However, CHIEN teaches noise is sampled based on a bimodal triangular distribution of an origin target (page 4 “The perturbation is added using e = X⊺(y − X⊺𝛼)u, where noise u is generated independently from the symmetric bimodal triangular distribution with modal points at −1 and 1”, implying a center at 0 with the probability decreasing as it approaches the center 0 from the predetermined positions of the modes at −1 and 1, creating a valley at the origin).
Because Kingma, in view of Petersen, and CHIEN address sampling noise based on a probability distribution of the noise in which a probability is decreased as the probability approaches a center, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of generating noise from a symmetric bimodal triangular distribution, as suggested by CHIEN, into Kingma and Petersen’s non-transitory computer-readable medium stored computer executable process to teach the noise is sampled based on a bimodal triangular distribution of an origin target. This modification would have been motivated by the desire to provide the choice of distribution to minimize bias in the model estimation (CHIEN page 4).
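For illustration only, a symmetric bimodal triangular distribution with modal points at -1 and 1 and a valley at the origin can be sampled by choosing a side at random and drawing a triangular magnitude. The outer support bound of 2 below is an illustrative assumption, not a parameter quoted from CHIEN.

```python
import numpy as np

def bimodal_triangular_noise(size, rng=None):
    """Symmetric bimodal triangular noise: density is 0 at the origin,
    peaks at -1 and +1, and falls back to 0 at -2 and +2 (the outer
    bound is an illustrative assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    side = rng.choice([-1.0, 1.0], size=size)           # pick a lobe
    mag = rng.triangular(0.0, 1.0, 2.0, size=size)      # mode at 1
    return side * mag

s = bimodal_triangular_noise(2000, rng=np.random.default_rng(4))
```

Because the triangular magnitude density vanishes at 0, samples near the origin are rare, matching the valley-at-center property relied upon in the rejection.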
Regarding claim 10, this is a machine learning method claim that is substantially the same as the non-transitory computer-readable recording medium of claim 5. Thus, claim 10 is rejected for the same reason as claim 5.
Regarding claim 15, this is an information processing apparatus claim that is substantially the same as the non-transitory computer-readable recording medium of claim 5. Thus, claim 15 is rejected for the same reason as claim 5.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KUANG FU CHEN whose telephone number is (571)272-1393. The examiner can normally be reached M-F 9:00-5:30pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KC CHEN/Primary Patent Examiner, Art Unit 2143