DETAILED ACTION
Claims 1 – 20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 31 May 2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the Examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19 and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
Independent claims 1, 16 and 17 recite the limitations of randomly initialising a backbone model parameter and a classification head parameter, applying gradient descent to an unsupervised loss with respect to the initialised backbone model parameter and updating the initialised backbone model parameter, applying gradient descent to a supervised loss and updating the initialised classification head parameter, and then facilitating the deployment of the updated model and the updated classification head parameter.
Nothing in the claims precludes the claimed technique from being performed as mathematical algorithms. The entire process involves data initialisation of both the backbone model and classification head parameters, data transformation through the application of the gradient descent in both scenarios, and then the application of the obtained data from the gradient descent processes. The application of the gradient descent presents mathematical processes, and facilitating the deployment of the updated parameters is simply the use of the parameters in a particular setting, which a human may mentally perform. The claims therefore recite mathematical algorithms and a mental process.
This judicial exception is not integrated into a practical application as the claims simply teach initialising data, transforming data through calculations, and then applying the data to an intended process. While claims 16 and 17 make mention of a computer program product, a memory and a processor, these are recited in generic terms.
The invention is not tied to any particular defining structure and simply provides instructions to apply the judicial exception. The technique can be performed by a generic computer which would be presented as a tool to implement the abstract idea (classifiable as automation of the mental process steps). The Specification in [00139] provides a computer in several forms that are suitable to read upon the limitations of these claims. The computer is recited at such a high level of generality that it amounts to no more than mere instructions to apply the exception using a generic computer. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the invention is not tied to a practical application.
The claims provide techniques that amount to no more than mere instructions that apply the judicial exception which can be performed by a generic device. While the claims make mention of a memory and a processor, the claims do not recite specifics on the use of these computer parts, nor of any step that introduces significantly more context to the claims, and therefore still do not amount to significantly more than the mentioned judicial exception. Mere instructions to apply an exception using a generic device cannot provide an inventive concept. Claims 1, 16 and 17 are not eligible.
Claims 2 and 18 recite purely mathematical algorithms. These do not integrate any practical application, nor do they provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claims 3 and 19 provide performing pre-training, which also qualifies as a mathematical algorithm. These do not integrate any practical application, nor do they provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 4 provides a particular form of performing a mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claims 5 and 20 provide performing fine-tuning training, which also qualifies as a mathematical algorithm. These do not integrate any practical application, nor do they provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 6 provides a particular form of performing a fine-tuning mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 7 provides particular types of loss functions that are to be applied. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 8 provides the application of the fine-tuning to a particular task, to be applied in a particular way. This is a mental process of arrangement by providing steps to be performed. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 9 provides a mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 10 provides a mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 11 provides a mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 12 provides a mathematical algorithm. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim 13 simply provides performing inferencing using the model parameters. A human may apply the available model parameters to infer or test a random applicable model. This does not integrate any practical application nor does it provide any additional element sufficient to amount to more than the mentioned judicial exception.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Saif, A. F. M., et al. (“Joint unsupervised and supervised training for automatic speech recognition via bilevel optimization.” ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024: hereafter — Saif).
For claim 1, Saif discloses a computer-implemented method comprising:
randomly initializing a backbone model parameter and a classification head parameter (Saif: Page 2 par 2 Algorithm 1; 3.1 — randomly initialising $\theta_1$ and $\phi_1$, which are the backbone model and classification head parameters);
applying a gradient descent to a lower-level unsupervised loss with respect to the initialized backbone model parameter and updating the initialized backbone model parameter (Saif: Page 2 2.2 — ‘[f]or unsupervised training, we use the InfoNCE loss to learn a good representation of the input speech from unlabeled data’; Page 2 Col 1 — ‘lower-level unsupervised training problem serves as the constraint for the backbone model parameters $\theta$’; Page 2 Col 2 3.1 — applying gradient descent to $\theta$ (the backbone model) so as to update it and get $\theta_{k+1}$);
applying a gradient descent to a higher-level supervised loss and updating the initialized classification head parameter (Saif: Page 2 2.2 — ‘[f]or supervised training, the Connectionist Temporal Classification (CTC) loss is used’; Page 2 Col 2 Eq (7) — applying a gradient descent to update the classification head parameter and get $\phi_{k+1}$); and
facilitating deployment of the updated backbone model parameter and the updated classification head parameter (Saif: Page 3 4. — ‘we carry out experiments on ASR tasks to show the effectiveness of BL-JUST’ (a deployment of the provided process)).
For claim 2, claim 1 is incorporated and the reference of Saif discloses the computer-implemented method of claim 1, wherein the applying the gradient descent to the lower-level unsupervised loss further comprises updating the initialized backbone model parameter based on:
$$\theta_{k+1} = \theta_k - \alpha \nabla_{\theta} L_{\text{supervised}}(\phi_k, \theta_k) - \alpha \gamma \nabla_{\theta} L_{\text{unsupervised}}(\theta_k)$$
where $\alpha > 0$ is an unsupervised learning rate, $\theta$ represents the backbone model parameter, $L_{\text{supervised}}$ is a supervised loss function and $L_{\text{unsupervised}}$ is an unsupervised loss function (Saif: Page 2 Eq (3) — a Connectionist Temporal Classification Loss as the supervised training loss; Page 2 Col 2 par 1 — an unsupervised NCE loss; Page 2 Eq (6)); and
wherein the applying the gradient descent to the higher-level supervised loss further comprises updating the initialized classification head parameter based on:
$$\phi_{k+1} = \phi_k - \beta \nabla_{\phi} L_{\text{supervised}}(\phi_k, \theta_k)$$
where $\beta > 0$ is a supervised learning rate and $\phi$ represents the classification head parameter (Saif: Page 2 Eq (7)).
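The two update rules recited for claim 2 can be illustrated with a minimal sketch. It assumes scalar parameters and toy quadratic losses standing in for the CTC and InfoNCE losses of Saif; the function names, the loss definitions, and the hyperparameter values are hypothetical and are taken neither from the application nor from the reference.

```python
def grad_supervised(phi, theta):
    """Gradients of the toy supervised loss L_sup = (phi * theta - 1)^2.

    Returns (dL/dphi, dL/dtheta). The loss is an assumption for illustration.
    """
    residual = phi * theta - 1.0
    return 2.0 * residual * theta, 2.0 * residual * phi


def grad_unsupervised(theta):
    """Gradient of the toy unsupervised loss L_unsup = (theta - 2)^2 (assumed)."""
    return 2.0 * (theta - 2.0)


def joint_step(phi, theta, alpha, beta, gamma):
    """One iteration k -> k+1; both updates are evaluated at (phi_k, theta_k)."""
    d_phi, d_theta = grad_supervised(phi, theta)
    # Backbone update: supervised gradient plus gamma-weighted unsupervised gradient.
    theta_next = theta - alpha * d_theta - alpha * gamma * grad_unsupervised(theta)
    # Classification head update: supervised gradient only.
    phi_next = phi - beta * d_phi
    return phi_next, theta_next


def train(phi, theta, alpha=0.05, beta=0.05, gamma=1.0, steps=2000):
    """Repeat the joint step a fixed number of times."""
    for _ in range(steps):
        phi, theta = joint_step(phi, theta, alpha, beta, gamma)
    return phi, theta
```

For example, a single call `joint_step(0.5, 1.0, alpha=0.1, beta=0.2, gamma=2.0)` returns approximately `(0.7, 1.45)`: the head parameter moves only along the supervised gradient, while the backbone parameter moves along both gradients.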
For claim 3, claim 1 is incorporated and the reference of Saif discloses the computer-implemented method of claim 1, further comprising performing pre-training of the backbone model parameter and the classification head parameter using a second unsupervised loss function (Saif: Page 3 Col 2 Training Strategy — a PT+FT (pre-training and fine-tuning) whereby a pre-training occurs first, with unsupervised training; FIG. 1 (upper diagram) — pre-training of the backbone model and classification head parameters using unlabelled data (unsupervised training); Page 2 Eq (6), (7) — updating the backbone model and the classification head parameters for each iteration of $k$ such that each $L_{\text{NCE}}(\theta_k)$ for each $k$ qualifies as a second unsupervised loss function).
For claim 4, claim 3 is incorporated and Saif discloses the computer-implemented method, wherein the pre-training is performed using an unsupervised learning rate of the applying the gradient descent to the lower-level unsupervised loss operation (Saif: Page 3 Col 2 Training Strategy — a PT+FT (pre-training and fine-tuning) whereby a pre-training occurs first, with unsupervised training having its learning rate).
For claim 5, claim 1 is incorporated and Saif discloses the computer-implemented method, further comprising performing fine-tuning of the updated backbone model parameter and the updated classification head parameter (Saif: Page 3 Col 2 Training Strategy — a PT+FT (pre-training and fine-tuning) whereby a fine-tuning occurs after a pre-training).
For claim 6, claim 5 is incorporated and Saif discloses the computer-implemented method of claim 5, wherein the performing fine-tuning uses the supervised loss function and a smaller learning rate than a supervised learning rate of the applying the gradient descent to the higher-level supervised loss (Saif: Page 3 Col 2 — ‘[i]n the PT+FT and BL-JUST, learning rates start from $5 \times 10^{-3}$ for unsupervised training and $5 \times 10^{-4}$ for supervised training’ (the learning rate of the supervised training being smaller than the learning rate of the unsupervised training)).
For claim 7, claim 1 is incorporated and Saif discloses the computer-implemented method, wherein the lower-level unsupervised loss is a noise-contrastive estimation loss (Saif: Page 2 2.2 — ‘[f]or unsupervised training, we use the InfoNCE loss’) and the higher-level supervised loss is a connectionist temporal classification loss (Saif: Page 2 2.2 — ‘[f]or supervised training, the Connectionist Temporal Classification (CTC) loss’).
For claim 8, claim 1 is incorporated and Saif discloses the computer-implemented method, further comprising considering an unsupervised training stage that learns generic representations of speech signals that can be fine-tuned for a particular task as the lower-level problem corresponding to the lower-level unsupervised loss, wherein a result of the lower-level problem is a set of lower-level model parameters of backbone layers that promote learning in an upper-level supervised training stage that minimizes a task-specific loss given the lower-level model parameters (Saif: Page 1 Col 2 — ‘we regard the unsupervised training stage, which has the goal of learning generic representations of speech signals that can be fine-tuned for a particular task, as the lower-level problem. Ideally, the result of this lower-level problem is a set of initial model parameters or weights of backbone layers that promote successful and efficient learning in the upper-level supervised training, which minimizes a task-specific loss given the lower-level parameters’).
For claim 9, claim 1 is incorporated and Saif discloses the computer-implemented method, wherein the higher-level supervised loss maximizes a probability of predicting a future sample $x_{t+p}$ given a contextual representation $C_t(\theta)$ generated from a speech sequence $x_1, x_2, \ldots, x_t$ up to time $t$ using a neural network parameterized by the updated backbone model parameter (Saif: Page 2 2.2 — ‘maximize the probability of predicting the future sample $x_{t+p}$ given a contextual representation $C_t(\theta)$ generated from the speech sequence $x_1, x_2, \ldots, x_t$ up to time $t$ using a neural network parameterized by $\theta$’).
For claim 10, claim 1 is incorporated and Saif discloses the computer-implemented method, wherein the higher-level supervised loss minimizes a negative log-likelihood of a label sequence $y_n$, given by:
$$L_{\text{supervised}}(\phi, \theta) = \frac{1}{N} \sum_{n=1}^{N} -\log P\big(y_n \mid z(x_n; \phi, \theta)\big)$$
where $z(x_n; \phi, \theta)$ is an output of a corresponding model, $\phi$ represents parameters of a classification layer of the corresponding model, and $\theta$ includes all parameters except those from the classification layer (Saif: Page 2 Eq (3)).
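The averaged negative log-likelihood recited for claim 10 can be sketched as follows. This is a minimal illustration that assumes the per-utterance probabilities $P(y_n \mid z(x_n; \phi, \theta))$ have already been computed by some model; the function name and the input list are hypothetical, and the sketch does not implement the CTC computation itself.

```python
import math


def mean_negative_log_likelihood(label_probs):
    """Average of -log P(y_n | z(x_n; phi, theta)) over N utterances.

    `label_probs` is an assumed, precomputed list of probabilities in (0, 1].
    """
    if not label_probs:
        raise ValueError("at least one probability is required")
    for p in label_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("each probability must lie in (0, 1]")
    return sum(-math.log(p) for p in label_probs) / len(label_probs)
```

A loss of zero corresponds to every label sequence receiving probability 1; lower probabilities increase the loss.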
For claim 11, claim 1 is incorporated and Saif discloses the computer-implemented method, wherein the lower-level unsupervised loss is defined as:
$$\min_{\phi, \theta} L_{\text{supervised}}(\phi, \theta) \quad \text{s.t.} \quad \theta \in S := \arg\min_{\theta} L_{\text{unsupervised}}(\theta)$$
(Saif: Page 2 Eq (4)).
For claim 12, claim 1 is incorporated and Saif discloses the computer-implemented method, wherein the lower-level unsupervised loss of a bilevel problem corresponding to bi-level training is employed and defined by:
$$\min_{\phi, \theta} F_{\gamma}(\phi, \theta) := L_{\text{supervised}}(\phi, \theta) + \gamma L_{\text{unsupervised}}(\phi, \theta)$$
where $\gamma > 0$ is a penalty constant (Saif: Page 2 Eq (5)).
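As a reading aid, the relationship between this penalized objective and the update rules recited for claim 2 can be sketched as below. The sketch assumes, as in the formulation recited for claim 11, that the unsupervised loss depends on $\theta$ only, and that a plain gradient step is taken with step sizes $\alpha$ and $\beta$; it is not presented as the derivation given by Saif or by the application.

```latex
\begin{aligned}
\nabla_{\theta} F_{\gamma}(\phi, \theta) &= \nabla_{\theta} L_{\text{supervised}}(\phi, \theta) + \gamma \nabla_{\theta} L_{\text{unsupervised}}(\theta) \\
\nabla_{\phi} F_{\gamma}(\phi, \theta) &= \nabla_{\phi} L_{\text{supervised}}(\phi, \theta) \\
\theta_{k+1} &= \theta_k - \alpha \nabla_{\theta} F_{\gamma}(\phi_k, \theta_k)
  = \theta_k - \alpha \nabla_{\theta} L_{\text{supervised}}(\phi_k, \theta_k) - \alpha \gamma \nabla_{\theta} L_{\text{unsupervised}}(\theta_k) \\
\phi_{k+1} &= \phi_k - \beta \nabla_{\phi} F_{\gamma}(\phi_k, \theta_k)
  = \phi_k - \beta \nabla_{\phi} L_{\text{supervised}}(\phi_k, \theta_k)
\end{aligned}
```

Under that assumption the penalty term contributes nothing to the gradient with respect to $\phi$, which is why only the backbone update carries the factor $\gamma$.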
For claim 13, claim 1 is incorporated and Saif discloses the computer-implemented method of claim 1, further comprising performing inferencing using the output backbone model parameter and the output classification head parameter (Saif: Page 3 4. — carrying out experiments on ASR tasks (performing inference)).
For claim 14, claim 13 is incorporated and the reference of Saif discloses the computer-implemented method of claim 13, wherein training data for the method is speech recognition data and wherein the inferencing is performed on input speech, further comprising performing speech recognition on the input speech based on results of the inferencing (Saif: Page 3 4. — carrying out experiments on ASR tasks (performing inference); 4.1 — training data set for speech recognition).
For claim 15, claim 14 is incorporated and Saif discloses the computer-implemented method of claim 14, wherein the input speech is at least one of raw audio and log-Mel features of an audio track (Saif: Page 3 Col 2 — ‘the input is 80-dimensional logmel features’).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 16, 17, 18, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Saif (“Joint unsupervised and supervised training for automatic speech recognition via bilevel optimization.” ICASSP 2024-2024) in view of HASSAN et al. (US 2022/0398405 A1: hereafter — Hassan).
For claim 16, the subject matter of this claim is rejected by the reference of Saif just as provided in claim 1 above.
The reference of Saif however fails to teach of a computer program product comprising one or more tangible computer-readable storage media and program instructions stored on at least one or more tangible computer-readable storage medium.
The reference of Hassan is however introduced to teach this, as non-transitory computer-readable storage media for tangibly storing computer program instructions, provided in [0018].
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the technique provided by Saif, by incorporating the known teaching of Hassan which can implement the technique of Saif as computer program instructions for computer processor execution, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of an easy access to executing the technique on a computer. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
For claim 17, the subject matter of this claim is rejected by the reference of Saif just as provided in claim 1 above.
The reference of Saif however fails to teach of a memory and a processor coupled to the memory able to perform the indicated technique.
The reference of Hassan is however introduced to teach this, as non-transitory computer-readable storage media for tangibly storing computer program instructions and a computer processor, provided in [0018].
Hence, before the effective filing date of the claimed invention, one of ordinary skill in the art would have found it obvious to improve upon the technique provided by Saif, by incorporating the known teaching of Hassan which can implement the technique of Saif by storing the instructions on a memory along with a processor to implement the execution, to thereby come up with the claimed invention. The combination of both prior art elements would have provided the predictable result of an easy access to executing the technique on a computer. See KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 415-421, 82 USPQ2d 1385, 1395-97 (2007).
As for claim 18, system claim 18 and method claim 2 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 18 is similarly rejected under the same rationale as applied above with respect to method claim 2.
As for claim 19, system claim 19 and method claim 3 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 3.
As for claim 20, system claim 20 and method claim 5 are related as system and the method of using same, with each claimed element’s function corresponding to the claimed method step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to method claim 5.
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure.
ABATI et al. (US 2021/0150345 A1) provides teaching for performing gradient descent as an optimisation process [0054], a backbone network as a neural network being applied to perform a function, generating activation for a task, wherein each task can be associated with a respective classification head [0082].
See PTO-892.
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to OLUWADAMILOLA M. OGUNBIYI whose telephone number is (571)272-4708. The Examiner can normally be reached Monday – Thursday (8:00 AM – 5:30 PM Eastern Standard Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s Supervisor, PARAS D. SHAH can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLUWADAMILOLA M OGUNBIYI/Examiner, Art Unit 2653
/Paras D Shah/Supervisory Patent Examiner, Art Unit 2653
04/23/2026