DETAILED ACTION
This action is in response to the application filed on 05/25/2023. Claims 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 1 recites
Determining a stride value for a first machine learning model (this limitation is a mental process, as it encompasses a human mentally calculating a stride value for a machine learning model, given its equation).
Therefore, claim 1 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 1 further recites additional elements of
performing transfer learning from the first machine learning model to a second machine learning model … (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
… wherein the second streaming machine learning model is an online streaming machine learning model (this element does not integrate the abstract idea into a practical application because it recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h))).
inserting a spectral pooling layer into the second machine learning model using the stride value (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
training the second machine learning model with the spectral pooling layer (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 1 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 1 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
performing transfer learning from the first machine learning model to a second machine learning model … is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
… wherein the second streaming machine learning model is an online streaming machine learning model recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h)).
inserting a spectral pooling layer into the second machine learning model using the stride value is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
training the second machine learning model with the spectral pooling layer is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 1 is subject matter ineligible.
Regarding claim 2:
Subject Matter Eligibility Analysis Step 1:
Claim 2 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 2 is dependent on claim 1, the Step 2A Prong 1 analysis from claim 1 is applied here. Therefore, claim 2 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 2 recites additional elements of
the first machine learning model is a non-streaming machine learning model (this limitation does not integrate the abstract idea into a practical application because it recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h))).
Therefore, claim 2 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 2 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
the first machine learning model is a non-streaming machine learning model recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 2 is subject matter ineligible.
Regarding claim 3:
Subject Matter Eligibility Analysis Step 1:
Claim 3 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 3 is dependent on claim 1, the Step 2A Prong 1 analysis from claim 1 is applied here. Therefore, claim 3 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 3 recites additional elements of
the first machine learning model is a first online streaming machine learning model (this limitation does not integrate the abstract idea into a practical application because it recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h))).
Therefore, claim 3 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 3 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
the first machine learning model is a first online streaming machine learning model recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 3 is subject matter ineligible.
Regarding claim 4:
Subject Matter Eligibility Analysis Step 1:
Claim 4 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 4 is dependent on claim 1, the Step 2A Prong 1 analysis from claim 1 is applied here. Therefore, claim 4 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 4 recites additional elements of
the second machine learning model is an automated speech recognition (ASR) online streaming machine learning model (this element does not integrate the abstract idea into a practical application because it recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h))).
Therefore, claim 4 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 4 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
the second machine learning model is an automated speech recognition (ASR) online streaming machine learning model recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h)).
Therefore, claim 4 is subject matter ineligible.
Regarding claim 5:
Subject Matter Eligibility Analysis Step 1:
Claim 5 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 5 recites
Determining the stride value includes generating a cropping mask (this limitation is a mathematical concept because the specification provides the equation for calculating the cropping mask (page 7, paragraph 0024). It is also a mental process because the human mind can mentally calculate the cropping mask using the equation provided).
Therefore, claim 5 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 5 does not recite any additional elements. Therefore, claim 5 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
Since there are no additional elements, claim 5 does not provide significantly more than the abstract idea itself, taken alone or in combination. Therefore, claim 5 is subject matter ineligible.
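For illustration only, the cropping-mask concept discussed above can be sketched in code. This is a hedged example of spectral pooling in general, not the applicant's equation at page 7, paragraph 0024: the function names (`cropping_mask`, `spectral_pool`) and the centered-band masking scheme are assumptions introduced here.

```python
import numpy as np

def cropping_mask(n, stride):
    # Hypothetical scheme: keep the n // stride lowest frequencies,
    # taken as a centered band of the fftshift-ed spectrum.
    keep = n // stride
    mask = np.zeros(n, dtype=bool)
    start = (n - keep) // 2
    mask[start:start + keep] = True
    return mask

def spectral_pool(x, stride):
    # Spectral pooling of a 1-D signal: FFT, crop high frequencies with
    # the mask, inverse FFT, then rescale for the shorter output length.
    n = x.shape[-1]
    spectrum = np.fft.fftshift(np.fft.fft(x))
    cropped = spectrum[cropping_mask(n, stride)]
    return np.fft.ifft(np.fft.ifftshift(cropped)).real * (cropped.size / n)

# A constant signal survives pooling unchanged (only the DC bin matters),
# while the output length shrinks by the stride factor.
pooled = spectral_pool(np.ones(8), stride=2)
```

Under this sketch, a stride of 2 halves the sequence length, which is the sense in which the stride value parameterizes the inserted pooling layer.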
Regarding claim 6:
Subject Matter Eligibility Analysis Step 1:
Claim 6 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 6 is dependent on claim 1, the Step 2A Prong 1 analysis from claim 1 is applied here. Therefore, claim 6 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 6 further recites additional elements of
training the second machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 6 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 6 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the second machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 6 is subject matter ineligible.
Regarding claim 7:
Subject Matter Eligibility Analysis Step 1:
Claim 7 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 7 recites
… determining a chunk size for processing a speech signal (this limitation is a mental process, since a human can mentally choose a chunk size).
Therefore, claim 7 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 7 recites additional elements of
training the second machine learning model with the spectral pooling layer … (this element does not integrate the abstract idea into a practical application because it merely links the judicial exception to a particular technological environment (see MPEP 2106.05(h))).
Therefore, claim 7 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 7 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the second machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal specifies a particular technological environment in which to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(h)).
Therefore, claim 7 is subject matter ineligible.
Regarding claim 8:
Subject Matter Eligibility Analysis Step 1:
Claim 8 recites a method, which is directed to a process, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 8 is dependent on claim 1, the Step 2A Prong 1 analysis from claim 1 is applied here. Therefore, claim 8 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 8 further recites additional elements of
processing a speech signal using the trained second machine learning model (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 8 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional element of claim 8 does not provide significantly more than the abstract idea itself, taken alone or in combination, because
processing a speech signal using the trained second machine learning model is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 8 is subject matter ineligible.
Regarding claim 9:
Subject Matter Eligibility Analysis Step 1:
Claim 9 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 9 recites
Determine a stride value for a first online streaming machine learning model (this limitation is a mental process, as it encompasses a human mentally calculating a stride value for a machine learning model, given its equation).
Therefore, claim 9 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 9 further recites additional elements of
A memory (this element does not integrate the abstract idea into a practical application because it recites a generic computing component on which to perform the abstract idea (see MPEP 2106.05(b))).
A processor (this element does not integrate the abstract idea into a practical application because it recites a generic computing component on which to perform the abstract idea (see MPEP 2106.05(b))).
perform transfer learning from the first online streaming machine learning model to a second online streaming machine learning model (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
insert a spectral pooling layer into the second machine learning model using the stride value (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
train the second machine learning model with the spectral pooling layer (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 9 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 9 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
A memory uses a computer as a tool to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(b)).
A processor uses a computer as a tool to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(b)).
perform transfer learning from the first online streaming machine learning model to a second online streaming machine learning model is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
insert a spectral pooling layer into the second machine learning model using the stride value is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
train the second machine learning model with the spectral pooling layer is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 9 is subject matter ineligible.
Regarding claim 10:
Subject Matter Eligibility Analysis Step 1:
Claim 10 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 10 recites
Determining the stride value includes generating a cropping mask (this limitation is a mathematical concept because the specification provides the equation for calculating the cropping mask (page 7, paragraph 0024). It is also a mental process because the human mind can mentally calculate the cropping mask using the equation provided).
Therefore, claim 10 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 10 does not recite any additional elements. Therefore, claim 10 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
Since there are no additional elements, claim 10 does not provide significantly more than the abstract idea itself, taken alone or in combination. Therefore, claim 10 is subject matter ineligible.
Regarding claim 11:
Subject Matter Eligibility Analysis Step 1:
Claim 11 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 11 is dependent on claim 9, the Step 2A Prong 1 analysis from claim 9 is applied here. Therefore, claim 11 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 11 further recites additional elements of
determining the stride value includes processing a period of future context from a speech signal (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 11 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 11 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
determining the stride value includes processing a period of future context from a speech signal is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 11 is subject matter ineligible.
Regarding claim 12:
Subject Matter Eligibility Analysis Step 1:
Claim 12 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 12 is dependent on claim 9, the Step 2A Prong 1 analysis from claim 9 is applied here. Therefore, claim 12 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 12 further recites additional elements of
training the second online streaming machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 12 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 12 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the second online streaming machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 12 is subject matter ineligible.
Regarding claim 13:
Subject Matter Eligibility Analysis Step 1:
Claim 13 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 13 recites
… determining a chunk size for processing a speech signal (this limitation is a mental process, since a human can mentally choose a chunk size).
Therefore, claim 13 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 13 recites additional elements of
training the second online streaming machine learning model with the spectral pooling layer … (this element does not integrate the abstract idea into a practical application because it merely links the judicial exception to a particular technological environment (see MPEP 2106.05(h))).
Therefore, claim 13 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 13 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the second online streaming machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal specifies a particular technological environment in which to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(h)).
Therefore, claim 13 is subject matter ineligible.
Regarding claim 14:
Subject Matter Eligibility Analysis Step 1:
Claim 14 recites a system, which is directed to a machine, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 14 is dependent on claim 9, the Step 2A Prong 1 analysis from claim 9 is applied here. Therefore, claim 14 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 14 further recites additional elements of
processing a speech signal using the trained second online streaming machine learning model (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 14 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional element of claim 14 does not provide significantly more than the abstract idea itself, taken alone or in combination, because
processing a speech signal using the trained second online streaming machine learning model is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 14 is subject matter ineligible.
Regarding claim 15:
Subject Matter Eligibility Analysis Step 1:
Claim 15 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 15 recites
Determining a stride value for a non-streaming machine learning model (this limitation is a mental process, as it encompasses a human mentally calculating a stride value for a machine learning model, given its equation).
Therefore, claim 15 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 15 further recites additional elements of
A computer program product residing on a non-transitory computer readable medium (this element does not integrate the abstract idea into a practical application because it recites a generic computing component on which to perform the abstract idea (see MPEP 2106.05(b))).
performing transfer learning from the non-streaming machine learning model to an online machine learning model (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
inserting a spectral pooling layer into the online machine learning model using the stride value (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
training the online machine learning model with the spectral pooling layer (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 15 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 15 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
A computer program product residing on a non-transitory computer readable medium uses a computer as a tool to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(b)).
performing transfer learning from the non-streaming machine learning model to an online machine learning model … is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
inserting a spectral pooling layer into the online machine learning model using the stride value is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
training the online machine learning model with the spectral pooling layer is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 15 is subject matter ineligible.
Regarding claim 16:
Subject Matter Eligibility Analysis Step 1:
Claim 16 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 16 recites
Determining the stride value includes generating a cropping mask (this limitation is a mathematical concept because the specification provides the equation for calculating the cropping mask (page 7, paragraph 0024). It is also a mental process because the human mind can mentally calculate the cropping mask using the equation provided).
Therefore, claim 16 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 16 does not recite any additional elements. Therefore, claim 16 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
Since there are no additional elements, claim 16 does not provide significantly more than the abstract idea itself, taken alone or in combination. Therefore, claim 16 is subject matter ineligible.
Regarding claim 17:
Subject Matter Eligibility Analysis Step 1:
Claim 17 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 17 is dependent on claim 15, the Step 2A Prong 1 analysis from claim 15 is applied here. Therefore, claim 17 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 17 further recites additional elements of
training the online streaming machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 17 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 17 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the online streaming machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 17 is subject matter ineligible.
Regarding claim 18:
Subject Matter Eligibility Analysis Step 1:
Claim 18 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 18 recites
… determining a chunk size for processing a speech signal (this limitation is a mental process, since a human can mentally choose a chunk size).
Therefore, claim 18 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 18 recites additional elements of
training the online streaming machine learning model with the spectral pooling layer … (this element does not integrate the abstract idea into a practical application because it merely links the judicial exception to a particular technological environment (see MPEP 2106.05(h))).
Therefore, claim 18 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 18 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
training the online streaming machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal specifies a particular technological environment in which to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(h)).
Therefore, claim 18 is subject matter ineligible.
Regarding claim 19:
Subject Matter Eligibility Analysis Step 1:
Claim 19 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 19 is dependent on claim 15, the Step 2A Prong 1 analysis from claim 15 is applied here. Therefore, claim 19 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 19 recites additional elements of
the online streaming machine learning model is an automated speech recognition (ASR) online streaming machine learning model (this limitation does not integrate the abstract idea into a practical application because it recites a field of use limitation in which to apply the judicial exception (see MPEP 2106.05(h))).
Therefore, claim 19 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional elements of claim 19 do not provide significantly more than the abstract idea itself, taken alone or in combination, because
the online streaming machine learning model is an automated speech recognition (ASR) online streaming machine learning model recites a field of use limitation in which to apply the judicial exception and cannot provide significantly more (see MPEP 2106.05(h)).
Therefore, claim 19 is subject matter ineligible.
Regarding claim 20:
Subject Matter Eligibility Analysis Step 1:
Claim 20 recites a computer readable medium, which is directed to a manufacture, and thus is one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Because claim 20 is dependent on claim 15, the Step 2A Prong 1 analysis from claim 15 is applied here. Therefore, claim 20 recites an abstract idea.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 20 further recites additional elements of
processing a speech signal using the trained online streaming machine learning model (this element does not integrate the abstract idea into a practical application because it amounts to mere instructions to apply the exception (see MPEP 2106.05(f))).
Therefore, claim 20 does not integrate the abstract idea into a practical application.
Subject Matter Eligibility Analysis Step 2B:
The additional element of claim 20 does not provide significantly more than the abstract idea itself, taken alone or in combination, because
processing a speech signal using the trained online streaming machine learning model is an instruction to perform the abstract idea and cannot provide significantly more (see MPEP 2106.05(f)).
Therefore, claim 20 is subject matter ineligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4, 5, 7, 8, 15, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Doutre et al. (US 20220343894 A1) (hereafter referred to as Doutre) in view of Riad et al. (WO 2023060120 A1) (hereafter referred to as Riad).
Regarding claim 1, Doutre teaches
performing transfer learning from the first machine learning model to a second machine learning model, wherein the second streaming machine learning model is an online streaming machine learning model (Doutre, page 2, paragraph 18, “The transcripts generated by the non-streaming teacher model may then be used to distill knowledge into the streaming ASR model. In this respect, the non-streaming ASR model functions as a teacher model while the streaming ASR model that is being taught by the distillation process is a student model”)
training the second machine learning model … (Doutre, page 4, paragraph 31, “Here, the teacher model 210 distills its knowledge to the student model 152 by training the student model 152 with a plurality of student training samples 232 that include, at least in part, labels or transcriptions 212 generated by the teacher model 210.”)
Doutre does not teach, but Riad does teach
determining a stride value for a first machine learning model (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that Riad also discloses that DiffStride is the downsampling layer being described [0086]).
inserting a spectral pooling layer into the second machine learning model using the stride value (Riad, page 6, paragraph 0045, “The present disclosure includes a learnable stride downsampling layer to learn the size of a cropping mask in a Fourier domain, which may perform resizing in a differentiable way. This learnable stride may be used as a replacement for standard downsampling layers.”)
training the second machine learning model with the spectral pooling layer (Riad, page 16, paragraph 0104, “To address the difficulty of searching stride parameters, provided herein is DiffStride, a downsampling layer that may allow spectral pooling to learn its strides through backpropagation.”)
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to apply the DiffStride layer (from Riad) to the teacher model (from Doutre) and insert the DiffStride layer into the student model (from Doutre). Doing so is advantageous because “spectral pooling … alleviates the loss of information of spatial pooling, while enabling fractional downsizing factors. Spectral pooling also preserves low frequencies without aliasing, a known weakness of spatial/temporal convnets” (Riad, page 15, paragraph 0100).
Regarding claim 2, Doutre and Riad teach the method of claim 1. Doutre further teaches
the first machine learning model is a non-streaming machine learning model (Doutre, page 2, paragraph 0018, “To address the transcription performance for streaming ASR models, implementations described herein are directed toward leveraging a non-streaming ASR model as a teacher to generate transcripts for a streaming ASR student model”).
Regarding claim 4, Doutre and Riad teach the method of claim 1. Doutre further teaches
the second machine learning model is an automated speech recognition (ASR) online streaming machine learning model (Doutre, page 2, paragraph 0018, “To address the transcription performance for streaming ASR models, implementations described herein are directed toward leveraging a non-streaming ASR model as a teacher to generate transcripts for a streaming ASR student model”).
Regarding claim 5, Doutre and Riad teach the method of claim 1. Riad further teaches
determining the stride value includes generating a cropping mask (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer, where applying the downsampling layer of the machine learning model to a batch of the training data comprises: projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain”. Examiner notes that the downsampling layer that is referenced is the DiffStride layer).
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to apply the DiffStride layer (from Riad). One of ordinary skill in the art would have known to apply the known technique of using a downsampling layer (DiffStride) to create a cropping mask. Therefore, applying Riad’s technique would yield the predictable result of creating a cropping mask from determining the stride value (see MPEP 2141(III)(D), applying a known technique to a known device ready for improvement to yield predictable results).
Regarding claim 7, Doutre and Riad teach the method of claim 1. Riad further teaches
training the second machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that DiffStride is the spectral pooling layer and that the chunk size is the same as the stride).
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to insert the DiffStride layer into the student model (from Doutre). Doing so is advantageous because “spectral pooling … alleviates the loss of information of spatial pooling, while enabling fractional downsizing factors. Spectral pooling also preserves low frequencies without aliasing, a known weakness of spatial/temporal convnets” (Riad, page 15, paragraph 0100).
Regarding claim 8, Doutre and Riad teach the method of claim 1. Doutre further teaches
processing a speech signal using the trained second machine learning model (Doutre, page 5, paragraph 0032, “By using the teacher model 210, a plurality or corpus 220 of unlabeled training samples 222 are converted to a corpus 230 of student training model samples 232, 232a-n. The training process 200 then feeds the student training model samples 232 into the student model 152 to enable the student model 152 to learn to predict a transcription 154 based on the audio data of the previously unlabeled sample 222 along with its predicted transcription 212 generated by the teacher model 210”. Examiner notes that the corpus 220 generally refers to any collection of unlabeled audio data (e.g., a database or a data store for audio data samples) [0032]).
Regarding claim 15, Doutre teaches
A computer program product residing on a non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations (Doutre, page 8, paragraph 0052, “These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, non-transitory computer readable medium, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal”).
performing transfer learning from the non-streaming machine learning model to an online machine learning model (Doutre, page 2, paragraph 18, “The transcripts generated by the non-streaming teacher model may then be used to distill knowledge into the streaming ASR model. In this respect, the non-streaming ASR model functions as a teacher model while the streaming ASR model that is being taught by the distillation process is a student model”)
training the online machine learning model … (Doutre, page 4, paragraph 31, “Here, the teacher model 210 distills its knowledge to the student model 152 by training the student model 152 with a plurality of student training samples 232 that include, at least in part, labels or transcriptions 212 generated by the teacher model 210.”)
Doutre does not teach, but Riad does teach
determining a stride value for a non-streaming machine learning model (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that Riad also discloses that DiffStride is the downsampling layer being described [0086]).
inserting a spectral pooling layer into the online machine learning model using the stride value (Riad, page 6, paragraph 0045, “The present disclosure includes a learnable stride downsampling layer to learn the size of a cropping mask in a Fourier domain, which may perform resizing in a differentiable way. This learnable stride may be used as a replacement for standard downsampling layers.”)
training the online machine learning model with the spectral pooling layer (Riad, page 16, paragraph 0104, “To address the difficulty of searching stride parameters, provided herein is DiffStride, a downsampling layer that may allow spectral pooling to learn its strides through backpropagation.”)
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to apply the DiffStride layer (from Riad) to the teacher model (from Doutre) and insert the DiffStride layer into the student model (from Doutre). Doing so is advantageous because “spectral pooling … alleviates the loss of information of spatial pooling, while enabling fractional downsizing factors. Spectral pooling also preserves low frequencies without aliasing, a known weakness of spatial/temporal convnets” (Riad, page 15, paragraph 0100).
Regarding claim 16, Doutre and Riad teach the product of claim 15. Riad further teaches
determining the stride value includes generating a cropping mask (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer, where applying the downsampling layer of the machine learning model to a batch of the training data comprises: projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain”. Examiner notes that the downsampling layer that is referenced is the DiffStride layer).
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to apply the DiffStride layer (from Riad). One of ordinary skill in the art would have known to apply the known technique of using a downsampling layer (DiffStride) to create a cropping mask. Therefore, applying Riad’s technique would yield the predictable result of creating a cropping mask from determining the stride value (see MPEP 2141(III)(D), applying a known technique to a known device ready for improvement to yield predictable results).
Regarding claim 18, Doutre and Riad teach the product of claim 15. Riad further teaches
training the online streaming machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that DiffStride is the spectral pooling layer and that the chunk size is the same as the stride).
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to insert the DiffStride layer into the student model (from Doutre). Doing so is advantageous because “spectral pooling … alleviates the loss of information of spatial pooling, while enabling fractional downsizing factors. Spectral pooling also preserves low frequencies without aliasing, a known weakness of spatial/temporal convnets” (Riad, page 15, paragraph 0100).
Regarding claim 19, Doutre and Riad teach the product of claim 15. Doutre further teaches
the online streaming machine learning model is an automated speech recognition (ASR) online streaming machine learning model (Doutre, page 2, paragraph 0018, “To address the transcription performance for streaming ASR models, implementations described herein are directed toward leveraging a non-streaming ASR model as a teacher to generate transcripts for a streaming ASR student model”).
Regarding claim 20, Doutre and Riad teach the product of claim 15. Doutre further teaches
processing a speech signal using the trained online machine learning model (Doutre, page 5, paragraph 0032, “By using the teacher model 210, a plurality or corpus 220 of unlabeled training samples 222 are converted to a corpus 230 of student training model samples 232, 232a-n. The training process 200 then feeds the student training model samples 232 into the student model 152 to enable the student model 152 to learn to predict a transcription 154 based on the audio data of the previously unlabeled sample 222 along with its predicted transcription 212 generated by the teacher model 210”. Examiner notes the corpus 220 generally refers to any collection of unlabeled audio data (e.g., a database or a data store for audio data samples) [0032]).
Claims 3 and 9-14 are rejected under 35 U.S.C. 103 as being unpatentable over Doutre and Riad in view of Tripathi et al. (US 20210343273 A1) (hereafter referred to as Tripathi).
Regarding claim 3, Doutre and Riad teach the method of claim 1. Doutre and Riad do not teach, but Tripathi does teach
the first machine learning model is a first online streaming machine learning model (Tripathi, page 1, paragraph 0002, “ASR system is deployed on a mobile phone that experiences direct user interactivity, an application on the mobile phone using the ASR system may require the speech recognition to be streaming such that words appear on the screen as soon as they are spoken.” Examiner notes that FIG. 2A illustrates audio data being inputted into a machine learning model).
Doutre, Riad, and Tripathi are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre and Riad to have the teacher model (from Doutre) be an online streaming machine learning model (from Tripathi). Doing so is advantageous because “when using an ASR system today there is a demand that the ASR system decode utterances in a streaming fashion that corresponds to real-time or even faster than real-time” (Tripathi, page 1, paragraph 0002).
Regarding claim 9, Doutre teaches
a memory (Doutre, page 6, paragraph 0046, “The computing device 500 includes a processor 510 (e.g., data processing hardware 112, 144), memory 520 (e.g., memory hardware 114, 146), a storage device 530, a high-speed interface/controller 540 connecting to the memory 520 and high-speed expansion ports 550, and a low speed interface/controller 560 connecting to a low speed bus 570 and a storage device 530.”)
a processor (Doutre, page 6, paragraph 0046, “The computing device 500 includes a processor 510 (e.g., data processing hardware 112, 144), memory 520 (e.g., memory hardware 114, 146), a storage device 530, a high-speed interface/controller 540 connecting to the memory 520 and high-speed expansion ports 550, and a low speed interface/controller 560 connecting to a low speed bus 570 and a storage device 530.”)
performing transfer learning from the first online streaming machine learning model to a second online streaming machine learning model (Doutre, page 2, paragraph 0018, “The transcripts generated by the non-streaming teacher model may then be used to distill knowledge into the streaming ASR model. In this respect, the non-streaming ASR model functions as a teacher model while the streaming ASR model that is being taught by the distillation process is a student model”. Examiner notes that Doutre only teaches transfer learning from a non-streaming machine learning model to an online streaming model)
train the online streaming machine learning model … (Doutre, page 4, paragraph 31, “Here, the teacher model 210 distills its knowledge to the student model 152 by training the student model 152 with a plurality of student training samples 232 that include, at least in part, labels or transcriptions 212 generated by the teacher model 210.”)
Doutre does not teach, but Riad does teach
determining a stride value for a first online streaming machine learning model (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that Riad also discloses that DiffStride is the downsampling layer being described [0086]).
inserting a spectral pooling layer into the second online streaming machine learning model using the stride value (Riad, page 6, paragraph 0045, “The present disclosure includes a learnable stride downsampling layer to learn the size of a cropping mask in a Fourier domain, which may perform resizing in a differentiable way. This learnable stride may be used as a replacement for standard downsampling layers.”)
training the second online streaming machine learning model with the spectral pooling layer (Riad, page 16, paragraph 0104, “To address the difficulty of searching stride parameters, provided herein is DiffStride, a downsampling layer that may allow spectral pooling to learn its strides through backpropagation.”)
Doutre and Riad are considered analogous to the claimed invention because they both deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre to apply the DiffStride layer (from Riad) to the teacher model (from Doutre) and insert the DiffStride layer into the student model (from Doutre). Doing so is advantageous because “spectral pooling … alleviates the loss of information of spatial pooling, while enabling fractional downsizing factors. Spectral pooling also preserves low frequencies without aliasing, a known weakness of spatial/temporal convnets” (Riad, page 15, paragraph 0100).
Doutre and Riad do not teach, but Tripathi does teach
performing transfer learning from the first online streaming machine learning model to a second online streaming machine learning model (Tripathi, page 1, paragraph 0002, “ASR system is deployed on a mobile phone that experiences direct user interactivity, an application on the mobile phone using the ASR system may require the speech recognition to be streaming such that words appear on the screen as soon as they are spoken.” Examiner notes that FIG. 2A illustrates audio data being inputted into a machine learning model.)
Doutre, Riad, and Tripathi are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre and Riad to have the teacher model (from Doutre) be an online streaming machine learning model (from Tripathi). Doing so is advantageous because “when using an ASR system today there is a demand that the ASR system decode utterances in a streaming fashion that corresponds to real-time or even faster than real-time” (Tripathi, page 1, paragraph 0002).
Regarding claim 10, Doutre, Riad, and Tripathi teach the system of claim 9. Riad further teaches
determining the stride value includes generating a cropping mask (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer, where applying the downsampling layer of the machine learning model to a batch of the training data comprises: projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain”. Examiner notes that the downsampling layer that is referenced is the DiffStride layer).
For the reasons set forth with respect to claim 9, Doutre, Riad, and Tripathi are analogous art, and it would have been obvious to one having ordinary skill in the art to combine Doutre, Riad, and Tripathi.
Regarding claim 13, Doutre, Riad, and Tripathi teach the system of claim 9. Riad further teaches
training the second online streaming machine learning model with the spectral pooling layer includes determining a chunk size for processing a speech signal (Riad, page 9, paragraph 0057, “method 200 may include applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer”. Examiner notes that DiffStride is the spectral pooling layer and that the chunk size is the same as the stride).
For the reasons set forth with respect to claim 9, Doutre, Riad, and Tripathi are analogous art, and it would have been obvious to one having ordinary skill in the art to combine Doutre, Riad, and Tripathi.
Regarding claim 14, Doutre, Riad, and Tripathi teach the system of claim 9. Doutre further teaches
processing a speech signal using the trained second machine learning model (Doutre, page 5, paragraph 0032, “By using the teacher model 210, a plurality or corpus 220 of unlabeled training samples 222 are converted to a corpus 230 of student training model samples 232, 232a-n. The training process 200 then feeds the student training model samples 232 into the student model 152 to enable the student model 152 to learn to predict a transcription 154 based on the audio data of the previously unlabeled sample 222 along with its predicted transcription 212 generated by the teacher model 210”. Examiner notes that the corpus 220 generally refers to any collection of unlabeled audio data (e.g., a database or a data store for audio data samples) [0032]).
For the reasons set forth with respect to claim 9, Doutre, Riad, and Tripathi are analogous art, and it would have been obvious to one having ordinary skill in the art to combine Doutre, Riad, and Tripathi.
Claims 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Doutre and Riad in view of Chen et al. (US 20190318757 A1) (hereafter referred to as Chen).
Regarding claim 6, Doutre and Riad teach the method of claim 1. Doutre and Riad do not teach, but Chen does teach
training the second machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (Chen, page 4, paragraph 0045, “BLSTM layers of the speech separation model can accommodate variable-length inputs. The output of one BLSTM layer may be fed back into the same layer, thus allowing the BLSTM layers to “remember” the past and future context when processing a given stream of audio segments”. Examiner notes BLSTM means Bidirectional Long Short Term Memory).
Doutre, Riad, and Chen are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre and Riad to add a BLSTM layer (from Chen) into the student model (from Doutre). Doing so is advantageous because it “allows the network to use the surrounding context of a given segment, e.g., segments before and after the current input segment, to contribute to the determination of the mask for the current input segment” (Chen, page 4, paragraph 0045).
Regarding claim 17, Doutre and Riad teach the product of claim 15. Doutre and Riad do not teach, but Chen does teach
training the second machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (Chen, page 4, paragraph 0045, “BLSTM layers of the speech separation model can accommodate variable-length inputs. The output of one BLSTM layer may be fed back into the same layer, thus allowing the BLSTM layers to “remember” the past and future context when processing a given stream of audio segments”. Examiner notes BLSTM means Bidirectional Long Short Term Memory).
Doutre, Riad, and Chen are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre and Riad to add a BLSTM layer (from Chen) into the student model (from Doutre). Doing so is advantageous because it “allows the network to use the surrounding context of a given segment, e.g., segments before and after the current input segment, to contribute to the determination of the mask for the current input segment” (Chen, page 4, paragraph 0045).
Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Doutre, Riad, and Tripathi in view of Chen et al. (US 20190318757 A1) (hereafter referred to as Chen).
Regarding claim 11, Doutre, Riad, and Tripathi teach the system of claim 9. Doutre, Riad, and Tripathi do not teach, but Chen does teach
determining the stride value includes processing a period of future context from a speech signal (Chen, page 4, paragraph 0045, “BLSTM layers of the speech separation model can accommodate variable-length inputs. The output of one BLSTM layer may be fed back into the same layer, thus allowing the BLSTM layers to “remember” the past and future context when processing a given stream of audio segments”. Examiner notes BLSTM means Bidirectional Long Short Term Memory).
Doutre, Riad, Tripathi, and Chen are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre, Riad, and Tripathi to add a BLSTM layer (from Chen) into the teacher model (from Doutre). Doing so is advantageous because it “allows the network to use the surrounding context of a given segment, e.g., segments before and after the current input segment, to contribute to the determination of the mask for the current input segment” (Chen, page 4, paragraph 0045).
Regarding claim 12, Doutre, Riad, and Tripathi teach the system of claim 9. Doutre, Riad, and Tripathi do not teach, but Chen does teach
training the second online streaming machine learning model with the spectral pooling layer includes processing a period of past context for a speech signal (Chen, page 4, paragraph 0045, “BLSTM layers of the speech separation model can accommodate variable-length inputs. The output of one BLSTM layer may be fed back into the same layer, thus allowing the BLSTM layers to “remember” the past and future context when processing a given stream of audio segments”. Examiner notes BLSTM means Bidirectional Long Short Term Memory).
Doutre, Riad, Tripathi, and Chen are considered analogous to the claimed invention because they all deal with speech recognition. It would have been obvious to one having ordinary skill in the art prior to the effective filing date to have modified Doutre, Riad, and Tripathi to add a BLSTM layer (from Chen) into the student model (from Doutre). Doing so is advantageous because it “allows the network to use the surrounding context of a given segment, e.g., segments before and after the current input segment, to contribute to the determination of the mask for the current input segment” (Chen, page 4, paragraph 0045).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Sukyas et al. (System and Method for Training Domain-Specific Speech Recognition Language Models) discloses a starting ASR (Automated Speech Recognition) neural network model configured to receive audio in the language and generate a starting transcript of the audio. Sypniewski et al. (End-to-End Neural Networks for Speech Recognition and Classification) discloses end-to-end neural networks for speech recognition and classification and additional machine learning techniques that may be used in conjunction or separately. Penn et al. (System and Method for Applying a Convolutional Neural Network to Speech Recognition) discloses applying a convolutional neural network (CNN) to speech recognition.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN VO whose telephone number is (571) 272-9622. The examiner can normally be reached Monday through Friday from 7:00 am to 3:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.V./Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148