Prosecution Insights
Last updated: April 19, 2026
Application No. 18/444,440

AUTOMATED CUSTOMIZATION ENGINE

Final Rejection — §103, §112
Filed
Feb 16, 2024
Examiner
WOZNIAK, JAMES S
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Just Right Reader Inc.
OA Round
2 (Final)
Grant Probability: 59% (Moderate)
OA Rounds: 3-4
To Grant: 3y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (227 granted / 385 resolved; -3.0% vs TC avg)
Interview Lift: +40.1% for resolved cases with interview
Avg Prosecution: 3y 7m (typical timeline); 42 currently pending
Total Applications: 427 across all art units

Statute-Specific Performance

§101: 18.1% (-21.9% vs TC avg)
§103: 40.1% (+0.1% vs TC avg)
§102: 18.4% (-21.6% vs TC avg)
§112: 16.1% (-23.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 385 resolved cases.

Office Action

Grounds of rejection: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-final Office Action from 9/22/2025, Applicant has filed an amendment on 12/15/2025. In this reply, Applicant has amended independent claims 1, 9, and 16 to narrow the generation of speech as involving conditioning the "text-to-speech model on parameters computed from the received audio content generated by the user, the parameters including at least one of speaking rate, intonation contour, and stress patterns, to produce speech that reflects the at least one attribute of the user." Dependent claim 3 was also separately amended to define narrow types of spectral features.

Applicant has also argued that the prior art of record fails to teach the limitations added to the independent claims in the instant amendment (Remarks, Pages 13-14). These arguments have been fully considered; however, they are moot with respect to the new grounds of rejection necessitated by the amended claims and further in view of Tischer (U.S. PG Publication: 2004/0111271 A1).

Applicant argues that the amendment to the title overcomes the objection to the specification (Remarks, Page 10). In response to the amended, more specific title of the invention, the previous objection directed towards a non-descriptive title of the invention is now moot and has been withdrawn.

Applicant argues that the amendments to claims 2, 10, and 17 overcome the indefiniteness rejection under 35 U.S.C. 112(b) (Remarks, Page 10). In response to the correction of the antecedent basis issues of these claims, the 35 U.S.C. 112(b) rejections are now moot and have been withdrawn.

Response to Arguments

In response to the patent subject matter eligibility rejection of claims 1-20 under 35 U.S.C. 
101, Applicant argues that the limitation added to independent claims 1, 9, and 16 regarding "wherein generating speech comprises conditioning a text-to-speech model on parameters computed from the received audio content generated by the user, the parameters including at least one of speaking rate, intonation contour, and stress patterns, to produce speech that reflects the at least one attribute of the user" directs the claims to focus on "particular improvements to speech synthesis technology" by conditioning a model with "signals received from the user's generated audio to shape the synthesis output" that does not preempt all forms of personalized TTS feedback, because the claims require conditioning a TTS model on particular parameters computed from the user's own audio. Applicant also argues that the claims as amended integrate a practical application by further specifying the type of data computed to condition a TTS model. Accordingly, Applicant concludes that the independent claims are directed towards patent eligible subject matter under 35 U.S.C. 101 (Remarks, Pages 10-12).

In response, taking Applicant's remarks into consideration, the improvement and/or practical application described in the specification should be analyzed to determine whether Applicant's arguments and amendments are in line with the TTS model conditioning said to be improved by Applicant's invention. It is found that Paragraph 0003 describes an improvement to a practical application field of "digital learning" by personalizing "interactive reading environments," while Paragraph 0026 explains that "personalization of the feedback" involves mimicking the user's voice with respect to "a pitch, timbre, speech rate or rhythm, or other suitable characteristics of the user's voice." Thus, the amended use of reader feedback by personalizing a TTS model in a digital learning space reflects the disclosed improvement to the practical application described in the specification. 
Accordingly, independent claims 1, 9, and 16 are directed towards patent eligible subject matter under step 2A prong 2 of the 2019 Patent Subject Matter Eligibility Guidelines. Also, as the independent claims have been identified to contain patent eligible subject matter, such subject matter is inherited by claim 3 by virtue of its dependency, thus rendering Applicant's arguments directed towards this particular claim (Remarks, Page 12) moot.

It should additionally be pointed out that while the instant amendments to the independent claims do assist in overcoming the rejection under 35 U.S.C. 101, the added subject matter is not completely set forth in the original disclosure. Specifically, claims 1, 9, and 16 recite that the parameters computed from received audio and used to condition TTS include at least one of "speaking rate, intonation contour, and stress patterns." The only mention of modifying TTS to mimic a user is found in Paragraph 0026 of the disclosure, wherein it is noted that the parameters utilized include "a pitch, timbre, speech rate or rhythm." While the specification does disclose "intonation of the speech" and "stress patterns" in Paragraph 0038, such information is used to "evaluate aspects of the speech" or the user's reading and makes no mention of the term "contour" at all. There is thus a lack of written description for particularly conditioning TTS on an intonation contour or stress patterns as set forth in the amended independent claims. Accordingly, these claims and their dependents, by virtue of their dependency, fail to comply with the written description requirement. A 35 U.S.C. 112(a) rejection necessitated by such amendment has been set forth in the following 35 U.S.C. 112 section of this action.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 
112(a): (a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112: The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claims 1, 9, and 16 recite that the parameters computed from received audio and used to condition TTS include at least one of "speaking rate, intonation contour, and stress patterns." The only mention of modifying TTS to mimic a user is found in Paragraph 0026 of the disclosure, wherein it is noted that the parameters utilized include "a pitch, timbre, speech rate or rhythm." 
While the specification does disclose "intonation of the speech" and "stress patterns" in Paragraph 0038, such information is used to "evaluate aspects of the speech" or the user's reading and makes no mention of the term "contour" at all. There is thus a lack of written description for particularly conditioning TTS on an intonation contour or stress patterns as set forth in the amended independent claims. Accordingly, these claims and their dependents, by virtue of their dependency, fail to comply with the written description requirement.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 8-11, 13, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jochim, et al. in view of Tischer (U.S. PG Publication: 2004/0111271 A1). 
With respect to Claim 1, Jochim discloses: A method for automated reader feedback, the method comprising: receiving audio content generated by a user and corresponding to textual content provided to the user, the audio content comprising read content (receiving a user utterance at a microphone wherein the utterance pertains to displayed text content, Paragraphs 0022, 0075, and 0079-0080); comparing the received audio content to expected audio content via a machine learning algorithm (comparison of the user utterance pronunciation to the "target pronunciation" utilizing "machine learning," Paragraphs 0027, 0029, 0076, 0081, and 0091); determining, based on an output of the machine learning algorithm, that a portion of the received audio content deviates from a portion of the expected audio content by greater than a threshold value (Paragraph 0077- "analyze the output 420 to determine if the mispronunciation (if any) exceeds a threshold;" see also Paragraphs 0018 and 0079 discussing "deviation" between the user and target pronunciations relative to a threshold); generating speech corresponding to the portion of the expected audio content, wherein the speech corresponding to the portion of the expected audio content is generated based at least on one attribute of the user (aural generation of recommendations including a "target pronunciation" for assisting in pronunciation that is performed "utilizing information associated with the user", Paragraphs 0018, 0077-0079, and 0092); and outputting the generated speech to the user (playing the aural "target pronunciation" utilizing "one or more speakers," Paragraphs 0018 and 0079). 
While Jochim teaches audible generation of recommendations for target pronunciations using "information associated with the user," Jochim does not teach that this information includes "at least one of speaking rate, intonation contour, and stress patterns, to produce speech that reflects the at least one attribute of the user," or that it is used to condition a "text-to-speech model on parameters computed from the received audio content generated by the user." Tischer, however, discloses that speech samples recorded from a person's "own voice file" are used for "customizing...text to speech" with parameters including "speed" (i.e., speaking rate), "intonations" in samples (i.e., intonation contour), and "emphasis" (i.e., stress patterns; see also "rhythms") (Paragraphs 0034-0035, 0041, 0053, 0055, and 0061 (discussing the use of a person's "own voice file")). Tischer also explains that such conditioned text-to-speech synthesis can be utilized in "education programs such as teaching children to read and teaching people new languages" (Paragraph 0065). Jochim and Tischer are analogous art because they are from a similar field of endeavor in pronunciation education utilizing speech synthesis. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize the customized speech synthesis taught by Tischer to generate the audible feedback of Jochim, providing a predictable result of facilitating listening to educational feedback via recognizable voices with greater clarity (Tischer, Paragraph 0023). 
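The prosodic parameters recited in the amended claims (speaking rate, intonation contour, stress patterns) are standard quantities in speech processing. As a purely illustrative sketch, not taken from the application or the cited art, such parameters might be derived from a user's audio along these lines:

```python
# Hypothetical illustration only; this code appears in no document of record.
# Computes crude stand-ins for the three claimed conditioning parameters.
import numpy as np

def conditioning_parameters(samples, sample_rate, word_count):
    """Derive simple prosodic parameters from a mono audio signal."""
    duration_s = len(samples) / sample_rate
    speaking_rate = word_count / duration_s  # words per second

    # Crude intonation contour: frame-level energy over 10 ms frames.
    # A real system would use an F0 (pitch) tracker instead.
    frame = sample_rate // 100
    n = len(samples) // frame
    energy = np.array(
        [np.sum(samples[i * frame:(i + 1) * frame] ** 2) for i in range(n)]
    )

    # Crude stress pattern: frames whose energy exceeds 1.5x the mean.
    stressed = energy > 1.5 * energy.mean()
    return speaking_rate, energy, stressed
```

Frame energy is used here only to keep the sketch self-contained; a production system would track fundamental frequency for the contour.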
With respect to Claim 2, Jochim further discloses: The method of claim 1, wherein the output of the machine learning algorithm is a deviation value indicative of an amount of deviation of the portion of the received audio content from the portion of the expected audio content, and wherein determining that the portion of the received audio content deviates from the portion of the expected audio content by greater than the threshold value comprises determining that the deviation value exceeds the threshold value (Paragraph 0018 - determination that a user pronunciation "deviates from the target pronunciation by an amount that exceeds a predetermined threshold;" see also Paragraphs 0076-0077 and 0079 for comparator processing to identify the deviation and threshold comparison).

With respect to Claim 3, Jochim further discloses: The method of claim 1, wherein comparing the received audio content to the expected audio content via the machine learning algorithm comprises: extracting a first plurality of features from the received audio content, the first plurality of features including at least one spectral feature selected from the group consisting of Mel-frequency cepstral coefficients, fundamental frequency, spectral bandwidth, formants, spectral centroid, and spectral contrast (extraction of Mel-frequency cepstral coefficients (MFCCs), Paragraphs 0027 and 0075); extracting a second plurality of features from the expected audio content (extraction of correct/target phonetic data representations, Paragraphs 0075 and 0081); and inputting the first plurality of features and the second plurality of features into the machine learning algorithm (features are provided to a comparator relying on machine learning, Paragraphs 0027, 0029, and 0076). 
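The spectral features listed in claim 3 are conventional signal-processing quantities. For illustration only (this code is hypothetical and drawn from no document of record), two members of the claimed group, spectral centroid and spectral bandwidth, can be computed directly from a magnitude spectrum:

```python
# Hypothetical sketch of two of claim 3's spectral features, using NumPy only.
import numpy as np

def spectral_centroid_bandwidth(samples, sample_rate):
    """Magnitude-weighted mean frequency and its spread for a mono signal."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    bandwidth = np.sqrt(
        np.sum(((freqs - centroid) ** 2) * spectrum) / np.sum(spectrum)
    )
    return centroid, bandwidth
```

For a pure 440 Hz tone the centroid sits at roughly 440 Hz with near-zero bandwidth; broadband speech yields a higher bandwidth, which is what makes these features discriminative.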
With respect to Claim 5, Jochim further discloses: The method of claim 1, wherein the textual content is first textual content, and wherein the method further comprises: transcribing, via a speech-to-text model, the received audio content into second textual content (speech-to-text (STT) model generates a text-based representation of the user utterance, Paragraph 0075); comparing the second textual content to the first textual content to determine a minimum number of operations to transform the second textual content into the first textual content (determination between text-based "transcriptions" to identify the required/minimum number of phonemes (e.g., as a percentage) that need to be corrected to be "acceptable," Paragraphs 0077 and 0081); and inputting the minimum number of operations into the machine learning algorithm (the output of the comparator is input into the machine learning algorithm of the system, Paragraphs 0027, 0029, and 0082).

With respect to Claim 8, Jochim further discloses: The method of claim 1, wherein the at least one attribute of the user comprises an accent (user data to customize feedback includes particular pronunciations of "native speakers of particular languages," wherein the pronunciations associated with such native speakers account for particular accents, Paragraph 0078).

Claim 9 is directed towards a system embodiment comprising a processor and memory storing processor-executable instructions for performing the process of claim 1, and thus, is rejected under similar rationale. Moreover, Jochim teaches method implementation using a computer processor and memory storing a program (Paragraph 0056). Claims 10, 11, and 13 contain subject matter respectively similar to Claims 2, 3, and 5, and thus, are rejected under similar rationale. Claim 16 is directed towards a non-transitory computer-readable medium storing processor-executable instructions for practicing the method of claim 1, and thus, is rejected under similar rationale. 
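The "minimum number of operations to transform" one transcription into the other, recited in claim 5 above, is conventionally computed as the Levenshtein edit distance. A minimal dynamic-programming sketch (illustrative only; the application does not disclose a particular algorithm):

```python
# Hypothetical illustration of claim 5's minimum-operations comparison as a
# Levenshtein edit distance; not code from the application or cited art.
def edit_distance(a, b):
    """Minimum insertions, deletions, and substitutions to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion); the same routine applies unchanged to phoneme sequences rather than characters.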
Jochim also discloses method implementation as a non-transitory computer-readable medium storing a program (Paragraph 0096). Claims 17, 18, and 20 contain subject matter respectively similar to Claims 2, 3, and 5, and thus, are rejected under similar rationale.

Claims 4, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jochim, et al. in view of Tischer and further in view of Arora, et al. (U.S. PG Publication: 2021/0134277 A1).

With respect to Claim 4, Jochim in view of Tischer teaches the pronunciation feedback method set forth utilizing feature extraction in dependent claim 3. Jochim further discloses: inputting the first plurality of features and the second plurality of features into an automatic speech recognizer (ASR) trained to identify phonemes (utterance is provided to an automatic speech recognizer to generate phonetic data utilizing a plurality of features such as the input sound wave or MFCCs, Paragraphs 0027 and 0075); outputting, by the ASR (ASR output as a phonetic representation of a user's pronunciation and a target/correction pronunciation to a comparator, Paragraphs 0075 and 0081); identifying a number of phonemes in the first set of phonemes that are excluded from the second set of phonemes (using the phoneme differences between the user and target phonetic representations to determine missing/mispronounced phonemes in the user pronunciation, Paragraphs 0077 and 0081); and inputting the number of phonemes into the first machine learning algorithm (the system machine learning algorithm receives the mispronunciation data for processing, Paragraphs 0019, 0027, 0077-0079, 0081-0082, and 0086).

While Jochim teaches an ASR-pronunciation assessment structure and the use of machine learning algorithms, Jochim in view of Tischer fails to particularly recite the claimed two-machine-learning-model structure set forth in claim 4, including phonetic decoding. 
Arora, however, teaches that a speech recognition module can be implemented as a neural network that receives a plurality of speech features and determines the probability that "various phonemes have occurred in the speech signal" (Paragraphs 0056-0057, 0061-0062, and 0088-0089), followed by receiving the speech decoding neural network output at a goodness of pronunciation scoring neural network (Paragraphs 0018-0019, 0091, and 0129-0130). Jochim, Tischer, and Arora are analogous art because they are from a similar field of endeavor in pronunciation education utilizing phonetic representations. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to utilize multiple machine learning algorithms as taught by Arora for the speech recognition to process user and target pronunciations and the pronunciation assessment tasks taught by Jochim in view of Tischer, providing a predictable result of having more specific and accurate models that are tailored/fine-tuned to the particular recognition and pronunciation scoring tasks. Claims 12 and 19 contain subject matter similar to Claim 4, and thus, are rejected under similar rationale.

Claims 6-7 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Jochim, et al. in view of Tischer and further in view of Naber, et al. (U.S. PG Publication: 2022/0020288 A1).

With respect to Claims 6-7, Jochim in view of Tischer teaches the pronunciation feedback method set forth in independent claim 1. While Jochim teaches the consideration of a number of types of data pertaining to a user, such as their native language and associated pronunciations (Paragraph 0078), Jochim in view of Tischer does not teach that such user data attributes comprise a location (in the case of claim 6) and a spoken dialect (in the case of claim 7). 
Naber, however, discloses a system that generates a coaching score for pronunciation (Paragraph 0131) that utilizes user attributes in the form of location (Paragraph 0113) and dialect (Paragraphs 0091, 0095, and 0124). Jochim, Tischer, and Naber are analogous art because they are from a similar field of endeavor in pronunciation education systems. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to expand the user data taught by Jochim in view of Tischer with the additional demographic information for a user taught by Naber to provide a predictable result of considering additional data characterizing a user to better tailor pronunciation feedback/coaching. Claims 14 and 15 contain subject matter respectively similar to Claims 6 and 7, and thus, are rejected under similar rationale.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. 
The examiner can normally be reached 7-3, off alternate Fridays. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. JAMES S. WOZNIAK Primary Examiner Art Unit 2655 /JAMES S WOZNIAK/Primary Examiner, Art Unit 2655

Prosecution Timeline

Feb 16, 2024 — Application Filed
Sep 18, 2025 — Non-Final Rejection (§103, §112)
Oct 21, 2025 — Interview Requested
Oct 28, 2025 — Applicant Interview (Telephonic)
Oct 28, 2025 — Examiner Interview Summary
Dec 15, 2025 — Response Filed
Feb 19, 2026 — Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597422 — SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12586569 — Knowledge Distillation with Domain Mismatch For Speech Recognition (granted Mar 24, 2026; 2y 5m to grant)
Patent 12511476 — CONCEPT-CONDITIONED AND PRETRAINED LANGUAGE MODELS BASED ON TIME SERIES TO FREE-FORM TEXT DESCRIPTION GENERATION (granted Dec 30, 2025; 2y 5m to grant)
Patent 12512100 — AUTOMATED SEGMENTATION AND TRANSCRIPTION OF UNLABELED AUDIO SPEECH CORPUS (granted Dec 30, 2025; 2y 5m to grant)
Patent 12475882 — METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION (ASR) USING MULTI-TASK LEARNED (MTL) EMBEDDINGS (granted Nov 18, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 59%
With Interview: 99% (+40.1%)
Median Time to Grant: 3y 7m
PTA Risk: Moderate
Based on 385 resolved cases by this examiner. Grant probability derived from career allow rate.
