Last updated: April 17, 2026

Application No. 17/253,898

VOICE TRAINING THERAPY APP SYSTEM AND METHOD

Non-Final OA §103

Filed

Dec 18, 2020

Examiner

VANDERVEEN, JEFFREY S

Art Unit

3711

Tech Center

3700 — Mechanical Engineering & Manufacturing

Assignee

unknown

OA Round

5 (Non-Final)

This examiner grants 64% of cases after interview (+17.1% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 724 resolved cases, 2023–2026

Examiner Intelligence

VANDERVEEN, JEFFREY S

Grants 64% of resolved cases

Career Allow Rate

467 granted / 724 resolved

-5.5% vs TC avg

Strong +17% interview lift

Interview Lift: +17.1% (resolved cases with interview vs. without)

Typical timeline

2y 5m

Avg Prosecution

37 currently pending

Career history

761

Total Applications

across all art units

Statute-Specific Performance

§101: 6.3% (-33.7% vs TC avg)

§103: 53.5% (+13.5% vs TC avg)

§102: 17.0% (-23.0% vs TC avg)

§112: 14.8% (-25.2% vs TC avg)

Tech Center average is an estimate. Based on career data from 724 resolved cases.

Office Action

§103

DETAILED ACTION
The present application is being examined under the pre-AIA first to invent provisions.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103(a) are summarized as follows:
1.	Determining the scope and contents of the prior art.
2.	Ascertaining the differences between the prior art and the claims at issue.
3.	Resolving the level of ordinary skill in the pertinent art.
4.	Considering objective evidence present in the application indicating obviousness or nonobviousness.

The Supreme Court in KSR International Co. v. Teleflex Inc., 550 U.S. 398, 82 USPQ2d 1385, 1395-97 (2007) identified a number of rationales to support a conclusion of obviousness which are consistent with the proper “functional approach” to the determination of obviousness as laid down in Graham. Exemplary rationales that may support a conclusion of obviousness include:
(A)    Combining prior art elements according to known methods to yield predictable results;
(B)    Simple substitution of one known element for another to obtain predictable results;
(C)    Use of known technique to improve similar devices (methods, or products) in the same way;
(D)    Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results;
(E)    “Obvious to try” – choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success;
(F)    Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art;
(G)    Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention.

The notations noted below apply to all rejections: In as much structure set forth by the applicant in the claims, the device is capable of use in the intended manner if so desired (See MPEP 2112). It should be noted that a recitation of the intended use of the claimed invention must result in a structural difference between the claimed invention and the prior art in order to patentably distinguish the claimed invention from the prior art. If the prior art structure is capable of performing the intended use, it meets the claim limitations. In a claim drawn to a process of making, the intended use must result in a manipulative difference as compared to the prior art. See In re Casey, 370 F.2d 576, 152 USPQ 235 (CCPA 1967) and In re Otto, 312 F.2d 937, 939, 136 USPQ 458, 459 (CCPA 1963). The intended use defined in the preamble and body of the claim breathes no life and meaning structurally different than that of the applied reference.
           
Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over Zhong (US 2017/0309154 A1) in view of Jones (US 20120116772 A1).   
           
Regarding claim 1, Zhong teaches:
1. A system for speech training over a computer network, comprising: ([0018+])
a server device communicatively connected to the computer network, the server device includes at least a processor and memory; ([0036, 0042, 0046 and 0050+])
a user client device communicatively connected to the computer network, the user client device includes at least a microphone and an analog-to-digital converter; (Fig. 7, [0034+][0054+], (704))
an administrator device communicatively connected to the computer network, the administrator device includes at least a digital-to-analog converter, a speaker and an input device; ([0044+][0054+])
a database communicatively connected to the server device; ([0036+])
the memory of the server device includes instructions for controlling the server device in: ([0020+])
delivering by the server device a digital exercise instruction over the computer network to the user client device, for output by the user client device of an analog audio instruction at a first time; (See Fig. 1 which shows the feedback loop of 116 to 120 through the server 106. The use of an analog or digital audio signal is considered a design choice one of ordinary skill in the art would have found obvious at the time of the invention.)
receiving by the server device over the computer network, a digital voice signal representative of an input audio signal to the user client device in response to the analog audio instruction at a second time; (See Fig. 2, wherein the stimulus 120 is the instruction and the output 116 speech gets sent to the server device 106. As stated above, the use of a digital or analog signal is a design choice which would have been obvious at the time of the invention.)
storing by the server device in the database the digital voice signal received from the user client device for asynchronous review; ([0005+][0050+], wherein [0005+] speaks of storing the record in memory. This implies that the storing aspect allows for asynchronous review.)
serving over the computer network by the server device, a website portal of the server device to the administrator device at a third time; receiving by the server device over the computer network from the website portal, a second digital exercise instruction input to the administrator device based on an asynchronous analysis of the stored digital voice signal; and repeating the delivering, receiving the digital voice signal, storing, and serving for the second digital exercise instruction. (See Fig. 1 and [0036+]; the diagram of Fig. 1 clearly shows the use of a server 106 (websites are provided on servers through a network 104) to deliver stimulus 120 and in response deliver speech data 116 back to the server. The use of a website portal with a server would be obvious; Fig. 4 already shows a computer which would access such a web portal on the server. As noted, [0005+] speaks of storing the record in memory. This implies that the storing aspect allows for asynchronous review.)
Additionally, while features of an apparatus may be recited either structurally or functionally, claims directed to an apparatus must be distinguished from the prior art in terms of structure rather than function. In re Schreiber, 128 F.3d 1473, 1477-78, 44 USPQ2d 1429, 1431-32 (Fed. Cir. 1997). The claimed limitations above are directed towards a system, not a method. The elements claimed are "for" a purpose, which is considered functional language. The prior art must be capable of achieving these functionalities to read on the prior art. A processor should be configured to perform certain functions to have the function be given patentable weight. If the processor is not configured to perform these functions, then any processor is capable of achieving such functioning.
Jones does teach what the primary reference could be considered silent on, including: receiving by the server device over the computer network from the website portal, a second digital exercise instruction input to the administrator device based on an asynchronous analysis of the stored digital voice signal at a later, non-overlapping and distinct time than the serving the website portal. See [0030+][0066+]. [0030+] speaks of processing the data on a website which can be reviewed by a speech pathologist to provide feedback. [0066+] clearly defines the non-real time aspects.
It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Jones to provide a speech language pathologist with capabilities to receive data and provide speech therapy feedback. (See [0030+]).
           
Regarding claim 2, Zhong teaches:
2. A method of providing voice training to a client over a communications network, comprising: ([0021+])
delivering by a server device communicatively connected to the communications network, a digital exercise instruction to a user client device communicatively connected to the communications network at a first time; ([0050+])
receiving over the communications network by the server device from the user client device a digital voice signal representing an analog voice signal input to the user client device by the client at a second time; ([0037+])
storing the digital voice signal in a database communicatively connected to the server device for asynchronous review; ([0005+][0050+])
delivering a website to an administrator device communicatively connected to the communications network at a third time; and ([0054+])
providing access via the communications network to the digital voice signal to the administrator device by the website for output as an audible response signal; and ([0037+])
repeating receiving, storing, delivering and providing access for a second digital exercise instruction responsive to the audible response signal (Fig. 1; [0036+]; wherein the speech data 116 and stimulus 120 are repeatedly output, sent to the server 106, and sent back to the user device through stimulus 120. As noted above for claim 1, Fig. 4 shows the use of a computer to access the web portal provided through the network on server 106. The use of a web portal is commonplace when utilizing a network and server.) based on an asynchronous analysis of the stored digital voice signal. ([0005+], wherein [0005+] speaks of storing the record in memory. This implies that the storing aspect allows for asynchronous review.)
Jones does teach what the primary reference could be considered silent on, including: asynchronous analysis of the stored digital voice signal at a later, non-overlapping and distinct time than the delivering the website. See [0030+][0066+]. [0030+] speaks of processing the data on a website which can be reviewed by a speech pathologist to provide feedback. [0066+] clearly defines the non-real time aspects.
It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Jones to provide a speech language pathologist with capabilities to receive data and provide speech therapy feedback. (See [0030+]).
           
Regarding claim 3, Zhong teaches:
3. A computer readable non-transitory medium, comprising instructions for: ([0035+])
delivering over a computer network a digital exercise instruction to a user client device of a patient communicatively connected to the computer network at a first time; ([0036+])
receiving over the computer network from the user client device a digital voice signal representing an analog voice signal input to the user client device by the patient in response to an audible analog output of the digital exercise instruction at a second time; ([0037+], Fig. 1 which shows the output analog speech data 116 being sent to the server 106. The use of analog or digital signals is considered an obvious matter of design choice which one of ordinary skill in the art would have found obvious at the time of the invention.)
storing the digital voice signal in a database ([0050+]) for asynchronous review; ([0005+])
delivering over the computer network a website to an administrator device communicatively connected to the computer network at a third time; ([0054+])
providing access over the computer network to the digital voice signal to the administrator device by the website for output as an audible response signal; ([0037+])
receiving input of a second digital exercise instruction from the website of the administrator device responsive to the audible response signal based on an asynchronous analysis of the stored digital voice signal; and repeating the delivering the digital exercise instruction, receiving, storing, and delivering the website for the second digital exercise instruction. (See Fig. 1 and [0036+]; the diagram of Fig. 1 clearly shows the use of a server 106 (websites are provided on servers through a network 104) to deliver stimulus 120 and in response deliver speech data 116 back to the server. The use of a website portal with a server would be obvious; Fig. 4 already shows a computer which would access such a web portal on the server. Also, see [0005+] which, as previously mentioned, speaks of storing the record in memory. This implies that the storing aspect allows for asynchronous review.)
Additionally, while features of an apparatus may be recited either structurally or functionally, claims directed to an apparatus must be distinguished from the prior art in terms of structure rather than function. In re Schreiber, 128 F.3d 1473, 1477-78, 44 USPQ2d 1429, 1431-32 (Fed. Cir. 1997). The claimed limitations above are directed towards a system, not a method. The elements claimed are "for" a purpose, which is considered functional language. The prior art must be capable of achieving these functionalities to read on the prior art. A processor should be configured to perform certain functions to have the function be given patentable weight. If the processor is not configured to perform these functions, then any processor is capable of achieving such functioning.
Jones does teach what the primary reference could be considered silent on, including: receiving input of a second digital exercise instruction from the website of the administrator device responsive to the audible response signal based on an asynchronous analysis of the stored digital voice signal at a later, non-overlapping and distinct time than the delivering the website. See [0030+][0066+]. [0030+] speaks of processing the data on a website which can be reviewed by a speech pathologist to provide feedback. [0066+] clearly defines the non-real time aspects.
It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Jones to provide a speech language pathologist with capabilities to receive data and provide speech therapy feedback. (See [0030+]).
           
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Zhong (US 2017/0309154 A1) in view of Jones (US 20120116772 A1) and Wolfson (US 2015/0039303 A1).
           
Regarding claim 4, Zhong teaches:
4. A system for voice training over a communications network, comprising: ([0018+][0036+])
a processor communicatively connected to the communications network; ([0038+])
memory communicatively connected to the processor; ([0042+])
an output device communicatively connected to the processor for delivering a series of voice exercise instructions over time; ([0044+])
a microphone communicatively connected to the processor for receiving analog audio voice signals responsive to the voice exercise instructions at different times; ([0054+])
the processor analyzing the stored digital voice signals to track progress of the voice training over time. ([0005+], wherein [0005+] speaks of storing the record in memory. This implies that the storing aspect allows for asynchronous review.)
Wolfson does teach what the primary reference is silent on, including:
a transducer communicatively connected to the microphone and the processor for converting the analog audio voice signals to analog electrical voice signals; ([0045+])
an analog-to-digital converter communicatively connected to the transducer and the processor for converting the analog electrical voice signals to digital voice signals; ([0037+, 0094+, 0158+, 0161+])
delivering the digital voice signals over the communications network for storage in the memory and asynchronous review; and (See Fig. 1 and note network 104 which communicates the speech data and sends back the stimulus instructions to the user utilizing the server 106. The use of analog or digital signals is considered an obvious matter of design choice which one of ordinary skill in the art would have found obvious at the time of the invention.)
Additionally, while features of an apparatus may be recited either structurally or functionally, claims directed to an apparatus must be distinguished from the prior art in terms of structure rather than function. In re Schreiber, 128 F.3d 1473, 1477-78, 44 USPQ2d 1429, 1431-32 (Fed. Cir. 1997). The claimed limitations above are directed towards a system, not a method. The elements claimed are "for" a purpose, which is considered functional language. The prior art must be capable of achieving these functionalities to read on the prior art. A processor should be configured to perform certain functions to have the function be given patentable weight. If the processor is not configured to perform these functions, then any processor is capable of achieving such functioning.
Jones does teach what the primary reference could be considered silent on, including: delivering the digital voice signals over the communications network for storage in the memory and asynchronous review at a later, non-overlapping and distinct time than the delivering the digital voice signals. See [0030+][0066+]. [0030+] speaks of processing the data on a website which can be reviewed by a speech pathologist to provide feedback. [0066+] clearly defines the non-real time aspects.
It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Wolfson to provide sufficient accuracy in the digitization of a speech signal for reliable speech recognition. ([0006+]). It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Jones to provide a speech language pathologist with capabilities to receive data and provide speech therapy feedback. (See [0030+]).
           
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zhong (US 2017/0309154 A1) in view of Jones (US 20120116772 A1) and Rot (US 2016/0189566 A1).
           
Regarding claim 5, Zhong teaches:
5. A system for voice training over a communications network, comprising: ([0018+])
a processor communicatively connected to the communications network; ([0038+])
memory communicatively connected to the processor; ([0042+])
an input device communicatively connected to the processor for providing a series of voice exercise instructions over time; ([0044+]; Fig. 1 noting the network 104 and server 106 which, when utilizing the computer of Fig. 4, may include a web server with a website to be accessed by a web browser. The use of a web server with a network and server is commonly known in the arts. The audio input circuitry 108 includes an input device to be able to decipher the speech 116 input. The uploading of data may be over time.)
a digital-to-analog converter for converting digital voice signals, responsive to the above exercise instructions, to analog voice signals; and ([0039+], as noted above, the server is implied to include a web server for the computer of Fig. 4 to access, through which the stimulus 120 would be output. The outputting may be to a website.)
a speaker communicatively connected to the processor and the digital-to-analog converter for outputting analog audio voice signals in respect of the digital voice signals; and ([0039+, 0044+]. As noted above, the server of 106 would be accessed by the computer of Fig. 4, connecting the speaker to the web browser.)
Rot does teach what the primary reference is silent on, including: the processor asynchronously accessing and playing back the stored digital voice signals from multiple training sessions and comparing the stored digital voice signals from different times to track progress of the voice training over time. See [0041+], which speaks of the progress feed for different time periods. This implies the different sessions are stored and compared to determine progress. The feeds serve to provide indication of progress to the user.
Additionally, while features of an apparatus may be recited either structurally or functionally, claims directed to an apparatus must be distinguished from the prior art in terms of structure rather than function. In re Schreiber, 128 F.3d 1473, 1477-78, 44 USPQ2d 1429, 1431-32 (Fed. Cir. 1997). The claimed limitations above are directed towards a system, not a method. The elements claimed are "for" a purpose, which is considered functional language. The prior art must be capable of achieving these functionalities to read on the prior art. A processor should be configured to perform certain functions to have the function be given patentable weight. If the processor is not configured to perform these functions, then any processor is capable of achieving such functioning.
Jones does teach what the primary reference could be considered silent on, including: the processor asynchronously accessing at a later, non-overlapping and distinct time and playing back the stored digital voice signals from multiple training sessions. See [0030+][0066+]. [0030+] speaks of processing the data on a website which can be reviewed by a speech pathologist to provide feedback. [0066+] clearly defines the non-real time aspects.
It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Rot to provide progress feedback to the user based on previously generated feedbacks. It would have been obvious to one of ordinary skill in the art, at the date of the effective filing, to modify Zhong with Jones to provide a speech language pathologist with capabilities to receive data and provide speech therapy feedback. (See [0030+]).

Response to Arguments
The applicant argues that Zhong teaches immediate stimulus to a user based on alert signals and real-time analysis of speech.  The applicant argues that this cannot read on the claimed invention that being the temporal separation among three distinct phases as outlined on page 4 of the applicant’s response.  However, the Jones reference as cited clearly shows the ability for a temporal difference between the different steps as claimed.  The Zhong reference does not have a teaching away to confirm that such a temporal change would be taught away.  As such, the examiner is not persuaded by the applicants’ arguments.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEFFREY S VANDERVEEN whose telephone number is (571)270-0503. The examiner can normally be reached Monday - Friday 11am - 7pm CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eugene L Kim can be reached at (571) 272-4463. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JEFFREY S VANDERVEEN/Examiner, Art Unit 3711


Prosecution Timeline

Dec 18, 2020

Application Filed

Dec 20, 2023

Non-Final Rejection — §103

Jun 27, 2024

Response Filed

Aug 12, 2024

Final Rejection — §103

Dec 16, 2024