Prosecution Insights
Last updated: April 17, 2026
Application No. 18/397,908

ARTIFICIAL SPEECH PERCEPTION PROCESSING SYSTEM

Status: Non-Final OA (§103)
Filed: Dec 27, 2023
Examiner: RILEY, MARCUS T
Art Unit: 2654
Tech Center: 2600 — Communications
Assignee: unknown
OA Round: 1 (Non-Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 76% (514 granted / 675 resolved; +14.1% vs TC avg; above average)
Interview Lift: +15.7% on resolved cases with an interview (strong)
Typical Timeline: 2y 10m avg prosecution; 14 applications currently pending
Career History: 689 total applications across all art units

Statute-Specific Performance

§101: 14.7% (-25.3% vs TC avg)
§103: 60.2% (+20.2% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 6.6% (-33.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 675 resolved cases.
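The per-statute deltas above are all measured against the same Tech Center baseline, which can be recovered by subtracting each delta from the examiner's figure. A minimal sketch; the roughly 40% baseline is implied by the table's arithmetic, not stated by the tool:

```python
# Recover the Tech Center average baseline from the figures above.
# Each entry is (examiner %, delta vs TC avg); the baseline is the
# examiner figure minus the delta. Values are copied from the table.
rates = {
    "§101": (14.7, -25.3),
    "§103": (60.2, +20.2),
    "§102": (17.1, -22.9),
    "§112": (6.6, -33.4),
}
tc_avg = {s: round(rate - delta, 1) for s, (rate, delta) in rates.items()}
print(tc_avg)  # every statute lines up against the same 40.0% baseline
```

That the four deltas all point back to a single 40.0% figure suggests the dashboard compares each statute against one Tech Center average rather than per-statute averages.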

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This office action is responsive to applicant's remarks received on December 23, 2025. Claims 1-34 remain pending.

Response to Arguments

Applicant's arguments with respect to "Requirement for Restriction/Election", filed on December 23, 2025, have been fully considered but they are not persuasive.

Applicant's Remarks: See Applicant Arguments/Remarks Made in an Amendment, filed on December 23, 2025.

Examiner's Response: Applicant argues that the restriction has not been properly established and that the restriction fails to disclose distinct species or embodiments. Examiner understands Applicant's arguments but respectfully disagrees. Under the statute, the claims of an application may properly be required to be restricted to one of two or more claimed inventions only if they are able to support separate patents and they are either independent (MPEP § 802.01, § 806.06, and § 808.01) or distinct (MPEP § 806.05 - § 806.05(j)).

With reference to the claims, Embodiments I & II are directed to an artificial speech perception processing system and a waveform computer-encoding engine. Embodiment III is directed to an activity map delineating an intelligent agent. Embodiment IV is directed to an example operation of association deliberation, planning, analysis, and verification of utterance harvesting process questionnaires. Embodiment V is directed to a musical performance in an intelligent agent real-time application. Embodiment VI is directed to an observed temporal cognition of the substitution principle.

Claims 1-11 are directed to an artificial method, of integration in administration, in cataloging, and in artificial creation of an open un-linked referent code & metadata as final record installed base, of heuristics origin, an instantiated artificial language source, mimicking the linguistic speech perception process of a human brain. Claims 12-19 are directed to a waveform computer-encoding engine, comprising an engine configured to produce procedural artificial automatic waveform computer-encoding of acoustic speech. Claims 20-28 are directed to an artificial utterance harvesting process product for artificial speech perception processing. Claims 29-34 are directed to an artificial speech perception processing system comprising a waveform computer-encoding engine. As a result, Claims 12-19 and 29-34 are directed to Embodiments I & II, and Claims 1-11 and 20-28 are directed to Embodiments III & IV. With this said, the claims of the application are properly restricted because they are independent and able to support separate patents. As a result, Claims 29-34 are rejected under 35 U.S.C. 103 and Claims 12-19 are allowed. Claims 13-19 depend on indicated allowable claim 12; therefore, by virtue of their dependency, Claims 13-19 are also indicated as allowable subject matter. Accordingly, it is respectfully submitted that the present application, as a whole, is not in condition for allowance.

Claim Objections

1. Claims 12, 14-17, 31 & 32 are objected to because of the following informalities:

a. Claim 12 states in part "artificial automation, servo synchronized comparators switch-on respective candidate bandpass filter from respective bank". This appears to be a typographical error wherein the Applicant intends this to read "artificial automation, servo synchronized comparators switch-on respective candidate bandpass filter from a respective bank." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "a" has been added.

b. Claim 12 states in part "code & metadata". This appears to be a typographical error wherein the Applicant intends this to read "code and metadata." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "&" has been deleted and replaced with "and".

c. Claim 14 states in part "in unique computer". This appears to be a typographical error wherein the Applicant intends this to read "in a unique computer." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "a" has been added.

d. Claim 15 states in part "via proprietary automatic artificial speech perception processing computer". This appears to be a typographical error wherein the Applicant intends this to read "via a proprietary automatic artificial speech perception processing computer." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "a" has been added.

e. Claim 16 states in part "code & metadata". This appears to be a typographical error wherein the Applicant intends this to read "code and metadata." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "&" has been deleted and replaced with "and".

f. Claim 17 states in part "metadata attributes in artificial utterance harvesting process forms; contributed operational metadata". This appears to be a typographical error wherein the Applicant intends this to read "metadata attributes in artificial utterance harvesting process forms; and contributed operational metadata." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "and" has been added.

g. Claim 31 states in part "the installed base is coupled". This appears to be a typographical error wherein the Applicant intends this to read "the installed base of utterances is coupled." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes "of utterances" has been added.

h. Claim 32 states in part "hearing aids, and assistive devices, and intelligent agent". This appears to be a typographical error wherein the Applicant intends this to read "hearing aids, assistive devices, intelligent agent." For continued examination purposes and in the best interests of compact prosecution, Examiner assumes the "and" has been deleted.

Appropriate corrections are required.

Claim Rejections – 35 USC § 103

1. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

2. The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

3. Claims 29-34 are rejected under 35 U.S.C. 103 as being unpatentable over Chicote et al. (US 20210097976 A1, hereinafter Chicote '976) in view of Barra Chicote et al. (US 20200365137 A1, hereinafter Barra Chicote '137).

Regarding claim 29: Chicote '976 discloses an artificial speech perception processing system (Fig. 2, TTS Component 295; i.e. Fig. 2 illustrates components of a system for performing TTS processing. Paragraph 0004), comprising: a waveform computer-encoding engine (Fig. 2, Speech Synthesis Engine(s) 218) configured to generate referent code and metadata from inputted speech (i.e. A speech model may be trained to generate audio output waveforms given input data representing speech, such as text data. The encoded context data 124 may represent an encoding of the text data as well as an encoding of pronunciation metadata, such as syllable emphasis data or language-accent data. The TTS front end may also process context data 215, such as text tags or text metadata, that may indicate, for example, how specific words should be pronounced. Paragraphs 0016 & 0023-0028); an association including an installed base of utterances (i.e. Each speech unit database (e.g., voice inventory) includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. Paragraph 0040); and an utterance harvest process (UHP) configured to harvest the referent code and metadata, wherein the UHP is connected to the association and the referent code and metadata is tested against the installed base (i.e. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.), linguistic prosodic label, acoustic feature sequence, speaker identity, etc. The sample utterances may be used to create mathematical models corresponding to desired audio output for particular speech units. When matching a symbolic linguistic representation the speech synthesis engine 218 may attempt to select a unit in the speech unit database that most closely matches the input text (including both phonetic units and prosodic annotations). Paragraph 0040).

Examiner reasonably believes that Chicote '976 discloses the concept of generating waveforms from incoming speech as expressed above. However, Examiner cites Barra Chicote '137 to cure any deficiencies of Chicote '976. Barra Chicote '137 discloses the concept of generating waveforms from incoming speech (i.e. The output of the TTS front end 216, which may be referred to as a symbolic linguistic representation, may include a sequence of phonetic units annotated with prosodic characteristics. This symbolic linguistic representation may be sent to the speech synthesis engine 218, which may also be known as a synthesizer, for conversion into an audio waveform of speech for output to an audio output device and eventually to a user. Paragraph 0028).

Chicote '976 and Barra Chicote '137 are combinable because they are from the same field of endeavor of speech systems (Barra Chicote '137 at "Background"). Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to modify the speech system as taught by Chicote '976 by adding the concept of generating waveforms from incoming speech as taught by Barra Chicote '137. The motivation for doing so would have been that humans can better interact with and control computing devices by voice. Therefore, it would have been obvious to combine Chicote '976 with Barra Chicote '137 to obtain the invention as specified.

Regarding claim 30: Chicote '976 discloses wherein the association includes a natural language processing engine for entity identification, cataloging, and target referencing (i.e. The NLU component 404 may perform NLU processing on the text data to generate NLU results data. Part of this NLU processing may include entity resolution processing, whereby an entity, represented in the text data, is processed to corresponding to an entity known to the natural language processing system 720. Paragraph 0094).

Regarding claim 31: Chicote '976 discloses wherein the installed base is coupled to one or more embedded systems for application of the referent code and metadata of the waveform computer-encoding engine (i.e. Each speech unit database (e.g., voice inventory) includes recorded speech utterances with the utterances' corresponding text aligned to the utterances. A speech unit database may include many hours of recorded speech (in the form of audio waveforms, feature vectors, or other formats), which may occupy a significant amount of storage. The unit samples in the speech unit database may be classified in a variety of ways including by phonetic unit (phoneme, diphone, word, etc.), linguistic prosodic label, acoustic feature sequence, speaker identity, etc. The sample utterances may be used to create mathematical models corresponding to desired audio output for particular speech units. When matching a symbolic linguistic representation the speech synthesis engine 218 may attempt to select a unit in the speech unit database that most closely matches the input text (including both phonetic units and prosodic annotations). Paragraph 0040).

Regarding claim 32: Chicote '976 discloses wherein the one or more embedded systems comprise proprietary hearing sciences, including bionics, hearing aids, and assistive devices, and intelligent agent, including assistive device technologies, assistive application software cybernetics and biometrics fields, and information communication technology (i.e. The user recognition component 1095 may additionally or alternatively perform user recognition by comparing biometric data (e.g., fingerprint data, iris data, etc.), received by the natural language processing system 720 in correlation with a user input, to stored biometric data of users. Paragraph 0084).

Regarding claim 33: Chicote '976 discloses wherein the waveform computer-encoding engine is further configured to produce artificial substitution code for a bio digital twin of speech perception (i.e. In one method of synthesis called unit selection, the TTS component 295 matches text data against a database of recorded speech. The TTS component 295 selects matching units of recorded speech and concatenates the units together to form audio data. In another method of synthesis called parametric synthesis, the TTS component 295 varies parameters such as frequency, volume, and noise to create audio data including an artificial speech waveform. Paragraph 0083).

Regarding claim 34: Chicote '976 discloses wherein the waveform computer-encoding engine is configured to operate machine learning algorithms and natural language programming platforms (i.e. TTS and speech recognition, combined with natural language understanding processing techniques, enable speech-based user control and output of a computing device to perform tasks based on the user's spoken commands. The combination of speech recognition and natural-language understanding processing is referred to herein as speech processing. Paragraph 0001).

Allowable Subject Matter

1. Claims 12-19 are allowed.

2. Claims 13-19 depend on indicated allowable claim 12. Therefore, by virtue of their dependency, Claims 13-19 are also indicated as allowable subject matter.

Examiner's Statement of Reasons for Allowance

The cited reference (Chicote '976) teaches wherein, during text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

The cited reference (Barra Chicote '137) teaches wherein a speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

The cited references fail to disclose a waveform computer-encoding engine, comprising an engine configured to produce procedural artificial automatic waveform computer-encoding of acoustic speech, the engine produces artificial substitution code for bio digital twin simulation of speech perception; and an artificial processing that comprises: automation, op amp amplifies incoming signal speech from microphone via mobile device, cell phone, information communication technology system, and the like; automation, signal channels then are duplicated; automation, each channel op amp operates as singularly-dedicated for each bandpass filter bank; automation, each or any array or arrays of banks of bandpass filters simultaneously receive signal from dedicated op amps; artificial automation, servo merged metadata and bio digital twin siren comb segments time-code words, via SMPTE, MIDI codes, or other temporal coding; artificial automation, servo synchronized comparators switch-on respective candidate bandpass filter from respective bank; and artificial automation, switch-on identifies artificial substitution address codes and time-stamp stored in CMOS memory as artificial referent code & metadata. As a result, and for these reasons, Examiner indicates Claims 12-19 as allowable subject matter.

Relevant Prior Art References Not Relied Upon

1. Rafii (US 10,997,970 B1) - A hearing aid system presents a hearing impaired user with customized enhanced intelligibility sound in a preferred language. The system includes a model trained with a set of source speech data representing sampling from a speech population relevant to the user. The model is also trained with a set of corresponding alternative articulations of the source data, pre-defined or algorithmically constructed during an interactive session with the user. The model creates a set of selected target speech training data from the set of alternative articulation data that is preferred by the user as being satisfactorily intelligible and clear. The system includes a machine learning model, trained to shift incoming source speech data to a preferred variant of the target data that the hearing aid system presents to the user.

2. Kaszczuk et al. (US 9,484,014 B1) - In a text-to-speech (TTS) system, a database including sample speech units for unit selection may include both units represented by sample audio segments as well as parametric representations of units created by Hidden Markov Models (HMMs). Inclusion of parametric representations in the database may reduce the storage necessary to maintain the database. The parametric representations may be configured to match a voice of the audio segments. The parametric representations may correspond to phonetic units that are less frequently encountered in TTS processing, such as rare diphones or phonemes corresponding to foreign languages. Multiple foreign language HMM models may be used to enable polyglot synthesis with a reduction in storage capacity requirements. Parametrically stored speech units may be combined with speech segments generated during processing time by a parametric model.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARCUS T. RILEY, ESQ., whose telephone number is (571) 270-1581. The examiner can normally be reached 9-5 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Hai Phan, can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MARCUS T. RILEY, ESQ.
Primary Examiner, Art Unit 2654
/MARCUS T RILEY/
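The signal chain recited in allowable claim 12 (amplify the microphone signal, duplicate it across channels, run each channel through a bank of bandpass filters, and switch on the strongest candidate band with a time-stamped code) describes, in analog hardware terms, a filter-bank selection. A loose software analogue of that idea, not the claimed implementation, assuming NumPy/SciPy; the band layout and function name are illustrative:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def strongest_band(signal, fs, bands):
    """Pass the signal through a bank of bandpass filters and return
    the index of the band with the most energy -- a software stand-in
    for the servo-synchronized comparator selection quoted above."""
    energies = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        energies.append(float(np.sum(sosfilt(sos, signal) ** 2)))
    return int(np.argmax(energies))

# Illustrative check: a 1 kHz tone should land in the 500-2000 Hz band.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
bands = [(100, 500), (500, 2000), (2000, 6000)]
print(strongest_band(tone, fs, bands))  # → 1
```

The claim's time-stamping and CMOS-stored substitution codes have no counterpart here; this sketch covers only the band-selection step.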

Prosecution Timeline

Dec 27, 2023
Application Filed
Feb 06, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603093: ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF (2y 5m to grant; granted Apr 14, 2026)
Patent 12585871: NARRATIVE GENERATION FOR SITUATION EVENT GRAPHS (2y 5m to grant; granted Mar 24, 2026)
Patent 12585885: DIALOGUE MODEL TRAINING METHOD (2y 5m to grant; granted Mar 24, 2026)
Patent 12573404: ELECTRONIC DEVICE AND METHOD OF OPERATING THE SAME (2y 5m to grant; granted Mar 10, 2026)
Patent 12567418: SYNCHRONOUS AUDIO AND TEXT GENERATION (2y 5m to grant; granted Mar 03, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76%
With Interview: 92% (+15.7%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 675 resolved cases by this examiner. Grant probability derived from career allow rate.
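The headline projections are simple functions of the examiner's career counts shown earlier. A minimal sketch of the presumed derivation; the additive interview-lift model is an assumption inferred from the displayed figures, not documented by the tool:

```python
# Derive the dashboard's headline projections from the career data
# shown above. The additive interview-lift model is an assumption
# inferred from the displayed numbers (76% + 15.7 pts ≈ 92%).
granted, resolved = 514, 675   # from "514 granted / 675 resolved"
interview_lift_pts = 15.7      # from "With Interview (+15.7%)"

allow_rate = granted / resolved * 100
with_interview = allow_rate + interview_lift_pts

print(f"{allow_rate:.0f}%")      # → 76%
print(f"{with_interview:.0f}%")  # → 92%
```

This reproduces both displayed percentages, which is consistent with the note that grant probability is derived directly from the career allow rate.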
