Prosecution Insights
Last updated: April 19, 2026
Application No. 18/426,728

METHOD AND DEVICE FOR SYNTHESIZING SPEECH WITH MODIFIED UTTERANCE FEATURES

Non-Final OA: §101, §102
Filed: Jan 30, 2024
Examiner: SHARMA, NEERAJ
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Xinapse Co. Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 85% (Favorable)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 85% (387 granted / 457 resolved; +22.7% vs TC avg); grants above average
Interview Lift: +11.5% among resolved cases with an interview (a moderate, roughly +12% lift vs. without)
Typical Timeline: 2y 9m average prosecution; 19 applications currently pending
Career History: 476 total applications across all art units
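As a sanity check, the headline figures above follow from simple arithmetic on the career data. The Python sketch below reproduces them; treating the interview lift as a simple addition to the allow rate is an assumption that happens to match the displayed numbers, not a formula documented by this tool.

```python
# Reproducing the examiner's headline figures from the career data shown above.
granted, resolved = 387, 457
allow_rate = granted / resolved                # 0.847 -> displayed as 85%

# Assumption: the "with interview" figure is the career allow rate plus the
# +11.5% interview lift; this matches the displayed 96%.
interview_lift = 0.115
with_interview = allow_rate + interview_lift   # 0.962 -> displayed as 96%

print(f"allow rate: {allow_rate:.1%}")         # allow rate: 84.7%
print(f"with interview: {with_interview:.1%}") # with interview: 96.2%
```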

Statute-Specific Performance

§101: 13.9% (-26.1% vs TC avg)
§103: 39.5% (-0.5% vs TC avg)
§102: 28.7% (-11.3% vs TC avg)
§112: 6.4% (-33.6% vs TC avg)

Compared against estimated Tech Center averages; based on career data from 457 resolved cases.

Office Action

Grounds of rejection: §101, §102
DETAILED ACTION

Introduction

1. This Office action is in response to Applicant's submission filed on 01/30/2024. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-10 are currently pending and examined below.

Drawings

2. The drawings filed on 01/30/2024 have been accepted and considered by the Examiner.

Priority

3. The Applicant's claim of priority to Korean Patent Application No. 10-2023-0078910, filed on Jun. 20, 2023, Korean Patent Application No. 10-2023-0078911, filed on Jun. 20, 2023, and Korean Patent Application No. 10-2023-0078912, filed on Jun. 20, 2023, has been accepted and considered in this Office action.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

4. Claims 1-10 are rejected under 35 U.S.C. 101 as being directed to an abstract idea.

Regarding claim 1, it is directed to the abstract idea of mathematical and algorithmic operations on data — specifically, generating an embedding (a data representation), transforming the embedding by dimensionality reduction and restoration (mathematical transforms), and adjusting a component of a vector based on user input (data manipulation and user-directed parameter change). These operations are essentially mathematical processes and data processing that constitute an abstract idea under relevant precedent. The claim does not recite an improvement to a technical device or a specific improvement to the functioning of a computer or other technology. The recited steps are generic data transformation operations applied to speech-related data rather than a specific technical solution to a technical problem. Accordingly, claim 1 is directed to an abstract idea.

The claim elements, viewed individually and as an ordered combination, do not recite additional features sufficient to transform the abstract idea into a patent-eligible application. The claim recites generic steps — "generating," "reducing dimensionality by using a predetermined dimensionality reduction technique," "adjusting a component value," and "restoring dimensionality" — without providing non-generic limitations, unconventional components, or particularized implementation details that provide a technical improvement. The claim does not recite:

a. a specific, non-generic dimensionality reduction technique or a particularized implementation detail that imposes meaningful limits on how the reduction and restoration are performed (e.g., a specific algorithmic architecture with defined structural elements and parameters that are not routine or conventional);

b. a particular hardware arrangement or specialized hardware that effects the transformation in a non-conventional, non-generic way; or

c. an improvement to the operation of the speech synthesis system itself (e.g., a demonstrable reduction in latency by a stated amount, improved perceptual quality measured by objective metrics, reduced computational complexity quantified with bounds, or another specified technological benefit).

The recitation of a "user input" and the general purpose of "synthesizing speech with modified utterance features" are functional results and do not themselves supply a technical inventive concept.
The use of user input to adjust a parameter is a conventional and well-known mechanism for controlling algorithmic behavior and does not convert the abstract idea into patent-eligible subject matter.

Claims 2-8 only provide certain details of the mathematical algorithm, calculations, and processes recited in claim 1 and hence also do not amount to significantly more than the judicial exception.

Claim 9 is a device claim corresponding to method claim 1 and hence is also rejected at least for the reasons outlined above.

Claim 10 is a computer readable medium claim corresponding to method claim 1 and hence is also rejected at least for the reasons outlined above.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

5. Claims 1-10 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Kim (U.S. Patent Application Publication No. 2021/0174782 A1).

With regards to claim 1, Kim teaches a method of synthesizing speech with modified utterance features, the method comprising:

generating an initial embedding vector based on predetermined utterance information (Figures 1-14 teach an artificial intelligence-based method for synthesizing speech by controlling speech style, comprising a memory and a processor. Paragraphs 181-182 teach acquiring a condition vector from the audio data. The condition vector may be acquired by the encoder and the attention mechanism based on the vector and the prosody vector for the audio data);

generating a low-dimensional embedding vector by reducing dimensionality of the initial embedding vector by using a predetermined dimensionality reduction technique (Para 185 teaches reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector);

adjusting a component value of the low-dimensional embedding vector based on a user input (Paragraphs 215-216 teach acquiring a dimension-reduced condition vector from the sparse code vector whose vector element value is changed); and

generating a modified embedding vector by restoring dimensionality of the low-dimensional embedding vector of which the component value is adjusted (Paragraphs 215-216 teach acquiring the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension. The processor can acquire the condition vector having the original dimension by expanding the dimension using an inverse transform of the PCA algorithm).
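For reference, the reduce-adjust-restore pipeline that the examiner maps onto Kim's PCA disclosure can be sketched in a few lines. This is a minimal illustration assuming scikit-learn's PCA as a stand-in for the claimed "predetermined dimensionality reduction technique"; the 40-to-20 reduction mirrors Kim's example cited in the claim 5 mapping below, while the adjusted component, its bounds, and the data are arbitrary illustration values.

```python
# Minimal sketch of claim 1's pipeline: reduce dimensionality, adjust one
# component based on "user input", restore dimensionality via the inverse
# transform. scikit-learn's PCA is an assumed stand-in; the claim itself
# names no specific technique.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 40))   # initial embedding vectors (40-dim, cf. Kim's example)

pca = PCA(n_components=20)                # reduce 40 dims to 20 (cf. Kim para 185)
low_dim = pca.fit_transform(embeddings)   # low-dimensional embedding vectors

vec = low_dim[0].copy()
user_delta = 1.5                          # stand-in for a user input
# Bounded adjustment, as in claim 6: clamp the new value between two thresholds.
vec[3] = np.clip(vec[3] + user_delta, -2.0, 2.0)

modified = pca.inverse_transform(vec[None, :])  # restore to 40 dims (inverse PCA, cf. claim 7)
print(modified.shape)                     # (1, 40): the modified embedding vector
```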
With regards to claim 2, Kim teaches the method of claim 1, wherein the predetermined dimensionality reduction technique is based on principal component analysis (PCA) using component values of the initial embedding vector (Para 185 teaches reducing the condition vector to the predetermined reduction dimension by applying a Principal Component Analysis (PCA) algorithm to the condition vector).

With regards to claim 3, Kim teaches the method of claim 1, wherein the generating of the low-dimensional embedding vector comprises: generating a first reduced vector by reducing the dimensionality of the initial embedding vector (Para 186 teaches that the PCA provides the function of reducing the dimension by discarding the eigenvector of the dimension with minimal variance in the appropriate line when sorted by the order of the eigenvalues); and generating a second reduced vector having fewer dimensions than the first reduced vector, by reducing dimensionality of the first reduced vector (Paragraphs 187-188 teach setting a reduction dimension with a low loss rate and acquiring a sparse code vector based on a dictionary vector acquired through sparse dictionary coding with respect to the condition vector having the predetermined reduction dimension).

With regards to claim 4, Kim teaches the method of claim 1, wherein the adjusting of the component value comprises adjusting the component value of the low-dimensional embedding vector by generating a first interface for displaying the component value of the low-dimensional embedding vector and receiving the user input through the first interface (Paragraphs 65-80 teach a user input interface including a camera, a microphone, and various types of displays, which are used to acquire intention information for the user input and may determine the user's requirements based on the acquired intention information. Paragraphs 215-216 teach acquiring a dimension-reduced condition vector from the sparse code vector whose vector element value is changed. Para 214 teaches that sparse code vectors for angry or sad emotions, or any other emotion, can be acquired. These vectors can be acquired and/or modified based on the user's intention and/or requirement, as in the case of the self-driving car outlined in para 122).

With regards to claim 5, Kim teaches the method of claim 1, wherein the adjusting of the component value comprises: adjusting the component value of the low-dimensional embedding vector by mapping at least one utterance feature extracted from the predetermined utterance information to the component value of the low-dimensional embedding vector (Paragraphs 177-184 teach acquiring and storing audio data having a predetermined speech style, followed by reducing the dimension of the condition vector to a predetermined reduction dimension. For example, when the condition vector is a 40-dimensional multi-dimensional vector, the condition vector may be reduced to a dimension of 20, which is a preset reduction dimension. Para 67 teaches that the input interface may acquire raw input data. In this case, the processor or the learning processor may extract an input feature by preprocessing the input data); and generating a second interface for displaying the mapped at least one utterance feature, and receiving the user input for adjusting the at least one utterance feature through the second interface (Paragraphs 174-175 teach a condition wherein, when there are more than two types of speech styles differentiated from the prosody vector, the dimension of the condition vector increases. If the condition vector that determines the speech style is a five-dimensional vector and each speech style weight is adjusted in ten steps, the resulting combinations of condition vectors become impossible to adjust individually. Therefore, the processor needs to reduce the condition vector element value required to be changed by applying sparse coding, which lowers the number of dimensions of the multi-dimensional condition vector and lowers the inter-dimensional dependency of the condition vector. Paragraphs 65-80 teach a user input interface including a camera, a microphone, and various types of displays, which are used to acquire intention information for the user input and may determine the user's requirements based on the acquired intention information. Vectors can be acquired and/or modified based on the user's intention and/or requirement, as in the case of the self-driving car outlined in para 122).

With regards to claim 6, Kim teaches the method of claim 1, wherein the adjusting of the component value comprises adjusting the component value of the low-dimensional embedding vector to be between a first threshold value and a second threshold value, the first threshold value and the second threshold value being inclusive, based on the user input, and the first threshold value is less than the second threshold value (Para 198 teaches acquiring a dictionary vector D and a sparse representation coefficient vector A through a sparse coding algorithm. The processor may use a Least Absolute Shrinkage and Selection Operator (LASSO) algorithm. The LASSO algorithm may apply an L1-norm cost so that the inner product of the dictionary matrix D and the sparse representation coefficient vector A minimizes the difference from the data Y, and the sparsity of the sparse representation coefficient vector is maximized by an additional limitation).

With regards to claim 7, Kim teaches the method of claim 1, wherein the predetermined dimensionality reduction technique comprises a technique for dimensionality restoration using an inverse operation, and the generating of the modified embedding vector comprises generating the modified embedding vector by performing an inverse operation of an operation of generating the low-dimensional embedding vector by using the predetermined dimensionality reduction technique (Paragraphs 215-216 teach acquiring the condition vector in which the condition for determining the speech style is changed by extending the dimension of the condition vector having the predetermined dimension. The processor can acquire the condition vector having the original dimension by expanding the dimension using an inverse transform of the PCA algorithm).

With regards to claim 8, Kim teaches the method of claim 1, further comprising generating a speech signal based on text in a particular natural language and the modified embedding vector (Figure 14 teaches end-to-end speech synthesis and the audio output in blocks 1408-1409).

With regards to claim 9, this is a device claim corresponding to method claim 1. The two claims are related as a method and an apparatus using the same, with each claimed system element's function corresponding to a claimed method step. Accordingly, claim 9 is rejected under the same rationale as applied above with respect to method claim 1.

With regards to claim 10, this is a computer readable medium (CRM) claim corresponding to method claim 1. The two claims are related as a method and a CRM using the same, with each claimed CRM element's function corresponding to a claimed method step. Accordingly, claim 10 is rejected under the same rationale as applied above with respect to method claim 1.

Conclusion

6. The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Chun (U.S. Patent Application Publication No. 2018/0268806 A1) and Alon (U.S. Patent Application Publication No. 2024/0330762 A1). These references are also included in the PTO-892 form attached with this Office action.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA, whose contact information is given below. The examiner can normally be reached Monday to Friday, 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Louis-Desir, can be reached at 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)
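The sparse-coding mechanism cited for claims 5 and 6 (Kim paras 187-198) can be sketched similarly. Below is a minimal illustration assuming scikit-learn's DictionaryLearning with a LASSO-based ("lasso_lars") encoder; the sizes and regularization strength are arbitrary illustration values, not parameters taken from Kim.

```python
# Sketch of LASSO-based sparse dictionary coding as cited for claims 5-6:
# learn a dictionary D and encode each condition vector as a sparse coefficient
# vector A, so that Y is approximately A @ D and most entries of A are zero.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
Y = rng.normal(size=(200, 20))             # dimension-reduced condition vectors

dico = DictionaryLearning(n_components=30,
                          transform_algorithm="lasso_lars",  # L1 (LASSO) sparse coding
                          transform_alpha=0.1, random_state=0)
A = dico.fit_transform(Y)                  # sparse representation coefficients (200 x 30)
D = dico.components_                       # learned dictionary (30 x 20)

# The L1 penalty drives most coefficients to zero, lowering inter-dimensional
# dependency: changing a single sparse coefficient acts as a localized style "knob".
print(f"sparsity: {(A == 0).mean():.0%}")  # fraction of zero coefficients
reconstruction = A @ D                     # approximate recovery of Y
print(reconstruction.shape)                # (200, 20)
```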

Prosecution Timeline

Jan 30, 2024
Application Filed
Nov 28, 2025
Non-Final Rejection — §101, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597428
DISPLAY DEVICE, CONTROL METHOD OF DISPLAY DEVICE, AND RECORDING MEDIUM
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12591736
FINE-TUNED LARGE LANGUAGE MODELS FOR CAPABILITY CONTROLLER
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12579983
SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12573403
SCENE-AWARE SPEECH RECOGNITION USING VISION-LANGUAGE MODELS
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12566076
AD-HOC NAVIGATION INSTRUCTIONS
Granted Mar 03, 2026 (2y 5m to grant)
Study what changed to get these applications past this examiner; based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 85%
With Interview: 96% (+11.5%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 457 resolved cases by this examiner. Grant probability derived from career allow rate.
