Last updated: April 19, 2026

Application No. 18/837,211

VOICE PROCESSING DEVICE, VOICE PROCESSING METHOD, INFORMATION TERMINAL, INFORMATION PROCESSING DEVICE, AND COMPUTER PROGRAM

Non-Final OA §101 §102 §103 §112

Filed

Aug 09, 2024

Examiner

RIDER, JUSTIN W

Art Unit

2486

Tech Center

2400 — Computer Networks

Assignee

Sony Group Corporation

OA Round

1 (Non-Final)

Interview Optional

+7.7% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 244 resolved cases, 2023–2026

Examiner Intelligence

RIDER, JUSTIN W

Grants 82% — above average

Career Allow Rate

201 granted / 244 resolved

+24.4% vs TC avg

Moderate +8% lift

+7.7%

Interview Lift

resolved cases with interview

Typical timeline

3y 10m

Avg Prosecution

31 currently pending

Career history

275

Total Applications

across all art units

Statute-Specific Performance

§101

14.2%

-25.8% vs TC avg

§103

37.2%

-2.8% vs TC avg

§102

33.0%

-7.0% vs TC avg

§112

8.0%

-32.0% vs TC avg

Black line = Tech Center average estimate • Based on career data from 244 resolved cases
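
The Tech Center baseline behind each statute figure above can be back-calculated from the reported difference. The sketch below is illustrative only: it assumes each "vs TC avg" value is a simple difference in percentage points (examiner rate minus baseline), and the names `stats` and `implied_baseline` are hypothetical, not part of any tool referenced on this page.

```python
# (examiner rate in %, reported difference vs TC average in percentage points)
stats = {
    "§101": (14.2, -25.8),
    "§103": (37.2, -2.8),
    "§102": (33.0, -7.0),
    "§112": (8.0, -32.0),
}


def implied_baseline(rate: float, delta: float) -> float:
    """Assumed relation: delta = rate - baseline, so baseline = rate - delta."""
    return round(rate - delta, 1)


baselines = {statute: implied_baseline(r, d) for statute, (r, d) in stats.items()}

for statute, baseline in baselines.items():
    print(f"{statute}: implied TC average {baseline}%")
```

Under that assumption, all four statutes resolve to the same baseline, which is consistent with the legend describing the Tech Center average as an estimate rather than a per-statute measurement.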

Office Action

§101 §102 §103 §112

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/09/2024 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “extraction unit”, “processing unit”, “first input unit”, “second input unit” and “learning unit” in claims 1, 8 and 9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-6 and 8-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim limitations “extraction unit”, “processing unit”, “first input unit”, “second input unit” and “learning unit” invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. There does not appear to be any specific structural element specifically linked to either the limitation or algorithm. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-6 and 8-10 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. As described above, the disclosure does not provide adequate structure to perform the claimed functions of the claims in question. The specification does not demonstrate that applicant has made an invention that achieves the claimed function because the invention is not described with sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor had possession of the claimed invention.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter because under a broadest reasonable interpretation, a computer program is considered software per se. Of note, ‘causing a computer to function,’ is merely an intended use of the software per se and is not accorded any patentable weight.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-2, 4-9 and 11 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Le et al., (US Patent No. 11,417,041 B2) referred to as LE hereinafter.
Regarding claim 1, LE shows a voice processing device (FIG. 1, SERVER 120) comprising:
an extraction unit that extracts a feature value of an avatar image (Col. 13, lines 15-55 disclose the multi-style landmark predictor, which has the ability to use unique feature values to alter or drive the animation and synchronization process.); and
a processing unit that processes a voice uttered by the avatar image on a basis of the extracted feature value (FIG. 2, animation compiler 270 processes the audio in question and compiles it along with the created animation.).
 
Regarding claim 2, LE shows the limitations of claim 1 as applied above, and further shows wherein the extraction unit extracts the feature value of the avatar image (Col. 9, lines 25-55, 'template facial landmarks') by using a feature value extractor designed such that a feature value extracted from a voice and a feature value extracted from an avatar image created from a face image of a speaker who has uttered the voice share a same feature value space and are close feature values on the space (Col. 9, lines 1-12 disclose wherein animation frames for an avatar are compiled along with input speech.)
or
a speaker feature value extractor designed such that a feature value extracted from a face image and a feature value extracted from an avatar image generated from the face image share a same feature value space and are close feature values on the space (Col. 9, lines 25-55 wherein 3D landmarks of an input image are used to warp the animation to synchronize with the input/animated voice.).
Regarding claim 4, LE shows the limitations of claim 2 as applied above, and further shows wherein the extraction unit determines the feature value by using both the feature value extracted from the voice of the speaker and the feature value extracted from the avatar image (Col. 9, lines 25-55 wherein 3D landmarks of an input image are used to warp the animation to synchronize with the input/animated voice.).
 
Regarding claim 5, LE shows the limitations of claim 1 as applied above, and further shows wherein the extraction unit extracts the feature value by using a feature extractor configured by a model learned by using a data set including a voice, a face image of a speaker who has uttered the voice, and an avatar image generated from the face image (Col. 10, lines 25-37 disclose a learning model to use as a baseline landmark from which to obtain features for synthesis.).
 
Regarding claim 6, LE shows the limitations of claim 1 as applied above, and further shows wherein the extraction unit describes a voice, a face image of a speaker who has uttered the voice, and an avatar image generated from the face image by a common impression word and uses the common impression word as the feature value (Col. 13, lines 9-14 disclose landmark or 'impression' truths that are common and allow for comparison across various voice styles.).
 
Regarding claim 7, LE shows a voice processing method comprising: an extraction step of extracting a feature value of an avatar image (Col. 13, lines 15-55 disclose the multi-style landmark predictor, which has the ability to use unique feature values to alter or drive the animation and synchronization process.); and
a processing step of processing a voice uttered by the avatar image on a basis of the extracted feature value (FIG. 2, animation compiler 270 processes the audio in question and compiles it along with the created animation.).
 
Regarding claim 8, LE shows an information terminal (FIG. 1, SERVER 120) comprising:
a first input unit that inputs first data for creating an avatar image (FIG. 2, 225);
a second input unit that inputs second data for adjusting a voice of the avatar image (FIG. 2, 210); and
a processing unit that processes the voice of the avatar image on a basis of a feature value determined by using both a feature value extracted from the avatar image created on a basis of the first data and a feature value extracted from a voice of a speaker based on the second data (FIG. 2, 240-260).
 
Regarding claim 9, LE shows an information processing device (FIG. 1, SERVER 120) comprising:
a first model that extracts a feature value of an avatar image (FIG. 2, 225);
a second model that converts a voice quality of a voice of the avatar image or performs voice synthesis on a basis of the feature value extracted by the first model (FIG. 2, 210); and
a learning unit that learns the first model and the second model by using a data set including at least two of a voice, a face image of a speaker who has uttered the voice, or an avatar image generated from the face image (FIG. 2, 240-260).
Regarding claim 11, LE shows a computer program written in a computer-readable format to cause a computer to function as:
an extraction unit that extracts a feature value of an avatar image (Col. 13, lines 15-55 disclose the multi-style landmark predictor, which has the ability to use unique feature values to alter or drive the animation and synchronization process.); and
a processing unit that processes a voice uttered by the avatar image on a basis of the extracted feature value (FIG. 2, animation compiler 270 processes the audio in question and compiles it along with the created animation.).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over LE in view of Shah et al., (US 2022/0070295 A1) referred to as SHAH hereinafter.
Regarding claim 3, LE shows the limitations of claim 2 as applied above but fails to show, whereas SHAH does show, wherein the processing unit converts a voice quality of an input voice on a basis of the feature value in the feature value space or synthesizes a voice on a basis of the feature value in the feature value space (Fig. 5 shows the process of taking an agent's voice, converting it to a celebrity profile, and synthesizing the voice for output.).
It is noted that both LE and SHAH are analogous to the claimed invention in that they both synthesize voice.
Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to modify LE in the spirit of SHAH because listening to the plain voice of an IVR or previously recorded messages can quickly become very boring. This may be particularly true when the customer dislikes the voice. Once a customer is connected with a live agent, the customer may be more difficult to please if they have endured a lengthy session with a boring or unpleasant voice (SHAH, Paragraph [0004]).

Claim(s) 10 is rejected under 35 U.S.C. 103 as being unpatentable over LE in view of Port et al., (US 11,069,259 B2) referred to as PORT hereinafter.
Regarding claim 10, LE shows the limitations of claim 9 as applied above but fails to show, whereas PORT does show, wherein the learning unit learns the first model and the second model by adversarial learning such that a discriminator that discriminates authenticity of a voice cannot discriminate the authenticity and a determiner that identifies a speaker of the voice cannot identify the speaker (Col. 4, lines 35-40 disclose adversarial learning models in order to identify discrimination and increase quality of output.).
It is noted that both LE and PORT are analogous to the claimed invention in that they both process audio voice signals.
Therefore, it would have been obvious to one possessing ordinary skill in the art before the effective filing date of the claimed invention to modify LE in the spirit of PORT because it ensures the sound sample [feature] utilized is most closely correlated with the desired impact (Col. 7, lines 10-13).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see the Notice of References Cited (PTO-892) for additional references noted but not used currently.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JUSTIN W. RIDER whose telephone number is (571)270-1068. The examiner can normally be reached Monday-Friday, 7:00 am - 4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie J Atala can be reached at (571) 272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JUSTIN W. RIDER
Primary Patent Examiner
Art Unit 2486



/Justin W Rider/Primary Patent Examiner, Art Unit 2486

Prosecution Timeline

Aug 09, 2024

Application Filed

Feb 03, 2026

Non-Final Rejection: §101, §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/950,677

Patent 12600301

IMAGE CAPTURING APPARATUS, MOVABLE APPARATUS, IMAGE CAPTURING METHOD, AND STORAGE MEDIUM

2y 5m to grant Granted Apr 14, 2026

18/406,973

Patent 12598320

INTER-EYE PREDICTION MODELS FOR XR

2y 5m to grant Granted Apr 07, 2026

18/355,204

Patent 12593117

Imaging System Lens Mounting Arrangement

2y 5m to grant Granted Mar 31, 2026

18/476,389

Patent 12592221

HANDLING CONFIDENTIAL MEETINGS IN A CONFERENCE ROOM

2y 5m to grant Granted Mar 31, 2026

18/873,512

Patent 12593037

ADAPTIVE REGIONS FOR DECODER-SIDE INTRA MODE DERIVATION AND PREDICTION

2y 5m to grant Granted Mar 31, 2026

Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

82%

Grant Probability

90%

With Interview (+7.7%)

3y 10m

Median Time to Grant

Low

PTA Risk

Based on 244 resolved cases by this examiner. Grant probability derived from career allow rate.
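
The projection figures above can be reproduced with simple arithmetic. The sketch below assumes the grant probability is the career allow rate (granted divided by resolved) and that the interview figure is that rate plus the reported lift in percentage points; the function names are illustrative and the additive model is an assumption, not a documented method.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_pct_points: float) -> float:
    """Assumed model: the interview lift adds percentage points to the base rate."""
    return min(100.0, base_pct + lift_pct_points)


base = allow_rate(201, 244)          # 201 granted / 244 resolved, as reported
boosted = with_interview(base, 7.7)  # reported interview lift of +7.7%

print(f"Grant probability: {base:.0f}%")
print(f"With interview: {boosted:.0f}%")
```

Rounded to whole percentages, these match the 82% and 90% shown in the projections.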
