Prosecution Insights
Last updated: April 19, 2026
Application No. 18/715,173

UTTERANCE SECTION CLASSIFICATION DEVICE, UTTERANCE SECTION CLASSIFICATION METHOD AND UTTERANCE SECTION CLASSIFICATION PROGRAM

Non-Final OA — §101, §103, §DP
Filed: May 31, 2024
Examiner: SULTANA, NADIRA
Art Unit: 2653
Tech Center: 2600 — Communications
Assignee: Nippon Telegraph and Telephone Corporation
OA Round: 1 (Non-Final)
Grant Probability: 74% — Favorable
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (72 granted / 97 resolved; +12.2% vs TC avg) — above average
Interview Lift: +31.1% among resolved cases with interview — strong
Avg Prosecution: 3y 0m typical timeline; 29 currently pending
Total Applications: 126 across all art units (career history)
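
These headline figures are simple ratios, and the interview numbers relate by plain subtraction. A minimal sketch of the arithmetic, assuming the lift is reported as the percentage-point gap between with-interview and without-interview allow rates (the page does not show the underlying with/without case counts):

```python
# Reconstruction of the examiner stats above from the figures shown on
# the page. The with/without-interview split is implied, not reported.

granted, resolved = 72, 97
career_allow_rate = granted / resolved                 # 0.742 -> shown as "74%"

with_interview_rate = 0.99                             # page: "99% With Interview"
interview_lift = 0.311                                 # page: "+31.1% Interview Lift"
# Assumption: lift = with-interview rate minus without-interview rate.
without_interview_rate = with_interview_rate - interview_lift   # ~67.9%

print(f"career allow rate: {career_allow_rate:.1%}")
print(f"without interview: {without_interview_rate:.1%} (implied)")
print(f"with interview:    {with_interview_rate:.1%}")
```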

Statute-Specific Performance

§101: 25.4% (-14.6% vs TC avg)
§103: 54.8% (+14.8% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 3.6% (-36.4% vs TC avg)
Deltas measured against Tech Center average estimates • Based on career data from 97 resolved cases
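
The deltas can be inverted to recover the implied Tech Center averages. A quick sketch, assuming each delta is the percentage-point difference between this examiner's rate and the TC average (the page does not state how the tool defines these per-statute rates):

```python
# Per-statute figures from the table above; deltas read as percentage-point
# differences from the Tech Center average (an assumption about the tool's
# reporting convention).
examiner_rate = {"§101": 0.254, "§103": 0.548, "§102": 0.120, "§112": 0.036}
delta_vs_tc   = {"§101": -0.146, "§103": 0.148, "§102": -0.280, "§112": -0.364}

for statute, rate in examiner_rate.items():
    tc_avg = rate - delta_vs_tc[statute]   # implied TC average
    print(f"{statute}: examiner {rate:.1%}, implied TC avg {tc_avg:.1%}")
```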

Office Action

Grounds: §101, §103, §DP
DETAILED ACTION

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/31/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Status of Claims
Claims were amended pursuant to a preliminary amendment filed together with the initial set on 05/31/2024. For examination purposes, the claim set with amended claims has been used. Claim 6 was amended; claims 9-20 were newly added. Claims 1-20 are pending, of which claims 1, 7 and 8 are independent.

Claim Objections
Claim 19 is objected to because of the following informalities: claim 19 recites "a speech segment estimation unit estimates a speech segment using a speech segment estimation model in which the speech segment estimation model is a trained model that receives speech text data and outputs speech segment data". The specification does not mention any "speech segment estimation unit" or "speech segment estimation model". Para. [0038] of the specification describes "speech section estimation unit 102 estimates a speech section using the speech section estimation model 30, and ... receives speech text data as input and outputs speech section data". For examination purposes, "speech segment estimation unit" and "speech segment estimation model" are treated as "speech section estimation unit" and "speech section estimation model". Appropriate correction is required.

35 U.S.C. 112(f) Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. — An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and
(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. That presumption is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. That presumption is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word "means" but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: "speech section estimation unit" in claim 1, "speech type estimation unit" in claim 1, "speech section classification unit" in claim 1, and "speech segment estimation unit" in claim 19. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being so interpreted (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid that interpretation.

Double Patenting
The examiner considered a non-statutory double patenting rejection over co-pending U.S. Application No. 18/715,180, from the same applicant and the same inventor. Because the scope of co-pending U.S. Application No. 18/715,180 differs from that of the instant application 18/715,173, a non-statutory double patenting rejection is not given.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Independent claim 1 recites "a speech section estimation unit that estimates a speech section from speech text data including speeches of two or more people"; "a speech type estimation unit that estimates a speech type of each speech included in the speech section estimated by the speech section estimation unit"; "and a speech section classification unit that classifies the speech section estimated by the speech section estimation unit, using the speech type of each speech estimated by the speech type estimation unit and a speech section classification rule determined in advance as a rule for classifying speech sections based on the speech type". The limitations above, as drafted, describe a process that, under its broadest reasonable interpretation, covers a mental process, as it could be performed in the human mind or with the aid of pen and paper. The limitations of "estimates ..." and "classifies ...", as drafted, cover mental activities. More specifically, a human can obtain text data or a transcript of a speech section of a conversation or dialog, can determine the type of conversation being had, and can classify the speech section or conversation based on some previously determined classification rules. All the steps above are examples of observation and evaluation that could be performed in the human mind or with the aid of pencil and paper.

The claim recites the additional limitations of a "speech section estimation unit", "speech type estimation unit", and "speech section classification unit" for performing the method. All of those are recited at a high level of generality and as performing generic computer functions routinely used in computer applications. The specification in paragraph [0033] specifies that all those units are implemented by CPU 11 reading a speech section classification program stored in the ROM 12 or the storage 14, developing the program in the RAM 13, and executing the program, which is not sufficient to amount to significantly more than the judicial exception. Performing generic computer functions that are well-understood, routine and conventional activities amounts to no more than implementing the abstract idea with a computerized system. The claim as drafted is not patent eligible. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology; their collective functions merely provide conventional computer implementation. Claim 1 is therefore not drawn to eligible subject matter, as it is directed to an abstract idea without significantly more than the abstract idea.
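For orientation, claim 1's three units form a simple pipeline: section estimation, per-speech type estimation, then rule-based classification. The sketch below is the editor's illustration of that shape only; the function names, the keyword stand-in for type estimation, and the rule format are hypothetical and are not taken from the specification (which contemplates trained models):

```python
# Illustrative sketch of the three-stage structure recited in claim 1.
# Function names, the keyword heuristic, and the rule format are
# hypothetical; the specification contemplates trained models instead.

def estimate_sections(speeches, size=4):
    """Speech section estimation unit: split speaker-tagged speeches
    into contiguous sections (here, fixed windows of `size` speeches)."""
    return [speeches[i:i + size] for i in range(0, len(speeches), size)]

def estimate_types(section):
    """Speech type estimation unit: attach a type to each speech
    (a trivial keyword heuristic standing in for a trained estimator)."""
    types = []
    for _speaker, text in section:
        lowered = text.lower()
        if any(w in lowered for w in ("not", "confused", "broken")):
            types.append("negative")
        elif "?" in text:
            types.append("question")
        else:
            types.append("statement")
    return types

def classify_section(types, rules):
    """Speech section classification unit: apply a predetermined rule
    mapping the speech types present in a section to a section label."""
    for label, required_types in rules:
        if required_types <= set(types):
            return label
    return "other"

rules = [("dissatisfaction_or_demand", {"negative", "question"})]
dialog = [("client", "The CD-ROM is not working."),
          ("operator", "Could you check the label?"),
          ("client", "Label? Where is that?"),
          ("operator", "On the back of the case.")]

for section in estimate_sections(dialog):
    types = estimate_types(section)
    print(classify_section(types, rules), types)
```

The examiner's point maps onto the sketch directly: each stage reads, evaluates, and labels text, which a person could do with pen and paper; only the recited units tie it to a computer.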
Independent claims 7 and 8 recite "estimating a speech section from speech text data including speeches of two or more people"; "estimating a speech type of each speech included in the speech section that has been estimated"; "and classifying the speech section that has been estimated, using the speech type of each speech that has been estimated and a speech section classification rule determined in advance as a rule for classifying speech sections based on the speech type". The limitations above, as drafted, describe a process that, under its broadest reasonable interpretation, covers a mental process, as it could be performed in the human mind or with the aid of pen and paper. The limitations of "estimating ..." and "classifying ...", as drafted, cover mental activities. More specifically, a human can obtain text data or a transcript of a speech section of a conversation or dialog, can determine the type of conversation being had, and can classify the speech section or conversation based on some previously determined classification rules. All the steps above are examples of observation and evaluation that could be performed in the human mind or with the aid of pencil and paper.

Claim 7 does not recite any additional elements. Claim 8 recites the additional limitation of a "computer" for performing the method, which is recited at a high level of generality and as performing generic computer functions routinely used in computer applications. The specification in paragraph [0030] describes the computer as a "general purpose computer device such as server computer or personal computer", which is not sufficient to amount to significantly more than the judicial exception. Performing generic computer functions that are well-understood, routine and conventional activities amounts to no more than implementing the abstract idea with a computerized system. The claims as drafted are not patent eligible. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology; their collective functions merely provide conventional computer implementation. Claims 7 and 8 are therefore not drawn to eligible subject matter, as they are directed to an abstract idea without significantly more than the abstract idea.

Claims 2, 9 and 14 recite the additional limitation "wherein, in the speech section classification rule, whether a specific speech type is included in the speech section, or a combination and order relationship of a plurality of speech types included in the speech section is defined". Determining that the classification rule includes a rule which defines relationships among different speech types could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 2, 9 and 14 do not recite any additional limitations. The claims as drafted are not patent eligible.

Claims 3, 10 and 15 recite "wherein the speech text data includes a speech of an operator and a speech of a client, and when the speech section includes a speech type indicating a speech of the client evaluating using a positive expression or a speech type indicating a speech of the client evaluating using a negative expression, the speech section classification rule classifies the speech section as a section including dissatisfaction or a demand of the client". Finding that the speech data contains speech of a customer service representative and a customer describing a situation using a positive or negative expression, which according to the classification rule indicates that the customer is unhappy or demanding a solution, could be done by evaluation and observation and could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 3, 10 and 15 do not recite any additional limitations. The claims as drafted are not patent eligible.

Claims 4, 11 and 16 recite "wherein the speech text data includes a speech of an operator and a speech of a client, and, when the speech section includes a speech type indicating a speech of the client and the operator regarding an issue, and a speech with the speech type is added with any one of the following types: a speech type indicating a speech of a question by the client to the operator; a speech type indicating a speech of the client expressing a request or a demand to the operator; a speech type indicating a speech of the client answering or explaining to the question of the operator; and a speech type indicating a speech of the client explaining a negative situation, the speech section classification rule classifies the speech section as a section including dissatisfaction or a demand of the client". Classifying or categorizing questions, requests, or demands from a conversation between a customer and a customer service representative, based on some predetermined rules, as the customer's dissatisfaction or demand could be an evaluation or observation and could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 4, 11 and 16 do not recite any additional limitations. The claims as drafted are not patent eligible.
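Claims 2 and 4 above, and the claim 5 variant discussed next, turn on rules that test not just which speech types appear but their combination and order, e.g. a client question or request arriving within two speeches of an operator's negative or softening speech. A hedged sketch of such an order-sensitive rule, with hypothetical type names and window size:

```python
# Sketch of an order-sensitive classification rule of the kind claims 2-5
# describe: a client question/request within `window` speeches after an
# operator speech explaining or softening a negative circumstance.
# Type names and window size are illustrative, not from the claims.

def client_pushback_after_negative(types, window=2):
    triggers = {"operator_negative", "operator_softening"}
    responses = {"client_question", "client_request"}
    for i, t in enumerate(types):
        if t in triggers:
            # Look only at the next `window` speech types after the trigger.
            if any(u in responses for u in types[i + 1:i + 1 + window]):
                return "dissatisfaction_or_demand"
    return "other"

section_types = ["greeting", "operator_softening",
                 "client_question", "operator_answer"]
print(client_pushback_after_negative(section_types))  # dissatisfaction_or_demand
```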
Claims 5, 12 and 17 recite "wherein the speech text data includes a speech of an operator and a speech of a client, and, when the speech section includes a speech type indicating a speech of the operator explaining a negative situation or a speech type indicating a speech of the operator using an expression for softening a negative circumstance, and any one of a speech type indicating a speech of a question by the client to the operator and a speech type indicating a speech of the client expressing a request or a demand to the operator is included within two speeches after a speech added with the speech type, the speech section classification rule classifies the speech section as a section including a dissatisfaction or a demand of the client". Classifying or categorizing questions, requests, demands, or negative expressions occurring in a certain order in a conversation between a customer and a customer service representative, based on some predetermined rules, as the customer's dissatisfaction or demand could be an evaluation or observation and could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 5, 12 and 17 do not recite any additional limitations. The claims as drafted are not patent eligible.

Claims 6, 13 and 18 recite "wherein the speech text data includes a speech of an operator and a speech of a client, and, when the speech section includes a speech type indicating a speech of the operator asking about a need of the client, and the speech type of the first asking of the need in the speech section is an open question, the speech section classification rule classifies the speech section as an open type sales section, when the speech section includes a speech type indicating a speech of the operator asking about the need of the client, and the speech type of the first asking of the need in the speech section is a theme question, the speech section classification rule classifies the speech section as a theme type sales section, and when the speech section includes the speech type indicating a speech of the operator asking about the need of the client, and the speech type of the first asking of the need in the speech section is an end question, the speech section classification rule classifies the speech section as an end type sales section". Classifying or categorizing questions or requests from a conversation between a customer and a customer service representative, based on some predetermined rules, as an open type, theme type, or end type sale could be an evaluation or observation and could be performed in the human mind or with the aid of pen and paper. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, as claims 6, 13 and 18 do not recite any additional limitations. The claims as drafted are not patent eligible.

Claim 19 recites "wherein a speech segment estimation unit estimates a speech segment using a speech segment estimation model in which the speech segment estimation model is a trained model that receives speech text data and outputs speech segment data". Receiving text data or a transcript of a speech and identifying certain sections or segments from the transcript could be performed in the human mind or with the aid of pen and paper. The claim recites the additional limitation of a speech segment estimation model, which according to para. [0038] of the specification can be a deep neural network, and which is not sufficient to amount to significantly more than the judicial exception. Claim 19 as drafted is not patent eligible.

Claim 20 recites "wherein a classifying model for utterance types is generated in advance by performing machine learning using utterance segment data with labels attached to each utterance as learning data". Assigning labels to the utterances of a certain segment of the conversation could be performed with the aid of pen and paper. The claim recites the additional limitation of a classifying model, which according to paras. [0039] and [0074] of the specification can be a deep neural network, and which is not sufficient to amount to significantly more than the judicial exception. Claim 20 as drafted is not patent eligible.

Claims 8 and 14-18 are rejected under 35 U.S.C. 101 because the claims appear to be directed to a software embodiment and not to a hardware embodiment, whereas a machine claim is directed towards a system, apparatus, or arrangement. Paras. [0025] and [0026] of the published specification describe the elements of the system as implemented in software alone actualizing the embodiments of the invention. The claimed limitations are capable of being performed as software alone, as described in the above paragraphs, since no hardware component is being claimed. Software alone is not a physical component and thus is not statutory, since software does not define any structural and functional interrelationships between the computer programs and other claimed elements of a computer which permit the computer's program functionality to be realized. Hence, the stated functions comprise software and are thus not directed to a hardware embodiment. Data structures not claimed as embodied in computer readable media are descriptive material per se and are not statutory because they are not capable of causing functional change in the computer. See, e.g., Warmerdam, 33 F.3d at 1361, 31 USPQ2d at 1760 (claim to a data structure per se held non-statutory). Such claimed data structures do not define any structural and functional interrelationships between data and other claimed aspects of the invention which permit the data structure's functionality to be realized. In contrast, a claimed computer readable medium encoded with a data structure defines structural and functional interrelationships between the data structure and the computer software and hardware components which permit the data structure's functionality to be realized, and is thus statutory.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-12 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Onishi et al. (US 20150310877 A1), hereinafter Onishi, in view of Raanani et al. (US 20170187880 A1), hereinafter Raanani.

Regarding claim 1, Onishi teaches a speech section classification device comprising: a speech section estimation unit that estimates a speech section from speech text data including speeches of two or more people (Onishi: Paras. [0060], [0061]; Fig. 2 illustrates a call analysis server 10, which includes a recognition processing unit 21 (speech section estimation unit) that may detect the utterance sections of the operator and the customer in the voice data of the operator and the customer included in the call data. The recognition processing unit 21 acquires the start time and the end time of each utterance section and includes a voice recognition unit 27 which acquires voice text data from the voice call data); and a speech type estimation unit that estimates a speech type of each speech included in the speech section estimated by the speech section estimation unit (Onishi: Paras. [0059], [0062], [0063]; Fig. 2, recognition processing unit 21 includes a voice recognition unit 27, a specific expression table 28, and an emotion recognition unit 29. The specific expression table 28 contains specific expression data, and the emotion recognition unit 29 recognizes the emotion with respect to the voice data and can detect utterance sections of different types, such as "apology", "anger", and "normal").

Onishi, while teaching the device of claim 1, fails to explicitly teach the claimed "and a speech section classification unit that classifies the speech section estimated by the speech section estimation unit, using the speech type of each speech estimated by the speech type estimation unit and a speech section classification rule determined in advance as a rule for classifying speech sections based on the speech type". However, Raanani does teach this limitation (Raanani: Paras. [0042]-[0044]; Figs. 1, 5, process 500 illustrates analysis of conversations between participants. At block 505, analysis component 110 retrieves call data 105; at block 515, the classifier component 112 (speech section classification unit) analyzes the features extracted from call data 105 by feature generation component 111 to generate classifiers 120. Each of the classifiers indicates a specific outcome of the conversation (classified speech), such as "sales closed" or "sales failed". Para. [0033], the feature generation component 111 can generate a set of features that indicate a blueprint of a conversation (classification rules); the blueprint can include various features indicating whether the conversation included any agenda setting, rapport building, clarification questions, defining goals, or setting expectations). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Raanani's teaching of analyzing voice conversations between participants and coordinating calls between participants, in order to influence an outcome of the voice conversation, into the device, system and method of conversation analysis taught by Onishi, because this would efficiently coordinate calls between sales representatives and customers and would improve outcomes of the calls in various situations (Raanani, Paras. [0003], [0011]-[0014]).

Claim 7 is a method claim performing the steps of device claim 1 above; as such, claim 7 is similar in scope and content to claim 1 and is rejected under a similar rationale as presented against claim 1 above. Claim 8 is a program claim performing the steps of device claim 1 above; as such, claim 8 is similar in scope and content to claim 1 and is rejected under a similar rationale as presented against claim 1 above.

Regarding claim 2, Onishi in view of Raanani teaches the speech section classification device according to claim 1. Onishi further teaches "wherein, in the speech section classification rule, whether a specific speech type is included in the speech section, or a combination and order relationship of a plurality of speech types included in the speech section is defined" (Onishi: Paras. [0043], [0067]; Fig. 2, the transition detection unit 22 detects a plurality of predetermined transition patterns, which may include, for example, a pair of a type of the specific emotional state before the transition and a type of the specific emotional state after the transition). Claim 9 is a method claim performing the steps of device claim 2 above; as such, claim 9 is similar in scope and content to claim 2 and is rejected under a similar rationale as presented against claim 2 above. Claim 14 is a program claim performing the steps of device claim 2 above; as such, claim 14 is similar in scope and content to claim 2 and is rejected under a similar rationale as presented against claim 2 above.

Regarding claim 3, Onishi in view of Raanani teaches the speech section classification device according to claim 2. Onishi further teaches "wherein the speech text data includes a speech of an operator and a speech of a client" (Onishi: Paras. [0060], [0061]; Fig. 2, the recognition processing unit 21 detects the utterance sections of the operator and the customer in the call data and includes a voice recognition unit 27 which acquires voice text data from the voice call data), "and when the speech section includes a speech type indicating a speech of the client evaluating using a positive expression or a speech type indicating a speech of the client evaluating using a negative expression, the speech section classification rule classifies the speech section as a section including dissatisfaction or a demand of the client" (Onishi: Para. [0093]; Fig. 7 illustrates a conversation between a customer and an operator, showing the customer's speeches "label? where is that?" and "there are so many labels, I am confused" (negative), classified as "dissatisfaction"). Claim 10 is a method claim performing the steps of device claim 3 above; as such, claim 10 is similar in scope and content to claim 3 and is rejected under a similar rationale as presented against claim 3 above. Claim 15 is a program claim performing the steps of device claim 3 above; as such, claim 15 is similar in scope and content to claim 3 and is rejected under a similar rationale as presented against claim 3 above.

Regarding claim 4, Onishi in view of Raanani teaches the speech section classification device according to claim 2. Onishi further teaches "wherein the speech text data includes a speech of an operator and a speech of a client" (Onishi: Paras. [0060], [0061]; Fig. 2, as cited for claim 3 above), "and, when the speech section includes a speech type indicating a speech of the client and the operator regarding an issue" (Onishi: Para. [0093]; Fig. 7 illustrates a conversation between a customer and an operator, showing the customer's speech regarding an issue, "CD ROM is not working"), "and a speech with the speech type is added with any one of the following types: a speech type indicating a speech of a question by the client to the operator; [a speech type indicating a speech of the client expressing a request or a demand to the operator]; [a speech type indicating a speech of the client answering or explaining to the question of the operator]; [and a speech type indicating a speech of the client explaining a negative situation], the speech section classification rule classifies the speech section as a section including dissatisfaction or a demand of the client" (Onishi: Para. [0093]; Fig. 7 shows the customer's question to the operator, "label? where is it?", which is classified as "dissatisfaction"). Claim 11 is a method claim performing the steps of device claim 4 above; as such, claim 11 is similar in scope and content to claim 4 and is rejected under a similar rationale as presented against claim 4 above. Claim 16 is a program claim performing the steps of device claim 4 above; as such, claim 16 is similar in scope and content to claim 4 and is rejected under a similar rationale as presented against claim 4 above.

Regarding claim 5, Onishi in view of Raanani teaches the speech section classification device according to claim 2. Onishi further teaches "wherein the speech text data includes a speech of an operator and a speech of a client" (Onishi: Paras. [0060], [0061]; Fig. 2, as cited for claim 3 above), "and, when the speech section includes a speech type indicating a speech of the operator explaining a negative situation or a speech type indicating a speech of the operator using an expression for softening a negative circumstance" (Onishi: Para. [0062], the specific expression table 28 contains, for example, apology expression data such as "I am very sorry" (expression for softening a negative circumstance), which is in the "apology of the operator" section), "and any one of a speech type indicating a speech of a question by the client to the operator and a speech type indicating a speech of the client expressing a request or a demand to the operator is included within two speeches after a speech added with the speech type, the speech section classification rule classifies the speech section as a section including a dissatisfaction or a demand of the client" (Onishi: Para. [0093]; Fig. 7 shows the customer's question to the operator, "label? where is it?", which is classified as "dissatisfaction"; two speeches after "dissatisfaction", the customer's question to the operator, "this is it?", and later "now it's working", mark a transition from dissatisfaction to delight). Claim 12 is a method claim performing the steps of device claim 5 above; as such, claim 12 is similar in scope and content to claim 5 and is rejected under a similar rationale as presented against claim 5 above. Claim 17 is a program claim performing the steps of device claim 5 above; as such, claim 17 is similar in scope and content to claim 5 and is rejected under a similar rationale as presented against claim 5 above.

Claims 6, 13 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Onishi in view of Raanani, further in view of Tamura et al. (US 20130238321 A1), hereinafter Tamura.

Regarding claim 6, Onishi in view of Raanani teaches the speech section classification device according to claim 2, "wherein the speech text data includes a speech of an operator and a speech of a client" (Onishi: Paras. [0060], [0061]; Fig. 2, as cited for claim 3 above), "and when the speech section includes the speech type indicating a speech of the operator asking about the need of the client, and the speech type of the first asking of the need in the speech section is an end question, the speech section classification rule classifies the speech section as an end type sales section" (Onishi: Para. [0095]; Fig. 8 illustrates a conversation example in which the last part of the conversation between the customer and the operator indicates a transition from the normal state to the satisfaction (delight) state of the customer and is identified as an end point).

Onishi in view of Raanani fails to explicitly teach the claimed "and, when the speech section includes a speech type indicating a speech of the operator asking about a need of the client, and the speech type of the first asking of the need in the speech section is an open question, the speech section classification rule classifies the speech section as an open type sales section; when the speech section includes a speech type indicating a speech of the operator asking about the need of the client, and the speech type of the first asking of the need in the speech section is a theme question, the speech section classification rule classifies the speech section as a theme type sales section". However, Tamura does teach these limitations (Tamura: Para. [0102]; Fig. 9 illustrates a dialog text in which, at speech index 4, an operator asks "how can I help you", which is an open type sales question, and at speech indices 4-11, an operator assists a client with printer jamming, which can be a theme type (printer)). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Tamura's teaching of a conversation analysis device and method into the device, system and method taught by Onishi in view of Raanani, because using the conversation analysis technique could improve the identification accuracy of a section in which a person taking part in a conversation is expressing a specific emotion (Tamura, Para. [0010]). Claim 13 is a method claim performing the steps of device claim 6 above; as such, claim 13 is similar in scope and content to claim 6 and is rejected under a similar rationale as presented against claim 6 above. Claim 18 is a program claim performing the steps of device claim 6 above; as such, claim 18 is similar in scope and content to claim 6 and is rejected under a similar rationale as presented against claim 6 above.

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Onishi in view of Raanani, further in view of Siefken et al. (US 20200273089 A1), hereinafter Siefken. Regarding claim 19, Onishi in view of Raanani teaches the speech section classification device according to claim 1, but fails to explicitly teach the claimed "wherein a speech segment estimation unit estimates a speech segment using a speech segment estimation model in which the speech segment estimation model is a trained model that receives speech text data and outputs speech segment data". However, Siefken does teach this limitation (Siefken: Paras. [0061], [0068], [0070]; Fig. 1 illustrates a system for processing a customer order based on the conversation 105. The voice recognition system 125, integrated with the voice recognition engine 127 as an integrated voice recognition processor, processes the communication, accepts the text representation of the utterance, outputs a transcript based on the intents relating to the input, and transmits the transcription information to an order submission API. Para. [0088], the voice recognition system 125 is trained to recognize the utterance; for example, if the customer 101 orders "fries on the side," the system 125 may process the utterance as "fireside" and may be trained to recognize the utterance as relating to a specific menu item (a side order of fries)). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Siefken's teaching of a system for eatery ordering with a mobile interface and point of sale terminal into the device, system and method taught by Onishi in view of Raanani, because using a trained voice recognition engine could achieve an improved and efficient method and system for processing orders (Siefken, Paras. [0060], [0088]).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Onishi in view of Raanani, further in view of Medan et al. (US 20210158834 A1), hereinafter Medan. Regarding claim 20, Onishi in view of Raanani teaches the speech section classification device according to claim 1, but fails to explicitly teach the claimed "wherein a classifying model for utterance types is generated in advance by performing machine learning using utterance segment data with labels attached to each utterance as learning data". However, Medan does teach this limitation (Medan: Paras. [0039]-[0042]; Fig. 1, speech repository 102 is a collection of pathological speech samples/utterances with tags/metadata indicating the time interval and type of each pathological speech, and is used for the training of the speech/language pathologies classifier 110). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Medan's teaching of a method and system for creating a speech/language pathologies classifier into the device, system and method taught by Onishi in view of Raanani, because using large sets of high quality tagged pathological speech data could achieve an improved and efficient method and system for diagnosing and treating speech/language related pathologies (Medan, Paras. [0003], [0006]).

Conclusion
Listed below are prior arts made of record and not relied upon, but considered pertinent to applicant's disclosure.

Goronzy et al. (US 20050102135 A1) teaches an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and in the audio windows, wherein each important event extracted by the important event extraction means comprises a discrete sequence of cohesive audio fragments corresponding to an important event included in the audio signals.

Kurata et al. (US 20090228268 A1) teaches a system, method, and program product for processing voice data in a conversation between two persons to determine characteristic conversation patterns. The system includes: variation calculators for calculating variations of the speech ratios of a first and a second speaker; a difference calculator for calculating a difference data string; a smoother for generating a smoothed difference data string; and a presenter for presenting the difference between the variations of the speech ratios of the two speakers. The method includes: calculating a variation of a speech ratio of a first speaker and a second speaker; calculating a difference data string; generating a smoothed difference data string; and grouping them according to their patterns.

Arslan et al. (US 20150350438 A1) teaches ways of automatically and robustly evaluating agent performance, customer satisfaction, and campaign and competitor analysis in a call center, comprising an analysis consumer server, a call pre-processing module, a speech-to-text module, an emotion recognition module, a gender identification module and a fraud detection module.

Rule et al. (US 10958779 B1) teaches a process in which a server can receive a plurality of records at a database such that each record is associated with a phone call and includes at least one request generated based on a transcript of the phone call. The server can generate a training dataset based on the plurality of records and train a binary classification model using the training dataset. Next, the server can receive a live transcript of a phone call in progress, generate at least one live request based on the live transcript using a natural language processing module of the server, and provide the at least one live request to the binary classification model as input to generate a prediction. Lastly, the server can transmit the prediction to an entity receiving the phone call in progress; the prediction can cause a transfer of the call to a chatbot.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NADIRA SULTANA, whose telephone number is (571) 272-4048. The examiner can normally be reached M-F, 7:30 am-5:00 pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Paras D. Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NADIRA SULTANA/
Examiner, Art Unit 2653
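
Claims 19 and 20, addressed above over Siefken and Medan, both recite trained models with a plain supervised-learning interface: text in, segment or type labels out. The sketch below illustrates only that interface; scikit-learn and the toy data are the editor's assumptions standing in for the deep neural network the specification contemplates (paras. [0038], [0039]), not the applicant's implementation:

```python
# Sketch of the claim 20 setup: an utterance-type classifying model
# "generated in advance" from utterances with attached labels. A linear
# model stands in for the specification's deep neural network; all data
# here is made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labeled utterance segment data (hypothetical learning data).
utterances = [
    "I am very sorry about that",          # apology
    "The CD-ROM is not working",           # negative_explanation
    "Label? Where is that?",               # client_question
    "Could you send a replacement?",       # client_request
    "I apologize for the inconvenience",   # apology
    "It still does not work",              # negative_explanation
]
labels = ["apology", "negative_explanation", "client_question",
          "client_request", "apology", "negative_explanation"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(utterances, labels)          # training happens in advance

print(model.predict(["sorry for the trouble"]))  # likely: ['apology']
```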

Prosecution Timeline

May 31, 2024
Application Filed
Jan 22, 2026
Non-Final Rejection — §101, §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603086
CONTEXTUAL EDITABLE SPEECH RECOGNITION METHODS AND SYSTEMS
Granted Apr 14, 2026 — 2y 5m to grant
Patent 12591747
ENTITY-CONDITIONED SENTENCE GENERATION
Granted Mar 31, 2026 — 2y 5m to grant
Patent 12573413
AUDIO CODING METHOD AND RELATED APPARATUS, AND COMPUTER-READABLE STORAGE MEDIUM
Granted Mar 10, 2026 — 2y 5m to grant
Patent 12567420
METHOD AND APPARATUS FOR CONTROLLING SOUND RECEIVING DEVICE BASED ON DUAL-MODE AUDIO THREE-DIMENSIONAL CODE
Granted Mar 03, 2026 — 2y 5m to grant
Patent 12536992
ELECTRONIC DEVICE AND METHOD FOR PROVIDING VOICE RECOGNITION SERVICE
Granted Jan 27, 2026 — 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview: 99% (+31.1%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 97 resolved cases by this examiner. Grant probability derived from career allow rate.
