Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Introduction
This office action is in response to Applicant’s submission filed on 5/31/2024.
Claim Objections
Claims 1-20 are objected to because of the following informalities: In claim 1, lines 6-7, the claim should read “a speech type extraction unit that extracts a speech type of each speech included in the speech text data”.
In claim 8, lines 5-6, the claim should read “extracting a speech type of each speech included in the speech text data”.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 8 and 14-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
Claim 8 is drawn to a "program" per se as recited in the preamble ("A speech section extraction program" configured to cause a computer to execute as defined in the disclosure) and as such is non-statutory subject matter. See MPEP § 2106.IV.B.1.a. Data structures not claimed as embodied in computer-readable media are descriptive material per se and are not statutory because they are not capable of causing functional change in the computer. See, e.g., Warmerdam, 33 F.3d at 1361, 31 USPQ2d at 1760 (claim to a data structure per se held nonstatutory). Such claimed data structures do not define any structural and functional interrelationships between the data structure and other claimed aspects of the invention which permit the data structure's functionality to be realized. In contrast, a claimed computer-readable medium encoded with a data structure defines structural and functional interrelationships between the data structure and the computer software and hardware components which permit the data structure's functionality to be realized, and is thus statutory. Similarly, extraction programs claimed as computer listings per se, i.e., the descriptions or expressions of the programs, are not physical "things." They are neither computer components nor statutory processes, as they are not "acts" being performed. Such claimed extraction programs do not define any structural and functional interrelationships between the computer program and other claimed elements of a computer which permit the computer program's functionality to be realized. Applicant can overcome this rejection by amending claim 8 to recite "a non-transitory storage medium" and by amending "A speech section extraction program" to read "A speech section extraction program product."
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a device that, under the broadest reasonable interpretation, recites limitations that cover performance of the limitations in the human mind with the assistance of physical aids (e.g., pen and paper), but for the recitation of generic computer components. That is, other than reciting “a speech section identification unit”, “a speech portion type determination unit”, “a speech type extraction unit”, and “a speech section extraction unit”, nothing in these claim limitations precludes the steps from practically being performed in the mind. As a whole, claim 1 pertains to organizing, summarizing, and presenting information regarding a conversation or transcript, which is a mental process that a human can perform. Individually, each of the limitations also pertains to a mental process, for example:
identifies a speech section including at least one speech from speech text data including speeches of two or more people; (e.g., an identification step, in which a human reads a transcript and makes a note of who said what. This can also be done with the assistance of pen and paper.)
determines a speech section type of each of the speech sections identified by the speech section identification unit; (e.g., an identification/annotation step, in which the human determines what each segment is about, such as whether it is a question or an answer to a question, or whether it is a greeting type or a request type.)
extracts a speech type of each speech included in the speech text data from the speech text data; (e.g., an identification/annotation/labeling step, a human processing speech data and labeling the text as question, answer, greeting, or agreement/disagreement.)
extracts an important speech section among speech sections identified by the speech section identification unit, based on a combination and transition of the speech section types determined by the speech section type determination unit, and a combination and transition of the speech types extracted by the speech type extraction unit. (e.g., an evaluation/determination step, in which a human processes the speech data and notes that a segment is important based on the combination and transition of speech section types and the combination and transition of speech types. Humans are fully capable of looking at the conversation flow and determining that a transition point, such as agreement to disagreement, can be an important point/segment of the conversation depending on the context.)
The judicial exception is not integrated into a practical application. In particular, the claim only recites generic computing components. Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of receiving, determining, or outputting information) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 1 is not patent eligible.
The examiner further notes that the use of claimed generic computer components (“system”) invokes such generic computer components “merely as a tool to perform an existing process”. MPEP 2106.05(f). MPEP 2106.05(f) further explains:
Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Automotive, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit). Similarly, "claiming the improved speed or efficiency inherent with applying the abstract idea on a computer" does not integrate a judicial exception into a practical application or provide an inventive concept. Intellectual Ventures I LLC v. Capital One Bank (USA), 792 F.3d 1363, 1367, 115 USPQ2d 1636, 1639 (Fed. Cir. 2015).
Claim 1 recites generic computer components (“a speech section identification unit”, “a speech portion type determination unit”, “a speech type extraction unit”, “a speech section extraction unit”), with respect to performing tasks. MPEP 2106.05(d) and (f) further provides examples of court decisions where the courts found generic computing components to be mere instructions to apply a judicial exception, and further explains “increased speed” (e.g., using a computer to increase the speed of an otherwise mental process) does not provide an inventive concept. For example:
A commonplace business method or mathematical algorithm being applied on a general purpose computer, Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 223, 110 USPQ2d 1976, 1983 (2014); Gottschalk v. Benson, 409 U.S. 63, 64, 175 USPQ 673, 674 (1972); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015).
A process for monitoring audit log data that is executed on a general-purpose computer where the increased speed in the process comes solely from the capabilities of the general-purpose computer, FairWarning IP, LLC v. Iatric Sys., 839 F.3d 1089, 1095, 120 USPQ2d 1293, 1296 (Fed. Cir. 2016) (emphasis added).
Performing repetitive calculations. Bancorp Services v. Sun Life, 687 F.3d 1266, 1278, 103 USPQ2d 1425, 1433 (Fed. Cir. 2012) ("The computer required by some of Bancorp’s claims is employed only for its most basic function, the performance of repetitive calculations, and as such does not impose meaningful limits on the scope of those claims.")
Claim 7 recites a method that corresponds to the device of claim 1 and is therefore rejected under the same grounds as claim 1 above. Claim 7 is not patent eligible.
Claim 8 recites a program that corresponds to the device of claim 1 and is therefore rejected under the same grounds as claim 1 above. While claim 8 further recites a “computer”, this is merely a generic computer component recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Therefore, none of these limitations (a) integrate the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amount to significantly more than the judicial exception, because in either case the additional limitations merely utilize generic computer components that amount to no more than mere instructions to apply the exception using a generic computer function. Claim 8 is not patent eligible.
Claims (2-6, 19-20), 9-13, and 14-18 depend from independent claims 1, 7 and 8 respectively, do not remedy any of the deficiencies of claims 1, 7 and 8 respectively, and therefore are rejected on the same grounds as claims 1, 7 and 8 above.
Claims 2, 9, and 14 further recite the mental process of: wherein the speech section extraction unit extracts the important speech section using a speech section extraction rule in which a combination and a transition of the speech section type and a combination and a transition of the speech type are determined in advance in association with the important speech section. (e.g., a human identifies an important speech section using a rule book in which combinations/transitions of speech section types and speech types are determined.)
Claims 3, 10, and 15 further recite limitations reciting the mental process of: wherein the speech text data includes a speech of an operator and a speech of a client, the speech section is a plurality of speech sections each including a plurality of speeches, and when the speech type indicating a speech regarding sales performed by the operator for the client is included and a combination of speech section types indicating a plurality of continuous speech sections specified in advance as an unimportant section is not included, the speech section extraction rule extracts the plurality of speech sections as the important speech section. (e.g., a human can read a transcript between an operator and a client which discusses sales information and identify key sections as important.)
Claims 4, 11, and 16 further recite the mental process of: wherein the combination of the plurality of continuous speech sections specified in advance as the unimportant section is a combination that includes at least one of an "open type sales section" and an "end type sales section" among the "open type sales section", a "theme type sales section", and the "end type sales section", and does not include the "theme type sales section". (e.g., a human can listen to or read a transcript or recording, identify the opening/intro, theme/main pitch, and end/closing, and then note the open/intro and end/closing talk as not important.)
Claims 5, 12, and 17 further recite the mental process of: wherein the speech text data includes a speech of an operator and a speech of a client, and the speech section is a plurality of speech sections each including one speech, and when a speech type indicating a speech expressing that there is no need of the client when the operator conducts sales with the client is not included and a speech section type indicating a speech section in which the operator asks a question to the client before a speech type indicating a speech of the operator proposing to the client is included, the speech section extraction rule extracts one group including the plurality of speech sections as the important speech section. (e.g., a human can listen to or read a transcript or recording and use a rule book to capture an important section where the operator asks a question before making a proposal and where the client has not raised an objection or declined by mentioning they have no need for the product/service offered.)
Claims 6, 13, and 18 further recite the mental processes of: wherein the speech section includes a plurality of speech sections each including a plurality of speeches, and when a combination of speech section types indicating a plurality of continuous speech sections specified in advance as the important section is included, and a combination of speech types indicating speeches specified in advance in each of the plurality of continuous speech sections is included, the speech section extraction rule extracts the plurality of speech sections as the important speech section. (e.g., a human can listen to or read a recording/transcript, identify speech section types, look for speech types, and note them as important when a set of rules is met.)
Claim 19 further recites the processes of: further comprising an utterance segment type determination model, wherein the utterance segment type determination model is a trained model that receives utterance segment data and outputs utterance segment type data. (e.g., a human classifies speech/utterances into types/categories. The utterance segment type determination model is being treated as a generic computer device, and similar rationale has been provided in the independent claims regarding generic or conventional computer devices or components. Use of a generic computer component to automate classification does not make the claim eligible without significantly more.)
Claim 20 further recites the processes of: wherein the determination model for determining utterance segment types is generated in advance by performing machine learning using labeled learning data. (e.g., although the claim appears to disclose the application of a machine learning model, the description appears to be using a previously trained supervised machine learning model, and not a specific way of training a machine learning model; nor does the claim recite a specific structure of a machine learning model. The utterance segment type determination model described is being treated as a generic computer device, and similar rationale has been provided in the independent claims regarding generic or conventional computer devices or components. Use of a generic computer component to automate classification does not make the claim eligible without significantly more.)
In sum, claims (2-6, 19-20), 9-13, and 14-18 depend from claims 1, 7, and 8, respectively, and further recite mental processes as explained above. None of the additional limitations recited in claims (2-6, 19-20), 9-13, and 14-18 amount to anything more than the same or a similar abstract idea as recited in claims 1, 7, and 8, respectively. Nor do any limitations in claims (2-6, 19-20), 9-13, and 14-18: (a) integrate the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amount to significantly more than the judicial exception, because the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Claims (2-6, 19-20), 9-13, and 14-18 are not patent eligible.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a speech section identification unit”, “a speech portion type determination unit”, “a speech type extraction unit”, “a speech section extraction unit”, in claims 1-6, and 19-20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 9-13 and 14-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 9 recites the limitation “the speech section extraction unit” in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claim 14 recites the limitation “the speech section extraction unit” in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claims 12 and 13 refer to “…device according to claim 9”; however, claim 9 is a method claim. As a result, claims 12-13 contain two statutory classes, e.g., a method and a device, and thus are indefinite under 112(b) as the metes and bounds are unclear. Although it appears to be a typo, correction is required.
Similarly, claim 18 refers to “…device according to claim 15”; however, claim 15 is a program claim, making claim 18 an improper device claim dependent upon an improper program claim (see the 101 rejection for program per se). Hence, it is similarly rejected for the reasons above. Although it appears to be a typo, correction is required.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-4, 6-11, 13-16, and 18-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the Applicant-supplied reference Yamada (WO 2020/036190), with reference to the English machine translation provided.
Regarding Claim 1, Yamada discloses: 1. (Original) A speech section extraction device comprising: a speech section identification unit that identifies a speech section including at least one speech from speech text data including speeches of two or more people; ([0006] the point extraction device pertaining to the present invention is a point extraction device for extracting or classifying the main points in the dialogue, and the response scene estimation part for estimating the response scene of the utterance contained in the dialogue, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene.)
a speech section type determination unit that determines a speech section type of each of the speech sections identified by the speech section identification unit; ([0006] the point extraction device pertaining to the present invention is a point extraction device for extracting or classifying the main points in the dialogue, and the response scene estimation part for estimating the response scene of the utterance contained in the dialogue, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene. The Utterance Type Estimation Division, which estimates the speech type of the said utterance that is the subject of estimating the said utterance type according to the Speech Type Estimation Division, and the Utterance Content Extraction Judgment Section, which determines whether or not the utterance is presumed to be the subject of extraction or classification of a part of the utterance as speech point information based on the said utterance type, when the said utterance is presumed to be one of the speech types, and the Utterance Content Extraction Judgment Division, which determines whether or not the utterance is subject to extraction or classification of a part of the utterance as speech point information)
a speech type extraction unit that extracts a speech type of each speech included in the speech text data from the speech text data; ([0006] and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene. The Utterance Type Estimation Division, which estimates the speech type of the said utterance that is the subject of estimating the said utterance type according to the Speech Type Estimation Division, and the Utterance Content Extraction Judgment Section, which determines whether or not the utterance is presumed to be the subject of extraction or classification of a part of the utterance as speech point information based on the said utterance type, when the said utterance is presumed to be one of the speech types, and the Utterance Content Extraction Judgment Division, which determines whether or not the utterance is subject to extraction or classification of a part of the utterance as speech point information)
and a speech section extraction unit that extracts an important speech section among speech sections identified by the speech section identification unit, based on a combination and transition of the speech section types determined by the speech section type determination unit, and a combination and transition of the speech types extracted by the speech type extraction unit. ([0006] extracting or classifying the main points in the dialogue, and the response scene estimation part for estimating the response scene of the utterance contained in the dialogue, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene, and the utterance type estimation division for determining whether the said speech is the object of estimating the speech type based on the response scene. The Utterance Type Estimation Division, which estimates the speech type of the said utterance that is the subject of estimating the said utterance type according to the Speech Type Estimation Division, and the Utterance Content Extraction Judgment Section, which determines whether or not the utterance is presumed to be the subject of extraction or classification of a part of the utterance as speech point information based on the said utterance type, when the said utterance is presumed to be one of the speech types, and the Utterance Content Extraction Judgment Division, which determines whether or not the utterance is subject to extraction or classification of a part of the utterance as speech point information, If it is determined that a part of the utterance is extracted or classified as the main point information of the utterance, and a part of the utterance is extracted or classified as the main point information of the utterance from the said utterance based on the type of utterance, and it is determined that the said utterance is not the subject of extraction or classification of a part of the 
utterance as the main point information of the utterance, It is characterized by having a speech content extraction section that extracts or classifies the entire utterance as the main point information of the utterance.) [The text describes using the response scene, which implies a sequence/context of interaction, and the utterance type to decide whether a segment is a main point or speech point information; that is, the text uses the response scene and the utterance type as metrics to determine whether a section is an important point.]
Regarding Claim 2, Yamada discloses all of claim 1,
Yamada further discloses: wherein the speech section extraction unit extracts the important speech section using a speech section extraction rule in which a combination and a transition of the speech section type and a combination and a transition of the speech type are determined in advance in association with the important speech section. ([0029] The utterance type estimation unit extraction part 18 extracts the utterances of the utterance type estimation unit from the utterances judged to be the subject of the presumption of speech types by the utterance type estimation division 16. Specifically, the utterance type estimation unit extraction part 18 extracts the utterances of the utterance type estimation unit based on the rules memorized by the speech type estimation unit extraction rule memory part 17. As a rule, for example, there is a rule that the period or the last character in the unit of speech recognition result appears as a unit of utterance type estimation. Based on this rule, the utterances of the utterance type estimation unit extraction part 18 extracts the utterances of the utterance type estimation unit based on the string in which the utterances in the dialogue are textualized by speech recognition. In addition, in the Speech Type Estimation Unit Extraction Part 18, the utterances of the utterance type estimation unit may be extracted by the rule that the unit divided by punctuation marks is the utterance type estimation unit other than periods, for example.)
Regarding Claim 3, Yamada discloses all of the limitations of claim 2.
Yamada further discloses: wherein the speech text data includes a speech of an operator and a speech of a client, the speech section is a plurality of speech sections each including a plurality of speeches, and when the speech type indicating a speech regarding sales performed by the operator for the client is included and a combination of speech section types indicating a plurality of continuous speech sections specified in advance as an unimportant section is not included, the speech section extraction rule extracts the plurality of speech sections as the important speech section. ([0023] In the example shown in Figure 3, for example, in the classification definition, "subject utterance", which is the type of utterance, "inquiry grasping", which is the presumption target response scene, and "response", "contract confirmation", "opening", and "closing", which are non-presumption target response scenes, correspond. Classification definitions are generated based on the definition of the learning object used during training, for example. In the classification definition, among the definitions of the learning object, the response scene that includes positive or negative examples in the training data is considered to be the presumptive target response scene. In the classification definition, among the definitions of the learning object, the response scene in which only the negative example is included in the training data is considered to be the scene outside the presumptive target response. For example, when estimating whether the type of utterance is "subject utterance", utterances with a response scene of "understanding inquiry" are subject to estimation because they include positive examples or negative examples in the training data, and utterances in which the response scene is "contract confirmation", "correspondence", "opening", or "closing" are not subject to estimation because only negative examples are included in the training data.)
[The text describes mechanisms for mapping utterance types, which act as the speech section extraction rule to categorize conversations based on whether they are target (important) or non-target (unimportant) scenarios, which matches the logic of the claim.]
Regarding Claim 4, Yamada discloses all of the limitations of claim 3.
Yamada further discloses: wherein the speech text data includes a speech of an operator and a speech of a client, and the speech section is a plurality of speech sections each including one speech, and when a speech type indicating a speech expressing that there is no need of the client when the operator conducts sales with the client is not included and a speech section type indicating a speech section in which the operator asks a question to the client before a speech type indicating a speech of the operator proposing to the client is included, the speech section extraction rule extracts one group including the plurality of speech sections as the important speech section. ([0026] In addition, if the response scene is "opening" or "closing", it is determined that the estimated distribution of the speech type 16 is not the presumption target of the speech type because these response scenes are included in the response scenes that are not presumed to be subject to the classification definition and are not included in the estimated target response scenes.) [By specifically identifying "opening" and "closing" as the exclusion targets, the text implicitly excludes the "theme type" (middle) portion from this restriction, thus fulfilling the requirement that the excluded combination "does not include the theme type sales section".]
Regarding Claim 6, Yamada discloses all of the limitations of claim 2.
Yamada further discloses: wherein the speech section includes a plurality of speech sections each including a plurality of speeches, and when a combination of speech section types indicating a plurality of continuous speech sections specified in advance as the important section is included, and a combination of speech types indicating speeches specified in advance in each of the plurality of continuous speech sections is included, the speech section extraction rule extracts the plurality of speech sections as the important speech section. ([0024] the Speech Point Information Extraction Definition Memory Part 15 remembers the definition of speech point information extraction that corresponds the response scene, the presumed target speech type, and the speech content extraction method. The presumption target speech type is the speech type that is the target of estimation for the utterance of each response scene. The method of extracting speech content is information that indicates whether a part of the utterance is used as speech point information, the entire speech is used as speech point information, or one of several pre-classified speech point information that shows the main content of the speech. In the definition of extracting information from the main points of speech in the example shown in Figure 4, the response scene is "understanding the inquiry" and the presumed speech types are "subject utterance", "business utterance", and "business confirmation speech". In addition, "subject utterance", which is the type of speech to be presumed, and "part of the speech as speech point information" corresponds to the speech content extraction method, and "case speech" and "case confirmation speech" which are the presumption target speech types correspond to "speech summary information" as the speech content extraction method. 
In addition, "extraction of speech point information" is to extract speech point information from the speech that indicates the main content of the speech. "Classification of utterance point information" is the classification of utterances into one of several pre-classified utterance point information that indicates the main content of the utterance.) Also see [0050]: The response history memory section 25 remembers the response history, including the response scene estimated by the response scene estimation part 13, the speech type estimated by the speech type estimation part 20, and the key points such as the speech point information extracted or classified by the speech content extraction section 23, and the speech of the customer and the person in charge of the response. [The text's mention of storing dialogue scenes and utterance types together in a dialogue history supports the requirement of analyzing a plurality of continuous speech sections and combinations of speech types.]
Claim 7 recites a method that corresponds to the device of claim 1 and is therefore rejected on the same grounds as claim 1 above.
Regarding Claim 8, Yamada discloses: 8. (Original) A speech section extraction program that causes a computer to execute: ([0008] In addition, in order to solve the above problem, the program pertaining to the present invention functions a computer as a device for extracting the points described above.)
The remaining limitations of the claim recite the elements of claim 1; therefore, the rationale applied in the rejection of claim 1 is equally applicable.
Claims 9-11 and 13 are method claims that correspond to claims 2-4 and 6; therefore, they are rejected under a similar rationale.
Claims 14-16 are program claims that correspond to claims 2-4 and are therefore rejected under a similar rationale.
Claim 18, although it depends from a different base claim, nevertheless recites elements similar to those of claim 6; therefore, a similar rationale of rejection is applicable.
Regarding Claim 19, Yamada discloses all of the limitations of claim 1.
Yamada further discloses: further comprising an utterance segment type determination model, wherein the utterance segment type determination model is a trained model that receives utterance segment data and outputs utterance segment type data. ([0020] The response scene estimation model memory part 12 remembers the response scene estimation model generated by learning the correspondence between the utterance and the response scene. The response scene is a scene in a dialogue, for example, "opening" such as the first greeting, "understanding the inquiry" to understand the content of the inquiry, "contract confirmation" to confirm that the customer is the contractor and the contents of the contract, "corresponding" to answering and responding to the customer about the contents of the inquiry that has been grasped, and "closing" such as the final greeting. For learning, for example, a support vector machine (SVM) can be used.)
Regarding Claim 20, Yamada discloses all of the limitations of claim 19.
Yamada further discloses: wherein the determination model for determining utterance segment types is generated in advance by performing machine learning using labeled learning data. ([0022] FIG. 3 is a diagram showing an example of a sorting definition memorized by the sorting definition memory unit 14. The Classification Definition Memory Part 14 remembers the classification definition by mapping the type of utterance and the presumptive target response scene and the non-estimated target response scene, as shown in FIG. 3. The estimated target response scene is the response scene used as a positive or negative example in the training data. The presumptive non-target response scene is the response scene that is used as a negative example in the training data or is not subject to learning. Whether to use it as a negative example or not to be used as a learning object should be a setting that is predetermined at the time of learning, for example, adjusting the ratio of the number of positive cases and the number of negative examples to be the same.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5, 12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yamada, in view of Ring (US 20220027977).
Regarding Claim 5, Yamada discloses all of the elements of claim 2.
Yamada does not explicitly disclose the following feature.
Ring discloses: wherein the speech text data includes a speech of an operator and a speech of a client, and the speech section is a plurality of speech sections each including one speech, and when a speech type indicating a speech expressing that there is no need of the client when the operator conducts sales with the client is not included and a speech section type indicating a speech section in which the operator asks a question to the client before a speech type indicating a speech of the operator proposing to the client is included, the speech section extraction rule extracts one group including the plurality of speech sections as the important speech section. ([0048] Aspects of the present invention can automate the incorporation of questions, answers and/or responses (into interaction sequences) that are designed to learn information about the user, including information about the user's personal characteristics (e.g., demographics, background, needs, preferences, tastes, personality, and the like). These may include questions that are seemingly unrelated to the sale, but still may provide useable insight. For example, asking the user's favorite animal may lead to the insight that users having the same favorite animal are more likely to buy the same float. The system can determine how best to sell to that individual and/or to users with similar characteristics (e.g., demographic characteristics, personality characteristics, tastes, preferences or the like). This includes, for example, what kinds of questions, answers, and responses tend to lead toward higher sales, greater customer satisfaction, and/or other desired outcomes or metrics.) 
[The text discloses extracting sequences in which the operator or sales agent guides a client (who has not explicitly rejected the offer) via a discovery/proposal technique. This aligns with the claim logic of defining a rule for identifying an important conversation moment (a buying signal or need discovery), i.e., the point at which the operator/sales agent asks questions and gains insight into a potential customer before making a sales pitch, while the client/potential customer has not yet rejected the operator.]
Yamada and Ring are considered analogous art. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Yamada to incorporate the teachings of Ring, because the embodiments described therein show that asking questions and discovering the needs of the client/buyer is important to achieving successful sales (Ring, [0048]).
Claim 12 is a method claim that corresponds to claim 5; therefore, it is also rejected under a similar rationale.
Claim 17, although it depends from a different base claim, nevertheless recites elements similar to those of claim 5; therefore, a similar rationale of rejection is applicable.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Eisenzopf (US 20190171712) – discloses method for classifying conversation segments. See para 0038 and fig. 3 for additional details.
Yamamura, T., Hino, M., & Shimada, K. (2018). Dialogue act annotation and identification in a Japanese multi-party conversation corpus. In Proceedings of the fourth Asia Pacific corpus linguistics conference (pp. 529-536). – discloses annotation of dialogue acts for multiparty conversation. See Abstract and table 1, 4-5 and figure 1 for additional details.
Nakanishi, T., Okada, R., Tanaka, Y., Ogasawara, Y., & Ohashi, K. (2017, July). A topic extraction method on the flow of conversation in meetings. In 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) (pp. 351-356). IEEE. – discloses topic extraction in a meeting and importance of time series variation. See Abstract, and section III for additional details.
Bokaei, M. H., Sameti, H., & Liu, Y. (2016). Summarizing meeting transcripts based on functional segmentation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(10), 1831-1841. – discloses identifying most important utterance for each discourse segment. See fig. 1 and section III for additional details.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip H Lam whose telephone number is (571)272-1721. The examiner can normally be reached 9 AM-3 PM Pacific time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PHILIP H LAM/ Examiner, Art Unit 2656