Last updated: May 29, 2026
Application No. 18/485,726
SYSTEMS AND METHODS FOR DETERMINING SEMANTIC POINTS IN HUMAN-TO-HUMAN CONVERSATIONS

Status: Non-Final OA, §103
Filed: Oct 12, 2023
Priority: Oct 10, 2022 (IN 202241057971, +1 more)
Examiner: BOGGS JR., JAMES
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Samsung Electronics Co., Ltd.
OA Round: 3 (Non-Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 61% grant rate with a +35.9% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 112 resolved cases, 2023–2026
Examiner Intelligence

BOGGS JR., JAMES
Career Allowance Rate: 61% of resolved cases granted (68 granted / 112 resolved), -1.3% vs TC avg
Interview Lift: +35.9% (strong), measured on resolved cases with interview
Typical timeline: 3y 2m average prosecution; 13 currently pending
Career history: 136 total applications across all art units
Statute-Specific Performance

§101: 0.8% (-39.2% vs TC avg)
§103: 87.3% (+47.3% vs TC avg)
§102: 1.6% (-38.4% vs TC avg)
§112: 3.8% (-36.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 112 resolved cases.
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on April 3, 2026, has been entered.
Response to Arguments
Applicant’s arguments, filed April 3, 2026, regarding the 35 U.S.C. 103 rejections of claims 1 – 2, 4, 11, 13 – 14, 18 and 20 have been considered but they are not persuasive.
On page 9 of Applicant’s response, Applicant argues “At col. 4, lines 11-14, Kim discloses that "the dialog state tracking neural network utilizes a reset gate associated with the memory slot to generate a second value for the memory slot based on a second segment of the digital dialog". Kim thus discloses that each memory slot has a corresponding reset gate that is used to update that memory slot. Similarly, at col. 9, lines 60-66, Kim discloses that "the values of a memory slot at time t-1 can be used in generating the new values for the memory slot at time t. In other words, the values generated by processing previous segments of digital dialog can impact the current values obtained by processing the current segment of digital dialog".  Kim discloses that the reset gate may be used to update a corresponding memory slot. Similarly, Kim also discloses that the content of a memory slot at a time t-1 may be used to generate the content of the memory slot for a time t. However, Kim merely discloses that the reset gate is used to update a particular memory slot. Kim does not suggest that the reset gate indicates a memory to be updated. Kim discloses that each memory slot is updated based on the previous content of that memory slot. Kim therefore has no need of an indication of which memory to update, because each memory is updated based on its own content. Kim therefore does not remedy the deficiencies of Hakkani-Tur.”.
However, Kim et al. (US Patent No. 11,657,802) recites, in column 4, lines 4-21, "To provide an example, in one or more embodiments, the dialog state tracking system provides a digital dialog to a dialog state tracking neural network having a dynamic memory architecture that includes a plurality of memory slots and reset gates. The dialog state tracking system uses the neural network to generate a first value of a memory slot based on a first segment of the digital dialog. Subsequently, the dialog state tracking neural network utilizes a reset gate associated with the memory slot to generate a second value for the memory slot based on a second segment of the digital dialog. In some embodiments, the dialog state tracking neural network generates the second value of the memory slot by further using an update gate associated with the first memory slot. In some embodiments, the dialog state tracking neural network uses the reset gate and the update gate to generate the second value based on cross-slot interactions between the memory slot and other memory slots in the dynamic memory architecture.", disclosing “deriving, for each dialogue turn, a transient state, based on the one or more NL attributes, the transient state indicating a memory to be updated with a corresponding NL attribute”, where a memory slot reads on a memory and the dialog state tracking neural network processing a segment of dialog and utilizing a reset gate and an update gate to generate values for memory slots and determine how memory slots are to be updated reads on the transient state indicating a memory to be updated with a corresponding NL attribute. Identifying a memory slot to update does not depend on the source of the content for the memory slot update.
Therefore, the rejections of Claims 1 – 2, 4, 11, 13 – 14, 18 and 20 under 35 U.S.C. 103 are maintained.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitations are: 
“an identifying module configured to identify” in claim 14
“a natural language (NL) attribute generator module configured to determine” in claim 14
“a transient state estimator module configured to derive” in claim 14
“a conversation nuance (CN) classifier module configured to derive” in claim 14
“a turn memory update module configured to dynamically store” in claim 14
“a hierarchical semantic point (HSP) module configured to determine” in claim 14
“a search module configured to provide a search result” in claim 14.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.  Corresponding structure sufficient to perform the claimed functions is found in the specification in paragraph 0065, “In an embodiment, the processor/controller 202 may be operatively coupled to each of the I/O interface 204, the modules 206, the transceiver 208 and the memory 210. In one embodiment, the processor/controller 202 may include at least one data processor for executing processes in Virtual Storage Area Network. The processor/controller 202 may include specialized processing units such as, integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc. In one embodiment, the processor/controller 202 may include a central processing unit (CPU), a graphics processing unit (GPU), or both. The processor/controller 202 may be one or more general processors, digital signal processors, application-specific integrated circuits, field-programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data. The processor/controller 202 may execute a software program, such as code generated manually (i.e., programmed) to perform the desired operation.”.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 2, 4, 11, 13 – 14, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hakkani-Tur et al. (US Patent No. 10,181,322), hereinafter Hakkani-Tur, in view of Kim et al. (US Patent No. 11,657,802), hereinafter Kim, Rosset et al. ("Multi-level information and automatic dialog act detection in human-human spoken dialogs"), hereinafter Rosset, and Bangalore (US Patent No. 9,214,157).
Regarding claim 1, Hakkani-Tur discloses a method for determining semantic points in a human-to-human conversation, the method comprising:
identifying the human-to-human conversation, comprising a plurality of dialogue turns, on an electronic device (Column 2, lines 1-4, "Embodiments described in the present disclosure provide for a multi-user, multi-domain dialog system including a conversation processing device in communication with a client device."; Column 4, lines 20-26, "The client device includes one or more input devices that collect speech and, optionally, additional inputs from the users 108a, 108b. The users are the human participants in the conversation. The client device includes one or more input devices that receive turns from the conversation between the users and the dialog system as conversational inputs.");
determining, for each dialogue turn of the plurality of dialogue turns, one or more natural language (NL) attributes (Column 4, line 66 - Column 5, line 2, "The language understanding module disassembles and parses the text. The text is converted into semantic representations that may be understood and processed by a machine."; Column 5, lines 33-37, "The feature extraction component extracts lexical and contextual features from the computer-addressed conversational input and conversational inputs occurring prior to the current computer-addressed conversational input (i.e., prior conversational inputs) for use in domain detection."; Extracting lexical and contextual features from the conversational input reads on determining natural language (NL) attributes.);
dynamically storing, at one or more memories, after each dialogue turn, information associated with the human-to-human conversation based on the one or more NL attributes, the transient state (Column 6, line 66 - Column 7, line 4, "The dialog manager preserves the dialog state 250 of the conversation. The dialog state is stored in a dialog state memory (e.g., a dialog state database). In general, the dialog state includes everything that happens during the conversation such as, but not limited to, the conversational input history."; Storing a dialog state in a dialog state memory, where the dialog state includes everything that happens during the conversation, reads on dynamically storing information associated with the human-to-human conversation and the transient state.);
determining one or more semantic relations [and associated dialogue timelines] within the human-to-human conversation based on the dynamically stored information (Column 6, lines 32-39, "The semantic representation component converts the computer-addressed conversational input into a domain-specific semantic representation based on the domain assigned to the computer-addressed conversational input by the domain detection component. A semantic ontology 242 for each domain that includes domain-specific intents 244 and slots 246 is defined to describe possible user requests within the domain."; Determining a domain-specific semantic representation of a conversational input reads on determining a semantic relation within the human-to-human conversation.);
generating one or more semantic points corresponding to the determined one or more semantic relations [and the associated dialogue timelines] within the human-to-human conversation (Column 6, lines 32-51, "The semantic representation component converts the computer-addressed conversational input into a domain-specific semantic representation based on the domain assigned to the computer-addressed conversational input by the domain detection component. A semantic ontology 242 for each domain that includes domain-specific intents 244 and slots 246 is defined to describe possible user requests within the domain. The semantic representation component determines the action a user wants a computer to take or the information the user would like to obtain from the computer-addressed conversational input and generates the appropriate intent defined by the semantic ontology. Examples of intents include, but are not limited to, start over, go back, find information, find content, and play content. For example, the semantic ontology may define semantic frames 248 associated with an intent to search for information about movies using slots such as, but not limited to, director, actor, genre, release date, rating and about restaurants using slots such as, but not limited to, restaurant name, cuisine, restaurant location, address, phone number, and service type."; Determining intents from a semantic representation of a conversational input reads on generating semantic points corresponding to the determined semantic relations within the human-to-human conversation.);
and in response to a user search query, providing a search result to the user based on the generated one or more semantic points (Column 7, lines 6-20, "When a computer-addressed conversational input occurs, a response generator 252 performs a dialog action 254 associated with the semantic representation of the conversational input and generates a response for presentation to the users. Examples of dialog actions include, but are not limited to, executing an informational query against a knowledgebase or other data system (e.g., get a list of recent movies of a selected genre staring a selected actor from a movie database), executing a transactional query to invoke a supported application (e.g., play a media file using a supported media player or submit a query to web search engine using a supported web browser), and executing a navigational query (e.g., start over or go back) against the dialog system to navigate through the dialog state."; A conversational input including an informational query reads on a user search query, and performing a dialog action associated with the semantic representation of the conversational input and generating a response for presentation to the user by submitting a query to a web search engine reads on and providing a search result to the user based on one or more semantic points.).
Hakkani-Tur does not specifically disclose: deriving, for each dialogue turn, a transient state, based on the one or more NL attributes, the transient state indicating a memory to be updated with a corresponding NL attribute.
Kim teaches:
deriving, for each dialogue turn, a transient state, based on the one or more NL attributes, the transient state indicating a memory to be updated with a corresponding NL attribute (Column 4, lines 4-21, "To provide an example, in one or more embodiments, the dialog state tracking system provides a digital dialog to a dialog state tracking neural network having a dynamic memory architecture that includes a plurality of memory slots and reset gates. The dialog state tracking system uses the neural network to generate a first value of a memory slot based on a first segment of the digital dialog. Subsequently, the dialog state tracking neural network utilizes a reset gate associated with the memory slot to generate a second value for the memory slot based on a second segment of the digital dialog. In some embodiments, the dialog state tracking neural network generates the second value of the memory slot by further using an update gate associated with the first memory slot. In some embodiments, the dialog state tracking neural network uses the reset gate and the update gate to generate the second value based on cross-slot interactions between the memory slot and other memory slots in the dynamic memory architecture."; Column 4, lines 37-41, "When the dialog state tracking neural network processes a segment of digital dialog, the dialog state tracking neural network generates new values for one or more of the memory slots in order to generate a new digital dialog state corresponding to the segment."; Column 9, lines 46-54, "As shown in FIG. 3, the dialog state tracking neural network 300 can take a sequence of digital dialog segments (e.g., utterances) from u_{t−w+1} to u_t as the input 302 at time step t where w represents a history window. In other words, u_t represents a current segment of digital dialog and the sequence from u_{t−w+1} to u_{t−1} represent previous segments of digital dialog within a selected time window that the dynamic memory network uses in determining the current digital dialog state (i.e., the digital dialog state at time t)."; Determining the current dialog state based on a sequence of utterances reads on deriving a transient state for each dialog turn based on natural language attributes, and the dialog state tracking neural network processing a segment of dialog and utilizing a reset gate and an update gate to generate values for memory slots and determine how memory slots are to be updated reads on the transient state indicating a memory to be updated with a corresponding NL attribute.).
Kim is considered to be analogous to the claimed invention because it is in the same field of natural language understanding.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hakkani-Tur to incorporate the teachings of Kim to determine the current dialog state based on a sequence of utterances and implement a dialog state tracking neural network processing a segment of dialog and utilizing a reset gate and an update gate to generate values for memory slots and determine how memory slots are to be updated.  Doing so would allow for implementing dialog state tracking that generates dialog states based on all relevant previous segments of dialog (Kim; Column 5, lines 24-40).
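For orientation only, the gated memory-slot update that the cited Kim passages describe in prose can be pictured with a GRU-style recurrence. The following is a minimal sketch under stated assumptions: the scalar slot values, the weight names (`wr`, `ur`, `wz`, `uz`, `wh`, `uh`), and the gate equations are hypothetical and are not taken from Kim or from the claims. The cross-slot interactions that Kim mentions are omitted.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def update_slot(prev: float, segment: float, w: dict) -> float:
    """GRU-style update of one scalar memory slot (illustrative only)."""
    # Reset gate: how much of the previous slot value feeds the candidate.
    r = sigmoid(w["wr"] * segment + w["ur"] * prev)
    # Update gate: how much of the slot is overwritten by the candidate.
    z = sigmoid(w["wz"] * segment + w["uz"] * prev)
    # Candidate value built from the current segment and the gated history.
    candidate = math.tanh(w["wh"] * segment + w["uh"] * (r * prev))
    # New slot value is a blend of the old value and the candidate.
    return (1.0 - z) * prev + z * candidate

def track(segments: list, n_slots: int, weights: list) -> list:
    """Process dialog segments in order; each slot is updated per segment."""
    slots = [0.0] * n_slots
    for seg in segments:
        slots = [update_slot(slots[i], seg, weights[i]) for i in range(n_slots)]
    return slots
```

In this sketch every slot is updated from its own previous value at every step, and the gates only scale how much changes. Whether such gating amounts to "indicating a memory to be updated" is the point in dispute between Applicant and the Examiner; the code takes no position on it.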
Hakkani-Tur in view of Kim does not specifically disclose: deriving, for each dialogue turn, one or more conversation nuances associated with the human-to-human conversation, based on the one or more NL attributes; dynamically storing, at one or more memories, after each dialogue turn, the one or more conversation nuances associated with each dialogue turn.
Rosset teaches:
deriving, for each dialogue turn, one or more conversation nuances associated with the human-to-human conversation, based on the one or more NL attributes (Section 4, lines 1-8, "A dialog can be divided into segments called turns, in which a single speaker has temporary control of the dialog and speaks for some period of time. Within a turn, the speaker may produce one or more utterances units where the definition of an utterance unit is based on an analysis of the speaker’s intention (the dialog acts). Once a turn is segmented into units which cover a single intention, these are annotated with dialog acts."; Section 4, lines 28-34, "In this study, two of the five broad classes have been further subdivided so as to allow multiple tags to be specified for each utterance unit: the Forward-looking function class was split into two subclasses (Statement and Influence-on-Listener), and the Backward-looking function class was divided into three subclasses (Agreement, Answer and Understanding)."; Determining subclasses of dialog acts by analyzing the speaker’s intention for turns of a dialog reads on deriving conversation nuances associated with the human-to-human conversation for each dialog turn based on natural language attributes.);
dynamically storing, at one or more memories, after each dialogue turn, the one or more conversation nuances associated with each dialogue turn (Section 4, lines 28-34, "In this study, two of the five broad classes have been further subdivided so as to allow multiple tags to be specified for each utterance unit: the Forward-looking function class was split into two subclasses (Statement and Influence-on-Listener), and the Backward-looking function class was divided into three subclasses (Agreement, Answer and Understanding)."; Section 6.2, lines 57-60, "Concerning the second dialog history hypothesis, if the dialog acts of a turn have an incidence on the dialog acts of the next turn, then, it seems useful to capture a larger dialog history."; Subclasses of dialog acts read on conversation nuances associated with each dialogue turn, and capturing dialog history reads on storing the conversation nuances in memory.).
Rosset is considered to be analogous to the claimed invention because it is in the same field of natural language understanding.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hakkani-Tur in view of Kim to incorporate the teachings of Rosset to determine subclasses of dialog acts by analyzing the speaker’s intention for turns of a dialog.  Doing so would allow for automatically modeling discourse structure to develop more sophisticated spoken dialog systems (Rosset; Section 1, lines 1-12).
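As an illustration of the annotation scheme quoted from Rosset, the sketch below models turns, utterance units, and per-subclass tags, plus a helper that returns the tags of the most recent turns as a dialog history window. The class names, the `history` helper, and any tag values are hypothetical; only the five subclass names come from the cited passage.

```python
from dataclasses import dataclass, field

# Subclass names as listed in the cited Rosset passage.
SUBCLASSES = ("Statement", "Influence-on-Listener",
              "Agreement", "Answer", "Understanding")

@dataclass
class UtteranceUnit:
    text: str
    tags: dict = field(default_factory=dict)  # subclass name -> tag value

    def tag(self, subclass: str, value: str) -> None:
        # Several tags may be attached to the same unit, one per subclass.
        if subclass not in SUBCLASSES:
            raise ValueError(f"unknown subclass: {subclass}")
        self.tags[subclass] = value

@dataclass
class Turn:
    speaker: str
    units: list = field(default_factory=list)

def history(turns: list, n: int) -> list:
    """Return the tags of the last n turns (a dialog history window)."""
    start = max(len(turns) - n, 0)
    return [[dict(u.tags) for u in t.units] for t in turns[start:]]
```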
Hakkani-Tur in view of Kim and Rosset does not specifically disclose:
determining associated dialogue timelines within the human-to-human conversation based on the dynamically stored information; and generating one or more semantic points corresponding to the associated dialogue timelines within the human-to-human conversation.
Bangalore teaches:
determining associated dialogue timelines within the human-to-human conversation based on the dynamically stored information (Column 7, line 46 - Column 8, line 3, "FIG. 3 illustrates an example timeline of a sample natural language conversation between a user and a natural language processing system. In this example, time progresses from left to right. Thus, at point A, the user says to the system “I need a plumber”. This initial utterance can help the system form a context for the conversation. For example, the system can form the initial context that extends beyond the phrase “I need a plumber” to include other related phrases, speech recognition grammars, and so forth. Then at point B, the user says “near Springfield”. The system can determine that “near Springfield” is within a threshold distance of the on-going conversation context, so the system updates the context with that speech.  At point C, the user says “Honey, the phone's ringing!” The system can compare this utterance to the current, on-going conversation context. In this case, the utterance is unrelated to the current context, i.e. is outside a threshold distance from the conversation context. Thus, the system can ignore this utterance and continue to monitor user utterances. At point D, the user continues and says “that specializes in remodeling bathrooms”. The system can determine that this utterance is a continuation of the current context, parse the combination of the speech at points A, B, and D to generate a query. The system can then generate a response to the query and output that response to the user."; Determining if dialog is within a threshold distance of on-going conversation context based on a timeline of a natural language conversation reads on determining associated dialogue timelines.);
and generating one or more semantic points corresponding to the associated dialogue timelines within the human-to-human conversation (Column 7, line 46 – Column 8, line 3, "FIG. 3 illustrates an example timeline of a sample natural language conversation between a user and a natural language processing system. In this example, time progresses from left to right. Thus, at point A, the user says to the system “I need a plumber”. This initial utterance can help the system form a context for the conversation. For example, the system can form the initial context that extends beyond the phrase “I need a plumber” to include other related phrases, speech recognition grammars, and so forth. Then at point B, the user says “near Springfield”. The system can determine that “near Springfield” is within a threshold distance of the on-going conversation context, so the system updates the context with that speech.  At point C, the user says “Honey, the phone's ringing!” The system can compare this utterance to the current, on-going conversation context. In this case, the utterance is unrelated to the current context, i.e. is outside a threshold distance from the conversation context. Thus, the system can ignore this utterance and continue to monitor user utterances. At point D, the user continues and says “that specializes in remodeling bathrooms”. The system can determine that this utterance is a continuation of the current context, parse the combination of the speech at points A, B, and D to generate a query. The system can then generate a response to the query and output that response to the user."; Parsing a combination of speech at multiple points of a natural language conversation to generate a query reads on generating a semantic point corresponding to the associated dialogue timelines within the human-to-human conversation.).
Bangalore is considered to be analogous to the claimed invention because it is in the same field of natural language understanding.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hakkani-Tur in view of Kim and Rosset to incorporate the teachings of Bangalore to determine if dialog is within a threshold distance of on-going conversation context based on a timeline of a natural language conversation and parse a combination of speech at multiple points of a natural language conversation to generate a query.  Doing so would allow for modeling a dialog that has been transpiring between two human users and discard audio which is not compatible with the dialog context (Bangalore; Column 1, line 66 - Column 2, line 21).
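The threshold test described in the quoted Bangalore passage can be sketched as follows, using the plumber example from FIG. 3. This is a minimal sketch under stated assumptions: the lexicon, the distance measure (one minus the share of an utterance's words found in the lexicon), and the 0.5 threshold are all hypothetical, and the context is held fixed rather than updated after each utterance as Bangalore describes.

```python
import re

# Hypothetical lexicon standing in for the "related phrases" of the context.
PLUMBING_CONTEXT = {"need", "plumber", "near", "springfield", "specializes",
                    "remodeling", "bathrooms", "repair", "pipe"}

def tokens(utterance: str) -> list:
    """Lowercase word tokens; apostrophes are kept inside words."""
    return re.findall(r"[a-z']+", utterance.lower())

def distance(utterance: str, context: set) -> float:
    """Assumed measure: 1 minus the share of words found in the context."""
    words = tokens(utterance)
    if not words:
        return 1.0
    hits = sum(1 for w in words if w in context)
    return 1.0 - hits / len(words)

def build_query(timeline: list, context: set, threshold: float = 0.5) -> str:
    """Keep utterances within the threshold and join them into one query."""
    kept = [u for u in timeline if distance(u, context) <= threshold]
    return " ".join(kept)
```

With the four utterances at points A through D as the timeline, the utterance at point C has no words in the assumed lexicon, so its distance is 1.0 and it is dropped, while A, B, and D are combined into the query.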
Regarding claim 2, Hakkani-Tur in view of Kim, Rosset, and Bangalore discloses the method as claimed in claim 1.
Hakkani-Tur further discloses:
wherein the one or more NL attributes comprises at least one of an intent, dialogue act, a named entity, and a relation among the one or more NL attributes from the plurality of dialogue turns from the human-to-human conversation (Column 6, lines 57-62, "The semantic representation component estimates the intent of the computer-addressed conversational input, selects a semantic frame associated with the intent, and maps the entities extracted from the computer-addressed conversational input to the corresponding slots to fill the semantic frame.").
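The sequence quoted from Hakkani-Tur (estimate the intent, select the semantic frame associated with it, map extracted entities to the frame's slots) can be illustrated with a small sketch. The ontology contents, the cue words, and the function names below are hypothetical; the slot names merely echo the examples given in the reference, and entity extraction is assumed to have happened upstream.

```python
# Hypothetical ontology: each intent names the slots of its semantic frame.
ONTOLOGY = {
    "find_movie": ["director", "actor", "genre"],
    "find_restaurant": ["cuisine", "restaurant_location"],
}

# Hypothetical cue words used to estimate the intent.
INTENT_CUES = {
    "find_movie": {"movie", "movies", "film"},
    "find_restaurant": {"restaurant", "eat", "dinner"},
}

def estimate_intent(text: str):
    """Pick the intent whose cue words overlap the input most, if any."""
    words = set(text.lower().split())
    scores = {intent: len(words & cues) for intent, cues in INTENT_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def fill_frame(text: str, entities: dict):
    """Select the frame for the estimated intent and fill its slots."""
    intent = estimate_intent(text)
    if intent is None:
        return None
    # Slots with no matching entity stay empty (None).
    slots = {slot: entities.get(slot) for slot in ONTOLOGY[intent]}
    return {"intent": intent, "slots": slots}
```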
Regarding claim 4, Hakkani-Tur in view of Kim, Rosset, and Bangalore discloses the method as claimed in claim 1.
Rosset further teaches:
wherein deriving the one or more conversation nuances for each dialogue turn comprises generating one or more labels for each dialogue turn to model a level of uncertainty in the human-to-human conversation (Section 4, lines 3-19, "Within a turn, the speaker may produce one or more utterances units where the definition of an utterance unit is based on an analysis of the speaker’s intention (the dialog acts). Once a turn is segmented into units which cover a single intention, these are annotated with dialog acts. Annotation involves making choices along several dimensions, each one describing a different orthogonal aspect of the utterance unit. The dialog acts represent different aspects of an utterance. For instance, one dimension characterizes the effect an utterance has on the other speaker, such as a request for information or the making of a statement. Another dimension shows that a speaker has understood what has been said to him or her. A dialog act represents a value along one of the dimensions, often referred to as a tag. The utterance tags summarize the intentions of the speaker and the content of the utterance unit."; Dialog acts representing different aspects of an utterance and representing a dialog act as a tag that is a value along one of the dimensions reads on generating one or more labels for each dialogue turn to model a level of uncertainty in the human-to-human conversation.).
Rosset is considered to be analogous to the claimed invention because it is in the same field of natural language understanding.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hakkani-Tur in view of Kim, Rosset, and Bangalore to further incorporate the teachings of Rosset to determine dialog acts representing different aspects of an utterance and representing a dialog act as a tag that is a value along one of the dimensions.  Doing so would allow for automatically modeling discourse structure to develop more sophisticated spoken dialog systems (Rosset; Section 1, lines 1-12).
Regarding claim 11, Hakkani-Tur in view of Kim, Rosset, and Bangalore discloses the method as claimed in claim 1.
Hakkani-Tur further discloses:
determining an NL representation along with a range of the associated plurality of dialogue turns for a user of the human-to-human conversation based on the generated one or more semantic points (Column 6, line 66 - Column 7, line 4, "The dialog manager preserves the dialog state 250 of the conversation. The dialog state is stored in a dialog state memory (e.g., a dialog state database). In general, the dialog state includes everything that happens during the conversation such as, but not limited to, the conversational input history."; Column 6, lines 32-51, "The semantic representation component converts the computer-addressed conversational input into a domain-specific semantic representation based on the domain assigned to the computer-addressed conversational input by the domain detection component. A semantic ontology 242 for each domain that includes domain-specific intents 244 and slots 246 is defined to describe possible user requests within the domain. The semantic representation component determines the action a user wants a computer to take or the information the user would like to obtain from the computer-addressed conversational input and generates the appropriate intent defined by the semantic ontology. Examples of intents include, but are not limited to, start over, go back, find information, find content, and play content. 
For example, the semantic ontology may define semantic frames 248 associated with an intent to search for information about movies using slots such as, but not limited to, director, actor, genre, release date, rating and about restaurants using slots such as, but not limited to, restaurant name, cuisine, restaurant location, address, phone number, and service type."; Determining possible user requests reads on determining a natural language representation, the dialog states of a conversational input history read on a range of the associated plurality of dialogue turns, and the intent determined from the semantic representation read on semantic points.).
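As an illustration of the slot-filling behavior quoted from Hakkani-Tur above, the following minimal Python sketch selects a semantic frame for an intent and maps extracted entities to its slots. The `ONTOLOGY` table, the intent names, and the `fill_frame` helper are hypothetical names chosen for illustration; only the slot names are taken from the quoted passage, and the reference itself does not disclose this code.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical domain ontology: intent name -> slot names.
# Slot names follow the movie and restaurant examples in the quoted passage.
ONTOLOGY = {
    "find_movie": ("director", "actor", "genre", "release_date", "rating"),
    "find_restaurant": ("restaurant_name", "cuisine", "restaurant_location",
                        "address", "phone_number", "service_type"),
}

@dataclass
class SemanticFrame:
    intent: str
    slots: Dict[str, Optional[str]]

def fill_frame(intent: str, entities: Dict[str, str]) -> SemanticFrame:
    """Select the frame for `intent` and copy matching entities into its slots."""
    slot_names = ONTOLOGY[intent]
    # Entities with no corresponding slot are dropped; unfilled slots stay None.
    return SemanticFrame(intent, {name: entities.get(name) for name in slot_names})

frame = fill_frame("find_movie", {"genre": "comedy", "city": "Springfield"})
print(frame.slots)
```

In this sketch the intent alone determines which slots exist, so an extracted entity such as `city` that the frame does not define is simply not mapped.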
Regarding claim 13, Hakkani-Tur in view of Kim, Rosset, and Bangalore discloses the method as claimed in claim 1.
Bangalore further teaches:
generating the dialogue timelines based on one or more update points, after each dialogue turn, associated with the dynamically storing of the information associated with the human-to-human conversation (Column 7, line 46 – Column 8, line 3, "FIG. 3 illustrates an example timeline of a sample natural language conversation between a user and a natural language processing system. In this example, time progresses from left to right. Thus, at point A, the user says to the system “I need a plumber”. This initial utterance can help the system form a context for the conversation. For example, the system can form the initial context that extends beyond the phrase “I need a plumber” to include other related phrases, speech recognition grammars, and so forth. Then at point B, the user says “near Springfield”. The system can determine that “near Springfield” is within a threshold distance of the on-going conversation context, so the system updates the context with that speech.  At point C, the user says “Honey, the phone's ringing!” The system can compare this utterance to the current, on-going conversation context. In this case, the utterance is unrelated to the current context, i.e. is outside a threshold distance from the conversation context. Thus, the system can ignore this utterance and continue to monitor user utterances. At point D, the user continues and says “that specializes in remodeling bathrooms”. The system can determine that this utterance is a continuation of the current context, parse the combination of the speech at points A, B, and D to generate a query. The system can then generate a response to the query and output that response to the user."; Parsing a combination of speech at multiple points of a natural language conversation reads on generating the dialogue timelines based on update points.).
Bangalore is considered to be analogous to the claimed invention because it is in the same field of natural language understanding.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hakkani-Tur in view of Kim, Rosset, and Bangalore to further incorporate the teachings of Bangalore to parse a combination of speech at multiple points of a natural language conversation.  Doing so would allow for modeling a dialog that has been transpiring between two human users and discard audio which is not compatible with the dialog context (Bangalore; Column 1, line 66 - Column 2, line 21).
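The FIG. 3 timeline quoted from Bangalore can be sketched in a few lines of Python, assuming a caller-supplied distance measure between an utterance and the on-going context. The function `track_context`, the `toy_distance` stand-in, and the threshold value are hypothetical; Bangalore does not specify how the distance is computed, so the toy measure below simply marks one utterance as unrelated.

```python
from typing import Callable, List

def track_context(utterances: List[str],
                  distance: Callable[[List[str], str], float],
                  threshold: float) -> List[str]:
    """Return the utterances kept in the on-going conversation context."""
    context: List[str] = []
    for utterance in utterances:
        # The first utterance seeds the context; later ones must be close enough.
        if not context or distance(context, utterance) <= threshold:
            context.append(utterance)
        # Otherwise the utterance is ignored and monitoring continues.
    return context

# Toy stand-in for a real distance measure (assumption, not from Bangalore).
UNRELATED = {"Honey, the phone's ringing!"}

def toy_distance(context: List[str], utterance: str) -> float:
    return 1.0 if utterance in UNRELATED else 0.0

kept = track_context(
    ["I need a plumber", "near Springfield", "Honey, the phone's ringing!",
     "that specializes in remodeling bathrooms"],
    toy_distance, threshold=0.5)
query = " ".join(kept)
print(query)
```

With the toy measure, the utterances at points A, B, and D are combined into the query and the utterance at point C is discarded, matching the behavior described in the quoted passage.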
Regarding claim 14, arguments analogous to claim 1 are applicable.  In addition, Hakkani-Tur discloses a system for determining semantic points in a human-to-human conversation (Column 2, lines 1-4, "Embodiments described in the present disclosure provide for a multi-user, multi-domain dialog system including a conversation processing device in communication with a client device."), the system performing the steps of claim 1.
Regarding claim 18, arguments analogous to claim 11 are applicable.
Regarding claim 20, arguments analogous to claim 13 are applicable.
Allowable Subject Matter
Claims 3, 5 – 10, 12, 15 – 17 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 3, the primary reason claim 3 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, is the inclusion of the limitation “wherein deriving the transient state for each dialogue turn comprises assigning one of a temporary, confirmed, and ignored labels to each of the one or more NL attributes based on the plurality of dialogue turns of the human-to-human conversation” in combination with the limitations to identify a human-to-human conversation comprising a plurality of dialogue turns, determine one or more natural language attributes for each dialogue turn of the plurality of dialogue turns, derive a transient state for each dialogue turn based on the one or more natural language attributes, where the transient state indicates a memory to which a corresponding natural language attribute is determined to transition, derive one or more conversation nuances associated with the human-to-human conversation for each dialogue turn based on the one or more natural language attributes, store, at one or more memories, information associated with the human-to-human conversation after each dialogue turn based on the one or more natural language attributes, the transient state, and the one or more conversation nuances associated with each dialogue turn, determine one or more semantic relations and associated dialogue timelines within the human-to-human conversation based on the stored information, generate one or more semantic points corresponding to the determined one or more semantic relations and the associated dialogue timelines within the human-to-human conversation, and in response to a user search query, provide a search result to the user based on the generated one or more semantic points.
Regarding claims 5 – 10, the primary reason claims 5 – 10 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, is the inclusion, in all the claims, of the limitation “dynamically updating, at the one or more memories, after each dialogue turn, the stored information associated with the one or more NL attributes of the human-to-human conversation based on the transient state and the one or more conversation nuances associated with each dialogue turn, wherein the one or more memories include a user preference memory, a cache memory, and a final goal memory” in combination with the limitations to identify a human-to-human conversation comprising a plurality of dialogue turns, determine one or more natural language attributes for each dialogue turn of the plurality of dialogue turns, derive a transient state for each dialogue turn based on the one or more natural language attributes, where the transient state indicates a memory to which a corresponding natural language attribute is determined to transition, derive one or more conversation nuances associated with the human-to-human conversation for each dialogue turn based on the one or more natural language attributes, store, at one or more memories, information associated with the human-to-human conversation after each dialogue turn based on the one or more natural language attributes, the transient state, and the one or more conversation nuances associated with each dialogue turn, determine one or more semantic relations and associated dialogue timelines within the human-to-human conversation based on the stored information, generate one or more semantic points corresponding to the determined one or more semantic relations and the associated dialogue timelines within the human-to-human conversation, and in response to a user search query, provide a search result to the user based on the generated one or more semantic points.
Regarding claim 12, the primary reason claim 12 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, is the inclusion of the limitation “determining a compressed version of the one or more dialogue turns based on the one or more semantic points and the at least one dialogue turn, wherein the compressed version of the one or more dialogue turns is displayed on a user interface for a user of the human-to-human conversation” in combination with the limitations to identify a human-to-human conversation comprising a plurality of dialogue turns, determine one or more natural language attributes for each dialogue turn of the plurality of dialogue turns, derive a transient state for each dialogue turn based on the one or more natural language attributes, where the transient state indicates a memory to which a corresponding natural language attribute is determined to transition, derive one or more conversation nuances associated with the human-to-human conversation for each dialogue turn based on the one or more natural language attributes, store, at one or more memories, information associated with the human-to-human conversation after each dialogue turn based on the one or more natural language attributes, the transient state, and the one or more conversation nuances associated with each dialogue turn, determine one or more semantic relations and associated dialogue timelines within the human-to-human conversation based on the stored information, generate one or more semantic points corresponding to the determined one or more semantic relations and the associated dialogue timelines within the human-to-human conversation, and in response to a user search query, provide a search result to the user based on the generated one or more semantic points.
Regarding claims 15 – 17, the primary reason claims 15 – 17 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, is the inclusion, in all the claims, of the limitation “dynamically update, at the one or more memories, after each dialogue turn, the stored information associated with the one or more NL attributes of the human-to-human conversation based on the transient state and the one or more conversation nuances associated with each dialogue turn, and wherein the one or more memories include a user preference memory, a cache memory, and a final goal memory” in combination with the limitations to identify a human-to-human conversation comprising a plurality of dialogue turns, determine one or more natural language attributes for each dialogue turn of the plurality of dialogue turns, derive a transient state for each dialogue turn based on the one or more natural language attributes, where the transient state indicates a memory to which a corresponding natural language attribute is determined to transition, derive one or more conversation nuances associated with the human-to-human conversation for each dialogue turn based on the one or more natural language attributes, store, at one or more memories, information associated with the human-to-human conversation after each dialogue turn based on the one or more natural language attributes, the transient state, and the one or more conversation nuances associated with each dialogue turn, determine one or more semantic relations and associated dialogue timelines within the human-to-human conversation based on the stored information, generate one or more semantic points corresponding to the determined one or more semantic relations and the associated dialogue timelines within the human-to-human conversation, and in response to a user search query, provide a search result to the user based on the generated one or more semantic points.
Regarding claim 19, the primary reason claim 19 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, is the inclusion of the limitation “determine a compressed version of the one or more dialogue turns based on the semantic points and the at least one dialogue turn, wherein the compressed version is displayed on a user interface for a user of the human-to-human conversation” in combination with the limitations to identify a human-to-human conversation comprising a plurality of dialogue turns, determine one or more natural language attributes for each dialogue turn of the plurality of dialogue turns, derive a transient state for each dialogue turn based on the one or more natural language attributes, where the transient state indicates a memory to which a corresponding natural language attribute is determined to transition, derive one or more conversation nuances associated with the human-to-human conversation for each dialogue turn based on the one or more natural language attributes, store, at one or more memories, information associated with the human-to-human conversation after each dialogue turn based on the one or more natural language attributes, the transient state, and the one or more conversation nuances associated with each dialogue turn, determine one or more semantic relations and associated dialogue timelines within the human-to-human conversation based on the stored information, generate one or more semantic points corresponding to the determined one or more semantic relations and the associated dialogue timelines within the human-to-human conversation, and in response to a user search query, provide a search result to the user based on the generated one or more semantic points.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAMES BOGGS/
Examiner, Art Unit 2657
Prosecution Timeline

Nov 20, 2025: Examiner Interview Summary
Nov 20, 2025: Applicant Interview (Telephonic)
Dec 22, 2025: Response Filed
Feb 03, 2026: Final Rejection mailed (§103)
Apr 03, 2026: Response after Non-Final Action
May 04, 2026: Request for Continued Examination
May 06, 2026: Response after Non-Final Action
May 22, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/041,710 (Patent 12620399): VOICE PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND COMPUTER READABLE MEDIUM. 3y 2m to grant; granted May 05, 2026.
18/163,848 (Patent 12586600): Streaming Vocoder. 3y 1m to grant; granted Mar 24, 2026.
17/977,443 (Patent 12573406): VOICE AUTHENTICATION BASED ON ACOUSTIC AND LINGUISTIC MACHINE LEARNING MODELS. 3y 4m to grant; granted Mar 10, 2026.
18/314,249 (Patent 12572752): DYNAMIC CONTENT GENERATION METHOD. 2y 10m to grant; granted Mar 10, 2026.
18/483,896 (Patent 12562170): BIOMETRIC AUTHENTICATION DEVICE, BIOMETRIC AUTHENTICATION METHOD, AND RECORDING MEDIUM. 2y 4m to grant; granted Feb 24, 2026.
Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 61%
With Interview (+35.9%): 97%
Median Time to Grant: 3y 2m (~7m remaining)
PTA Risk: High
Based on 112 resolved cases by this examiner. Grant probability derived from career allowance rate.
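
The projection figures can be cross-checked with simple arithmetic. The sketch below assumes that the grant probability is the career allowance rate (68 granted out of 112 resolved, as reported in the examiner profile) and that the with-interview figure is that rate plus the +35.9 percentage-point interview lift; the page does not state how the with-interview figure is actually computed, so the additive model is an assumption.

```python
# Figures reported on this page.
granted = 68
resolved = 112
interview_lift_points = 35.9  # reported interview lift, in percentage points

# Career allowance rate as a percentage.
career_rate = 100 * granted / resolved

# Assumed additive model for the with-interview estimate, capped at 100%.
with_interview = min(100.0, career_rate + interview_lift_points)

print(f"Grant probability: {career_rate:.0f}%")
print(f"With interview:    {with_interview:.0f}%")
```

Under that assumption the rounded values agree with the 61% and 97% shown above.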