Prosecution Insights
Last updated: April 19, 2026
Application No. 18/378,118

SYSTEM AND METHOD FOR OPTIMIZING A USER INTERACTION SESSION WITHIN AN INTERACTIVE VOICE RESPONSE SYSTEM

Status: Final Rejection (§101, §102, §103, §112)
Filed: Oct 09, 2023
Examiner: WOZNIAK, JAMES S
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Hishab Singapore Private Limited
OA Round: 2 (Final)

Grant Probability: 59% (Moderate)
Estimated OA Rounds: 3-4
Estimated Time to Grant: 3y 7m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (227 granted / 385 resolved); -3.0% vs Tech Center average
Interview Lift: +40.1% among resolved cases with an interview
Typical Timeline: 3y 7m average prosecution; 42 applications currently pending
Career History: 427 total applications across all art units

Statute-Specific Performance

§101: 18.1% (-21.9% vs TC avg)
§102: 18.4% (-21.6% vs TC avg)
§103: 40.1% (+0.1% vs TC avg)
§112: 16.1% (-23.9% vs TC avg)

Based on career data from 385 resolved cases.

Office Action

Rejections under §101, §102, §103, and §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

In response to the Non-final Office Action mailed on 6/30/2025, Applicant filed an amendment on 9/26/2025. In this reply, despite a multitude of claim objections for formal matters and rejections under various statutes, Applicant has not elected to make any claim amendments to positively advance prosecution. Instead, Applicant has traversed each of the rejections and has requested that the objections due to formal matters be held in abeyance. Applicant's arguments have not been found to be persuasive and have been addressed in detail in the Response to Arguments section. Claims 5-6, 16, and 19 were objected to for containing reference characters not enclosed within parentheses. Applicant has replied by effectively requesting that such matters be held in abeyance until an indication of allowance is reached (Remarks, Page 1). Per 37 CFR 1.111(b), while such a request may be submitted in an otherwise bona fide response, Applicant is recommended to make corrective amendments in a reply to avoid additional responses or amendments in the future in the interest of compact prosecution. Failure to make amendments at the earliest opportunity only serves to lengthen the patent prosecution process.

Response to Arguments

Applicant's arguments have been fully considered, but are not found to be persuasive:

(I) Claim Interpretation under 35 U.S.C. 112(f): Applicant first argues that "this rejection" under 35 U.S.C. 112(f) is traversed (Remarks, Page 2) and also that "claims 1-2, 14, and 16 should not be objected under 35 U.S.C. 112(f)" (Remarks, Page 5). These comments are confusing because 35 U.S.C. 112(f) is neither a rejection nor an objection to the claims. Invocation of 35 U.S.C.
112(f) relates to the manner in which claim elements are interpreted under the broadest reasonable interpretation (BRI). See MPEP 2181: “Therefore, the broadest reasonable interpretation of a claim limitation that invokes 35 U.S.C. 112(f) is the structure, material or act described in the specification as performing the entire claimed function and equivalents to the disclosed structure, material or act.” In particular, the present invocation is the special case of claim interpretation for computer-implemented 35 U.S.C. 112(f) limitations. In this case, not only is the corresponding structure (e.g., computer/processor) read into the claims, but also the corresponding algorithm for carrying out the recited function. “The corresponding structure is not simply a general purpose computer by itself but the special purpose computer as programmed to perform the disclosed algorithm. Aristocrat, 521 F.3d at 1333, 86 USPQ2d at 1239. Thus, the specification must sufficiently disclose an algorithm to transform a general purpose microprocessor to the special purpose computer. See Aristocrat, 521 F.3d at 1338, 86 USPQ2d at 1241.” Applicant is also directed to the decision in WMS Gaming, Inc. v. Int’l Game Tech., 184 F.3d 1339, 1349, 51 USPQ2d 1385, 1391 (Fed. Cir. 1999). Applicant next argues that there "is no new matter...and no ambiguity that rises to the level of indefiniteness" and proceeds to show written description support for each of the elements interpreted under 35 U.S.C. 112(f). In particular, Applicant first argues that the bi-directional audio connector unit is described in a manner that "corresponds exactly to the language of claim 1 (and the method of claim 17)" (Remarks, Page 2).
In response, it is noted that the admission that the functional language of the bi-directional audio connector unit as described in the specification “corresponds exactly to the language of claim 1” is problematic in evidencing that sufficient support is found for a disclosed algorithm in the context of computer-implemented 35 U.S.C. 112(f). See MPEP 2181 (II)(B): “The corresponding structure is not simply a general purpose computer by itself but the special purpose computer as programmed to perform the disclosed algorithm. Aristocrat, 521 F.3d at 1333, 86 USPQ2d at 1239. Thus, the specification must sufficiently disclose an algorithm to transform a general purpose microprocessor to the special purpose computer. See Aristocrat, 521 F.3d at 1338, 86 USPQ2d at 1241. ("Aristocrat was not required to produce a listing of source code or a highly detailed description of the algorithm to be used to achieve the claimed functions in order to satisfy 35 U.S.C. § 112 ¶ 6. It was required, however, to at least disclose the algorithm that transforms the general purpose microprocessor to a ‘special purpose computer programmed to perform the disclosed algorithm.’" (quoting WMS Gaming, 184 F.3d at 1349, 51 USPQ2d at 1391.)) An algorithm is defined, for example, as "a finite sequence of steps for solving a logical or mathematical problem or performing a task." Microsoft Computer Dictionary, Microsoft Press, 5th edition, 2002. Applicant may express the algorithm in any understandable terms including as a mathematical formula, in prose, in a flow chart, or "in any other manner that provides sufficient structure." Finisar, 523 F.3d at 1340, 86 USPQ2d at 1623; see also Intel Corp. v. VIA Techs., Inc., 319 F.3d 1357, 1366, 65 USPQ2d 1934, 1941 (Fed. Cir. 2003); In re Dossel, 115 F.3d 942, 946-47, 42 USPQ2d 1881, 1885 (Fed. Cir. 1997); Typhoon Touch Inc. v. Dell Inc., 659 F.3d 1376, 1385, 100 USPQ2d 1690, 1697 (Fed. Cir. 
2011); In re Aoyama, 656 F.3d at 1306, 99 USPQ2d at 1945.” In the case of the instant bi-directional audio connector unit, repeating the “exactly” corresponding language of the claims in the specification does not show a corresponding algorithm that indicates how the high-level function is being performed. Applicant has not pointed to any underlying algorithm, such as in the form of a “mathematical formula, in prose, in a flow chart.” Because claim interpretation under 35 U.S.C. 112(f) is neither a rejection nor an objection, and because the discussion of underlying structure relates to the indefiniteness rejections under 35 U.S.C. 112(b) addressed in following sections, it is noted that Applicant's reply here relates more to rejections under 35 U.S.C. 112(a) or (b) than to traversing the claim interpretation under 35 U.S.C. 112(f). If Applicant does wish to traverse the interpretation under 35 U.S.C. 112(f) and/or have the interpretation withdrawn, the proper response is outlined in MPEP 2181 (I): “In response to the Office action that finds that 35 U.S.C. 112(f) is invoked, if applicant does not want to have the claim limitation interpreted under 35 U.S.C. 112(f), applicant may: (1) present a sufficient showing to establish that the claim limitation recites sufficient structure to perform the claimed function so as to avoid interpretation under 35 U.S.C. 112(f); or (2) amend the claim limitation in a way that avoids interpretation under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function).” Applicant has not taken either outlined approach in the reply and instead addresses 35 U.S.C. 112(f) as if it were an objection or rejection. Accordingly, the claim interpretation under 35 U.S.C. 112(f) still applies and has been maintained. It should also be noted that the only structures actually rejected under 35 U.S.C.
112(b) were the "conversation controller module" of claim 1, the "speech processing unit" of claim 9, the additional functions of the "user identification module" set forth in claim 12, and the "voice biometrics module” of claim 14. Thus, arguments pertaining to elements other than the preceding are unnecessary and ineffective for overcoming the claim interpretation under 35 U.S.C. 112(f), because they pertain to written description support under 35 U.S.C. 112(a), where no such rejection was raised, or to definiteness under 35 U.S.C. 112(b), where no rejection was raised. In the particular case of the “conversation controller module,” Applicant again does not provide arguments and/or amendments that are effective in overcoming the 35 U.S.C. 112(f) interpretation per MPEP 2181(I), as the arguments relate to written description support in the originally filed disclosure. Applicant has argued that the specification describes receiving audio features and processing outputs of ASR and NLU during the session to optimize the session duration, and choosing and/or modifying associated ASR and NLU models. Applicant also indicates that this module adjusts thresholds for speech segment detection and triggers actions to fulfill user intent, indicates that these passages "directly correspond to the claim language in the independent claims (e.g. "receiving the audio features and choosing and/or modifying associated ASR and NLU models for the user interaction session” in claim 17, and the similar functional language in claim 1)", and accordingly concludes that the written description requirement has been satisfied (Remarks, Pages 3-4). In response, it is noted that the “conversation controller module” is defined by the function of “chooses and/or modifies associated ASR and NLU models for the user interaction session to optimize the user interaction session duration.” Applicant's arguments for this element relate to written description and are not effective in overcoming the 35 U.S.C.
112(f) claim interpretation by either amending the claims or providing arguments that this element connotes definite structure. Taking these arguments as directed to the 35 U.S.C. 112(b) rejection, it is noted that language in the specification that “directly” corresponds to the claims does not provide an underlying algorithm for carrying out the high-level function set forth in the claims, which in this case relates to choosing and/or modifying ASR and NLU models, as required by the special case of computer-implemented 35 U.S.C. 112(f). The instant specification does not explain how the choosing and modifying of ASR and NLU models is accomplished in any form of prose (verbatim or similar language to the claims is not sufficient), flow chart, formula, or equation. Instead, the functionality of this element is described at Paragraph 025 in the originally filed specification, which only repeats the language of the claim. The specification does not explain how session duration is considered, the type of ASR and NLU models that are utilized, or how specifically those models are selected or modified in consideration of that duration (e.g., a detailed training process that brings duration under a threshold amount). It is emphasized that computer-implemented 35 U.S.C. 112(f) must provide the underlying algorithm for how a high-level function is being performed (“For a computer-implemented 35 U.S.C. 112(f) claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b).” See Net MoneyIN, Inc. v. VeriSign, Inc., 545 F.3d 1359, 1367, 88 USPQ2d 1751, 1757 (Fed. Cir. 2008)), and Applicant's arguments related to 35 U.S.C. 112(b) provided in relation to 35 U.S.C. 112(f) do not show any such algorithm. For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation is maintained and Applicant's arguments are not found to be persuasive.
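By way of illustration only, the level of "flow chart, formula, or equation" disclosure discussed above might look like the following sketch of duration-aware model selection. Every name, model profile, and number here is an invented assumption for purposes of illustration; none of it is drawn from the application, and it does not suggest what the specification actually discloses.

```python
# Hypothetical sketch: choosing an ASR/NLU model configuration in view of a
# target session duration. All names, profiles, and thresholds are invented.

# Candidate model profiles, ordered most-accurate first:
# (name, expected per-turn latency in seconds, relative accuracy score).
MODEL_PROFILES = [
    ("large-accurate", 1.8, 0.95),
    ("medium", 0.9, 0.90),
    ("small-fast", 0.4, 0.82),
]

def choose_model(elapsed_s: float, target_duration_s: float,
                 turns_remaining: int) -> str:
    """Pick the most accurate model whose expected per-turn latency still
    keeps the projected session length under the target duration."""
    budget = target_duration_s - elapsed_s
    if turns_remaining <= 0 or budget <= 0:
        return MODEL_PROFILES[-1][0]  # out of budget: fall back to fastest
    per_turn_budget = budget / turns_remaining
    for name, latency, _accuracy in MODEL_PROFILES:  # best-first scan
        if latency <= per_turn_budget:
            return name
    return MODEL_PROFILES[-1][0]
```

The point of the sketch is only that such a procedure ties duration to model selection through explicit, stated steps, which is the kind of linkage the Action finds absent.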
In the case of the particular functions of the “speech processing unit” set forth in claim 9, the following functionality is recited: "detect and analyze at least one of emotion, sentiment, noise profile and environmental audio information from the received audio data features." Applicant has argued that emotional analysis is described in calculating sentiment/emotion scores and metrics like sentiment, speaking rate, etc. discussed in paragraphs 0045-0047 of the originally filed specification (Remarks, Page 3). In response, it is first noted that Applicant's reply makes no mention of a disclosed algorithm for noise profile detection and analysis, which is only mentioned verbatim from claim 9 and is thus insufficient for providing corresponding structure under the 35 U.S.C. 112(f) interpretation. In regard to the emotion and sentiment calculation, Paragraphs 0028 and 0033 merely repeat the language of the claim, while Paragraphs 0045-0047 describe various metrics, including the calculation of a "happiness index" via "applying weight." None of these citations, however, actually shows an equation or formula for the happiness index, or what quantities are weighted and how such weighted values are combined to arrive at the happiness index. Thus, Applicant's arguments related to 35 U.S.C. 112(b) provided in relation to 35 U.S.C. 112(f) do not show any such algorithm, where a disclosed algorithm “must” be provided in the case of computer-implemented 35 U.S.C. 112(f). For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation is maintained and Applicant's arguments are not found to be persuasive.
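To make concrete the "equation or formula" the Action finds missing, a weighted combination of session metrics might look like the sketch below. The metric names, weights, and normalization are invented assumptions for illustration only; the specification discloses no such formula, which is precisely the deficiency identified.

```python
# Hypothetical "happiness index" as a weighted sum of normalized metrics.
# Weights and metric names are invented for illustration.

WEIGHTS = {"sentiment": 0.5, "speaking_rate": 0.2, "interruption_rate": 0.3}

def happiness_index(metrics: dict) -> float:
    """Weighted sum of metrics normalized to [0, 1]; higher is happier.
    interruption_rate counts against the score, so it is inverted."""
    score = (WEIGHTS["sentiment"] * metrics["sentiment"]
             + WEIGHTS["speaking_rate"] * metrics["speaking_rate"]
             + WEIGHTS["interruption_rate"] * (1.0 - metrics["interruption_rate"]))
    return round(score, 3)
```

A disclosure at this level would state which quantities are weighted and how the weighted values are combined, answering the questions posed in the Action.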
In the case of the particular functions of the “user identification module" set forth in claim 12, the following functionality is recited: “distinguish between synthesized speech and the user’s human voice in the received conversation data from the user’s stored voice biometrics for detection of any fraudulent activity.” Applicant has provided a discussion of this functionality in a manner that matches the language of claim 12 (Remarks, Pages 2-3). Applicant and the specification (see Paragraph 0023) fail to provide any algorithm as to how the distinguishing of "synthesized speech and the user’s human voice in the received conversation data from the user’s stored voice biometrics for detection of any fraudulent activity" is performed, as is required in computer-implemented 35 U.S.C. 112(f) claim interpretation under the BRI. For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation of this element is maintained and Applicant's arguments are not found to be persuasive. In the case of the particular functions of the "voice biometrics module” set forth in claim 14, the following functionality is recited: “utilize voice biometrics to identify and register the user participating in the user interaction session.” Applicant has not specifically mentioned the "voice biometrics module” in the remarks related to 35 U.S.C. 112(f) claim interpretation, but generally repeats some of the functionality recited in the claim language in discussing the "user identification module" (Remarks, Pages 2-3). As noted above in regard to the previously discussed claim elements, Applicant's arguments do not address the 35 U.S.C. 112(f) claim interpretation as to why the claimed elements connote specific structure, nor did Applicant file any amendments to add specific structure to overcome the 35 U.S.C. 112(f) invocation.
Moreover, Applicant's arguments merely repeat the claim language, and the specification (see Paragraph 0030) likewise repeats the functional claim language without explaining what type of voice metrics are utilized, how they are utilized, or any pertinent calculation/matching algorithms used to identify and register a user, in the form of a disclosed algorithm corresponding to the claimed functionality as is required by computer-implemented 35 U.S.C. 112(f). For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation of this element is maintained and Applicant's arguments are not found to be persuasive. Applicant closes out the 35 U.S.C. 112(f) remarks by again characterizing the findings under this statute as a matter of rejection resulting from "certain phrases that were not found verbatim" (Remarks, Page 4). Applicant's conclusion does not match the position set forth in the Non-final rejection or the statute itself: 35 U.S.C. 112(f) is a matter of claim term interpretation under the BRI, and while there is no in haec verba requirement for the claim language relative to the specification, such discussions are a matter of written description, which is not at issue here; the computer-implemented case instead requires a corresponding algorithm as to how the claimed function is performed. For the preceding reasons, the 35 U.S.C. 112(f) interpretation and the 35 U.S.C. 112(b) rejections stemming therefrom have been maintained.

(II) Claim Rejections under 35 U.S.C. 112(b): In regard to the rejection of Claims 1-29 under 35 U.S.C. 112(b), Applicant provides several arguments without any prosecution-advancing claim amendments. First, Applicant remarks that "terms like "speech segment," "non-speech segment," "turn-taking," and "barge-in" are well-known in the speech processing field and are further clarified in the specification."
Accordingly, Applicant concludes that "these terms are used consistently and are not indefinite" (Remarks, Page 5). In response, while inconsistency or a conflict between the claimed invention and the specification may rise to the level of indefiniteness under 35 U.S.C. 112(b) (see MPEP 2173.03- "A claim, although clear on its face, may also be indefinite when a conflict or inconsistency between the claimed subject matter and the specification disclosure renders the scope of the claim uncertain as inconsistency with the specification disclosure or prior art teachings may make an otherwise definite claim take on an unreasonable degree of uncertainty. In re Moore, 439 F.2d 1232, 1235-36, 169 USPQ 236, 239 (CCPA 1971)"), the rationale argued by Applicant differs from the thrust of the 35 U.S.C. 112(b) rejection of these terms (e.g., "speech segment" or "non-speech segment"). These limitations were not rejected for an inconsistency between the specification and claims, but for reasons related to antecedent basis (see Non-Final Action, Pages 8-10), where it was noted that "the claims are replete with antecedent basis issues" (see MPEP 2173.05(e) for a discussion of indefiniteness under 35 U.S.C. 112(b) related to a lacking or unclear antecedent basis). Applicant has not argued why there is antecedent basis for these terms (some of which, such as "barge-in," were not even included in the rejection under 35 U.S.C. 112(b)), nor has Applicant provided amendments that would simplify the issues under examination and advance prosecution. Accordingly, these arguments have not been found to be convincing and the indefiniteness rejection under 35 U.S.C. 112(b) has been maintained.
Applicant next addresses "conversation performance metrics," describes the metrics as grounded in "concrete examples," states that the language "is as precise as the nature of the metric allows," and argues that "a skilled artisan would understand the scope encompasses measurable indicators of conversation quality or user experience" (Remarks, Pages 5-6). In response, it is noted that this term does not appear in the plain language of the rejected subject matter, and the argument thus appears to relate to the 35 U.S.C. 112(b) indefiniteness stemming from the computer-implemented 35 U.S.C. 112(f) invocation of the “speech processing unit”. As described above, while Paragraphs 0045-0047 do describe various metrics, including the calculation of a "happiness index" via "applying weight," none of these citations actually shows an equation or formula for the happiness index, or what quantities are weighted and how such weighted values are combined to arrive at the happiness index. Thus, Applicant's arguments related to 35 U.S.C. 112(b) provided in relation to 35 U.S.C. 112(f) do not show any such algorithm, where a disclosed algorithm “must” be provided in the case of computer-implemented 35 U.S.C. 112(f). In the case of computer-implemented 35 U.S.C. 112(f) claim interpretation, it is not enough for one skilled in the art to understand the scope. Per Net MoneyIN, Inc. v. VeriSign, Inc., 545 F.3d 1359, 1367, 88 USPQ2d 1751, 1757 (Fed. Cir. 2008), for “a computer-implemented 35 U.S.C. 112(f) claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b).” For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation is maintained and Applicant's arguments are not found to be persuasive.
Applicant next addresses "Optimizing a user interaction session duration," argues that optimizing is tied to "concrete actions...like modifying ASR/TTS parameters to shorten or improve the session," and concludes that the scope is clear (Remarks, Pages 5-6). In response, it is noted that these arguments appear to apply to the 35 U.S.C. 112(b) rejection stemming from the computer-implemented 35 U.S.C. 112(f) interpretation of the "conversation controller module." Applicant's remarks again repeat generic claim language and fail to point to or describe the algorithms specifically linking duration to the selection or modification of ASR and NLU models. In other words, neither Applicant nor the specification provides details as to how the modification and/or selection of these models is accomplished and how duration is taken into account in these processes. As noted above, the instant specification does not explain how the choosing and modifying of ASR and NLU models is accomplished in any form of prose (verbatim or similar language to the claims is not sufficient), flow chart, formula, or equation. Instead, the functionality of this element is described at Paragraph 025 in the originally filed specification, which only repeats the language of the claim. The specification does not explain how session duration is considered, the type of ASR and NLU models that are utilized, or how specifically those models are selected or modified in consideration of that duration (e.g., a detailed training process that brings duration under a threshold amount). It is emphasized that computer-implemented 35 U.S.C. 112(f) must provide the underlying algorithm for how a high-level function is being performed (“For a computer-implemented 35 U.S.C. 112(f) claim limitation, the specification must disclose an algorithm for performing the claimed specific computer function, or else the claim is indefinite under 35 U.S.C. 112(b).” See Net MoneyIN, Inc. v. VeriSign, Inc., 545 F.3d 1359, 1367, 88 USPQ2d 1751, 1757 (Fed.
Cir. 2008)), and Applicant's arguments related to 35 U.S.C. 112(b) provided in relation to 35 U.S.C. 112(f) do not show any such algorithm. For these reasons the 35 U.S.C. 112(b) rejection related to the 35 U.S.C. 112(f) interpretation is maintained and Applicant's arguments are not found to be persuasive. In the "Module Terminology" bullet point (Remarks, Page 6), Applicant does provide remarks that are properly aimed at the 35 U.S.C. 112(f) invocation, insisting that terms such as "audio connector unit" and "user identification" are "known in the art of IVR system" and are "sufficient" structural terms. In response, it is first noted that the terms argued by Applicant are not terms of art connoting specific structure in interactive voice response systems. A search to validate Applicant's claims was conducted in PE2E on 9/30/2026, where "audio connector unit" returned only the publication of the instant application, and while "user identification module" did return a few hits, no specific/consistent structure was associated with the term. Thus, these terms do not refer to specific structure in the IVR field of technology as Applicant's arguments indicate. Moreover, as the claims do not recite the term "means," there is an initial presumption that 35 U.S.C. 112(f) is not invoked. In the case of the indicated elements, however, this presumption is overcome because the three-prong analysis has been met: 1) "unit" and "module" are well-known generic placeholders for "means" (see MPEP 2181(I)(A)); 2) the generic placeholders or nonce terms are modified by functional language; and 3) the terms are not modified by sufficient structure, because the terms preceding "module" or "unit" are functional rather than structural in nature. Accordingly, the presumption against invocation is overcome, and 35 U.S.C. 112(f) is properly invoked for the claim elements interpreted under this statute.
Also, as described above in detail, the issue with the disclosure which triggers a series of rejections under 35 U.S.C. 112(b) is that the specification merely repeats the language of the claims and does not provide the algorithms corresponding to how each of the recited functions is performed. Thus, Applicant's arguments against invocation and for corresponding structure are not found to be convincing, and the corresponding interpretations and rejections under 35 U.S.C. 112 have been maintained. There are other rejections under 35 U.S.C. 112(b) that relate to antecedent basis and the use of unclear language (i.e., "such as") in claim 29. Applicant was silent on these rejections and has not made any claim amendments to resolve these issues and advance prosecution. Accordingly, these additional 35 U.S.C. 112(b) rejections have been maintained.

(III) Patent Subject Matter Eligibility Rejections under 35 U.S.C. 101: Applicant traverses the patent subject matter eligibility rejection of claims 1-29 under 35 U.S.C. 101 on the grounds that the claims qualify as patent-eligible because they are related to a "concrete, technological solution in the field of interactive voice response (IVR)" and "integrate any purported abstract idea into a practical application and recite significantly more than a mere idea" (Remarks, Page 7). Applicant's arguments pertaining to 35 U.S.C. 101 consist of three points respectively related to Step 2A prong 1, Step 2A prong 2, and Step 2B as set forth in the 2019 Patent Subject Matter Eligibility Guidance (2019 PEG). (III)(A) Step 2A prong 1 discussion: Applicant first argues that the claimed invention has been mischaracterized as an abstract mental process because the claims are directed to a "specific arrangement of computerized components (hardware software modules) working in concert to achieve a new and improved functionality in IVR systems: real-time monitoring and dynamic optimization of a user interaction session."
Applicant then attempts to draw an analogy between the claimed invention and the decision in McRO, Inc. v. Bandai Namco Games America, Inc., 837 F.3d 1299, 1314, 120 USPQ2d 1091, 1102 (Fed. Cir. 2016), as relating to a non-abstract improvement in computer or technological fields, where Applicant indicates that the "present claims improve the functioning of an IVR communication system itself- for example by reducing call duration and errors through novel automated techniques (dynamic model selection, threshold adjustment, etc.)." Applicant also notes the use of specific mechanisms such as VAD, turn-taking, a conversation controller, and sentiment monitoring to solve an "IVR-centric problem." As such, Applicant argues that the claimed invention should be found eligible under Step 2A prong 1 of the 2019 PEG. In response, it is noted that some of the elements argued by Applicant were set aside for further analysis under Step 2A prong 2 and Step 2B and were not categorized as part of the identified abstract idea in Step 2A prong 1. Specifically, the use of computer hardware/software, along with the use of speech synthesis, was not grouped into the identified abstract idea. For the remaining functionality, the rejection explains how, under the BRI, a human could perform the process set forth in the claimed invention. Applicant's arguments do not address whether these functions could be performed by a human and instead cite technological improvements or a combination of elements that were set aside for consideration in the inventive concept analysis steps of the 2019 PEG. Further, the functionality recited by Applicant (i.e., VAD, turn-taking, conversation controller functions, and sentiment monitoring) was recited at such a high level as to be performable by a human under the BRI. VAD without any further details as claimed (found in dependent claim 16, not any of the independent claims) was indicated as a human classification process.
For example, a human could listen to audio and mentally decide when someone is speaking. Turn-taking as claimed at a high level was described as performed by a human listening to audio and parsing the audio into segments based upon what they heard, and either remembering the sequence or transcribing the sequence using pen and paper, including turns (note also that turn-taking is recited in the alternative in the independent claims and is not necessarily required under the BRI). Conversation controller functions were described as a human listening to and understanding speech and deciding which models would be most appropriate as a mental evaluation (e.g., words related to troubleshooting would relate to models associated with a user guide) and appropriate to a customer’s history (e.g., a television model that the user had purchased based upon a customer profile). Note that modification is in the alternative to the choosing process that was addressed under the BRI. Sentiment monitoring was described as narrow classification decisions that can be performed by a human. For example, a human can listen to a conversation and classify words as pertaining to specific moods or sentiments. Thus, each of the technical improvements (such as “IVR-centric” solutions- see Remarks, Page 8) or alleged non-mental processes highlighted by Applicant is recited at such a high level and breadth that it could practically be performed by a human under the BRI, and as such does not constitute a “technical improvement”. Note that if Applicant has improved any of these processes so as to exclude human performance, these improvements must be found in the claims (see MPEP 2106.04(d)(1)- "the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement"). For these reasons, Applicant's arguments directed towards Step 2A prong 1 of the 2019 PEG are not found to be persuasive.
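To illustrate the contrast being drawn, a claim that excluded practical human performance would recite something closer to a specific signal-processing procedure. The following is a minimal, invented energy-threshold VAD sketch offered only as an example of that level of specificity; the frame length and threshold are assumptions, not limitations from the claims at issue.

```python
# Hypothetical energy-threshold voice activity detection (VAD) sketch.
# Frame length and threshold are invented for illustration.

def detect_speech_frames(samples, frame_len=160, threshold=0.02):
    """Label each full frame of an audio signal (floats in [-1, 1]) as
    speech (True) or non-speech (False) by mean absolute amplitude."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        labels.append(energy >= threshold)
    return labels
```

Whether such specifics would change the eligibility analysis is a separate question; the sketch only shows what concrete, per-frame algorithmic detail looks like relative to the high-level "VAD" recitation discussed above.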
(III)(B) Step 2A prong 2 discussion: In regard to the Step 2A prong 2 analysis, Applicant argues that the claimed process is integrated into a practical application because it includes multiple additional elements: the bi-directional audio connector unit, the conversation controller module, and the session monitoring module (Remarks, Pages 8-9). In response, it is noted that the underlying functionality of each of the argued components was addressed as being practically performable by a human, so other than the processes themselves, all that remains is the generic computer processor acting as the structural component to automate the otherwise mental process steps. By Applicant's own admission, these modules are nothing more than "computerized components" (Remarks, Page 7). On Page 13 of the Non-final Office Action, these generic computer software/hardware components were addressed as "no more than mere instructions to implement an otherwise abstract idea" where "the computer is used for its standard purpose of executing a program to carry out...the abstract idea." See MPEP 2106.04(a)(2)(III): when "merely using a computer as a tool to perform the concept...the claim is considered to recite a mental process." Applicant concludes by attempting to draw an analogy to case law such as DDR Holdings, McRO, and Enfish, arguing that the claimed invention improves how an IVR computer system handles voice interactions and the computer system itself (Remarks, Pages 9-10). In response, it is noted that while the specification may detail further specifics related to the improvements alleged by Applicant, the claimed processes are recited at a “high level of generality” and use the computer merely as a tool to carry out these high-level processes such that “the claim is considered to recite a mental process." See MPEP 2106.04(a)(2)(III). Also, Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 118 USPQ2d 1684 (Fed. Cir.
2016) related to a basic computer function in the form of computer database indexing/retrieval common to all computers, and thus, improved the use of the computer as a tool. In contrast, functioning as an IVR system is not a basic computer function. Instead, the computer/processor in the claims is merely used as a tool to automate an otherwise mental process, and thus, does not constitute "significantly more" than the identified abstract idea. Accordingly, Applicant arguments directed towards step 2A prong 2 of the 2019 PEG have been fully considered, but are not found to be persuasive. (III)(C) Step 2B discussion: Applicant argues that the claimed invention should be found patent eligible under step 2B. Specifically, Applicant first argues that the examiner has not provided any evidence that the configuration of modules and real-time adaptations were well-understood, routine, and conventional. Applicant then cites Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018) and reiterates that evidence has not been provided for multiple processes featured in the claim (Remarks, Pages 10-11). In response, it is noted that steps 2A prong 2 and 2B relate to elements/limitations in the claim in addition to the identified abstract idea. "Because a judicial exception is not eligible subject matter, Bilski, 561 U.S. at 601, 95 USPQ2d at 1005-06 (quoting Chakrabarty, 447 U.S. at 309, 206 USPQ at 197 (1980)), if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application. See, e.g., RecogniCorp, LLC v. Nintendo Co., 855 F.3d 1322, 1327, 122 USPQ2d 1377 (Fed. Cir. 2017) ("Adding one abstract idea (math) to another abstract idea (encoding and decoding) does not render the claim non-abstract"); Genetic Techs. Ltd. v. Merial LLC, 818 F.3d 1369, 1376, 118 USPQ2d 1541, 1546 (Fed. Cir. 
2016) (eligibility "cannot be furnished by the unpatentable law of nature (or natural phenomenon or abstract idea) itself.")." The functionality referenced by Applicant was not, and did not need to be, addressed by evidentiary support to establish a prima facie case of ineligibility because that functionality is part of the abstract idea itself. Evidence was provided for the identified additional elements as required by the Berkheimer decision (see Non-Final Action, Page 13). Next, Applicant asserts that the claims contain a particular arrangement of modules that is a non-conventional combination, similar to BASCOM Global Internet v. AT&T Mobility LLC, 827 F.3d 1341, 1350-51, 119 USPQ2d 1236, 1243 (Fed. Cir. 2016), that provides new functions (Remarks, Page 11). In response, it is noted that the only additional elements are a computer/processor running software to carry out an otherwise abstract idea. The alleged ordered arrangement is merely an ordered sequence of processing steps addressed as being performed by a human as part of a mental process. Since these steps are part of the identified abstract idea, they cannot be relied upon to furnish a prima facie case of subject matter eligibility under 35 U.S.C. 101. See Synopsys, Inc. v. Mentor Graphics Corp., 839 F.3d 1138, 1151, 120 USPQ2d 1473, 1483 (Fed. Cir. 2016) ("a claim for a new abstract idea is still an abstract idea. The search for a § 101 inventive concept is thus distinct from demonstrating § 102 novelty."). For these reasons, Applicant arguments directed towards step 2B of the 2019 PEG have not been found to be persuasive, and due to all of the preceding rationale under the step 2A-B analysis, the 35 U.S.C. 101 rejection has been maintained. (IV) Prior Art Rejections under 35 U.S.C. 102(a)(1) and 35 U.S.C. 103: As an opening remark, it is noted that Applicant provides multiple arguments against various references relied upon in the prior art rejections. 
It is highlighted, however, that claims 1, 4-5, 7-8, 10-11, 13, and 15 were rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sekar, et al. (U.S. PG Publication: 2021/0201238 A1). Aside from the passing general allegation of patentability that "[n]o cited prior art reference alone or in any combination, discloses or suggests the claimed invention," where Sekar is only mentioned as being relied upon (see Remarks, Page 12; also included in the first bullet point on this page), Applicant does not provide any specific traversal related to the anticipation of claims 1, 4-5, 7-8, 10-11, 13, and 15 under 35 U.S.C. 102(a)(1) relying on Sekar. Such arguments fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references, and thus are not found to be persuasive. In the second bullet point provided on Pages 12-13, Applicant alleges that the prior art combinations under 35 U.S.C. 103 are based upon hindsight reasoning and that the Examiner must provide "a rationale supported by evidence." In response, it is noted that the rejections under 35 U.S.C. 103, contrary to Applicant's allegations, go beyond "simply identifying evidence" and each address the Graham v. Deere factual inquiries, explain why the prior art is analogous/reasonably pertinent, explain the modification of the primary references, and provide a rationale underpinning in support of the obviousness findings. In this particular section, Applicant has not specifically traversed any of these rationale underpinnings, and thus, such arguments are taken as a general allegation of non-obviousness. Furthermore, "[a]lthough the Supreme Court in KSR cautioned against an overly rigid application of TSM, it also recognized that TSM was one of a number of valid rationales that could be used to determine obviousness. 
(According to the Supreme Court, establishment of the TSM approach to the question of obviousness "captured a helpful insight." 550 U.S. at 418, 82 USPQ2d at 1396 (citing In re Bergel, 292 F.2d 955, 956-57, 130 USPQ 206, 207-208 (1961)).)" Thus, the TSM/evidence argued by Applicant is not a requirement but one of "a number of valid rationales that could be used to determine obviousness." Where TSM was not relied upon in the Non-final Action, another KSR rationale was relied upon in support of the finding of obviousness. Again, no specific arguments in support of non-obviousness were levied in this particular section of the remarks. In response to applicant's argument that the examiner's conclusion of obviousness is based upon improper hindsight reasoning, it must be recognized that any judgment on obviousness is in a sense necessarily a reconstruction based upon hindsight reasoning. But so long as it takes into account only knowledge which was within the level of ordinary skill at the time the claimed invention was made, and does not include knowledge gleaned only from the applicant's disclosure, such a reconstruction is proper. See In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971). In regards to Claims 1 and 17, Applicant first addresses the "bi-directional audio connector unit" that "determines and stores at least one of the speech segments, non-speech segments, turn-taking segments, and barge-in speech segments." In particular, Applicant contends that "[n]one of the cited references discloses a module that performs all of these functions." Applicant particularly focuses on the teachings of Belvin, et al. (U.S. Patent: 7,831,433; used in a combination with Sekar to reject claim 2) and argues that Belvin "does have some speech processing, but it does not disclose parsing audio into turn-taking and barge-in segments" because it is "focused on using context data for voice commands, not low-level VAD or barge-in detection." Applicant also points to Ljolje (U.S. 
PG Publication: 2009/0112599 A1; used in a combination with Sekar to reject claims 21 and 28) and argues that this reference "does address barge-in theoretically, but it lacks a teaching of storing identified segments for the session or using a bi-directional connector that feeds user input audio both to identification and speech processing concurrently (as our system does via element 103 linking to module 111 and 104)" (Remarks, Pages 13-14). Applicant lastly discusses Weng, et al. (U.S. PG Publication: 2017/0116986 A1; used in a combination with Sekar to reject claim 20) and Sano (not relied upon in any prior art rejection nor cited in the PTO-892, so it is unclear what Applicant means by "Sano's disclosure") and contends that these references fail to teach storing segmented timing information for optimization by storing these segments in memory (Remarks, Page 14). In response, it is noted that the determination and storing of "at least one of" speech segments, non-speech segments, turn-taking segments, and barge-in speech segments in claim 1 were specifically described as being addressed by Sekar (see 6/30/2025 Non-Final Action, Page 16), where the claim interpretation of these elements being in the alternative was also provided (see also MPEP 2131 for relevant discussion of Brown v. 3M, 265 F.3d 1349, 1351, 60 USPQ2d 1375, 1376 (Fed. Cir. 2001)). Claim 17 does additionally rely upon the teachings of Mont-Raynaud, et al. (U.S. PG Publication: 2018/0301151 A1) for the determination of non-speech and barge-in segments in a dialog conversation since claim 17 did not include these limitations in the alternative as was the case in claim 1. However, none of these arguments actually apply to the references relied upon to reject the independent claims (Sekar and/or Mont-Raynaud) and instead tangentially divert to references relied upon to address the dependent claims. Thus, these arguments do not pertain to the rejection of claims 1 and 17 under 35 U.S.C. 
102(a)(1) that was set forth in the Non-final Action and are therefore considered moot and unpersuasive. Applicant next turns to the "user identification module" and again discusses numerous references not actually relied upon to address the subject matter of independent claims 1 and 17. These references include Enzinger, et al. (U.S. PG Publication: 2021/0193174 A1; used in a combination with Sekar to reject claims 12 and 14), Attwater, et al. (U.S. PG Publication: 2006/0200350 A1; used in a combination with Sekar to reject claims 3 and 16), and Belvin (used in a combination with Sekar to reject claim 2). In response, it is again reiterated that Applicant's arguments fail to address the prior art references actually relied upon to address the independent claims- Sekar or the combination of Sekar and Mont-Raynaud. The aim of these traversals is unclear- if Applicant disagrees with the rejections of the independent claims, it is recommended to traverse the references relied upon to address the features contained in those claims instead of presenting divergent discussions of secondary/tertiary references relied upon in dependent claim rejections. As such, these arguments do not pertain to the rejection of claims 1 and 17 under 35 U.S.C. 102(a)(1) that was set forth in the Non-final Action and are therefore considered moot and unpersuasive. Applicant next turns to the "conversation controller module" that "receives the audio features and chooses and/or modifies associated ASR and NLU models for the user interaction session." Applicant also opines that this element "is perhaps the core distinguishing feature of the invention." Applicant then generally alleges that none of the prior art discloses "a module that dynamically selects or adapts the speech recognition and language understanding model on a per-session...basis during the session." 
Applicant then again argues and provides their own characterization of references not relied upon to reject the claimed subject matter of independent claims 1 and 17- Chang, et al. (U.S. PG Publication: 2014/0188470 A1; relied upon in a combination with Sekar and Mont-Raynaud to reject claim 18) and Belvin (Remarks, Page 15). In response, it is noted that if Applicant considers the selection and modification of "ASR and NLU" models to be the distinguishing feature of the invention, Applicant is invited to provide clarifying amendments to define over the teachings of the references relied upon in the rejection of the independent claims- Sekar and Mont-Raynaud. Taking this approach would simplify the issues for consideration in future rounds of prosecution by focusing arguments on a single defining element and potentially defining the operations of this element within the context of the system so as to overcome the teachings of the applied prior art. Refraining from making any claim amendments is certainly an applicable approach, but it is advised that, unless Applicant wishes to pursue an appeal at the Patent Trial and Appeal Board (PTAB), such an approach may lead to less compact prosecution rather than working together to find reasonably allowable subject matter. Lastly, Applicant's arguments fail to address the prior art references actually relied upon to address the independent claims- Sekar or the combination of Sekar and Mont-Raynaud. As such, these arguments do not pertain to the rejection of claims 1 and 17 under 35 U.S.C. 102(a)(1) that was set forth in the Non-final Action and are therefore considered moot and unpersuasive. On Page 15 of the remarks, Applicant does address a prior art reference relied upon in the rejection of independent claim 17 (Mont-Raynaud) along with a reference not relied upon in the independent claim rejections (Weng). 
In regards to Mont-Raynaud, Applicant indicates that this reference "may touch on dialog management and possibly mention 'dialog management module' or 'agent engagement' but do[es] not describe selecting NLU/ASR implementation based upon session metrics." Applicant then argues features that are not claimed, such as, "if the user's speech rate is slow or they have an accent, dynamically load[ing] a different NLU model tuned for that." In response, it is noted that neither Weng nor Mont-Raynaud is relied upon to teach the selection or modification of ASR and NLU models. Although Mont-Raynaud was relied upon in the rejection of independent claim 17, Mont-Raynaud addressed the determination of non-speech and barge-in segments in a dialog conversation (see Non-Final Action, Page 27). Instead, for claims 1 and 17, the teachings of Sekar were relied upon and were not addressed in Applicant arguments pertaining to the conversation controller. Accordingly, these arguments are considered to be moot and unpersuasive. In response to applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., if the user's speech rate is slow or they have an accent, dynamically load[ing] a different NLU model tuned for that) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Still in regards to the conversation controller, Applicant argues that the "combination of references also doesn't fill this gap" and provides requirements (not based upon statute or case law) as to how the obviousness rejection would "need" to be drafted- "that one reference provided the base system and another provided the teaching to modify models." 
Applicant then closes out argument 6 by proposing a new combination using Chang and Belvin that was not even a ground of rejection relied upon in the Non-Final Office Action (Remarks, Page 16). In response, it is noted that the rejection of independent claim 1 involves anticipation by a single reference under 35 U.S.C. 102(a)(1) and does not involve any "combination of references." The rejection of independent claim 17, while relying on the combination of Sekar and Mont-Raynaud, does not rely upon a "combination of references" to teach the "choosing and/or modifying associated ASR...and NLU...models for the user interaction session" (see Non-Final Action, Page 26) because that rejection relies upon a single reference to teach the selection and/or modification- Sekar. In regards to the Applicant-created guidelines for drafting a prima facie case of obviousness under 35 U.S.C. 103, it is first pointed out that the claimed invention does not actually require a step that serves to "modify models," as this limitation has been drafted in the alternative ("choosing and/or modifying," which under the BRI covers choosing, modifying, OR choosing and modifying). Moreover, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981). Thus, the requirements for obviousness need not meet the rigid, lone requirement set forth by Applicant. Lastly, Applicant's arguments fail to address the prior art references and the specific teachings actually relied upon to address the independent claims. As such, these arguments do not pertain to the rejection of claims 1 and 17 under 35 U.S.C. 102(a)(1) that was set forth in the Non-final Action and are therefore considered moot and unpersuasive. 
In response to applicant's argument that the examiner has combined an excessive number of references (Remarks, Page 16), albeit across different dependent claims rather than in the rejection of any single claim, reliance on a large number of references in a rejection does not, without more, weigh against the obviousness of the claimed invention. See In re Gorman, 933 F.2d 982, 18 USPQ2d 1885 (Fed. Cir. 1991). In regards to the "session monitoring module," Applicant argues that no "cited reference teaches a component that monitors an ongoing conversation for performance metrics such as user sentiment, turn-taking efficiency, error counts, etc., and feeds that into the dialog control in real time or post-call." In support of this position, Applicant points to and characterizes alleged failings of prior art not relied upon in the rejections of the independent claims- Attwater and Mielke, et al. (U.S. PG Publication: 2023/0135179 A1; relied upon to address the particular subject matter added by dependent claim 6) (see Remarks, Pages 16-17). In response, it is noted that the limitations argued by Applicant were addressed by the teachings of Sekar in independent claims 1 and 17 and not the references applied to reject various dependent claims (i.e., Attwater and Mielke). Accordingly, these arguments are moot and unpersuasive. Moreover, the only function of the session monitoring featured in independent claims 1 and 17 is monitoring of "the user interaction session" and adding "key metrics corresponding to the user interaction session to the conversation controller module." Independent claim 17 does not even include interaction with the conversation controller module, as it lacks the computer-implemented components of claim 1. Neither of the independent claims recites logging of the specific metrics argued by Applicant- user sentiment, turn-taking efficiency, error counts. Thus, the features upon which applicant relies are not recited in the rejected claim(s). 
Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). On Page 16 of the Remarks (Argument 9), Applicant makes a passing comment that the rejections under 35 U.S.C. 103 "break the claim into pieces and find each piece in isolation" and provide "no such reason" to "put them together in the manner claimed." In response, Applicant is directed to the prior rejections under 35 U.S.C. 103 that have been maintained in this final rejection, each of which includes a rationale underpinning either in the form of evidence-based TSM (e.g., see the rejection of claim 2, Page 22) or one of the KSR rationales (see the rejection of claim 9, Page 24, relying upon KSR rationale A; see MPEP 2141(III)). Accordingly, these general allegations against the independent claims are not found to be persuasive. It should additionally be pointed out that the rejection of independent claim 1 does not actually rely upon a combination. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). In argument 10 (Remarks, Page 17), Applicant argues that the "prior art references do not recognize the same problem or propose the same solution" and goes on to explain that one of ordinary skill in the art would not look to combine the teachings of references such as Enzinger, Ljolje, and Chang. In response, it is noted that per MPEP 2141.01(a)- "In order for a reference to be proper for use in an obviousness rejection under 35 U.S.C. 103, the reference must be analogous art to the claimed invention. In re Bigio, 381 F.3d 1320, 1325, 72 USPQ2d 1209, 1212 (Fed. Cir. 2004). 
A reference is analogous art to the claimed invention if: (1) the reference is from the same field of endeavor as the claimed invention (even if it addresses a different problem); or (2) the reference is reasonably pertinent to the problem faced by the inventor (even if it is not in the same field of endeavor as the claimed invention)." Thus, the requirement set forth by Applicant in the form of recognizing the "same problem" or "the same solution" is not one of the requirements for establishing analogous art, which in fact covers references that address "a different problem." Applicant's standards for obviousness therefore do not match those set forth in statute and case law. It is also pointed out that the rejections under 35 U.S.C. 103 provide reasons why the prior art references are analogous (e.g., see the reasoning provided with respect to Claim 9 on Page 24 of the Non-Final Action). Accordingly, these allegations are not rooted in case law or statute and are not found to be persuasive. In argument 11 (Remarks, Page 17), Applicant alleges that "some references...teach away from the integrated approach" and then characterizes the teaching of Ly (a reference not used in any prior art rejection, let alone cited in the PTO-892 from 6/30/2025) as teaching away. In response, Applicant is directed to In re Fulton, 391 F.3d 1195, 73 USPQ2d 1141 (Fed. Cir. 2004). The court stated that "the prior art's mere disclosure of more than one alternative does not constitute a teaching away from any of these alternatives because such disclosure does not criticize, discredit, or otherwise discourage the solution claimed…." In this case, Applicant does not provide any evidence of a teaching away and bases their argument on a prior art reference not even relied upon or cited (Ly). Thus, these arguments are not found to be persuasive. 
In argument 12 (Remarks, Page 18), Applicant contends that there is no reasonable expectation of success "for combining a voice biometric authentication system with a dialog system that does dynamic model switching." In response, it is noted that independent claims 1 and 17 do not include voice biometric authentication. Accordingly, these arguments directed towards the independent claims are moot. Moreover, "reasonable expectation of success can be implicitly shown via the prior art teachings or as part of the obviousness analysis. See Elekta Ltd. v. ZAP Surgical Sys., Inc., 81 F.4th 1368, 1376-77, 2023 USPQ2d 1100 (Fed. Cir. 2023) ("[W]e can reasonably discern that the Board considered and implicitly addressed reasonable expectation of success based on the arguments and evidence presented to the Board on motivation to combine.")." See MPEP 2143.02(I). While Applicant has not provided any particular reasons why there would be no reasonable expectation of successful results by combining Enzinger with Sekar, it is noted that the Non-Final Action points out that such an element could be included in the system of Sekar to provide a particular result explained with evidence from the prior art (i.e., a TSM statement). Thus, although not relating to the independent claim rejections, the particular supporting example relied upon by Applicant is not persuasive because an expectation of predictable results explicitly provided by the prior art has been relied upon. Applicant's arguments then address "gaps" by considering "a few specific references as exemplary contrasts (based on the claim chart analysis)" (Remarks, Page 18). In response, it is noted that Applicant's reply was reviewed for the presence of such a "claim chart," but the Examiner was unable to find any such claim chart/table/spreadsheet. 
In this section (Remarks, Pages 18-19), Applicant characterizes a number of the references relied upon in the dependent claim rejections- Belvin, Ljolje, Enzinger, and Chang. Applicant explains how each of these references fails to teach limitations that it was not applied to address. For example, it is explained that Ljolje says nothing about user authentication, NLU model switching, session metrics, etc. when this reference was only relied upon to teach updating and training ASR and NLU models (see Non-Final Office Action, Pages 31-32). These arguments are off-base and are not persuasive. There is also a discussion questioning the combination of Belvin, as a primary reference, being modified by Chang. No ground of rejection in the Non-Final Action relies upon Belvin as a primary reference. Thus, the point of such arguments is unclear; they do not apply to the grounds of rejection present in the Non-Final Action and are moot and unpersuasive. Applicant closes out this section by again referencing the alleged "claim chart" and how it summarizes the applicant's position (Remarks, Pages 19-20). In response, it is reiterated that no such claim chart was provided in Applicant's reply. Pages 20-23 of Applicant's reply feature a section called "Fallback Positions." In this section, Applicant presents several alleged "notable dependent claim limitations," seems to suggest internal claim drafting strategies, including how the Examiner might respond ("If the Examiner had combined Chang…"), and alleges why these limitations are patentably distinguishable from the prior art: Adaptive TTS Speech Rate- the reply seems uncertain about which dependent claim contains this feature by noting it is "likely" in claim 4 "or" 5 and again references the non-existent "claim chart." It is noted that a somewhat similar feature is only found in claim 20 and indefinite claim 29, where such a feature is in the alternative and not required. 
It is also noted that this concept was addressed via the teachings of Weng, not Belvin or Attwater as argued. User Profile Updates and Personalized Models- no specific dependent claim is mentioned, as Applicant calls this feature a "Dependent claim example." Also, the only claim dealing with an interaction history as argued by Applicant is claim 11, which was addressed via the teachings of Sekar, not Enzinger as argued by Applicant. Detection of Specific Conversation Anomalies- again, it is unclear what dependent claim is being traversed by mention of a "Dependent claim example." The Examiner cannot find any claim discussing the term "anomaly." Since Applicant also did not identify any dependent claim containing such features, these arguments seem to be directed towards features not claimed. Voice Biometric Fraud Detection- Applicant discusses a "Dependent claim example" without actually identifying any specific dependent claim. As best as the Examiner can tell, Applicant might be referring to claim 12, which is the only claim mentioning the detection of "fraudulent activity." Applicant does argue the teachings of the reference that was actually applied to the claim in this instance (Enzinger) and contends that Enzinger does not teach the claim's "comparing incoming voice to stored voice prints to ensure it's the legitimate user." Claim 12, however, regards exactly what is taught by Enzinger- distinguishing between a human's voice and "synthesized speech" (Paragraphs 0081 and 0084). The claim makes no mention of voice prints as argued by Applicant. Environment-Based ASR Switching- Applicant seems to be providing advice for a potential dependent claim in regards to this feature. This argument seems to be a claim drafting strategy rather than pertaining to any claim limitation subjected to any grounds of rejection. These comments are confusing, off-base, and moot. 
Sentiment Analysis to Adjust Dialogue- this argument again seems like a discussion of advice or a strategy for claim drafting going forward because it includes discussions such as "A dependent claim could involve..." These types of arguments are again confusing, off-base, and moot. In concluding, Applicant again provides statements relating to what claims "likely," "might," or "could" cover and makes passing allegations of being "unique" or "novel." Such statements amount to general allegations of patentability and moreover seem uncertain as to what is actually being claimed (i.e., might, could, likely). Accordingly, these closing statements are not found to be persuasive. For these preceding reasons, Applicant arguments have been fully considered, but are not found to be persuasive. Claim Objections Claims 5-6, 16, and 19 are objected to because they include reference characters which are not enclosed within parentheses. Reference characters corresponding to elements recited in the detailed description of the drawings and used in conjunction with the recitation of the same element or group of elements in the claims should be enclosed within parentheses so as to avoid confusion with other numbers or characters which may appear in the claims. See MPEP § 608.01(m). Claim Interpretation The following is a quotation of 35 U.S.C. 112(f): (f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The following is a quotation of pre-AIA 35 U.S.C. 
112, sixth paragraph: An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 
112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: the "bi-directional audio connector unit," "user identification module," "speech/audio processing unit," "conversation controller module," "session monitoring module," and "TTS (text-to-speech) module" in claim 1; the “voice biometrics module” in claim 14; and the “Voice Activity Detector module” in claim 16. 
These elements are computer-implemented, meaning that the structure, in the form of a computer/processor and a corresponding algorithm, is read in from the specification. See MPEP 2181(II)(B). As for these algorithms in the case of claim 1: the bi-directional audio connector unit performs parsing and voice activity detection (VAD) to identify segments (Paragraph 0031); the user identification module performs identification and/or registration of the user using the user's caller number and/or a unique identification number assigned to the user 101 when the user is pre-registered (Paragraph 0023); the speech/audio processing unit performs translation of an input voice into a text output using a model best suited to the user (Paragraph 0033); the conversation controller module obtains a plurality of user data and uses the profile of past statistical data and call statistics to select or modify ASR and NLU models, although the algorithm for carrying out the modification is missing from the specification (Paragraph 0025); the session monitoring module determines the session state of the user interaction session, records session state related information between the user and the service instance, and stores such information in association with a user profile (Paragraphs 0045-0046); the TTS (text-to-speech) module recognizes text message information and performs speech synthesis using TTS parameters and models (Paragraph 0055); the voice biometrics module lacks a corresponding underlying algorithm in the specification, which merely repeats the function of the claim (Paragraph 0030); and the VAD module performs speech/noise determination based upon a threshold comparison (Paragraph 0027). Claim 2 appears to add functionality to the conversation controller module in the form of modifying various models. The specification repeats the claim language (Paragraph 0025), and thus lacks an algorithm corresponding to this claimed function. 
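For reference, the threshold-comparison VAD identified above (Paragraph 0027) is a well-known technique. A minimal illustrative sketch follows; the frame length, sampling rate, and threshold value are arbitrary choices for the example, not values taken from the application:

```python
import math

def classify_frames(samples, frame_len=160, threshold=0.01):
    """Label each frame of a mono audio signal as speech (True) or
    non-speech (False) via a short-time energy threshold comparison."""
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Short-time energy: mean of the squared samples in the frame.
        energy = sum(s * s for s in frame) / frame_len
        labels.append(energy > threshold)
    return labels

# Hypothetical demo signal: 0.2 s of silence followed by a 440 Hz tone at 8 kHz.
rate = 8000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * t / rate) for t in range(1600)]
labels = classify_frames(silence + tone)
```

Consecutive runs of True/False frames would then delimit speech and non-speech segments; a production VAD would typically add smoothing, hangover, and an adaptive noise floor.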
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Claim Rejections - 35 USC § 112 The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. Claims 1-29 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim limitation “conversation controller module” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. 
However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Claim 1 regards model modification wherein the specification for this computer-based component only repeats the language of the claims (Paragraph 0025) and lacks the underlying algorithm for how the modification is performed. The additional functions of the speech processing unit set forth in claim 9 also lack an underlying algorithm because the specification represents these functions with further generic modules that do not explain how any of emotion, sentiment, noise profile and environmental audio information is determined from the received audio data features (Paragraphs 0028 and 0033). The additional functions of the user identification module set forth in claim 12 lack an underlying algorithm as the specification merely repeats the language of the claim without explaining how the function is achieved (Paragraph 0023). The voice biometrics module of claim 14 lacks an underlying algorithm as the specification merely repeats the language of the claim without explaining how the function is achieved (Paragraph 0030). Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. For claim interpretation in the interest of compact prosecution, the base functionality in each claim in which the module/unit is referenced will be interpreted as being performed by a computer processor. Applicant may: (a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph; (b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 
132(a)); or (c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)). If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: (a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or (b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181. Claim 2 recites the limitation "said conversation controller" in line 3. Claim 3 also recites this limitation in line 3. There is insufficient antecedent basis for this limitation in the claim and it is unclear whether the term refers back to the conversation controller module in parent claim 1 or is attempting to add an additional conversation controller to the claimed invention. For claim interpretation purposes in the interest of compact prosecution, the “said conversation controller” will be interpreted as --said conversation controller module--. The claims are replete with antecedent basis issues: In claim 1, Lines 25-26, "the interaction session duration" lacks antecedent basis and it is unclear what term this limitation is referencing. 
For claim interpretation in the interest of compact prosecution, "the interaction session duration" will be interpreted as --an interaction session duration--. In claim 1, Line 31, "the user's intention" lacks antecedent basis and it is unclear what term this limitation is referencing. For claim interpretation in the interest of compact prosecution, this limitation will be interpreted as --an intention of the user--. Claim 17 contains a similar antecedent basis issue and is rejected under similar rationale. In claim 3, lines 4-5, "non-speech segments" appears to refer back to the term introduced in parent claim 1, but lacks a referential modifier, so it is unclear whether the term refers back to that limitation or introduces a new term. For claim interpretation, this limitation will be construed as --the non-speech segments--. In claim 5, Line 4, "said NLU component" lacks antecedent basis and it is unclear what term this limitation is referencing. For claim interpretation in the interest of compact prosecution, this limitation will be interpreted as --an NLU component--. In claim 10, Lines 4-5, "the applicable forms and/or slots" lacks antecedent basis and it is unclear what term is being referenced, but the applicant may be attempting to refer back to the "plurality of forms and/or slots" of parent claim 7. For claim interpretation, this limitation will be construed as --the plurality of forms and/or slots--. Note that claim 11 repeats this term and should likewise be corrected to be consistent with claim 10. In claim 12, Lines 4-5, "the user's human voice" and "the user's stored biometrics" both lack antecedent basis and it is unclear what terms are being referenced. For claim interpretation, these limitations will be construed as --a human voice of the user-- and --stored biometrics of the user--. 
In claim 15, line 3, "said session monitoring tool" lacks antecedent basis and it is unclear whether the term attempts to reference the "session monitoring module" of claim 1 or is attempting to introduce a new term. For claim interpretation, the limitation will be construed as --said session monitoring module--. In claim 16, lines 5-6, "speech segments" and "non-speech segments" appear to refer back to terms appearing in parent claim 1, but lack referential modifiers. For claim interpretation, these limitations will be interpreted as being preceded by --said--. In claim 19, Line 5, "the audio segment" lacks antecedent basis and it is unclear what term this limitation is referencing. For claim interpretation, this limitation will be construed as --an audio segment--. In claim 21, Lines 6-7, "the audio data of the collection of user speech audio from the corresponding user interaction session" lacks antecedent basis and it is unclear what term this limitation is referencing. For claim interpretation, this limitation will be construed as --audio data of a collection of user speech audio from a corresponding user interaction session--. In line 8, "the models associated with..." also lacks antecedent basis and will be construed as not including the definite article "the". In line 10, "the audio data features" also lacks antecedent basis and will be construed as not including the definite article "the". In claim 23, lines 5-6, "the determined system actions" and "the applicable forms and/or slots" lack antecedent basis and it is unclear what terms are being referenced. For claim interpretation, these limitations will be construed as not including the definite article "the". In claim 24, line 5, "a user interaction session" would appear to refer back to the term introduced in parent claim 17, though given the lack of a referential modifier it is unclear whether a new instance of the term is being added. 
For claim interpretation, the limitation will be construed as --the user interaction session--. In claim 28, line 5, "the goal" lacks antecedent basis and it is unclear what term this limitation is referencing. For claim interpretation in the interest of compact prosecution, this limitation will be interpreted as --a goal--. Regarding claim 29, the phrase "such as" renders the claim indefinite because it is unclear whether the limitations following the phrase are part of the claimed invention. See MPEP § 2173.05(d). For claim interpretation in the interest of compact prosecution, the limitations following "such as" will be interpreted in the alternative (i.e., --comprising... or-- instead of "such as...and"). In addition to the issues individually noted above with regard to the dependent claims, dependent claims 2-16 and 18-29 are rejected under 35 U.S.C. 112(b) by virtue of their dependency because they inherit the indefiniteness issues of their respective parent claims. Claim Rejections - 35 USC § 101 35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title. Claims 1-29 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 1 recites a process that, under its broadest reasonable interpretation (BRI), covers performance of the limitations in the mind but for the recitation of generic computer components and computer processing. 
For example, under the BRI claim 1 could be performed by: determines and stores at least one of speech segments, non-speech segments, turn-taking speech segments and barge-in speech segments in the user interaction session (a human can listen to audio and parse the audio into segments based upon what they heard and either remember the sequence or transcribe the sequence using pen and paper); authenticates the user to a user interaction session using the user's caller number or a unique identification number assigned to the user (a human can check a list to verify that a pin, social security number, customer number, etc. on a database list matches the voice and/or identity of the particular person as a mental judgment); receives and analyzes conversation data and audio data features from a user speech input, and stores ASR (Automated Speech Recognition) models corresponding to the user interaction session (a human can listen to and mentally evaluate speech while writing down models that would be appropriate (e.g., for a particular topic or intent)); receives and processes transcripted text corresponding to the conversation data and stores corresponding dialogue engine components and NLU (Natural Language Understanding) models to handle voice based interactions with the user in the user interaction session (a human can receive text by reading transcribed text on paper and write down appropriate components and models based upon, e.g., topic or intent of an utterance that was heard); a dialogue state tracker (105e), the dialogue state tracker (105e) appends information related to the user interaction session (a human can mentally remember or use paper to record what was discussed and details that were agreed upon (e.g., number of tickets and destination city for a flight booking)); receives the audio features and chooses and/or modifies associated ASR and NLU models for the user interaction session to optimize the user interaction session duration (a human can listen to 
and understand speech and decide models that would be most appropriate as a mental evaluation (e.g., words related to troubleshooting would relate to models associated with a user guide) and appropriate to a customer’s history (e.g., a television model that the user had purchased based upon a customer profile)); monitors the user interaction session and adds key metrics corresponding to the user interaction session to the conversation controller module (a human can record a log of conversation statistics and customer data (e.g., whether a sales pitch was successful or unsuccessful) using pen and paper); generates a response corresponding to the user's intention during the user interaction session (a human can listen to another person’s request, evaluate that request, and think of an appropriate reply (e.g., an answer to a particular question)); and receives the generated response and performs speech synthesis (a human can speak the reply that they mentally determined). This judicial exception is not integrated into a practical application. In particular, the claim recites only two additional elements: the use of generic computer components and speech synthesis. The use of generic computer components amounts to no more than mere instructions to implement an otherwise abstract idea. Furthermore, the computer is not improved as a tool, but is only used for its standard purpose of executing a program to carry out the above identified abstract idea. Speech synthesis is mere computer automation of the manual process of human speech, without specifics or relation to an improvement. Speech synthesis as used in the claim is used for its ordinary purpose of producing an artificial speech output. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, and the remaining claim limitations are well-known, routine, and conventional such that they do not qualify as an inventive concept. 
Specifically, the use of a generic computer amounts to mere instructions to implement an abstract idea and is well-understood, routine, and conventional as evidenced by Bancorp Services v. Sun Life, 687 F.3d 1266, 1278, 103 USPQ2d 1425, 1433 (Fed. Cir. 2012). Some limitations also involve storing and retrieving information in memory. While these limitations were addressed as part of the mental process, applicant should also be aware of Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93. For these reasons, claim 1 is not directed towards patent eligible subject matter under 35 U.S.C. 101. Independent method claim 17 contains functionality similar to claim 1, and thus is not directed towards patent eligible subject matter under 35 U.S.C. 101 under similar rationale. The remaining dependent claims fail to add patent eligible subject matter to their respective parent claims: Claims 2, 4, 9, 12-16, 19, 24-25, and 27 regard or narrow classification decisions and consideration of customer/user profile information addressed with regard to claim 1. Claim 3 regards threshold modification where a human can assign an appropriate value mentally. Claims 5-6 narrow computer components and/or databases covered in the claim 1 analysis. Claims 7-8, 22, and 26 decide on an action as a response where a human can write down a response using pen and paper or check for an answer in a book/manual relying on rules/guidance. Claims 10-11, 23, and 28 regard form filling where a human can listen to a request and fill out a form using pen and paper. Claim 18 regards the use of models and thresholds for speech/non-speech classification where a human can use a timing model to mentally judge when speech has continued for a sufficient time and use a mental sound level to discern silence. The storage feature was addressed in claim 1 as being well-known. 
Claim 20 regards a step where a human can slow down or speed up their speaking rate based upon mentally considering the preferences or feedback of another person. With regard to claim 21, the updating of models regards features addressed in the claim 1 analysis. As for training, MPEP 2106.05(f) provides the following considerations for determining whether a claim simply recites a judicial exception with the words “apply it” (or an equivalent), such as mere instructions to implement an abstract idea on a computer: (1) whether the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how a solution to a problem is accomplished; (2) whether the claim invokes computers or other machinery merely as a tool to perform an existing process; and (3) the particularity or generality of the application of the judicial exception. In the instant claim, the model training only presents the idea of a solution (i.e., an approach to achieve various models) while failing to describe the process as to how the models are actually trained. Claim 29 regards model storage that was addressed in claim 1 as being well-known. Accordingly, the dependent claims fail to add patent eligible subject matter to their respective parent claims and are likewise rejected under 35 U.S.C. 101. Claim Rejections - 35 USC § 102 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 
102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claims 1, 4-5, 7-8, 10-11, 13, and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sekar et al. (U.S. PG Publication: 2021/0201238 A1). With respect to Claim 1, Sekar discloses: A user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system with a human-computer interaction, the user interaction management system (100) for monitoring and optimizing the user interaction session comprising: a bi-directional audio connector unit (103) (computer/server processor, Paragraphs 0022-0024), the bi-directional audio connector unit (103) determines and stores at least one of speech segments, non-speech segments, turn-taking speech segments and barge-in speech segments in the user interaction session (voice detection, Paragraph 0101; parsing/segmentation of speech (e.g., into phrases), Paragraphs 0076 and 0090; interaction data placed in storage, Paragraph 0046; “at least one of…and” is interpreted as being disjunctive since these audio types occur in the alternative during a conversation and/or may not occur in a session); a user identification module (111) (computer/server processor, Paragraphs 0022-0024), the user identification module (111) authenticates the user to a user interaction session using the user's caller number or a unique identification number assigned to the user (stored registered customer account information including "account numbers," Paragraph 0093, and "authentication" using such "account identifying information" or a "social security number," Paragraph 0100); a speech/audio processing unit (104) (computer/server processor, 
Paragraphs 0022-0024), the speech processing unit (104) receives and analyzes conversation data and audio data features from a user speech input, and stores ASR (Automated Speech Recognition) models corresponding to the user interaction session (speech is "transcribed into text by a speech-to-text system," Paragraph 0090; various models are "optimized" for the customer, Paragraphs 0061, 0063, and 0123-0125; storage device, Fig. 2, Element 220); a dialogue engine (105), the dialogue engine (105) receives and processes transcripted text corresponding to the conversation data and stores corresponding dialogue engine components and NLU (Natural Language Understanding) models to handle voice based interactions with the user in the user interaction session (dialog manager that receives text and performs language processing to manage "the general flow of the conversion based on a set of decision rules," Paragraphs 0076-0078; models used for natural language processing (NLP), Paragraphs 0061, 0076, and 0088); a dialogue state tracker (105e), the dialogue state tracker (105e) appends information related to the user interaction session ("maintains history and state of the conversation, and generates an outbound communication based on the history and state,” Paragraph 0077); a conversation controller module (109) (computer/server processor, Paragraphs 0022-0024), the conversation controller module (109) receives the audio features and chooses and/or modifies associated ASR and NLU models for the user interaction session to optimize the user interaction session duration (interaction models optimized to be specific to "a particular customer" providing a spoken input where ASR and NLU include different dictionaries, terminology, dialects, etc., Paragraphs 0069, 0071, 0076, 0090, and 0123-0126; consideration of statistics on previous customer interactions, Paragraphs 0046, 0057, and 0110); a session monitoring module (108) (computer/server processor, Paragraphs 0022-0024), the 
session monitoring module monitors the user interaction session and adds key metrics corresponding to the user interaction session to the conversation controller module (109) (customer data is "maintained" in a storage device and includes "customer profiles," Paragraph 0046; reports on statistics supplied in "real-time," Paragraph 0057; dialog session controlled based upon information received pertaining to customer information, Paragraphs 0069, 0071, 0076, 0090, and 0110); a dialogue engine dispatcher (106), the dialogue engine dispatcher generates a response corresponding to the user's intention during the user interaction session (chatbot generates a dialog response to a customer/user based upon a determined intent, Paragraphs 0068-0069, 0071, 0073-0074, 0078-0079 (describing response generation based upon a specific customer profile and their input), and 0105); and a TTS (text-to-speech) module (107) (computer/server processor, Paragraphs 0022-0024), the TTS module (107) receives the generated response and performs speech synthesis (text-to-speech processor, Fig. 3, Element 342, used by an interactive "voice response" system chatbot, Paragraph 0035; chatbots use VoiceXML files to provide an interactive voice response (see Voice in Fig. 6), Paragraph 0068; chatbots use different models/dictionaries/dialects to produce a particular voice, Paragraphs 0069-0070). With respect to Claim 4, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said conversation controller (109) is further configured to select and/or modify a conversation data model based on the received audio features and/or an existing user profile (conversation model based upon a user profile, Paragraphs 0069, 0076, and 0079-0080). 
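Both the claimed conversation controller and Sekar's per-customer optimization describe, at a high level, choosing ASR/NLU models from profile data. Purely to illustrate what such a selection step can look like (every model name and profile field below is hypothetical, drawn neither from the claims nor from Sekar):

```python
# Default and per-dialect model tables are invented for illustration only.
DEFAULT_MODELS = {"asr": "asr-generic", "nlu": "nlu-generic"}
DIALECT_MODELS = {
    "en-US": {"asr": "asr-en-us", "nlu": "nlu-en-us"},
    "en-IN": {"asr": "asr-en-in", "nlu": "nlu-en-in"},
}

def select_models(profile):
    """Pick ASR and NLU models for a session from a user-profile dict."""
    models = dict(DEFAULT_MODELS)
    # Dialect-specific models override the generic ones when available.
    models.update(DIALECT_MODELS.get(profile.get("dialect", ""), {}))
    # Past call statistics can narrow the NLU model to a frequent domain.
    if profile.get("frequent_service") == "flight-booking":
        models["nlu"] = "nlu-flight-booking"
    return models

chosen = select_models({"dialect": "en-IN", "frequent_service": "flight-booking"})
```

Concrete selection/modification logic of this kind is exactly what the § 112(b) discussion above finds missing from the specification, which recites the outcome rather than the algorithm.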
With respect to Claim 5, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said dialogue engine 105 further comprises at least one of the said NLU component 105a, an NLU model storage 105b, a dialogue engine core model database 105c, an action server 105d, and the dialogue state tracker 105e arranged within the dialogue engine 105 (natural language understanding and different models (e.g., grammars/lexicons), various scripts with actions and paths through a dialog at a server, dialog manager includes a "history and state" manager for the conversation, a conversation flow "script" model, and response generator at a server, Paragraphs 0075 and 0077-0078). With respect to Claim 7, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said dialogue engine (105) is further configured to predict an applicable system action based on the received conversation data using a dialogue engine core model storage (105c) (“dialog manager 272 selects a response deemed to be appropriate at the particular point of the conversation flow/script, and outputs the response to the output generator,” Paragraph 0078; storage device containing operational system data that includes the dialog manager, Paragraphs 0045-0046) wherein said applicable system action includes at least one of the following: generating a transcription for a spoken response; querying a corresponding database or making a call; generating at least one of a plurality of forms and/or slots; and validating the one or the plurality of forms and/or slots (answers to questions from databases, Paragraphs 0050, 0052, and 0078; data form filling, Paragraph 0104; 
making a call to an agent, Paragraph 0101; “at least one of…and” is interpreted as disjunctive since a system action need not simultaneously consist of all actions where other actions can also be included, see Specification, Paragraph 0042). With respect to Claim 8, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 7, wherein said applicable system action for the conversation data is determined using at least one of Transformer Embedding Dialogue (TED) Policy, Memorization Policy, and Rule Policy (actions are “rules-driven” and involve “decision rules,” Paragraphs 0051 and 0077; note that the language “at least one of…and” is interpreted as being disjunctive because the specification defines these elements in the alternative and does not describe every embodiment of the invention requiring all three options). With respect to Claim 10, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 7, wherein said dialogue engine (105) is further configured to carry out the applicable system action and populate the applicable forms and/or slots corresponding to the user interaction session using an action server (105d) (customer prompt to fill in missing information in fields, Paragraph 0104; actions carried out, Paragraphs 0078 and 0126). 
With respect to Claim 11, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 10, wherein said applicable forms and/or slots are populated to add personal dialogue history information to the dialogue state tracker (105e) (fields relate to customer parameters or data to be filled by a dialog manager when going through a conversation, Paragraphs 0078 and 0091).

With respect to Claim 13, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said user identification module (111) is further configured to store and update user profiles with past and present interaction session status and call session statistics, ASR models, dialogue engine models and TTS models corresponding to the existent user profiles (customer data is "maintained" in a storage device and includes "customer profiles," Paragraphs 0046 and 0057; note that these statistics include details of “previous” interactions and in real time from current interactions; user information also includes ASR grammars associated with the user and dialog information statistics, Paragraphs 0069 and 0076; as well as voice lexicons used to respond, Paragraphs 0069 and 0076; Fig. 6, Element 342 - text-to-speech module linked to dialog model/script output). 
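Again as an editorial illustration only, the claim 13 arrangement — a per-user profile that accumulates interaction statistics alongside references to the user's ASR, dialogue, and TTS models — might be sketched like this; every field name and default is an assumption, not language from the claims or from Sekar:

```python
# Illustrative per-user profile store: running session statistics kept
# next to references to the user's ASR / dialogue engine / TTS models.
# All field names and defaults are hypothetical.

from dataclasses import dataclass

@dataclass
class UserProfile:
    user_id: str
    asr_model: str = "asr-default"
    dialogue_model: str = "dialogue-default"
    tts_model: str = "tts-default"
    sessions: int = 0
    total_turns: int = 0

    def record_session(self, turns):
        """Fold one finished interaction session into the running statistics."""
        self.sessions += 1
        self.total_turns += turns

profiles = {}

def profile_for(user_id):
    """Fetch an existing profile or create one for a new user."""
    return profiles.setdefault(user_id, UserProfile(user_id))

p = profile_for("alice")
p.record_session(turns=7)
p.record_session(turns=3)
assert p.sessions == 2 and p.total_turns == 10
```

The design choice mirrored here is simply that "past and present" statistics and the per-user model references live in one record keyed by user identity.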
With respect to Claim 15, Sekar further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said session monitoring tool (108) is further configured to determine and calculate a happiness index for the user in real-time during the user interaction session (dialog manager confidence level based upon positive and negative customer reactions, Paragraph 0078).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Belvin, et al. (U.S. Patent: 7,831,433).

With respect to Claim 2, Sekar teaches the system for personalizing a dialog session with a user and an IVR system as applied to claim 1. Although the system of Sekar may customize different ASR functionality for a user, such as the selection of grammars for a speech input to accomplish a particular service (Paragraph 0076), Sekar does not teach the selection and modification of models related to non-speech segments, turn-taking speech segments and barge-in speech segments. 
Belvin, however, discloses the selection and adaptation of various models used by a human-machine dialog system including turn-taking/initiative, speech input, non-speech input, and barge-in/user interruptions (Col. 11, Line 25 - Col. 12, Line 16). Sekar and Belvin are analogous art because they are from a similar field of endeavor in interactive dialog systems. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar to include the different dialog model types, including barge-in and non-speech, taught by Belvin to provide a predictable result of enabling mixed-initiative dialog in the system of Sekar (Belvin, Col. 11, Lines 38-44).

Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Attwater, et al. (U.S. PG Publication: 2006/0200350 A1).

With respect to Claim 3, Sekar teaches the system for personalizing a dialog session with a user and an IVR system including speech segment parsing as applied to claim 1. Sekar does not teach assigning and modifying thresholds for determining non-speech segments. Attwater, however, discloses the assignment and “dynamic modification” of voice activity detection (VAD) thresholds that relate to “speech/noise decisions” (Paragraphs 0122-0124). Sekar and Attwater are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar to include the adaptive VAD thresholds as taught by Attwater to provide a predictable result of allowing voice detection to be tuned for different applications or noise environments (Attwater, Paragraph 0123). 
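For orientation only (an editorial sketch, not drawn from Attwater or Sekar), the kind of dynamically modified speech/noise threshold discussed above can be illustrated with a minimal energy-based detector whose threshold tracks a running noise-floor estimate; the class name, margin, and smoothing constant are all assumptions:

```python
# Illustrative energy-based voice activity detector with an adaptive
# speech/non-speech threshold: the decision threshold is a fixed margin
# above a running estimate of the background noise floor, which is only
# updated on frames judged to be non-speech. All parameters are
# hypothetical choices, not values from the cited references.

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

class AdaptiveVAD:
    def __init__(self, margin=4.0, alpha=0.95):
        self.margin = margin    # speech must exceed the noise floor by this factor
        self.alpha = alpha      # smoothing for the noise-floor estimate
        self.noise_floor = None

    def is_speech(self, frame):
        e = frame_energy(frame)
        if self.noise_floor is None:
            self.noise_floor = e  # seed the noise floor from the first frame
            return False
        speech = e > self.margin * self.noise_floor
        if not speech:
            # adapt the threshold only while the user is not speaking
            self.noise_floor = self.alpha * self.noise_floor + (1 - self.alpha) * e
        return speech

vad = AdaptiveVAD()
vad.is_speech([0.01] * 160)         # quiet frame seeds the noise floor
assert vad.is_speech([0.5] * 160)   # loud frame exceeds the adapted threshold
assert not vad.is_speech([0.01] * 160)
```

Gating adaptation on the non-speech decision is what lets such a threshold be "tuned" per environment without drifting upward during the user's own speech.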
With respect to Claim 16, Attwater further discloses: said bi-directional audio connector unit (103) is further configured to identify and parse the audio segment received from the user's 101 speech input in the user interaction session into speech segments and non-speech segments using a Voice Activity Detector (referred to as VAD hereafter) module (103a) (endpointing speech segments from non-speech segments using a voice activity detector (VAD) that relies upon a threshold, including thresholds related to duration, Paragraphs 0025 and 0123-0124). The use of a VAD provides a predictable result of distinguishing between human speech and other sounds that might otherwise be incorrectly classified as user speech (Attwater, Paragraph 0025).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mielke, et al. (U.S. PG Publication: 2023/0135179 A1).

With respect to Claim 6, Sekar teaches the system for personalizing a dialog session with a user and an IVR system including speech segment parsing as applied to claim 1. Sekar does not teach that the dialogue engine further comprises at least one of a Large Language Model, an action server and the dialogue state tracker arranged within the dialogue engine. Mielke, however, discloses a dialog manager that includes a state tracker and action selector (Paragraph 0069). The dialog manager of Mielke also carries out the dialog using a large language model and prompting (Paragraphs 0261 and 0281-0282). Sekar and Mielke are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar to include the dialog manager components taught by Mielke, including large language models, in order to provide a predictable result of generating more fluent and coherent dialog interactions that are more unique (Mielke, Paragraph 0280). 
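As one more editorial sketch (not from Mielke, Sekar, or the claims), the "state tracker plus action selector" split referenced above can be made concrete with a tracker that accumulates turns and slot values and a rule-driven selector that reads that state; all class, rule, and action names are hypothetical:

```python
# Illustrative dialog manager split into a state tracker and a
# rule-driven action selector. The tracker accumulates conversation
# history and slot values; the selector picks the first action whose
# rule matches the tracked state. Names and rules are assumptions.

class DialogueStateTracker:
    def __init__(self):
        self.history = []  # (speaker, utterance) turns, in order
        self.slots = {}    # accumulated slot values

    def update(self, speaker, utterance, slots=None):
        self.history.append((speaker, utterance))
        if slots:
            self.slots.update(slots)

RULES = [
    # (predicate on tracked state, action to take)
    (lambda s: "account" not in s.slots, "request_account_number"),
    (lambda s: "intent" not in s.slots, "request_intent"),
    (lambda s: True, "query_database"),
]

def select_action(state):
    """Return the first action whose rule matches the tracked state."""
    for predicate, action in RULES:
        if predicate(state):
            return action

state = DialogueStateTracker()
state.update("user", "I want my balance", {"intent": "balance"})
assert select_action(state) == "request_account_number"
state.update("user", "It's 12345", {"account": "12345"})
assert select_action(state) == "query_database"
```

In an LLM-backed variant of the same split, `select_action` would be replaced by prompting a language model with the tracked history and slots rather than walking a fixed rule list.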
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Machanavajhala, et al. (U.S. PG Publication: 2023/0343330 A1).

With respect to Claim 9, Sekar teaches the system for personalizing a dialog session with a user and an IVR system including speech segment analysis as applied to claim 1. Sekar does not teach that speech analysis involves the detection and analysis of at least one of emotion, sentiment, noise profile, and environmental audio information from the received audio data features. Machanavajhala, however, discloses detecting and analyzing at least one of emotion (emotional analysis such as happy, angry, etc., Paragraph 0021), sentiment (sentiment in the form of an intensity level, Paragraphs 0021 and 0026), noise profile (noise analysis with specific templates, Paragraph 0037), and environmental audio information (environmental information, Paragraph 0039) from the received audio data features (received audio data 101, Paragraph 0039). Sekar and Machanavajhala are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar to include the additional audio analyses taught by Machanavajhala to provide a predictable result of performing more thorough consideration of user speech to carry out a dialog where a sentiment may impact dialog flow.

Claims 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Enzinger, et al. (U.S. PG Publication: 2021/0193174 A1).

With respect to Claim 12, Sekar teaches the system for personalizing a dialog session with a user and an IVR system including user identification analysis as applied to claim 1. 
Sekar does not teach that user identification involves a function to distinguish between synthesized speech and the user's human voice in the received conversation data from the user's stored voice biometrics for detection of any fraudulent activity. Enzinger, however, teaches the identification of fraudulent voice phishing using a voice profile to discern between a voice from a given person or from a speech synthesizer (Paragraphs 0081 and 0084). Sekar and Enzinger are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar to include the voice phishing detection taught by Enzinger to provide a predictable result of detecting and warning against voice telephony threats (Enzinger, Paragraph 0003).

With respect to Claim 14, Enzinger further discloses: The user interaction management system (100) for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 1, wherein said speech and audio processing unit (104) is further configured to utilize voice biometrics to identify and register the user participating in the user interaction session using a voice biometrics module (104e) (voice biometrics are used in association with a user for identification and registration, Paragraphs 0049, 0103, 0139, and 0150-0151; see Paragraphs 0136-0138 for profile creation/registration). Voice profile registration enables the predictable authentication of a valid user into an IVR platform.

Claims 17, 22-24, 26-27, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. (U.S. PG Publication: 2018/0301151 A1). 
With respect to Claim 17, Sekar discloses: A method for user interaction management for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, the method for user interaction management for monitoring and optimizing a user interaction session comprising the steps of: determining speech segments (parsing/segmentation of speech (e.g., into phrases), Paragraphs 0076 and 0090; turn taking speech such as initial step and clarification steps, Paragraphs 0089-0090); receiving and analyzing conversation data and audio features from a user speech input (speech in a conversation is "transcribed into text by a speech-to-text system," Paragraph 0090; various models are "optimized" for the customer, Paragraphs 0061, 0063, and 0123-0125; storage device, Fig. 2, Element 220); receiving the audio features and choosing and/or modifying associated ASR (Automated Speech Recognition) and NLU (Natural Language Understanding) models for the user interaction session (interaction models optimized to be specific to "a particular customer" providing a spoken input where ASR and NLU include different dictionaries, terminology, dialects, etc., Paragraphs 0069, 0071, 0076, 0090, and 0123-0126; consideration of statistics on previous customer interactions, Paragraphs 0046, 0057, and 0110); receiving and processing transcripted text corresponding to the conversation data (receiving text and performing language processing to manage "the general flow of the conversion based on a set of decision rules," Paragraphs 0076-0078; models used for natural language processing (NLP), Paragraphs 0061, 0076, and 0088); appending information related to the user interaction session (state and history information is appended as a conversation continues - "maintains history and state of the conversation, and generates an outbound communication based on the history and state,” Paragraph 0077); monitoring the user interaction session and 
adding key metrics (customer data is "maintained" in a storage device and includes "customer profiles," Paragraph 0046; reports on statistics supplied in "real-time," Paragraph 0057; dialog session controlled based upon information received pertaining to customer information, Paragraphs 0069, 0071, 0076, 0090, and 0110); generating a response corresponding to the user's intention during the user interaction session (generation of a dialog response to a customer/user based upon a determined intent, Paragraphs 0068-0069, 0071, 0073-0074, 0078-0079 (describing response generation based upon a specific customer profile and their input), and 0105); and performing speech synthesis on the generated response (text-to-speech processor, Fig. 3, Element 342, used by an interactive "voice response" system chatbot, Paragraph 0035; chatbots use VoiceXML files to provide an interactive voice response (see Voice in Fig. 6), Paragraph 0068; chatbots use different models/dictionaries/dialects to produce a particular voice, Paragraphs 0069-0070).

Although well-known in dialog management, IVR, and/or voice assistant procedures, the determination of non-speech and barge-in segments in a dialog conversation is not described by Sekar. Mont-Raynaud, however, discloses a voice activity detector that separates a speech signal "from silence or other non-speech sounds" (Paragraph 0093). Mont-Raynaud also teaches that the utterance types that are analyzed and addressed include a "barge-in" during a system/agent response (Paragraphs 0118-0119). Sekar and Mont-Raynaud are analogous art because they are from a similar field of endeavor in interactive dialog systems. 
Thus, it would have been obvious to modify the teachings of Sekar to include non-speech segments and barge-in as taught by Mont-Raynaud in order to provide the predictable result of implementing an IVR capable of identifying when a user is not providing a meaningful input and enabling user-driven control of a conversation via barge-in.

Claim 22 contains subject matter similar to Claim 7, and thus, is rejected under similar rationale.

Claim 23 contains subject matter similar to Claim 10, and thus, is rejected under similar rationale.

With respect to Claim 24, Sekar further discloses: The method for user interaction management for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 17, wherein the step of receiving and analyzing conversation data and audio features from user speech input further comprises: authenticating the user to a user interaction session using the user's caller number or a unique identification number assigned to the user (stored registered customer account information including "account numbers," Paragraph 0093, and "authentication" using such "account identifying information" or a "social security number," Paragraph 0100).

Claim 26 contains subject matter similar to Claim 8, and thus, is rejected under similar rationale. 
With respect to Claim 27, Sekar further discloses: The method for user interaction management for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 17, wherein said key metrics added include at least one of confidence scores, users' level of expertise, number of application forms/slots, conversation length, fall back rate, retention rate, and goal completion rate (customer data includes handle time/conversation length, Paragraph 0046; abandonment rate that corresponds to a retention rate, Paragraph 0057; note that the language “at least one of…and” is interpreted as being disjunctive because the specification defines these elements in the alternative and does not describe every embodiment of the invention requiring all alternatives).

With respect to Claim 29, Sekar further discloses: The method for user interaction management for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 17, wherein the step of performing speech synthesis on the generated responses further comprises: storing TTS models and TTS parameters such as speaking rate, pitch, volume, intonation, and preferred responses corresponding to the user interaction session (text-to-speech processor, Fig. 3, Element 342, used by an interactive "voice response" system chatbot, Paragraph 0035; chatbots use VoiceXML files to provide an interactive voice response (see Voice in Fig. 6), Paragraph 0068; chatbots use different models/dictionaries/dialects to produce a particular voice, Paragraphs 0069-0070; note the interpretation of the unclear “such as” claim language set forth in the preceding rejection under 35 U.S.C. 112(b)).

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. and further in view of Chang, et al. (U.S. 
PG Publication: 2014/0188470 A1).

With respect to Claim 18, Sekar in view of Mont-Raynaud teaches the process for personalizing a dialog session with a user and an IVR system including speech parsing as applied to claim 17. Sekar in view of Mont-Raynaud does not teach the speech/non-speech discernment set forth in claim 18. Chang, however, further discloses: receiving assigned models for determining speech segments (provided “speech model,” Paragraph 0050); receiving an assigned threshold for determining non-speech segments (threshold to declare audio to be non-speech, Paragraph 0050); listening to user speech input audio (gathering audio samples via a microphone, Paragraph 0050); applying the assigned models for determining speech segments (speech models are applied for speech detection, Paragraph 0050); applying the assigned threshold for detecting non-speech segments (threshold is applied for non-speech detection, Paragraph 0050); and storing and sending the speech input audio for speech processing (voice samples/features are stored for “further analysis,” Paragraph 0050). Sekar, Mont-Raynaud, and Chang are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar in view of Mont-Raynaud to include the VAD processing taught by Chang to provide a predictable result of increasing power efficiency of a user device by only operating a speech analyzer when a speech signal is detected (i.e., implementing wake-up functionality).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. and further in view of Attwater, et al.

With respect to Claim 19, Sekar in view of Mont-Raynaud teaches the process for personalizing a dialog session with a user and an IVR system including speech parsing as applied to claim 17. 
Sekar in view of Mont-Raynaud does not teach identifying and parsing the audio segment received from the user's speech input in the user interaction session into speech segments and non-speech segments. Attwater, however, discloses endpointing speech segments from non-speech segments using a voice activity detector (VAD) that relies upon a threshold, including thresholds related to duration (Paragraphs 0025 and 0123-0124). Sekar, Mont-Raynaud, and Attwater are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar in view of Mont-Raynaud to include the VAD processing taught by Attwater to provide a predictable result of distinguishing between human speech and other sounds that might otherwise be incorrectly classified as user speech (Attwater, Paragraph 0025).

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. and further in view of Weng, et al. (U.S. PG Publication: 2017/0116986 A1).

With respect to Claim 20, Sekar in view of Mont-Raynaud teaches the process for personalizing a dialog session with a user and an IVR system including text-to-speech synthesis as applied to claim 17. Sekar in view of Mont-Raynaud does not teach adjusting a speaking rate for the generated response corresponding to the received audio features and/or an existing user profile. Weng, however, recites detecting a user state based upon input audio and adjusting an output dialog at a speaking rate to complement the rate of speech of the user (Paragraphs 0006, 0032, and 0044). Sekar, Mont-Raynaud, and Weng are analogous art because they are from a similar field of endeavor in interactive voice interfaces. 
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar in view of Mont-Raynaud to include the speaking rate adjustment based upon a user voice input taught by Weng in order to provide a predictable result of making conversation more efficient and meaningful (Weng, Paragraph 0031).

Claims 21 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. and further in view of Ljolje (U.S. PG Publication: 2009/0112599 A1).

With respect to Claim 21, Sekar in view of Mont-Raynaud teaches the process for personalizing a dialog session with a user and an IVR system including appending and updating sessional information as applied to claim 17. Sekar further discloses: The method for user interaction management for monitoring and optimizing a user interaction session within an interactive voice response system during a human-computer interaction, as claimed in claim 17, wherein the step of appending information related to the user interaction session further comprises the steps of: updating and training the ASR and NLU models associated with a registered user profile using the audio data of the collection of user speech audio from the corresponding user interaction session ("training and modifying...models...based on collected interaction data," Paragraphs 0059-0060, where models include natural language processing and speech recognition models, Paragraphs 0069, 0076, 0090, 0123-0125, and 0173); and updating and training the models associated with determining speech segments and turn taking speech segments (see above discussion, Paragraphs 0052, 0069, and 0076-0078). Although Sekar uses session interactions including audio to perform updates and personalization, Sekar in view of Mont-Raynaud does not teach the updating of the additional models set forth in claim 21 of non-speech segments, turn-taking speech segments and barge-in speech segments. 
Ljolje, however, discloses the iterative training of a spoken dialog system model that includes barge-in that relates to user-initiative turn taking, non-speech segments, and speech segments (Paragraphs 0048 and 0051). Sekar, Mont-Raynaud, and Ljolje are analogous art because they are from a similar field of endeavor in interactive voice interfaces. Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar in view of Mont-Raynaud to include the iterative barge-in model updating to provide a predictable result of improving and training for a user’s ability to barge-in during a dialog (Ljolje, Paragraph 0023).

Claim 28 contains subject matter similar to claim 10, and thus, is rejected under similar rationale.

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Sekar, et al. in view of Mont-Raynaud, et al. and further in view of Enzinger, et al.

With respect to Claim 25, Sekar in view of Mont-Raynaud teaches the process for personalizing a dialog session with a user and an IVR system including user authentication as applied to claim 24. Sekar in view of Mont-Raynaud does not teach the use of voice biometrics for registration and identification. Enzinger, however, teaches: The step of authenticating a user to the user interaction session using the user's caller number or a unique identification number assigned to the user further comprises the step of: utilizing voice biometrics to identify and register the user participating in the user interaction session (voice biometrics are used in association with a user for identification and registration, Paragraphs 0049, 0103, 0139, and 0150-0151; see Paragraphs 0136-0138 for profile creation/registration). Sekar, Mont-Raynaud, and Enzinger are analogous art because they are from a similar field of endeavor in interactive voice interfaces. 
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of Sekar in view of Mont-Raynaud to include the voice profile registration taught by Enzinger to provide a predictable result of enabling the authentication of a valid user into an IVR platform.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571) 272-7632. The examiner can normally be reached 7-3, off alternate Fridays. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders, can be reached at (571) 272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655

/JAMES S WOZNIAK/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Oct 09, 2023
Application Filed
Jun 26, 2025
Non-Final Rejection — §101, §102, §103
Nov 19, 2025
Response Filed
Jan 26, 2026
Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597422
SPEAKING PRACTICE SYSTEM WITH RELIABLE PRONUNCIATION EVALUATION
2y 5m to grant Granted Apr 07, 2026
Patent 12586569
Knowledge Distillation with Domain Mismatch For Speech Recognition
2y 5m to grant Granted Mar 24, 2026
Patent 12511476
CONCEPT-CONDITIONED AND PRETRAINED LANGUAGE MODELS BASED ON TIME SERIES TO FREE-FORM TEXT DESCRIPTION GENERATION
2y 5m to grant Granted Dec 30, 2025
Patent 12512100
AUTOMATED SEGMENTATION AND TRANSCRIPTION OF UNLABELED AUDIO SPEECH CORPUS
2y 5m to grant Granted Dec 30, 2025
Patent 12475882
METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION (ASR) USING MULTI-TASK LEARNED (MTL) EMBEDDINGS
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
99%
With Interview (+40.1%)
3y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 385 resolved cases by this examiner. Grant probability derived from career allow rate.
