Last updated: May 29, 2026
Application No. 18/567,746
LEARNING SYSTEM, LEARNING METHOD, AND LEARNING PROGRAM

Non-Final OA §103
Filed
Dec 06, 2023
Priority
Jun 10, 2021 — nonprovisional of PCTJP2021022223
Examiner
SIRJANI, FARIBA
Art Unit
2659
Tech Center
2600 — Communications
Assignee
NTT, Inc.
OA Round
2 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (76%) and a +31.5% interview lift. A written response may suffice.
Based on 554 resolved cases, 2023–2026
Examiner Intelligence

SIRJANI, FARIBA
Grants 76% — above average
Career Allowance Rate
419 granted / 554 resolved
+13.6% vs TC avg
Strong +32% interview lift
+31.5%
Interview Lift
resolved cases with interview
Typical timeline
2y 9m
Avg Prosecution
19 currently pending
Career history
580
Total Applications
across all art units
Statute-Specific Performance

§101
1.5%
-38.5% vs TC avg
§103
91.0%
+51.0% vs TC avg
§102
3.9%
-36.1% vs TC avg
§112
1.3%
-38.7% vs TC avg
Based on career data from 554 resolved cases
Office Action

§103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are pending. Claims 1 and 7-8 are independent and are amended.
Claims 1-6 and 9 are in a chain of dependency.
Claims 7 and 10-15 are in a chain of dependency.
Claims 8 and 16-20 are in a chain of dependency.
This Application was published as U.S. 20240282293.
Apparent priority: 10 June 2021.
Applicant’s amendments and arguments are considered but are either unpersuasive or moot in view of the new grounds of rejection that, if presented, were necessitated by the amendments to the Claims.
This action is Final.
Examiner attempted to reach the Applicant.  See the Conclusion section please.
Response to Amendments and Arguments
Applicant’s arguments are directed to the amended language and are moot in view of the modified grounds of rejection.
Claim 1 is amended as follows and the other independent Claims are amended similarly:
1. A learning system comprising a processor configured to execute operations comprising: 
obtaining information observed around a user who has uttered a voice command; and 
learning the obtained information as a condition for executing the voice command, 
wherein the learning further comprises learning peripheral information external to the user obtained simultaneously with the voice command and the learned peripheral information is automatically added to the condition for executing the voice command and selected based on a Levenshtein distance.
	Support for the amendments is provided in the instant Application at:
[0021] FIG. 5 illustrates an example of a peripheral information obtaining process according to the present disclosure.

[Image: media_image1.png]

[0022] FIG. 6A illustrates an example of an execution condition determining process according to the present disclosure.
[0047] For example, an example of the matching value is the Levenshtein distance between peripheral information and an execution condition. The Levenshtein distance will be described later with reference to FIGS. 6A and 6B. In a case where the matching value is the Levenshtein distance, the smaller the matching value, the more the peripheral information matches the execution condition.

[Image: media_image2.png]


[Image: media_image3.png]

	The amendments add a condition/context/peripheral information to a spoken command.  Taking into consideration different categories of context (time, place, personal profile of the user, etc.) for conducting speech recognition or for determining the semantics and intent of a command is well-known.  The type of context is such that it can be compared to the originally set contexts/conditions by a Levenshtein distance, which is a distance between two character strings.  So, the context/peripheral information has to be in the form of a character string.
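For reference, the string distance mentioned above can be sketched as follows. This is a minimal textbook implementation of Levenshtein distance, not code taken from the Application or from any cited Reference; the function name `levenshtein` and the example strings are illustrative assumptions.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # substitution or match
            )
    return dist[-1][-1]


# Illustrative strings only; they do not come from the record.
print(levenshtein("kitten", "sitting"))  # 3
```

A smaller result means the two strings are closer, which is consistent with the statement in [0047] that a smaller matching value indicates a better match.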

	35 USC 101: Rejection is withdrawn in view of the amendments.
Considering the context (e.g., the identity of the person issuing a command or query) when responding to that person is one of the aspects of human interaction and, as included in the Claim, lacks any inventive concept and has not yet integrated the idea into a practical technological application.  You respond to the same question from a child differently, and if asked for directions you change the response depending on weather and traffic.  The Claim needs to include the technological particulars of how its steps are going to be done by its particular inventive method.  
The use of Levenshtein distance, which is a mathematical method of assessing similarity, couches the determination of similarity in terms that a machine can handle and thus integrates the abstract idea of using context in answering a question into a practical technological application that is best handled by a machine.
	
35 USC 103: Arguments are moot in view of the new grounds of rejection that address the added language.
Added language is addressed with new or modified grounds of rejection.  Note Azam was cited because the type of information it uses as context/peripheral is close to the types disclosed and depicted in the Drawings by the instant Application although the definition of peripheral in the instant Application is broad and includes heart rate of the user.  Azam does not use Levenshtein distance.

	Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required: 
NOTE:  The phrase “External to the User” has no antecedent basis in the Specification.  Accordingly, it is not given weight and “peripheral information external to the user” is interpreted in the context of the supporting Disclosure as the types of peripheral information disclosed by the Specification and Drawings.
To overcome the Objection: (1) remove this phrase from the Claim OR (2) point to the parts of the Disclosure that provide support for it and explain what it intends to convey.
See the various definitions of peripheral information in the Specification and note that the Specification considers them “about the speaker” and not “external to the user.”  At any rate, use particular language in the Claim consistent with the Specification to avoid broad interpretation:
[0046] In a case where the user has uttered a voice command, the execution condition learning system 100 determines whether the current surrounding situation matches the learned execution condition (step S3). The execution condition learning system 100 can determine matching of an execution condition, on the basis of a matching value and a threshold.
[0047] For example, an example of the matching value is the Levenshtein distance between peripheral information and an execution condition. The Levenshtein distance will be described later with reference to FIGS. 6A and 6B. In a case where the matching value is the Levenshtein distance, the smaller the matching value, the more the peripheral information matches the execution condition.
[0048] The execution condition learning system 100 calculates the minimum matching value. In the example illustrated in FIG. 3, the minimum matching value is 3. In this example, the threshold is 10. As the current surrounding situation meets at least one execution condition, the execution condition learning system 100 executes a voice command A.
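The decision described in [0046] to [0048] can be sketched roughly as below. This is an illustrative reading only: the function name `should_execute` is hypothetical, the strict "smaller than" comparison is an assumption borrowed from the wording of [0067], and every matching value other than the minimum of 3 is invented for the example.

```python
from typing import Iterable


def should_execute(matching_values: Iterable[float], threshold: float) -> bool:
    """Return True when the best (smallest) matching value over all learned
    execution conditions is below the threshold.

    Assumption: a strict comparison is used; the quoted passages do not say
    what happens when the minimum equals the threshold.
    """
    values = list(matching_values)
    if not values:
        return False  # nothing learned yet, so no condition can match
    return min(values) < threshold


# Minimum of 3 and threshold of 10 follow [0048]; 7 and 12 are made up.
print(should_execute([3, 7, 12], threshold=10))  # True
```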
…
[0057] The peripheral information obtainer 121 obtains peripheral information about the speaker. The peripheral information obtainer 121 is an example of the obtainer.
[0058] The peripheral information is information observed around the user who has uttered the voice command. The peripheral information includes various kinds of information (the surrounding environment and the surrounding situation) regarding the surroundings of the user who has made the utterance. The various kinds of information regarding the surroundings of the user are information regarding the system being used by the user, for example. The peripheral information regarding the system includes at least one of the title of the foremost system screen, the process name (a numerical value), and a value displayed on the system screen (a character string or a numerical value).
…
[0062] The peripheral information is not necessarily data information related to the system screen. The peripheral information may be information observed by a peripheral device of the user. For example, in a case where the peripheral device is a wearable device, the peripheral information may be sensing data (a heart rate or an eyeball potential, for example).
…
[0067] As illustrated in FIGS. 6A and 6B, an example of the matching value is a weighted sum obtained by calculating a quantity that is given by a Levenshtein distance in a case where the peripheral information is a character string and is given by an absolute value of a difference in a case where the peripheral information is a numerical value for each piece of the peripheral information, and multiplying the calculated quantity by a weighting coefficient set for each piece of the peripheral information. Here, the Levenshtein distance corresponds to the minimum number of procedures necessary in transforming one character string into another character string by inserting, deleting, or replacing one character. For example, in the table of execution conditions shown in FIG. 6B, the matching value in the first row is 3. More specifically, the Levenshtein distance of the title column is 1, the Levenshtein distance of the process column is zero, the Levenshtein distance of the various value (URL) column is 3, the Levenshtein distance of the various value (heading) column is zero, the various value (contract price) column is a fixed value β being none, and a matching value of 3 is obtained as a weighted sum that is the sum of these values of the corresponding columns multiplied by α. Likewise, a matching value of 4 is obtained as the matching value in the second row. Among these values, the smallest one is the matching value of 3, and the matching value of 3 is smaller than the threshold 4 set in the execution condition. Therefore, this execution condition is determined to be “valid”.
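One possible reading of the weighted sum in [0067] is sketched below: Levenshtein distance for character-string fields, absolute difference for numerical fields, each multiplied by a per-field weighting coefficient. The field names, weights, and sample values are hypothetical and are not the FIG. 6B data; the fixed value applied to an absent entry is not modeled, so the sketch does not attempt to reproduce the matching values of 3 and 4 quoted above.

```python
from typing import Dict, Union

Value = Union[str, int, float]


def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rolling rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def matching_value(
    observed: Dict[str, Value],
    condition: Dict[str, Value],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-field distances between the observed peripheral
    information and one learned execution condition.

    Assumption: every field of `condition` is present in `observed`, and a
    field without an explicit weight is weighted 1.0.
    """
    total = 0.0
    for field, expected in condition.items():
        actual = observed[field]
        if isinstance(expected, str):
            distance = levenshtein(str(actual), expected)  # character string
        else:
            distance = abs(actual - expected)              # numerical value
        total += weights.get(field, 1.0) * distance
    return total


# Hypothetical screen data, loosely modeled on the field types in [0058].
observed = {"title": "Order list", "process": 7, "url": "http://a/b"}
condition = {"title": "Order lists", "process": 7, "url": "http://a/bcd"}
weights = {"title": 1.0, "process": 1.0, "url": 1.0}
print(matching_value(observed, condition, weights))  # 3.0
```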
…
[0088] As illustrated in FIG. 8, the peripheral information obtainer 121 of the execution condition learning system 100 first obtains the peripheral information about the user who is the speaker (step S101).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 5-11, 13-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yoon (U.S. 20190005957) in view of Azam (U.S. 20160358603) and further in view of Vempaty (U.S. 20220399008).
Regarding Claim 1, Yoon teaches:
1. A learning system comprising a processor configured to execute operations comprising: [Yoon, Fig. 1, “[0023] Referring to FIG. 1, an application control device according to one embodiment of the present invention includes a condition setting module 110, a command setting module 130, a command recognition module 150, and an application control module 170. The term “˜module” used herein should be interpreted as a software or hardware component, or a combination thereof within the context. The software may be machine language, firmware, embedded code, and an application program. The hardware may be a circuit, a processor, a computer, an integrated circuit, an integrated circuit core, a sensor, a micro-electro-mechanical system (MEMS), a passive device, or a combination thereof.”]
obtaining information observed around a user who has uttered a voice command; and [Yoon, Fig. 2, “control condition recognition unit 151” obtains “information observed around a user” of the Claim.  “[0038] The control condition recognition unit 151 of the command recognition module 150 determines whether the smart terminal satisfies the preset speech recognition condition, when a command set to be recognized in a specific condition is successfully recognized, the per-condition designated command recognition unit 153 transmits the recognized command to the application control module 170 so that the recognized voice command can be converted into an application control signal. ….”  See “condition setting module 110” including “environment condition setting unit 115” and “device status condition setting unit 113” and “event condition setting unit 111” all of which can teach “information observed around a user” of the Claim.  “[0034] The command designation unit 131 of the command setting module 130 designates voice commands which differ from condition to condition. Therefore, only when the designated voice command is issued under a specific condition, the voice command is converted into an application program control signal. For example, when the event condition setting module 111 sets an event of a morning call as the speech recognition control condition, the command designation unit 131 may designate specific voice commands such as “dismiss”, “snooze”, and “in five minutes” which are related to control of a morning call. Therefore, when the morning call event occurs, the speech recognition mode is activated, and only the designated voice commands can be recognized.”  Fig. 4, S420. ]
learning the obtained information as a condition for executing the voice command. [Yoon, Fig. 2, “command learning 155.”  “[0037] The command learning unit 155 may learn commands input by a user in advance. For example, the command learning unit 155 may learn commands which are frequently input in a specific application program used by a user. The commands that are learned by the command learning unit 155 may include frequently used commands such as “snooze” and “in five minutes” that differ according to user daily life or per-user smart terminal usage pattern….”  The “specific application program” is one type of “condition.”  “[0044] At S410, a condition setting module 110 sets a specific condition in which to activate a speech recognition mode in a smart terminal. For example, an event condition in which a specific application program related to a morning call, an alarm, a phone, and a camera is activated may be set as one of speech recognition conditions….”]
wherein the learning further comprises learning peripheral information external to the user obtained simultaneously with the voice command and the learned peripheral information is automatically added to the condition for executing the voice command and selected based on a Levenshtein distance.
While the broad language of the Claim with respect to learning/training based on command context is taught by Yoon, Yoon does not elaborate on the learning from context/condition and another more specific reference is added.
Azam teaches:
obtaining information observed around a user who has uttered a voice command; and [Azam, Figs. 2 and 8, 110 and 710.  Figs. 2 and 8, 130/720: determine the current context scope of the electronic device.  “[0039] The method 100 begins at block 110, where the processor 30 detects at least one voice input from a user of an electronic device 10. The user may directly provide the voice input to the device (e.g., by speaking at the device 10)…”  “[0041] At 130, the control unit 33 determines a current context scope of the electronic device. This may be performed by the context scope determination module 39. The current context scope of the electronic device is the application, process, or activity that is currently running on the device or is being performed by the device. For example, if the user is browsing the internet with the device 10, the current context scope is the browser. If the user is watching a video on a video-sharing website, the current context scope is that video-sharing website. When the user is at the home screen of the device 10, the current context scope is the home screen of the device. Determining the current context scope of the device 10 benefits the described process because the processor 30 may analyze the voice input much more accurately based on the context scope of the device. As explained in additional details below, every voice command may be associated to a distinct action depending on the context scope of the device 10.”]
learning the obtained information as a condition for executing the voice command, [Azam, Figs. 2 and 8.  In the command training mode shown in Figure 8 first the current context is determined (720) and then the command is defined or learned in the current context (750). “[0062] … Thus, when the user provides the same voice input/command in the same context scope, the control unit applies the learning techniques and calculates a higher possibility score for the “new” voice input/command.”]
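The context-scoped behavior attributed to Azam in the two passages above (the same voice input mapping to a different action depending on the current context scope, and a training mode that associates an action with a voice input in that scope) can be illustrated with a simple lookup table. This is a hypothetical sketch, not Azam's implementation; the scope names, action names, and function names are all assumptions.

```python
from typing import Dict, Optional, Tuple

# (context scope, normalized voice input) -> action name. All names hypothetical.
CommandTable = Dict[Tuple[str, str], str]


def normalize(voice_input: str) -> str:
    """Lower-case the input and collapse whitespace so lookups are stable."""
    return " ".join(voice_input.lower().split())


def learn_command(
    table: CommandTable, context_scope: str, voice_input: str, action: str
) -> None:
    """Training step: associate an action with a voice input in one scope."""
    table[(context_scope, normalize(voice_input))] = action


def resolve_action(
    table: CommandTable, context_scope: str, voice_input: str
) -> Optional[str]:
    """Return the action learned for this input in this scope, if any."""
    return table.get((context_scope, normalize(voice_input)))


table: CommandTable = {}
learn_command(table, "browser", "refresh", "reload_page")
learn_command(table, "home_screen", "refresh", "redraw_home_screen")

print(resolve_action(table, "browser", "Refresh"))      # reload_page
print(resolve_action(table, "home_screen", "refresh"))  # redraw_home_screen
```

Keying the table on the pair of scope and input is one simple way to keep identical phrases from colliding across scopes; an unknown scope simply yields no action.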
Yoon and Azam pertain to voice commands and each emphasizes one aspect of the voice command training/learning.  It would have been obvious to combine the features of the two References in order to buttress the features that are weakly supported in one by the teachings of the other one.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Yoon and Azam do not mention the Levenshtein distance. 

Vempaty teaches:
1. A learning system comprising a processor configured to execute operations comprising: 
obtaining information observed around a user who has uttered a voice command; and [Vempaty is directed to “Interpretation of user commands is accelerated through digital user interfaces of various modalities, including generation and presentation of command modifications for rapid correction of incomplete or erroneous user commands. …”  Abstract.  Figure 1, “Initial Command 321” is converted to “Modified Commands 342” based on the “context” shown in Figure 10A.  “[0014] FIG. 8 illustrates an example computer process that generates modified commands based on an ongoing interactive context that may have any of various forms.”  “[0040] … Voice input transcription and generated recommendations are tuned based on the context or state of an application that the user is interacting with. …”  “[0046] … Example of an integrated device include a desktop computer, laptop computer, tablet computer, smartphone, or wearable device. In general, one or more of the input devices can be selected to capture user actions in addition to or instead of other activities in the surroundings, where the capture data could form part of the context used to interpret the user commands or actions. …”  Figure 8, Section 4.6 and [0110]-[0126] discuss the context/ “information around the user” in detail and have many variations on the definition of context.]
learning the obtained information as a condition for executing the voice command, [Vempaty, Figure 2, 208: Feedback Loop and Learning Instructions.  Figure 10B: “Machine Learning Model 1031 … 1034.”  “[0078] Discussed later herein are various usage scenarios, strategies and mechanisms for implementing recommendation generators 330, and ways to arrange, filter, and refine modified commands 341 after generation. Also presented herein is a richer command log than 370 that may be horizontally and vertically sliced in various ways for data mining and for data-driven behaviors such as personalization, contextualization, and machine learning. Included herein are various feature vectors that encode one row or a few related rows of a command log as input to a machine learning model. Historical log(s) of one or many users may be used for supervised training. After training, the machine learning model may use a log of a recent few commands in an ongoing interactive session to make predictions that accelerate command execution by predicting a modified command that is likely to be accepted by the user or predicting which one of recommendation generators 330 likely generates that modified command.”]
wherein the learning further comprises learning peripheral information external to the user obtained simultaneously with the voice command and the learned peripheral information is automatically added to the condition for executing the voice command and selected based on a Levenshtein distance. [Vempaty uses the Levenshtein distance to correct a command and use it for its ML models:  “[0088] If the interpreted command 322 meets certain criteria, as discussed below, such as a low confidence score, then computer device 102 proposes variations of interpreted command 322 that might better reflect the intent of the users. In an embodiment in step 503, recommendation generators 330 generate set of modified commands 341 that are based on interpreted command 322. As discussed earlier and later herein, each of generators 331-334 may generate, in various ways, a subset of modified commands 341. For example, with phonetic replacement based on algorithms such as Metaphone or Soundex, phonetic generator 331 may generate several modified commands that phonetically differ from interpreted command 322 by no more than a threshold amount of measurable phonetic distance, such as Levenshtein distance.”]

[Image: media_image4.png]


[Image: media_image5.png]

Yoon/Azam and Vempaty pertain to voice commands and each emphasizes one aspect of the voice command training/learning.  It would have been obvious to add the use of Levenshtein distance from Vempaty to the combination to handle machine learning when character strings are at issue.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 2, Yoon teaches:
2. The learning system according to claim 1, the processor further configured to execute operations comprising: 
identifying one or a plurality of conditions for executing the voice command; [Yoon, Fig. 2, “condition setting module 110” and Fig. 4, S410, S440.  “[0044] At S410, a condition setting module 110 sets a specific condition in which to activate a speech recognition mode in a smart terminal….”  “[0046] At S440, it is determined whether the events and the device status information satisfy the speech recognition control condition….”]
determining whether the obtained information matches at least one condition of the one or the plurality of conditions; and [Yoon, Fig. 4, S440, Yes.  “[0047] In brief, the application control device and method described above set speech recognition conditions, designate voice commands that differ from condition to condition, and recognize the designated voice commands in a specific condition….”]
executing the voice command, when the determining further comprises determining that the obtained information matches at least one condition of the one or the plurality of conditions, [Yoon, Fig. 2, S450 and S460. “[0046] … When the speech recognition control condition is satisfied, at S450, the speech recognition mode is activated and the commands designated to be recognized in the speech recognition control condition are recognized.  At S460, the command recognized by the control mode 170 is converted into a smart terminal application control signal so that an application program in the smart terminal can be controlled.”]
wherein, when the executing further comprises executing the voice command, the learning further comprises learning the obtained information as a condition for executing the voice command. [Yoon, Fig. 2, “command learning unit 155” learns a command as a part of the “command recognition module 150” before communicating with the “application control module 170” for execution of the command.] 
Yoon is not as express regarding some of the learning conditions.
Azam teaches:
identifying one or a plurality of conditions for executing the voice command; [Azam, Fig. 1, 130.  “[0041] At 130, the control unit 33 determines a current context scope of the electronic device. This may be performed by the context scope determination module 39. The current context scope of the electronic device is the application, process, or activity that is currently running on the device or is being performed by the device….”]
determining whether the obtained information matches at least one condition of the one or the plurality of conditions; and [Azam, Fig. 6 (or 2), 540.  “[0074] … At 540, the control unit determines the current context scope of the electronic device. This step is similar to step 130 of the method 100…”]
executing the voice command, when the determining further comprises determining that the obtained information matches at least one condition of the one or the plurality of conditions, [Azam, Fig. 5, 560 and Fig. 2, 160.  “[0049] At 160, the control unit 33 performs an action on the electronic device 10 based on the identified command….”]
wherein, when the executing further comprises executing the voice command, the learning further comprises learning the obtained information as a condition for executing the voice command. [Azam, Fig. 8, 720 and 750.  “[0070] The method 700 begins at 710, where the control unit 33 transitions the electronic device into a command training mode. For example, the user may first provide an initial command to the device 10 to initiate training mode (e.g., “training,” “training mode;” “create new command;” etc.). At 720, the control unit 33 determines the current context scope of the electronic device 10 (similarly to step 130 of the method 100). Then, at 730, the control unit identifies an action on the electronic device performed in the current context scope of the device 10. For example, the control unit records an action (e.g., tapping, swiping, pinching, etc.) in the context space that is performed by the user. Next, the control unit receives a new voice input from the user directed to the performed action (at 740). For example, the user may open a browser (i.e., context scope), click on the refresh button (i.e., performs an action identified by the control unit), and provide a voice input to the control unit (“refresh,” “refresh the page,” etc.).”  “[0071] In one implementation, the control unit 33 may then display a “new command” message box that includes the name/text of the command (e.g., “refresh”), the action associated with the command, etc. The user may confirm or cancel the new command. At 750, the control unit associates the action in the current context scope to the voice input to create the new command for the electronic device. For example, the action identified by the control unit is linked with a software code rule associated with the specific command (i.e., “refresh”). The text structure, the command, and the associate software code rule linked with the action are stored (e.g., in databases 20, 80, etc.).”]
Rationale for combination as provided for Claim 1.  Both references teach the fundamentals of these Claims and each reference may be stronger in describing some features over the other.

Regarding Claim 3, Yoon teaches:
3. The learning system according to claim 2, the processor further configured to execute operations comprising 
displaying a user interface that enables the user to select the voice command by a method other than speech, [Yoon, Figs. 3 and 5 show voice commands issued when the user cannot use his hands (driving in Fig. 3).  However, the device includes a display and can receive touch input:  “[0009] According to one embodiment, a device for controlling an application program in a smart terminal, includes: a condition setting module for setting speech recognition control conditions including an event condition and a device status condition such that a speech recognition mode is activated under a specific condition whereby the smart terminal can be controlled by a voice input or a display touch input under the specific condition;…”] 
wherein, when the displaying further configured to receiving selection of the voice command via the user interface, the executing further comprises executing the voice command. [Yoon, Figs. 3 and 5.  Goal of Yoon is to permit voice commands when input by hand/touch is not convenient:  “[0013] Voice-based control on a smart terminal is very convenient in many events such as responding to a morning call or to a call when driving, etc. To this end, the present disclosure improves user convenience in using a smart terminal by enabling accurate control on a smart terminal using voice commands in some conditions in which it is difficult for a user to reach or touch the smart terminal.”  “[0024] The condition setting module 110 sets a specific condition in which to activate speech recognition mode. For example, the condition setting module 110 sets multiple speech recognition control conditions in which to activate voice recognition, the speech recognition control conditions including an event occurrence, a device status, and a device environment such that the smart terminal can be controlled by a voice input or a touch input only in speech recognition control conditions set by the condition setting module 110.”  “[0041] … As described with reference to FIG. 3, the present disclosure automatically activates speech recognition mode when a specific event occurs in a condition (for example, during exercise) in which it is difficult to control the smart terminal by means of a touch input, and designates differently designates voice commands according to each type of events. …”]
While Yoon teaches the display feature, Azam is more express.
Azam teaches:
displaying a user interface that enables the user to select the voice command by a method other than speech, wherein, when the displaying further configured to receiving selection of the voice command via the user interface, the executing further comprises executing the voice command. [Azam, “[0061] At 440, if an ambiguity still exists about the potential command based on the user's voice input, the control unit 33 may make a suggestion to the user of the device 10 regarding the target command. For example, if the text structure from the user's voice input is “create a tab” and the target command is “open a new tab,” the control unit may display a message box on the screen 32. The message box may display a message to the user (e.g., “do you want to open a new tab?”). If the user rejects the suggestion, the control unit may propose the command that has the next higher possibility score in the same manner. In addition, the control unit 33 may also propose to create a new command (at 450) that includes an action on the electronic device. For example, in the message box, the control unit 33 may display a new command suggestion message (e.g., “Add “create a tab” command?”) and present the user with an option to approve the suggested command. Thus, the new command may perform the same action as the target command. The control unit also receives a confirmation from the user regarding the validity of the new command (at 460). That way, the new command is associated with an existing text structure and includes an action on the electronic device (e.g., the action associated with the existing text structure of the target command).”]
Rationale for combination as provided for Claim 1.  Both references teach the fundamentals of these Claims and each reference may be stronger in describing some features over the other.
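For illustration only, the suggestion flow described in the quoted Azam paragraph [0061] (propose the most likely command, and on rejection propose the next most likely one) might be sketched as follows. The function name `suggest_command`, the `confirm` callback standing in for the message box, and the example scores are all hypothetical and are not taken from Azam; the sketch also assumes that "next higher possibility score" means the next-highest remaining score.

```python
from typing import Callable, Iterable, Optional, Tuple


def suggest_command(
    candidates: Iterable[Tuple[str, float]],
    confirm: Callable[[str], bool],
) -> Optional[str]:
    """Offer candidates in descending score order until the user accepts one.

    `candidates` pairs a command with a hypothetical possibility score.
    `confirm` stands in for a non-speech confirmation (e.g. a message box).
    """
    for command, _score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if confirm(command):
            return command  # the user approved this suggestion
    return None  # every suggestion was rejected


if __name__ == "__main__":
    scored = [("close tab", 0.2), ("open a new tab", 0.7), ("open a new window", 0.5)]
    # Simulated user who rejects the first suggestion and accepts the second.
    answers = iter([False, True])
    print(suggest_command(scored, lambda cmd: next(answers)))
```

The confirmation is passed in as a callback so that the selection "by a method other than speech" can be any user-interface action; nothing here reflects how Azam actually implements the control unit 33.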

Regarding Claim 5, Yoon teaches:
5. The learning system according to claim 1, 
wherein the obtaining further comprises obtaining information regarding a voice command input screen that accepts the voice command from the user, as the information observed around the user who has uttered the voice command. [Yoon, Fig. 2, See “condition setting module 110” including “environment condition setting unit 115” and “device status condition setting unit 113” and “event condition setting unit 111” all of which can teach “information observed around a user” of the Claim.] 

Regarding Claim 6, Yoon teaches and suggests:
6. The learning system according to claim 5, 
wherein the obtaining further comprises obtaining, as the information regarding the voice command input screen, the information including at least one of: a title of the voice command input screen, a process name of the voice command input screen, or a value displayed on the voice command input screen. [Yoon teaches: “[0010] …   whereby the application program in the smart terminal is allowed to be controlled by a voice input or a display touch input ….”  Any of the options of the claim may be suggested by the teachings of Yoon.]
However, Yoon is not express regarding any of the options.
Azam teaches:
wherein the obtaining further comprises obtaining, as the information regarding the voice command input screen, the information including at least one of: a title of the voice command input screen, a process name of the voice command input screen, or a value displayed on the voice command input screen. [Azam: “[0092] If the execution condition determiner 122 determines that the peripheral information does not match the execution condition (step S102: No), the voice command display 123 of the execution condition learning system 100 determines whether the voice command has been selected by a method other than speech (step S105). The voice command display 123 can display a user interface that enables selection of a voice command by a method other than speech. The voice command display 123 can receive selection of a voice command via the user interface.”]
Rationale for combination as provided for Claim 1.  Both references teach the fundamentals of these Claims and each reference may be stronger in describing some features over the other.

Claim 9 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.  Claim 9 depends from Claim 2, whereas Claim 5 depends from Claim 1.

Claim 7 is a method claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
7. A method for learning, comprising: 
obtaining information observed around a user who has uttered a voice command; and 
learning the information obtained in the obtaining step as a condition for executing the voice command,
wherein the learning further comprises learning peripheral information external to the user obtained simultaneously with the voice command and the learned peripheral information is automatically added to the condition for executing the voice command and selected based on a Levenshtein distance.
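As a rough sketch of the learning step recited above, and not of any implementation disclosed in the application or the cited references, peripheral information captured together with a voice command could be stored as an execution condition for that command. The class name `ConditionStore`, its methods, and the example strings are hypothetical. The sketch covers only the automatic addition of the observed string to the condition; the distance-based selection is omitted.

```python
from collections import defaultdict
from typing import DefaultDict, List


class ConditionStore:
    """Maps each voice command to the peripheral strings learned as its conditions."""

    def __init__(self) -> None:
        self._conditions: DefaultDict[str, List[str]] = defaultdict(list)

    def learn(self, command: str, peripheral_info: str) -> None:
        """Add peripheral info observed with the command, skipping exact duplicates."""
        if peripheral_info not in self._conditions[command]:
            self._conditions[command].append(peripheral_info)

    def conditions_for(self, command: str) -> List[str]:
        """Return a copy of the conditions learned so far for the command."""
        return list(self._conditions.get(command, []))


if __name__ == "__main__":
    store = ConditionStore()
    # Hypothetical peripheral information: a screen title observed at utterance time.
    store.learn("submit order", "Order Entry")
    store.learn("submit order", "Order Entry")  # duplicate, ignored
    print(store.conditions_for("submit order"))
```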

Claim 8 is a computer program product claim with limitations corresponding to the limitations of system Claim 1 and is rejected under similar rationale.
8. A computer-readable non-transitory recording medium storing a computer-executable program instructions that when executed by a processor cause a computer system to execute operations comprising: [Yoon: “[0004] Speech recognition (also called voice recognition) refers to a technology for understanding human speech and converting it into computer-readable code information….”] [Azam: “[0015] The present description is directed to systems, methods, and computer readable media for controlling all operations of an electronic device with voice commands ….”]
obtaining information observed around a user who has uttered a voice command; and 
learning the obtained information as a condition for executing the voice command,
wherein the learning further comprises learning peripheral information external to the user obtained simultaneously with the voice command and the learned peripheral information is automatically added to the condition for executing the voice command and selected based on a Levenshtein distance.

Claim 10 is a method claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 11 is a method claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 13 is a method claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.  Depends from 7.
Claim 14 is a method claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale. Depends from 10.
Claim 15 is a method claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.

Claim 16 is a computer-readable recording medium claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 17 is a computer-readable recording medium claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 19 is a computer-readable recording medium claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claim 20 is a computer-readable recording medium claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.

Claims 4, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Yoon and Azam and Vempaty in view of Lee (U.S. 20220187907).
Regarding Claim 4, Yoon and Azam and Vempaty do not teach evaluating the conditions/context by comparison with a threshold.  All teach comparing the text recognized from the voice command against pre-stored texts and using a threshold, but not for conditions.
Lee teaches:
4. The learning system according to claim 2, wherein the determining further comprises: 
setting a value indicating how much the obtained information differs from at least one condition of the one or the plurality of conditions, and determining whether the set value is smaller than a threshold to determine whether the obtained information matches at least one condition of the one or the plurality of conditions. [ Lee uses an attention value which is a duration of time to determine if a voice command was intended or not:  “[0066] … The user input indicating the intent to interact with the transitional element may additionally or alternatively include one or more of a hand gesture or a speech command. The instructions may additionally or alternatively be executable to determine the intent of the user to interact with a user interface that is at least partially hidden from a current view by comparing the time-dependent attention value to a threshold condition….”  “[0041] … Thus, a speech processing system 560 may output recognized commands from speech inputs received at a microphone, and a gesture processing system 562 may output recognized gesture commands. Recognized commands 564 may include intent confirming commands, as described above. The recorded timestamp for a recognized command may be compared to time-dependent attention value data 550 to link an intent determined based on an eye gaze determine if the user has confirmed an intent to perform an action associated with a location. In some examples, a timestamp for a recognized command may be compared to timestamps 542 for time-dependent attention values to determine a location with which to associate the command. Recognized commands 564, attention value data 550, and location data 530 can be input into a user intent determination module 570, which can apply one or more threshold conditions 572 to determine user intent 574. The user intent determination module 570 may take any suitable form. 
In some examples, the user intent determination module 570 may simply compare time-dependent values to thresholds. In other examples, the user intent determination module 570 may utilize a trained machine learning function that receives time-dependent attention values for locations and recognized commands 564 as inputs, and outputs a probability that a user intends to interact with a user interface. Any suitable machine learning function can be used in such examples, including but not limited to neural networks, support vector machines, random decision forests, and other types of classifiers. In such examples, a probability threshold may be applied as a threshold condition.”  Lee uses a gaze input condition as the obtained information that confirms the voice command.  “[0027] As described above, ambiguous gaze signals arising from saccadic eye movement pose challenges for accurately determining a user's intended gaze input. … When the time-dependent attention value for a particular gaze location meets a predetermined threshold condition for that gaze location, an action associated with the gaze location can be triggered. For example, if a user gazes at a location associated with a hidden HUD for a user interface, and the time-dependent attention value for that location exceeds a predetermined threshold value (as one example of a threshold condition), display device 200 displays the HUD. ….”  “[0042] … Further, in some examples, each individual user interface element in a displayed user interface may have its own associated time-dependent attention value, and each user interface element may be selected for further expansion based upon a threshold condition.”]
Yoon/Azam/Vempaty and Lee pertain to conditional/contextual voice commands, and it would have been obvious to combine the feature of Lee, which decides that a voice command has been intended based on whether a condition meets a particular threshold, with the system of the combination in order to set criteria for the satisfaction of the condition in addition to the criteria for the recognition of the voice command.  This combination falls under combining prior art elements according to known methods to yield predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Claim 12 is a method claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.

Claim 18 is a computer-readable recording medium claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Kore (U.S. 20120116748), Fig. 2, “voice command in the database” mentioned at 207 is the “information observed around a user who has uttered a voice command.”

Palanis (U.S. 20160078864): “[0044] As an example, a user can speak the voice command “Call up Train 3 intermediate precipitator detail” and the voice controlled device 100 can process train 3 intermediate precipitator detail. If the user does not remember the voice command, the user can issue an un-stored voice command “Precipitator detail”, the voice controlled device 100 can display a list of stored voice commands that match a keyword of the un-stored voice command and/or display sub-commands of a matching multi-part command, such as “Train 3 intermediate precipitator detail”, “Train 1 intermediate precipitator detail”, “Train 3 final precipitator detail”, and “Read Train 2 intermediate precipitator detail”. The user can issue a command from the list (which is a stored voice command) that is displayed on the screen.”

Examiner attempted to contact the Applicant to suggest the following.  Note that with each submission a new search will be conducted which may reveal references that have not been found as of the date of this Office action.
1. A learning system comprising a processor configured to execute operations comprising: 
obtaining information observed around a user who has uttered a voice command; 
learning the obtained information as a condition for executing the voice command; 
obtaining peripheral information regarding a second system being used by the user, wherein the peripheral information is generated as a character string, wherein the peripheral information is obtained simultaneously with the voice command, and wherein the peripheral information is not about the user who uttered the voice command; 
obtaining a Levenshtein distance between the peripheral information and the condition; and 
executing the voice command when the Levenshtein distance is smaller than a predetermined threshold.
In particular, the following was not found in the prior art when considered in the context of the independent Claims as a whole and considering each and every limitation of these Claims: training a machine learning system for executing voice commands with the environmental information observed around the user at the time of the utterance of the voice command as a condition for execution of the command and, additionally, training the machine learning model on the additional peripheral information which was obtained simultaneously with the voice command, which pertains to a system being used by the user, and which is of the type whose closeness to an acceptance criterion is judged according to a Levenshtein distance, which measures the distance between two character strings.
Vempaty (U.S. 20220399008) corrects erroneous user commands by disambiguating the commands according to their Levenshtein distance from several alternative commands.  What Vempaty does with commands, the instant Claims do with context/condition.
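By way of illustration only, the distance test recited in the suggested claim language (obtain a Levenshtein distance between the peripheral information and the condition, and execute the voice command when that distance is smaller than a predetermined threshold) might be sketched as below. The function names, the example strings, and the threshold value are hypothetical and do not come from the application or from Vempaty.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string `a` into string `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def should_execute(peripheral_info: str, condition: str, threshold: int) -> bool:
    """True when the peripheral string is within the threshold of the condition."""
    return levenshtein(peripheral_info, condition) < threshold


if __name__ == "__main__":
    # Hypothetical screen titles; a threshold of 2 tolerates one edit.
    print(should_execute("Order Entry v2", "Order Entry v3", threshold=2))
    print(should_execute("Settings", "Order Entry v3", threshold=2))
```

Here the comparison is applied to context strings rather than to the recognized command text, which is the distinction drawn above between the instant Claims and Vempaty.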

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached on 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached on 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659
Prosecution Timeline

Dec 06, 2023: Application Filed
Jul 15, 2025: Non-Final Rejection mailed — §103
Oct 15, 2025: Response Filed
Dec 15, 2025: Final Rejection mailed — §103
Feb 17, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/390,830 (Patent 12640143): UTILIZING GENERATIVE MODEL IN GENERATING SUMMARY OF LONG-FORM CONTENT. 2y 5m to grant; granted May 26, 2026.
18/788,501 (Patent 12640159): VOICE SIGNAL PROCESSING DEVICE, VOICE SIGNAL PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE RECORDING MEDIUM STORING VOICE SIGNAL PROCESSING PROGRAM. 1y 10m to grant; granted May 26, 2026.
18/457,121 (Patent 12614558): Method and Apparatus for Detecting Correctness of Pitch Period. 2y 8m to grant; granted Apr 28, 2026.
18/406,418 (Patent 12605109): APPARATUS AND METHOD FOR DETERMINING BRAIN LANGUAGE AREA INVASION BASED ON SPEECH DATA. 2y 3m to grant; granted Apr 21, 2026.
18/454,031 (Patent 12603099): SELF-ADJUSTING ASSISTANT LLMS ENABLING ROBUST INTERACTION WITH BUSINESS LLMS. 2y 7m to grant; granted Apr 14, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 76%
With Interview (+31.5%): 99%
Median Time to Grant: 2y 9m (~3m remaining)
PTA Risk: Moderate
Based on 554 resolved cases by this examiner. Grant probability derived from career allowance rate.