Prosecution Insights
Last updated: April 19, 2026
Application No. 18/622,606

NON-SPEECH SOUND CONTROL WITH A HEARABLE DEVICE

Status: Non-Final OA (§103)
Filed: Mar 29, 2024
Examiner: KIM, JONATHAN C
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Sony Group Corporation
OA Round: 1 (Non-Final)

Grant Probability: 74% (Favorable)
OA Rounds: 1-2
To Grant: 2y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (261 granted / 355 resolved; +11.5% vs TC avg; above average)
Interview Lift: +40.6% on resolved cases with an interview (strong, roughly +41%)
Typical Timeline: 2y 7m avg prosecution; 20 applications currently pending
Career History: 375 total applications across all art units
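
The card values above are simple ratios; here is a minimal sketch of the arithmetic, assuming hypothetical with/without-interview cohort counts (the page shows only the +40.6% lift and the 99% with-interview rate, not the underlying split):

```python
# Career allow rate and interview lift as the dashboard appears to compute
# them. The 261/355 split is from the card above; the cohort counts below
# are hypothetical placeholders chosen to reproduce the displayed 99% and
# +40.6%, not the examiner's actual with/without-interview split.
granted, resolved = 261, 355
allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # 73.5%, shown rounded to 74%

with_interview = (99, 100)      # hypothetical (granted, resolved) with interview
without_interview = (146, 250)  # hypothetical (granted, resolved) without
lift = with_interview[0] / with_interview[1] - without_interview[0] / without_interview[1]
print(f"Interview lift: {lift:+.1%}")  # +40.6 percentage points on these numbers
```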

Statute-Specific Performance

§101: 17.6% (-22.4% vs TC avg)
§103: 47.5% (+7.5% vs TC avg)
§102: 11.8% (-28.2% vs TC avg)
§112: 15.0% (-25.0% vs TC avg)
Tech Center average is an estimate. Based on career data from 355 resolved cases.
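
The "vs TC avg" deltas are plain differences, and working them backwards shows that all four rows imply the same 40.0% Tech Center baseline, consistent with the footnote calling that average an estimate:

```python
# Reconstruct the implied Tech Center baseline from each statute row:
# baseline = examiner_rate - delta_vs_tc_avg. All four rows yield 40.0%.
rows = {"§101": (17.6, -22.4), "§103": (47.5, +7.5),
        "§102": (11.8, -28.2), "§112": (15.0, -25.0)}
for statute, (rate, delta) in rows.items():
    print(f"{statute}: implied TC avg = {rate - delta:.1f}%")
```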

Office Action

§103
DETAILED ACTION

This Office Action is in response to the correspondence filed by the applicant on 3/29/2024.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement

The Information Disclosure Statements (IDS) filed on 3/29/2024 and 7/24/2025 have been accepted and considered in this Office Action and are in compliance with the provisions of 37 CFR 1.97.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office Action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, 7, 8, 11, 14, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG (US 2025/0358559 A1) in view of HARIF (US 6,820,056 B1).

REGARDING CLAIM 1, ZHANG discloses a method for using a non-speech sound to control a feature associated with a hearable device, the method comprising:

detecting a first pattern of non-speech sounds by a user of the hearable device (Par 90 – “The receiver 170B, also referred to as an “earpiece”, is configured to convert an audio electrical signal into a sound signal. When a call is answered or voice information is received by using the electronic device 100, the receiver 170B may be put close to a human ear to listen to a voice.”) created by one or more of breath, nose, tongue, lips, and throat of the user (Fig. 4; Par 113 – “Step 402: Obtain a vibration signal collected by a bone conduction microphone connected to the electronic device.”; Par 115 – “In an embodiment, the bone conduction audio signal may be an audio signal present when the nasal cavity and/or the throat of the user make/makes a sound of a cough or a hum. Spectral flatness of a frequency domain signal feature of the cough or the hum is greater than 0.8, a time domain is similar to that of a pulse signal, and short-term energy is 50% greater than that of a voice signal.”);

identifying the first pattern of non-speech sound as a control gesture corresponding to a particular adjustment of the feature associated with the hearable device (Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”; Pars 124-127 – “executing a pre-specified function of a disclosure installed on the electronic device; selecting an option on the electronic device; and activating a function of the electronic device, where the function of the electronic device may include: answering a call, making a call, starting environment monitoring of a headset, starting recording, or the like.”), by applying one or more sound factors (Par 113 – “Spectral flatness of a frequency domain signal feature of the cough or the hum is greater than 0.8, a time domain is similar to that of a pulse signal, and short-term energy is 50% greater than that of a voice signal.”);

based, at least in part, on identifying the control gesture, adjusting the feature according to the particular adjustment (Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”); and

outputting to the user, a feedback indicator (Par 106 – “The motor 191 may generate a vibration prompt. The motor 191 may be configured to provide an incoming call vibration prompt and a touch vibration feedback. For example, touch operations performed on different disclosures (for example, photographing and audio playing) may correspond to different vibration feedback effect. The motor 191 may also correspond to different vibration feedback effect for touch operations performed on different areas of the display 194. Different disclosure scenarios (for example, a time reminder, information receiving, an alarm clock, and a game) may also correspond to different vibration feedback effect. Touch vibration feedback effect may be further customized.”) [to describe the adjusting of the feature].

ZHANG does not explicitly teach the [square-bracketed] limitation. HARIF discloses the [square-bracketed] limitation. HARIF discloses a method/system for controlling a device with non-verbal commands comprising: outputting to the user, a feedback indicator [to describe the adjusting of the feature] (HARIF Col 4:55-5:14 – “When a non-verbal sound is input, e.g. hand clap=cursor, FIG. 4, the cursor command is displayed, 66, along with a dialog line 67 requesting “Yes or No” confirmation. Then, if the user confirms the cursor command, the cursor 68 appears in an initial position in the text string 62. The cursor 68 may then be moved by commands, e.g. hand clap moves cursor to the right, tongue/mouth clack moves cursor to the left, knocking on desk moves cursor up and metallic tapping moves cursor down.”; Col 6:18-43 – “If the determination from step 82 is Yes, a non-verbal sound is recognized, then, step 84, that sound is compared with the stored command sounds. If there is No compare, then there is displayed to user: “Do Not Recognize”, step 90. If Yes, there is a compare, then, step 86, the command is displayed for confirmation. In the confirmation decision step 87, if there is No confirmation, then, again, there is displayed to user: “Do Not Recognize”, step 90.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG to include a feedback to describe the adjusting of the feature, as taught by HARIF. One of ordinary skill would have been motivated to include a feedback to describe the adjusting of the feature, in order to reduce false positive detection of commands.

REGARDING CLAIM 4, ZHANG in view of HARIF discloses the method of claim 1, further comprising: producing a tactile feedback by moving one or more hearable components proximal to a user ear, wherein the tactile feedback is associated with outputting of the feedback indicator (ZHANG Par 106 – “The motor 191 may generate a vibration prompt. The motor 191 may be configured to provide an incoming call vibration prompt and a touch vibration feedback. For example, touch operations performed on different disclosures (for example, photographing and audio playing) may correspond to different vibration feedback effect. The motor 191 may also correspond to different vibration feedback effect for touch operations performed on different areas of the display 194. Different disclosure scenarios (for example, a time reminder, information receiving, an alarm clock, and a game) may also correspond to different vibration feedback effect. Touch vibration feedback effect may be further customized.”).

REGARDING CLAIM 7, ZHANG in view of HARIF discloses the method of claim 1, further comprising: outputting an inquiry for user control (ZHANG Par 106 – “The motor 191 may generate a vibration prompt. The motor 191 may be configured to provide an incoming call vibration prompt and a touch vibration feedback.”); detecting the first pattern of the non-speech sounds (ZHANG Par 159 – “After viewing the call request, the user makes a specific sound by using the nasal cavity and/or the throat. In this way, the electronic device may obtain a vibration signal collected by the bone conduction microphone connected to the electronic device, and then perform a subsequent procedure, so that the call request can be answered or rejected.”; Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”); and determining the first pattern of non-speech sounds is responsive to the inquiry (ZHANG Par 159 – “After viewing the call request, the user makes a specific sound by using the nasal cavity and/or the throat. In this way, the electronic device may obtain a vibration signal collected by the bone conduction microphone connected to the electronic device, and then perform a subsequent procedure, so that the call request can be answered or rejected.”).

REGARDING CLAIM 8, ZHANG discloses a sound gesture control system to adjust a feature associated with a hearable device, the sound gesture control system comprising:

at least one sensor to detect at least one non-speech sound of a user using the hearable device (Fig. 3 – “Sensor module 180”); a hearable device of a user (Fig. 3 Electronic device 100; Par 54 – “The electronic device may be a smartphone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another device.”) comprising: one or more processors (Fig. 3 Processor 110); and logic encoded in one or more non-transitory media for execution by the one or more processors and when executed, operable to perform operations (Par 29 – “According to a fourth aspect, embodiments of this disclosure provide a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is run on a computer, the computer is enabled to perform the method provided in the first aspect.”) comprising:

detecting a first pattern of non-speech sounds by a user of the hearable device (Par 90 – “The receiver 170B, also referred to as an “earpiece”, is configured to convert an audio electrical signal into a sound signal. When a call is answered or voice information is received by using the electronic device 100, the receiver 170B may be put close to a human ear to listen to a voice.”) created by one or more of breath, nose, tongue, lips, and throat of the user (Fig. 4; Par 113 – “Step 402: Obtain a vibration signal collected by a bone conduction microphone connected to the electronic device.”; Par 115 – “In an embodiment, the bone conduction audio signal may be an audio signal present when the nasal cavity and/or the throat of the user make/makes a sound of a cough or a hum. Spectral flatness of a frequency domain signal feature of the cough or the hum is greater than 0.8, a time domain is similar to that of a pulse signal, and short-term energy is 50% greater than that of a voice signal.”);

identifying the first pattern of non-speech sounds as a control gesture corresponding to a particular adjustment of the feature associated with the hearable device (Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”; Pars 124-127 – “executing a pre-specified function of a disclosure installed on the electronic device; selecting an option on the electronic device; and activating a function of the electronic device, where the function of the electronic device may include: answering a call, making a call, starting environment monitoring of a headset, starting recording, or the like.”), by applying one or more sound factors (Par 113 – “Spectral flatness of a frequency domain signal feature of the cough or the hum is greater than 0.8, a time domain is similar to that of a pulse signal, and short-term energy is 50% greater than that of a voice signal.”);

based, at least in part, on identifying the control gesture, adjusting the feature according to the particular adjustment (Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”), wherein the feature is selected from the group of: setting, mode, audio content player, audio beam focus, calling interaction, and smart assistant operation (Par 87 – “The electronic device 100 may implement an audio function, for example, music playing and recording, by using the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the disclosure processor, and the like.”; Par 128 – “Answering a call is still used as an example. After displaying the service request in step 401, and after performing step 402 to step 405, the electronic device may perform an operation of answering the call.”); and

outputting to the user, a feedback indicator (Par 106 – “The motor 191 may generate a vibration prompt. The motor 191 may be configured to provide an incoming call vibration prompt and a touch vibration feedback. For example, touch operations performed on different disclosures (for example, photographing and audio playing) may correspond to different vibration feedback effect. The motor 191 may also correspond to different vibration feedback effect for touch operations performed on different areas of the display 194. Different disclosure scenarios (for example, a time reminder, information receiving, an alarm clock, and a game) may also correspond to different vibration feedback effect. Touch vibration feedback effect may be further customized.”) [to describe the adjusting of the feature].

ZHANG does not explicitly teach the [square-bracketed] limitation. HARIF discloses the [square-bracketed] limitation. HARIF discloses a method/system for controlling a device with non-verbal commands comprising: outputting to the user, a feedback indicator [to describe the adjusting of the feature] (HARIF Col 4:55-5:14 – “When a non-verbal sound is input, e.g. hand clap=cursor, FIG. 4, the cursor command is displayed, 66, along with a dialog line 67 requesting “Yes or No” confirmation. Then, if the user confirms the cursor command, the cursor 68 appears in an initial position in the text string 62. The cursor 68 may then be moved by commands, e.g. hand clap moves cursor to the right, tongue/mouth clack moves cursor to the left, knocking on desk moves cursor up and metallic tapping moves cursor down.”; Col 6:18-43 – “If the determination from step 82 is Yes, a non-verbal sound is recognized, then, step 84, that sound is compared with the stored command sounds. If there is No compare, then there is displayed to user: “Do Not Recognize”, step 90. If Yes, there is a compare, then, step 86, the command is displayed for confirmation. In the confirmation decision step 87, if there is No confirmation, then, again, there is displayed to user: “Do Not Recognize”, step 90.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG to include a feedback to describe the adjusting of the feature, as taught by HARIF. One of ordinary skill would have been motivated to include a feedback to describe the adjusting of the feature, in order to reduce false positive detection of commands.

CLAIM 11 is similar to claim 4; thus, it is rejected under the same rationale.

CLAIM 14 is similar to claim 7; thus, it is rejected under the same rationale.

REGARDING CLAIM 15, ZHANG in view of HARIF discloses a non-transitory computer-readable storage medium carrying program instructions thereon for using sound gesture to control a feature associated with a hearable device, the instructions when executed by one or more processors cause the one or more processors to perform operations comprising: the steps of claim 8; thus, it is rejected under the same rationale.

CLAIM 20 is similar to claim 7; thus, it is rejected under the same rationale.

Claims 2-3, 6, 9-10, 13, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG (US 2025/0358559 A1) in view of HARIF (US 6,820,056 B1), and further in view of KEITH (US 2024/0022565 A1).

REGARDING CLAIM 2, ZHANG in view of HARIF discloses the method of claim 1. ZHANG in view of HARIF does not explicitly teach training a model based on non-gesture sounds and gesture sounds.

KEITH discloses a method/system with an AI model trained on non-gesture sounds and gesture sounds, further comprising: receiving output from an artificial intelligence (AI) model trained (KEITH Par 597 – “In the step 5904, results of the analysis are utilized in performing a function. The function is able to be an alarm during or after sleep, providing the data to the user, a doctor or other professional, and/or any other function. For example, if a child is being monitored for sleep apnea, a signal is able to be sent to another device (e.g., in the parents' room) to alert the parents that the child is having an apneic episode.”; Par 587 – “Using the method described, the technology is able to collect the sleeping and breathing conditions, and using advanced machine learning/AI, is able to record various kinds of apneic events and other normal or abnormal sleeping activities.”), at least in part, on non-gesture sounds regularly made by the user (KEITH Par 586 – “The sleep apnea method/device described herein is able to be located near the sleeping patient and uses sound, motion and several other sensors to monitor the patient for a number of factors including the position of the patient during sleep, the breath patterns, the movements of the patient such as toss/turn movements and leg flapping.”; Par 515 – “The device is able to continuously gather data including various input and behavioral output which are then able to be analyzed via machine learning to determine if any correlations or patterns are determined. …The device is able to detect when someone is walking slower than usual, breathing differently, has more accidents, performs poorly on tests, and/or any other behavioral performance changes.”; Par 570 – “Generate personal behavioral baseline models. The specific human is monitored to generate a baseline behavioral model. This baseline would be considered normal behaviors. Activities or human conditions beyond a threshold of normalcy would be compared to a common model to identify conditions for potential undesired outcomes.”; Par 576 – “In the step 5802, a personal behavioral baseline model for each user is generated. The specific user/human is monitored to generate a baseline behavioral model. The baseline is considered “normal” behaviors (e.g., the user is not having a psychological event).”; Par 587 – “Using the method described, the technology is able to collect the sleeping and breathing conditions, and using advanced machine learning/AI, is able to record various kinds of apneic events and other normal or abnormal sleeping activities.”) and on the control gestures (KEITH Par 587 – “Using the method described, the technology is able to collect the sleeping and breathing conditions, and using advanced machine learning/AI, is able to record various kinds of apneic events and other normal or abnormal sleeping activities.”; Par 591 – “For example, machine learning is implemented by analyzing many datasets of sleep apnea to learn what sounds, movements, patterns occur during sleep apnea. The currently monitored (e.g., real-time) information is then compared with that stored information to determine if an apneic event is currently occurring. For example, if the historical data indicates that a sign of sleep apnea is no breathing for a period of time above a threshold followed by a gasping (or similar) sound, then when a user is sleeping, and no breathing sound is detected for seconds (or another threshold) followed by a loud gasping/inhalation sound, it is able to be considered an apneic episode.”), to predict that the detected first pattern of non-speech sounds is the control gesture rather than a non-gesture sound (KEITH Par 588 – “Apneic episodes (the cessation of breathing) are able to be identified by the specific sound patterns (or lack thereof). Typically, the cessation of breath for a period of time, then a gasp or bodily jerk, then the normal continuation of breathing is one example. There are several types of sleep apnea, and each can be identified using this technique. The identification of the breath pattern is able to be identified by machine learning models and AI technologies. The system is able to further identify bodily movements including: restless sleep movements, tossing/turning, leg tossing, and others.”; Par 597 – “In the step 5904, results of the analysis are utilized in performing a function. The function is able to be an alarm during or after sleep, providing the data to the user, a doctor or other professional, and/or any other function. For example, if a child is being monitored for sleep apnea, a signal is able to be sent to another device (e.g., in the parents' room) to alert the parents that the child is having an apneic episode.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG in view of HARIF to include training a model with gesture and non-gesture sounds, as taught by KEITH. One of ordinary skill would have been motivated to include training a model with gesture and non-gesture sounds, in order to more accurately detect a gesture sound.

REGARDING CLAIM 3, ZHANG in view of HARIF and KEITH discloses the method of claim 1. KEITH further discloses the method/system, wherein the control gesture includes a distinct pattern of breathing that is different from regular breathing patterns of the user (KEITH Par 515 – “The device is able to continuously gather data including various input and behavioral output which are then able to be analyzed via machine learning to determine if any correlations or patterns are determined. …The device is able to detect when someone is walking slower than usual, breathing differently, has more accidents, performs poorly on tests, and/or any other behavioral performance changes.”; Par 591 – “For example, machine learning is implemented by analyzing many datasets of sleep apnea to learn what sounds, movements, patterns occur during sleep apnea. The currently monitored (e.g., real-time) information is then compared with that stored information to determine if an apneic event is currently occurring. For example, if the historical data indicates that a sign of sleep apnea is no breathing for a period of time above a threshold followed by a gasping (or similar) sound, then when a user is sleeping, and no breathing sound is detected for seconds (or another threshold) followed by a loud gasping/inhalation sound, it is able to be considered an apneic episode.”), wherein the distinct pattern includes at least one variation in a particular rate of inhale and/or exhale and includes a predefined hold time after exhale and/or after inhale (KEITH Par 432 – “Breathing patterns are able to be detected at other times as well (e.g., when the user is not talking). Breath(ing) patterns are able to be detected by measuring a duration between each breath (e.g., start to start or end to end), the volume of each breath (e.g., in decibels), and detecting the duration and volume over a period of time (e.g., 5 s, 30 s) to determine a pattern similar to a heart beat. Detecting a breath is able to be performed by sound matching (e.g., machine learning learns what a breath sound is. Moreover, each aspect of a breath is able to be detected. For example, a breath in makes a different sound than a breath out, and each is able to be detected. Similarly, there are pauses between each breath, where the amount of time of the pause is able to be slightly different for each user.”; Par 588 – “Typically, the cessation of breath for a period of time, then a gasp or bodily jerk, then the normal continuation of breathing is one example.”; Par 591 – “For example, if the historical data indicates that a sign of sleep apnea is no breathing for a period of time above a threshold followed by a gasping (or similar) sound, then when a user is sleeping, and no breathing sound is detected for seconds (or another threshold) followed by a loud gasping/inhalation sound, it is able to be considered an apneic episode.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG in view of HARIF to include a distinct breathing pattern, as taught by KEITH. One of ordinary skill would have been motivated to include a distinct breathing pattern, in order to accurately detect a sleep disorder of a user and to inform the user.

REGARDING CLAIM 6, ZHANG in view of HARIF discloses the method of claim 1, further comprising: receiving a second pattern of non-speech sounds (ZHANG Par 113 – “Step 402: Obtain a vibration signal collected by a bone conduction microphone connected to the electronic device.”); [gathering context information associated with the second pattern of non-speech sounds]; applying one or more non-gesture sound factors to identify the second pattern of non-speech sounds as a non-gesture sound (ZHANG Par 117 – “Spectral flatness of a typical voice is less than 0.6, and spectral flatness of a typical hum or cough is greater than 0.8.”; Pars 148-151 – “if the autocorrelation coefficient is greater than or equal to the preset threshold, determining that the extracted vibration signal feature matches the control feature; ….”; In other words, if the autocorrelation coefficient is less than the preset threshold, it is determined to be a non-gesture sound. For example, when the spectral flatness is less than a predetermined threshold (a value between 0.6 and 0.8), it is determined to be a non-gesture sound (e.g., not cough, not hum)); and rejecting the second pattern of non-speech sounds for control of the feature (ZHANG Par 123 – “Step 405: When a comparison result is that the extracted vibration signal feature matches the control feature, perform an operation corresponding to the control feature, to complete processing on the service request.”; In other words, if the feature does not match, the method/system does not perform an operation; thus, the second pattern is rejected.).

ZHANG in view of HARIF does not explicitly teach the [square-bracketed] limitations. KEITH discloses a method/system with an AI model trained on non-gesture sounds and gesture sounds, further comprising: receiving a second pattern of non-speech sounds (KEITH Par 178 – “For example, if a user has been continuously using his device as he normally does, his gait matches the stored information, and his resulting trust score is 100 (out of 100) and there have been no anomalies with the user's device (e.g., the risk score is 0 out of 100), then there may be no need for further authentication / verification of the user.”; Par 441 – “FIG. 44 illustrates a diagram of performing breath pattern analytics according to some embodiments. The user is able to hold a mobile device 4400 (e.g., a smart phone) and talk as the user typically would. The microphone, camera, and/or sensors of the mobile device 4400 are able to detect and capture the user's breath information. In some embodiments, the mobile device 4400 processes the breath information using the processor and memory of the device. Processing is able to include sound/signal processing such as using filters, masks and machine learning to determine specific breath information among other sound information. The processed information is able to be compared with stored breath information to determine if the currently acquired information is a match of previously stored information.”); [gathering context information associated with the second pattern of non-speech sounds] (KEITH Par 178 – “In some embodiments, MFA includes behavioral analytics, where the device continuously analyzes the user's behavior as described herein to determine a trust score for the user. The device (or system) determines a risk score for the user based on environmental factors such as where the device currently is, previous logins/locations, and more, and the risk score affects the user's confidence score. In some embodiments, the scan of a dynamic optical mark is only implemented if the user's trust score (or confidence score) is below a threshold. For example, if a user has been continuously using his device as he normally does, his gait matches the stored information, and his resulting trust score is 100 (out of 100) and there have been no anomalies with the user's device (e.g., the risk score is 0 out of 100), then there may be no need for further authentication/verification of the user.”; Par 445 – “Analytics performed in this class can quickly and accurately identify the specific user. Examples of these analytics include: live face recognition—since the user is probably staring at the personal or stationary access device, the face will likely be available to the built-in device camera; voice pattern and quality analytics; breath pattern and quality analysis; external factors including location patterns, user height, environmental and weather; and micro-motion analytics.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG in view of HARIF to include gathering context information, as taught by KEITH. One of ordinary skill would have been motivated to include gathering context information, in order to more accurately identify a specific user.

CLAIM 9 is similar to claim 2; thus, it is rejected under the same rationale.

CLAIM 10 is similar to claim 3; thus, it is rejected under the same rationale.

CLAIM 13 is similar to claim 6; thus, it is rejected under the same rationale.

CLAIM 16 is similar to claim 2; thus, it is rejected under the same rationale.

CLAIM 17 is similar to claim 3; thus, it is rejected under the same rationale.

CLAIM 19 is similar to claim 6; thus, it is rejected under the same rationale.

Claims 5, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG (US 2025/0358559 A1) in view of HARIF (US 6,820,056 B1), and further in view of SCHUSTER (US 2014/0350926 A1).

REGARDING CLAIM 5, ZHANG in view of HARIF discloses the method of claim 1. HARIF further discloses the method/system, wherein the feature includes [audio beam focusing] _movement of an object_ and wherein the feedback indicator includes a notification of a section of [a sound field] _the object_ that the [audio beam focusing] _movement of an object_ is directed (HARIF Col 5:34-49 – “For example, if the user desires to control cursor movements, he may be presented with a default menu: hand clap move to right; mouth-tongue crack move to left … ”; Col 4:24-54 – “Some examples are vocal: long and short whistles, coughs or hacks, teeth clicks, mouth-tongue clacks and hisses; or manual-physical: knocking on a desk, tapping on a computer case with a metallic object, clapping hands or rubbing sounds. These sounds may be discerned by the above-described voice recognition apparatus based upon digitized sound patterns. Since the sounds are more distinct from each other and from speech words than the standard distinctions between speech words and verbal commands, such sounds are easily recognizable and distinguished by the recognition apparatus and programs. Thus, a comparison 55 is made of an input of non-verbal sound to the stored non-verbal sound commands 52 and recognized non-verbal sounds are input via display adapter 36 to display 38 for verification, as will hereinafter be described.”).

ZHANG in view of HARIF does not explicitly teach the [square-bracketed] limitations, and teaches the underlined (here, _underscored_) features instead. In other words, HARIF teaches displaying a recognized non-verbal sound. Thus, when a “mouth-tongue crack” sound is recognized, the command “move to left” will be displayed as a feedback for verification. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG to include a feedback to describe the adjusting of the feature, as taught by HARIF. One of ordinary skill would have been motivated to include a feedback to describe the adjusting of the feature, in order to reduce false positive detection of commands.

SCHUSTER discloses the [square-bracketed] limitations. SCHUSTER teaches a method/system for receiving control commands, wherein the feature includes [audio beam focusing] (SCHUSTER Par 41 – “For example, the apparatus 100 operator may command the beamformer 120 to change the direction of the beamform using commands such as “focus left”, “focus right”, “focus forward” (or “focus ahead”), etc. In response to these or similar voice commands, the beamformer controller 140 will accordingly adjust one or more of the filters 121, 123 or 125 to fulfill the command. In some embodiments, the beamformer controller 140 may access system memory 170 to obtain predetermined filter coefficient settings related to beamforms corresponding to given commands. For example, a set of predetermined filter coefficients may be stored in system memory 170 for beamforms focused in various directions (“left”, “right”, “up”, “down”, “straight ahead”, etc.) that may be accessed by the beamformer controller 140 in response to corresponding commands.”).

Since ZHANG in view of HARIF already teaches generating a feedback associated with a non-speech command (e.g., “move to the left”), the combination of ZHANG, HARIF, and SCHUSTER teaches a non-speech command to control a direction of the audio beam focus (“focus left”) and generating a feedback associated with the command (“focus left”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of ZHANG in view of HARIF to include audio beam focusing, as taught by SCHUSTER. One of ordinary skill would have been motivated to include audio beam focusing, in order to allow a user to efficiently control various types of electronic devices with desired functionalities including audio beam focusing.

CLAIM 12 is similar to claim 5; thus, it is rejected under the same rationale.

CLAIM 18 is similar to claim 5; thus, it is rejected under the same rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571) 272-3327. The examiner can normally be reached Monday through Friday, 8:00 AM to 4:00 PM EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew C Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JONATHAN C KIM/
Primary Examiner, Art Unit 2655
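
For readers mapping the rejection to the underlying technology, the quoted passages describe a concrete two-stage pipeline: ZHANG screens bone-conduction frames by spectral flatness and short-term energy (Pars 113, 115, 117), and HARIF gates execution behind a yes/no confirmation of the recognized command (Cols 4:55-5:14, 6:18-43). The sketch below is illustrative only, not either reference's implementation; the callbacks (`confirm`, `execute`, `notify`) and anything beyond the quoted thresholds are hypothetical.

```python
# Illustrative sketch of the ZHANG + HARIF combination as characterized in
# the rejection. Quoted values: spectral flatness of a cough/hum > 0.8
# (voice is typically < 0.6) and short-term energy 50% greater than voice
# (ZHANG Pars 115, 117); the confirmation step mirrors HARIF's "Yes or No"
# dialog (Col 4:55-5:14). Everything else is an assumed placeholder.
import numpy as np

def spectral_flatness(frame: np.ndarray) -> float:
    """Geometric mean over arithmetic mean of the power spectrum (0..1)."""
    power = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12  # epsilon avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def looks_like_gesture(frame: np.ndarray, voice_energy: float) -> bool:
    # ZHANG-style screen: flat spectrum (> 0.8) and short-term energy
    # at least 50% above the running voice-energy baseline.
    return (spectral_flatness(frame) > 0.8
            and float(np.sum(frame ** 2)) > 1.5 * voice_energy)

def handle_frame(frame, voice_energy, stored_commands, confirm, execute, notify):
    """HARIF-style flow: match, display for confirmation, then execute."""
    if not looks_like_gesture(frame, voice_energy):
        return  # ordinary speech/noise; not a candidate gesture
    for name, matches in stored_commands.items():
        if matches(frame):
            if confirm(name):           # "Yes or No" confirmation dialog
                execute(name)           # perform the mapped adjustment
            else:
                notify("Do Not Recognize")
            return
    notify("Do Not Recognize")          # no stored command matched
```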

Prosecution Timeline

Mar 29, 2024: Application Filed
Feb 20, 2026: Non-Final Rejection, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573391: Generating Contextual Responses for Out-of-coverage Requests for Assistant Systems. Granted Mar 10, 2026 (2y 5m to grant).
Patent 12561110: AUDIO PLAYBACK METHOD AND APPARATUS, COMPUTER READABLE STORAGE MEDIUM, AND ELECTRONIC DEVICE. Granted Feb 24, 2026 (2y 5m to grant).
Patent 12555578: METHOD AND SYSTEM OF AUDIO FALSE KEYPHRASE REJECTION USING SPEAKER RECOGNITION. Granted Feb 17, 2026 (2y 5m to grant).
Patent 12547372: DISPLAY APPARATUS AND DISPLAY METHOD. Granted Feb 10, 2026 (2y 5m to grant).
Patent 12537000: METHOD OF IDENTIFYING TARGET DEVICE AND ELECTRONIC DEVICE THEREFOR. Granted Jan 27, 2026 (2y 5m to grant).
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview: 99% (+40.6%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 355 resolved cases by this examiner. Grant probability derived from career allow rate.
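
One plausible reading of how the with-interview figure combines the two displayed inputs; the 99% cap is an assumption chosen to match the shown value, since 74% plus a 40.6-point lift would otherwise exceed 100%:

```python
# Hedged sketch: with-interview grant probability as base rate plus the
# interview lift, capped. 74% + 40.6pp = 114.6%, so some ceiling must
# apply; the 0.99 cap is an assumption matching the displayed 99%.
base = 0.74            # career allow rate (grant probability above)
interview_lift = 0.406 # displayed interview lift
with_interview = min(base + interview_lift, 0.99)
print(f"With interview: {with_interview:.0%}")  # 99%
```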
