Prosecution Insights
Last updated: April 19, 2026
Application No. 18/177,154

METHOD AND SYSTEM FOR TRIGGERING AN AUDIO COMPONENT CHANGE ACTION

Status: Non-Final OA (§103)
Filed: Mar 02, 2023
Examiner: WASHBURN, DANIEL C
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Motorola Solutions Inc.
OA Round: 3 (Non-Final)
Grant Probability: 49% (Moderate)
Expected OA Rounds: 3-4
Median Time to Grant: 4y 8m
Grant Probability with Interview: 76%

Examiner Intelligence

Grants 49% of resolved cases.

Career Allow Rate: 49% (77 granted / 158 resolved; -13.3% vs TC avg)
Interview Lift: +27.7% for resolved cases with interview (a strong lift)
Avg Prosecution: 4y 8m (typical timeline); 14 applications currently pending
Total Applications: 172 across all art units (career history)
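
These headline figures reduce to simple arithmetic on the reported counts. Below is a minimal sketch in Python, assuming the dashboard computes grant probability as the career allow rate and the with-interview figure as that rate plus the reported lift (an assumption consistent with the footnote in the Prosecution Projections section below):

    granted, resolved = 77, 158         # career counts reported above
    INTERVIEW_LIFT = 27.7               # percentage-point lift reported above

    allow_rate = 100 * granted / resolved          # 48.7 -> displayed as 49%
    with_interview = allow_rate + INTERVIEW_LIFT   # 76.4 -> displayed as 76%

    print(f"Career allow rate: {allow_rate:.1f}%")
    print(f"With interview:    {with_interview:.1f}%")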

Statute-Specific Performance

§101: 13.8% allow rate (-26.2% vs TC avg)
§103: 50.9% allow rate (+10.9% vs TC avg)
§102: 16.4% allow rate (-23.6% vs TC avg)
§112: 11.0% allow rate (-29.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 158 resolved cases.
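
The four deltas above are each consistent with a single Tech Center baseline of 40.0%, so the dashboard appears to subtract one common TC-average allow rate from the examiner's per-statute rates. A minimal sketch of that arithmetic; note the 40.0% baseline is inferred from the deltas, not stated in the source:

    # Examiner allow rate (%) following each rejection type, as reported above.
    examiner_rates = {"101": 13.8, "103": 50.9, "102": 16.4, "112": 11.0}

    TC_AVG = 40.0  # assumption: back-solved from the deltas, e.g. 13.8 - (-26.2)

    for statute, rate in examiner_rates.items():
        print(f"§{statute}: {rate:.1f}% ({rate - TC_AVG:+.1f}% vs TC avg)")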

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/8/25 has been entered.

Response to Arguments

Applicant's arguments with respect to the 35 U.S.C. 103 rejection of claim(s) 1 and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 3-6, 8-11, 13-16, and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Razouane et al. (US 2018/0014107), hereinafter “Razouane”, in view of Chakraborty et al. (US 2022/0319528), hereinafter “Chakraborty”.

Regarding claim 1: Razouane teaches:

A system comprising: at least one processor; at least one electronic storage device storing program instructions that when executed by the at least one processor cause the at least one processor to perform:

[0102] … The computing system 700 includes a processor unit 701 (possibly including multiple processors, multiple cores, multiple nodes, and/or implementing multi-threading, etc.). The computing system includes memory 707. … The computing system also includes a bus 703, a network interface 705, and a storage device(s) 709 …

detecting at least one natural language phrase in first audio, wherein the natural language phrase is describing a characteristic of a component, in the first audio, and the first audio is corresponding to a first period of time;

Razouane [0036] … For example, the user input detected by the wireless earpieces 102 may include voice commands, head motions, finger taps, finger swipes, motions or gestures, or other user inputs sensed by the wireless earpieces.

Razouane [0057] … In one embodiment, the wireless earpieces 201 may be configured to recognize the speech of the user 210 as detected as the audio input 214 for enhancement … [i.e. language phrase]

Razouane [0049] … For example, the user 206 may provide a verbal command to “focus on Jill” [i.e. the NL phrase] who may represent the user 210 speaking the audio input 214.
As a result, the wireless earpieces 201 may 1) amplify the audio input 214, 2) clarify and amplify the audio input 214 (e.g., signal processing to clean up the audio signal, etc.), and/or 3) filter the other noises 213 (e.g., background noises, dangerously loud noises, etc.) [i.e. the characteristics of the audio component].

analyzing [the first audio] to: determine a preference that the component in the first audio belong to a first category corresponding to an allowed audio component, and define a set of feature parameters representative of the component in the first audio;

Razouane [0042] … For example, the application may perform intensity, frequency, wavelength, time delay, and other analysis for any number of audio inputs [i.e. define feature parameters] to recognize the sounds in the future. As a result, the wireless earpieces 102 may perform automatic enhancements or filtering in response to any number or combination of specified conditions, such as location, noise or sound thresholds, detection of or proximity to an individual, animal, or machine, user activity, and so forth. The user 106 may specify preferences for implement the enhancements and filtering as are herein described. [i.e. determine allowed component]

analyzing second audio corresponding to a second period of time different than the first period of time to identify one or more audio components of the second audio that are matching to the set of feature parameters, and that fall in a target frequency range; and

Razouane [0042] … For example, the application may perform intensity, frequency, wavelength, time delay, and other analysis for any number of audio inputs [i.e. define feature parameters] to recognize the sounds in the future. [i.e. matching the feature parameters]

triggering an audio component change action on output of the second audio, wherein effecting the audio component change action includes selectively increasing, by limiting the increasing based on the target frequency range, audio component volume of the one or more audio components of the second audio.

[0024] The wireless earpieces may be configured to recognize various sounds, such as user voices, mechanical or machine sounds, animal noises, crying, and any number of other sounds or audio signals. The sounds may be characterized by different features of the sound waves or signals as processed including loudness/intensity, quality, pitch/frequency, wavelength, source location, directionality, and so forth. In one embodiment, the wireless earpieces may sample or record specified audio inputs for analysis to filter or enhance the audio input.

[0072] … For example, the user may say “listen to Mark” to automatically reconfigure the volume levels of the speakers and adjust the sensitivity of the microphone to Mark's voice (e.g., Mark's voice may have previously been established for enhancement). [i.e. trigger component change.]

[0104] … Thus, for example, a user can specify one or more sound sources that they wish to focus on or else specify one or more sound sources which they wish to not focus on and which can be blocked or attenuated. [i.e. toggle component from suppressed to allowed category.]
Razouane doesn't describe, but Chakraborty describes: detecting at least one natural language phrase in speech conversation between at least a first person and a second person forming at least a portion of first audio, wherein the natural language phrase is describing a characteristic of a background noise component, that is of enhancing impact to situational awareness in relation to an incident, in the first audio, and the first audio is corresponding to a first period of time.

See ¶ [0072]: “FIGS. 4A and 4B are example flow diagrams (400, 406) illustrating the method for suppressing the noise portion(s) from an ongoing call (e.g. voice call, video call, etc.) by utilizing the AI model of the electronic device (100), according to various embodiments of the disclosure.”

Also see ¶ [0077]: “At operations 406c and 406d, the method includes determining whether any noise portion or noise category is detected during the ongoing voice call or video call due to the audio information (e.g. speech context) and the visual information (e.g. dance, surrounding ambiance) present in the ongoing voice call or video call. Furthermore, the method includes updating or adjusting the weight for the noise portion based on the audio information and the visual information.” (emphasis added)

¶ [0082]: “The speech separator (160ba) receives input audio (or said sent audio or received audio) at the electronic device (100). The speech separator (160ba) then separates the speech information and the background noise from the received audio using any exiting noise removal mechanism and passes the speech information to the speech to context converter (160bb). The speech to context converter (160bb) converts the speech information to text information (speech context) using any exiting speech conversion mechanism. The video analyzer (160bc) receives an input video (or said sent video or received video) at the electronic device (100) and analyzes visual context based on the received input video.” (emphasis added)

¶ [0083]: “The noise category synonym mapper (160bd) then maps the speech context to the noise categories based on the text information and the visual context from the received input video. For example, if the speech context or conversation is about “on being in road and irritated by vehicle horns”, the noise category synonym mapper (160bd) maps the speech context “vehicle horns” to one of the known noise categories by using the AI engine (160), in this example, “traffic noise”. The sentiment behavioral analyzer (160be) maps to the sentiment based on the text information and the visual context from the received input video by using the AI engine (160) and then adjusts the weight accordingly; the sentiment includes a positive, a negative, and a neutral. For example, if the speech context or conversation is about “on being in road and irritated by vehicle horns”, the sentiment behavioral analyzer (160be) maps the “irritated” to “negative”.” (emphasis added)

¶ [0085]: “Consider an example scenario (500b) in which the electronic device (100) receives input audio, for example, “song is very soothing”, from the user of the electronic device (100). The speech separator (160ba) then separates the speech information and the background noise from the received audio. The speech to context converter (160bb) converts the speech information to text information (speech context). The noise category synonym mapper (160bd) then maps the speech context to the noise categories based on the text information (e.g. song as a noun).
For example, if the speech context or conversation is about “song is very soothing”, then the noise category synonym mapper (160bd) maps the “song” to one of the known noise categories, in this example, “music”. The sentiment behavioral analyzer (160be) maps to sentiment based on the text information (e.g. soothing as adjective). For example, if the speech context or conversation is about “song is very soothing”, then the sentiment behavioral analyzer (160be) maps the “soothing” to “positive”.” (emphasis added)

¶ [0088]: “FIGS. 6 and 7 are example scenarios illustrating the weightage(s) generation for each noise portion based on the preference of the user of the electronic device (100), and the current context of the electronic device (100), according to various embodiments of the disclosure.”

¶ [0089]: “The weightage(s) increments or decrements based on various types, an example of various types is given in Table 2.”

TABLE 2
Type    Name              Weightage(s) increments or decrements
Type-1  Automatic         0.002
Type-2  Context Analyzer  0.01
Type-3  Manual override   0.02

¶ [0090]: “Referring to FIG. 6, at 601, the intelligent noise suppressor (160) detects the media event (e.g. call) initiated at the electronic device (100). At 602, the intelligent noise suppressor (160) fetches the stored weightage(s) for each noise portion(s) or category (e.g. siren “0.45”, music “0.6”, traffic “0.3”, and dog “0.4”). At 603, the intelligent noise suppressor (160) detects one or more noise portions (e.g. music, traffic, dog, etc.) in the media event. At 604, the noise portion(s) or categories with the weightage(s) less than 0.5 are default disable (or said pre-load weightage or history of the user) whereas the rest are default enable. The intelligent noise suppressor (160) disables or enables the weightage(s) based on past weightage(s) (or said automatic, Table 2) (e.g. music enables, traffic is disabled and dog is disabled).” (emphasis added)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include in Razouane a system and method that includes detecting at least one natural language phrase in speech conversation between at least a first person and a second person forming at least a portion of first audio, wherein the natural language phrase is describing a characteristic of a background noise component, that is of enhancing impact to situational awareness in relation to an incident, in the first audio, and the first audio is corresponding to a first period of time, as taught by Chakraborty, in order to improve the speech enhancement experience by automatically changing the speech enhancement process based on determined needs or preferences of a user in a manner that is transparent to the user (Razouane ¶ [0005]), where the system will automatically disable and enable noise of various categories in order to intelligently filter out unwanted noise while not filtering out background sound that the user desires to hear.

Regarding claim 3: Razouane teaches:

The system of claim 1 further comprising a display and a speaker communicatively coupled to the at least one processor,

[0011] … The earpiece further includes at least one speaker operatively connected to the processor for transducing the processed audio …

[0034] … The user 106 or another party may configure the wireless earpieces 102 directly or through a connected device and app (e.g., mobile app with a graphical user interface) to store or share information, audio, images, and other data.
and wherein: original or modified versions of the first and second audio are accessible for selective output from the speaker, during operation of the display a user interface is provided thereon, and the user interface is configured to allow a user to confirm that the audio component change action is desired.

[0024] … In one embodiment, the wireless earpieces may sample or record specified audio inputs for analysis [i.e. accessible] to filter or enhance the audio input …

Razouane [0072] … in other embodiments, the user may provide user feedback for initiating a configuration process by tapping the user interface 314 … The user may also provide user input for authorizing or initiating a configuration process by moving his head in a particular direction or motion or based on the user's position, orientation, or location.

[0042] … For example, the wireless device 104 may communicate instructions received from the wireless earpieces 102 for the user 106 to provide feedback if the user does not agree with the filtering or enhancement performed by the wireless earpieces 102. The wireless device 104 (or the wireless earpieces 102) may include an application that displays, plays, or communications instructions and information to the user 106 in response to configuration being needed or required. [i.e. confirm change action]

Regarding claim 4: Razouane teaches:

The system of claim 3 wherein the system is a mobile computing device that includes a housing, and wherein the display is integrated into the housing.

[0032] The wireless device 104 or the wireless earpieces 102 may communicate directly or indirectly with one or more wired or wireless networks, such as a network 120 …

[0011] According to another aspect, a wireless earpiece includes an earpiece housing, a processor disposed within the earpiece housing, a plurality of microphones operatively connected to the processor for receiving audio input of a sound field environment …

[0060] In one embodiment, the wireless earpieces 302 may include a battery 308, a logic engine 310, a memory 312, a user interface 314, …

[0062] … The logic engine 310 may include circuitry, chips, and other digital logic. The logic engine 310 may also include programs, scripts, and instructions that may be implemented to operate the logic engine 310. [i.e. the wireless earpiece is a mobile computing device]

[0069] … The user interface 314 may include the LED array, one or more touch sensitive buttons or portions, a miniature screen or display, or other input/output components.

[0079] FIG. 4 is a block diagram showing the earpiece components disposed within the earpiece housing … It is to be understood that two or more microphones may be present in each earpiece.

Regarding claim 5: Razouane teaches:

The system of claim 4 wherein the mobile computing device further includes a microphone integrated into the housing, and the component in the first audio is generated by an object of interest within audible distance of the microphone.

[0079] FIG. 4 is a block diagram showing the earpiece components disposed within the earpiece housing. The block diagram is the same as that included within FIG. 3, except that a plurality of microphones 317A, 317B are specifically shown as two of the sensors 317. It is to be understood that two or more microphones may be present in each earpiece.

[0056] In another example, the amount of the noise 214 detected by the wireless earpieces 201 may vary between the section 204 and the section 206.
The wireless earpieces 201 may adjust the volume level of the speakers as well as the microphone sensitivity as the user 206 moves within the location 202 for the audio inputs 214, 216 and the noise 213.

Regarding claim 6: Razouane teaches:

The system of claim 5 wherein the object of interest is different than a person possessing the mobile computing device.

[0048] … The section 204 may represent a location where the user is utilizing the wireless earpieces 201, such as an office, recreational space, commercial area, cafeteria, classroom, workspace, or so forth. The noise levels within the section 204 may vary based on other individuals, [i.e. different person] such as individuals 210, 212, machinery, activities, events, or other man-made or natural noises. [i.e. object of interest]

[0049] … For example, the user 206 may provide a verbal command to “focus on Jill” [i.e. person not in possession of device] who may represent the user 210 speaking the audio input 214. [i.e. move Jill's input to allowed category and move others' inputs to suppressed category.]

Regarding claim 8: Razouane teaches:

The system of claim 1 wherein the background noise component is one of a plurality of background noise components in the first audio.

[0021] … In one embodiment, the wireless earpieces may actively filter loud, repetitive (e.g., machinery, yelling, etc.), or other noises, sounds, or inputs that may be dangerous to the hearing of the user … The wireless earpieces may detect noise levels of the environment and then automatically adjust the configuration of the wireless earpieces to provide the best user experience possible for specified inputs (e.g., specified voices, sounds, etc.). [i.e. plurality of background noises]

[0049] … For example, the user 206 may provide a verbal command to “focus on Jill” who may represent the user 210 speaking the audio input 214. As a result, the wireless earpieces 201 may 1) amplify the audio input 214, 2) clarify and amplify the audio input 214 (e.g., signal processing to clean up the audio signal, etc.) and/or 3) filter the other noises 213 (e.g., background noises, dangerously loud noises, etc.) …

Regarding claim 9: Razouane teaches:

The system of claim 1 wherein the audio component change action targets at least two different background noises of the second audio.

[0049] In one embodiment, the noise level of the location 202 as well as the sections 204, 206 may increase in response to the individuals 210, 212 utilizing the section 206. For example, the individuals 210, 212 may be participating in a meeting, conference call, sporting activity, discussion, or other activity that increases the noise levels within the location 202. The individuals 210, 212 as well as the equipment, devices, and natural sounds of the sections 204, 206 may generate noise 213. In one embodiment, the user may select to focus on the audio input 214. In one embodiment, the user may provide an input or selection to actively enhance or filter the voice of the user 210. For example, the user 206 may provide a verbal command to “focus on Jill” who may represent the user 210 speaking the audio input 214. As a result, the wireless earpieces 201 may 1) amplify the audio input 214, 2) clarify and amplify the audio input 214 (e.g., signal processing to clean up the audio signal, etc.) [i.e. 1) and 2) are the change actions on one background noise] and/or 3) filter the other noises 213 (e.g., background noises, dangerously loud noises, etc.) [i.e. 3) is the change action on a different background noise].
Regarding claim 10: Razouane teaches:

The system of claim 1 wherein the at least one electronic storage device is further storing additional program instructions that when executed by the at least one processor cause the at least one processor to perform: triggering an additional audio component change action on output of the first audio, wherein effecting the additional audio component change action includes changing of audio component volume in the first audio.

[0104] It is also to be understood that a user may configure their earpiece in various ways including to map names or keywords to particular sources. This allows a user to provide user input to one or more of the earpieces to specify a sound source. Thus, for example, a user can specify one or more sound sources that they wish to focus on or else specify one or more sound sources which they wish to not focus [i.e. additional change action] on and which can be blocked or attenuated.

Examiner explanation: as an example, a user may input an initial command to “focus on Jill”, as described in [0049], and then later may input a second command to “listen to Mark”, as described in [0072], at which time an additional change action would be triggered in order to filter out Jill and enhance the audio of Mark.

Regarding claim 11:

A computer-implemented method comprising: detecting at least one natural language phrase in speech conversation between at least a first person and a second person forming at least a portion of first audio, wherein the natural language phrase is describing a characteristic of a background component that is of enhancing impact to situational awareness in relation to an incident, in the first audio, and the first audio is corresponding to a first period of time; analyzing, using an at least one processor, the speech conversation to: determine a preference that the component in the first audio belong to a first category corresponding to an allowed audio component, and define a set of feature parameters representative of the component in the first audio; analyzing, using the at least one processor, second audio corresponding to a second period of time different than the first period of time to identify one or more audio components of the second audio that are matching to the set of feature parameters, and that fall in a target frequency range; and triggering an audio component change action on output of the second audio, wherein effecting the audio component change action includes selectively increasing, by limiting the increasing based on the target frequency range, audio component volume of the one or more audio components of the second audio.

Claim 11 is a method claim with limitations similar to the limitations of Claim 1 and is rejected under similar rationale.

Regarding claim 13:

The computer-implemented method of claim 11 wherein original or modified versions of the first and second audio are accessible for selective output from a speaker of a computing device that also includes a display upon which a user interface is provided during operation of the display, and the user interface being configured to allow a user to confirm that the audio component change action is desired.

Claim 13 is a method claim with limitations similar to the limitations of Claim 3 and is rejected under similar rationale.
Regarding claim 14:

The computer-implemented method of claim 13 wherein the computing device is a mobile computing device, and the display is integrated into a housing of the mobile computing device.

Claim 14 is a method claim with limitations similar to the limitations of Claim 4 and is rejected under similar rationale.

Regarding claim 15:

The computer-implemented method of claim 14 wherein the mobile computing device includes a microphone integrated into the housing of the mobile computing device, and the component in the first audio is generated by an object of interest within audible distance of the microphone.

Claim 15 is a method claim with limitations similar to the limitations of Claim 5 and is rejected under similar rationale.

Regarding claim 16:

The computer-implemented method of claim 15 wherein the mobile computing device is carried by a person, different than the object of interest.

Claim 16 is a method claim with limitations similar to the limitations of Claim 6 and is rejected under similar rationale.

Regarding claim 18:

The computer-implemented method of claim 11 wherein the background noise component is one of a plurality of background noise components in the first audio.

Claim 18 is a method claim with limitations similar to the limitations of Claim 8 and is rejected under similar rationale.

Regarding claim 19:

The computer-implemented method of claim 11 wherein the audio component change action targets at least two different background noises of the second audio.

Claim 19 is a method claim with limitations similar to the limitations of Claim 9 and is rejected under similar rationale.

Regarding claim 20:

The computer-implemented method of claim 11 further comprising triggering an additional audio component change action on output of the first audio, wherein effecting the additional audio component change action includes changing of audio component volume in the first audio.

Claim 20 is a method claim with limitations similar to the limitations of Claim 10 and is rejected under similar rationale.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure – see additional references cited on PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daniel C Washburn whose telephone number is (571) 272-5551. The examiner can normally be reached Monday-Friday 9:00 am - 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657
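
For orientation, independent claims 1 and 11 recite a two-pass pipeline: learn feature parameters for an audio component named in a natural-language phrase during a first period, then match components of later audio against those parameters and boost only the matching components, with the boost limited to a target frequency range. Below is a minimal illustrative sketch of that claimed flow in Python; it is not code from the application or the cited references, and every function name, the similarity threshold, and the 300-3400 Hz band are hypothetical choices made only for this example.

    import numpy as np

    SAMPLE_RATE = 16_000                # hypothetical sample rate (Hz)
    TARGET_BAND = (300.0, 3400.0)       # hypothetical target frequency range (Hz)

    def define_feature_parameters(first_audio: np.ndarray) -> np.ndarray:
        """First pass: derive feature parameters (here, a normalized magnitude
        spectrum) for the component described by the detected phrase."""
        spectrum = np.abs(np.fft.rfft(first_audio))
        return spectrum / (np.linalg.norm(spectrum) + 1e-12)

    def component_matches(component: np.ndarray, params: np.ndarray) -> bool:
        """Second pass: a component matches if its spectrum resembles the stored
        parameters AND its dominant frequency falls in the target band."""
        spectrum = np.abs(np.fft.rfft(component))
        freqs = np.fft.rfftfreq(component.size, d=1.0 / SAMPLE_RATE)
        peak = freqs[np.argmax(spectrum)]
        n = min(spectrum.size, params.size)
        similarity = float(np.dot(spectrum[:n], params[:n])
                           / (np.linalg.norm(spectrum[:n]) + 1e-12))
        return TARGET_BAND[0] <= peak <= TARGET_BAND[1] and similarity > 0.8

    def change_action(component: np.ndarray, gain: float = 2.0) -> np.ndarray:
        """Selectively increase volume, limiting the increase to the target
        frequency range: gain is applied only to FFT bins inside the band."""
        spectrum = np.fft.rfft(component)
        freqs = np.fft.rfftfreq(component.size, d=1.0 / SAMPLE_RATE)
        in_band = (freqs >= TARGET_BAND[0]) & (freqs <= TARGET_BAND[1])
        spectrum[in_band] *= gain
        return np.fft.irfft(spectrum, n=component.size)

    if __name__ == "__main__":
        t = np.linspace(0.0, 1.0, SAMPLE_RATE, endpoint=False)
        first = np.sin(2 * np.pi * 440.0 * t)         # component named in phrase
        params = define_feature_parameters(first)
        second = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # later audio, same component
        if component_matches(second, params):
            boosted = change_action(second)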

Prosecution Timeline

Mar 02, 2023: Application Filed
Mar 07, 2025: Non-Final Rejection — §103
May 20, 2025: Interview Requested
Jun 05, 2025: Examiner Interview Summary
Jun 05, 2025: Applicant Interview (Telephonic)
Jun 11, 2025: Response Filed
Aug 09, 2025: Final Rejection — §103
Nov 08, 2025: Request for Continued Examination
Nov 15, 2025: Response after Non-Final Action
Feb 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602555: METHOD FOR SEARCHING FOR TEXTS IN DIFFERENT LANGUAGES BASED ON PRONUNCIATION AND ELECTRONIC DEVICE APPLYING THE SAME (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603084: METHOD, APPARATUS, AND COMPUTER-READABLE RECORDING MEDIUM FOR CONTROLLING RESPONSE UTTERANCE BEING REPRODUCED AND PREDICTING USER INTENTION (granted Apr 14, 2026; 2y 5m to grant)
Patent 12511480: Pattern Recognition Using NLP-Based Tokenizing and Clustering Models (granted Dec 30, 2025; 2y 5m to grant)
Patent 9614588: Smart Appliances (granted Apr 04, 2017; 2y 5m to grant)
Patent 8373711: IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND COMPUTER-READABLE STORAGE MEDIUM (granted Feb 12, 2013; 2y 5m to grant)

Study what changed to get past this examiner (based on the 5 most recent grants).

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 49%
With Interview: 76% (+27.7%)
Median Time to Grant: 4y 8m
PTA Risk: High

Based on 158 resolved cases by this examiner. Grant probability derived from career allow rate.
