Last updated: May 29, 2026
Application No. 18/252,370
PERSONAL PROTECTIVE DEVICE WITH LOCAL VOICE RECOGNITION AND METHOD OF PROCESSING A VOICE SIGNAL THEREIN

Non-Final OA §103
Filed: May 10, 2023
Priority: Nov 13, 2020 (EU 20207551.1, +1 more)
Examiner: KIM, JONATHAN C
Art Unit: 2655
Tech Center: 2600 (Communications)
Assignee: 3M Company
OA Round: 2 (Non-Final)
Interview: Optional
Interview lift: +40.4%. An interview has already been conducted in this application's prosecution history. This examiner has a 74% grant rate with a +40.4% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 360 resolved cases, 2023–2026
Examiner Intelligence

KIM, JONATHAN C
Career allowance rate: 74% (265 granted / 360 resolved), above average; +11.6% vs. TC average
Interview lift: +40.4% (strong)
Typical timeline: 2y 5m average prosecution; 12 currently pending
Career history: 376 total applications across all art units
Statute-specific performance (compared with the Tech Center average estimate; based on career data from 360 resolved cases):
§101: 3.7% (-36.3% vs. TC avg)
§103: 90.8% (+50.8% vs. TC avg)
§102: 1.2% (-38.8% vs. TC avg)
§112: 1.3% (-38.7% vs. TC avg)
Office Action

DETAILED ACTION
This Office Action is in response to the correspondence filed by the applicant on 10/13/2025.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Receipt is acknowledged of certified copies of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Response to Arguments
Applicant’s arguments, pages 8-11, filed 10/13/2025, with respect to the rejection of claims 1 and 14 under 35 U.S.C. 103 have been fully considered and, upon further consideration, are moot in view of the new ground(s) of rejection made under AIA 35 U.S.C. 103 as being unpatentable over GRIZZEL (US 2021/0035552 A1), and in further view of HSU (US 2016/0267806 A1) and JUNQUA (US 6,253,181 B1). Please see the rejections below for details.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a control unit”, “a voice processing unit”, “a command processing unit”, “a feedback unit”, “an audible indication unit”, “a visual indication unit”, and “a haptic indication unit” in claims 1-3, 5-9, and 13.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8, 11, 13-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over GRIZZEL (US 2021/0035552 A1), and in further view of HSU (US 2016/0267806 A1) and JUNQUA (US 6,253,181 B1).

REGARDING CLAIM 1, GRIZZEL discloses a personal [protective] device with local voice recognition comprising 
a. a voice receiving unit comprising at least one microphone for recognizing voice signals of a user (Fig. 19 I/O Device interface 1902; Microphone(s) 103; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance.”), 
b. a control unit for controlling functions of the personal [protective] device (Fig. 3 Command Processor 290; Fig. 19 Controller(s)/Processor(s) 1904; Par 92 – “The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”; Par 71 – “For example, if the NLU output includes a command to play music, the command processor 290 selected may be a music playing application, such as one located on the voice input device 110 or in a music playing appliance configured to execute a music playing command.”), 
c. a voice processing unit for processing voice signals received from the at least one microphone (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance. The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data.”), 
- the voice processing unit is connected to the at least one microphone of the voice receiving unit (Fig. 19 I/O Device Interface 1902; Wake Command Detection Component 220; ASR Component 250; NLU component 260; Microphone(s) 103), 
- wherein the voice processing unit is configured to generate a voice command based on the received voice signal (Par 39 – “The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data. Following detection of a wakeword, the voice input device 110 sends audio data 111, corresponding to the utterance, to a server 120 that includes an ASR component 250.”; Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252). For example, the ASR component 250 may compare the audio data 111 with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the spoken utterance of the audio data 111.”), 
- wherein the voice processing unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260) comprises predetermined command information stored internally within the voice processing unit (Fig. 2 Language Model(s) 254; Domain Intents (278); Domain Lexicon(s) 286; Knowledge Base(s) 272), 
- wherein the voice processing unit is configured to match the voice command generated from the voice signal with the stored command information to generate a voice command information block (Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252).”; Par 47 – “The ASR component 250 may also output multiple hypotheses in the form of a lattice or an N-best list with each hypothesis corresponding to a confidence score or other score (e.g., such as probability scores, etc.).”; Par 61 – “For example, a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent. The IC component 264 identifies potential intents for each identified domain by comparing words in the utterance to the words and phrases in the intents database 278.”; Par 77 – “The output of each recognizer 335 is a N-best list 340 of intents and identified slots representing the particular recognizer's top choices as to the meaning of the input text data, along with scores for each item in the N-best list 340. 
For example, for input text data 300 of “play poker face by lady gaga,” the music domain recognizer 263-A may output an N-best list 340 in the form of: [0.95] PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face [0.02] PlayMusicIntent ArtistName: Lady Gaga [0.01] PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face [0.01] PlayMusicIntent SongName: Pokerface where the NER component 262-A of the recognizer 263-A has determined that for different items in the N-best list 340, the words “poker face” correspond to a slot and the words “lady gaga” correspond to a slot.” (Though different items in the N-best list 340 interpret those slots differently, for example labeling “poker face” as a song name in one choice but labeling it as an album name in another.); Par 59 – “A domain may represent a discrete set of activities having a common theme, such as “shopping”, “music”, “calendaring”, etc. Each domain may be associated with a particular language model and/or grammar database 276, a particular set of intents/actions 278, and/or a particular personalized lexicon 286.”),
- wherein the voice command information block comprises a list of suggestions of possible commands and a confidence level value (CLV) of each possible command from the list of suggestions (Pars 77-81 – “The output of each recognizer 335 is a N-best list 340 of intents and identified slots representing the particular recognizer's top choices as to the meaning of the input text data, along with scores for each item in the N-best list 340. For example, for input text data 300 of “play poker face by lady gaga,” the music domain recognizer 263-A may output an N-best list 340 in the form of: [0.95] PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face [0.02] PlayMusicIntent ArtistName: Lady Gaga [0.01] PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face [0.01] PlayMusicIntent SongName: Pokerface where the NER component 262-A of the recognizer 263-A has determined that for different items in the N-best list 340, the words “poker face” correspond to a slot and the words “lady gaga” correspond to a slot.”… The recognizer 263-A also determined a score for each item on the list representing the recognizer's confidence that the particular item is correct. As can be seen in the example, the top item has the highest score.”), 
d. a command processing unit (Fig. 3 Cross Domain Processing 355; Entity Resolver 370; Re-scorer and Final Ranker 390; Other Data 391) for processing the voice command information block generated by the voice processing unit (Fig. 3 N-best lists 340; Cross-domain N-best 360; Par 91 – “A re-scorer and final ranker component 390 may consider such errors when determining how to rank the ultimate results for potential execution. For example, if an item of the N-best list 360 comes from a book domain and includes a read book intent, but the entity resolver 370 cannot find a book with a title matching the input text data 300, that particular result may be re-scored by the final ranker 390 to be given a lower score. Each item considered by the final ranker 390 may also be assigned a particular confidence, where the confidence may be determined by a recognizer 335, cross domain processor 355, or by the final ranker 390 itself.”),
- wherein the command processing unit (Fig. 3 Cross Domain Processing 355; Entity Resolver 370; Re-scorer and Final Ranker 390; Other Data 391) is connected to the voice processing unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260) and to the control unit (Fig. 3 Command Processor 290), 
- wherein the command processing unit comprises predetermined control information stored within the command processing unit (Par 92 – “This other data 391 may include a variety of information. For example, the other data 391 may include application rating or popularity. For example, if one application has a particularly high rating, the system 100 may increase the score of results associated with that particular application. The other data 391 may also include information about applications that have been specifically enabled by the user (as indicated in a user profile).”) and 
- wherein the command processing unit is configured to match the voice command information block with the stored control information to generate a control command (Fig. 3 Re-scorer and Final Ranker 390; Par 91 – “A re-scorer and final ranker component 390 may consider such errors when determining how to rank the ultimate results for potential execution. For example, if an item of the N-best list 360 comes from a book domain and includes a read book intent, but the entity resolver 370 cannot find a book with a title matching the input text data 300, that particular result may be re-scored by the final ranker 390 to be given a lower score. Each item considered by the final ranker 390 may also be assigned a particular confidence, where the confidence may be determined by a recognizer 335, cross domain processor 355, or by the final ranker 390 itself.”; Par 92 – “This other data 391 may include a variety of information. For example, the other data 391 may include application rating or popularity. For example, if one application has a particularly high rating, the system 100 may increase the score of results associated with that particular application. The other data 391 may also include information about applications that have been specifically enabled by the user (as indicated in a user profile). … For example, the system 100 may consider when any particular applications are currently active (such as music being played, a game being played, etc.) between the system 100 and voice input device 110. 
The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”) and to transmit the control command to the control unit (Par 92 – “The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”; Par 71 – “For example, if the NLU output includes a command to play music, the command processor 290 selected may be a music playing application, such as one located on the voice input device 110 or in a music playing appliance configured to execute a music playing command.”),
- wherein the command processing unit is configured to <assign a threshold CLV> rank the candidate commands based on the CLV and a possible command is ignored by the command processing unit if the possible command is below the <threshold CLV> ranked list (Par 81 – “The IC component 264-A of the recognizer 263-A has also determined that the intent of the input text data 300 is a PlayMusicIntent (and selected that as the intent for each item on the music N-best list 340). The recognizer 263-A also determined a score for each item on the list representing the recognizer's confidence that the particular item is correct. As can be seen in the example, the top item has the highest score. Each recognizer of the recognizers 335 may operate on the input text data 300 substantially in parallel, resulting in a number of different N-best lists 340, one for each domain (e.g., one N-best 340 list for music, one N-best list 340 for video, etc.). The size of any particular N-best list 340 output from a particular recognizer is configurable and may be different across domains.”).
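For illustration only, the following is a minimal Python sketch, not taken from GRIZZEL or the application, of a voice command information block modeled as a list of (candidate command, CLV) pairs, populated with the four N-best scores quoted from GRIZZEL Par 77. The type alias `Block`, the function `keep_top_n`, and the cutoff `n` are hypothetical. The sketch shows only rank-based selection: candidates outside the top n are ignored regardless of their absolute CLV.

```python
from typing import List, Tuple

# Hypothetical representation of a voice command information block:
# each entry is (candidate command, confidence level value).
Block = List[Tuple[str, float]]

# The four N-best items and scores quoted from GRIZZEL Par 77.
n_best_block: Block = [
    ("PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face", 0.95),
    ("PlayMusicIntent ArtistName: Lady Gaga", 0.02),
    ("PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face", 0.01),
    ("PlayMusicIntent SongName: Pokerface", 0.01),
]


def keep_top_n(block: Block, n: int) -> Block:
    """Rank candidates by CLV (highest first) and keep only the top n.

    Candidates outside the ranked list are ignored, however high or low
    their absolute CLV is. Python's sort is stable, so tied candidates
    keep their original relative order.
    """
    ranked = sorted(block, key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    for command, clv in keep_top_n(n_best_block, 2):
        print(f"{clv:.2f} {command}")
```

Note that with n = 2 the second candidate is passed on even though its CLV is only 0.02; a rank cutoff looks only at relative position, not at how confident the recognizer actually was.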

GRIZZEL does not explicitly teach the [square-bracketed] and <angle-bracketed> limitations. Regarding the [square-bracketed] limitations, GRIZZEL discloses wearable devices including earbuds (Fig. 7A), a headset (Fig. 7B), and glasses (Fig. 7C), but does not explicitly teach that the wearable devices are [protective] devices. Although one of ordinary skill in the art would recognize that the wearable devices of GRIZZEL can function as protective devices (e.g., the earbuds/headset of GRIZZEL can protect the user from loud environmental noise, and the glasses of GRIZZEL can protect the user’s eyes from small particles), the Examiner relies on HSU for clarity of the rejection.

HSU explicitly discloses the [square-bracketed] limitations. HSU discloses a personal protective device with local voice recognition comprising 
a. a voice receiving unit comprising at least one microphone for recognizing voice signals of a user (HSU Fig. 3A; Par 83 – “The user interface 208 may generate electrical signals in response to user input (e.g., screen touches, button presses, voice commands, gesture recognition etc.).”; Par 97 – “The electromechanical user interface components 308 may comprise, for example, one or more touchscreen elements, speakers, microphones, physical buttons, gesture control, EEG mind control, etc. that generate electric signals in response to user input. For example, electromechanical user interface components 308 may comprise capacity, inductive, or resistive touchscreen sensors mounted on the back of the display 304 (i.e., on the outside of the headwear 20) that enable a wearer of the headwear 20 to interact with user graphics displayed on the front of the display 304 (i.e., on the inside of the headwear 20).”), 
b. a control unit for controlling functions of the personal [protective] device (HSU Figs. 3A-3B; Par 39 – “Disclosed example methods to direct a weld operator using a weld operator personal protection equipment (PPE) include receiving instruction information associated with a welding operation and displaying the instruction information via a display device of the PPE.”; Par 169 – “In block 902, welder 18 initiates a welding operation. For example, welder 18 may give a voice command for welding system 10 to enter a weld mode, which voice command is responded to by user interface of headwear 20. The processor 410 configures the components of headwear 20 according to the voice command in order to display, on display 304, the live welding operation for viewing by the welder.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL to include a personal protective device, as taught by HSU.
One of ordinary skill would have been motivated to include a personal protective device, in order to assist a user performing a job in a hazardous environment.

GRIZZEL does not explicitly teach the <angle-bracketed> limitations and instead teaches the features recited in their place (ranking the candidate commands based on the CLV, and ignoring a possible command that falls below the ranked list). In other words, instead of using a confidence threshold, GRIZZEL ranks the candidate commands based on their associated confidence scores (i.e., an N-best list), processes only the top-ranked candidate commands, and ignores the commands below that rank (i.e., those not in the list of N-best candidates).

JUNQUA explicitly teaches the <angle-bracketed> limitations. JUNQUA discloses a method/system for speech recognition, wherein the command processing unit is configured to <assign a threshold CLV> and a possible command is ignored by the command processing unit if the possible command is below the <threshold CLV> (JUNQUA Col 4:24-36 – “If desired, the confidence measurement system may be coupled to the dialogue system as one mechanism for mediating the adaptation process. As the new speaker supplies utterances to the dialogue system, the speech recognizer 14 performs speech recognition and the confidence measurement system 26 assigns a confidence measure to the results of that recognition. Recognized utterances with a sufficiently high confidence measure (those above a predetermined confidence measure threshold) are passed by the dialogue system 12 to adaptation system 18. Utterances having a low confidence measure are not passed to the adaptation system.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU to include a confidence threshold, as taught by JUNQUA.
One of ordinary skill would have been motivated to include the confidence threshold in order to filter out low-confidence recognition results, so that only sufficiently reliable results are passed on for further processing (see JUNQUA Col 4:24-36).
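A threshold-based filter of the kind JUNQUA describes can be sketched as follows. This is a minimal illustration assuming a block of (command, CLV) pairs; the function name `filter_by_threshold`, the example commands, and the threshold values are hypothetical and do not come from JUNQUA or the application. Following the claim wording, a candidate is ignored only if its CLV is below the threshold, so a candidate exactly at the threshold is kept; JUNQUA speaks of results "above" the threshold, so a strict comparison would also be a reasonable reading.

```python
from typing import List, Tuple

# Hypothetical block of (candidate command, confidence level value) pairs.
Block = List[Tuple[str, float]]


def filter_by_threshold(block: Block, threshold_clv: float) -> Block:
    """Keep candidates whose CLV is at or above threshold_clv; ignore the rest."""
    return [(command, clv) for command, clv in block if clv >= threshold_clv]


if __name__ == "__main__":
    # Hypothetical candidates for a single utterance.
    block: Block = [("mute", 0.40), ("volume off", 0.35), ("play music", 0.15)]
    print(filter_by_threshold(block, 0.30))  # [('mute', 0.4), ('volume off', 0.35)]
    print(filter_by_threshold(block, 0.50))  # []
```

Unlike a fixed top-n cutoff, which always passes n candidates forward, a threshold depends on the absolute CLV and may pass none at all when every candidate has a low confidence.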


REGARDING CLAIM 2, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1. 
GRIZZEL further discloses the method/system comprising a feedback unit comprising at least one indication unit to provide a feedback to the user of the personal protective device (Fig. 19 I/O Device Interfaces 1902; Speaker 101), wherein the feedback unit is connected to the control unit (Fig. 19 Controller(s) / Processor(s) 1904; Command Processor 290) and wherein the control unit is configured to control the feedback unit such that an indication is provided to the user about the matching of the voice command with the stored command information and/or about the matching of the voice command information block with the stored control information and/or about the control command to be transmitted to the control unit (Par 110 – “Further, the server(s) 120 may execute certain commands, such as answering or responding to spoken utterances of a user of the wearable device 110 and/or answering or responding to certain movements of the user.”; Par 114 – “For example, a device may send audio data corresponding to an utterance of “play music” and the server may respond with prompt audio data corresponding to “what music shall I play,” to which the device may responds “play my favorite playlist.” Each of the data exchanges in that dialog may include a speech-session identifier so the various components can track the speech-session across the dialog exchanges.”; Par 139 – “Alternatively or in addition, the server 120 may send prompt data to the device 110 prompting the user to confirm whether a wakeword was intended (for example “did you intend to speak a command? Please nod if yes.” or the like).”; Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. 
The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).
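As a rough, hypothetical illustration of the claim 2 feedback path mapped above, the sketch below shows a control unit routing one indication about a matching result to whichever indication units are present (audible, visual, haptic), loosely following the notification options listed in GRIZZEL Par 122 (TTS prompt via speakers, LED, vibration pattern). The names `Indication` and `build_feedback` and the message strings are assumptions made for illustration and do not come from GRIZZEL or the application.

```python
from enum import Enum
from typing import Dict, List


class Indication(Enum):
    """Hypothetical indication units a feedback unit might contain."""
    AUDIBLE = "audible"  # e.g. a tone or TTS prompt via the speaker
    VISUAL = "visual"    # e.g. an LED that lights, flashes, or changes color
    HAPTIC = "haptic"    # e.g. a vibration pattern


def build_feedback(matched: bool, units: List[Indication]) -> Dict[Indication, str]:
    """Return the message each available indication unit should present
    about whether the voice command was matched."""
    message = "command recognized" if matched else "command not recognized"
    return {unit: message for unit in units}


if __name__ == "__main__":
    print(build_feedback(False, [Indication.AUDIBLE, Indication.HAPTIC]))
```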


REGARDING CLAIM 3, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 2.
GRIZZEL further discloses the method/system wherein the feedback unit comprises an audible indication unit (Fig. 19 I/O Device Interfaces 1902; Speaker 101) to provide an audible indication to a user (Par 114 – “For example, a device may send audio data corresponding to an utterance of “play music” and the server may respond with prompt audio data corresponding to “what music shall I play,” to which the device may responds “play my favorite playlist.” Each of the data exchanges in that dialog may include a speech-session identifier so the various components can track the speech-session across the dialog exchanges.”; Par 139 – “Alternatively or in addition, the server 120 may send prompt data to the device 110 prompting the user to confirm whether a wakeword was intended (for example “did you intend to speak a command? Please nod if yes.” or the like).”; Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).


REGARDING CLAIM 4, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 3.
GRIZZEL further discloses the method/system, wherein the audible indication unit comprises at least one speaker electrically connected to the feedback unit for playing an acoustic signal to the user as audible indication (Fig. 19 I/O Device Interfaces 1902; Speaker 101), wherein the control unit is configured to control the feedback unit such that the feedback unit generates an acoustic signal to be played on the at least one speaker (Par 114 – “For example, a device may send audio data corresponding to an utterance of “play music” and the server may respond with prompt audio data corresponding to “what music shall I play,” to which the device may responds “play my favorite playlist.” Each of the data exchanges in that dialog may include a speech-session identifier so the various components can track the speech-session across the dialog exchanges.”; Par 139 – “Alternatively or in addition, the server 120 may send prompt data to the device 110 prompting the user to confirm whether a wakeword was intended (for example “did you intend to speak a command? Please nod if yes.” or the like).”; Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).


REGARDING CLAIM 5, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 2.
GRIZZEL further discloses the method/system, wherein the feedback unit comprises a visual indication unit to provide visual indication to a user (Par 158 – “Referring to the device 110 of FIG. 19, the device 110 may include a display, which may comprise a touch interface configured to receive limited touch inputs. Or the device 110 may be “headless” and may primarily rely on spoken commands for input. For example, the device 110 may be a headset worn by a user. As a way of indicating to a user that a connection between another device has been opened, the device 110 may be configured with a visual indicator, such as an LED or similar component (not illustrated), that may change color, flash, or otherwise provide visual indications by the device 110.”; Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).


REGARDING CLAIM 6, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 2.
GRIZZEL further discloses the method/system, wherein the feedback unit comprises a haptic indication unit to provide haptic indication to a user (Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).


REGARDING CLAIM 7, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1.
GRIZZEL further discloses the method/system, wherein the voice processing unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260) and/or command processing unit (Fig. 3 Cross Domain Processing 355; Entity Resolver 370; Re-scorer and Final Ranker 390; Other Data 391) is contained within the personal protective device (Fig. 19 – “Device 110”).


REGARDING CLAIM 8, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1.
GRIZZEL further discloses the method/system, wherein the voice processing unit and/or command processing unit is arranged remote from the personal protective device (Fig. 2 Server(s) 120 – “Automatic Speech Recognition 250, Natural Language Understanding (NLU) 260”; Fig. 3 – “Cross Domain Processing 355, Entity Resolver 370, Re-scorer and Final Ranker 390”) and wherein the personal protective device is configured to communicate with the voice and/or command processing unit via a short-range connection (Par 124 – “Audio data may be sent to the server(s) 120 using a wireless area network (WAN) component in communication with the network. The wearable device 110 may include a communication component with various input/output device interfaces 1902 to establish a communication connection with a wireless network. The I/O interface 1902 may include wireless communication components that work with antenna 1914 to allow wireless communication by the wearable device 110 a. The communication connection may be a WiFi® connection, Bluetooth® connection, or any other type of connection known to those of skill in the art.”).


REGARDING CLAIM 11, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1.
HSU further discloses the method/system wherein the personal protective device is a welding shield (HSU Figs. 3A-3B Par 88 – “FIGS. 3A, 3B, 3C, 4A, 4B, and 4C show example welding headwear 20 in accordance with aspects of this disclosure. The example headwear 20 is a helmet comprising a shell 306 in or to which are mounted: …”) with local voice recognition (HSU Par 66 – “The processor recognizes a first audio command received via the microphone, begins a weld training operation in response to receiving the audio command, including displaying the images and the simulated object to the wearer via the display. The processor recognizes a second audio command received via the microphone and ends the weld training operation in response to the second audio command.”) further comprising an automatic darkening filter (HSU Par 62 – “AGC is also referred to as automatic exposure control or automatic brightness control. When viewing a welding arc, sudden changes in scene brightness can create difficult viewing conditions. An AGC algorithm chooses a brightness or exposure value between the brightest and darkest areas (e.g., approximately splitting the difference in brightness) to attempt to enable visualization of the entire scene.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL to include a welding shield, as taught by HSU.
One of ordinary skill would have been motivated to include a welding shield, in order to assist a user performing a job in a hazardous environment.


REGARDING CLAIM 13, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1.  
JUNQUA discloses a method/system for a wearable device, wherein the personal protective device is further configured to process a keyword training mode for training a user about the correct pronunciation of the voice signals (JUNQUA 4:37-54 – “Although utterances with low confidence can be simply discarded, the dialogue system 12 may exploit the confidence measure to prompt the new speaker in a way that: (a) queries the new speaker to repeat the utterance and (b) teaches the new speaker the proper pronunciation for the word the speech recognizer thinks the speaker has uttered. Consider the following example: System: “What color is the giraffe?” Child: (Answer unintelligible—low confidence, resembling “brown”) System: “Did you say ‘brown’?” Child: “Yes.” System: Say, “the giraffe is brown.” Child: “The giraffe is brown.” In the preceding sequence, the child's initially unintelligible answer—interpreted as the word “brown”—was used in a subsequent series of prompts designed to teach the child the correct pronunciation.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU to include a pronunciation training mode, as taught by JUNQUA.
One of ordinary skill would have been motivated to include a pronunciation training mode, in order to assist a user to correctly articulate the words and phrases so that the device could understand the user better in the future.


REGARDING CLAIM 14, GRIZZEL discloses a method of processing a voice signal in a personal protective device with local voice recognition comprising 
a. a voice receiving unit comprising at least one microphone for recognizing voice signals of a user (Fig. 19 I/O Device interface 1902; Microphone(s) 103; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance.”), 
b. a control unit for controlling functions of the personal [protective] device (Fig. 3 Command Processor 290; Fig. 19 Controller(s)/Processor(s) 1904; Par 92 – “The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”; Par 71 – “For example, if the NLU output includes a command to play music, the command processor 290 selected may be a music playing application, such as one located on the voice input device 110 or in a music playing appliance configured to execute a music playing command.”), 
c. a voice processing unit for processing voice signals received from the at least one microphone of the voice receiving unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance. The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data.”), 
- wherein the voice processing unit is connected to the at least one microphone (Fig. 19 I/O Device Interface 1902; Wake Command Detection Component 220; ASR Component 250; NLU component 260; Microphone(s) 103), 
- wherein the voice processing unit is configured to generate a voice command based on the received voice signal (Par 39 – “The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data. Following detection of a wakeword, the voice input device 110 sends audio data 111, corresponding to the utterance, to a server 120 that includes an ASR component 250.”; Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252). For example, the ASR component 250 may compare the audio data 111 with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the spoken utterance of the audio data 111.”), 
- wherein the voice processing unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260) comprises predetermined command information stored internally within the voice processing unit (Fig. 2 Language Model(s) 254; Domain Intents 278; Domain Lexicon(s) 286; Knowledge Base(s) 272), 
- wherein the voice processing unit is configured to match the voice command generated from the voice signal with the stored command information to generate a voice command information block (Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252).”; Par 47 – “The ASR component 250 may also output multiple hypotheses in the form of a lattice or an N-best list with each hypothesis corresponding to a confidence score or other score (e.g., such as probability scores, etc.).”; Par 61 – “For example, a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent. The IC component 264 identifies potential intents for each identified domain by comparing words in the utterance to the words and phrases in the intents database 278.”; Par 77 – “The output of each recognizer 335 is a N-best list 340 of intents and identified slots representing the particular recognizer's top choices as to the meaning of the input text data, along with scores for each item in the N-best list 340. For example, for input text data 300 of “play poker face by lady gaga,” the music domain recognizer 263-A may output an N-best list 340 in the form of: [0.95] PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face [0.02] PlayMusicIntent ArtistName: Lady Gaga [0.01] PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face [0.01] PlayMusicIntent SongName: Pokerface where the NER component 262-A of the recognizer 263-A has determined that for different items in the N-best list 340, the words “poker face” correspond to a slot and the words “lady gaga” correspond to a slot.” (Though different items in the N-best list 340 interpret those slots differently, for example labeling “poker face” as a song name in one choice but labeling it as an album name in another.); Par 59 – “A domain may represent a discrete set of activities having a common theme, such as “shopping”, “music”, “calendaring”, etc. Each domain may be associated with a particular language model and/or grammar database 276, a particular set of intents/actions 278, and/or a particular personalized lexicon 286.”),
- wherein the voice processing command information block comprises a list of suggestions of possible commands and a confidence level value (CLV) of each possible command from the list of suggestions (Pars 77-81 – “The output of each recognizer 335 is a N-best list 340 of intents and identified slots representing the particular recognizer's top choices as to the meaning of the input text data, along with scores for each item in the N-best list 340. For example, for input text data 300 of “play poker face by lady gaga,” the music domain recognizer 263-A may output an N-best list 340 in the form of: [0.95] PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face [0.02] PlayMusicIntent ArtistName: Lady Gaga [0.01] PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face [0.01] PlayMusicIntent SongName: Pokerface where the NER component 262-A of the recognizer 263-A has determined that for different items in the N-best list 340, the words “poker face” correspond to a slot and the words “lady gaga” correspond to a slot.”… The recognizer 263-A also determined a score for each item on the list representing the recognizer's confidence that the particular item is correct. As can be seen in the example, the top item has the highest score.”), 
 
d. a command processing unit (Fig. 3 Cross Domain Processing 355; Entity Resolver 370; Re-scorer and Final Ranker 390; Other Data 391) for processing the voice command information block generated by the voice processing unit (Fig. 3 N-best lists 340; Cross-domain N-best 360; Par 91 – “A re-scorer and final ranker component 390 may consider such errors when determining how to rank the ultimate results for potential execution. For example, if an item of the N-best list 360 comes from a book domain and includes a read book intent, but the entity resolver 370 cannot find a book with a title matching the input text data 300, that particular result may be re-scored by the final ranker 390 to be given a lower score. Each item considered by the final ranker 390 may also be assigned a particular confidence, where the confidence may be determined by a recognizer 335, cross domain processor 355, or by the final ranker 390 itself.”), 
- wherein the command processing unit (Fig. 3 Cross Domain Processing 355; Entity Resolver 370; Re-scorer and Final Ranker 390; Other Data 391) is connected to the voice processing unit (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260) and to the control unit (Fig. 3 Command Processor 290), 
- wherein the command processing unit comprises predetermined control information stored within the command processing unit (Par 92 – “This other data 391 may include a variety of information. For example, the other data 391 may include application rating or popularity. For example, if one application has a particularly high rating, the system 100 may increase the score of results associated with that particular application. The other data 391 may also include information about applications that have been specifically enabled by the user (as indicated in a user profile).”) and
- wherein the command processing unit is configured to match the voice command information block with the stored control information to generate a control command (Fig. 3 Re-scorer and Final Ranker 390; Par 91 – “A re-scorer and final ranker component 390 may consider such errors when determining how to rank the ultimate results for potential execution. For example, if an item of the N-best list 360 comes from a book domain and includes a read book intent, but the entity resolver 370 cannot find a book with a title matching the input text data 300, that particular result may be re-scored by the final ranker 390 to be given a lower score. Each item considered by the final ranker 390 may also be assigned a particular confidence, where the confidence may be determined by a recognizer 335, cross domain processor 355, or by the final ranker 390 itself.”; Par 92 – “This other data 391 may include a variety of information. For example, the other data 391 may include application rating or popularity. For example, if one application has a particularly high rating, the system 100 may increase the score of results associated with that particular application. The other data 391 may also include information about applications that have been specifically enabled by the user (as indicated in a user profile). … For example, the system 100 may consider when any particular applications are currently active (such as music being played, a game being played, etc.) between the system 100 and voice input device 110. The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”) and to transmit the control command to the control unit (Par 92 – “The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”; Par 71 – “For example, if the NLU output includes a command to play music, the command processor 290 selected may be a music playing application, such as one located on the voice input device 110 or in a music playing appliance configured to execute a music playing command.”),
- wherein the command processing unit is configured to <assign a threshold CLV> rank the candidate commands based on the CLV and a possible command is ignored by the command processing unit if the possible command is below the <threshold CLV> ranked list (Par 81 – “The IC component 264-A of the recognizer 263-A has also determined that the intent of the input text data 300 is a PlayMusiclntent (and selected that as the intent for each item on the music N-best list 340). The recognizer 263-A also determined a score for each item on the list representing the recognizer's confidence that the particular item is correct. As can be seen in the example, the top item has the highest score. Each recognizer of the recognizers 335 may operate on the input text data 300 substantially in parallel, resulting in a number of different N-best lists 340, one for each domain (e.g., one N-best 340 list for music, one N-best list 340 for video, etc.). The size of any particular N-best list 340 output from a particular recognizer is configurable and may be different across domains.”),
the method comprising the steps of 
I. recognizing a voice signal by a microphone (Fig. 19 I/O Device interface 1902; Microphone(s) 103; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance.”), 
II. transmitting the voice signal to the voice processing unit, optionally of an external device (Fig. 2 Acoustic Front End (AFE) 256; Wake Command Component 220; Automatic Speech Recognition 250; Natural Language Understanding (NLU) 260; Par 39 – “An audio capture component, such as the microphone 103 of the voice input device 110 (or other device), captures input audio 11 corresponding to a spoken utterance. The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data.”), 
III. processing the received voice signal with the voice processing unit to generate a voice command based on the received voice signal (Par 39 – “The device 110, using a wake command detection component 220, then processes audio data corresponding to the input audio 11 to determine if a keyword (such as a wakeword) is detected in the audio data. Following detection of a wakeword, the voice input device 110 sends audio data 111, corresponding to the utterance, to a server 120 that includes an ASR component 250.”; Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252). For example, the ASR component 250 may compare the audio data 111 with models for sounds (e.g., subword units or phonemes) and sequences of sounds to identify words that match the sequence of sounds spoken in the spoken utterance of the audio data 111.”), 
IV. processing the voice command generated from the voice signal and matching it with the command information stored within the voice processing unit (Fig. 2 Language Model(s) 254; Domain Intents 278; Domain Lexicon(s) 286; Knowledge Base(s) 272) to generate a voice command information block (Par 46 – “A spoken utterance in the audio data 111 is input to a processor configured to perform ASR, which then interprets the spoken utterance based on a similarity between the spoken utterance and pre-established language models 254 stored in an ASR model knowledge base (i.e., ASR model storage 252).”; Par 47 – “The ASR component 250 may also output multiple hypotheses in the form of a lattice or an N-best list with each hypothesis corresponding to a confidence score or other score (e.g., such as probability scores, etc.).”; Par 61 – “For example, a music intent database may link words and phrases such as “quiet,” “volume off,” and “mute” to a “mute” intent. The IC component 264 identifies potential intents for each identified domain by comparing words in the utterance to the words and phrases in the intents database 278.”; Par 77 – “The output of each recognizer 335 is a N-best list 340 of intents and identified slots representing the particular recognizer's top choices as to the meaning of the input text data, along with scores for each item in the N-best list 340. For example, for input text data 300 of “play poker face by lady gaga,” the music domain recognizer 263-A may output an N-best list 340 in the form of: [0.95] PlayMusicIntent ArtistName: Lady Gaga SongName: Poker Face [0.02] PlayMusicIntent ArtistName: Lady Gaga [0.01] PlayMusicIntent ArtistName: Lady Gaga AlbumName: Poker Face [0.01] PlayMusicIntent SongName: Pokerface where the NER component 262-A of the recognizer 263-A has determined that for different items in the N-best list 340, the words “poker face” correspond to a slot and the words “lady gaga” correspond to a slot.” (Though different items in the N-best list 340 interpret those slots differently, for example labeling “poker face” as a song name in one choice but labeling it as an album name in another.); Par 59 – “A domain may represent a discrete set of activities having a common theme, such as “shopping”, “music”, “calendaring”, etc. Each domain may be associated with a particular language model and/or grammar database 276, a particular set of intents/actions 278, and/or a particular personalized lexicon 286.”), 
V. transmitting the voice command information block (Fig. 3 N-best list 340; Cross-Domain N-best 360) to the command processing unit (Fig. 3 Re-scorer and Final Ranker 390), 
VI. processing the voice command information block and matching it with the control information stored within the command processing unit to generate a control command (Fig. 3 Re-scorer and Final Ranker 390; Par 91 – “A re-scorer and final ranker component 390 may consider such errors when determining how to rank the ultimate results for potential execution. For example, if an item of the N-best list 360 comes from a book domain and includes a read book intent, but the entity resolver 370 cannot find a book with a title matching the input text data 300, that particular result may be re-scored by the final ranker 390 to be given a lower score. Each item considered by the final ranker 390 may also be assigned a particular confidence, where the confidence may be determined by a recognizer 335, cross domain processor 355, or by the final ranker 390 itself.”; Par 92 – “This other data 391 may include a variety of information. For example, the other data 391 may include application rating or popularity. For example, if one application has a particularly high rating, the system 100 may increase the score of results associated with that particular application. The other data 391 may also include information about applications that have been specifically enabled by the user (as indicated in a user profile). … For example, the system 100 may consider when any particular applications are currently active (such as music being played, a game being played, etc.) between the system 100 and voice input device 110. The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”) and 
VII. transmitting the control command to the control unit (Par 92 – “The highest scoring result (or results in the case of multiple commands being in an utterance) may be passed to a downstream command processor 290 for execution.”; Par 71 – “For example, if the NLU output includes a command to play music, the command processor 290 selected may be a music playing application, such as one located on the voice input device 110 or in a music playing appliance configured to execute a music playing command.”).
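For illustration only, steps V-VII as mapped above (an N-best voice command information block passed to GRIZZEL's re-scorer and final ranker 390, with the highest-scoring result forwarded to command processor 290) can be sketched in Python as follows. The scores come from the N-best example quoted from Par 77; the names `CONTROL_INFORMATION` and `generate_control_command`, the `MuteIntent` and `UnknownIntent` labels, and the control-command strings are hypothetical and do not appear in GRIZZEL or in the claims.

```python
from typing import Dict, List, Optional, Tuple

# A voice command information block: (possible command, CLV) pairs.
Candidate = Tuple[str, float]

# Hypothetical stored control information mapping an intent to a control command.
CONTROL_INFORMATION: Dict[str, str] = {
    "PlayMusicIntent": "START_PLAYBACK",
    "MuteIntent": "MUTE_AUDIO",
}


def generate_control_command(block: List[Candidate]) -> Optional[str]:
    """Match the block against the stored control information and return the
    control command for the highest-scoring candidate that has a match."""
    for command, _clv in sorted(block, key=lambda c: c[1], reverse=True):
        intent = command.split()[0]  # assumes the intent label comes first
        if intent in CONTROL_INFORMATION:
            return CONTROL_INFORMATION[intent]
    return None  # no candidate matched; nothing is sent to the control unit


if __name__ == "__main__":
    # Scores from the N-best example quoted from GRIZZEL Par 77.
    block = [
        ("PlayMusicIntent ArtistName=Lady Gaga SongName=Poker Face", 0.95),
        ("PlayMusicIntent ArtistName=Lady Gaga", 0.02),
    ]
    print(generate_control_command(block))  # START_PLAYBACK
    print(generate_control_command([("UnknownIntent", 0.9)]))  # None
```

In this sketch the highest-scoring candidate with a matching entry determines the control command, which corresponds to the "highest scoring result" being passed downstream in Par 92.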

GRIZZEL does not explicitly teach the [square-bracketed] and <angle-bracketed> limitations. Regarding the [square-bracketed] limitations, GRIZZEL discloses wearable devices including earbuds (Fig. 7A), a headset (Fig. 7B), and glasses (Fig. 7C), but does not explicitly teach that the wearable devices are [protective] devices. Although one of ordinary skill in the art would recognize that the wearable devices of GRIZZEL can function as protective devices (e.g., the earbuds/headset of GRIZZEL can protect the user from loud environmental noise, and the glasses of GRIZZEL can protect the user’s eyes from small particles), the Examiner provides HSU for clarity of the rejection.

HSU explicitly discloses the [square-bracketed] limitations. HSU discloses a personal protective device with local voice recognition comprising 
a. a voice receiving unit comprising at least one microphone for recognizing voice signals of a user (HSU Fig. 3A Par 83 – “The user interface 208 may generate electrical signals in response to user input (e.g., screen touches, button presses, voice commands, gesture recognition etc.).”; Par 97 – “The electromechanical user interface components 308 may comprise, for example, one or more touchscreen elements, speakers, microphones, physical buttons, gesture control, EEG mind control, etc. that generate electric signals in response to user input. For example, electromechanical user interface components 308 may comprise capacity, inductive, or resistive touchscreen sensors mounted on the back of the display 304 (i.e., on the outside of the headwear 20) that enable a wearer of the headwear 20 to interact with user graphics displayed on the front of the display 304 (i.e., on the inside of the headwear 20).”), 
b. a control unit for controlling functions of the personal [protective] device (HSU Figs. 3A-3B; Par 39 – “Disclosed example methods to direct a weld operator using a weld operator personal protection equipment (PPE) include receiving instruction information associated with a welding operation and displaying the instruction information via a display device of the PPE.”; Par 169 – “In block 902, welder 18 initiates a welding operation. For example, welder 18 may give a voice command for welding system 10 to enter a weld mode, which voice command is responded to by user interface of headwear 20. The processor 410 configures the components of headwear 20 according to the voice command in order to display, on display 304, the live welding operation for viewing by the welder.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL to include a personal protective device, as taught by HSU.
One of ordinary skill would have been motivated to include a personal protective device, in order to assist a user performing a job in a hazardous environment.

GRIZZEL does not explicitly teach the <angle-bracketed> limitations, and teaches the underlined features instead (i.e., ranking the candidate commands based on the CLV and ignoring a possible command below the ranked list). In other words, instead of using a confidence threshold, GRIZZEL ranks the candidate commands based on the confidence scores associated with them (i.e., an N-best list), processes only the top-ranked candidate commands, and ignores the commands below that rank (i.e., those not in the list of N-best candidates).

JUNQUA explicitly teaches the <angle-bracketed> limitations. JUNQUA discloses a method/system for speech recognition, wherein the command processing unit is configured to <assign a threshold CLV> and a possible command is ignored by the command processing unit if the possible command is below the <threshold CLV> (JUNQUA Col 4:24-36 – “If desired, the confidence measurement system may be coupled to the dialogue system as one mechanism for mediating the adaptation process. As the new speaker supplies utterances to the dialogue system, the speech recognizer 14 performs speech recognition and the confidence measurement system 26 assigns a confidence measure to the results of that recognition. Recognized utterances with a sufficiently high confidence measure (those above a predetermined confidence measure threshold) are passed by the dialogue system 12 to adaptation system 18. Utterances having a low confidence measure are not passed to the adaptation system.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU to include a confidence threshold, as taught by JUNQUA.
One of ordinary skill would have been motivated to include the confidence threshold, in order to effectively filter out irrelevant data for data processing.
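To illustrate the distinction drawn above between GRIZZEL's rank-based N-best selection and JUNQUA's confidence threshold, the following Python sketch applies both filters to the N-best scores quoted from GRIZZEL Par 77. The cutoff `n=2` and the threshold value `0.5` are arbitrary assumptions chosen for illustration; neither reference specifies these values, and the function names `keep_top_n` and `keep_above_threshold` are hypothetical.

```python
from typing import List, Tuple

# A voice command information block: (possible command, CLV) pairs.
Candidate = Tuple[str, float]

# Scores taken from the N-best example quoted from GRIZZEL Par 77.
N_BEST: List[Candidate] = [
    ("PlayMusicIntent ArtistName=Lady Gaga SongName=Poker Face", 0.95),
    ("PlayMusicIntent ArtistName=Lady Gaga", 0.02),
    ("PlayMusicIntent ArtistName=Lady Gaga AlbumName=Poker Face", 0.01),
    ("PlayMusicIntent SongName=Pokerface", 0.01),
]


def keep_top_n(candidates: List[Candidate], n: int) -> List[Candidate]:
    """Rank-based filtering (GRIZZEL): keep the n highest-scoring candidates."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ranked[:n]


def keep_above_threshold(candidates: List[Candidate], threshold: float) -> List[Candidate]:
    """Threshold-based filtering (JUNQUA): ignore candidates whose CLV is below the threshold."""
    return [c for c in candidates if c[1] >= threshold]


if __name__ == "__main__":
    print([score for _, score in keep_top_n(N_BEST, n=2)])  # [0.95, 0.02]
    print([score for _, score in keep_above_threshold(N_BEST, threshold=0.5)])  # [0.95]
```

With these assumed parameters, the rank-based filter keeps the 0.02 candidate because it falls within the top two, while the threshold filter ignores it because its CLV is below 0.5. The threshold filter therefore discards candidates based on their absolute confidence rather than their relative position in the list.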


REGARDING CLAIM 15, GRIZZEL in view of HSU and JUNQUA discloses the method according to claim 14.
GRIZZEL further discloses the method/system further comprising the step of providing an indication to the user about the matching of the voice command with the stored command information and/or about the matching of the voice command information block with the stored control information and/or about the control command to be transmitted to the control unit (Par 110 – “Further, the server(s) 120 may execute certain commands, such as answering or responding to spoken utterances of a user of the wearable device 110 and/or answering or responding to certain movements of the user.”; Par 114 – “For example, a device may send audio data corresponding to an utterance of “play music” and the server may respond with prompt audio data corresponding to “what music shall I play,” to which the device may responds “play my favorite playlist.” Each of the data exchanges in that dialog may include a speech-session identifier so the various components can track the speech-session across the dialog exchanges.”; Par 139 – “Alternatively or in addition, the server 120 may send prompt data to the device 110 prompting the user to confirm whether a wakeword was intended (for example “did you intend to speak a command? Please nod if yes.” or the like).”), wherein the indication to the user comprises an audible indication, a visual indication, a haptic feedback or a combination thereof (Par 122 – “In response to determining the signal quality metric is below the threshold, the wearable device 110 may output a notification that the audio quality is insufficient and/or that a physical wake gesture is requested. The notification may include at least one of an audible signal or audible notification output (such as TTS prompt) via speakers, a light emitting diode (LED) emitting light, vibration pattern (output through a haptic component of the device), or other appropriate notification.”).

REGARDING CLAIM 16, GRIZZEL in view of HSU and JUNQUA discloses the method according to claim 14.
GRIZZEL further discloses the method/system, wherein the command processing unit is electrically connected to the voice processing unit (Fig. 19 Device 110; Bus 1924, Wake Command Detection Component 220, ASR Component 250, NLU Component 260, Command Processor 290, Controllers/Processors 1904 …; Par 157 – “Each component within a device (110/120) may also be directly connected to other components in addition to (or instead of) being connected to other components across the bus (1924/2024).”; Par 158 – “For example, the device 110 may be a headset worn by a user. … The device 110 (using microphone 103, wake command detection component 220, ASR component 250, etc.) may be configured to determine audio data corresponding to detected audio data. The device 110 (using input/output device interfaces 1902, antenna 1914, etc.) may also be configured to transmit the audio data to server 120 for further processing or to process the data using internal components such as a wake command detection component 220.”).

Claim 17 is similar to claim 16; thus, it is rejected under the same rationale.

REGARDING CLAIM 19, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device of claim 13, wherein the keyword training mode results in an increased CLV of the possible command provided in the voice signal (JUNQUA Col 2:25-34 – “One example is the computer based teaching system that guides children or foreign speakers in the correct pronunciation of new words within the language.”; Col 2:35-44 – “In a language teaching system the confidence measure can also be used to query the user on words that are not confidently recognized. The teaching system may include a speech playback system containing speech data representing prerecorded speech. This data can supply proper pronunciation of words as part of the query, thereby seeking user verification of a potentially misunderstood word, while at the same time pronouncing the word correctly for the user to hear.”; Col 1:8-22 – “In this way, the system automatically adapts to the user quite rapidly, increasing the recognizer's chance of having good recognition performance, without adapting to incorrect pronunciations. The system is thus useful with difficult speakers, such as children or foreign speakers.”; in other words, JUNQUA teaches that pronouncing a word properly increases the speech recognition confidence value. Thus, teaching a user the correct pronunciation will increase the confidence value for recognizing the user’s utterance.).


Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over GRIZZEL (US 2021/0035552 A1) in view of HSU (US 2016/0267806 A1) and JUNQUA (US 6,253,181 B1), and in further view of MIGLIETTA (US 2011/0224981 A1).

REGARDING CLAIM 9, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1.
GRIZZEL further discloses updating the classifier/models (Par 152); thus, GRIZZEL implicitly suggests that new, additional command information for the speech/NLP models is obtained to update the models. Although GRIZZEL implicitly suggests the limitations, the Examiner relies on MIGLIETTA for clarity of the rejection.
MIGLIETTA explicitly discloses a method/system for speech recognition models, wherein the voice processing unit comprises storage space for storing additional command information (MIGLIETTA Par 43 – “Databases include User Databases which are unique to a specific User, including the User Profile and can include User specific pre-programmed System responses to Commands for specifying spoken text to the System Transaction Manager to facilitate Directed Dictation for a specific User or group of Users; …”) different from the predetermined command information (MIGLIETTA Par 43 – “a Universal Database which is associated with Constrained Recognition and/or Structured Transcription such as a dictionary or vocabulary and/or Directed Dictation containing generic or specific prompts, templates, and the like to facilitate Free Form Dictation available for all Users, or a specific group of Users of the System;”) and/or wherein the command processing unit comprises storage space for storing additional control information (MIGLIETTA Par 119 – “In another embodiment, the System can be configured such that all Speech is recognized within a Dictation Context. In accordance with this aspect, Directed Dictation and/or Constrained Recognition and/or Structured Transcription can be preset in Response to a specific set of Commands which are usually Non-Audio. The Dictation Context includes elements from the legacy User protocol, the User Interface Device, User Profile associated with the recognition of the spoken text, including vocabulary, User Enrollment, spoken language, preferred Automated Speech Recognition and/or Transcription Engine (ASR), Correctionist, Correctionist Pool, and the like.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU and JUNQUA to include storing additional new data for models, as taught by MIGLIETTA.
One of ordinary skill would have been motivated to include storing additional new data for models, in order to provide new or updated models tailored to specific users.

Claims 10 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over GRIZZEL (US 2021/0035552 A1) in view of HSU (US 2016/0267806 A1) and JUNQUA (US 6,253,181 B1), and in further view of LOBNER (WO 2019/051349 A1).

REGARDING CLAIM 10, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1, wherein the personal protective device is a [hearing protection] device (GRIZZEL Figs. 7A-7B; HSU Figs. 3A-3B) with local voice recognition (GRIZZEL Fig. 19 – “ASR Component 250”; HSU Par 66 – “The processor recognizes a first audio command received via the microphone, …”) further comprising two earmuffs or ear plugs (GRIZZEL Fig. 7A Inner-Lobe Insert 750; Fig. 7B) and noise reduction means (GRIZZEL Par 48 – “The AFE 256 may reduce noise in the audio data 111 and divide the digitized audio data 111 into frames representing time intervals for which the AFE 256 determines a number of values (i.e., features) representing qualities of the audio data 111, along with a set of those values (i.e., a feature vector or audio feature vector) representing features/qualities of the audio data 111 within each frame.”).
GRIZZEL in view of HSU and JUNQUA does not explicitly teach the [square-bracketed] limitation. In other words, GRIZZEL discloses wearable devices including earbuds (Fig. 7A), a headset (Fig. 7B), and glasses (Fig. 7C), but does not explicitly teach that the wearable devices are [hearing protective] devices. Although one of ordinary skill in the art would recognize that the wearable devices of GRIZZEL can function as protective devices (e.g., the earbuds/headset of GRIZZEL can protect the user from loud environmental noise), the Examiner relies on LOBNER for clarity of the rejection.

LOBNER discloses a method/system for digital configuration and security of safety equipment, wherein the personal [protective] device is a [hearing protection] device (LOBNER Fig. 3; Par 214 – “System 300 may include head top 326 and hearing protector 328, in accordance with this disclosure.”) comprising two earmuffs or ear plugs (LOBNER Par 213 – “Hearing protector 328 may include two separate ear muff cups 336, one of which is visible in FIG. 3 and the other on the opposite side of the user's head and similarly configured to the visible ear muff cup in FIG. 3.”) and noise reduction means (LOBNER Par 64 – “For example, if a change in blower speed of a PAPR is detected, management engine 324 may change a noise-cancellation and/or volume level in a hearing protector that is assigned to the worker having the PAPR.”; Par 214 – “Head top 326 (or other headworn device, such as a head band) may include hearing protector 328 that includes, ear muff attachment assembly 330.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU and JUNQUA to include a hearing protective device, as taught by LOBNER.
One of ordinary skill would have been motivated to include a hearing protective device, in order to assist a user performing a job in a noisy environment.


REGARDING CLAIM 12, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device according to claim 1, wherein the personal protective device is a [respiratory] device (GRIZZEL Figs. 7A-7B; HSU Figs. 3A-3B) with local voice recognition (GRIZZEL Fig. 19 – “ASR Component 250”; HSU Par 66 – “The processor recognizes a first audio command received via the microphone, …”) [further comprising filter cartridges for filtering inhaled air].
GRIZZEL in view of HSU and JUNQUA does not explicitly teach the [square-bracketed] limitations.
LOBNER discloses a method/system for digital configuration and security of safety equipment, wherein the personal protective device is a [respiratory] device (LOBNER Par 26 – “As further described herein, each of respirators 13 includes embedded sensors or monitoring devices and processing electronics configured to capture data in real-time as a user (e.g., worker) engages in activities while wearing the respirators.”) [further comprising filter cartridges for filtering inhaled air] (LOBNER Par 26 – “For example, as described in greater detail herein, respirators 13 may include a number of components (e.g., a head top, a blower, a filter, and the like) respirators 13 may include a number of sensors for sensing or controlling the operation of such components.”; Par 110 – “Respirator 13A may include a filter to remove particulates but not organic vapors. Data hub 14A may be initially configured with and store a unique identifier of worker 10A.”; Par 114 – “number of times or frequency with which a visor of respirator 13 has been opened or closed, a filter/cartridge consumption rate, …”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU and JUNQUA to include a respiratory device, as taught by LOBNER.
One of ordinary skill would have been motivated to include a respiratory device, in order to assist a user performing a job in an air-polluted environment.


Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over GRIZZEL (US 2021/0035552 A1) in view of HSU (US 2016/0267806 A1) and JUNQUA (US 6,253,181 B1), and in further view of ALEKSIC (US 2014/0337032 A1).

REGARDING CLAIM 18, GRIZZEL in view of HSU and JUNQUA discloses the personal protective device of claim 1.
GRIZZEL in view of HSU and JUNQUA does not explicitly teach an autarkic mode.
ALEKSIC discloses a method/system for speech recognition, wherein the personal protective device is configured to operate in an autarkic mode when no WAN or internet connection is available to the personal protective device (ALEKSIC Par 20 – “In some implementations, the limited speech recognizer 110 and/or the expanded speech recognizer 120 may be hosted on the mobile device 104. In some implementations, the limited speech recognizer 110 and/or the expanded speech recognizer 120 may be hosted on by one or more server(s) that is/are remote from mobile device 104… In yet another example, the device 104 may have sufficient computing ability to host the limited speech recognizer 110 and the expanded speech recognizer locally, e.g., to provide substantially full multi-recognizer capabilities in an offline mode when the network 106 is unavailable or unwanted for use.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method/system of GRIZZEL in view of HSU and JUNQUA to include an autarkic mode, as taught by ALEKSIC.
One of ordinary skill would have been motivated to include an autarkic mode, in order to provide full speech recognition capability seamlessly, even when the network is unavailable.


Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN C KIM whose telephone number is (571)272-3327. The examiner can normally be reached Monday to Friday 8:00 AM thru 4:00 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew C Flanders can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JONATHAN C KIM/Primary Examiner, Art Unit 2655
Prosecution Timeline

May 10, 2023
Application Filed
Jul 15, 2025
Non-Final Rejection mailed — §103
Oct 07, 2025
Applicant Interview (Telephonic)
Oct 07, 2025
Examiner Interview Summary
Oct 13, 2025
Response Filed
Dec 11, 2025
Final Rejection mailed — §103
Feb 11, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/052,698
Patent 12640148
Network Microphone Device With Command Keyword Conditioning
3y 6m to grant (granted May 26, 2026)
18/228,411
Patent 12640150
VOICE-BASED CHATBOT POLICY OVERRIDE(S) FOR EXISTING VOICE-BASED CHATBOT(S)
2y 10m to grant (granted May 26, 2026)
17/765,606
Patent 12614547
METHOD FOR CONTROLLING UTTERANCE OF UTTERANCE DEVICE, SERVER CONTROLLING UTTERANCE OF UTTERANCE DEVICE, UTTERANCE DEVICE, AND PROGRAM
2y 3m to grant (granted Apr 28, 2026)
17/444,683
Patent 12609108
ADAPTIVE SELF-TRAINED COMPUTER ENGINES WITH ASSOCIATED DATABASES AND METHODS OF USE THEREOF
4y 8m to grant (granted Apr 21, 2026)
18/188,223
Patent 12573391
Generating Contextual Responses for Out-of-coverage Requests for Assistant Systems
2y 11m to grant (granted Mar 10, 2026)
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 74%
Grant Probability with Interview: 99% (+40.4%)
Median Time to Grant: 2y 5m (~0m remaining)
PTA Risk: Moderate
Based on 360 resolved cases by this examiner. Grant probability derived from career allowance rate.