DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the non-final Office action dated 02/19/2025, applicant has amended claims 1, 8, 11, and 18. Claims 1-20 are currently pending in the application.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-4 and 6-10 are rejected under 35 U.S.C. 103 as being unpatentable over Winton et al. (US Pub No. 20190206423) in view of Cyr et al. (US Pub No. 20220036878).
Regarding claim 1, Winton teaches a method comprising: obtaining biometric data for a user (¶ [0045], biometric data 314) of an audio system (Fig 3, biometric personalization system 300), from a sensor array (Fig 3, sensors 120); determining a communication energy score of the user from the biometric data for the user (Fig 6, step 603); selecting a profile from a plurality of stored profiles based on the communication energy score (Fig 6, step 607), the selected profile including one or more acoustic parameters associated with the communication energy score (Fig 6 & ¶ [0029], the acoustic profile includes acoustic parameters employed when reproducing audio; step 607 is chosen based on the mood determination of step 603); modifying audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the selected profile (Fig 5, personalized output audio signal 252); and presenting the modified audio content to the user via the one or more transducers of the audio system (¶ [0066], audio output device 250 uses one or more amplifiers and/or one or more speakers to produce sound output corresponding to personalized output audio signal 252).
Winton does not explicitly teach biometric data including the voice of a user.
Cyr teaches biometric data including audio including a voice of the user (See Cyr ¶ [0067], speech assessment system 218 and voice of user 104).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the user voice biometric data taught by Cyr with the method taught by Winton. Voice biometrics are well known in the art and widely used in the audio field to provide users with hands-free accessibility and biometric security, allowing for an improved user experience.
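For illustration only, the following sketch restates the claimed flow mapped above (biometric data, communication energy score, profile selection, audio modification, presentation) as executable logic. All names, the score formula, and the profile values are hypothetical assumptions of this summary and are not drawn from Winton or Cyr.

```python
# Hypothetical sketch of the claim 1 flow; values and formulas are illustrative only.

def energy_score(voice_level: float, heart_rate: float) -> float:
    """Toy communication-energy score from two biometric inputs (assumed formula)."""
    return 0.5 * voice_level + 0.5 * (heart_rate / 200.0)

def select_profile(score: float) -> dict:
    """Pick stored acoustic parameters keyed to ascending score bands."""
    profiles = {
        "calm":      {"max_score": 0.4, "gain_db": -3.0, "eq": "soft"},
        "neutral":   {"max_score": 0.7, "gain_db": 0.0, "eq": "flat"},
        "energetic": {"max_score": 1.0, "gain_db": 3.0, "eq": "bright"},
    }
    for p in profiles.values():
        if score <= p["max_score"]:
            return p
    return profiles["energetic"]

def modify_audio(samples: list[float], params: dict) -> list[float]:
    """Apply the selected profile's gain to the audio before presentation."""
    gain = 10 ** (params["gain_db"] / 20.0)
    return [s * gain for s in samples]

profile = select_profile(energy_score(voice_level=0.8, heart_rate=90.0))
output = modify_audio([0.1, -0.2, 0.05], profile)  # then sent to the transducers
```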
Regarding claim 2, Winton in view of Cyr teaches the method of claim 1, wherein obtaining biometric data for the user of the audio system comprises: capturing audio including the voice of the user (See Cyr ¶ [0067], speech assessment system 218 and voice of user 104) from the sensor array; and obtaining physiological data of the user (¶ [0031], physiological biometric data) from one or more physiological sensors (Fig 3, sensors 120).
Regarding claim 3, Winton in view of Cyr teaches the method of claim 2, wherein the physiological data includes one or more selected from a group consisting of: a heart rate, a blood pressure, a heart rate variability, an electrocardiogram, a temperature, and any combination thereof (¶ [0031], physiological biometric data).
Regarding claim 4, Winton in view of Cyr teaches the method of claim 2, wherein a physiological sensor is included in a device external to the audio system (¶ [0031], sensors 120 may include one or more devices).
Regarding claim 6, Winton in view of Cyr teaches the method of claim 1, wherein modifying audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile comprises: selecting alternative audio to present to the user in response to determining the communication energy score of the user changing from a value to an alternative value within a threshold amount of time of audio being presented to the user (¶ [0059], continually retrieve a pre-defined acoustic profile and dynamically modify the operation of dynamic equalizer 510).
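A minimal sketch of the claim 6 condition follows, assuming hypothetical names and threshold values; none of these come from the cited references.

```python
# Illustrative check: did the communication energy score change values within a
# threshold amount of time of audio being presented to the user?

def should_switch_audio(score_log: list[tuple[float, float]],
                        playback_start: float,
                        time_threshold: float,
                        score_delta: float) -> bool:
    """score_log holds (timestamp, score) pairs recorded while audio plays."""
    recent = [s for t, s in score_log if 0.0 <= t - playback_start <= time_threshold]
    return len(recent) >= 2 and abs(recent[-1] - recent[0]) >= score_delta

# Score moved by 0.4 within 5 s of playback start -> select alternative audio.
log = [(0.0, 0.2), (3.0, 0.6)]
print(should_switch_audio(log, playback_start=0.0, time_threshold=5.0, score_delta=0.3))
```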
Regarding claim 7, Winton in view of Cyr teaches the method of claim 1, wherein modifying audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile comprises: prompting the user to select a set of acoustic parameters in response to a difference between the average communication energy score and the extended communication energy score exceeding a threshold (Fig 6, step 601, receive user parameters); and modifying the audio content based on the selected set of acoustic parameters (¶ [0066], audio output device 250 uses one or more amplifiers and/or one or more speakers to produce sound output corresponding to personalized output audio signal 252).
Winton does not explicitly teach determining an average communication energy score for the user; determining an extended communication energy score for the user during a time interval.
Cyr teaches determining an average communication energy score for the user (See Cyr Fig 9B, step 916 weighted speech score); determining an extended communication energy score for the user during a time interval (See Cyr Fig 9B, step 916 historical speech score).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the energy scores taught by Cyr with the method taught by Winton in view of Cyr. Doing so provides known values for comparison, allowing for improved customization and user accessibility.
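For illustration, a hypothetical sketch of the claim 7 comparison between an average (recent) score and an extended (historical) score; the window and threshold are assumed values, not drawn from Winton or Cyr.

```python
# Illustrative threshold test: prompt the user when the recent average score
# diverges from the extended (full-history) score by more than a threshold.

def needs_user_prompt(scores: list[float], window: int, threshold: float) -> bool:
    """Compare the running average over a recent window against the full history."""
    average = sum(scores[-window:]) / min(window, len(scores))  # recent interval
    extended = sum(scores) / len(scores)                        # extended history
    return abs(average - extended) > threshold

scores = [0.3, 0.3, 0.4, 0.8, 0.9]
if needs_user_prompt(scores, window=2, threshold=0.3):
    print("Prompt user to select a set of acoustic parameters")
```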
Regarding claim 8, Winton in view of Cyr teaches the method of claim 1.
Winton does not explicitly teach enhancing audio from one or more sound sources captured by a sensor array of the audio system relative to audio from other sound sources captured by the sensor array.
Cyr teaches enhancing audio from one or more sound sources captured by a sensor array of the audio system relative to audio from other sound sources captured by the sensor array (See Cyr ¶ [0049], processor 112A may process the signal generated by microphone 210 to enhance, amplify, or cancel-out particular channels within incoming sound).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the sound enhancing taught by Cyr with the method taught by Winton. Sound enhancement is well known in the art and widely used in applications including hearing aids and voice calling, allowing for an improved user experience.
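For illustration only, a per-source gain sketch of the claim 8 limitation; array geometry and source separation are abstracted away, and all names and gain values are assumed rather than taken from Cyr.

```python
# Illustrative relative enhancement: boost one captured source while
# attenuating the other sources captured by the array.

def enhance_source(sources: dict[str, list[float]],
                   target: str, boost: float, cut: float) -> list[float]:
    """Boost the target source's channel and attenuate the remaining channels."""
    n = len(next(iter(sources.values())))
    mix = [0.0] * n
    for name, channel in sources.items():
        gain = boost if name == target else cut
        for i, sample in enumerate(channel):
            mix[i] += gain * sample
    return mix

captured = {"talker": [0.2, 0.3], "background": [0.1, -0.1]}
print(enhance_source(captured, target="talker", boost=2.0, cut=0.25))
```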
Regarding claim 9, Winton in view of Cyr teaches the method of claim 1.
Winton does not explicitly teach removing audio having one or more characteristics specified by the one or more acoustic parameters.
Cyr teaches removing audio having one or more characteristics specified by the one or more acoustic parameters (See Cyr ¶ [0049], processor 112A may process the signal generated by microphone 210 to enhance, amplify, or cancel-out particular channels within incoming sound).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the sound enhancing taught by Cyr with the method taught by Winton. Sound enhancement is well known in the art and widely used in applications including hearing aids and voice calling, allowing for an improved user experience.
Regarding claim 10, Winton in view of Cyr teaches the method of claim 1, wherein modifying audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile comprises: selecting audio content having characteristics specified by the one or more acoustic parameters included in the profile (Fig 5, selected acoustic profile 242).
Claims 11-14 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Winton et al. (US Pub No. 20190206423) in view of Cyr et al. (US Pub No. 20220036878) as applied to the claims above, and further in view of Osterhout et al. (US Pub No. 20210173480).
Regarding claim 11, Winton teaches a position sensor configured to generate data indicating a position of the headset in a local area (¶ [0032], position sensors); and an audio system including a transducer array configured to present audio (¶ [0066], audio output device 250 uses one or more amplifiers and/or one or more speakers to produce sound output corresponding to personalized output audio signal 252), a sensor array configured to capture audio from a local area including the headset (¶ [0033], microphone), and an audio controller (Fig 1, computing device 110), the audio controller including a processor and a non-transitory computer readable storage medium having stored instructions (Fig 1, processing unit 112 and memory 114) that, when executed by the processor, cause the audio system to: obtain biometric data for a user of the audio system (¶ [0045], biometric data 314), from the sensor array; determine a communication energy score of the user from the biometric data for the user (Fig 6, step 603); determine a profile corresponding to the communication energy score (Fig 6, step 607), the profile including one or more acoustic parameters associated with the communication energy score (Fig 6 & ¶ [0029], the acoustic profile includes acoustic parameters employed when reproducing audio; step 607 is chosen based on the mood determination of step 603); modify audio content for presentation by one or more transducers of the transducer array based on the one or more acoustic parameters included in the profile (Fig 5, personalized output audio signal 252); and present the modified audio content to a user via the one or more transducers of the transducer array (¶ [0066], audio output device 250 uses one or more amplifiers and/or one or more speakers to produce sound output corresponding to personalized output audio signal 252).
Winton does not explicitly teach a headset comprising: a frame; one or more display elements coupled to the frame, each display element configured to generate image light and biometric data including the voice of a user.
Cyr teaches biometric data including audio including a voice of the user (See Cyr ¶ [0067], speech assessment system 218 and voice of user 104).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the user voice biometric data taught by Cyr with the device taught by Winton. Voice biometrics are well known in the art and widely used in the audio field for providing users hands-free accessibility and biometric security allowing for an improved user experience.
Winton in view of Cyr does not explicitly teach a headset comprising: a frame; one or more display elements coupled to the frame, each display element configured to generate image light.
Osterhout teaches a headset (See Osterhout Fig 1, eyepiece 100) comprising: a frame (See Osterhout Fig 1, frame 102); one or more display elements coupled to the frame (See Osterhout Fig 1, projector 108), each display element configured to generate image light (See Osterhout ¶ [0225], “projector 200 may be an RGB projector”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the headset taught by Osterhout with the device taught by Winton in view of Cyr. Both Winton and Cyr teach systems that can be implemented on a wireless device such as that taught by Osterhout. The augmented reality glasses taught by Osterhout provide several advantages, including portability, ease of use, and compact design, allowing for an improved user experience.
Regarding claim 12, Winton in view of Cyr and Osterhout teaches the headset of claim 11, wherein the stored instructions to obtain biometric data for the user of the audio system further comprise stored instructions that, when executed, cause the audio system to: capture audio including the voice of the user (See Cyr ¶ [0067], speech assessment system 218 and voice of user 104) from the sensor array; and obtain physiological data of the user (¶ [0031], physiological biometric data) from one or more physiological sensors (Fig 3, sensors 120).
Regarding claim 13, Winton in view of Cyr and Osterhout teaches the headset of claim 12, wherein the physiological data includes one or more selected from a group consisting of: a heart rate, a blood pressure, a heart rate variability, an electrocardiogram, a temperature, and any combination thereof (¶ [0031], physiological biometric data).
Regarding claim 14, Winton in view of Cyr and Osterhout teaches the headset of claim 12, wherein a physiological sensor is included in a device external to the headset (¶ [0031], sensors 120 may include one or more devices).
Regarding claim 16, Winton in view of Cyr and Osterhout teaches the headset of claim 11, wherein the stored instructions to modify audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile further comprise stored instructions that, when executed, cause the audio system to: select alternative audio to present to the user in response to determining the communication energy score of the user changing from a value to an alternative value within a threshold amount of time of audio being presented to the user (¶ [0059], continually retrieve a pre-defined acoustic profile and dynamically modify the operation of dynamic equalizer 510).
Regarding claim 17, Winton in view of Cyr and Osterhout teaches the headset of claim 11, wherein the stored instructions to modify audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile further comprise stored instructions that, when executed, cause the audio system to: prompt the user to select a set of acoustic parameters in response to a difference between the average communication energy score and the extended communication energy score exceeding a threshold (Fig 6, step 601, receive user parameters); and modify the audio content based on the selected set of acoustic parameters (¶ [0066], audio output device 250 uses one or more amplifiers and/or one or more speakers to produce sound output corresponding to personalized output audio signal 252).
Winton does not explicitly teach determining an average communication energy score for the user; determining an extended communication energy score for the user during a time interval.
Cyr teaches determining an average communication energy score for the user (See Cyr Fig 9B, step 916 weighted speech score); determining an extended communication energy score for the user during a time interval (See Cyr Fig 9B, step 916 historical speech score).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the energy scores taught by Cyr with the headset taught by Winton in view of Cyr and Osterhout. Doing so provides known values for comparison, allowing for improved customization and user accessibility.
Regarding claim 18, Winton in view of Cyr and Osterhout teaches the headset of claim 11.
Winton does not explicitly teach enhancing audio from one or more sound sources captured by a sensor array of the audio system relative to audio from other sound sources captured by the sensor array.
Cyr teaches enhancing audio from one or more sound sources captured by a sensor array of the audio system relative to audio from other sound sources captured by the sensor array (See Cyr ¶ [0049], processor 112A may process the signal generated by microphone 210 to enhance, amplify, or cancel-out particular channels within incoming sound).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the sound enhancing taught by Cyr with the device taught by Winton in view of Cyr and Osterhout. Sound enhancement is well known in the art and widely used in applications including hearing aids and voice calling, allowing for an improved user experience.
Regarding claim 19, Winton in view of Cyr and Osterhout teaches the headset of claim 11.
Winton does not explicitly teach removing audio having one or more characteristics specified by the one or more acoustic parameters.
Cyr teaches removing audio having one or more characteristics specified by the one or more acoustic parameters (See Cyr ¶ [0049], processor 112A may process the signal generated by microphone 210 to enhance, amplify, or cancel-out particular channels within incoming sound).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the sound enhancing taught by Cyr with the device taught by Winton in view of Cyr and Osterhout. Sound enhancement is well known in the art and widely used in applications including hearing aids and voice calling, allowing for an improved user experience.
Regarding claim 20, Winton in view of Cyr and Osterhout teaches the headset of claim 11, wherein the stored instructions to modify audio content for presentation by one or more transducers of the audio system based on the one or more acoustic parameters included in the profile further comprise stored instructions that, when executed, cause the audio system to: select audio content having characteristics specified by the one or more acoustic parameters included in the profile (Fig 5, selected acoustic profile 242).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Winton et al. (US Pub No. 20190206423) in view of Cyr et al. (US Pub No. 20220036878) as applied to the claims above, and further in view of Audhkhasi et al. (US Pub No. 20200251096).
Regarding claim 5, Winton in view of Cyr teaches the method of claim 1.
Winton does not explicitly teach applying a trained model to the biometric data.
Cyr teaches applying a trained model to the biometric data for the user (See Cyr ¶ [0126], applying machine learning model to generate score).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the trained model taught by Cyr with the method taught by Winton in view of Cyr. Trained models provide several advantages, including increased accuracy and lower computational costs, allowing for higher performance and accelerated AI development.
Winton in view of Cyr does not explicitly teach the speech recognition model being trained by: generating a set of training examples, each training example including biometric data and having a label indicating a communication energy score for a training example; applying the model to each training example of the set to generate an output for the training example; scoring the output for the training example using a loss function and the label of the training example; and updating one or more parameters of the model by backpropagation based on the scoring.
Audhkhasi teaches a speech recognition model trained by: generating a set of training examples (See Audhkhasi Fig 2, automatic speech recognition (ASR) model & ¶ [0023], utilizes acoustic-to-word (A2W) and speech acoustic features to set a baseline), each training example including biometric data (See Audhkhasi ¶ [0023], “speech acoustic features”; speech is biometric data) and having a label indicating a communication energy score for a training example (See Audhkhasi Fig 2, A2W word embeddings 106, which include scores and labels); applying the model to each training example of the set to generate an output for the training example (See Audhkhasi Fig 2, recognized words, the output of the training examples); scoring the output for the training example using a loss function and the label of the training example (See Audhkhasi ¶ [0023], the “connectionist temporal classification (CTC)” loss function 122 shown in Fig 2 is used for comparing/correlating the predictions of the A2W network with the correct word sequence); and updating one or more parameters of the model by backpropagation based on the scoring (See Audhkhasi ¶ [0023], “Backpropagation is used to update the entire network's weights in order to minimize this loss function”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the speech recognition training method taught by Audhkhasi with the speech recognition model taught by Winton in view of Cyr. The training method taught by Audhkhasi not only has the benefit of a trained speech recognition model built on “in-vocabulary” words but is also capable of adding “out-of-vocabulary” words without further training the model, as stated in Audhkhasi ¶ [0002].
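For illustration only, a minimal PyTorch sketch of the generic training loop the claim recites: a set of examples of biometric features, each labeled with a communication energy score; the model applied to each example; the output scored with a loss function and the label; and parameters updated by backpropagation. This is not the A2W/CTC network of Audhkhasi; the model, data, and hyperparameters are hypothetical.

```python
import torch
from torch import nn

features = torch.rand(32, 4)   # 32 training examples of biometric data (assumed shape)
labels = torch.rand(32, 1)     # label: communication energy score per example

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()         # scores the model output against the label
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

for epoch in range(100):
    output = model(features)         # apply the model to each training example
    loss = loss_fn(output, labels)   # score the output using loss function + label
    optimizer.zero_grad()
    loss.backward()                  # backpropagation
    optimizer.step()                 # update the model's parameters
```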
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Winton et al. (US Pub No. 20190206423) in view of Cyr et al. (US Pub No. 20220036878) and Osterhout et al. (US Pub No. 20210173480) as applied to the claims above, and further in view of Audhkhasi et al. (US Pub No. 20200251096).
Regarding claim 15, Winton in view of Cyr and Osterhout teaches the headset of claim 11.
Winton does not explicitly teach applying a trained model to the biometric data.
Cyr teaches applying a trained model to the biometric data for the user (See Cyr ¶ [0126], applying machine learning model to generate score).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the trained model taught by Cyr with the headset taught by Winton in view of Cyr and Osterhout. Trained models provide several advantages, including increased accuracy and lower computational costs, allowing for higher performance and accelerated AI development.
Winton in view of Cyr and Osterhout does not explicitly teach the speech recognition model being trained by: generating a set of training examples, each training example including biometric data and having a label indicating a communication energy score for a training example; applying the model to each training example of the set to generate an output for the training example; scoring the output for the training example using a loss function and the label of the training example; and updating one or more parameters of the model by backpropagation based on the scoring.
Audhkhasi teaches a speech recognition model trained by: generating a set of training examples (See Audhkhasi Fig 2, automatic speech recognition (ASR) model & ¶ [0023], utilizes acoustic-to-word (A2W) and speech acoustic features to set a baseline), each training example including biometric data (See Audhkhasi ¶ [0023], “speech acoustic features”; speech is biometric data) and having a label indicating a communication energy score for a training example (See Audhkhasi Fig 2, A2W word embeddings 106, which include scores and labels); applying the model to each training example of the set to generate an output for the training example (See Audhkhasi Fig 2, recognized words, the output of the training examples); scoring the output for the training example using a loss function and the label of the training example (See Audhkhasi ¶ [0023], the “connectionist temporal classification (CTC)” loss function 122 shown in Fig 2 is used for comparing/correlating the predictions of the A2W network with the correct word sequence); and updating one or more parameters of the model by backpropagation based on the scoring (See Audhkhasi ¶ [0023], “Backpropagation is used to update the entire network's weights in order to minimize this loss function”).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the speech recognition training method taught by Audhkhasi with the speech recognition model taught by Winton in view of Cyr and Osterhout. The training method taught by Audhkhasi not only has the benefit of a trained speech recognition model built on “in-vocabulary” words but is also capable of adding “out-of-vocabulary” words without further training the model, as stated in Audhkhasi ¶ [0002].
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Chong et al (US Pub No. 20160371372) teaches music recommendations based on biometric and motion sensors.
Kariman (US Pub No. 20180032612) teaches audio-aided data collection and retrieval using audio biometrics.
Trim et al (US Pub No. 20210011684) teaches dynamic augmented reality interface creation using an ambiguity level determined from a detected user utterance.
Jung et al (US Pub No. 20240403358) teaches a device utilizing artificial intelligence to provide user-personalized content based on biometric information.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TYLER LIEBGOTT whose telephone number is (703)756-1818. The examiner can normally be reached Mon-Fri 10-6:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fan Tsang can be reached at (571)272-7547. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T.M.L./Examiner, Art Unit 2694
/FAN S TSANG/Supervisory Patent Examiner, Art Unit 2694