Prosecution Insights
Last updated: April 19, 2026
Application No. 18/721,972

VOICE ASSISTANT OPTIMIZATION DEPENDENT ON VEHICLE OCCUPANCY

Non-Final OA (§103, §112)
Filed: Jun 20, 2024
Examiner: AGAHI, DARIOUSH
Art Unit: 2656
Tech Center: 2600 — Communications
Assignee: Cerence Operating Company
OA Round: 1 (Non-Final)
Grant Probability: 86% (Favorable)
Predicted OA Rounds: 1-2
Time to Grant: 2y 9m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 86% (142 granted / 166 resolved; +23.5% vs TC avg; above average)
Interview Lift: +29.0% among resolved cases with interview (strong)
Avg Prosecution: 2y 9m typical; 27 applications currently pending
Total Applications: 193 across all art units

Statute-Specific Performance

§101: 25.8% (-14.2% vs TC avg)
§103: 47.8% (+7.8% vs TC avg)
§102: 10.0% (-30.0% vs TC avg)
§112: 12.6% (-27.4% vs TC avg)
Tech Center average values are estimates. Based on career data from 166 resolved cases.
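As a sanity check, the headline allowance figure follows directly from the reported career counts (142 grants out of 166 resolved cases), and the Tech Center average implied by the reported spread can be backed out the same way. A minimal check, assuming the "+23.5% vs TC avg" spread is stated in percentage points:

```python
# Career allowance rate from the reported counts (142 granted / 166 resolved).
granted, resolved = 142, 166
allow_rate = granted / resolved * 100
print(f"{allow_rate:.1f}%")  # 85.5%, shown rounded as 86% on the dashboard

# Implied Tech Center average, assuming the +23.5% spread is percentage points.
tc_avg = allow_rate - 23.5
print(f"{tc_avg:.1f}%")  # roughly 62% TC average
```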

Office Action

§103, §112
DETAILED ACTION

This Office action is in response to Applicant’s submission filed on 6/20/2024. Claims 1-20 are pending in the application, of which claims 1, 7, and 15 are independent, and all have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365 is acknowledged. The prior-filed application (Provisional Application No. 63/293266, filed on 12/23/2021) is acknowledged.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 6/27/2024 has been considered by the examiner.

Specification

The specification is objected to because of the following informalities: Par. 0008, line 6, and Par. 0053, line 5, recite “… occupant specific factors …”; “occupant specific” should be hyphenated as “occupant-specific”. Appropriate correction is required.

Claim Objections

The listed claims are objected to for the informalities shown and may be addressed with the suggested amendments:

Claim 1, lines 7-8 recite: “… the at least one audio signal, …”
Claims 6, 14, and 17, line 2 recite: “… the probability exceeding the classification threshold.”
Claim 15, line 2 recites: “… the [[system]] method comprising:”
Claim 15, line 9 recites: “… occupants and [[occupant specific]] occupant-specific factors associated with the one of the vehicles …”

Applicant is advised to review all claims for any potential claim objection issues.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 1, and claims 2-6 which depend therefrom, are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 1, line 10, recites “… probability that the utterance is system …”, which appears to be indefinite since it is not clear which utterance is being referred to. Applicant is advised to review all claims for any potential antecedent basis issues.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US20200219493A1) (herein “Li”), in view of Nagata et al. (US20220212658A1) (herein “Nagata”), and further in view of Secker-Walker et al. (US9818407B1) (herein “Secker-Walker”).

Regarding claims 1 and 15, Li teaches [A vehicle system for classifying spoken utterance within a vehicle cabin as one of system-directed and non-system directed, the system comprising: - claim 1], and [A method for classifying spoken utterance as one of system-directed and non-system directed, the system comprising: - claim 15] (Li, Par. 0017: “… systems and methods of voice control in a multi-talker and multimedia environment, and in particular, multi-zone speech recognition systems with interference and echo cancellation and related methods which may be used in a vehicle or other suitable environment.”, Par. 0043: “… in-car communications system to facilitate communication between occupants of the vehicle.”, and Par. 0059: “In FIG. 1, an application post processor (or processor system) 704 receives processed microphone signals and other inputs from the vehicle 200 to provide information, ... The application post processor 704 comprises an automatic speech recognition (ASR) engine/system, … Since the speech recognition system isolates speech and/or other content delivered in the vehicle 200 a ... Word selections may occur based on a likelihood calculation, confidence level, or confidence score (referred to as a confidence score) that may be preserved in ASR metadata. When the highest likelihood or confidence score exceeds a predetermined or contextual threshold, an alignment system within the ASR may identify the spoken utterance and classify the spoken utterance as correctly recognized.”)

[claim 1 only] at least one microphone configured to detect at least one audio signal from at least one occupant of a vehicle; and (Li, Par. 0066: “… A person in a vehicle containing multiple occupants wishes to initiate an ASR session by uttering a keyword, such as “Hey Siri”, “Alexa” or “Okay Google”. Each occupant is in a separate zone of the cabin. The vehicle cabin contains one or more microphones which may or may not be dedicated for each zone. Each microphone picks up the voice of the occupant that wishes to initiate an ASR session as well as the voices of other occupants, known as “interference speech”. The one or more microphone signals (or audio channels) are available to a wake word detector which determines not only whether/when a keyword was spoken but also from which zone the keyword was spoken.”)

[a processor programmed to: receive the at least one audio signal including at least one acoustic utterance, - claim 1], [receiving at least one utterance from one of the vehicle occupants; - claim 15] (Li, Par. 0019: “… receiving a plurality of microphone signals for each zone in a plurality of zones of an acoustic environment; generating a processed microphone signal for each zone in the plurality of zones of the acoustic environment, … and performing speech recognition on the processed microphone signals.”, and Par. 0103: “The speech recognition modules 2015 convert a spoken command (utterance) to text by interpreting captured (and processed) microphone audio signals to deduce words and sentences.”)

[claim 1 only] determine a probability that the utterance is system directed based at least in part on the utterance and the number of vehicle occupants, (Li, Par. 0059: “… Word selections may occur based on a likelihood calculation, confidence level, or confidence score (referred to as a confidence score) that may be preserved in ASR metadata. When the highest likelihood or confidence score exceeds a predetermined or contextual threshold, an alignment system within the ASR may identify the spoken utterance and classify the spoken utterance as correctly recognized.”, and Par. 0145: “The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: the signal-to-noise ratio (S/N) is impacted by the number of occupants, since the larger the number of occupants, the lower the S/N due to the interference/noise that a larger number of occupants creates. Therefore, the probability that the utterance is system directed corresponds to the confidence score with which the word in the spoken utterance is recognized, which is also related to the number of occupants: the lower the number of occupants, the higher the S/N, and the higher the probability that the utterance is system directed.

[determine a classification threshold based at least in part on the number of vehicle occupants, and – claim 1], [determining a classification threshold based at least in part on the number of vehicle occupants and occupant specific factors associated with the one of the vehicle occupants; and – claim 15] (Li, Par. 0145: “The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: the signal-to-noise ratio (S/N) reads on the classification threshold, since a lower S/N signifies more occupants, due to the environmental noise (non-directed speech) in the vehicle cabin. Also, per the as-filed specification, Par. 0044: “… The thresholds may be classification thresholds used by the multimodal processing system 130 to determine whether an utterance is SD or NSD. This threshold may be based, at least in part, on the number of occupants in the vehicle. In this example, the more occupants, the higher the classification threshold, so as to minimize false accepts by the system when occupants are conversing.”

Li does not teach, however Nagata teaches, [determine a number of vehicle occupants based at least in part on the at least one signal, - claim 1], [receiving at least one signal indicative of a number of vehicle occupants, - claim 15] (Nagata, Par. 0027: “FIG. 1B illustrates occupants 104 (e.g., a driver 104A, a front passenger 104B, a rear passenger's side occupant 104C, and a rear driver's side occupant 104D) within the passenger cabin of the vehicle 102.”, and Par. 0055: “… the vehicle 102 may be driving, with a first occupant 302A in the driver's seat, a second occupant 302B in the front passenger's seat, and a third occupant 302C in a rear seat behind the front passenger's seat.”) Note: the combination of each and every occupant indicates the number of vehicle occupants.

[claim 15 only] identifying the one of the vehicle occupants; (Nagata, Par. 0003: “… one or more sensors of a vehicle configured to detect sensor data associated with an identification of an occupant within the vehicle and a location of the occupant within the vehicle. … one or more vehicle settings based on the identification of the occupant within the vehicle and the location of the occupant within the vehicle.”, and Par. 0023: “… the vehicle 102 is capable of identifying an occupant of the vehicle 102 and adjusting one or more settings based on the identification of the occupant. In some embodiments, the vehicle 102 identifies the occupant when the occupant is within the passenger cabin of the vehicle 102. In some embodiments, the vehicle 102 is capable of identifying the occupants 104A, 104B even as they approach the vehicle 102.”)

Nagata is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li in view of Nagata to determine a number of vehicle occupants and to identify the one of the vehicle occupants. Motivation to do so would be to improve the comfort of the occupants within the vehicle (Nagata, Par. 0020).

Li, as modified above, does not teach, however Secker-Walker teaches, [determining a probability that the at least one utterance is system directed; and – claim 15] (Secker-Walker, Col. 4, ll. 46-51: “… the wakeword detection module 206 may determine a confidence level whose value corresponds to a likelihood that a wakeword is actually present in the speech [utterance]. If the confidence level satisfies a confidence level threshold, it is determined that the wakeword is present [system directed] in the speech [utterance].”) Note: the confidence level reads on determining a probability that the at least one utterance is system directed.

compare the classification threshold to the probability to determine whether the at least one acoustic utterance is one of a system directed utterance and a non-system directed utterance. (Secker-Walker, Col. 4, ll. 46-51: “… the wakeword detection module 206 may determine a confidence level whose value corresponds to a likelihood that a wakeword is actually present in the speech [utterance]. If the confidence level satisfies a confidence level threshold, it is determined that the wakeword is present [system directed] in the speech [utterance].”) Note: wake word detection is in general considered system directed; if the wake word is not detected, it implies a non-system directed utterance.

Secker-Walker is considered to be analogous to the claimed invention because it is in the same field of endeavor. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li, as modified above, further in view of Secker-Walker to determine whether the at least one acoustic utterance is one of a system directed utterance and a non-system directed utterance. Motivation to do so would be to improve human-computer interactions (Secker-Walker, Col. 1, ll. 20-21).

Regarding claim 2, Li, as modified above, teaches the vehicle system of claim 1. Li, as modified above, does not teach, however Nagata further teaches, receive occupant data from at least one sensor, the occupant data indicative of a presence of an occupant. (Nagata, Par. 0003: “… The system includes one or more sensors of a vehicle configured to detect sensor data associated with an identification of an occupant within the vehicle and a location of the occupant within the vehicle.”, and Par. 0024: “… The one or more sensors may include an image sensor configured to detect image data of the occupants 104.
The facial recognition may be performed on the detected image data to identify the occupants 104.”)

Regarding claim 3, Li, as modified above, teaches the vehicle system of claim 2. Li, as modified above, does not teach, however Nagata further teaches, determine the number of occupants based at least in part on the occupant data. (Nagata, Par. 0027: “FIG. 1B illustrates occupants 104 (e.g., a driver 104A, a front passenger 104B, a rear passenger's side occupant 104C, and a rear driver's side occupant 104D) within the passenger cabin of the vehicle 102.”, and Par. 0055: “… the vehicle 102 may be driving, with a first occupant 302A in the driver's seat, a second occupant 302B in the front passenger's seat, and a third occupant 302C in a rear seat behind the front passenger's seat.”) Note: the combination of each and every occupant indicates the number of vehicle occupants.

Regarding claims 4 and 16, Li, as modified above, teaches the vehicle system and the method of claims 1 and 15, respectively. Li, as modified above, further teaches wherein the classification threshold increases as the number of occupants increases and decreases as the number of occupants decreases. (Li, Par. 0059: “… Word selections may occur based on a likelihood calculation, confidence level, or confidence score (referred to as a confidence score) that may be preserved in ASR metadata. When the highest likelihood or confidence score exceeds a predetermined or contextual threshold, an alignment system within the ASR may identify the spoken utterance and classify the spoken utterance as correctly recognized.”, and Par. 0145: “The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: the signal-to-noise ratio (S/N) is impacted by the number of occupants, since the larger the number of occupants, the lower the S/N due to the interference/noise that a larger number of occupants creates. Therefore, the probability that the utterance is system directed corresponds to the confidence score with which the word in the spoken utterance is recognized, which is also related to the number of occupants: the lower the number of occupants, the higher the S/N, and the higher the probability that the utterance is system directed, and vice versa.

Regarding claim 5, Li, as modified above, teaches the vehicle system of claim 1. Li, as modified above, further teaches wherein at least one of the classification threshold and probability is based at least in part on the number of vehicle occupants and at least one occupant-specific factor. (Li, Par. 0145: “The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: the signal-to-noise ratio (S/N) reads on the classification threshold, since a lower S/N signifies more occupants, due to the environmental noise (non-directed speech) in the vehicle cabin. Also, per the as-filed specification, Par. 0044: “… The thresholds may be classification thresholds used by the multimodal processing system 130 to determine whether an utterance is SD or NSD. This threshold may be based, at least in part, on the number of occupants in the vehicle. In this example, the more occupants, the higher the classification threshold, so as to minimize false accepts by the system when occupants are conversing.” Furthermore, according to Par. 0047 of the as-filed specification, the threshold setting is a design choice based on the presence of the occupants.

Regarding claims 6 and 17, Li, as modified above, teaches the vehicle system and the method of claims 1 and 15, respectively. Li, as modified above, does not teach, however Secker-Walker further teaches, wherein the processor is programmed to determine that the utterance is system directed in response to the probability exceeding the threshold. (Secker-Walker, Col. 4, ll. 46-51: “… the wakeword detection module 206 may determine a confidence level whose value corresponds to a likelihood that a wakeword is actually present in the speech [utterance]. If the confidence level satisfies a confidence level threshold, it is determined that the wakeword is present [system directed] in the speech [utterance].”) Note: wake word detection is in general considered system directed; if the wake word is not detected, it implies a non-system directed utterance.

Regarding claim 18, Li, as modified above, teaches the method of claim 15. Li, as modified above, further teaches wherein the utterance is received as part of an audio signal detected by at least one vehicle microphone. (Li, Par. 0066: “… A person in a vehicle containing multiple occupants wishes to initiate an ASR session by uttering a keyword, such as “Hey Siri”, “Alexa” or “Okay Google”. Each occupant is in a separate zone of the cabin. The vehicle cabin contains one or more microphones which may or may not be dedicated for each zone. Each microphone picks up the voice of the occupant that wishes to initiate an ASR session as well as the voices of other occupants, known as “interference speech”. The one or more microphone signals (or audio channels) are available to a wake word detector which determines not only whether/when a keyword was spoken but also from which zone the keyword was spoken.”)

Regarding claim 19, Li, as modified above, teaches the method of claim 15. Li, as modified above, does not teach, however Nagata further teaches, wherein the at least one signal indicative of a number of vehicle occupants is received from at least one sensor configured to detect at least one occupancy signal from the at least one occupant of a vehicle. (Nagata, Par. 0027: “FIG. 1B illustrates occupants 104 (e.g., a driver 104A, a front passenger 104B, a rear passenger's side occupant 104C, and a rear driver's side occupant 104D) within the passenger cabin of the vehicle 102.”, and Par. 0055: “… the vehicle 102 may be driving, with a first occupant 302A in the driver's seat, a second occupant 302B in the front passenger's seat, and a third occupant 302C in a rear seat behind the front passenger's seat.”) Note: the combination of each and every occupant indicates the number of vehicle occupants.

Regarding claim 20, Li, as modified above, teaches the method of claim 15. Li, as modified above, does not teach, however Nagata further teaches, wherein the classification threshold is determined based at least in part on additional factors, including at least one of a personal preference associated with the at least one occupant. (Nagata, Par. 0052: “… configured to display content 132 (e.g., content 132A and 132B) to rear occupants. The vehicle 102 may identify the occupant and may present content according to the occupant's preferences and access qualifications. The occupant's preferences may include specifically which movies, TV shows, or music the occupant prefers, as well as genres of movies, TV shows, or music. The occupant's access qualifications may include age-based restrictions or subscription-based restrictions.
For example, the occupant may be identified as being 8 years old, and accordingly, content identified as being for individuals over 18 years old may not be presented to the occupant. In another example, the occupant may have paid subscriptions to Streaming Service N and Streaming Service H, but not Streaming Service P. Thus, content from Streaming Service N and Streaming Service H may be available to the occupant, but not content from Streaming Service P. The occupant may provide authentication credentials for the paid subscriptions, which may thereafter be associated with the occupant.”) Claims 7-12 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US20200219493A1), and in further view of Nagata (US20220212658A1). Regarding claim 7, Li teaches A vehicle system for classifying spoken utterance within a vehicle cabin as one of system-directed and non-system directed, the system comprising: (Li, Par. 0017:” … systems and methods of voice control in a multi-talker and multimedia environment, and in particular, multi-zone speech recognition systems with interference and echo cancellation and related methods which may be used in a vehicle or other suitable environment.”, and Par. 0043:” … in-car communications system to facilitate communication between occupants of the vehicle.”, and Par. 0059:” In FIG. 1, an application post processor (or processor system) 704 receives processed microphone signals and other inputs from the vehicle 200 to provide information, ... The application post processor 704 comprises an automatic speech recognition (ASR) engine/system, … Since the speech recognition system isolates speech and/or other content delivered in the vehicle 200 a ... Word selections may occur based on a likelihood calculation, confidence level, or confidence score (referred to as a confidence score) that may be preserved in ASR metadata. 
When the highest likelihood or confidence score exceeds a predetermined or contextual threshold, an alignment system within the ASR may identify the spoken utterance and classify the spoken utterance as correctly recognized.”) a processor programmed to: receive at least one audio signal from a vehicle microphone, and (Li, Par. 0095:” … each software system 1861 including instructions that may be executed by the processor 1804.”, and Par. 0066:” … A person in a vehicle containing multiple occupants wishes to initiate an ASR session by uttering a keyword, such as “Hey Siri”, “Alexa” or “Okay Google”. Each occupant is in a separate zone of the cabin. The vehicle cabin contains one or more microphones which may or may not be dedicated for each zone. Each microphone picks up the voice of the occupant that wishes to initiate an ASR session as well as the voices of other occupants, known as “interference speech”. The one or more microphone signals (or audio channels) are available to a wake word detector which determines not only whether/when a keyword was spoken but also from which zone the keyword was spoken.”) determine a classification threshold based at least in part on the occupancy signal to apply to a probability that acoustic utterances spoken by at least one of the vehicle occupants is a system directed utterance. (Li, Par. 0145:” The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... 
The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: signal to noise ratio (S/N) reads on the classification threshold, since the higher S/N signifies more occupants, due to the environmental noise (non-directed speech) in the vehicle cabin. Also, per as-filed specification Par. 0044:” … The thresholds may be classification thresholds used by the multimodal processing system 130 to determine whether an utterance is SD or NSD. This threshold may be based, at least in part, on the number of occupants in the vehicle. In this example, classification threshold the more occupants, the higher the threshold so as to minimize false accepts by the system when occupants are conversing. Li, does not teach, however Nagata teaches at least one sensor configured to detect at least one occupancy signal from at least one occupant of a vehicle; and (Nagata, Par. 0003:” … one or more sensors of a vehicle configured to detect sensor data associated with an identification of an occupant within the vehicle and a location of the occupant within the vehicle. … one or more vehicle settings based on the identification of the occupant within the vehicle and the location of the occupant within the vehicle.”, and Par. 0023:” … the vehicle 102 is capable of identifying an occupant of the vehicle 102 and adjusting one or more settings based on the identification of the occupant. In some embodiments, the vehicle 102 identifies the occupant when the occupant is within the passenger cabin of the vehicle 102. In some embodiments, the vehicle 102 is capable of identifying the occupants 104A, 104B even as they approach the vehicle 102.”) Nagata is considered to be analogous to the claimed invention because it is in the same field of endeavor. 
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li further in view of Nagata to detect at least one occupancy signal from at least one occupant of a vehicle. Motivation to do so would improve the comfort of the occupants within the vehicle (Nagata, Par. 0020). Regarding claim 8, Li, as modified above, teaches the vehicle system of claim 7. Li, as modified above, does not teach, however Nagata further teaches wherein the occupancy signal is indicative of a presence of an occupant. (Nagata, Par. 0003:” … The system includes one or more sensors of a vehicle configured to detect sensor data associated with an identification of an occupant within the vehicle and a location of the occupant within the vehicle.”. and Par. 0024:” … The one or more sensors may include an image sensor configured to detect image data of the occupants 104. The facial recognition may be performed on the detected image data to identify the occupants 104.”) Regarding claim 9, Li, as modified above, teaches the vehicle system of claim 8. Li, as modified above, does not teach, however Nagata further teaches determine a number of occupants based at least in part on the occupancy signal. (Nagata, Par. 0027:” FIG. 1B illustrates occupants 104 (e.g., a driver 104A, a front passenger 104B, a rear passenger's side occupant 104C, and a rear driver's side occupant 104D) within the passenger cabin of the vehicle 102.”, and Par. 0055:” … the vehicle 102 may be driving, with a first occupant 302A in the driver's seat, a second occupant 302B in the front passenger's seat, and a third occupant 302C in a rear seat behind the front passenger's seat.”) Note: combination of each and every occupant indicates the number of vehicle occupants. Regarding claim 10, Li, as modified above, teaches the vehicle system of claim 9. 
Li, as modified above, further teaches wherein the classification threshold is based at least in part on the number of occupants and at least one occupant-specific factor. (Li, Par. 0145:” The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ... The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.”) Note: signal to noise ratio (S/N) reads on the classification threshold, since the higher S/N signifies more occupants, due to the environmental noise (non-directed speech) in the vehicle cabin. Also, per as-filed specification Par. 0044:” … The thresholds may be classification thresholds used by the multimodal processing system 130 to determine whether an utterance is SD or NSD. This threshold may be based, at least in part, on the number of occupants in the vehicle. In this example, classification threshold the more occupants, the higher the threshold so as to minimize false accepts by the system when occupants are conversing. Furthermore, according to Par. 0047 of the As-filed specification, threshold setting is a design choice based on the presence of the occupants. Regarding claim 11, Li, as modified above, teaches the vehicle system of claim 10. Li, as modified above, does not teach, however Nagata further teaches wherein at least one occupant-specific factor includes a personal preference associated with the at least one occupant. (Nagata, Par. 0052:” … configured to display content 132 (e.g., content 132A and 132B) to rear occupants. The vehicle 102 may identify the occupant and may present content according to the occupant's preferences and access qualifications. 
The occupant's preferences may include specifically which movies, TV shows, or music the occupant prefers, as well as genres of movies, TV shows, or music. The occupant's access qualifications may include age-based restrictions or subscription-based restrictions. For example, the occupant may be identified as being 8 years old, and accordingly, content identified as being for individuals over 18 years old may not be presented to the occupant. In another example, the occupant may have paid subscriptions to Streaming Service N and Streaming Service H, but not Streaming Service P. Thus, content from Streaming Service N and Streaming Service H may be available to the occupant, but not content from Streaming Service P. The occupant may provide authentication credentials for the paid subscriptions, which may thereafter be associated with the occupant.")

Regarding claim 12, Li, as modified above, teaches the vehicle system of claim 9. Li, as modified above, further teaches wherein the classification threshold increases as the number of occupants increases and decreases as the number of occupants decreases. (Li, Par. 0059: "… Word selections may occur based on a likelihood calculation, confidence level, or confidence score (referred to as a confidence score) that may be preserved in ASR metadata. When the highest likelihood or confidence score exceeds a predetermined or contextual threshold, an alignment system within the ASR may identify the spoken utterance and classify the spoken utterance as correctly recognized."; and Par. 0145: "The active zone is the zone which is the most active as determined by a zone activity detector of the multi-zone speech recognition front-end 1400. The system 1400 may perform level comparisons across microphones 1402 or compute speech probabilities to determine the active zone, ...
The zone activity detector may in addition use information such as seat detectors, voice activity detection and signal-to-noise ratios, to determine the most active zone.") Note: the signal-to-noise ratio (S/N) is impacted by the number of occupants, since the larger the number of occupants, the lower the S/N due to the interference/noise that a larger number of occupants creates. Therefore, the probability that the utterance is system directed corresponds to the utterance being recognized by the confidence score of the word identified from the spoken utterance, which is also related to the number of occupants: the lower the number of occupants, the higher the S/N and the higher the probability that the utterance is system directed, and vice versa.

Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Li and Nagata, and in further view of Secker-Walker.

Regarding claim 13, Li, as modified above, teaches the vehicle system of claim 7. Li, as modified above, does not teach, however, Secker-Walker teaches wherein the processor is further programmed to compare the classification threshold to the probability to determine whether the at least one acoustic utterance is one of a system directed utterance and a non-system directed utterance. (Secker-Walker, Col. 4, ll. 46-51: "… the wakeword detection module 206 may determine a confidence level whose value corresponds to a likelihood that a wakeword is actually present in the speech [utterance]. If the confidence level satisfies a confidence level threshold, it is determined that the wakeword is present [system directed] in the speech [utterance].") Note: wake word detection is in general considered as being system directed, and if the wake word is not detected, it implies a non-system directed utterance. Secker-Walker is considered to be analogous to the claimed invention because it is in the same field of endeavor.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li, as modified above, further in view of Secker-Walker to determine whether the at least one acoustic utterance is one of a system directed utterance and a non-system directed utterance. Motivation to do so would improve human-computer interactions (Secker-Walker, Col. 1, ll. 20-21).

Regarding claim 14, Li, as modified above, teaches the vehicle system of claim 13. Li, as modified above, does not teach, however, Secker-Walker further teaches determine that the utterance is system directed in response to the probability exceeding the threshold. (Secker-Walker, Col. 4, ll. 46-51: "… the wakeword detection module 206 may determine a confidence level whose value corresponds to a likelihood that a wakeword is actually present in the speech [utterance]. If the confidence level satisfies a confidence level threshold, it is determined that the wakeword is present [system directed] in the speech [utterance].") Note: wake word detection is in general considered as being system directed, and if the wake word is not detected, it implies a non-system directed utterance.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
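The mechanism at issue in claims 10-14 can be illustrated with a minimal sketch (hypothetical function names and constants chosen for illustration, not taken from the application or the cited references): the classification threshold rises as the number of occupants rises, and an utterance is classified as system directed only when its probability exceeds that occupancy-based threshold.

```python
# Hypothetical sketch of an occupancy-dependent classification threshold.
# BASE_THRESHOLD and PER_OCCUPANT_STEP are illustrative values only.

BASE_THRESHOLD = 0.50      # assumed threshold for a single occupant
PER_OCCUPANT_STEP = 0.05   # assumed increment per additional occupant
MAX_THRESHOLD = 0.95       # cap to keep the threshold below certainty

def classification_threshold(num_occupants: int) -> float:
    """Threshold increases as occupants increase and decreases as they decrease,
    minimizing false accepts when multiple occupants are conversing."""
    extra = max(num_occupants - 1, 0)
    return min(BASE_THRESHOLD + extra * PER_OCCUPANT_STEP, MAX_THRESHOLD)

def classify_utterance(probability: float, num_occupants: int) -> str:
    """Compare the system-directed probability to the occupancy-based
    classification threshold (cf. claims 13-14)."""
    if probability > classification_threshold(num_occupants):
        return "system-directed"
    return "non-system-directed"
```

Under these assumed values, an utterance with probability 0.6 clears the single-occupant threshold but not the four-occupant threshold, matching the claimed behavior that more occupants require a higher probability before an utterance is treated as system directed.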
Lortz (US 20140200737A1) teaches in the Abstract: "A system and method for identifying an occupant of a vehicle as an authorized user and managing settings and configurations of vehicle components based on personal preferences of the authorized user includes detecting occupant characteristics from multiple sensors, determining whether the occupant is an authorized user of the vehicle based on a comparison of the occupant characteristics with a user database including registered user profiles, and automatically adjusting vehicle cabin and/or control components based on personal preferences of the occupant identified as a registered user."

Examiner's Note: Examiner has cited particular columns and line numbers and/or paragraph numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. In preparing responses, the applicant is respectfully requested to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. In the case of amending the claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARIOUSH AGAHI, whose telephone number is (408) 918-7689. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DARIOUSH AGAHI, P.E.
Primary Examiner

/DARIOUSH AGAHI/
Primary Examiner, Art Unit 2656

Prosecution Timeline

Jun 20, 2024
Application Filed
Jan 25, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596890
SYSTEMS AND METHODS FOR CROSS-LINGUAL TRANSFER LEARNING
2y 5m to grant Granted Apr 07, 2026
Patent 12596876
SYSTEMS AND METHODS FOR IMPROVING TEXTUAL DESCRIPTIONS USING LARGE LANGUAGE MODELS
2y 5m to grant Granted Apr 07, 2026
Patent 12591743
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND STORAGE MEDIUM FOR EXTRACTING A NAMED ENTITY FROM A DOCUMENT
2y 5m to grant Granted Mar 31, 2026
Patent 12586586
SPEECH RECOGNITION WITH SELECTIVE USE OF DYNAMIC LANGUAGE MODELS
2y 5m to grant Granted Mar 24, 2026
Patent 12579448
TECHNIQUES FOR POSITIVE ENTITY AWARE AUGMENTATION USING TWO-STAGE AUGMENTATION
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
86%
Grant Probability
99%
With Interview (+29.0%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 166 resolved cases by this examiner. Grant probability derived from career allow rate.
