DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Objections
Claims 1 and 19 are objected to. The claim limitation “one or more occupants of a vehicle space” should read “one or more occupants of [[a]] the vehicle space”. Appropriate correction is required.
Claims 1 and 19 are objected to. The claim limitation “from the reflected image data of one or more occupants of the vehicle space” should read “from the reflected image data of the one or more occupants of the vehicle space”. Appropriate correction is required.
Claims 1 and 19 are objected to. The claim limitation “the eye region image data, and head pose data” should read “the eye region image data, and the head pose data”. Appropriate correction is required.
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the “digital user identifier with each face” and the “anonymized unique digital user identifier” must be shown or the feature(s) must be canceled from claims 16-17. No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections – 35 U.S.C. § 112
The following is a quotation of 35 U.S.C. 112(b):
(B) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of pre-AIA 35 U.S.C. 112, second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA applications, the applicant) regards as the invention. Claims 1 and 19 recite "the location of the one or more occupants". There is insufficient antecedent basis for this limitation in the claims. Therefore, claims 1, 19, and their dependent claims are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA applications, the applicant) regards as the invention. Claim 5 recites “wherein the multi-view localization is performed using reflected image data captured by a single camera”. It is well known in the art that multi-view localization is a process of determining a camera's or object's position and orientation in 3D space using images from multiple viewpoints captured by multiple different cameras. Claim 5, however, recites that the multi-view localization can be performed using image data from a single camera, yet, except in claim 5 itself, the term “single camera” is not mentioned anywhere in the specification. Hence it is unclear how “the multi-view localization is performed using reflected image data captured by a single camera”. For the reasons discussed above, claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
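For context only, the following is a minimal sketch of multi-view localization in its conventional sense (multiple physically distinct viewpoints), triangulating a 3D point from two calibrated views via the direct linear transform; the function name, matrices, and values are illustrative assumptions, not taken from the application or the cited art.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Conventional two-view triangulation via the direct linear transform (DLT).

    P1, P2 -- 3x4 projection matrices of two calibrated cameras at physically
              distinct viewpoints (hypothetical values from calibration).
    x1, x2 -- (u, v) pixel coordinates of the same scene point in each view.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Least-squares solution: the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to (x, y, z)
```

Under this conventional formulation the second projection matrix necessarily corresponds to a second physical viewpoint; a single-camera variant would apparently have to treat each reflective surface as a virtual second camera, and the specification does not explain any such arrangement.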
Claim Rejections – 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under pre-AIA 35 U.S.C. 103(a) are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f) or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1-4, 7-9, 11, 13, 15-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Asghar et al. (US Patent 11,562,550 B1) (“Asghar”) in view of Lind et al. (US Patent Application Publication US 2024/0046506 A1) (“Lind”).
Regarding claim 1, Asghar meets the claim limitations as follows.
A computer-implemented method ((a method) [Asghar: col. 1, line 63]; (executed by one or more processors) [Asghar: col. 2, line 10]) for performing multi-user gaze tracking in a vehicle space ((the XR device (e.g., AR device), a position/orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking) [Asghar: col. 12, line 58-60]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]) through multi-surface optical reflections, the method comprising (a method) [Asghar: col. 1, line 63]:
a) receiving reflected image data ((receiving data (e.g., messaging, video, audio, etc.), among other operations) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55] - Note: It is well known in the art that a captured image of an object is formed on an image sensor of a digital camera when light reflected from the object is focused on the image sensor. In other words, videos and images are reflected image data) of one or more occupants of a vehicle space (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18; col. 33, line 50-52];
b) based on the reflected image data ((receiving data (e.g., messaging, video, audio, etc.)) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55] – Note: Captured images are the reflected image data), estimating the location of the one or more occupants of the vehicle space ((the one or more sensor systems 102 can include one or more image sensors such as image sensors 104A and 104N (collectively "image sensors 104" hereinafter), a location sensor) [Asghar: col. 18, line 51-55]; (a positioning beacon, a location measurement unit) [Asghar: col. 20, line 35]; (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring, user monitoring, vehicle monitoring, event monitoring, activity monitoring, object monitoring, etc.), device control/management, tracking (e.g., device tracking, object tracking, hand tracking, eye gaze tracking, etc.), localization, object detection and/or recognition, object classification, pose estimation, shape estimation, scene mapping, scene detection and/or scene recognition, face detection and/or recognition, emotion detection and/or recognition, content anchoring, content rendering, content filtering, image processing, modeling, content generation, gesture detection and/or recognition, user interface generation, power management, event detection and/or recognition, and/or other operations based on data from one or more of the components of the user computing system 100, the vehicle computing system 210, and/or any other system or component) [Asghar: col. 16, line 39-56; Figs. 5-7I]; (In some cases, the state of the occupant can additionally or alternatively include a pose of the occupant (e.g., a location, an orientation, a posture, etc.)) [Asghar: col. 40, line 56-58]; (position (e.g., location, orientation, posture, etc.) of the occupant) [Asghar: col. 40, line 64-65]; (an occupant of the vehicle) [Asghar: col. 3, line 18]);
c) obtaining (the method, non-transitory computer-readable medium, and apparatuses described above can obtain, using one or more image sensors of the mobile device, a set of images of an interior portion of the vehicle) [Asghar: col. 4, line 31-34] face image data (face detection) [Asghar: col. 16, line 47; Figs. 6A-7I], eye region image data (eye gaze tracking) [Asghar: col. 16, line 44], and head pose data (a user head pose, etc.) [Asghar: col. 12, line 61-62] from the reflected image data of one or more occupants of the vehicle space (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring, user monitoring, vehicle monitoring, event monitoring, activity monitoring, object monitoring, etc.), device control/management, tracking (e.g., device tracking, object tracking, hand tracking, eye gaze tracking, etc.), localization, object detection and/or recognition, object classification, pose estimation, shape estimation, scene mapping, scene detection and/or scene recognition, face detection and/or recognition, emotion detection and/or recognition, content anchoring, content rendering, content filtering, image processing, modeling, content generation, gesture detection and/or recognition, user interface generation, power management, event detection and/or recognition, and/or other operations based on data from one or more of the components of the user computing system 100, the vehicle computing system 210, and/or any other system or component) [Asghar: col. 16, line 39-56; Figs. 5-7I]; and
d) using a deep learning model (a machine learning model(s)) [Asghar: col. 16, line 62] trained on vehicle space occupant reflection image data (a machine learning classifier trained on positive and negative examples) [Asghar: col. 43, line 66-67], performing eye tracking for at least one of the one or more occupants (In some aspects, the XR device can implement an occupant monitoring process, such as eye tracking using one or more cameras on the XR device) [Asghar: col. 10, line 39-44; Fig. 4B] based on the face image data (face detection and/or recognition, emotion detection and/or recognition) [Asghar: col. 16, line 50-51; Figs. 5-7I], the eye region image data ((The eye tracking engine 434 can use image data from the one or more image sensors 104 to track the eyes and/or an eye gaze of the occupant) [Asghar: col. 33, line 50-52]; (eye-tracking data indicating that the occupant's gaze) [Asghar: col. 13, line 15-16]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]), and head pose data (the XR device (e.g., AR device), a position/orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking, inertial sensor data, audio data (e.g., speech data, acoustic data, sound waves, etc.), a user head pose, etc.) [Asghar: col. 12, line 58-62].
Asghar does not explicitly disclose the following claim limitations (Emphasis added).
through multi-surface optical reflections,
However, in the same field of endeavor Lind further discloses the deficient claim limitations as follows:
through multi-surface optical reflections (A computer-implemented method for image analysis is disclosed comprising: sending an infrared (IR) light source toward an individual within a vehicular environment; receiving, by at least one video camera, IR light reflections from the individual, wherein at least one eye of the individual is obscured by a semitransparent material; evaluating low accuracy tracking of the individual using the IR light reflections through the semitransparent material; enhancing a signal-to-noise ratio (SNR) for a sequence of video image frames using the IR light reflections of the individual through the semitransparent material, wherein the enhancing enables detection of feature points in an eye region for the individual; determining eye location for the individual based on results of the enhancing a signal-to-noise ratio; and deriving real-time gaze direction of a pupil for the at least one eye, wherein the deriving is based on the determining the eye location.) [Lind: para. 0007]; (The flow 200 further includes detecting that the individual is wearing tinted glasses 236. The detecting the wearing of tinted glasses can be based on detecting a semitransparent material located in an eye region associated with the individual. In embodiments, the determining eye location can be based on machine learning. The machine learning can be accomplished by training a network such as a neural network with training data and expected results associated with the training data. The training data can include video image frames that include faces of individuals wearing sunglasses) [Lind: para. 0033]; (The two or more flashes of IR light can be used to produce multiple reflections of IR light from the face of the individual) [Lind: para. 0034] – Note: The semitransparent material, the tinted glasses, and the face of the individual in these paragraphs are reflective surfaces),
reflected image data (using the IR light reflections of the individual through the semitransparent material) [Lind: para. 0007].
using a deep learning model trained on vehicle space occupant reflection image data, performing eye tracking for at least one of the one or more occupants based on the face image data (The flow 200 further includes detecting that the individual is wearing tinted glasses 236. The detecting the wearing of tinted glasses can be based on detecting a semitransparent material located in an eye region associated with the individual. In embodiments, the determining eye location can be based on machine learning. The machine learning can be accomplished by training a network such as a neural network with training data and expected results associated with the training data. The training data can include video image frames that include faces of individuals wearing sunglasses) [Lind: para. 0033]; (The two or more flashes of IR light can be used to produce multiple reflections of IR light from the face of the individual) [Lind: para. 0034], the eye region image data (determining eye location for the individual based on results of the enhancing a signal-to-noise ratio; and deriving real-time gaze direction of a pupil for the at least one eye, wherein the deriving is based on the determining the eye location) [Lind: para. 0007], and head pose data (In the flow 100, the determining is based on a rate of change for head position 152 of the individual. Change for head position of the individual can include changes in three dimensions. The head position changes can include rotation left or right, up or down; head tilt forward, back, left, right, up, or down; or a combination of rotation, tilt, translation, etc.) [Lind: para. 0025].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
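For the reader's convenience, the mapped steps a)-d) of claim 1 can be summarized in the following structural sketch; every name and callable in it is a hypothetical placeholder, not a function disclosed by Asghar or Lind.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class GazeResult:
    occupant_id: int
    gaze_direction: tuple  # illustrative unit vector (x, y, z)

def track_gaze(
    reflected_frames: Iterable,   # a) reflected image data of the occupants
    locate_occupants: Callable,   # b) occupant localization from that data
    extract_features: Callable,   # c) face, eye region, and head pose data
    gaze_model: Callable,         # d) deep model trained on reflection imagery
) -> Iterator[GazeResult]:
    """Hypothetical skeleton of claim 1 steps a)-d)."""
    for frame in reflected_frames:
        # b) estimate where each occupant is within the vehicle space
        for occupant_id, location in locate_occupants(frame):
            # c) obtain face, eye region, and head pose data from the frame
            face, eyes, head_pose = extract_features(frame, location)
            # d) run the trained model to perform eye tracking
            direction = gaze_model(face, eyes, head_pose)
            yield GazeResult(occupant_id, direction)
```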
Regarding claim 2, Asghar meets the claim limitations as set forth in claim 1. Asghar further meets the claim limitations as follows.
receiving (the method, non-transitory computer-readable medium, and apparatuses described above can obtain, using one or more image sensors of the mobile device, a set of images of an interior portion of the vehicle) [Asghar: col. 4, line 31-34] reflected image data of one or more occupants of a vehicle space ((In some cases, the state of the occupant can additionally or alternatively include a pose of the occupant (e.g., a location, an orientation, a posture, etc.)) [Asghar: col. 40, line 56-58]; (position (e.g., location, orientation, posture, etc.) of the occupant) [Asghar: col. 40, line 64-65]; (an occupant of the vehicle) [Asghar: col. 3, line 18]) by one or more cameras (a device that can include one or more cameras) [Asghar: col. 1, line 21-22; Figs. 1, 3, 4A-B, 7A-I].
However, in the same field of endeavor Lind further discloses the claim limitations as follows:
reflected image data (using the IR light reflections of the individual through the semitransparent material) [Lind: para. 0007].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
Regarding claim 3, Asghar meets the claim limitations as set forth in claim 2. Asghar further meets the claim limitations as follows.
wherein the one or more cameras comprises (a device that can include one or more cameras) [Asghar: col. 1, line 21-22; Figs. 1, 3, 4A-B, 7A-I]:
at least one of a digital camera (In some cases, the localization process can detect the in-vehicle landmarks based on data from one or more devices (e.g., sensors, emitters, transceivers, imaging devices, etc.) on the XR device (e.g., the AR device), such as one or more cameras) [Asghar: col. 9, line 62-66] with a wide field-of-view (FOV) (In some examples, the XR device can pulse an edge of the occupant's field-of-view (FOV), circle or highlight the location of the vehicle event in a displayed image/video, render head-locked arrows, pulse the periphery of the occupant's vision, use other directional indicators to direct the occupant's head and/or gaze in the direction of a vehicle event, stream a vehicle camera feed (e.g., a backup camera, a side-view camera, etc.) near the location of the vehicle event, stream vehicle status (e.g., instrumentation, etc.) as a head-locked HUD or world locked user interface (UI) element, render a warped and/or perspective-corrected exterior view locked to an area of the vehicle (e.g., the vehicle wall, etc.)) [Asghar: col. 13, line 48-60], a plurality of cameras directed at one or more reflective surfaces (a surface within the interior portion of the vehicle, and an illuminated object inside of the vehicle) [Asghar: col. 4, line 23-24] within the vehicle space (A vehicle is one example of a device that can include one or more cameras. For instance, a vehicle can include cameras that can capture frames of the interior of the vehicle and/or an area(s) outside of the vehicle.) [Asghar: col. 1, line 21-24], or a plurality of cameras capturing one or more of direct and reflected images of the one or more occupants ((In some examples, a vehicle can include cameras that can capture frames of the interior of the vehicle and/or an area(s) outside of the vehicle (e.g., the vehicle surroundings). The frames can be processed for various purposes, such as determining or recognizing road conditions; recognizing an identity of a person(s) in the vehicle; identifying other vehicles, objects, pedestrians, and/or obstacles in proximity to the vehicle; determining and/or recognizing activities and/or events in an environment associated with the vehicle (e.g., an environment outside of the vehicle, an environment inside of the vehicle, etc.); among others) [Asghar: col. 8, line 1-11]; (the XR device (e.g., AR device), a position/orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking, inertial sensor data, audio data (e.g., speech data, acoustic data, sound waves, etc.), a user head pose, etc.) [Asghar: col. 12, line 58-62]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]).
Regarding claim 4, Asghar meets the claim limitations as set forth in claim 1. Asghar further meets the claim limitations as follows.
wherein the b) based on the reflected image data ((receiving data (e.g., messaging, video, audio, etc.)) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55]), estimating the location (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring, user monitoring, vehicle monitoring, event monitoring, activity monitoring, object monitoring, etc.), device control/management, tracking (e.g., device tracking, object tracking, hand tracking, eye gaze tracking, etc.), localization, object detection and/or recognition, object classification, pose estimation, shape estimation, scene mapping, scene detection and/or scene recognition, face detection and/or recognition, emotion detection and/or recognition, content anchoring, content rendering, content filtering, image processing, modeling, content generation, gesture detection and/or recognition, user interface generation, power management, event detection and/or recognition, and/or other operations based on data from one or more of the components of the user computing system 100, the vehicle computing system 210, and/or any other system or component) [Asghar: col. 16, line 39-56; Figs. 5-7I] of the one or more occupants of the vehicle space ((In some cases, the state of the occupant can additionally or alternatively include a pose of the occupant (e.g., a location, an orientation, a posture, etc.)) [Asghar: col. 40, line 56-58]; (an occupant of the vehicle) [Asghar: col. 3, line 18]; (position (e.g., location, orientation, posture, etc.) of the occupant) [Asghar: col. 40, line 64-65]) comprises:
selecting (In some examples, the content filtering engine 412 can use the localization data (e.g., localization data 314), the tracking data (e.g., tracking data 422, device tracking data 440, hand tracking data 442, and/or eye tracking data 444), impairment information from the monitoring engine 410 (and/or the vehicle monitoring engine 402) and/or any other event, state, and/or sensor data, such as the vehicle data 408 (or a portion thereof), to determine/select content to filter/block and/or render for the occupant. For example, the content filtering engine 412 can use a pose of the mobile device 150 relative to a coordinate system of the vehicle 202 (e.g., of the vehicle computing system 210) and vehicle data indicating a context of the vehicle 202 (e.g., a state/status of the vehicle 202, a vehicle event/operation, etc.) to filter/block certain content that may distract the occupant from an operation of the vehicle 202 and/or a vehicle event and/or that may obstruct a view of the occupant to the vehicle event and/or a certain region outside of the vehicle 202 that the content filtering engine 412 determines should remain visible to the occupant) [Asghar: col. 9, line 9-28] one or more optimal views of each of the one or more occupants ((FIG. 7A through FIG. 7I are diagrams illustrating different example states of an occupant driving a vehicle, in accordance with some examples of the present disclosure) [Asghar: col. 5, line 34-36]; (a field-of-view of an occupant of the vehicle) [Asghar: col. 2, line 67]); and estimating a position of at least one of the one or more occupants based on the selecting one or more optimal views of each of the one or more occupants (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring, user monitoring, vehicle monitoring, event monitoring, activity monitoring, object monitoring, etc.), device control/management, tracking (e.g., device tracking, object tracking, hand tracking, eye gaze tracking, etc.), localization, object detection and/or recognition, object classification, pose estimation, shape estimation, scene mapping, scene detection and/or scene recognition, face detection and/or recognition, emotion detection and/or recognition, content anchoring, content rendering, content filtering, image processing, modeling, content generation, gesture detection and/or recognition, user interface generation, power management, event detection and/or recognition, and/or other operations based on data from one or more of the components of the user computing system 100, the vehicle computing system 210, and/or any other system or component) [Asghar: col. 16, line 39-56; Figs. 5-7I], for multi-view localization ((images rendered in the virtual environment also change, giving the user the perception that the user is moving within the VR environment. For example, a user can turn left or right, look up or down, and/or move forwards or backwards, thus changing the user's point of view of the VR environment. The VR content presented to the user can change accordingly, so that the user's experience is as seamless as in the real world. VR content can include VR video in some cases, which can be captured and rendered at very high quality, potentially providing a truly immersive virtual reality experience) of the occupant) [Asghar: col. 7, line 30-40]; (In some cases, the state of the occupant can additionally or alternatively include a pose of the occupant (e.g., a location, an orientation, a posture, etc.)) [Asghar: col. 40, line 56-58]).
In the same field of endeavor, Lind further discloses the multi-view localization as follows:
multi-view localization (The cameras or imaging devices that can be used to obtain images including IR reflection data from the occupants of the vehicle 410 can be positioned to capture the face of the vehicle operator, the face of a vehicle passenger, multiple views of the faces of occupants of the vehicle, and so on. The cameras or imaging devices can detect that the eyes of the operator are obscured by a semitransparent material. The semitransparent material can be associated with tinted, polarized, or other types of sunglasses 424. The cameras can be located near a rear-view mirror 440, such as camera 442; coupled to a windshield 444 of the vehicle 410, such as camera 446; positioned within the dashboard 450, such as camera 452; positioned on or near the dashboard (not shown); positioned on or behind the steering wheel 460 such as camera 462; and so on. In embodiments, additional cameras, imaging devices, etc., can be located throughout the vehicle. In further embodiments, each occupant of the vehicle could have multiple cameras positioned to capture video data from that occupant) [Lind: para. 0007].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
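As a non-limiting illustration of the "selecting one or more optimal views" step mapped above, the sketch below scores candidate views of each occupant and keeps the best one before position estimation; the scoring heuristic and all names are assumptions, not limitations recited by the claim or disclosed by either reference.

```python
def select_optimal_views(views_per_occupant):
    """Keep the highest-scoring view of each occupant.

    views_per_occupant: {occupant_id: [(view, score), ...]}, where each
    score might combine, e.g., face-detection confidence and visible eye
    area (an assumed heuristic).
    """
    return {
        occupant_id: max(scored_views, key=lambda pair: pair[1])[0]
        for occupant_id, scored_views in views_per_occupant.items()
    }
```

The occupant position would then be estimated from the retained views, e.g., by two-view triangulation as sketched after the § 112 rejection of claim 5 above.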
Regarding claim 7, Asghar meets the claim limitations as set forth in claim 1. Asghar further meets the claim limitations as follows.
wherein the reflected image data ((receiving data (e.g., messaging, video, audio, etc.)) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55]; (the computing system 100 can include one or more sensor systems 102. In some examples, the one or more sensor systems 102 can include one or more image sensors such as image sensors 104A and 104N (collectively "image sensors 104" hereinafter), a location sensor) [Asghar: col. 18, line 49-55]; (a positioning beacon, a location measurement unit) [Asghar: col. 20, line 34-35]) comprises: reflected image data from one or more of a highly reflective surface (a surface within the interior portion of the vehicle, and an illuminated object inside of the vehicle) [Asghar: col. 4, line 23-24], a mirrored surface, a metal-coated surface, or a reflective plastic surface.
In the same field of endeavor, Lind further discloses the claim limitations as follows:
reflected image data from one or more of a highly reflective surface (The cameras or imaging devices that can be used to obtain images including IR reflection data from the occupants of the vehicle 410 can be positioned to capture the face of the vehicle operator, the face of a vehicle passenger, multiple views of the faces of occupants of the vehicle, and so on. The cameras or imaging devices can detect that the eyes of the operator are obscured by a semitransparent material. The semitransparent material can be associated with tinted, polarized, or other types of sunglasses 424. The cameras can be located near a rear-view mirror 440, such as camera 442; coupled to a windshield 444 of the vehicle 410, such as camera 446; positioned within the dashboard 450, such as camera 452; positioned on or near the dashboard (not shown); positioned on or behind the steering wheel 460 such as camera 462; and so on. In embodiments, additional cameras, imaging devices, etc., can be located throughout the vehicle. In further embodiments, each occupant of the vehicle could have multiple cameras positioned to capture video data from that occupant) [Lind: para. 0007], a mirrored surface, a metal-coated surface ((The semitransparent material can include a tinted material, a material comprising a polarizer, and the like) [Lind: para. 0007] - Note: the tinted material and the material comprising a polarizer are metal-coated surfaces), or a reflective plastic surface.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
Regarding claim 8, Asghar meets the claim limitations as set forth in claim 2. Asghar further meets the claim limitations as follows.
wherein at least one of the one or more cameras is configured to capture (In some cases, the localization process can detect the in-vehicle landmarks based on data from one or more devices (e.g., sensors, emitters, transceivers, imaging devices, etc.) on the XR device (e.g., the AR device), such as one or more cameras) [Asghar: col. 9, line 62-66] within its field of view (In some examples, the XR device can pulse an edge of the occupant's field-of-view (FOV), circle or highlight the location of the vehicle event in a displayed image/video, render head-locked arrows, pulse the periphery of the occupant's vision, use other directional indicators to direct the occupant's head and/or gaze in the direction of a vehicle event, stream a vehicle camera feed (e.g., a backup camera, a side-view camera, etc.) near the location of the vehicle event, stream vehicle status (e.g., instrumentation, etc.) as a head-locked HUD or world locked user interface (UI) element, render a warped and/or perspective-corrected exterior view locked to an area of the vehicle (e.g., the vehicle wall, etc.)) [Asghar: col. 13, line 48-60]) one or more surface reflections (a surface within the interior portion of the vehicle, and an illuminated object inside of the vehicle) [Asghar: col. 4, line 23-24] of at least one occupant of the vehicle space ((In some examples, a vehicle can include cameras that can capture frames of the interior of the vehicle and/or an area(s) outside of the vehicle (e.g., the vehicle surroundings). The frames can be processed for various purposes, such as determining or recognizing road conditions; recognizing an identity of a person(s) in the vehicle; identifying other vehicles, objects, pedestrians, and/or obstacles in proximity to the vehicle; determining and/or recognizing activities and/or events in an environment associated with the vehicle (e.g., an environment outside of the vehicle, an environment inside of the vehicle, etc.); among others) [Asghar: col. 8, line 1-11]; (the XR device (e.g., AR device), a position/orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking, inertial sensor data, audio data (e.g., speech data, acoustic data, sound waves, etc.), a user head pose, etc.) [Asghar: col. 12, line 58-62]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]).
Regarding claim 9, Asghar meets the claim limitations as set forth in claim 8. Asghar further meets the claim limitations as follows.
wherein at least one of the one or more cameras (In some cases, the localization process can detect the in-vehicle landmarks based on data from one or more devices (e.g., sensors, emitters, transceivers, imaging devices, etc.) on the XR device (e.g., the AR device), such as one or more cameras) [Asghar: col. 9, line 62-66] is positioned to capture within its field of view at least one reflection from at least one of a window surface, a dashboard surface, a side panel surface, a center console surface, a seat surface, a mirror surface, or a display surface (In some cases, the XR device can implement a localization process that can perform imaging-based (e.g., vision-based) and/or audio-based (e.g., via audio beamforming) localization of in-vehicle landmarks/markers (e.g., quick response (QR) codes inside of the vehicle, lights inside of the vehicle, objects inside of the vehicle (e.g., doors, windows, seats, headrests, dashboard components, vehicle control systems (e.g., steering wheel, horn, signaling systems, etc.), patterns inside of the vehicle, shapes inside of the vehicle, etc.). The localization process can use such in-vehicle landmarks/markers to localize the XR device within the vehicle. For example, the localization process can use the in-vehicle landmarks/markers to determine a pose of the XR device relative to a coordinate system of the XR device and/or the vehicle) [Asghar: col. 9, line 47-61].
Regarding claim 11, Asghar meets the claim limitations as set forth in claim 9.
Asghar does not explicitly disclose the following claim limitations (Emphasis added).
wherein at least one of the at least one reflections comprises: at least one surface reflection of at least one reflective surface.
However, in the same field of endeavor Lind further discloses the deficient claim limitations as follows:
wherein at least one of the at least one reflections comprises (A computer-implemented method for image analysis is disclosed comprising: sending an infrared (IR) light source toward an individual within a vehicular environment; receiving, by at least one video camera, IR light reflections from the individual, wherein at least one eye of the individual is obscured by a semitransparent material; evaluating low accuracy tracking of the individual using the IR light reflections through the semitransparent material; enhancing a signal-to-noise ratio (SNR) for a sequence of video image frames using the IR light reflections of the individual through the semitransparent material, wherein the enhancing enables detection of feature points in an eye region for the individual; determining eye location for the individual based on results of the enhancing a signal-to-noise ratio; and deriving real-time gaze direction of a pupil for the at least one eye, wherein the deriving is based on the determining the eye location) [Lind: para. 0007]: at least one surface reflection of at least one reflective surface ((the IR light reflections of the individual through the semitransparent material) [Lind: para. 0007]; (The flow 200 further includes detecting that the individual is wearing tinted glasses 236. The detecting the wearing of tinted glasses can be based on detecting a semitransparent material located in an eye region associated with the individual. In embodiments, the determining eye location can be based on machine learning. The machine learning can be accomplished by training a network such as a neural network with training data and expected results associated with the training data. The training data can include video image frames that include faces of individuals wearing sunglasses) [Lind: para. 0033]; (The two or more flashes of IR light can be used to produce multiple reflections of IR light from the face of the individual) [Lind: para. 0034] – Note: The semitransparent material, the tinted glasses, and the face of the individual in these paragraphs are reflective surfaces).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
Regarding claim 13, Asghar meets the claim limitations as set forth in claim 9. Asghar further meets the claim limitations as follows.
wherein the d) using a deep learning model (a machine learning model(s)) [Asghar: col. 16, line 62] trained on vehicle space occupant reflection image data (In some examples, events can be categorized as requiring the attention and/or interaction of the occupant 502 by the monitoring engine 402 of the vehicle application 302 and/or the vehicle user interface 414, e.g., via a machine learning classifier trained on positive and negative examples) [Asghar: col. 43, line 62-67], performing eye tracking for at least one of the one or more occupants (In some aspects, the XR device can implement an occupant monitoring process, such as a driver monitoring process, that can monitor a user for impairment based on a status of the vehicle, a position/orientation of the occupant, virtual content rendered by the XR device, eye tracking using one or more cameras on the XR device) [Asghar: col. 10, line 39-44; Fig. 4B] based on the face image data (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring, user monitoring, vehicle monitoring, event monitoring, activity monitoring, object monitoring, etc.), device control/management, tracking (e.g., device tracking, object tracking, hand tracking, eye gaze tracking, etc.), localization, object detection and/or recognition, object classification, pose estimation, shape estimation, scene mapping, scene detection and/or scene recognition, face detection and/or recognition, emotion detection and/or recognition) [Asghar: col. 16, line 39-51; Figs. 5-7I], the eye region image data (eye-tracking data indicating that the occupant's gaze is focused elsewhere when the vehicle event monitor detects an imminent event or object) [Asghar: col. 13, line 15-17], and head pose data (a user head pose, etc.) [Asghar: col. 12, line 61-62] comprises:
a) determining (The tracking engine 420 can obtain sensor data and use the sensor data to perform tracking operations) [Asghar: col. 32, line 15-17] a point of regard (PoR) of each eye of each of the one or more occupants ((The tracking engine 420 can obtain sensor data and use the sensor data to perform tracking operations. The tracking operations can track the occupant (and/or one or more body parts of the occupant such as hands, eyes, fingers, a head pose, etc.) of the vehicle 202, etc) [Asghar: col. 32, line 15-20]; (the position, attention/focus, eye gaze, orientation, motion, content engagement, activity, etc., of the occupant) [Asghar: col. 36, line 13-15]; (the user’s point of view) [Asghar: col. 7, line 34-35]; (point-of-view or field-of-view of the occupant) [Asghar: col. 9, line 16]);
b) determining (The tracking engine 420 can obtain sensor data and use the sensor data to perform tracking operations) [Asghar: col. 32, line 15-17] an eye state of each eye of each of the one or more occupants ((the XR device (e.g., AR device), a position/ orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking, inertial sensor data, audio data (e.g., speech data, acoustic data, sound waves, etc.), a user head pose, etc.) [Asghar: col. 12, line 58-62]; (The tracking engine 420 can use one or more tracking algorithms, such as a Kalman filter, a hand tracking algorithm, a machine learning algorithm, a ray tracing algorithm, a gaze and/or eye tracking algorithm, a computer vision algorithm, a position tracking algorithm, etc., to track the mobile device 150 and/or one or more body parts (e.g., eyes, hands, fingers, head, etc.) of an occupant of the vehicle 202) [Asghar: col. 32, line 38-44]); and
c) determining (The tracking engine 420 can obtain sensor data and use the sensor data to perform tracking operations) [Asghar: col. 32, line 15-17] a gaze direction of each eye of each of the one or more occupants ((The eye tracking engine 434 can use image data from the one or more image sensors 104 to track the eyes and/or an eye gaze of the occupant) [Asghar: col. 33, line 50-52]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]; (eye-tracking data indicating that the occupant's gaze is focused elsewhere when the vehicle event monitor detects an imminent event or object) [Asghar: col. 13, line 15-17]).
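The three determinations a)-c) mapped above can be summarized, purely for illustration, as a per-eye output record; the field names and the coordinate convention below are assumptions, not recitations from the claim or the references.

```python
from dataclasses import dataclass
from enum import Enum

class EyeState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    OCCLUDED = "occluded"  # e.g., behind a semitransparent material, per Lind

@dataclass
class PerEyeEstimate:
    point_of_regard: tuple  # a) PoR: where the gaze ray meets cabin geometry
    state: EyeState         # b) eye state
    gaze_direction: tuple   # c) unit gaze vector in a vehicle-fixed frame
```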
Regarding claim 15, Asghar meets the claim limitations as set forth in claim 9. Asghar further meets the claim limitations as follows.
a deep learning network (a machine learning model(s)) [Asghar: col. 16, line 62] trained on face and eye images reflected from one or more surfaces within one or more vehicle spaces ((a machine learning classifier trained on positive and negative examples ... impairments such as drowsiness, intoxication, a health emergency, stress, a heightened emotional state, loss of consciousness, etc.) [Asghar: col. 51, line 16-19]; (For example, in some cases, occupant impairment can be detected based on eye-tracking data) [Asghar: col. 13, line 10-12]) – Note: Drowsiness and intoxication can be determined from a face and eyes.
In the same field of endeavor Lind further discloses the limitations as follows:
a deep learning network (An AI model) [Lind: para. 0028] trained on face and eye images reflected from one or more surfaces within one or more vehicle spaces ((The machine learning can be accomplished by training a network such as a neural network with training data and expected results associated with the training data. The training data can include video image frames that include faces of individuals wearing sunglasses) [Lind: para. 0033]; (A network such as a neural network can be trained to determine eye location within one or more video image frames) [Lind: para. 0025]; (An AI model can be developed and trained for the evaluating gaze direction and the target observed. In a usage example, an individual gazing in a certain direction can be viewing a vehicle speedometer. Thus, when the individual again gazes in the certain direction, then the AI model can infer that the individual is again viewing the speed indication gauge. Various computer vision techniques can be utilized. In some embodiments, an image is obtained. The driver's head can be found within the image along with features on the head and those features tracked over time. A three-dimensional model of the head can be generated based on fitting to the head features. Based on these features, a head pose can be determined. This head pose can be determined within the interior of the vehicle. Eye regions can be found on the head within the image. The head pose and eye region location can be put into a neural network to define a gaze vector. The gaze vector can be aligned to the three-dimensional model of the head in relation to the vehicle interior. The gaze vector can be projected to a three-dimensional model of the vehicle interior to determine where the driver is looking. In some cases, another neural network is used to define the eye openings. These openings can be used to further refine the gaze direction. The refined gaze direction can be used to identify where the driver is looking. A similar process could be used to determine where different occupants are looking.) [Lind: para. 0028]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
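In the spirit of Lind's para. 0028 (a head pose and eye region location put into a neural network to define a gaze vector), a minimal sketch follows; the architecture, layer sizes, and names are assumptions rather than anything disclosed by either reference.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Illustrative network mapping an eye-region crop plus a head pose
    to a unit gaze vector. Sizes are assumed, not taken from Lind."""

    def __init__(self):
        super().__init__()
        # Small convolutional encoder for a single-channel eye-region crop.
        self.eye_encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fuse eye features with a 3-DoF head pose (yaw, pitch, roll).
        self.head = nn.Sequential(
            nn.Linear(32 + 3, 64), nn.ReLU(),
            nn.Linear(64, 3),  # gaze vector (x, y, z)
        )

    def forward(self, eye_crop, head_pose):
        feats = self.eye_encoder(eye_crop)        # (B, 32)
        x = torch.cat([feats, head_pose], dim=1)  # append head pose angles
        g = self.head(x)
        return nn.functional.normalize(g, dim=1)  # unit gaze vector
```

Training such a network on video frames that include reflected faces (e.g., through semitransparent materials, per Lind para. 0033) would be a data choice, not an architectural one.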
Regarding claim 16, Asghar meets the claim limitations as set forth in claim 1. Asghar further discloses the claim limitation as follows.
at least one digital intensity image, wherein the at least one digital intensity image includes at least one visible eye region (Note: A visible eye region with three different intensities is shown in Fig. 6) [Asghar: col. 16, line 62].
In the same field of endeavor Lind further discloses the limitations as follows:
at least one digital intensity image (intensities) [Lind: para. 0034].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Asghar with those of Lind to program the system to implement Lind’s method.
Therefore, the combination of Asghar with Lind will enable the system to perform low-light gaze detection in a vehicle environment [Lind: para. 0003].
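As a minimal illustration of a "digital intensity image" in the sense mapped above, the sketch below collapses an RGB frame to a single intensity channel using the standard ITU-R BT.601 luma weights; the function name is hypothetical.

```python
import numpy as np

def to_intensity(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB frame to a single-channel digital
    intensity image using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (rgb[..., :3] @ weights).astype(np.uint8)
```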
Regarding claim 19, Asghar meets the claim limitations as follows.
A system (systems) [Asghar: col. 1, line 60] configured for performing multi-user gaze tracking in a vehicle space ((the XR device (e.g., AR device), a position/orientation of the occupant, user activity (e.g., gestures movement, etc.), eye tracking) [Asghar: col. 12, line 58-62]; (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18]) through multi-surface optical reflections, the system (systems) [Asghar: col. 1, line 60] comprising:
one or more hardware processors configured by machine-readable instructions to (a non-transitory computer-readable medium is provided for augmented reality for vehicle occupant assistance. The non-transitory computer-readable medium can include instructions stored thereon that, when executed by one or more processors, cause the one or more processors to) [Asghar: col. 2, line 6-11]:
a) receive reflected image data ((receiving data (e.g., messaging, video, audio, etc.), among other operations) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55] - Note: It is well known in the art that a captured image of an object is formed on an image sensor of a digital camera when light reflected from the object is focused on the image sensor. In other words, videos and images are reflected image data) of one or more occupants of a vehicle space (determining an eye gaze of an occupant of the vehicle) [Asghar: col. 3, line 17-18];
b) based on the reflected image data ((receiving data (e.g., messaging, video, audio, etc.)) [Asghar: col. 20, line 18]; (detected using one or more image sensors) [Asghar: col. 26, line 54-55] – Note: Captured images are the reflected image data), estimating the location of the one or more occupants of the vehicle space ((The computing system 100 can include one or more sensor systems 102. In some examples, the one or more sensor systems 102 can include one or more image sensors such as image sensors 104A and 104N (collectively "image sensors 104" hereinafter), a location sensor) [Asghar: col. 18, line 49-55]; (a positioning beacon, a location measurement unit) [Asghar: col. 20, line 34-35]; (To illustrate, in some examples, the one or more compute components 110 can perform monitoring (e.g., device monitoring,