DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The claims will be read under the broadest reasonable interpretation standard outlined in
MPEP § 2111.01.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 12 and 19 are rejected under 35 U.S.C. 112(d), as being of improper dependent form for failing to further limit the subject matter of the claims upon which they depend. Claims 12 and 19 recite the systems of claims 1 and 13, respectively, wherein the processor/system is comprised in at least one of a number of systems for various purposes. In accordance with the rulings of the Patent Trial and Appeal Board, this is improper. See Ex Parte WENZHENG CHEN, YUXUAN ZHANG, SANJA FIDLER, HUAN LING, JUN GAO, and ANTONIO TORRALBA BARRIUSO, decision of the Patent Trial and Appeal Board, App. No. 17/981,770, Appeal 2024-003924, pages 7-9 (“‘An intended use or purpose usually will not limit the scope of the claim because such statements usually do no more than define a context in which the invention operates.’ Boehringer Ingelheim Vetmedica, Inc. v. Schering-Plough Corp., 320 F.3d 1339, 1345 (Fed. Cir. 2003). Although ‘[s]uch statements often…appear in the claim’s preamble,’ a statement of intended use or purpose can appear elsewhere in a claim. In re Stencel, 828 F.2d 751, 754 (Fed. Cir. 1987).”)
Claims 12 and 19 fail to provide any limitations that distinguish the claimed processor/system in actual structure. The claims only discuss intended usages for the very same processor/system of claims 1 and 13. Accordingly, the claims are rejected under 35 U.S.C. § 112(d).
Applicant may cancel the claims, amend the claims to place the claims in proper dependent form, rewrite the claims in independent form, or present a sufficient showing that the dependent claims comply with the statutory requirements.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 9-15, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hwang et al. (US 20210217195 A1) (Hereinafter, “Hwang”) in view of Fanelli et al., “Real Time Head Pose Estimation from Consumer Depth Cameras” (Hereinafter, “Fanelli”).
With respect to claim 1, Hwang teaches one or more processors comprising processing circuitry to ([14]):
capture optical image data representing at least a portion of the occupant of the vehicle, wherein the optical image data is synchronized with the depth image data ([196]; [201])
translate the at least one 3D measurement into a frame of reference of the optical image data to generate at least one 3D ground truth measurement ([234]-[236]):
[Image: excerpt reproduced from Hwang]
update a machine learning model to generate a prediction of a 3D pose of at least a portion of the occupant based at least on the optical image data and the at least one 3D ground truth measurement ([234]-[236]; Fig. 5):
[Image: Hwang, Fig. 5]
Hwang does not explicitly teach circuitry to:
determine at least one three-dimensional (3D) measurement corresponding to a head pose of an occupant of a vehicle based at least on a deviation between a first 3D point cloud representation of a model customized for the occupant of the vehicle and a second 3D point cloud representation of at least a portion of the occupant of the vehicle based at least on depth image data;
However, Fanelli, in the same field of endeavor of head pose estimation, teaches the same ([4]; Fig. 3):
[Image: Fanelli, Fig. 3]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang to include the elements of depth-based 3D point cloud modeling as taught by Fanelli. Doing so would provide a specific means of displaying the 3D information already collected by Hwang. Including this means would allow the system of Hwang to better compare a 2D-estimated head pose model with a 3D head pose model – the ultimate goal of Hwang.
The method of Fanelli also allows for greater personalization of the estimation. This increases the accuracy of the system by accommodating user-specific features. As a system of 3D depth sensing, Hwang readily integrates the teachings of Fanelli with no compromise to underlying functionality.
With respect to claim 2, Hwang and Fanelli teach the elements of claim 1 upon which claim 2 depends.
Fanelli further teaches the additional limitations of claim 2:
generate the model customized for the occupant of the vehicle based at least on optimizing a generic model using a 3D point cloud representation of at least a portion of the occupant of the vehicle ([4]; Fig. 3).
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to further modify Hwang to include the specific means of 3D point cloud optimization taught by Fanelli. Fanelli teaches not only the basic concept of 3D point modeling, but the exact process of optimizing a generic model for a specific vehicle occupant. A person of ordinary skill in the art would be motivated to include this means, such that a template could be used as a starting point to decrease the difficulty of customization.
With respect to claim 3, Hwang and Fanelli teach the elements of claim 2 upon which claim 3 depends.
Fanelli further teaches the additional limitations of claim 3:
output instructions to the occupant of the vehicle to rotate their head during a registration process to generate the first 3D point cloud representation ([4])
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to further modify Hwang to include user instruction during a registration process. Fanelli teaches not only the basic concept of 3D point modeling, but the exact process of optimizing a generic model for a specific vehicle occupant. A person of ordinary skill in the art would be motivated to include these specific methods, to decrease the difficulty of customization.
With respect to claim 4, Hwang and Fanelli teach the elements of claim 1 upon which claim 4 depends.
Hwang further teaches the additional limitations of claim 4:
synchronously operate a depth sensor and one or more optical image sensors to capture the depth image data and the optical data ([192]-[194]):
[Image: excerpt reproduced from Hwang]
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 4. The synchronous operation of the sensors is a goal of both Hwang and the claimed invention. Fanelli is neutral as to this point, and would be consulted for purposes related to 3D modeling.
With respect to claim 9, Hwang and Fanelli teach the elements of claim 1 upon which claim 9 depends.
Fanelli further teaches the additional limitations of claim 9:
compute the deviation between the first 3D point cloud representation and the second 3D point cloud representation of at least a portion of the head of the occupant of the vehicle based at least on an iterative closest point algorithm ([4])
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang to further include the iterative closest point algorithm as a specific means of modeling 3D head pose changes. Doing so would provide a known method for tracking rotations, readily compatible with the broader systems of 3D modeling and depth sensing.
With respect to claim 10, Hwang and Fanelli teach the elements of claim 1 upon which claim 10 depends.
Hwang further teaches the additional limitation of claim 10:
generate a training sample to train the machine learning model, wherein the training sample comprises an image frame based at least on the optical image data, and a 3D ground truth label based on the at least one 3D ground truth measurement (Fig. 5 (S508, S509)):
[Image: Hwang, Fig. 5]
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 10. The validation of 2D estimations with 3D information is a goal of both Hwang and the claimed invention. Fanelli is neutral as to this point, and would be consulted for purposes related to 3D modeling.
With respect to claim 11, Hwang and Fanelli teach the elements of claim 1 upon which claim 11 depends.
Hwang further teaches the additional limitation of claim 11:
wherein the optical data comprises image frames captured by a plurality of cameras within the vehicle having different points of view of the occupant of the vehicle (Fig. 9; [189] “The AI device 100 may include a plurality of cameras 901, 902, 903, 904, and 905 in order to acquire a 2D image and 3D head pose information of the head of the person. The plurality of cameras may include at least one 2D image camera including a 2D image sensor and at least one 3D camera including a 3D image sensor.”)
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 11. The effective gathering of 2D information for machine learning analysis is a goal of both Hwang and the claimed invention. Fanelli is neutral as to this point, and would be consulted for purposes related to 3D modeling.
With respect to claim 12, Hwang and Fanelli teach the elements of claim 1 upon which claim 12 depends.
Hwang further teaches the additional limitation of claim 12, wherein the one or more processors are comprised in at least one of [including]:
a system for performing deep learning operations ([81])
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 12. The application of deep learning for head pose estimation is an aspect of both Hwang and the claimed invention. Fanelli, while not explicitly disclosing deep learning, teaches machine learning more generally. All teachings may be used in the furtherance of deep learning without contradiction.
With respect to claim 13, Hwang teaches a system comprising one or more processors ([14]) to:
capture optical image data representing at least a portion of an occupant of a vehicle (Fig. 12)
using a machine learning model, generate a prediction of a three-dimensional (3D) pose corresponding to the occupant based at least on the optical image data, wherein the machine learning model is to infer the predicted 3D pose (Fig. 5) based at least on:
optical image data representing at least a portion of the training subject, wherein the optical image data is synchronized with the depth image data (Fig. 5)
Hwang does not explicitly teach an inference of the 3D predicted pose based at least on:
3D ground truth measurement data representing a deviation between a first 3D point cloud representation of a model customized for a training subject and a second 3D point cloud representation of at least a portion of the training subject based at least on depth image data.
However, Fanelli, in the same field of endeavor of 3D head pose estimation, teaches the same ([4]).
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang to include the elements of depth-based 3D point cloud modeling as taught by Fanelli. Doing so would provide a specific means of displaying the 3D information already collected by Hwang. Including this means would allow the system of Hwang to better compare a 2D-estimated head pose model with a 3D head pose model – the ultimate goal of Hwang.
The method of Fanelli also allows for greater personalization of the estimation. This increases the accuracy of the system by accommodating user-specific features. As a system of 3D depth sensing, Hwang readily integrates the teachings of Fanelli with no compromise to underlying functionality.
With respect to claim 14, Hwang and Fanelli teach the elements of claim 13 upon which claim 14 depends.
Hwang further teaches the additional limitation of claim 14:
control the vehicle to perform one or more operations based at least on the predicted 3D pose ([127] “For example, when it is determined that the driver is in a drowsy state, the robot 100 a may activate the self-driving function of the self-driving vehicle 100 b or assist the control of the driver of the self-driving vehicle 100 b.”)
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 14. The control of vehicle operations is a goal of both Hwang and the claimed invention. It is a logical next step that a vehicle response would be triggered by any determination of 3D head pose. Fanelli is neutral as to this point, and would be consulted for purposes related to 3D modeling.
With respect to claim 15, Hwang and Fanelli teach the elements of claim 13 upon which claim 15 depends.
Fanelli further teaches the additional limitations of claim 15:
generate the model customized for the training subject based at least on optimizing a generic model using a 3D point cloud representation of at least a portion of the training subject ([4])
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to further modify Hwang to include the specific means of 3D point cloud optimization taught by Fanelli. Fanelli teaches not only the basic concept of 3D point modeling, but the exact process of optimizing a generic model for a specific vehicle occupant. A person of ordinary skill in the art would be motivated to include this means, such that a template could be used as a starting point to decrease the difficulty of customization.
With respect to claim 19, Hwang and Fanelli teach the elements of claim 13 upon which claim 19 depends.
Hwang further teaches the additional limitation of claim 19, wherein the system is comprised in at least one of [including]:
a system for performing deep learning operations ([81])
A person of ordinary skill in the art as of the effective filing date of the claimed invention would not be dissuaded from combining Hwang with Fanelli due to the limitations of claim 19. The application of deep learning for head pose estimation is an aspect of both Hwang and the claimed invention. Fanelli, while not explicitly disclosing deep learning, teaches machine learning more generally. All teachings may be used in the furtherance of deep learning without contradiction.
With respect to claim 20, it recites the elements of claim 1 more broadly, and as a method. Accordingly, it is taught by Hwang and Fanelli for the reasons discussed in the rejection of claim 1 above. A processor is capable of performing the recited method.
Claim 5 is rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Komenczi et al. (US 20150161818 A1) (Hereinafter, “Komenczi”).
With respect to claim 5, Hwang and Fanelli teach the elements of claim 4 upon which claim 5 depends.
Hwang and Fanelli do not teach the additional limitations of claim 5:
synchronously operate the depth sensor and the one or more optical image sensors based on a synchronization signal generated by the vehicle.
However, Komenczi, in the same field of endeavor of 3D modeling, teaches the same ([50]; Fig. 3):
[Image: Komenczi, Fig. 3]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang and Fanelli to include a means to synchronize the imaging systems. Doing so would ensure that their outputs are aligned, and errors are reduced. Desynchronization of imaging systems (when multiple types of systems are used) is a common problem, and Komenczi provides a specific means of overcoming this problem for one of ordinary skill in the art.
Claim 6 is rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Send et al. (US 20180276843 A1) (Hereinafter, “Send”).
With respect to claim 6, Hwang and Fanelli teach the elements of claim 4 upon which claim 6 depends.
Hwang and Fanelli do not explicitly teach the additional limitations of claim 6:
trigger the depth sensor to capture the depth image data based on an offset-synchronization from triggering the one or more optical image sensors to capture the optical image data.
However, Send, in the same field of endeavor of optical sensing, teaches the same ([6]; [59]; [499]):
[Images: excerpts reproduced from Send]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention to modify Hwang and Fanelli to include the offset synchronization of Send. One of ordinary skill in the art would readily consult Send for the purposes of specifically detailing a principal-agent offset synchronization relationship between optical sensing and depth sensing processes. This would have the advantage of incorporating interval timing into the system, should this function be necessary for other downstream processing and machine learning tasks.
Claim 7 is rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Liu et al., “Head Pose Estimation through Keypoints Matching between Reconstructed 3D Face Model and 2D Image” (Hereinafter, “Liu”).
With respect to claim 7, Hwang and Fanelli teach the elements of claim 1 upon which claim 7 depends.
Hwang and Fanelli do not explicitly teach the additional limitations of claim 7:
update the machine learning model further based on an input comprising the model customized for the occupant of the vehicle, the input being translated into the frame of reference of the optical image data.
However, Liu, in the same field of endeavor of 3D modeling, teaches the same ([3]; [3.3.1]; Fig. 2):
[Image: Liu, Fig. 2]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang and Fanelli to include the elements of 3D modeling input and frame transfer as taught by Liu. Doing so would allow the 3D representation and the 2D optical image data to be more closely aligned. The teachings integrate readily with the general concepts of 3D/2D machine learning disclosed by Hwang, and the specific depth-based 3D modeling of Fanelli.
Claims 8 and 16 are rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Selim et al., “AutoPOSE: Large-Scale Automotive Driver Head Pose and Gaze Dataset with Deep Head Orientation Baseline” (Hereinafter, “Selim”).
With respect to claim 8, Hwang and Fanelli teach the elements of claim 1 upon which claim 8 depends.
Hwang and Fanelli do not explicitly teach the additional limitations of claim 8:
translate the at least one 3D measurement into the frame of reference of the optical image data based on applying one or more extrinsic calibration parameters representing one or more rotation-translation (RT) transforms between a depth sensor that captured the depth image data and one or more optical image sensors that captured the optical image data.
However, Selim, in the same field of endeavor of 3D modeling, teaches the same ([4]):
[Images: excerpts reproduced from Selim]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang and Fanelli to include the elements of depth-imaging calibration as taught by Selim. Doing so would align the output of the depth sensor with the optical sensor, as would be necessary for accurate computation. The methods of Hwang and Fanelli readily integrate with these teachings, as Hwang seeks to correlate 3D and 2D outputs, and Fanelli seeks to accurately model 3D representations based on depth data.
With respect to claim 16, Hwang and Fanelli teach the elements of claim 13 upon which claim 16 depends.
Hwang and Fanelli do not explicitly teach the additional limitations of claim 16:
wherein the ground truth measurement data is translated into a frame of reference of the optical image data based on applying one or more extrinsic calibration parameters representing one or more rotation-translation (RT) transforms between a depth sensor that captured the depth image data and one or more optical image sensors that captured the optical image data.
However, Selim, in the same field of endeavor of 3D modeling, teaches the same ([4]).
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to modify Hwang and Fanelli to include the elements of depth-imaging calibration as taught by Selim. Doing so would align the output of the depth sensor with the optical sensor, as would be necessary for accurate computation. The methods of Hwang and Fanelli readily integrate with these teachings, as Hwang seeks to correlate 3D and 2D outputs, and Fanelli seeks to accurately model 3D representations based on depth data.
Claim 17 is rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Gausebeck et al. (US 20190026956 A1) (Hereinafter, “Gausebeck”).
With respect to claim 17, Hwang and Fanelli teach the elements of claim 13 upon which claim 17 depends.
Hwang and Fanelli do not explicitly teach the additional limitation of claim 17:
apply one or more extrinsic calibration parameters to translate the predicted 3D pose to a global reference frame of the vehicle.
However, Gausebeck, in the same field of endeavor of 3D data prediction from 2D imaging, teaches the same ([73]-[75]):
[Images: excerpts reproduced from Gausebeck]
It would have been obvious to one of ordinary skill in the art as of the effective filing date of the claimed invention, to include the elements of global reference framing as taught by Gausebeck. Doing so would ensure that the multiple camera inputs of Hwang and the claimed invention are aligned in a single space. One of ordinary skill in the art would readily consult Gausebeck, in the field of predicting 3D information from 2D images, for specific solutions to overcome any issues caused by differing sensor placements and orientations.
Claim 18 is rejected under 35 U.S.C. 103 over Hwang and Fanelli in view of Miao (US 20160202757 A1) (Hereinafter, “Miao”).
With respect to claim 18, Hwang and Fanelli teach the elements of claim 13 upon which claim 18 depends.
Hwang and Fanelli do not explicitly teach the additional limitation of claim 18:
compute a gaze direction of the occupant of the vehicle based at least on the predicted 3D pose.
However, Miao, in the same field of endeavor of head pose estimation, teaches the same ([0027]):
[Image: excerpt reproduced from Miao, paragraph [0027]]
It would have been obvious to one of ordinary skill in the art to modify Hwang and Fanelli, to include the elements of gaze tracking as taught by Miao. Doing so would provide an additional use for the 3D information already capable of being captured by the system, and increase the overall functionality. A driver’s head pose alone is not completely informative of their attention, as their eyes might not be directed to the road.
Additional References
Additionally cited references (see attached PTO-892) otherwise not relied upon above have been made of record in view of the manner in which they evidence the general state of the art.
Inquiry
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NOAH WILLIAM BOYAR whose telephone number is (571)272-8392. The examiner can normally be reached 8:30 – 5:00 EST, Monday – Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NOAH W BOYAR/Examiner, Art Unit 2669
/CHAN S PARK/Supervisory Patent Examiner, Art Unit 2669