DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, 8, 11, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1).
As per claim 1, Shetty teaches the claimed:
1. A system, comprising:
a memory; and
at least one processing device, operatively coupled to the memory, configured to perform operations comprising:
(Shetty [0029]: “Embodiments further provide a computer program and a computer-readable (storage) medium including instructions that, when executed by a computer or a computer network, cause the computer or computer network to execute an embodiment of the method.”)
receiving, from a client device using a camera, two-dimensional (2D) image data representing a scene including a subject;
(Shetty [0015]: “The first step of the method is to provide one single 2D image. The single 2D image may be provided in a memory. The image may be gathered by a camera or other imaging devices.”)
Shetty alone does not explicitly teach the remaining claim limitations.
However, Shetty in combination with Free teaches the claimed:
providing, to a camera pose identification model, an input comprising information identifying a set of attributes of the camera, wherein the set of attributes of the camera comprises at least one orientation angle of the camera about at least one axis;
(Shetty [0018]: “A step of the method includes computing the shape and/or pose of the person from the predicted 2D vertex projections and from the approximated pose by using a pregiven (perspective) camera model”.
Free [0009]: “In some implementations, the appliance can include a geo-coordinate detector communicatively coupled with the CPU and configured to obtain the different locations of the image capture device, and an orientation detector communicatively coupled with the CPU and configured to obtain the different orientations of the image capture device. For example, the orientation detector includes accelerometers configured to obtain pitch and roll angles of an optical axis of the image capture device, and a compass configured to obtain a yaw angle of the image capture device.”
Free [0019]: “The rays represent the orientations of the appliance 110's camera with respect to the scene 105 and are determined by a set of three angles, usually referred to as pitch (angle with respect to a horizontal axis parallel to the scene 105,) roll (angle with respect to a horizontal axis perpendicular to the scene 105), and yaw (angle with respect to a vertical axis parallel to the scene 105).”
Shetty teaches computing the shape and pose of a person by using a pregiven perspective camera model, which corresponds to the claimed camera pose identification model. Free teaches an orientation detector that can detect and obtain the pitch and roll angles of the optical axis of an image capture device, which corresponds to the claimed orientation angle of the camera about at least one axis.)
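Examiner's note (illustration only; this sketch is not drawn from the cited references and all identifiers are hypothetical): a minimal Python example of how pitch, roll, and yaw readings of the kind Free describes could be composed into a single camera orientation matrix, assuming a Z-Y-X rotation convention.

import numpy as np

def rotation_from_pitch_roll_yaw(pitch, roll, yaw):
    # Build a 3x3 camera rotation matrix from orientation angles in
    # radians; R = Rz(yaw) @ Ry(pitch) @ Rx(roll). A real system must
    # match the sensor's axis convention.
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Example: camera tilted 10 degrees downward, level roll, facing north.
R = rotation_from_pitch_roll_yaw(np.radians(-10.0), 0.0, 0.0)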
obtaining, from the camera pose identification model, an output comprising information identifying at least one camera pose parameter and an estimated value of the at least one camera pose parameter; and
(Shetty [0021]: “In an embodiment the pose parameters for the articulated 3D mesh model are obtained by inverse geometric transformation of the computed rotations. Specifically, the inverse geometric transformation may be performed by a so-called Pseudo-Linear-Inverse-Kinematic-Solver (PLIKS). Such PLIKS algorithm for 3D human shape and pose estimation may incorporate a perspective camera while solving for all the parameters of a parametric model analytically.”
Shetty teaches obtaining a 3D human shape and pose estimation, which corresponds to obtaining the estimated value of the at least one camera pose parameter, since the camera pose parameters would necessarily be obtained before the 3D human shape and pose estimation is generated; likewise, obtaining the estimated values requires the actual camera pose parameters. Shetty states that the algorithm incorporates a perspective camera while solving for all parameters of a parametric model analytically, which includes every variable needed for the human model, including camera pose parameters such as rotation, translation, scale, and distance.)
performing at least one task based on the output, wherein performing the at least one task comprises generating a three-dimensional (3D) representation of the subject depicted in the 2D image data.
(Shetty [Abstract]: “A 3D mesh of a person shall be reconstructed based on one single 2D image.”
Shetty [0021]: “In an embodiment the pose parameters for the articulated 3D mesh model are obtained by inverse geometric transformation of the computed rotations. Specifically, the inverse geometric transformation may be performed by a so-called Pseudo-Linear-Inverse-Kinematic-Solver (PLIKS). Such PLIKS algorithm for 3D human shape and pose estimation may incorporate a perspective camera while solving for all the parameters of a parametric model analytically.”
Similar to the claim limitation above, Shetty teaches generating a 3D mesh model (3D representation) given the outputs of the camera pose parameter and the estimated value.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the camera orientation (pitch, roll, and yaw) information as taught by Free with the system of Shetty in order to account for the camera's orientation and thereby generate a more precise and accurate 3D model.
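Examiner's note (illustration only; a hypothetical sketch of a generic pinhole model, not Shetty's actual PLIKS algorithm): how a perspective camera with pose parameters (rotation R, translation t) and focal length f relates 3D mesh vertices to their 2D projections.

import numpy as np

def project_points(vertices, R, t, f):
    # vertices: (N, 3) world-space points; R: 3x3 rotation; t: (3,) translation.
    cam = vertices @ R.T + t        # transform into the camera frame
    z = cam[:, 2:3]                 # depth along the optical axis
    return f * cam[:, :2] / z       # perspective division -> (N, 2) image points

verts = np.array([[0.0, 1.7, 3.0], [0.2, 0.0, 3.1]])   # toy "mesh" vertices
uv = project_points(verts, np.eye(3), np.zeros(3), 1000.0)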
As per claim 11, this claim is similar in scope to limitations recited in claim 1, and thus is rejected under the same rationale.
As per claim 5, Shetty teaches the claimed:
5. The system of claim 1, wherein the input further comprises information identifying at least one of: a shape parameter of a 3D model of the subject, a pose parameter of the 3D model of the subject, or a vector comprising the shape parameter and the pose parameter.
(Shetty [0019]: “Embodiments provide reconstructing the 3D model from the shape/pose (that are the input parameters for a SMPL model). Thereby a camera model is incorporated during reconstruction.”
Shetty [0037]: “The Skinned Multi-Person Linear 8 (SMPL) may be used to parameterize the human body. The SMPL model 8 is a statistical parametric function parameterized by shape β and pose vectors (including relative rotation θ). The output of this function is triangulated surface mesh with e.g., N=6890 vertices. The shape parameters β are represented by a low dimensional principal component. The pose of the model is defined with the help of a kinematics chain involving a set of relative rotation vectors θ made up of e.g., 24 joints represented using axis-angle rotations. Additional model parameters represented as Φ are used in the deformation process of the SMPL model. Starting from a mean template mesh, the desired body mesh is obtained by applying forward kinematics based on the relative rotations θ and shape deformations β. The 3D body joints may be obtained by a linear combination of the mesh vertices using any desired linear regressor.”)
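Examiner's note (illustration only; a toy stand-in, not the actual SMPL implementation): the parameterization described in Shetty [0037], with a low-dimensional shape vector beta and 24 axis-angle joint rotations theta; a real model would also apply forward kinematics over theta and skinning.

import numpy as np

NUM_SHAPE = 10       # principal-component shape coefficients (beta)
NUM_JOINTS = 24      # relative axis-angle joint rotations (theta)
NUM_VERTS = 6890     # vertices of the triangulated surface mesh

beta = np.zeros(NUM_SHAPE)           # mean body shape
theta = np.zeros((NUM_JOINTS, 3))    # rest pose (all rotations zero)

def body_mesh(beta, theta, template, shape_basis):
    # Toy stand-in: deform the mean template by shape blend shapes only.
    return template + shape_basis @ beta    # (NUM_VERTS, 3) vertices

template = np.zeros((NUM_VERTS, 3))               # mean template mesh
shape_basis = np.zeros((NUM_VERTS, 3, NUM_SHAPE)) # shape blend basis
verts = body_mesh(beta, theta, template, shape_basis)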
As per claim 15, this claim is similar in scope to limitations recited in claim 5, and thus is rejected under the same rationale.
As per claim 8, Shetty and Free teach the claimed:
8. The system of claim 1, wherein the at least one camera pose parameter comprises at least one of: a vertical position of the camera relative to ground, an orientation angle of the camera, or a distance between the camera and the subject.
(Free [0009]: “In some implementations, the appliance can include a geo-coordinate detector communicatively coupled with the CPU and configured to obtain the different locations of the image capture device, and an orientation detector communicatively coupled with the CPU and configured to obtain the different orientations of the image capture device. For example, the orientation detector includes accelerometers configured to obtain pitch and roll angles of an optical axis of the image capture device, and a compass configured to obtain a yaw angle of the image capture device.”
Free teaches an orientation detector that can obtain the pitch and roll angles of an optical axis of an image capture device, which corresponds to the orientation angle of the camera.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the camera orientation (pitch, roll, and yaw) information as taught by Free with the system of Shetty in order to account for the camera's orientation and thereby generate a more precise and accurate 3D model.
As per claim 18, this claim is similar in scope to limitations recited in claim 8, and thus is rejected under the same rationale.
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1), and further in view of Varekamp (US 20210264658 A1).
As per claim 2, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Varekamp teach the claimed:
2. The system of claim 1, wherein providing the input to the camera pose identification model further comprises:
receiving sensor data from at least one sensor operatively coupled to the camera; and
generating at least one attribute of the set of attributes of the camera based at least in part on the sensor data.
(Varekamp [0085]: “For example, the apparatus may comprise a sensor input processor (not shown) which is arranged to receive data from sensors detecting the movement of a viewer or equipment related to the viewer. … For example, based on acceleration, gyro, and camera sensor data from a headset, the sensor input processor can estimate and track the position and orientation of the headset and thus the viewer's head. Alternatively or additionally, a camera may e.g. be used to capture the viewing environment and the images from the camera may be used to estimate and track the viewer's head position and orientation. The following description will focus on embodiments wherein the head pose is determined with six degrees of freedom, but it will be appreciated that fewer degrees of freedom may be considered in other embodiments. The sensor input processor 201 may then feed the head pose to the receiver 201 to use as the viewer pose.”
Varekamp teaches a sensor input processor that receives data from sensors, which can be used with a camera. The sensor input processor can estimate and track the position and orientation of the camera, which corresponds to generating at least one attribute of the set of attributes of the camera, as it can identify the orientation angle of the camera about at least one axis, determined with six degrees of freedom.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the sensor input processor as taught by Varekamp with the system of Shetty as modified by Free in order to determine and track a user's pose, position, and orientation from the sensor data and thereby generate a more realistic 3D representation from the 2D image.
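Examiner's note (illustration only; hypothetical, not Varekamp's actual sensor input processor): a minimal complementary-filter sketch of fusing gyro and accelerometer data into a camera orientation attribute.

import math

def fuse_pitch(prev_pitch, gyro_rate, accel, dt, alpha=0.98):
    # Blend the integrated gyro rate (rad/s) with the accelerometer's
    # gravity-based pitch estimate; accel = (ax, ay, az) in g units.
    ax, ay, az = accel
    accel_pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    gyro_pitch = prev_pitch + gyro_rate * dt
    return alpha * gyro_pitch + (1.0 - alpha) * accel_pitch

pitch = 0.0
pitch = fuse_pitch(pitch, gyro_rate=0.01, accel=(0.0, 0.0, 1.0), dt=0.02)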
As per claim 12, this claim is similar in scope to limitations recited in claim 2, and thus is rejected under the same rationale.
Claims 3-4 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1), and further in view of Papandreou (US 20230267687 A1).
As per claim 3, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Papandreou teach the claimed:
3. The system of claim 1, wherein the input further comprises information representing the 2D image data and information identifying at least one 2D keypoint of the 2D image data, and wherein the at least one 2D keypoint identifies at least one specific point of the subject.
(Papandreou [0021]: “The position of the key point in the 3D scene can be determined by: determining, using the same or another neural network, an intersection of a ray connecting a camera that captured the two-dimensional image to the key point of the object with an image plane of the two-dimensional image and determining the position of the key point of the object in the coordinate system of the three-dimensional scene based on the world depth and the intersection. The position of the key point in the three-dimensional scene may be further determined based on a focal length of a camera that captured the two-dimensional image.”
Papandreou [0106]: “The object 504 is associated with an object key point 508 (e.g., the pelvis or sternum of a human) located at a position P=[X, Y, Z] in the three-dimensional scene. A ray 510 (shown as a dotted line) connects the camera 502 to the key point 508, and intersects the image plane at a position p=[x, y, f] in the three-dimensional scene. This corresponds to a pixel position I=(i.sub.x, i.sub.y) in the two-dimensional image captured by the camera.”
Papandreou teaches the information representing the 2D image data as pixel positions in the 2D image captured by the camera, which correspond to the different points of the 2D image data. Papandreou also describes key points of an object within a 3D scene; since each key point corresponds to a pixel position in the 2D image captured by the camera, information identifying the 2D key point is also represented. These key points also identify a specific point of the subject, as Papandreou states the object key point can be the pelvis or sternum of a human.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the key points as taught by Papandreou with the system of Shetty as modified by Free in order to anchor the 2D image with reliable points that help build the 3D model through triangulation.
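Examiner's note (illustration only): a minimal sketch of the back-projection geometry Papandreou describes, in which the ray through a 2D key point crosses the image plane at p = [x, y, f] and is scaled by an estimated depth to give the 3D position P = [X, Y, Z].

import numpy as np

def keypoint_to_3d(pixel, f, depth):
    # pixel: (x, y) in image coordinates centered on the principal point.
    # The ray through the pixel meets the image plane at p = [x, y, f];
    # scaling by depth / f places the point at Z = depth.
    x, y = pixel
    p = np.array([x, y, f])
    return p * (depth / f)

P = keypoint_to_3d((120.0, -40.0), f=1000.0, depth=3.0)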
As per claim 13, this claim is similar in scope to limitations recited in claim 3, and thus is rejected under the same rationale.
As per claim 4, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Papandreou teach the claimed:
4. The system of claim 1, wherein the input to the camera pose identification model further comprises information identifying a set of attributes of the subject, and wherein the set of attributes of the subject comprises at least one of: a height of the subject relative to ground, a body ratio of the subject, a 2D keypoint of the subject, or a 3D keypoint of the subject.
(Papandreou [0021]: “The position of the key point in the 3D scene can be determined by: determining, using the same or another neural network, an intersection of a ray connecting a camera that captured the two-dimensional image to the key point of the object with an image plane of the two-dimensional image and determining the position of the key point of the object in the coordinate system of the three-dimensional scene based on the world depth and the intersection. The position of the key point in the three-dimensional scene may be further determined based on a focal length of a camera that captured the two-dimensional image.”
Papandreou [0106]: “The object 504 is associated with an object key point 508 (e.g., the pelvis or sternum of a human) located at a position P=[X, Y, Z] in the three-dimensional scene. A ray 510 (shown as a dotted line) connects the camera 502 to the key point 508, and intersects the image plane at a position p=[x, y, f] in the three-dimensional scene. This corresponds to a pixel position I=(i.sub.x, i.sub.y) in the two-dimensional image captured by the camera.”
Similar to the claim limitation above, Papandreou teaches information identifying a set of attributes of the subject, such as object key points (e.g., the pelvis or sternum of a human) in a 3D scene, which also correspond to pixel positions in the 2D image captured by the camera. Therefore, Papandreou teaches a set of attributes of the subject comprising 2D and 3D keypoints of the subject.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the key points as taught by Papandreou with the system of Shetty as modified by Free in order to anchor the 2D image with reliable points that help build the 3D model through triangulation.
As per claim 14, this claim is similar in scope to limitations recited in claim 4, and thus is rejected under the same rationale.
Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1), and further in view of Choi (WO 2024005619 A1).
As per claim 6, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Choi teach the claimed:
6. The system of claim 1, wherein the input further comprises information identifying a set of attributes of at least one background object in the scene, and wherein the set of attributes of the at least one background object in the scene comprises at least one of: location information describing at least one location of the at least one background object represented by the 2D image data, or at least one measure of distortion of the at least one background object based on a 2D projection of the scene.
(Choi (page 25, line 8 … line 16): “In one embodiment, the computing device 100 may detect a predetermined first landmark present in the query image from the query image using an artificial intelligence-based landmark detection model. A landmark in the present disclosure may refer to a specific object that can represent a specific area and/or location. … For example, the landmark detection model may include a model pre-trained through supervised learning to receive a query image as input and detect objects corresponding to predetermined landmarks within the query image.”
Choi teaches the detection of landmark objects, and in FIG. 6, Choi shows an image containing the detection results; this corresponds to the location information describing at least one location of the at least one background object represented by the 2D image data. Choi also states that the landmark detection model can detect objects corresponding to predetermined landmarks, which represent specific areas and/or locations, within the image. Therefore, Choi teaches information about a landmark (background object) that comprises specific area and location information in the 2D image.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the landmark detection model as taught by Choi with the system of Shetty as modified by Free in order to efficiently perform visual localization and to identify and separate background objects, so that the 3D representation of the subject is not confounded with the background.
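Examiner's note (illustration only; the classes and data layout are hypothetical): a sketch of filtering detector output down to predetermined landmark classes to obtain location information for background objects in the 2D image.

from dataclasses import dataclass

@dataclass
class Detection:
    label: str      # detected object class
    box: tuple      # (x_min, y_min, x_max, y_max) in pixels

LANDMARK_CLASSES = {"clock_tower", "bridge", "statue"}   # hypothetical

def landmark_locations(detections):
    # Keep only detections whose class is a predetermined landmark,
    # yielding location information for each background object.
    return [d for d in detections if d.label in LANDMARK_CLASSES]

dets = [Detection("clock_tower", (10, 20, 60, 200)),
        Detection("person", (80, 30, 140, 220))]
print(landmark_locations(dets))   # -> only the clock_tower detection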
As per claim 16, this claim is similar in scope to limitations recited in claim 6, and thus is rejected under the same rationale.
As per claim 7, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Choi teach the claimed:
7. The system of claim 1, wherein the set of attributes of the camera comprises camera calibration data.
(Choi (page 22, line 28 … page 23, line 6): “In one embodiment of the present disclosure, the additional information of the device may further include camera information of the device, and an example of such camera information may include intrinsic parameters of the camera. … For example, given the camera's internal parameters (i.e., a calibrated camera), the pose estimation algorithm uses the Perspective-n-Point (PnP) methodology”
Choi teaches that the additional information includes camera information, one example of which is the camera's intrinsic parameters, i.e., calibrated camera data.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the calibrated camera data as taught by Choi with the system of Shetty as modified by Free in order to estimate the position and orientation of the camera and generate an accurate 3D reconstruction from the 2D image.
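Examiner's note (illustration only; the point correspondences are hypothetical): a minimal sketch of the Perspective-n-Point estimation Choi references, using OpenCV's cv2.solvePnP with a calibrated intrinsic matrix.

import numpy as np
import cv2

# Intrinsic (calibration) matrix: focal length and principal point.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)    # assume negligible lens distortion

obj_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                    [1, 1, 0], [0, 0, 1], [1, 0, 1]], dtype=np.float64)
img_pts = np.array([[640, 360], [840, 365], [635, 160],
                    [835, 170], [610, 330], [790, 335]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
# rvec, tvec: the camera's rotation (axis-angle) and translation.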
As per claim 17, this claim is similar in scope to limitations recited in claim 7, and thus is rejected under the same rationale.
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1), and further in view of Sun (US 20230181144 A1).
As per claim 9, Shetty and Free alone do not explicitly teach the claimed limitations.
However, Shetty and Free in combination with Sun teach the claimed:
9. The system of claim 1, wherein the operations further comprise analyzing at least one movement of the subject by using the 3D representation, and wherein analyzing the at least one movement of the subject comprises measuring a set of motion parameters associated with the at least one movement of the subject.
(Sun [0549]: “As used herein, a motion of the target subject may be measured by one or more motion parameters, such as a moving distance, a moving direction, a moving trajectory, a change of a posture, or the like, or any combination thereof, of the target subject (or a portion thereof). A moving distance may include a pixel distance in the image domain and/or an actual distance in the physical space. A posture of the target subject may reflect one or more of a position, a pose, a shape, a size, etc., of the target subject (or a portion thereof).”
Sun [0550]: “In some embodiments, the processing device 120 may determine a motion of the target subject over the time series based on the plurality of sets of image data, and determine whether the target subject moves over the time series based on the determined motion. For example, the processing device 120 may determine whether the motion of the target subject exceeds a threshold T. If the motion of the target subject exceeds than the threshold T, the processing device 120 may determine that the target subject moves over the series of time points.”
Sun [0380]: “In some embodiments, the target subject model may be represented in a 3D image or a 2D image. … In some embodiments, the target subject model 1240 may be a 3D model or a 2D model. … The target image may be generated directly based on the reference subject model. In some embodiments, the target image may be generated based on the subject model, and the generation of the reference subject model and the target subject model may be omitted.”
Sun teaches a target subject model (the subject's 3D representation) that is analyzed by a processing device to determine a motion of the target subject. Sun also teaches that the motion of the target subject may be measured by one or more motion parameters, including a moving distance and a moving direction.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the determination of movement of a target subject model as taught by Sun with the system of Shetty as modified by Free in order to understand how and when a subject or scene evolves and changes over time.
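Examiner's note (illustration only; hypothetical): a sketch of measuring motion parameters (moving distance and direction) from a subject's tracked positions over a time series, with a movement threshold test of the kind Sun describes.

import numpy as np

def motion_parameters(positions):
    # positions: (T, 3) subject centroid per frame. Returns the total
    # moving distance and the net direction of travel (unit vector).
    steps = np.diff(positions, axis=0)
    distance = float(np.linalg.norm(steps, axis=1).sum())
    net = positions[-1] - positions[0]
    direction = net / (np.linalg.norm(net) + 1e-9)
    return distance, direction

track = np.array([[0.0, 0.0, 3.0], [0.1, 0.0, 3.0], [0.3, 0.0, 3.1]])
dist, direction = motion_parameters(track)
moved = dist > 0.05    # threshold T: does the subject move over the series?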
As per claim 19, this claim is similar in scope to limitations recited in claim 9, and thus is rejected under the same rationale.
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shetty (US 20240087230 A1) in view of Free (US 20120002086 A1), in view of Sun (US 20230181144 A1), and further in view of Halevy (WO 2022169999 A1).
As per claim 10, Shetty, Free, and Sun alone do not explicitly teach the claimed limitations.
However, Shetty, Free, and Sun in combination with Halevy teach the claimed:
10. The system of claim 9, wherein analyzing the at least one movement of the subject further comprises:
determining whether the at least one movement of the subject deviates from a target movement; and
in response to determining that the at least one movement of the subject deviates from a target movement, providing, to at least one entity, at least one of: an indication of the deviation from the target movement, or a recommendation to correct the at least one movement.
(Halevy [0230]: “Example 1 includes a method, comprising generating, for display to a user, a representation of the user performing a movement pattern of an activity from a first point of view; sensing a deviation of movement of the user from a model movement pattern for the activity; selecting a second point of view based on a type of the deviation; and generating, for display to the user, a representation of the user performing the movement pattern for the activity from the second point of view.”
Halevy teaches sensing a deviation of the user's movement from a model movement pattern (target movement) and, in response to the deviation, generating for display to the user a second point of view showing the correct movement pattern. Halevy, in FIG. 23, shows an example in which the subject performs a pushup without following the target movement, and the corner of the screen displays the proper pushup movement, which corresponds to the recommendation to correct the at least one movement.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the sensing of a deviation of a movement as taught by Halevy with the system of Shetty as modified by Free and Sun in order to determine that a movement error has occurred and to indicate that the movement form is suboptimal and needs to be corrected.
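Examiner's note (illustration only; hypothetical): a sketch of detecting a deviation of measured joint angles from a target movement pattern and indicating which joint to correct, in the spirit of Halevy.

import numpy as np

def check_movement(measured, target, tol=0.15):
    # measured, target: (T, J) joint angles in radians over T frames.
    # Returns whether any joint's mean error exceeds tol, and which
    # joint deviates most, for feedback to the user.
    err = np.abs(measured - target).mean(axis=0)
    return bool((err > tol).any()), int(err.argmax())

measured = np.zeros((50, 4)); measured[:, 2] += 0.3   # one joint drifts
deviates, joint = check_movement(measured, np.zeros((50, 4)))
if deviates:
    print(f"Deviation from target movement: correct joint {joint}")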
As per claim 20, this claim is similar in scope to limitations recited in claim 10, and thus is rejected under the same rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSHUA SUO whose telephone number is (571) 272-8387. The examiner can normally be reached Mon-Fri 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik can be reached on (571) 272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOSHUA SUO/Examiner, Art Unit 2616
/DANIEL F HAJNIK/Supervisory Patent Examiner, Art Unit 2616