Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Allowable Subject Matter
Claims 5-6 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
§ 112(f) interpretation despite the absence of “means.”
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “an optimization unit” and “a dense reconstruction unit” in claim 20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 12-16 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong (Pub No. US 20230140170 A1) in view of Cleveland (Pub No. US 20150325003 A1) and further in view of Keetha (Keetha, N., Karhade, J., Jatavallabhula, K. M., Yang, G., Scherer, S., Ramanan, D., & Luiten, J. (2023). “SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM.” arXiv preprint arXiv:2312.02126.) and further in view of Fan (Fan, Zhiwen, et al. “LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS.” arXiv e-prints (2023): arXiv-2311.).
As per claim 1, Xiong teaches the claimed:
1. A method for visual-inertial simultaneous localization and mapping (VI-SLAM), comprising: (Xiong [0041]: “As shown in FIG. 2, at an operation 241, the processing platform performs Visual-Inertial Simultaneous Localization and Mapping (“VISLAM”) to obtain a 6DOF pose 242 for the AR or XR display. According to various embodiments, the operation 241 may be implemented using a combination of the first and second rectified images 215 and 216 and inertial sensor data, such as data obtained from an inertial measurement unit (“IMU”) or other sensor disposed on the same apparatus as the first and second cameras. The operation 241 may be performed using any suitable VISLAM algorithms, including, without limitation, the RGBD-SLAM and RTABMap algorithms, or other algorithms.” Xiong also teaches the inertial measurement unit and the estimation of the camera pose based on a frame of the camera claimed below as well. Xiong fig. 1 shows the IMU and the camera under “resources”. Xiong [0006-0007] discusses tracking the pose of the IMU based on input images.).
Xiong alone does not explicitly teach the following limitation.
However, Xiong in combination with Cleveland teaches the claimed:
estimating an inertial measurement unit (IMU) bias using Interactive Closest Point (ICP) for estimating an acceleration bias; (The examiner is interpreting the Interactive Closest Point algorithm described in the specification to be the same as the Iterative Closest Point algorithm, in which error is identified and the position and mapping of objects are estimated between frames. This is used for identifying estimates of mapping and camera position, which is how it is used in both the specification and the claims. Cleveland teaches ICP used with acceleration measurements from an inertial measurement unit. Cleveland [0129]-[0135]: “Estimate absolute position of the camera given estimates of current 3D points in space adjusted with the absolute rotation and corresponding matches of 2D points in images is performed. [0130] b. Estimate absolute position of the camera given estimates of current 3D lines positions in space adjusted with the absolute rotation and corresponding matches of 2D lines in images is performed. [0131] c. Fuse the translation estimates in 8a-d adjusted with potential acceleration measurements read from an Inertial Measurement Unit if available is performed. [0132] i. The rotation matrices produced by 8a-d are multiplied by a confidence from their relative covariance matrices and summed. [0133] 10. The global scale of all measurements can be adjusted at a specified frequency as error propagates. [0134] a. Because of the use of the triplet, absolute scale of current triangulation can be immediately obtained at the next frame using the scale between the previous frame and the current frame. Having triangulations in the same scale an update of the keypoint/line absolute positions as well as an update of the absolute translation and rotation can be obtained at each step using ICL (Iterative Closest Line) or ICP (Iterative Closest Point) over the 3D line segments or points in order to increase accuracy of pose and map estimation. [0135] b. Conventional Bundle Adjustment methods can also be used.” Because the measurements are made with an IMU, the ICP serves to determine the accuracy and bias of the IMU.
It would have been obvious to use this for the acceleration bias, since acceleration measurements are being read and estimated.).
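For illustration only, the following minimal numeric sketch (hypothetical function names and simplified physics; not taken from Cleveland, Xiong, or the claims) shows how an ICP-style rigid alignment can be compared against double-integrated accelerometer data to back out an approximately constant acceleration bias:

```python
import numpy as np

def icp_rigid_step(src, dst):
    """One least-squares rigid alignment (Kabsch/SVD) between matched
    3D point sets; the basic building block of Iterative Closest Point."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def estimate_accel_bias(accel, dt, t_icp, R_wb, g=np.array([0.0, 0.0, -9.81])):
    """Double-integrate raw accelerometer samples over the window and
    compare the result against the ICP-derived translation t_icp; the
    residual, attributed to a constant bias, approximates that bias."""
    v, p = np.zeros(3), np.zeros(3)
    for a_b, R in zip(accel, R_wb):        # R_wb: world-from-body rotations
        a_w = R @ a_b + g                  # gravity-compensated world acceleration
        v = v + a_w * dt
        p = p + v * dt
    T = dt * len(accel)
    b_w = 2.0 * (p - t_icp) / (T * T)      # constant bias drifts position by 0.5*b*T^2
    return R_wb[len(R_wb) // 2].T @ b_w    # rough rotation back into the body frame
```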
Xiong teaches the claimed:
estimating a camera pose of a current frame of a camera of the IMU using Red, Green, Blue (RGB) images and IMU measurements in real-time; (Xiong [0006]: “In a first embodiment, a method for obtaining a three-dimensional scene reconstruction and dense depth map for an augmented reality (AR) or extended reality (XR) display includes obtaining, at a first time, first image data of a real-world scene from a first camera of a stereoscopic pair of an apparatus and second image data of the real-world scene from a second camera of the stereoscopic pair. The method also includes performing feature extraction on the first image data to obtain a first feature map, performing feature extraction on the second image data to obtain a second feature map, and performing pose tracking based on at least one of the first image data, the second image data, and pose data from an inertial measurement unit (IMU) of the apparatus to obtain a six-degree-of-freedom (6DOF) pose of the apparatus. The method further includes generating, based on the 6DOF pose of the apparatus, the first feature map, and the second feature map, a disparity map between the first and second image data and generating an initial depth map based on the disparity map. The method also includes generating a dense depth map of the real-world scene based on the initial depth map and a camera model of the apparatus and generating, based on the dense depth map, a three-dimensional reconstruction of at least part of the real-world scene. In addition, the method includes rendering an AR or XR display, where the AR or XR display includes one or more virtual objects positioned to contact one or more surfaces of the three-dimensional reconstruction of at least part of the real-world scene.” The use of real-world images for an AR display implies real-time operation. The camera model is the claimed camera estimation. The method can be used with algorithms involving RGB images: Xiong [0041]: “…The operation 241 may be performed using any suitable VISLAM algorithms, including, without limitation, the RGBD-SLAM and RTABMap algorithms, or other algorithms.”).
Xiong alone does not explicitly teach the remaining claim limitations.
However, Xiong in combination with Keetha teaches the claimed:
and performing dense mapping by managing a Gaussian map by optimizing 3D Gaussian parameters, (Keetha teaches Gaussian mapping for SLAM. Keetha abstract: “Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera...” Keetha pg. 2: “Direct optimization of scene parameters: As the scene is represented by Gaussians with physical 3D locations, colors, and sizes, there is a direct, almost linear (projective) gradient flow between the parameters and the dense photometric loss. Because camera motion can be thought of as keeping the camera still and moving the scene, we also have a direct gradient into the camera parameters, which enables fast optimization. Prior differentiable (implicit & volumetric) representations don’t have this, as the gradient needs to flow through (potentially many) nonlinear neural network layers.”).
expanding the Gaussian map, (Keetha pg. 2 under “Maps with explicit spatial extent”: “Furthermore, this enables easy map updates, where the map can be expanded by adding Gaussians to the unseen regions while still allowing high-fidelity rendering. In contrast, for prior implicit & volumetric map representations, it is not easy to update the map due to their interpolatory behavior, while for explicit non-volumetric representations, the photo-realistic & geometric fidelity is limited.”).
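For illustration only, a minimal PyTorch-style sketch of managing and expanding an explicit Gaussian map whose parameters are optimized directly (class and function names are hypothetical; render_fn stands in for a real differentiable splatting rasterizer and is not SplaTAM's actual code):

```python
import torch

class GaussianMap:
    """Each Gaussian carries a 3D position, an isotropic scale, a color,
    and an opacity, all optimized directly by gradient descent."""
    def __init__(self):
        self.means   = torch.empty(0, 3, requires_grad=True)   # 3D centers
        self.scales  = torch.empty(0, 1, requires_grad=True)   # isotropic radii
        self.colors  = torch.empty(0, 3, requires_grad=True)   # RGB
        self.opacity = torch.empty(0, 1, requires_grad=True)

    def expand(self, new_means, new_scales, new_colors, new_opacity):
        """Map expansion: append Gaussians for newly observed regions
        (e.g., pixels outside the rendered silhouette)."""
        def cat(old, new):
            return torch.cat([old.detach(), new.detach()]).requires_grad_(True)
        self.means   = cat(self.means,   new_means)
        self.scales  = cat(self.scales,  new_scales)
        self.colors  = cat(self.colors,  new_colors)
        self.opacity = cat(self.opacity, new_opacity)

def map_update_step(gmap, render_fn, rgb_gt, depth_gt, lr=1e-3):
    """One optimization step: differentiably render RGB and depth with the
    camera pose held fixed, then step the Gaussian parameters to reduce
    the photometric + depth error."""
    params = [gmap.means, gmap.scales, gmap.colors, gmap.opacity]
    opt = torch.optim.Adam(params, lr=lr)
    rgb, depth = render_fn(gmap)
    loss = torch.nn.functional.l1_loss(rgb, rgb_gt) \
         + torch.nn.functional.l1_loss(depth, depth_gt)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```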
Xiong alone does not explicitly teach the remaining claim limitations.
However, Xiong in combination with Keetha and Fan teaches the claimed:
and compressing the Gaussian map by removing redundant Gaussians. (Fan Fig.1: “Compressibility and Rendering Speed. We present LightGaussian to transform 3D Gaussians into a more compact representation. LightGaussian effectively prunes redundant Gaussians while preserving visual fidelity (on the left). It reduces the average storage from 724MB to 42MB and improves the FPS from 119 to 209.”).
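For illustration only, a simplified sketch of pruning redundant Gaussians; the significance score below is a rough stand-in and is not Fan's exact global-significance metric:

```python
import numpy as np

def prune_redundant_gaussians(opacity, scale, hit_count,
                              opacity_min=0.005, hits_min=1):
    """Score each Gaussian by a simple significance measure
    (opacity x volume proxy x how often it was rendered) and drop
    those below threshold, keeping a compact map."""
    volume = scale.prod(axis=1)                      # per-Gaussian volume proxy
    significance = opacity * volume * hit_count
    keep = (opacity > opacity_min) & (hit_count >= hits_min)
    keep &= significance > np.quantile(significance, 0.1)  # drop the least significant tail
    return keep                                      # boolean mask of survivors
```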
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the ICP algorithm with data from an IMU to analyze acceleration data as taught by Cleveland with the system of Xiong in order to test the accuracy of the IMU for the VI-SLAM taught by Xiong.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the Gaussian map created to perform SLAM as taught by Keetha with the system of Xiong in order to represent the objects as Gaussians and perform optimizations on them.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the compression of Gaussians by removing redundant ones as taught by Fan with the system of Xiong as modified by Keetha in order to clean the final rendering of the scene for lower file size and faster transmission.
As per claim 2, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Keetha teaches the claimed:
2. The method of Claim 1, wherein estimating an inertial measurement unit (IMU) bias comprises initializing the Gaussian map based on a first frame of the camera. (Keetha pg. 4 under “Differentiable Rendering via Splatting”: “The core of our approach is the ability to render high-fidelity color, depth, and silhouette images from our underlying Gaussian Map into any possible camera reference frame in a differentiable way.” For the first frame, Keetha does not track but instead initializes the map and sets the camera pose. Keetha pg. 4, Initialization: “For the first frame, the tracking step is skipped, and the camera pose is set to identity. In the densification step, since the rendered silhouette is empty, all pixels are used to initialize new Gaussians.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the Gaussian map representation of a scene based on the first frame of the camera as taught by Keetha with the system of Xiong in order to represent the objects in motion as Gaussians and use their position on the first frame of the camera as a reference point to validate the IMU data of Xiong.
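For illustration only, a minimal sketch of seeding a Gaussian map from the first frame in the manner of the Keetha quotation (hypothetical names; assumes a pinhole intrinsics matrix K and that the first camera pose is identity, so the camera frame is the world frame):

```python
import numpy as np

def init_gaussians_from_first_frame(rgb, depth, K):
    """With tracking skipped and the pose set to identity, every pixel
    with valid depth is back-projected through K and seeds one Gaussian
    (color from the image, radius roughly one pixel's footprint)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    means = np.stack([x, y, z], axis=1)   # 3D centers in the (identity-pose) world frame
    colors = rgb[valid]                   # per-Gaussian RGB
    radii = z / K[0, 0]                   # scale so each Gaussian covers about one pixel
    return means, colors, radii
```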
As per claim 12, Xiong teaches the claimed:
12. The method of Claim 1, determine if an incoming frame is a new keyframe based on average parallax or visual similarity. (Xiong [0042]: “Referring to the illustrative example of FIG. 2, at an operation 243, a determination is performed as to whether one or more of the pose of the apparatus or the contents of one or both rectified images 215-216 has changed. For example, in some embodiments, the determination of whether the image content has changed may include calculating a perceptual hash comparing a current rectified image frame against a previously-obtained rectified image frame to obtain a quantification of the differences in the images. Where the quantified value of the differences between present and earlier images is below a threshold value, the architecture 200 proceeds to an operation 244, where previously-determined depths and three-dimensional reconstruction of the scene are re-used, thereby avoiding waste of battery and processor resources associated with recalculating a depth map and three-dimensional reconstruction in the absence of any substantive change of view. Instead, the processing architecture 200 reverts back to the image rectification stage 210, which can process new images from the first and second cameras. However, when the operation 243 indicates that the difference between the present and past image frames exceeds a similarity threshold (such as due to activity in the scene or a change of pose), a feature mapping stage 230, a disparity mapping stage 260, a depth mapping stage 280, and a three-dimensional reconstruction stage 290 may be performed.” The current rectified frame is a new keyframe if it meets the similarity criteria. The quantified value of the differences is the Hamming distance. If it is large enough, the frame is treated as a new keyframe and the operations are performed. This is a measure of visual similarity used to determine whether the frame is the new keyframe.).
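For illustration only, a hypothetical keyframe test combining the two criteria recited in the claim, average parallax and hash-based visual similarity (thresholds are arbitrary assumptions, not from Xiong or the claims):

```python
def is_new_keyframe(parallax_px, hash_curr, hash_keyframes,
                    parallax_thresh=20.0, hamming_thresh=10):
    """Promote the incoming frame when average feature parallax is large,
    OR when its perceptual hash differs from every stored keyframe hash
    by more than a Hamming-distance threshold."""
    if parallax_px > parallax_thresh:
        return True
    dists = [bin(hash_curr ^ h).count("1") for h in hash_keyframes]
    return all(d > hamming_thresh for d in dists)
```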
As per claim 13, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Keetha teaches the claimed:
13. The method of Claim 12, performing a densification procedure to determine which pixels are to be added as new Gaussians when the incoming frame is the new keyframe. (Keetha pg. 4 under “Camera Tracking”: “We minimize the image and depth reconstruction error of the RGB-D frame with respect to camera pose parameters for t+1, but only evaluate errors over pixels within the visible silhouette. (2) Gaussian Densification. We add new Gaussians to the map based on the rendered silhouette and input depth. (3) Map Update. Given the camera poses from frame 1 to t+1, we update the parameters of all the Gaussians in the scene by minimizing the RGB and depth errors over all images up to t+1. In practice, to keep the batch size manageable, a selected subset of keyframes that overlap with the most recent frame are optimized.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the adding of new Gaussians to represent new pixels of a new frame as taught by Keetha with the system of Xiong in order to update the 3D representation of the objects in the system of Xiong and compare them with the previous ones to analyze the physical properties of the differences between them.
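For illustration only, a sketch of a silhouette-based densification mask in the spirit of the Keetha quotation (thresholds are illustrative assumptions):

```python
def densification_mask(silhouette, depth_rendered, depth_input,
                       sil_thresh=0.99, depth_err_thresh=0.05):
    """Pixels the current map does not explain (silhouette below threshold),
    or where the rendered depth sits well in front of the measured depth,
    become seeds for new Gaussians."""
    uncovered = silhouette < sil_thresh
    depth_wrong = (depth_rendered < depth_input - depth_err_thresh) & (depth_input > 0)
    return uncovered | depth_wrong   # boolean HxW mask of pixels to add
```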
As per claim 14, Xiong teaches the claimed:
14. The method of Claim 12, comprising: hashing each incoming RGB image using perceptual hashing; (Xiong [0036]: “In some cases, the camera calibration information 213 may be obtained by capturing images of the same subject (such as a test card) from the same pose, computing a perceptual hash or otherwise analyzing the obtained images to ascertain differences, calculating initial calibration values, correcting one or both images based on the calibration values, and repeating the process until corrected images are identical.”).
and comparing hash values of the incoming RGB image to hash values of a current keyframe to determine if the incoming RGB image is the new keyframe (Xiong [0042]: “Referring to the illustrative example of FIG. 2, at an operation 243, a determination is performed as to whether one or more of the pose of the apparatus or the contents of one or both rectified images 215-216 has changed. For example, in some embodiments, the determination of whether the image content has changed may include calculating a perceptual hash comparing a current rectified image frame against a previously-obtained rectified image frame to obtain a quantification of the differences in the images.”).
As per claim 15, Xiong teaches the claimed:
15. The method of Claim 14, wherein comparing hash values of the incoming RGB image to hash values of the current keyframe comprises using Hamming distance, (Xiong [0036]: “In some cases, the camera calibration information 213 may be obtained by capturing images of the same subject (such as a test card) from the same pose, computing a perceptual hash or otherwise analyzing the obtained images to ascertain differences, calculating initial calibration values, correcting one or both images based on the calibration values, and repeating the process until corrected images are identical.” Xiong teaches quantifying the difference between the current and previous images. This is the Hamming distance, which quantifies how many points are different. Xiong [0042]: “Referring to the illustrative example of FIG. 2, at an operation 243, a determination is performed as to whether one or more of the pose of the apparatus or the contents of one or both rectified images 215-216 has changed. For example, in some embodiments, the determination of whether the image content has changed may include calculating a perceptual hash comparing a current rectified image frame against a previously-obtained rectified image frame to obtain a quantification of the differences in the images. Where the quantified value of the differences between present and earlier images is below a threshold value, the architecture 200 proceeds to an operation.”).
wherein if the Hamming distance between the incoming RGB image and all current keyframes in the buffer are greater than a predetermined threshold value, the incoming RGB image is the new keyframe. (Xiong [0042]: “Referring to the illustrative example of FIG. 2, at an operation 243, a determination is performed as to whether one or more of the pose of the apparatus or the contents of one or both rectified images 215-216 has changed. For example, in some embodiments, the determination of whether the image content has changed may include calculating a perceptual hash comparing a current rectified image frame against a previously-obtained rectified image frame to obtain a quantification of the differences in the images. Where the quantified value of the differences between present and earlier images is below a threshold value, the architecture 200 proceeds to an operation 244, where previously-determined depths and three-dimensional reconstruction of the scene are re-used, thereby avoiding waste of battery and processor resources associated with recalculating a depth map and three-dimensional reconstruction in the absence of any substantive change of view. Instead, the processing architecture 200 reverts back to the image rectification stage 210, which can process new images from the first and second cameras. However, when the operation 243 indicates that the difference between the present and past image frames exceeds a similarity threshold (such as due to activity in the scene or a change of pose), a feature mapping stage 230, a disparity mapping stage 260, a depth mapping stage 280, and a three-dimensional reconstruction stage 290 may be performed.” The quantified value of the differences is the Hamming distance. If it is greater than a threshold, the frame is treated as a new keyframe and the operations are performed.).
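For illustration only, a toy perceptual hash with a Hamming-distance comparison (a block-average hash rather than the DCT-based pHash actually used in practice; the compare-by-Hamming-distance step is the same):

```python
import numpy as np

def phash64(gray):
    """Toy 64-bit perceptual hash: crop to a multiple of 8, downsample to
    8x8 by block averaging, and threshold against the mean."""
    h, w = gray.shape
    small = gray[:h - h % 8, :w - w % 8].reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    bits = (small > small.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a, b):
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")
```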
As per claim 16, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Keetha teaches the claimed:
16. The method of Claim 12, comprising: adding new Gaussians from the new keyframe to an existing Gaussians map; (Keetha abstract: “This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2× superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.” Keetha teaches updating the Gaussian map based on a new keyframe, which involves adding more Gaussians so that the current map reflects the new keyframe: “This aims to update the parameters of the 3D Gaussian Map given the set of online camera poses estimated so far. This is done again by differentiable rendering and gradient-based-optimization, however unlike tracking, in this setting the camera poses are fixed, and the parameters of the Gaussians are updated. This is equivalent to the “classic” problem of fitting a radiance field to images with known poses. However, we make two important modifications. Instead of starting from scratch, we warm-start the optimization from the most recently constructed map. We also do not optimize over all previous (key)frames but select frames that are likely to influence the newly added Gaussians. We save each nth frame as a keyframe and select k frames to optimize, including the current frame, the most recent keyframe, and k−2 previous keyframes which have the highest overlap with the current frame. Overlap is determined by taking the point cloud of the current frame depth map and determining the number of points inside the frustum of each keyframe.”).
and updating the Gaussians map by optimizing parameters of all 3D Gaussians. (Keetha introduction: “Specifically, we use an explicit volumetric representation based on 3D Gaussians [14] to Splat (Render), Track, and Map for SLAM. We find that this leads to the following benefits over existing map representations: Fast rendering and dense optimization: 3D Gaussians can be rendered as images at speeds up to 400 FPS, making them significantly faster to optimize than the implicit & volumetric alternatives. The key enabling factor for this fast optimization is the rasterization of 3D primitives. We introduce several simple modifications that make splatting even faster for SLAM, including the removal of view-dependent appearance and the use of isotropic (spherical) Gaussians. Furthermore, this allows us to use dense photometric loss for SLAM in real-time, in contrast to prior explicit & implicit map representations that rely respectively on sparse 3D geometric features or pixel sampling to maintain efficiency.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the adding of new Gaussians from a new keyframe and optimize them as taught by Keetha with the system of Xiong in order to update the representations of the objects for the VI-SLAM and accurately measure the changes in physical characteristics based on the Gaussian representation.
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Cleveland and further in view of Keetha and further in view of Fan and further in view of Sahoo (Sahoo, Bismaya. "Direct Visual-Inertial Odometry using Epipolar Constraints for Land Vehicles." (2018).).
As per claim 3, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Sahoo teaches the claimed:
3. The method of Claim 1, wherein estimating the IMU bias comprises estimating a gyroscope bias using epipolar constraints within a first predetermined number N frames of data. (Sahoo 1.2: “In this thesis, a novel direct semi-tightly coupled visual-inertial fusion technique is presented which is robust in presence of sudden, unintended spikes in IMU measurements experienced when the camera-IMU platform is mounted on a land-vehicle traversing a bumpy terrain. The primary contribution of this thesis is the development of an optimization framework that enforces epipolar constraints to correct pose priors, obtained by integrating noisy IMU measurements, while taking into account geometric misalignment arising due to direct visual optimization. To the best of the author’s knowledge, this thesis is the first to handle sudden spikes in IMU measurements in a direct visual-inertial framework.”
Sahoo teaches basing the measurement on two consecutive frames. Sahoo 4.3: “denotes the retracted rotation residual from Lie Group SO(3) to Lie algebra so(3), (·)_ij is obtained by integrating IMU measurements from time frame i to j. (·)^w denotes world frame of reference. (·)^w_i is the state at the previous time frame i and (·)^w_j is the parameter to be optimized. R, t, v, b_a, b_g denote the rotation, translation, linear velocity, accelerometer bias and gyroscope bias, respectively.” Sahoo teaches using this over only two frames to obtain an accurate pose estimation; this is the predetermined number. Sahoo abstract: “Hence, in a camera-IMU configuration, an IMU typically is used only for short-durations, i.e. in-between two camera frames. This is desirable as it not only helps to estimate the global scale, but also to give a pose estimate during temporary camera failure. Due to these reasons, a camera-IMU configuration is being increasingly used in applications such as in Unmanned Aerial Vehicles (UAVs) and Augmented/ Virtual Reality (AR/VR)”).
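For illustration only, a hypothetical residual for estimating a gyroscope bias from epipolar constraints over a short window of frames (small-angle gyro integration; not Sahoo's exact formulation):

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]], [v[2], 0., -v[0]], [-v[1], v[0], 0.]])

def epipolar_residuals(bg, gyro, dt, t_dir, pts1, pts2):
    """Integrate bias-corrected gyro rates into a relative rotation R
    (small-angle approximation), build the essential matrix E = [t]x R
    from a unit translation direction, and return the epipolar errors
    x2^T E x1 for matched unit bearing vectors between the two frames."""
    R = np.eye(3)
    for w in gyro:
        R = R @ (np.eye(3) + skew((w - bg) * dt))
    E = skew(t_dir) @ R
    return np.einsum("ni,ij,nj->n", pts2, E, pts1)
```

Such residuals could then be minimized over the bias bg with a standard nonlinear least-squares solver (e.g., scipy.optimize.least_squares).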
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Cleveland and further in view of Keetha and further in view of Fan and further in view of Sahoo and further in view of Keal (Pub No. US 9880185 B2).
As per claim 4, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Sahoo and Keal teaches the claimed:
4. The method of Claim 3, wherein estimating the accelerating bias comprises performing IMU integration with the gyroscope bias removed from gyroscope measurements. (Keal Claim 17: “A sensor device comprising: a Motion Processing Unit (MPU) including, a three-axis microelectromechanical system (MEMS) gyroscope configured to generate raw gyroscope data, the raw gyroscope data being a measurement output from the three-axis MEMS gyroscope and having a gyroscope bias; a three-axis MEMS accelerometer configured to generate raw accelerometer data; an integration device, wherein the three-axis MEMS accelerometer, three-axis MEMS gyroscope and integration device are formed on a first silicon substrate, a second silicon substrate including a first processor electrically coupled to the first silicon substrate, wherein the first processor is configured to calculate the gyroscope bias and to remove the calculated gyroscope bias from the raw gyroscope data to generate an unbiased gyroscope data, further wherein the integration device is responsive to the unbiased gyroscope data and configured to integrate the unbiased gyroscope data into a gyroscope quaternion at an integration rate and to transmit the integrated gyroscope data at a rate lower than the integration rate,” Keal performs integration on the unbiased gyroscope data to produce a gyroscope quaternion that represents the gyroscope data. Keal teaches a motion tracking device, Keal col. 2 line 57-col. 3 line 2: “In the described embodiments, a motion tracking device also referred to as Motion Processing Unit (MPU) includes at least one sensor in addition to electronic circuits. The sensors, such as the gyroscope, the compass, the accelerometer, microphone, pressure sensors, proximity, ambient light sensor, among others known in the art, are contemplated. Some embodiments include accelerometer, gyroscope, and magnetometer, which each provide a measurement along three axis that are orthogonal relative to each other referred to as a 9-axis device.” It would have been obvious to combine this removal of the gyroscope bias before integration with the IMU of Xiong, which measures acceleration information of objects, and with the gyroscope bias estimation of Sahoo.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the integration of gyroscope data with the gyroscope bias removed as taught by Keal with the IMU for VI-SLAM of Xiong and the gyroscope bias estimation of Sahoo in order to account for the gyroscope bias and more easily measure the real position and physical characteristics of the objects.
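For illustration only, a minimal dead-reckoning sketch showing IMU integration with the estimated gyroscope bias removed before integration (simplified, no noise handling; not Keal's circuit-level implementation):

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]], [v[2], 0., -v[0]], [-v[1], v[0], 0.]])

def so3_exp(phi):
    """Rodrigues' formula: rotation vector to rotation matrix."""
    th = np.linalg.norm(phi)
    if th < 1e-9:
        return np.eye(3) + skew(phi)
    K = skew(phi / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def integrate_imu(gyro, accel, bg, dt, g=np.array([0., 0., -9.81])):
    """Dead-reckon orientation, velocity, and position with the gyroscope
    bias bg subtracted first; residual position drift can then be
    attributed to the accelerometer bias (cf. the ICP comparison above)."""
    R, v, p = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        R = R @ so3_exp((w - bg) * dt)   # gyroscope bias removed before integration
        v = v + (R @ a + g) * dt         # specific force rotated to world, gravity added
        p = p + v * dt
    return R, v, p
```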
Claims 10-11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Cleveland and further in view of Keetha and further in view of Fan and further in view of Sahoo and further in view of Yu (Pub No. US 20190337344 A1).
As per claim 10, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Sahoo and Yu teaches the claimed:
10. The method of Claim 3, comprising updating the gyroscope bias during estimation of the camera pose. (Yu describes a camera pose module as an estimation of the camera position. Yu [0034]: “The location estimation and path planning system 400 includes the location estimation system 300 and the path planning system 400. Referring to FIG. 3, the location estimation system 300 includes an imaging module 310 that includes a trailer ROI detection module 312 and a feature detection and tracking module 314. In addition, the location estimation system 300 includes the iterated Extended Kalman filter 320, a camera pose module 330, and a trailer pose estimator module 340.” Yu teaches the gyroscope bias being updated during operation of the camera pose module. Yu [0039]: “The coordinate of any point in the world space are defined with respect to the world origin. Once the world coordinate system is defined, the position of the camera 142a may be defined by a position in the world space and the orientation of camera 142a may be defined by three-unit vectors orthogonal to each other. In some examples, the world origin is defined by the initial position of the camera, and the three-unit axes are defined by the initial camera orientation. The position of the camera 142a is determined or known by the camera pose module 330 as will be described below. As previously mentioned, the IMU 144 includes an accelerometer for determining the linear acceleration of the tow vehicle and a gyroscope for determining the rotational rate of the vehicle wheels. In addition, in some examples, the filter states 322, 322c, 322u include an Accelerometer bias state 324d and a Gyroscope bias state 324e. The inertial sensors such as the accelerometer and gyroscope often include small offset in the average signal output, even when there is no movement. The Accelerometer bias state 324d estimates the small offset of the accelerometer sensor in the average signal output, and the Gyroscope bias state 324e estimates the small offset of the gyroscope sensor in the average signal output.” When the gyroscope bias state estimates the small offset of the gyroscope sensor, the bias is being updated in relation to what is sensed. This occurs in conjunction with the camera pose module while the camera pose is estimated.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the updating of the gyroscope bias during the camera pose estimation as taught by Yu with the system of Xiong modified by Sahoo in order to use changes in the estimated camera position to determine the gyroscopic bias of the IMU and how that bias affects the camera in relation to the objects.
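For illustration only, a sketch of carrying a gyroscope-bias state inside a Kalman filter so that it is refined whenever the pose is updated (hypothetical 15-state layout; not Yu's exact filter):

```python
import numpy as np

# Hypothetical state layout: pose, velocity, attitude, and both IMU biases
# ride along as filter states and are re-estimated at every update.
STATE = dict(pos=slice(0, 3), vel=slice(3, 6), att=slice(6, 9),
             accel_bias=slice(9, 12), gyro_bias=slice(12, 15))

def ekf_update(x, P, z, H, R_meas):
    """Standard EKF measurement update; because gyro_bias is part of x,
    any pose measurement correlated with it (through P) also refines
    the bias while the camera pose is being estimated."""
    S = H @ P @ H.T + R_meas
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```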
As per claim 11, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Sahoo and Keetha teaches the claimed:
11. The method of Claim 10, wherein updating the gyroscope bias comprises differentially rendering RGB, depth, and silhouette images, (Keetha section 3: “Differentiable Rendering via Splatting. The core of our approach is the ability to render high-fidelity color, depth, and silhouette images from our underlying Gaussian Map into any possible camera reference frame in a differentiable way. This differentiable rendering allows us to directly calculate the gradients in the underlying scene representation (Gaussians) and camera parameters with respect to the error between the renders and provided RGB-D frames, and update both the Gaussians and camera parameters to minimize this error, thus fitting both accurate camera poses and an accurate volumetric representation of the world.”).
and adjusting the gyroscope bias to minimize a loss between the rendered RGB, depth and silhouette images and corresponding input images while keeping Gaussian parameters fixed. (Keetha pg. 4 under “Camera Tracking”: “The camera pose is then updated iteratively by gradient based optimization through differentiably rendering RGB, depth, and silhouette maps, and updating the camera parameters to minimize the following loss while keeping the Gaussian parameters fixed:” The updating of the camera parameters would serve to adjust the gyroscopic bias determined by the positions of the objects as taught by Sahoo.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the updating of camera parameters after rendering objects and using the rendered image to minimize loss as taught by Keetha with the system of Xiong in order to account for the error caused by the physical motion of the objects being represented.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the updating of camera parameters after rendering objects and representing the rendered images as Gaussians to minimize loss as taught by Keetha with the system of Sahoo in order to use Gaussians to represent the objects and measure the gyroscope bias through the camera parameters.
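For illustration only, a sketch of the masked RGB/depth tracking loss described in the Keetha quotation (render_fn stands in for a real differentiable splatting rasterizer; in the combination above, a bias-corrected pose prior would be the quantity being optimized):

```python
import torch

def tracking_loss(render_fn, pose, rgb_gt, depth_gt):
    """Render RGB, depth, and silhouette from the fixed Gaussian map at
    the candidate pose, and penalize errors only where the silhouette
    indicates the map is visible and well-observed."""
    rgb, depth, sil = render_fn(pose)        # rgb: HxWx3, depth/sil: HxW
    mask = (sil > 0.99).float()
    l_rgb   = (mask[..., None] * (rgb - rgb_gt).abs()).sum() / mask.sum().clamp(min=1)
    l_depth = (mask * (depth - depth_gt).abs()).sum() / mask.sum().clamp(min=1)
    return l_rgb + l_depth
```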
As per claim 19, this claim is similar in scope to limitations recited in claim 11, and thus is rejected under the same rationale.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Cleveland and further in view of Keetha and further in view of Fan and further in view of Sahoo and further in view of Jagadeesan (US 20220051431 A1).
As per claim 7, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with Cleveland and Jagadeesan teaches the claimed:
7. The method of Claim 1, wherein estimating the camera pose comprises estimating the camera pose using a multi-scale ICP with Compute Unified Device Architecture (CUDA) acceleration. (Jagadeesan teaches SLAM. Jagadeesan [0117]: “PnP algorithms are widely used in applications such as structure from motion and monocular simultaneous localization and mapping (SLAM), which require dealing with hundreds or even thousands of noisy feature points and outliers in real-time. The fact that outliers have a much greater impact on PnP accuracy than image Gaussian white noise makes it necessary for the PnP algorithm to handle outliers efficiently.” Jagadeesan teaches ICP using CUDA. Jagadeesan [0210]: “We employ the truncated signed distance field (TSDF) method to mosaic the raw 3D point cloud generated from pixel disparities results of stereo matching and the camera calibration parameters to obtain the extended 3D model of the tissue surface, as shown in FIG. 17, which shows an example of our model mosaicking process with a phantom. The prerequisite to perform TSDF is to align the raw 3D point cloud accurately, which is equivalent to the estimation of camera motion in this video-based 3D reconstruction problem. Conventional iterative closest points (ICP)-based model alignment is difficult to handle smooth tissue surfaces. The methods proposed in this disclosure can greatly improve the robustness of the SLAM system. Experimental results on ex- and in vivo videos captured using different types of imaging modalities have demonstrated the feasibility of our methods, and the obtained models have high quality textures and the same resolution as the input videos. We have also introduced the CUDA implementation details to accelerate the computation with the GPU and enable real-time performance.” Jagadeesan teaches a scale factor that is updated throughout the algorithm. Jagadeesan [0015]: “FIG. 3 is an illustration of the updating method of the scale factor μ.” Jagadeesan [0133]: “One possible method is to update μ according to the Euclidean distances between p.sub.i, q.sub.i and o, which works for p.sub.1 and q.sub.1 because they have close depths as o. However, this method may result in slow μ updating rate for p.sub.2 and q.sub.2 because ∥q.sub.2−o∥≈∥p.sub.2−o∥. Hence, it is more efficient to compare v.sub.i and x.sub.i to move points p.sub.i to the related lines of sight.” The examiner is treating this as the multi-scale ICP.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the CUDA hardware to implement ICP as taught by Jagadeesan with the system of Xiong modified by Cleveland in order to use an efficient computer architecture to implement the algorithm.
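For illustration only, a coarse-to-fine (multi-scale) ICP sketch; the brute-force nearest-neighbor search is the step a CUDA kernel would typically parallelize (function names hypothetical; not Jagadeesan's implementation):

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rotation/translation between matched point sets (Kabsch/SVD)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def multiscale_icp(src, dst, strides=(8, 4, 1), iters=10):
    """Run ICP on progressively denser subsamplings of the clouds,
    warm-starting each level from the previous estimate."""
    R, t = np.eye(3), np.zeros(3)
    for s in strides:
        src_s, dst_s = src[::s], dst[::s]
        for _ in range(iters):
            moved = src_s @ R.T + t
            # brute-force nearest neighbors: the hot loop a GPU kernel would own
            nn = ((moved[:, None, :] - dst_s[None, :, :]) ** 2).sum(-1).argmin(1)
            dR, dt = rigid_fit(moved, dst_s[nn])
            R, t = dR @ R, dR @ t + dt       # compose the incremental correction
    return R, t
```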
Claims 8-9, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong in view of Cleveland and further in view of Keetha and further in view of Fan and further in view of Sahoo and further in view of Jagadeesan and further in view of He (Y. He, B. Xu, Z. Ouyang and H. Li, "A Rotation-Translation-Decoupled Solution for Robust and Efficient Visual-Inertial Initialization," 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 739-748, doi: 10.1109/CVPR52729.2023.00078.).
As per claim 8, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with He teaches the claimed:
8. The method of Claim 7, comprising performing rotation-translation-decoupled optimization to provide a starting point. (He abstract: “Second, the initial velocity and gravity vector are solved with linear translation constraints in a globally optimal fashion and without reconstructing 3D point clouds. Extensive experiments have demonstrated that our method is 8 ∼ 72 times faster (w.r.t. a 10-frame set) than the state-of-the-art methods, and also presents significantly higher robustness and accuracy. The source code is available at…”
He pg. 741 under “Related Work”: “With the development of visual odometry or SfM [10, 13, 27], loosely-coupled methods for estimating VIO initial variables with high-precision camera trajectories as measurements were naturally proposed [28, 31]. Recently, Campos et al. [5] pointed out that the previous method did not consider the IMU measurement uncertainty, and proposed to use the maximum a posteriori to optimize the initial variables.” The initial variables are the starting point.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the rotation-translation-decoupled optimization as taught by He with the system of Xiong in order to isolate the rotation and translation of the movement of the objects to analyze them more clearly and understand the initial conditions of the object.
As per claims 17 and 20, these claims are similar in scope to limitations recited in claims 1-3 and 7-8, and thus are rejected under the same rationale.
As per claim 9, Xiong alone does not explicitly teach the claimed limitations.
However, Xiong in combination with He teaches the claimed:
9. The method of Claim 8, wherein performing the rotation-translation-decoupled optimization comprises estimating a relative rotation between a previous frame and a current frame using IMU pre-integration of rotation and a camera-IMU calibration matrix. (He describes the motion model before integration. He section 3: “In this section, notations are defined and IMU motion model is given. Let F_ci and F_bi denote the camera frame and IMU frame at time-index i. T_bibj is the Euclidean transformation that takes 3D points from the IMU frame at time-index j to the one at time-index i, which consists of translation p_bibj and rotation R_bibj. The calibrated extrinsic transformation from F_b to F_c is denoted by T_cb. ⌊·⌋× and ∥·∥ are the skew-symmetric operator and Euclidean norm operator, respectively.” The calibrated extrinsic transformation is the claimed calibration matrix. The time-indices i and j are two consecutive frames. The IMU integration is described in the following section, implying that the above is IMU pre-integration: “The IMU integration follows the standard approach on SO(3) manifold as proposed in [12]…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the isolated rotation and camera-IMU calibration mathematical values as taught by He with the system of Xiong in order to improve the accuracy of the IMU integration and use it to improve the accuracy of the IMU data used for VI-SLAM in Xiong.
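For illustration only, a sketch of the relation described in the He quotation: gyroscope pre-integration yields the IMU relative rotation R_bibj, and conjugation by the camera-IMU extrinsic rotation gives the relative camera rotation (names hypothetical):

```python
import numpy as np

def skew(v):
    return np.array([[0., -v[2], v[1]], [v[2], 0., -v[0]], [-v[1], v[0], 0.]])

def so3_exp(phi):
    """Rodrigues' formula: rotation vector to rotation matrix."""
    th = np.linalg.norm(phi)
    if th < 1e-9:
        return np.eye(3) + skew(phi)
    K = skew(phi / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def preintegrate_gyro(gyro, bg, dt):
    """IMU pre-integrated relative rotation R_bibj between frames i and j:
    accumulate bias-corrected gyro rates on the SO(3) manifold."""
    R = np.eye(3)
    for w in gyro:
        R = R @ so3_exp((w - bg) * dt)
    return R

def camera_relative_rotation(R_bibj, R_cb):
    """Conjugate by the camera-IMU extrinsic rotation R_cb to move the
    relative rotation into camera frames: R_cicj = R_cb @ R_bibj @ R_cb.T."""
    return R_cb @ R_bibj @ R_cb.T
```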
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS JOHN FOSTER whose telephone number is (571)272-5053. The examiner can normally be reached Mon and Fri 8:30-6, Tues-Thurs 7:30-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik can be reached at 571-272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS JOHN FOSTER/Examiner, Art Unit 2616
/DANIEL F HAJNIK/Supervisory Patent Examiner, Art Unit 2616