DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
1. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
2. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
3. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Khosla et al., US 2022/0212811 A1, and further in view of Wang, US 2022/0414911 A1.
4. As per claim 1, Khosla discloses: A vision-based optimization apparatus, comprising:
a camera fixed relative to an object; (Khosla, [0029], “A camera 108 provides a video stream 200a (shown in FIG. 1B) for use in fuel receptacle and boom tip position and pose estimation.”)
at least one processor; and a memory device storing instructions, which, when executed by the at least one processor, cause the at least one processor to, (Khosla, [0057], “In some examples, the operations illustrated in FIG. 7 are performed, at least in part, by executing instructions 902a (stored in the memory 902) by the one or more processors 904 of the computing device 900 of FIG. 9.”) at least:
receive a two-dimensional (2D) image of at least a portion of the object via the camera; (Khosla, [0040], “2D projections in an image collected by the camera”, and [0044], “identifiable locations in a two dimensional (2D) image.”)
generate 3D keypoints using a 3D model of the object and known rotational and translational information of the object; (Khosla, [0040], “3D keypoint locations on an object model to rotate and position the object in space such that the camera's view of the 3D keypoints matches the 2D pixel locations.”)
generate predicted 2D keypoints based at least in part on the generated 3D keypoints and at least one initial extrinsic parameter; (Khosla, [0065], “The 2D to 3D transform 620 estimates the boom tip 2D keypoint, solving for the boom control parameters given constraints of the boom pivot position and camera intrinsic and extrinsic parameters.”)
compare the predicted 2D keypoints and the 2D image to generate a feedback value; (Khosla, [0038], “Heatmap pixel values indicate, for each keypoint, the likelihood of a 3D object's keypoint being found at each pixel location of the image.”, and [0039], “The aircraft keypoint heatmap 400 is described in further detail in relation to FIG. 4A. In some examples, the aircraft keypoint heatmap 400 is filtered with a filter 316 which, in some examples, comprises a Kalman filter (and thus filters heatmaps across video frames). In some examples, a threshold 318 is applied to eliminate keypoints having a low confidence level.”)
based on the feedback value, update the at least one initial extrinsic parameter to generate an at least one updated initial extrinsic parameter. (Khosla, [0062], “Operation 724 includes filtering out aircraft keypoints 402 in the aircraft keypoint heatmap 400 having confidence values below a threshold. Operation 726 includes, based on at least the aircraft keypoints 402, determining a position and pose of the fuel receptacle 116 (e.g., the fuel receptacle position 330) on the aircraft 110. In some examples, the position and pose of the fuel receptacle represent 6DOF. In some examples, determining the position and pose of the fuel receptacle comprises performing the 2D to 3D transform 320 for the aircraft keypoints 402. In some examples, the 2D to 3D transform 320 for the aircraft keypoints 402 uses a PnP algorithm. In some examples, determining the position and pose of the fuel receptacle 116 comprises determining a position and pose of the aircraft 110 (e.g., the aircraft position 334). In some examples, determining the position and pose of the fuel receptacle 116 comprises identifying aircraft keypoints associated with the fiducial marker 118.”)
5. Khosla does not expressly disclose:
The extrinsic parameter is an extrinsic camera parameter.
6. Wang discloses:
The extrinsic parameter is an extrinsic camera parameter. (Wang, [0156], “Because an extrinsic camera parameter for photographing the two-dimensional image is known, that is, a pose of the camera in real three-dimensional space is known, an absolute pose of the three-dimensional model corresponding to the target modeling object in the real three-dimensional space.”)
7. Wang is analogous art with respect to Khosla because both are from the same field of endeavor, namely image processing. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate the feature that “the extrinsic parameter is an extrinsic camera parameter,” as taught by Wang, into the teaching of Khosla. The motivation for doing so would have been to provide an accurate estimation of the position of the object. Therefore, it would have been obvious to combine Wang with Khosla.
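For illustration only, the general technique recited in claim 1 (projecting 3D keypoints into the image through extrinsic camera parameters and comparing them with observed 2D keypoints) can be sketched as follows. This is a minimal sketch assuming an ideal pinhole camera without lens distortion; it is not taken from Khosla, Wang, or the application. The function names `project_keypoints` and `feedback_value`, and the intrinsic parameters `fx`, `fy`, `cx`, `cy`, are hypothetical, and the mean pixel distance is only one possible choice of feedback value.

```python
import math


def project_keypoints(points_3d, rotation, translation, fx, fy, cx, cy):
    """Project 3D keypoints to 2D pixels with an ideal pinhole model.

    rotation (3x3 nested list) and translation (3 values) stand in for the
    extrinsic camera parameters; fx, fy, cx, cy are assumed intrinsics.
    """
    pixels = []
    for X, Y, Z in points_3d:
        # Extrinsic transform: p_cam = R * p_world + t
        xc = rotation[0][0] * X + rotation[0][1] * Y + rotation[0][2] * Z + translation[0]
        yc = rotation[1][0] * X + rotation[1][1] * Y + rotation[1][2] * Z + translation[1]
        zc = rotation[2][0] * X + rotation[2][1] * Y + rotation[2][2] * Z + translation[2]
        if zc <= 0:
            raise ValueError("keypoint lies behind the camera")
        # Perspective division followed by the intrinsic mapping to pixels.
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels


def feedback_value(predicted_2d, observed_2d):
    """Mean Euclidean pixel distance between predicted and observed keypoints."""
    total = sum(
        math.hypot(pu - ou, pv - ov)
        for (pu, pv), (ou, ov) in zip(predicted_2d, observed_2d)
    )
    return total / len(predicted_2d)
```

Under these assumptions, a smaller feedback value indicates that the current extrinsic estimate places the projected keypoints closer to the keypoints observed in the 2D image.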
8. As per claim 2, Khosla in view of Wang discloses: The apparatus of claim 1, wherein the instructions further cause the at least one processor to: generate a second set of predicted 2D keypoints based at least in part on the at least one updated extrinsic camera parameter; compare the second set of predicted 2D keypoints and the 2D image to update the feedback value; and based on the updated feedback value, update the at least one updated initial extrinsic parameter. (Khosla, [0061], “Operation 718 includes determining, within the video frame 200, the aircraft bounding box 210 for the aircraft 110 to be refueled. In some examples, determining the aircraft bounding box 210 comprises determining the aircraft bounding box 210 using a first NN, the first NN comprising a CNN. Operation 720 includes cropping the video frame 200 to the aircraft bounding box 210. Operation 722 includes determining, within the (cropped) video frame 220, aircraft keypoints 402 for the aircraft 110 to be refueled. In some examples, determining the aircraft keypoints 402 comprises determining the aircraft keypoints 402 using a second NN, the second NN comprising a ResNet. In some examples, determining the aircraft keypoints 402 comprises determining the aircraft keypoints 402 within the aircraft bounding box 210. In some examples, determining the aircraft keypoints 402 comprises generating the aircraft keypoint heatmap 400 of the aircraft keypoints 402. In some examples, generating the aircraft keypoint heatmap 400 comprises determining a confidence value for each aircraft keypoint.”, and [0065], “Operation 736 includes filtering at least one of the aircraft bounding box 210, the aircraft keypoint heatmap 400, the position and pose of the aircraft 110, the position and pose of the fuel receptacle 116, the boom tip bounding box 206, the boom tip keypoint heatmap 450, or the position and pose of the boom tip 106 with a Kalman filter.”)
9. As per claim 3, Khosla in view of Wang discloses: The apparatus of claim 1, further comprising a boom control system, wherein the instructions cause the at least one processor to receive the known rotational and translational information of the object from the boom control system. (Khosla, [0033], “Boom control parameters 158, as used herein include variables that describe how the boom 104 can move (e.g., roll, pitch, yaw, translate, telescope, extend, retract, pivot, rotate, and the like) and may include limits and rates of such movement.”)
10. As per claim 4, Khosla in view of Wang discloses: The apparatus of claim 1, wherein the instructions further cause the at least one processor to utilize a neural network system to generate the predicted 2D keypoints. (Khosla, [0028], “The location occurs in stages, such as object bounding box detection in the input two-dimensional (2D) video frames, 2D keypoint (object landmark) detection, and a 2D to 3D transform that determines the 6DoF information for each of the fuel receptacle and a tip of the refueling boom. Multi-stage pose estimation pipelines use real-time deep learning-based detection algorithms, for example, a neural network (NN) such as a deep convolutional neural network (CNN), which may be a residual neural network (ResNet)”, and [0049])
11. As per claim 5, Khosla in view of Wang discloses: The apparatus of claim 1, wherein the at least one initial extrinsic parameter comprises of at least one of: an offset of the camera along an x-axis relative to a first setpoint; an offset of the camera along a y-axis relative to the first setpoint; an offset of the camera along a z-axis relative to the first setpoint; a pitch offset of the camera relative to the first setpoint; a yaw offset of the camera relative to the first setpoint; and a roll offset of the camera relative to the first setpoint. (Khosla, [0027], “Aspects of the disclosure are able to estimate the position and orientation of a three-dimensional object (e.g., an aircraft fuel receptacle) in a video stream collected by a single camera, such as in support of autonomous aerial refueling operations and/or human-assisted aerial refueling operations. For example, aspects of the disclosure locate the relative positions and orientations (poses) of an aircraft fuel receptacle and a refueling platform's refueling boom in order to automate control of the refueling boom during refueling. In some examples, position and pose information is represented as six degrees-of-freedom (6DoF) including the three-dimensional (3D) position (x, y, and z coordinates) and orientation (roll, pitch, and yaw).”)
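For illustration only, the six offsets listed in claim 5 (x, y, z, roll, pitch, yaw) are commonly assembled into a single 4x4 homogeneous transform. The sketch below assumes angles in radians and the rotation order R = Rz(yaw) · Ry(pitch) · Rx(roll); neither the cited references nor the claims specify a convention, so that order and the function names `rotation_from_rpy` and `extrinsic_matrix` are assumptions.

```python
import math


def rotation_from_rpy(roll, pitch, yaw):
    """3x3 rotation matrix for R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def extrinsic_matrix(dx, dy, dz, roll, pitch, yaw):
    """4x4 homogeneous transform built from six offsets relative to a setpoint."""
    r = rotation_from_rpy(roll, pitch, yaw)
    return [
        [r[0][0], r[0][1], r[0][2], dx],
        [r[1][0], r[1][1], r[1][2], dy],
        [r[2][0], r[2][1], r[2][2], dz],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

With all six offsets set to zero the result is the identity transform, which corresponds to a camera located exactly at the setpoint with no angular misalignment.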
12. As per claim 6, Khosla in view of Wang discloses: The apparatus of claim 5, wherein the object comprises a refueling boom of an aircraft. (Khosla, [0029], “In the arrangement 100, the refueling platform 102 uses an aerial refueling boom 104 to refuel the aircraft 110. A camera 108 provides a video stream 200a (shown in FIG. 1B) for use in fuel receptacle and boom tip position and pose estimation.”)
13. As per claim 7, Khosla in view of Wang discloses: The apparatus of claim 5, wherein the at least one initial camera extrinsic parameter comprises an estimated offset between the camera and the first setpoint, and wherein the first setpoint is an estimated location of the camera on the aircraft. (Khosla, [0055], “Camera parameter information includes the parameters used in a camera model to describe the mathematical relationship between the 3D coordinates of a point in the scene from which the light comes from and the 2D coordinates of its projection onto the image plane. Intrinsic parameters, also known as internal parameters, are the parameters intrinsic to the camera itself, such as the focal length and lens distortion. Extrinsic parameters, also known as external parameters or camera pose, are the parameters used to describe the transformation between the camera and its external world. The camera extrinsic information, resolution, magnification, and other intrinsic information are known.”)
14. Claim 8 is similar in scope to claim 1 and is thus rejected under the same rationale.
15. Claim 9 is similar in scope to claim 4 and is thus rejected under the same rationale.
16. As per claim 10, Khosla in view of Wang discloses: The computer implemented method of claim 9, further comprising training the neural network system based at least in part on: image data from camera, and the 3D model of the object, based at least in part on obtained position and rotational information the object. (Khosla, [0040], “2D projections in an image collected by the camera”, and [0044], “identifiable locations in a two dimensional (2D) image.”, and [0040], “3D keypoint locations on an object model to rotate and position the object in space such that the camera's view of the 3D keypoints matches the 2D pixel locations.”)
17. As per claim 11, Khosla in view of Wang discloses: The computer implemented method of claim 8, further comprising obtaining position and rotational information of the object, wherein generating the 3D model of the object is based at least in part on the obtained position, rotational, and joint angle information. (Khosla, [0040], “3D keypoint locations on an object model to rotate and position the object in space such that the camera's view of the 3D keypoints matches the 2D pixel locations.”, and [0056], “The boom tip 2D to 3D transform 620 uses the known angles, extrinsics, and geometry of an object at each time instance to capture its world position using a similar approach as described the boom tip bounding box derivation 660)”)
18. As per claim 12, Khosla in view of Wang discloses: The computer implemented method of claim 8, wherein: using the optimizer to minimize the feedback value further comprises using an objective function, based at least in part on the feedback value and the at least one initial extrinsic parameter and iteratively using the optimizer to minimize the feedback value; and as the feedback value is reduced, the at least one initial extrinsic parameter is optimized. (Khosla, [0101], “representing the aircraft keypoints in the aircraft keypoint heatmap with Gaussian point spread representations corresponding to the confidence values for the aircraft keypoints; [0102] filtering out aircraft keypoints in the aircraft keypoint heatmap having confidence values below a threshold; [0103] determining the position and pose of the fuel receptacle comprises performing a 2D to 3D transform for the aircraft keypoints;” and [0049], “In some examples, the boom tip keypoint heatmap 450 is filtered with a filter 616 which, in some examples, comprises a Kalman filter (and thus filters heatmaps across video frames). In some examples, a threshold 618 is applied to eliminate keypoints having a low confidence level.”)
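For illustration only, iterative minimization of an objective function over a set of parameters, as recited generally in claim 12, can be sketched with plain gradient descent. This is a minimal sketch, not the optimizer of the application or of any cited reference: the use of gradient descent with a central-difference gradient, the function names `numerical_gradient` and `minimize`, and the default step size and iteration count are all assumptions.

```python
def numerical_gradient(objective, params, eps=1e-6):
    """Central-difference estimate of the gradient of objective at params."""
    grad = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((objective(up) - objective(down)) / (2 * eps))
    return grad


def minimize(objective, initial_params, learning_rate=0.1, steps=200):
    """Repeatedly step the parameters against the gradient of the objective."""
    params = list(initial_params)
    for _ in range(steps):
        grad = numerical_gradient(objective, params)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Toy usage: a quadratic objective standing in for a feedback value that
# depends on two hypothetical extrinsic offsets.
best = minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0])
```

In a setting like the one claimed, the objective would instead be computed from the difference between predicted and observed 2D keypoints, and the step size would need tuning to the scale of that error.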
19. As per claim 13, Khosla in view of Wang discloses: The computer implemented method of claim 12, further comprising using the at least one optimized extrinsic parameter as input to determine a plurality of 3D keypoints of the object based on a second two-dimensional (2D) image of at least a portion of the object. (Khosla, “[0128] obtaining one or more 3D aircraft models and a 3D boom model; [0129] identifying points on the aircraft model that correspond to detectable keypoints in 2D images; [0130] generating training images for the first NN using a simulator that sweeps the aircraft model through various 6DoF values to produce a set of aircraft images and aircraft ground truth data, and labeling the aircraft images using the aircraft ground truth data; generating training images for the second NN using aircraft training heatmaps that correspond to the set of aircraft images, the aircraft training heatmaps having keypoints based on the identified points on the aircraft model, and labeling the aircraft training heatmaps using the aircraft ground truth data.”)
20. Claim 14 is similar in scope to claim 5 and is thus rejected under the same rationale.
21. Claim 15 is similar in scope to claim 6 and is thus rejected under the same rationale.
22. Claim 16 is similar in scope to claim 1 and is thus rejected under the same rationale.
23. Claim 17 is similar in scope to claim 4 and is thus rejected under the same rationale.
24. Claim 18 is similar in scope to claim 1 and is thus rejected under the same rationale.
25. As per claim 19, Khosla in view of Wang discloses: The non-transitory computer-readable medium of claim 16, wherein the instructions further cause the processor to obtain position and rotational information of the object, and wherein generating the 3D model of the object is based at least in part on the obtained position and rotational information. (Khosla, [0040], “The aircraft keypoint heatmap 400 (filtered and thresholded, in some examples) is provided to an aircraft 2D to 3D transform 320. In some examples, the aircraft 2D to 3D transform 320 uses a perspective-n-point (PnP) algorithm. PnP algorithms estimate the pose of a calibrated camera relative to an object, given a set of N 3D points on the object and their corresponding 2D projections in an image collected by the camera. The PnP algorithm used leverages the correspondences between the 2D pixel locations of detected keypoints and 3D keypoint locations on an object model to rotate and position the object in space such that the camera's view of the 3D keypoints matches the 2D pixel locations.”)
26. Claim 20 is similar in scope to claim 12 and is thus rejected under the same rationale.
Response to Arguments
27. Applicant’s arguments with respect to claims 1-20, filed 09/22/2025, have been considered but are moot because Applicant has amended the claims. Accordingly, new grounds of rejection are set forth above. The new grounds of rejection have been necessitated by Applicant's amendments to the claims.
Conclusion
28. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDERRAHIM MEROUAN, whose telephone number is (571) 270-5254. The examiner can normally be reached Monday to Friday, 7:30 AM to 5:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ABDERRAHIM MEROUAN/
Primary Examiner, Art Unit 2614