Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/14/2025 has been entered.
Response to Amendments
The submission dated 11/14/2025 amends claims 1, 7, 13, 14, 17, and 20. Claims 1-20 are pending.
In view of the amendments to the claims, the previously set forth claim objections and rejections under 35 U.S.C. § 112 have been withdrawn.
Response to Arguments
Applicant’s arguments with respect to the independent claim(s) have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 7, 13, 14, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the non-patent literature "Robust 3D Hand Pose Estimation From Single Depth Images Using Multi-View CNNs" by Ge et al. (hereinafter Ge), published in 2018, and alternatively under 35 U.S.C. 102(a)(1) as being anticipated by U.S. Patent Application Publication No. 2022/0351405 to Zhou et al. (hereinafter Zhou).
For claims 1 and 13, Ge as applied discloses a pose identification method comprising:
obtaining a depth image of a target (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. III, 1st full par. of sec. III(A) and FIG. 1, which teach obtaining a hand depth image from an input depth image);
obtaining feature information of the depth image and position information corresponding to the feature information (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st-5th full pars. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st–5th full pars. of sec. IV(A) and FIGS. 1, 3 and 7, which teach generating a set of projected images on multiple views from the hand depth image, mapping the projected images onto corresponding heatmaps to generate 2D probability distributions of hand joints, and inferring a 3D probability distribution of hand joints from the 2D probability distributions); and
obtaining a pose identification result of the target based on the feature information and the position information (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. IV, 1st and 2nd full pars. of sec. IV(B), 1st-5th full pars. of sec. IV(C), which teach obtaining optimized hand joint locations based on the projected image based heatmaps and the 3D probability distribution inferred therefrom);
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st full par. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st-5th full pars. of sec. IV(A), 1st and 2nd full pars. of sec. V(D) and FIGS. 1, 3 and 7, which teach that the multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
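The multi-view pipeline mapped above (depth image, per-view 2D projections, per-view joint heatmaps, fused 3D estimate) can be sketched numerically as follows. This is a minimal illustration, not Ge's CNN: the orthographic projections, the Gaussian heatmap renderer, and the expectation-based fusion are all assumptions chosen only to make the geometry concrete.

```python
import numpy as np

def project_views(points):
    # Orthographic projections of 3D points onto the xy, yz and zx planes,
    # standing in for Ge's multi-view projections of the hand depth image.
    return {"xy": points[:, [0, 1]],
            "yz": points[:, [1, 2]],
            "zx": points[:, [2, 0]]}

def heatmap(c, size=32, sigma=1.5):
    # A 2D Gaussian "probability distribution" centered on a projected joint,
    # normalized so expectations can be read off directly.
    ys, xs = np.mgrid[0:size, 0:size]
    h = np.exp(-((xs - c[0]) ** 2 + (ys - c[1]) ** 2) / (2 * sigma ** 2))
    return h / h.sum()

def fuse_views(maps, size=32):
    # Each view yields a 2D expectation; every 3D axis is observed by two
    # views, so average the two per-axis estimates into one 3D location.
    ys, xs = np.mgrid[0:size, 0:size]
    e = {v: ((h * xs).sum(), (h * ys).sum()) for v, h in maps.items()}
    return np.array([(e["xy"][0] + e["zx"][1]) / 2,
                     (e["xy"][1] + e["yz"][0]) / 2,
                     (e["yz"][1] + e["zx"][0]) / 2])

joint = np.array([[10.0, 20.0, 5.0]])          # one synthetic hand joint
maps = {v: heatmap(c[0]) for v, c in project_views(joint).items()}
recovered = fuse_views(maps)                   # approximately (10, 20, 5)
```

Ge's actual method regresses the heatmaps with view-specific CNNs and fuses them with learned weights; the plain averaging here merely illustrates why two 2D distributions per axis suffice to recover 3D joint locations.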
For claims 7 and 14, Ge as applied discloses a pose identification apparatus comprising:
a processor (see, e.g., 1st full par. of sec. V(C)) configured to:
acquire a depth image of a target (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. III, 1st full par. of sec. III(A) and FIG. 1 of Ge, which teach obtaining a hand depth image from an input depth image);
acquire feature information of the depth image and position information corresponding to the feature information (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st-5th full pars. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st–5th full pars. of sec. IV(A) and FIGS. 1, 3 and 7, which teach generating a set of projected images on multiple views from the hand depth image, mapping the projected images onto corresponding heatmaps to generate 2D probability distributions of hand joints, and inferring a 3D probability distribution of hand joints from the 2D probability distributions); and
acquire a pose identification result of the target based on the feature information and the position information (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. IV, 1st and 2nd full pars. of sec. IV(B), 1st-5th full pars. of sec. IV(C), which teach obtaining optimized hand joint locations based on the projected image based heatmaps and the 3D probability distribution inferred therefrom),
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st full par. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st-5th full pars. of sec. IV(A), 1st and 2nd full pars. of sec. V(D) and FIGS. 1, 3 and 7, which teach that the multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
For claim 20, Ge as applied discloses a pose identification method comprising:
obtaining a depth image of a target object (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. III, 1st full par. of sec. III(A) and FIG. 1 of Ge, which teach obtaining a hand depth image from an input depth image);
obtaining a coordinate map of the depth image (see, e.g., abstract, 4th and 6th full pars. of sec. I, 1st full par. of sec. VI, which teach generating a point cloud from the hand depth image);
obtaining feature information of the depth image (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st-5th full pars. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st–5th full pars. of sec. IV(A) and FIGS. 1, 3 and 7, which teach generating a set of projected images on multiple views from the hand depth image, mapping the projected images onto corresponding heatmaps to generate 2D probability distributions of hand joints, and inferring a 3D probability distribution of hand joints from the 2D probability distributions);
obtaining position information corresponding to the feature information (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st-5th full pars. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st–5th full pars. of sec. IV(A) and FIGS. 1, 3 and 7, which teach generating a set of projected images on multiple views from the hand depth image, mapping the projected images onto corresponding heatmaps to generate 2D probability distributions of hand joints, and inferring a 3D probability distribution of hand joints from the 2D probability distributions);
identifying a correspondence between a position in the depth image and a position in the feature information (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st-5th full pars. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st–5th full pars. of sec. IV(A) and FIGS. 1, 3 and 7, which teach generating a set of projected images on multiple views from the hand depth image and mapping the projected images onto corresponding heatmaps to represent the probability distribution); and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information (see, e.g., abstract, 4th full par. of sec. I, 1st full par. of sec. IV, 1st and 2nd full pars. of sec. IV(B), 1st-5th full pars. of sec. IV(C), which teach obtaining optimized hand joint locations based on the projected image based heatmaps and the 3D probability distribution inferred therefrom),
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st full par. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st-5th full pars. of sec. IV(A), 1st and 2nd full pars. of sec. V(D) and FIGS. 1, 3 and 7, which teach that the multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
For claims 1 and 13, Zhou as applied discloses a pose identification method comprising:
obtaining a depth image of a target (see, e.g., pars. 71-77 and 93-95 and FIGS. 9-11, which teach obtaining a depth image of a hand);
obtaining feature information of the depth image and position information corresponding to the feature information (see, e.g., pars. 74-92, 96-102 and 123-130 and FIGS. 3, 10 and 13, which teach obtaining plane and depth features of the keypoints in the hand and determining position coordinates of the keypoints therefrom); and
obtaining a pose identification result of the target based on the feature information and the position information (see, e.g., pars. 89-92 and 150-156 and FIGS. 5, 10 and 15, which teach determining a pose of a region of interest corresponding to the keypoints based on the position coordinates);
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
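Zhou's pairing of a 2D plane-feature map with per-position 3D coordinates, as mapped above, can be made concrete with a small sketch. The pinhole intrinsics, the single-channel feature map, and the argmax keypoint readout are illustrative assumptions, not Zhou's disclosed network.

```python
import numpy as np

def depth_to_coord_map(depth, fx, fy, cx, cy):
    # Back-project every pixel of a depth image through assumed pinhole
    # intrinsics, giving a 3D coordinate for each 2D map position.
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)          # shape (H, W, 3)

depth = np.full((8, 8), 2.0)                         # flat synthetic depth plane
coords = depth_to_coord_map(depth, fx=100.0, fy=100.0, cx=4.0, cy=4.0)

feat = np.zeros((8, 8))                              # one feature channel...
feat[3, 5] = 1.0                                     # ...peaking at a keypoint
row, col = np.unravel_index(feat.argmax(), feat.shape)
keypoint_3d = coords[row, col]                       # 3D coords at that peak
```

Reading `coords` at the feature map's peak is the sense in which "position information" supplies three-dimensional coordinates for each position of the two-dimensional feature map.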
For claims 7 and 14, Zhou as applied discloses a pose identification apparatus comprising:
a processor (see, e.g., pars. 211-217 and FIG. 18) configured to:
acquire a depth image of a target (see, e.g., pars. 71-77 and 93-95 and FIGS. 9-11, which teach obtaining a depth image of a hand);
acquire feature information of the depth image and position information corresponding to the feature information (see, e.g., pars. 74-92, 96-102 and 123-130 and FIGS. 3, 10 and 13, which teach obtaining plane and depth features of the keypoints in the hand and determining position coordinates of the keypoints therefrom); and
acquire a pose identification result of the target based on the feature information and the position information (see, e.g., pars. 89-92 and 150-156 and FIGS. 5, 10 and 15, which teach determining a pose of a region of interest corresponding to the keypoints based on the position coordinates);
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
For claim 20, Zhou as applied discloses a pose identification method comprising:
obtaining a depth image of a target object (see, e.g., pars. 71-77 and 93-95 and FIGS. 9-11, which teach obtaining a depth image of a hand);
obtaining a coordinate map of the depth image (see, e.g., abstract, 4th and 6th full pars. of sec. I, 1st full par. of sec. VI, which teach generating a point cloud from the hand depth image);
obtaining feature information of the depth image (see, e.g., pars. 74-92, 96-102, and 123-130 and FIGS. 3, 10 and 13, which teach obtaining plane and depth features of the keypoints in the hand);
obtaining position information corresponding to the feature information (see, e.g., pars. 74-92, 96-102, and 123-130 and FIGS. 3, 10 and 13, which teach determining position coordinates of the keypoints from the plane and depth features);
identifying a correspondence between a position in the depth image and a position in the feature information (see, e.g., pars. 93-118 and FIG. 11, which teach identifying a correspondence between the position in the depth image and the position/region of interest); and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information (see, e.g., pars. 89-92 and 150-156 and FIGS. 5, 10 and 15, which teach determining a pose of a region of interest corresponding to the keypoints based on the position coordinates),
wherein the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ge/Zhou in view of U.S. Patent Application Publication No. 2013/0071009 to Ha (hereinafter Ha).
For claims 5, 11 and 19, while Ge does not explicitly teach these limitations, Ha in the analogous art teaches that the processor, when obtaining the depth image of the target, is configured to:
obtain a first image and a second image of the target (see, e.g., pars. 62-67 and FIGS. 8 and 9A-B of Ha, which teach obtaining right and left view images);
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image (see, e.g., pars. 29, 62-67, 72-74 and FIGS. 8, 9A-B, 10 of Ha, which teach obtaining the minimum and maximum disparity values by defining/obtaining pivot points therebetween based on right and left view disparity estimates; the examiner interprets defining/obtaining pivot points that symmetrically matches left and right view images as the claimed coarse matching);
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach identifying/recognizing corresponding regions/group of pixels to be compressed/expanded);
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach compressing/expanding the minimum/maximum disparity values by the same amount/scale); and
obtain the depth image of the target based on the disparity map corresponding to the matching search range (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach generating right and left view output pictures based on the left and right adjusted disparity maps).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Ge to obtain the depth image as taught by Ha because doing so would shift corresponding pixels in the same direction by the same amount, maintaining the inter-relationship of disparity among the pixel pairs on the same side and providing a depth image obtaining method that is much more robust to error (see par. 27 of Ha).
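The coarse-to-fine stereo flow recited in these claims (coarse matching, minimum/maximum disparity maps, a narrowed search range, fine matching, depth from disparity) can be sketched with a toy SAD matcher. The single-pixel cost, the step-2 coarse candidate set, the plus/minus-1 refinement band, and the focal length and baseline values are all illustrative assumptions, not Ha's disclosed method.

```python
import numpy as np

def sad_match(left, right, lo, hi, step=1):
    # Per-pixel scanline search: pick the disparity d in [lo, hi] (sampled
    # every `step`) minimizing the absolute intensity difference.
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            cand = [d for d in range(int(lo[y, x]), int(hi[y, x]) + 1, step)
                    if x - d >= 0]
            if cand:
                costs = [abs(left[y, x] - right[y, x - d]) for d in cand]
                disp[y, x] = cand[int(np.argmin(costs))]
    return disp

rng = np.random.default_rng(0)
right = rng.random((4, 16))
left = np.roll(right, 2, axis=1)        # rectified pair, true disparity = 2

h, w = left.shape
zeros, full = np.zeros((h, w), int), np.full((h, w), 6, int)
coarse = sad_match(left, right, zeros, full, step=2)   # coarse matching
lo, hi = np.maximum(coarse - 1, 0), coarse + 1         # min/max disparity maps
fine = sad_match(left, right, lo, hi)                  # fine matching in range
depth = np.where(fine > 0, 100.0 * 0.1 / np.maximum(fine, 1), 0.0)  # f*B/d
```

The point of the two-stage search is the narrowed `[lo, hi]` band: the fine pass evaluates only a few candidates per pixel instead of the full disparity range, which is the efficiency-and-robustness rationale the rejection relies on.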
For claims 5, 11 and 19, while Zhou does not explicitly teach these limitations, Ha in the analogous art teaches that the processor, when obtaining the depth image of the target, is configured to:
obtain a first image and a second image of the target (see, e.g., pars. 62-67 and FIGS. 8 and 9A-B of Ha, which teach obtaining right and left view images);
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image (see, e.g., pars. 29, 62-67, 72-74 and FIGS. 8, 9A-B, 10 of Ha, which teach obtaining the minimum and maximum disparity values by defining/obtaining pivot points therebetween based on right and left view disparity estimates; the examiner interprets defining/obtaining pivot points that symmetrically matches left and right view images as the claimed coarse matching);
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach identifying/recognizing corresponding regions/group of pixels to be compressed/expanded);
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach compressing/expanding the minimum/maximum disparity values by the same amount/scale); and
obtain the depth image of the target based on the disparity map corresponding to the matching search range (see, e.g., pars. 29, 62-67 and FIGS. 8 and 9A-B of Ha, which teach generating right and left view output pictures based on the left and right adjusted disparity maps).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Zhou to obtain the depth image as taught by Ha because doing so would shift corresponding pixels in the same direction by the same amount, maintaining the inter-relationship of disparity among the pixel pairs on the same side and providing a depth image obtaining method that is much more robust to error (see par. 27 of Ha).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12,051,221 in view of Ge.
U.S. Patent Application No. 18/736,088
U.S. Patent No. 12,051,221
1. A pose identification method comprising:
obtaining a depth image of a target;
obtaining feature information of the depth image and position information corresponding to the feature information; and
obtaining a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
1. A pose identification method comprising:
obtaining a depth image of a target;
obtaining feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtaining position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtaining a pose identification result of the target based on the feature information and the position information.
While claim 1 of the patent does not explicitly teach these limitations, Ge in the analogous art teaches that the feature information is a two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., 4th and 5th full pars. of sec. I, 1st and 2nd full pars. of sec. III, 1st full par. of sec. III(A), 1st and 2nd full pars. of sec. III(B), 1st full par. of sec. IV, 1st-5th full pars. of sec. IV(A), 1st and 2nd full pars. of sec. V(D) and FIGS. 1, 3 and 7, which teach that the multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information 3D coordinates corresponding to each position of the 2D feature map, as taught by Ge, because doing so would allow a full recovery of the 3D information of hand joints (see, e.g., the 4th full par. of sec. I of Ge).
2. The pose identification method of claim 1, wherein the obtaining of the feature information of the depth image and the position information corresponding to the feature information comprises:
obtaining an initial three-dimensional coordinate map corresponding to the depth image;
obtaining the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtaining the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
2. The pose identification method of claim 1, wherein the obtaining of the feature information of the depth image and the position information corresponding to the feature information comprises:
obtaining an initial three-dimensional coordinate map corresponding to the depth image;
obtaining the feature information by performing the feature down-sampling based on an accumulated weight; and
obtaining the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
3. The pose identification method of claim 2, further comprising:
obtaining, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
3. The pose identification method of claim 2, further comprising:
obtaining the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
4. The pose identification method of claim 3, wherein the performing of the feature extraction comprises:
obtaining a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtaining a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtaining an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
4. The pose identification method of claim 3, wherein the performing of the feature extraction comprises:
obtaining a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtaining a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtaining an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
5. The pose identification method of claim 1, wherein the obtaining of the depth image of the target comprises:
obtaining a first image and a second image of the target;
obtaining a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtaining a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtaining a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtaining the depth image of the target based on the disparity map corresponding to the matching search range.
5. The pose identification method of claim 1, wherein the obtaining of the depth image of the target comprises:
obtaining a first image and a second image of the target;
obtaining a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtaining a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtaining a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtaining the depth image of the target based on the disparity map corresponding to the matching search range.
6. The pose identification method of claim 1, wherein the obtaining of the pose identification result of the target based on the feature information and the position information comprises:
obtaining normal vector feature information of each point in the depth image;
obtaining a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtaining the pose identification result of the target based on the fusion feature.
6. The pose identification method of claim 1, wherein the obtaining of the pose identification result of the target based on the feature information and the position information comprises:
obtaining normal vector feature information of each point in the depth image;
obtaining a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtaining the pose identification result of the target based on the fusion feature.
7. A pose identification apparatus comprising:
a processor configured to:
acquire a depth image of a target;
acquire feature information of the depth image and position information corresponding to the feature information; and
acquire a pose identification result of the target based on the feature information and the position information, wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
7. A pose identification apparatus comprising:
a processor configured to:
acquire a depth image of a target;
acquire feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
acquire position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
acquire a pose identification result of the target based on the feature information and the position information.
While claim 7 of the patent does not explicitly teach this limitation, Ge, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., the 4th and 5th full pars. of sec. I, the 1st and 2nd full pars. of sec. III, the 1st full par. of sec. III(A), the 1st and 2nd full pars. of sec. III(B), the 1st full par. of sec. IV, the 1st–5th full pars. of sec. IV(A), the 1st and 2nd full pars. of sec. V(D), and FIGS. 1, 3, and 7, which teach that multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Ge, because doing so would allow full recovery of the 3D information of hand joints (see, e.g., the 4th full par. of sec. I of Ge).
8. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the feature information of the depth image and the position information corresponding to the feature information, is configured to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtain the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
8. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the feature information of the depth image and the position information corresponding to the feature information, is configured to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing the feature down-sampling based on an accumulated weight; and
obtain the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
9. The pose identification apparatus of claim 8, wherein the processor is further configured to:
obtain, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
9. The pose identification apparatus of claim 8, wherein the processor is further configured to:
obtain the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
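Purely as an illustration of the kind of operation this claim language describes (none of the names below come from the record): one weight per position in each pooling window can drive the feature down-sampling and the coordinate down-sampling simultaneously. The softmax-over-magnitude weighting is an assumption; the claimed cumulative weight may be derived differently.

```python
import numpy as np

def weighted_downsample(feat, coords, k=2):
    # Down-sample a feature map and, simultaneously, its 3D coordinate
    # map, re-using one weight per position in each k x k window.
    h, w, c = feat.shape
    out_f = np.zeros((h // k, w // k, c))
    out_c = np.zeros((h // k, w // k, 3))
    for i in range(h // k):
        for j in range(w // k):
            fwin = feat[i * k:(i + 1) * k, j * k:(j + 1) * k].reshape(-1, c)
            cwin = coords[i * k:(i + 1) * k, j * k:(j + 1) * k].reshape(-1, 3)
            score = np.abs(fwin).mean(axis=1)      # assumed weighting rule
            wgt = np.exp(score - score.max())
            wgt /= wgt.sum()            # one weight per window position
            out_f[i, j] = wgt @ fwin    # feature down-sampling
            out_c[i, j] = wgt @ cwin    # coordinate down-sampling, same weights
    return out_f, out_c
```

Because both outputs share the same weights, each down-sampled feature keeps a matching 3D coordinate, which is the correspondence the claims rely on.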
10. The pose identification apparatus of claim 9, wherein the processor, when performing of the feature extraction, is configured to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
10. The pose identification apparatus of claim 9, wherein the processor, when performing of the feature extraction, is configured to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
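As an illustrative aside (not part of the record): the distance-weighted feature extraction recited above can be sketched as a 3x3 aggregation in which each neighbor's contribution is scaled by its 3D distance to the window center. The Gaussian kernel is an assumed form of the claimed distance weight.

```python
import numpy as np

def distance_weighted_extract(feat, coords, sigma=0.5):
    # For each interior position, aggregate the 3x3 neighborhood with
    # weights that decay with the neighbor's 3D distance to the center.
    h, w, c = feat.shape
    out = feat.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc, total = np.zeros(c), 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    d3 = np.linalg.norm(coords[y + dy, x + dx] - coords[y, x])
                    wgt = np.exp(-(d3 ** 2) / (2 * sigma ** 2))
                    acc += wgt * feat[y + dy, x + dx]
                    total += wgt
            out[y, x] = acc / total     # distance-weighted output feature
    return out
```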
11. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the depth image of the target, is configured to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
11. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the depth image of the target, is configured to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
12. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the pose identification result of the target based on the feature information and the position information, is configured to:
obtain normal vector feature information of each point in the depth image;
obtain a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtain the pose identification result of the target based on the fusion feature.
12. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the pose identification result of the target based on the feature information and the position information, is configured to:
obtain normal vector feature information of each point in the depth image;
obtain a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtain the pose identification result of the target based on the fusion feature information.
13. An electronic device comprising:
a memory; and
a processor,
wherein the memory stores a computer program,
wherein the processor is configured to execute the computer program to:
obtain a depth image of a target;
obtain feature information of the depth image and position information corresponding to the feature information; and
obtain a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
13. An electronic device comprising:
a memory; and
a processor,
wherein the memory stores a computer program,
wherein the processor is configured to execute the computer program to:
obtain a depth image of a target;
obtain feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtain position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtain a pose identification result of the target based on the feature information and the position information.
While claim 13 of the patent does not explicitly teach this limitation, Ge, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., the 4th and 5th full pars. of sec. I, the 1st and 2nd full pars. of sec. III, the 1st full par. of sec. III(A), the 1st and 2nd full pars. of sec. III(B), the 1st full par. of sec. IV, the 1st–5th full pars. of sec. IV(A), the 1st and 2nd full pars. of sec. V(D), and FIGS. 1, 3, and 7, which teach that multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Ge, because doing so would allow full recovery of the 3D information of hand joints (see, e.g., the 4th full par. of sec. I of Ge).
14. A non-transitory computer-readable medium storing a computer program that, when executed by a processor, causes the processor to:
obtain a depth image of a target;
obtain feature information of the depth image and position information corresponding to the feature information; and
obtain a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
14. A non-transitory computer-readable medium storing a computer program that, when executed by a processor, causes the processor to:
obtain a depth image of a target;
obtain feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtain position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtain a pose identification result of the target based on the feature information and the position information.
While claim 14 of the patent does not explicitly teach this limitation, Ge, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., the 4th and 5th full pars. of sec. I, the 1st and 2nd full pars. of sec. III, the 1st full par. of sec. III(A), the 1st and 2nd full pars. of sec. III(B), the 1st full par. of sec. IV, the 1st–5th full pars. of sec. IV(A), the 1st and 2nd full pars. of sec. V(D), and FIGS. 1, 3, and 7, which teach that multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Ge, because doing so would allow full recovery of the 3D information of hand joints (see, e.g., the 4th full par. of sec. I of Ge).
15. The non-transitory computer-readable medium of claim 14, wherein the computer program, when causing the processor to obtain the feature information of the depth image and the position information corresponding to the feature information, causes the processor to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtain the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
15. The non-transitory computer-readable medium of claim 14, wherein the computer program, when causing the processor to obtain the feature information of the depth image and the position information corresponding to the feature information, causes the processor to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing the feature down-sampling based on an accumulated weight; and
obtain the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
16. The non-transitory computer-readable medium of claim 15, wherein the computer program is further configured to cause the processor to:
obtain, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
16. The non-transitory computer-readable medium of claim 15, wherein the computer program is further configured to cause the processor to:
obtain the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
17. The non-transitory computer-readable medium of claim 16, wherein the program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
17. The non-transitory computer-readable medium of claim 16, wherein the program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
18. The non-transitory computer-readable medium of claim 16, wherein the computer program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
18. The non-transitory computer-readable medium of claim 15, wherein the computer program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
19. The non-transitory computer-readable medium of claim 14, wherein the program, when causing the processor to obtain the depth image of the target, causes the processor to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
19. The non-transitory computer-readable medium of claim 14, wherein the program, when causing the processor to obtain the depth image of the target, causes the processor to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
20. A pose identification method comprising:
obtaining a depth image of a target object;
obtaining a coordinate map of the depth image;
obtaining feature information of the depth image;
obtaining position information corresponding to the feature information;
identifying a correspondence between a position in the depth image and a position in the feature information; and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
20. A pose identification method comprising:
obtaining a depth image of a target object;
obtaining a coordinate map of the depth image;
obtaining feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtaining position information corresponding to the feature information by performing coordinate down-sampling on the coordinate map simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling;
identifying a correspondence between a position in the depth image and a position in the feature information; and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information.
While claim 20 of the patent does not explicitly teach this limitation, Ge, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., the 4th and 5th full pars. of sec. I, the 1st and 2nd full pars. of sec. III, the 1st full par. of sec. III(A), the 1st and 2nd full pars. of sec. III(B), the 1st full par. of sec. IV, the 1st–5th full pars. of sec. IV(A), the 1st and 2nd full pars. of sec. V(D), and FIGS. 1, 3, and 7, which teach that multiple 2D heatmaps correspond to 2D projections of the hand depth image and that the 3D probability distribution is 3D coordinates corresponding to all positions of the hand features mapped in the heatmaps before the optimization).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Ge, because doing so would allow full recovery of the 3D information of hand joints (see, e.g., the 4th full par. of sec. I of Ge).
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12,051,221 in view of Zhou.
U.S. Patent Application No. 18/736,088
U.S. Patent No. 12,051,221
1. A pose identification method comprising:
obtaining a depth image of a target;
obtaining feature information of the depth image and position information corresponding to the feature information; and
obtaining a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
1. A pose identification method comprising:
obtaining a depth image of a target;
obtaining feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtaining position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtaining a pose identification result of the target based on the feature information and the position information.
While claim 1 of the patent does not explicitly teach this limitation, Zhou, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, and 123-130 and FIGS. 3, 10, and 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Zhou, because doing so would improve the accuracy of determining the position information and simplify the determination of the pose of the ROI (see, e.g., pars. 92 and 235 of Zhou).
2. The pose identification method of claim 1, wherein the obtaining of the feature information of the depth image and the position information corresponding to the feature information comprises:
obtaining an initial three-dimensional coordinate map corresponding to the depth image;
obtaining the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtaining the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
2. The pose identification method of claim 1, wherein the obtaining of the feature information of the depth image and the position information corresponding to the feature information comprises:
obtaining an initial three-dimensional coordinate map corresponding to the depth image;
obtaining the feature information by performing the feature down-sampling based on an accumulated weight; and
obtaining the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
3. The pose identification method of claim 2, further comprising:
obtaining, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
3. The pose identification method of claim 2, further comprising:
obtaining the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
4. The pose identification method of claim 3, wherein the performing of the feature extraction comprises:
obtaining a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtaining a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtaining an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
4. The pose identification method of claim 3, wherein the performing of the feature extraction comprises:
obtaining a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtaining a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtaining an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
5. The pose identification method of claim 1, wherein the obtaining of the depth image of the target comprises:
obtaining a first image and a second image of the target;
obtaining a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtaining a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtaining a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtaining the depth image of the target based on the disparity map corresponding to the matching search range.
5. The pose identification method of claim 1, wherein the obtaining of the depth image of the target comprises:
obtaining a first image and a second image of the target;
obtaining a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtaining a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtaining a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtaining the depth image of the target based on the disparity map corresponding to the matching search range.
6. The pose identification method of claim 1, wherein the obtaining of the pose identification result of the target based on the feature information and the position information comprises:
obtaining normal vector feature information of each point in the depth image;
obtaining a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtaining the pose identification result of the target based on the fusion feature.
6. The pose identification method of claim 1, wherein the obtaining of the pose identification result of the target based on the feature information and the position information comprises:
obtaining normal vector feature information of each point in the depth image;
obtaining a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtaining the pose identification result of the target based on the fusion feature.
7. A pose identification apparatus comprising:
a processor configured to:
acquire a depth image of a target;
acquire feature information of the depth image and position information corresponding to the feature information; and
acquire a pose identification result of the target based on the feature information and the position information, wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
7. A pose identification apparatus comprising:
a processor configured to:
acquire a depth image of a target;
acquire feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
acquire position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
acquire a pose identification result of the target based on the feature information and the position information.
While claim 7 of the patent does not explicitly teach this limitation, Zhou, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, and 123-130 and FIGS. 3, 10, and 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be 2D feature map of the depth image and the position information is 3D coordinates corresponding to each position of the 2D feature map as taught by Zhou because doing so would improve the accuracy of determining the position information and simplified the determination the pose of the ROI (see, e.g., pars. 92 and 235 of Zhou).
8. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the feature information of the depth image and the position information corresponding to the feature information, is configured to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtain the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
8. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the feature information of the depth image and the position information corresponding to the feature information, is configured to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing the feature down-sampling based on an accumulated weight; and
obtain the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
9. The pose identification apparatus of claim 8, wherein the processor is further configured to:
obtain, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
9. The pose identification apparatus of claim 8, wherein the processor is further configured to:
obtain the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
10. The pose identification apparatus of claim 9, wherein the processor, when performing of the feature extraction, is configured to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
10. The pose identification apparatus of claim 9, wherein the processor, when performing of the feature extraction, is configured to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
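For illustration only, the distance-weighted feature extraction recited in claims 10 and 17-18 can be sketched as follows. This is a minimal NumPy sketch of one plausible reading of the claim language; the 3x3 window, the Gaussian form of the distance weight, and the `sigma` parameter are assumptions, as the claims do not specify them:

```python
import numpy as np

def distance_weighted_extraction(feat, coords, sigma=0.1):
    # feat:   (H, W) input feature map
    # coords: (H, W, 3) 3D coordinate map aligned with the feature map
    H, W = feat.shape
    out = np.zeros_like(feat)
    for y in range(H):
        for x in range(W):
            ys = slice(max(y - 1, 0), min(y + 2, H))
            xs = slice(max(x - 1, 0), min(x + 2, W))
            # 3D distance of each neighbor in the window to the center
            d = np.linalg.norm(coords[ys, xs] - coords[y, x], axis=-1)
            # Distance weight: neighbors near in 3D contribute more
            w = np.exp(-(d ** 2) / (2 * sigma ** 2))
            # Weighted aggregation stands in for a 3x3 convolution
            out[y, x] = (w * feat[ys, xs]).sum() / w.sum()
    return out
```

With a flat coordinate map all weights are equal and the operation reduces to a plain box average; across a depth discontinuity, neighbors that are far in 3D are down-weighted even though they are adjacent in the 2D map.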
11. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the depth image of the target, is configured to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
11. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the depth image of the target, is configured to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
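For illustration only, the coarse-to-fine stereo matching recited in claims 11 and 19 can be sketched as follows. This is a toy NumPy sketch under stated assumptions: a single-pixel absolute-difference matching cost, and fixed `max_disp` and `margin` values that the claims do not specify. The margin around the coarse estimate plays the role of the minimum and maximum disparity maps bounding the fine search range:

```python
import numpy as np

def coarse_to_fine_disparity(left, right, max_disp=16, margin=2):
    # left, right: (H, W) rectified grayscale image pair
    h, w = left.shape
    # Coarse matching: per-pixel matching cost over the full range
    cost = np.full((h, w, max_disp), np.inf)
    for d in range(max_disp):
        cost[:, d:, d] = np.abs(left[:, d:] - right[:, : w - d])
    coarse = cost.argmin(axis=2)
    # Minimum / maximum disparity maps bound the fine search range
    d_min = np.clip(coarse - margin, 0, max_disp - 1)
    d_max = np.clip(coarse + margin, 0, max_disp - 1)
    # Fine matching: search only inside [d_min, d_max]
    d_idx = np.arange(max_disp)[None, None, :]
    banded = np.where(
        (d_idx >= d_min[..., None]) & (d_idx <= d_max[..., None]),
        cost, np.inf)
    return banded.argmin(axis=2)

def disparity_to_depth(disp, focal=500.0, baseline=0.06, eps=1e-6):
    # Standard pinhole relation: depth = focal * baseline / disparity
    return focal * baseline / np.maximum(disp, eps)
```

In practice the coarse pass would run on down-sampled images so that restricting the fine search range actually saves computation; here both passes share one cost volume purely to keep the sketch short.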
12. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the pose identification result of the target based on the feature information and the position information, is configured to:
obtain normal vector feature information of each point in the depth image;
obtain a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtain the pose identification result of the target based on the fusion feature.
12. The pose identification apparatus of claim 7, wherein the processor, when obtaining of the pose identification result of the target based on the feature information and the position information, is configured to:
obtain normal vector feature information of each point in the depth image;
obtain a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtain the pose identification result of the target based on the fusion feature information.
13. An electronic device comprising:
a memory; and
a processor,
wherein the memory stores a computer program,
wherein the processor is configured to execute the computer program to:
obtain a depth image of a target;
obtain feature information of the depth image and position information corresponding to the feature information; and
obtain a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
13. An electronic device comprising:
a memory; and
a processor,
wherein the memory stores a computer program,
wherein the processor is configured to execute the computer program to:
obtain a depth image of a target;
obtain feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtain position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtain a pose identification result of the target based on the feature information and the position information.
While claim 13 of the patent does not explicitly teach this limitation, Zhou, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Zhou, because doing so would improve the accuracy of determining the position information and simplify the determination of the pose of the ROI (see, e.g., pars. 92 and 235 of Zhou).
14. A non-transitory computer-readable medium storing a computer program that, when executed by a processor, causes the processor to:
obtain a depth image of a target;
obtain feature information of the depth image and position information corresponding to the feature information; and
obtain a pose identification result of the target based on the feature information and the position information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
14. A non-transitory computer-readable medium storing a computer program that, when executed by a processor, causes the processor to:
obtain a depth image of a target;
obtain feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtain position information corresponding to the feature information by performing coordinate down-sampling simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling; and
obtain a pose identification result of the target based on the feature information and the position information.
While claim 14 of the patent does not explicitly teach this limitation, Zhou, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Zhou, because doing so would improve the accuracy of determining the position information and simplify the determination of the pose of the ROI (see, e.g., pars. 92 and 235 of Zhou).
15. The non-transitory computer-readable medium of claim 14, wherein the computer program, when causing the processor to obtain the feature information of the depth image and the position information corresponding to the feature information, causes the processors to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtain the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
15. The non-transitory computer-readable medium of claim 14, wherein the computer program, when causing the processor to obtain the feature information of the depth image and the position information corresponding to the feature information, causes the processors to:
obtain an initial three-dimensional coordinate map corresponding to the depth image;
obtain the feature information by performing the feature down-sampling based on an accumulated weight; and
obtain the position information by performing the coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.
16. The non-transitory computer-readable medium of claim 15, wherein the computer program is further configured to cause the processor to:
obtain, based on performing the feature down-sampling, a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling, based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
16. The non-transitory computer-readable medium of claim 15, wherein the computer program is further configured to cause the processor to:
obtain the cumulative weight based on the input feature map corresponding to the feature down-sampling and down-sampling information corresponding to the feature down-sampling.
17. The non-transitory computer-readable medium of claim 16, wherein the program, when causing the processors to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
17. The non-transitory computer-readable medium of claim 16, wherein the program, when causing the processors to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
18. The non-transitory computer-readable medium of claim 16, wherein the computer program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
18. The non-transitory computer-readable medium of claim 15, wherein the computer program, when causing the processor to perform the feature extraction, causes the processor to:
obtain a three-dimensional distance corresponding to a feature of each position in the input feature map, based on a three-dimensional coordinate map corresponding to an input feature map corresponding to the feature extraction;
obtain a distance weight corresponding to the feature of each position in the input feature map based on the three-dimensional distance; and
obtain an output feature map corresponding to the input feature map by performing feature extraction on the input feature map based on the distance weight.
19. The non-transitory computer-readable medium of claim 14, wherein the program, when causing the processor to obtain the depth image of the target, causes the processor to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
19. The non-transitory computer-readable medium of claim 14, wherein the program, when causing the processor to obtain the depth image of the target, causes the processor to:
obtain a first image and a second image of the target;
obtain a minimum disparity map and a maximum disparity map corresponding to the first image and the second image by performing coarse matching on the first image and the second image;
obtain a matching search range corresponding to the minimum disparity map and the maximum disparity map based on the minimum disparity map and the maximum disparity map;
obtain a disparity map corresponding to the matching search range by performing fine matching on the first image and the second image based on the matching search range; and
obtain the depth image of the target based on the disparity map corresponding to the matching search range.
20. A pose identification method comprising:
obtaining a depth image of a target object;
obtaining a coordinate map of the depth image;
obtaining feature information of the depth image;
obtaining position information corresponding to the feature information;
identifying a correspondence between a position in the depth image and a position in the feature information; and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information,
wherein the feature information is two-dimensional feature map of the depth image and the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map.
20. A pose identification method comprising:
obtaining a depth image of a target object;
obtaining a coordinate map of the depth image;
obtaining feature information of the depth image by performing, on the depth image, feature extraction and feature down-sampling;
obtaining position information corresponding to the feature information by performing coordinate down-sampling on the coordinate map simultaneously with the feature down-sampling based on a cumulative weight corresponding to a feature of each position in an input feature map corresponding to the feature down-sampling;
identifying a correspondence between a position in the depth image and a position in the feature information; and
obtaining a pose identification of the target object, based on identifying the correspondence between the position in the depth image and the position in the feature information.
While claim 20 of the patent does not explicitly teach this limitation, Zhou, in the analogous art, teaches that the feature information is a two-dimensional feature map of the depth image and that the position information is three-dimensional coordinates corresponding to each position of the two-dimensional feature map (see, e.g., pars. 74-92, 96-102, 123-130 and FIGS. 3, 10, 13, which teach that the plane feature is a 2D feature map of the depth image and that the position coordinates are 3D coordinates corresponding to each keypoint).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention that the feature information may be a 2D feature map of the depth image and the position information may be 3D coordinates corresponding to each position of the 2D feature map, as taught by Zhou, because doing so would improve the accuracy of determining the position information and simplify the determination of the pose of the ROI (see, e.g., pars. 92 and 235 of Zhou).
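For illustration only, the coordinate down-sampling performed simultaneously with the feature down-sampling, as recited in claim 20, might look like the following NumPy sketch. Using the feature values themselves as the per-position "cumulative weight" and max pooling for the feature path are assumptions; the claims do not define how the cumulative weight is computed:

```python
import numpy as np

def downsample_with_coords(feat, coords, k=2, eps=1e-8):
    # feat:   (H, W) input feature map (its values stand in here for
    #         the claimed per-position cumulative weight)
    # coords: (H, W, 3) initial 3D coordinate map
    H, W = feat.shape
    fw = feat.reshape(H // k, k, W // k, k)
    cw = coords.reshape(H // k, k, W // k, k, 3)
    # Feature down-sampling (max pooling in this sketch)
    pooled_feat = fw.max(axis=(1, 3))
    # Coordinate down-sampling: weight-average each window's 3D
    # coordinates with the same weights, so the pooled positions stay
    # aligned with the features that dominate the pooled response
    w = fw[..., None]
    pooled_coords = (cw * w).sum(axis=(1, 3)) / (w.sum(axis=(1, 3)) + eps)
    return pooled_feat, pooled_coords
```

The point of the weighted average, as opposed to simple strided sampling of the coordinate map, is that each down-sampled feature keeps a 3D position consistent with the input positions that produced it.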
Allowable Subject Matter
Claims 2-4, 6, 8-10, 12, and 15-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and if the outstanding double patenting and/or 112(b) rejections are overcome.
In regard to claims 2, 8 and 15, when considered as a whole, none of the cited prior art, either individually or in combination, teaches or suggests:
“wherein the obtaining of the feature information of the depth image and the position information corresponding to the feature information comprises:
obtaining an initial three-dimensional coordinate map corresponding to the depth image;
obtaining the feature information by performing, on the depth image, feature extraction and feature down-sampling based on an accumulated weight; and
obtaining the position information by performing coordinate down-sampling on the initial three-dimensional coordinate map, based on the accumulated weight corresponding to the feature down-sampling.”
In regard to claims 3-4, 9-10, and 16-18, they would be allowable based on their dependency from claims 2, 8, and 15, respectively.
In regard to claims 6 and 12, when considered as a whole, none of the cited prior art, either individually or in combination, teaches or suggests:
obtaining normal vector feature information of each point in the depth image;
obtaining a corresponding fusion feature by feature-stitching the normal vector feature information, the feature information, and the position information; and
obtaining the pose identification result of the target based on the fusion feature.
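For illustration only, the normal-vector fusion recited in claims 6 and 12 can be sketched as follows. The finite-difference normal estimate is an assumption (the claims do not specify how the normal vector feature information of each point is obtained); "feature-stitching" is read here as channel-wise concatenation:

```python
import numpy as np

def estimate_normals(coords):
    # Crude per-point normal from finite differences of the 3D
    # coordinate map (cross product of the local tangent vectors)
    dy = np.gradient(coords, axis=0)
    dx = np.gradient(coords, axis=1)
    n = np.cross(dx, dy)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8)

def fuse(normals, feat, coords):
    # "Feature-stitching": concatenate normal vector features, the
    # feature map, and the 3D position map along the channel axis
    return np.concatenate([normals, feat, coords], axis=-1)
```

The fused map then has 3 (normal) + C (feature) + 3 (position) channels per location, and a downstream head would regress the pose identification result from it.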
Additional Citations
The following table lists several references that are relevant to the subject matter claimed and disclosed in this Application. The references are not relied on by the Examiner, but are provided to assist the Applicant in responding to this Office action.
Citation
Relevance
Chen et al. (US Pat. No. 11,915,501)
Describes the field of artificial intelligence (AI), and in particular, an object detection technology. In one embodiment, an object detection method and apparatus include obtaining a point cloud of a scene that includes location information of points. The point cloud is mapped to a 3D voxel representation. A convolution operation is performed on the feature information of the 3D voxels to obtain a convolution feature set, and initial positioning information of a candidate object region is determined based on the convolution feature set. A target point located in the candidate object region in the point cloud is determined, and the initial positioning information of the candidate object region is adjusted based on the location information and target convolution feature information of the target point. Positioning information of a target object region is thereby obtained to improve object detection accuracy.
Lin et al. (US Pat. App. Pub. No. 2021/0180942)
Describes the field of computing, and more particularly estimating 3D hand poses. A computer-implemented method, computer-readable storage medium, and computer system are provided for estimating three-dimensional (3D) hand poses in images by receiving data corresponding to a hand image, generating a depth map corresponding to the received hand image data, and estimating a hand pose from the received hand image data and the generated depth map.
Zhou et al. (US Pat. App. Pub. No. 2022/0277595)
Provided are a hand gesture detection method and device, and a computer storage medium. The method includes: obtaining an initial depth image including a hand to be detected, and performing detection processing on the initial depth image by using a backbone feature extractor and a bounding box detection model to obtain initial bounding boxes and a first feature map corresponding to the hand to be detected; determining one of the initial bounding boxes as a target bounding box; cropping, based on the target bounding box, the first feature map by using an RoIAlign feature extractor to obtain a second feature map corresponding to the hand to be detected; and performing, based on the second feature map, three-dimensional gesture estimation processing on the hand to be detected by using a gesture estimation model to obtain a gesture detection result of the hand to be detected.
Table 1
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See Table 1 and form 892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WOO RHIM, whose telephone number is (571) 272-6560. The examiner can normally be reached Mon - Fri, 9:30 am - 6:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Henok Shiferaw, can be reached at 571-272-4637. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WOO C RHIM/Examiner, Art Unit 2676
1 Ge was first cited by the applicant in the information disclosure statement dated 06/27/2025 and, hence, is not listed in the appended form 892.