Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Acknowledgement of Priority
The specification asserts the following:
“The present application claims the benefit under 35 U.S.C. § 119 of German Patent Application No. DE 10 2022 202 030.8 filed on February 28, 2022, which is expressly incorporated herein by reference in its entirety.”
The effective filing date of February 28, 2022 is acknowledged.
Information Disclosure Statement
The information disclosure statement(s) submitted on 02/10/2023 is/are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement(s) is/are being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: the specification provides a circular definition for data showing a “complete image” (Specification 7:8, “The term “image data showing complete images” or “higher-dimensional image data” is understood to mean image data that characterize, or represent, not only a part, for example, a two-dimensional portion of an image or individual pixels of an image, but the complete image.”).
This definition describes a complete image in a circular manner and does not provide sufficient information as to what makes an image “complete”. For example, a two-dimensional 2x2, four-pixel crop of a 3x3, nine-pixel image could be interpreted as a complete image because it is “not only a part” of the 2x2 image it forms, yet under a conflicting interpretation it is not a complete image because it is a portion of the original 3x3 image.
For the purpose of compact prosecution, until this definition is resolved, the examiner will interpret a “complete image” as any image.
Appropriate correction is required.
Status of Claims
The present application is being examined based on the claims filed on 02/10/2023.
Claim(s) 1-12 is/are rejected.
Claim(s) 1-12 is/are pending.
Prior Art References
The short names that are used to identify the references of prior art in the analysis that follows are:
Gondal: Gondal, M.W., Joshi, S., Rahaman, N., Bauer, S., Wuthrich, M. and Schölkopf, B., 2021, July. Function contrastive learning of transferable meta-representations. In International Conference on Machine Learning (pp. 3755-3765). PMLR.
He: He, Y., Sun, W., Huang, H., Liu, J., Fan, H. and Sun, J., 2020. Pvn3d: A deep point-wise 3d keypoints voting network for 6dof pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11632-11641).
Le: Le, T., Jang, S. and Lien, J.J.J., 2020. 3D Visual-guided Robot Arm Control for A Warehouse Automation System. International Journal of iRobotics, 3(4), pp.1-8.
Claim Rejections - 35 USC § 112(d)
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 4 and 10 are rejected under 35 U.S.C. 112(d) as being of improper dependent form for failing to further limit the subject matter of the claims upon which they depend. The parent claims 1 and 7, respectively, recite “wherein the training data include labeled image data,” so the recitation in claims 4 and 10 that “the image data and the comparison image data respectively are image data showing complete images” does not further limit the parent claims.
Refer to the specification objection regarding the term “complete image”.
Applicant may cancel the claim, amend the claim to place the claim in proper dependent form, rewrite the claim in independent form, or present a sufficient showing that the dependent claim complies with the statutory requirements.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, and 7-11 are rejected under 35 U.S.C. 103 as being unpatentable over Gondal in view of He.
Claims 6 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gondal in view of He in further view of Le.
In reference to claim 1.
Gondal teaches:
“1. A method for training a conditional neural process for determining a position of an object from image data, the method comprising the following steps: providing training data for training the conditional neural process, [wherein the training data include labeled image data showing a particular object and labeled comparison image data regarding the particular object]; and training the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach.” (Gondal Figure 1)
[Image: Gondal, Figure 1]
“Conditional neural process” is defined in the specification (Specification 2:7, “Conditional neural processes are in particular based on using a feed-forward neural network to calculate the training data information, to aggregate this information, and to transmit this information to another feed-forward network for inference.”).
The “conditional neural process” is taught by Figure 1 (a), (b) and (c). Figure 1(a) teaches the basic definition of a CNP whereas Figures 1(b) and 1(c) teach an extension of CNPs that utilize “functional contrastive learning” and “end-to-end” learning.
“feed-forward neural network” is taught in Figure 1 by “hΦ”.
“training data information” is taught in Figure 1 by “a”.
“to aggregate this information” is taught in Figure 1 by “r”.
“to transmit this information to another feed-forward neural network” is taught in Figure 1 by “pψ”.
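For illustration only, the encode-aggregate-decode structure recited in the specification's definition of a conditional neural process can be sketched as follows. All names, layer sizes, and initializations in this sketch are hypothetical assumptions and are not taken from Gondal or the specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Randomly initialized weights for a small feed-forward network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Apply a feed-forward (MLP) network with ReLU hidden layers."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

def cnp_forward(h, p, x_ctx, y_ctx, x_tgt):
    # (1) a feed-forward network h computes the training data information
    r_i = mlp_apply(h, np.concatenate([x_ctx, y_ctx], axis=1))
    # (2) that information is aggregated (here: mean over context points)
    r = r_i.mean(axis=0, keepdims=True)
    # (3) it is transmitted to another feed-forward network p for inference
    r_rep = np.repeat(r, x_tgt.shape[0], axis=0)
    return mlp_apply(p, np.concatenate([x_tgt, r_rep], axis=1))

# Example: 2-D inputs, 1-D outputs, 16-D aggregated representation
h = mlp_init([2 + 1, 32, 16])
p = mlp_init([2 + 16, 32, 1])
out = cnp_forward(h, p, rng.normal(size=(5, 2)),
                  rng.normal(size=(5, 1)), rng.normal(size=(4, 2)))
# out contains one prediction per target input (shape (4, 1))
```

The mean aggregation corresponds to the permutation-invariant pooling commonly used in CNP-style models; the choice of mean rather than another aggregator is an assumption of this sketch.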
“Functional contrastive learning” is defined in the specification (Specification 4:20, “The term “functional contrastive learning” is in particular understood to mean an algorithm designed to reduce the distance between two corresponding representations, in particular the distance or difference between two representations relating to the same task or the same object, and to find matching representations.”).
“Functional contrastive learning” is taught by FCRL as depicted in Figure 1 (b) and (c).
“End-to-end learning” is defined in the specification (Specification 4:26, “The term “end-to-end learning approach” is furthermore understood to mean an approach based on input and output data of a neural network, wherein the neural network is trained on output data desired with respect to an input or corresponding input data.”).
“End-to-end learning” is taught by the input-output pairs depicted in Figure 1 (a), (b), and (c) wherein each pair xi and yi constitutes neural network input and output data.
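As quoted above, an end-to-end learning approach trains a network directly on the outputs desired for corresponding inputs. For illustration only, the following sketch trains a linear model on input-output pairs (xi, yi) by gradient descent on a mean-squared-error loss; all data and parameters are hypothetical and not drawn from Gondal or the specification:

```python
import numpy as np

rng = np.random.default_rng(1)

# Input-output pairs (x_i, y_i): the model is trained end to end
# to produce the desired output y_i for each input x_i.
X = rng.normal(size=(100, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
Y = X @ true_W

W = np.zeros((3, 1))
for _ in range(500):
    # gradient of the mean squared error between predicted and desired outputs
    grad = 2 * X.T @ (X @ W - Y) / len(X)
    W -= 0.1 * grad
# W converges toward true_W, i.e. the mapping implied by the (x, y) pairs
```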
He teaches:
“wherein the training data include labeled image data showing a particular object and labeled comparison image data regarding the particular object” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“Image data” and “comparison image data” is taught by the YCB dataset.
The “labeled image data” is taught by the 6D pose annotations.
Motivation to combine Gondal and He.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Gondal and He. Gondal discloses an extension of conditional neural processes that makes CNPs more robust by utilizing functional contrastive learning. He discloses a dataset for training systems for pose estimation. One would be motivated to combine these references because the disclosure of He provides a practical application for the learning method of Gondal. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including:
(A) Combining prior art elements according to known methods to yield predictable results;
(D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results;
(E) "Obvious to try" – choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success;
(F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
In reference to claim 2.
Gondal teaches:
“2. The method according to claim 1, wherein the step of training the conditional neural process based on the provided training data furthermore includes the following steps: generating first latent representations based on the labeled [image] data and information about the labeled [image] data; generating second latent representations based on the labeled comparison [image] data and the information about the labeled comparison [image] data;” (Gondal Figure 1)
The “conditional neural process” is taught by Figure 1 (a), (b) and (c).
“Latent representations” are defined in the specification (Specification 5:28, “The term “latent representations” is understood to mean intermediate states of the input data or image data during the processing of the image data by the conditional neural process, wherein the latent representations usually have a smaller dimension than the original image data.”).
Figure 1 (b) teaches first and second latent representations. The latent space is computed by feeding the outputs of f1 and f2 into ρφ, where f1 generates the “first latent representations” and f2 generates the “second latent representations”.
“determining, using the functional contrastive learning, a first cost function based on the first latent representations and the second latent representations; and training the conditional neural process based on the first cost function.” (Gondal Equation 4)
[Image: Gondal, Equation 4]
The equation depicts the cost function computed for performing functional contrastive learning between the first and second latent representations.
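For illustration only, a common way to realize a contrastive cost function between two sets of latent representations is an InfoNCE-style objective; the following generic sketch is an assumption for explanatory purposes and is not Gondal's exact Equation 4:

```python
import numpy as np

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive objective: row i of z1 and row i of z2
    are treated as a matching pair; all other rows act as negatives.
    Minimizing this loss pulls matching representations together."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                   # diagonal = matching pairs

# Matching representations yield a lower loss than mismatched ones
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
matched = contrastive_loss(z, z)
mismatched = contrastive_loss(z, np.roll(z, 1, axis=0))
```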
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
In reference to claim 3.
Gondal teaches:
“3. The method according to claim 1, wherein the step of training the conditional neural process based on the provided training data furthermore includes the following steps: determining, using the conditional neural process, a position [of the particular object in the image data] based on the labeled [image] data, the labeled comparison [image] data, and information about the labeled comparison image data; determining a comparison position [of the particular object] in the labeled [image] data based on the information about the labeled [image] data;” (Gondal Figure 1)
“determining a second cost function based on the [determined position of the particular object in the image] data and the comparison [position of the particular object]; and training [the conditional neural process] based on the second cost function.” (Gondal Figure 1(c))
The “second cost function” for training is taught by the decoders that get trained, “pψ”. The decoder training teaches a second cost function (Gondal 5, “The decoder pψ is an MLP with two hidden layers and it is trained with the same training functions as the encoder hΦ”).
He teaches:
“determining, [using the conditional neural process,] a position of the particular object in the image data”
(He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
“determining a position of the particular object in the image data” is taught by the 6-degrees of freedom pose estimation.
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
In reference to claim 4.
“4. The method according to claim 1, wherein the image data and the comparison image data respectively are image data showing complete images.”
This claim was rejected under 35 U.S.C. 112(d) for failing to further limit the parent claim. Thus, this claim is taught by the mappings of the parent claim. Refer to the 35 U.S.C. 112(d) rejections of this document.
In reference to claim 5.
Gondal teaches:
“5. A method [for determining a position of an object], the method comprising the following steps: providing [image] data, wherein the [image] data include target [image] data showing the object and labeled comparison [image] data regarding the object; providing a trained conditional neural process, [the conditional neural process being trained for determining a position of an object from image data by]: providing training data for training the conditional neural process, wherein the training data include labeled [image] data [showing a particular object] and labeled comparison [image] data [regarding the particular object]; and training the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach;” (Gondal Figure 1)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“and determining, [using the trained conditional neural process for determining] a position of an object from image data, the position of the object based on the provided image data.” (He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
In reference to claim 6.
Gondal teaches:
“6. A method [for controlling a controllable system], the method comprising the following steps: [determining a position of an object by:] providing [image] data, wherein the [image] data include target [image] data [showing the object] and labeled comparison [image] data [regarding the object]; providing a trained conditional neural process, the conditional neural process being trained [for determining a position of an object from image data] by: providing training data for training the conditional neural process, wherein the training data include labeled [image] data [showing a particular object] and labeled comparison [image] data [regarding the particular object]; and training the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach;” (Gondal Figure 1)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“[…] determining a position of an object by […]”;
“[…] for determining a position of an object from image data […]”;
“and determining, [using the trained conditional neural process for determining a position of an object from image data], the position of the object based on the provided image data;”;
(He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
Motivation to combine Gondal, He, and Le.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Gondal, He, and Le. Gondal and He disclose an extension of conditional neural processes and an application thereof for pose estimation. Le discloses a robot arm and a system for controlling it. One would be motivated to combine these references because the disclosure of Le provides a specific application for the combined methods and systems of Gondal and He. Further, MPEP 2143 sets forth the Supreme Court rationales for obviousness including:
(A) Combining prior art elements according to known methods to yield predictable results;
(D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results;
(E) "Obvious to try" – choosing from a finite number of identified, predictable solutions, with a reasonable expectation of success;
(F) Known work in one field of endeavor may prompt variations of it for use in either the same field or a different one based on design incentives or other market forces if the variations are predictable to one of ordinary skill in the art.
Le teaches:
“A method for controlling a controllable system […]”;
“and controlling the controllable system based on the determined position of the object.”
(Le 2, “The system consists of a 6-DOF Epson ProSix C4-A601S robot arm, a stereo camera with two FLIR BFS-PGE-50S5C cameras […] The principle of the system is positioning the robot arm and allowable tolerance at a storage grid in the frame structure to retrieve a magazine.”)
In reference to claim 7.
Gondal teaches:
“7. A control device for training a conditional neural process for determining a position of an object from [image] data, the control device comprising: a provisioning unit configured to provide training data for training the conditional neural process, wherein the training data include labeled [image] data [showing a particular object] and labeled comparison [image] data [regarding the particular object]; and a training unit configured to train the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach.” (Gondal Figure 1)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“[…] showing a particular object […]”;
“[…] regarding the particular object […]”
(He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
In reference to claim 8.
Gondal teaches:
“8. The control device according to claim 7, wherein the training unit includes: a first generation unit configured to generate first latent representations based on the labeled [image] data and information about the labeled [image] data; a second generation unit configured to generate second latent representations based on the labeled comparison [image] data and information about the labeled comparison [image] data;” (Gondal Figure 1)
“and a first determination unit configured to determine, using the functional contrastive learning, a first cost function based on the first latent representations and the second latent representations, and wherein the training unit is configured to train the conditional neural process based on the first cost function.” (Gondal Equation 4)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
In reference to claim 9.
Gondal teaches:
“9. The control device according to claim 8, wherein the training unit includes: a second determination unit configured to determine, using the conditional neural process, [a position of the particular object in the image data] based on the labeled [image] data, the labeled comparison [image] data, and the information about the labeled comparison [image] data; a third determination unit configured to determine a comparison [position of the particular object] in the labeled [image] data based on the information about the labeled [image] data;” (Gondal Figure 1)
“and a fourth determination unit configured to determine a second cost function based on the determined position of the object in the image data and the comparison [position of the object]; wherein the training unit is configured to train [the conditional neural process] based on the second cost function.” (Gondal Figure 1(c))
He teaches:
“determine, [using the conditional neural process,] a position of the particular object in the image data based on the labeled image data” (He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
In reference to claim 10.
“10. The control device according to claim 7, wherein the image data and the comparison image data respectively are image data showing complete images.”
This claim was rejected under 35 U.S.C. 112(d) for failing to further limit the parent claim. Thus, this claim is taught by the mappings of the parent claim. Refer to the 35 U.S.C. 112(d) rejections of this document.
In reference to claim 11.
Gondal teaches:
“11. A control device [for determining a position of an object], the control device comprising: a provisioning unit configured to provide [image] data, wherein the [image] data comprise target [image] data [showing the object] and labeled comparison [image] data [regarding the object]; a reception unit configured to receive a trained conditional neural process, the conditional neural process being trained by a control device for training a conditional neural network [for determining a position of an object] from [image] data [for determining a position of an object from image data], the control device for training including: a provisioning unit configured to provide training data for training the conditional neural process, wherein the training data include labeled [image] data [showing a particular object] and labeled comparison [image] data [regarding the particular object]; and a training unit configured to train the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach;” (Gondal Figure 1)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“and a determination unit configured to determine, [using the provided trained conditional neural process for determining a position of an object from image data], the position of the object based on the provided [image] data.” (He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
In reference to claim 12.
Gondal teaches:
“12. A [control device for controlling a controllable system, the control] device comprising: a reception unit [configured to receive a position of an object determined by a control device for determining a position of an object] including: a provisioning unit configured to provide [image] data, wherein the [image] data comprise target [image] data [showing the object] and labeled comparison [image] data [regarding the object]; a reception unit configured to receive a trained conditional neural process, the conditional neural process being trained by a control device for training a conditional neural network [for determining a position of an object] from [image] data [for determining a position of an object] from [image] data, the control device for training including: a provisioning unit configured to provide training data for training the conditional neural process, wherein the training data include labeled [image] data [showing a particular object] and labeled comparison [image] data [regarding the particular object]; and a training unit configured to train the conditional neural process based on the provided training data, wherein the training of the conditional neural process includes applying functional contrastive learning, and the training of the conditional neural process includes applying an end-to-end learning approach;” (Gondal Figure 1)
He teaches:
“image data” (He 11636, “YCB-Video Dataset contains 21 YCB [4] objects of varying shape and texture. 92 RGBD videos of the subset of objects were captured and annotated with 6D pose and instance semantic mask. […] We follow [52] and split the dataset into 80 videos for training and another 2,949 keyframes chosen from the rest 12 videos for testing.”)
“[…] determining a position of an object by […]”;
“[…] for determining a position of an object from image data […]”;
“and a determination unit configured to determine, [using the provided trained conditional neural process for determining a position of an object from image data,] the position of the object based on the provided image data;”
(He 11632, “In this paper, we study the problem of 6DoF pose estimation, i.e. recognize the 3D location and orientation of an object in a canonical frame.”)
Le teaches:
“A control device for controlling a controllable system, the control device comprising: a reception unit configured to receive a position of an object determined by a control device for determining a position of an object including:”;
“and a control unit configured to control the controllable system based on the determined position of the object.”
(Le 2, “The system consists of a 6-DOF Epson ProSix C4-A601S robot arm, a stereo camera with two FLIR BFS-PGE-50S5C cameras […] The principle of the system is positioning the robot arm and allowable tolerance at a storage grid in the frame structure to retrieve a magazine.”)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY RYAN GILLESPIE whose telephone number is (571)272-1331. The examiner can normally be reached M-F, 8 AM - 5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker A Lamardo, can be reached at 571-270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY RYAN GILLESPIE/Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147