Prosecution Insights
Last updated: April 19, 2026
Application No. 18/538,554

AUTOMATED BORESCOPE POSE ESTIMATION VIA VIRTUAL MODALITIES

Non-Final OA (§103, §112)
Filed: Dec 13, 2023
Examiner: YANG, JIANXUN
Art Unit: 2662
Tech Center: 2600 — Communications
Assignee: RTX Corporation
OA Round: 1 (Non-Final)
Grant Probability: 74% (Favorable)
Expected OA Rounds: 1-2
Median Time to Grant: 2y 9m
Grant Probability With Interview: 93%

Examiner Intelligence

Career Allow Rate: 74% (472 granted / 635 resolved; +12.3% vs Tech Center average, above average)
Interview Lift: +18.6% higher allowance among resolved cases with an interview (a strong effect)
Typical Timeline: 2y 9m average prosecution; 45 applications currently pending
Career History: 680 total applications across all art units

Statute-Specific Performance

§101: 3.8% (-36.2% vs TC avg)
§103: 56.1% (+16.1% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 17.1% (-22.9% vs TC avg)
Comparison baseline is the estimated Tech Center average. Based on career data from 635 resolved cases.

Office Action

§103 §112
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are pending.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-7, 13 and 15-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention. Claims 1, 13 and 15 recite the limitation “the test target object”. There is insufficient antecedent basis for this limitation in the claims. Claims 2-7 and 16-20 are rejected under 112(b) for the same reason as given in their respective base claims.

Claim Rejections - 35 USC § 103

The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-3 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Sundermeyer et al (Augmented Autoencoders, 2019) in view of Heisele (US20140176551).

Regarding claims 1 and 15, Sundermeyer teaches an object pose prediction system comprising: a training system configured to: repeatedly receive a plurality of training image sets (Sundermeyer, Fig. 4; each object corresponds to a set of input augmented images b)), each training image set comprising a two-dimensional (2D) training image including a target object having a captured pose (Sundermeyer, Fig. 4; each 2D image in b) has a unique pose).

Sundermeyer does not expressly disclose, but Heisele teaches: a positive three-dimensional (3D) training image representing the target object and having a rendered pose that is the same as the captured pose, and a negative 3D training image representing the 3D object and having a rendered pose that is different from the captured pose (Heisele, “A training set, which includes positive samples (images including an object of a particular class) and negative samples (images not including an object of the particular class, such as images including an object of another class), is provided to a machine learning algorithm to produce an object classification model”, [0007]; training the machine learning model with positive and negative samples; binary classifier training uses positive and negative samples for multiple pose classes). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Heisele into the system or method of Sundermeyer in order to accurately detect object poses by learning from correct poses (positive) alongside common mistakes (negative) when training a machine learning model for pose estimation. The combination of Sundermeyer and Heisele also teaches other enhanced capabilities.

The combination of Sundermeyer and Heisele further teaches: wherein the test target object included in each training image set has a different captured pose (Sundermeyer, “Using OpenGL, we render 20000 views of each object uniformly at random 3D orientations”, p7:c2; training using a large variety of poses for the target object to ensure different poses per image set; Heisele, “The steps illustrated in FIG. 4A may be repeated multiple times (using different pedestrian models, image parameters, and/or backgrounds) to generate multiple annotated synthetic pedestrian images”, [0054]; repeated rendering/training over different poses); and to train a machine learning model to learn a plurality of different poses associated with the test target object in response to repeatedly receiving the plurality of training image sets (Sundermeyer, “we learn representations that specifically encode 3D orientations while achieving robustness against occlusion, cluttered backgrounds and generalizing to different environments and test sensors... the AAE ... it is trained to encode 3D model views in a self-supervised way”, p2:c1; training a model to learn multiple object poses from large sets of training images; Heisele, “The training module 110 then uses the synthetic pedestrian data to train a pedestrian pose classifier for classifying the pose of a pedestrian in an image”, [0026]; training pose classifiers with varied training image sets and poses); an imaging system configured to receive a 2D test image of a test object (Sundermeyer, “After training, the AAE is able to extract a 3D object from real scene crops of many different camera sensors (Fig. 6) ... At test time, the considered object(s) are first detected in an RGB scene”, p7:c2-p8:c1; receiving and processing 2D test images for pose inference; Heisele, “The overall classification module 120 receives a still image of a pedestrian and the pedestrian pose classifiers trained by the training module 110”, [0027]; a system to receive and classify real 2D images); process the 2D test image using the trained machine learning model to predict a pose of the test object (Sundermeyer, “After encoding we compute the cosine similarity between the test code z(test) and all codes z(i) from the codebook ... The highest similarities are determined in a k-Nearest-Neighbor (kNN) search and the corresponding rotation matrices {R(kNN)} from the codebook are returned as estimates of the 3D object orientation”, p8:c1; 3D object orientation => pose; inferring object pose using the trained neural model on a test image; Heisele, “A binary classification module 313 ... uses a classifier (e.g., support vector machine or “SVM”) and the HOG features to determine whether the pose of a pedestrian present in the image belongs to a particular class”, [0042]; pose prediction/classification using a trained model on incoming test data); and output a 3D test image including a rendering of the 2D test image having the predicted pose (Sundermeyer, Fig. 1, the 2D input (upper left) corresponds to the rendered 3D pose output image (lower right); outputs a decoder-generated synthetic image reconstructed to the predicted pose).
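For context on the pose-lookup step the examiner quotes here, the following is a minimal Python sketch of a codebook lookup of the kind Sundermeyer describes: cosine similarity between a test latent code and precomputed codebook codes, followed by a k-nearest-neighbor selection of the associated rotations. The function name, array names, and k value are placeholders chosen for this sketch, not code from the reference.

import numpy as np

def estimate_orientation(z_test, codebook_z, codebook_R, k=3):
    """Return the k rotation matrices whose codebook codes are most
    cosine-similar to the test code (AAE-style lookup).

    z_test     : (d,)      latent code of the detected object crop
    codebook_z : (N, d)    latent codes of rendered views (precomputed)
    codebook_R : (N, 3, 3) rotation matrices used to render those views
    """
    # Cosine similarity between the test code and every codebook code.
    z_test = z_test / np.linalg.norm(z_test)
    codes = codebook_z / np.linalg.norm(codebook_z, axis=1, keepdims=True)
    cos_sim = codes @ z_test
    # k-Nearest-Neighbor search over similarities; the matching rotations
    # are returned as estimates of the 3D object orientation.
    knn = np.argsort(cos_sim)[-k:][::-1]
    return codebook_R[knn], cos_sim[knn]

In the cited paper the codebook would be built by encoding renderings of the object at many uniformly sampled orientations; here it is simply assumed to exist as input arrays.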
Regarding claims 2 and 16, the combination of Sundermeyer and Heisele teaches their respective base claims. The combination further teaches the object pose prediction system of claim 1, wherein the 2D test image is generated by an image sensor that captures the test object in real-time, and the 3D test image is a computer-generated digital representation of the test object (Sundermeyer, “our method operates on single RGB images, which significantly increases the usability as no depth information is required ... As a first step, we build upon state-of-the-art 2D Object Detectors”, p1:c2; Fig. 1, “after detecting an object (2D Object Detector), the object is quadratically cropped”; real-time image acquisition via RGB cameras, plus the use of computer-generated 3D digital models (synthetic views) for training and inference).

Regarding claims 3 and 17, the combination of Sundermeyer and Heisele teaches their respective base claims. The combination further teaches the object pose prediction system of claim 2, wherein the 2D training image is generated by an image sensor that captures the test object (Sundermeyer, Fig. 1, upper left, 2D input images; “Using OpenGL, we render 20000 views of each object uniformly at random 3D orientations and constant distance along the camera axis (700mm)”, p7:c2), and both of the positive 3D training image and the negative 3D training image are computer-generated digital representations of the training object (Heisele, “A training set, which includes positive samples (images including an object of a particular class) and negative samples (images not including an object of the particular class, such as images including an object of another class), is provided to a machine learning algorithm to produce an object classification model”, [0007]; “generating a two-dimensional (2D) synthetic image based on the received 3D model and the received set of image parameters ... training a plurality of pedestrian pose classifiers through the annotated synthetic image”, [0009]; both positive and negative training images may be synthetic images generated by a computer from a 3D model).
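To make the positive/negative training-image idea concrete, below is a minimal, hypothetical sketch of one way such image sets could drive training: a triplet-style margin loss that pulls the embedding of the captured 2D image toward the same-pose (positive) render and away from the different-pose (negative) render. The loss, the margin, and the encoder are assumptions for illustration; neither the claims nor the cited references specify this particular objective.

import torch
import torch.nn.functional as F

def pose_triplet_loss(encoder, img_2d, render_pos, render_neg, margin=0.2):
    """Hypothetical triplet objective over one training image set:
    captured 2D image (anchor), positive 3D render with the same pose,
    and negative 3D render with a different pose."""
    z_a = F.normalize(encoder(img_2d), dim=-1)
    z_p = F.normalize(encoder(render_pos), dim=-1)
    z_n = F.normalize(encoder(render_neg), dim=-1)
    # Pull same-pose pairs together, push different-pose pairs apart.
    d_pos = 1.0 - (z_a * z_p).sum(dim=-1)   # cosine distance to positive
    d_neg = 1.0 - (z_a * z_n).sum(dim=-1)   # cosine distance to negative
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Any image encoder works as a stand-in, e.g.:
# encoder = torchvision.models.resnet18(num_classes=128)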
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Sundermeyer et al (Augmented Autoencoders, 2019) in view of Heisele (US20140176551) and further in view of Wong (Borescope damage assessments, 2021).

Regarding claim 7, the combination of Sundermeyer and Heisele teaches its base claim. The combination does not expressly disclose, but Wong teaches, the object pose prediction system of claim 1, wherein the imaging system includes a borescope configured to capture the 2D test image of the test object (Wong, Figs. 1 and 3; “Borescope imaging is a popular technique that allows an inspector to probe the interior of an engine; the borescope’s flexible tube is designed to fit into engine access ports, permitting the inspection to be carried out with minimal disassembly. For example, special access plugs or the hole of a removed igniter can be used to access the hot section of a turbine [8]. The video feed captured by borescopes is analysed in real time by inspectors to identify potential anomalies”, p1; “instance segmentation on contiguous frames of borescope videos is performed by adapting the Mask R-CNN”, p3). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Wong into the modified system or method of Sundermeyer and Heisele in order to use a borescope camera for defect pose estimation, given its precision, its ability to inspect hard-to-reach areas without disassembly, and its support for detailed, real-time analysis that improves safety and efficiency. The combination of Sundermeyer, Heisele and Wong also teaches other enhanced capabilities.
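As background on the instance-segmentation step Wong adapts, the sketch below runs an off-the-shelf Mask R-CNN from torchvision on a single borescope frame. It uses generic COCO weights and a made-up file name; Wong's adapted model, blade/defect classes, and training data are not reproduced here, so this is only an illustration of the general technique.

import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained, generic Mask R-CNN (COCO classes), not Wong's adapted model.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Hypothetical file name for one captured borescope frame.
frame = convert_image_dtype(read_image("borescope_frame.png"), torch.float)
with torch.no_grad():
    prediction = model([frame])[0]  # boxes, labels, scores, masks per instance

# Keep confident detections; their masks could seed crops for pose estimation.
keep = prediction["scores"] > 0.7
masks = prediction["masks"][keep]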
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Sundermeyer et al (Augmented Autoencoders, 2019).

Regarding claim 8, Sundermeyer teaches an object pose prediction system comprising: an image sensor configured to generate at least one 2D test image of a test object existing in real space and having a pose and depth (Sundermeyer, Fig. 1, a 2D input image in real space with a pose; the input image may have depth information as well; “our method operates on single RGB images, which significantly increases the usability as no depth information is required. We note though that depth maps may be incorporated optionally to refine the estimation. As a first step, we build upon state-of-the-art 2D Object Detectors ... On the resulting scene crops, we employ our novel 3D orientation estimation algorithm”, p1:c2; using a 2D image sensor (e.g., an RGB camera) to obtain 2D images of real objects for object pose estimation; Sundermeyer optionally allows depth data, but the primary operation is on real-world object images as test input); a processing system configured to generate an intermediate digital representation of the test object having a predicted pose and predicted depth based on the 2D test image (Sundermeyer, Figs. 1 and 4; “our method operates on single RGB images ... We note though that depth maps may be incorporated optionally to refine the estimation. As a first step, we build ... 2D Object Detectors ... which provide object bounding boxes and identifiers. On the resulting scene crops, we employ our novel 3D orientation estimation algorithm”, p1:c2; “Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space”, [abstract]; “The training objective is to reconstruct the input x ... after passing through a low-dimensional bottleneck, referred to as the latent representation z”, p5; “The motivation behind the AAE is to control what the latent representation encodes and which properties are ignored”, p6:c1; “The clarity and orientation of the decoder reconstruction is an indicator of the encoding quality”, p7:c2; teaching an image processing method that takes a 2D image input, generates an intermediate latent representation (vector z), and uses it to predict the object pose (orientation and position) as an estimate; while depth is mentioned as optionally refinable, estimation of 3D translation (distance/depth) may occur via projective distance estimation using bounding boxes, as described in sec. 3.6.2, p9); and to generate a 3D digital image of the test object having the predicted pose and the predicted depth (Sundermeyer, “After training, the AAE is able to extract a 3D object from real scene crops of many different camera sensors (Fig. 13). The clarity and orientation of the decoder reconstruction is an indicator of the encoding quality. To determine 3D object orientations from test scene crops we create a codebook (Fig. 14 (top))”, p7:c2; “We estimate the full 3D translation t(real) from camera to object center ... At test time, we compute the ratio between the detected bounding box diagonal ||bb(real)|| and the corresponding codebook diagonal ||bb(syn, argmax(cos_i))||, i.e. at similar orientation. The pinhole camera model yields the distance estimate t^(real, z) ... with synthetic rendering distance t(syn, z) and focal lengths f(real), f(syn) of the real sensor and synthetic views”, p9:c1; generating a 3D digital estimate (orientation + translation/depth) of the test object from a 2D image; the AAE reconstructs 3D orientation, projective distance estimation provides depth, and so the output is a digital 3D pose + depth model; the reconstructed image (Fig. 1, lower right) may include the pose and depth of an object).
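The projective distance step quoted above reduces to a pinhole-camera ratio: the real object's distance is the synthetic rendering distance scaled by the focal-length ratio and by the ratio of the synthetic to real bounding-box diagonals. A minimal Python sketch of that arithmetic follows; the variable names mirror the quotation rather than any published code, and the example numbers are assumed.

import math

def estimate_distance(bb_real, bb_syn, t_syn_z, f_real, f_syn):
    """Pinhole-camera distance estimate along the optical axis.

    bb_real : (w, h) of the detected bounding box in the real image (pixels)
    bb_syn  : (w, h) of the matched codebook view's bounding box (pixels)
    t_syn_z : rendering distance of the synthetic views (e.g. 700 mm)
    f_real, f_syn : focal lengths of the real sensor and the synthetic camera
    """
    diag_real = math.hypot(*bb_real)
    diag_syn = math.hypot(*bb_syn)
    # A closer real object appears larger, so the distance shrinks as the
    # real diagonal grows relative to the synthetic one.
    return t_syn_z * (diag_syn / diag_real) * (f_real / f_syn)

# With equal focal lengths, a real box 1.5x the synthetic diagonal implies
# roughly 700 / 1.5 ≈ 467 mm.
print(estimate_distance((180, 240), (120, 160), 700.0, 1000.0, 1000.0))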
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Sundermeyer et al (Augmented Autoencoders, 2019) in view of Heisele (US20140176551) and further in view of Ji et al (US20210004984).

Regarding claim 9, Sundermeyer teaches its base claim. Sundermeyer does not expressly disclose, but Ji teaches, the object pose prediction system of claim 8, wherein the at least one 2D test image includes a video stream containing movement of the test object, and wherein the processing system performs optical flow processing on the video stream to determine the predicted pose and the predicted depth (Ji, “the deep convolutional neural network may return the 6D pose estimation, the third segmentation mask, and the optical flow respectively in three branches for iterative training”, [0027]; “the network is based on FlowNet Convs and FlowNet DeConvs (FlowNet convolution and de-convolution) models, and during training, inputs the enlarged rendered image and the segmentation mask thereof, and the enlarged observed image and the segmentation mask thereof into the FlowNet Convs model, and obtains a relative 6D pose estimation (including a relative rotation transformation (Rotation) and a relative translation transformation (Translation)). The FlowNet DeConvs model obtains the optical flow and a segmentation mask”, [0054]; “frame rate”, [0004]; optical flow typically operates on video frames; Ji processes both observed (real) images and rendered images, supporting video streams; direct optical flow modeling is taught via the FlowNet architecture). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Ji into the system or method of Sundermeyer in order to use optical flow in dynamic pose estimation, providing real-time motion cues that enhance accuracy, robustness, and efficiency in dynamic environments. The combination of Sundermeyer and Ji also teaches other enhanced capabilities.
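For orientation on what “optical flow processing on a video stream” involves, here is a minimal dense-optical-flow sketch using OpenCV's Farneback method on two consecutive frames. This classical method is a stand-in for illustration only; Ji's approach uses a learned FlowNet-style network, which is not reproduced here, and the frame file names are hypothetical.

import cv2

# Two consecutive frames from a video stream (hypothetical file names).
prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

# Dense optical flow: a per-pixel (dx, dy) displacement field between frames.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Flow magnitude highlights moving regions, e.g. a moving test object whose
# tracked motion could feed a downstream pose/depth estimator.
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print(magnitude.mean())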
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Sundermeyer et al (Augmented Autoencoders, 2019) in view of Wong (Borescope damage assessments, 2021).

Regarding claim 14, Sundermeyer teaches its base claim. The combination of Sundermeyer and Wong teaches the object pose prediction system of claim 8, wherein the image sensor is a borescope (Wong, see the comments on claim 7; “Current methods of blade visual inspection are mainly based on borescope imaging. During these inspections”, [abstract]; “the video feed captured by borescopes”, p1; the sensor used is a borescope for capturing inspection images/videos).

Allowable Subject Matter

Claims 10-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: claim 10 recites limitations related to generating a depth map that maps the depth of a training object with a respective pose, and training a machine learning model using the depth map. No explicit teaching of these limitations was found in the prior art cited in this Office action or located in the prior art search. Claims 11-12 depend on claim 10.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN YANG, whose telephone number is (571) 272-9874. The examiner can normally be reached MON-FRI, 8AM-5PM Pacific Time. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Amandeep Saini, can be reached at (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JIANXUN YANG/
Primary Examiner, Art Unit 2662
2/7/2026

Prosecution Timeline

Dec 13, 2023
Application Filed
Feb 08, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602917: OBJECT DETECTION DEVICE AND METHOD (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602853: METHODS AND APPARATUS FOR PET IMAGE RECONSTRUCTION USING MULTI-VIEW HISTO-IMAGES OF ATTENUATION CORRECTION FACTORS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12590906: X-RAY INSPECTION APPARATUS, X-RAY INSPECTION SYSTEM, AND X-RAY INSPECTION METHOD (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586223: METHOD FOR RECONSTRUCTING THREE-DIMENSIONAL OBJECT COMBINING STRUCTURED LIGHT AND PHOTOMETRY AND TERMINAL DEVICE (granted Mar 24, 2026; 2y 5m to grant)
Patent 12586152: METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TRAINING IMAGE PROCESSING MODEL (granted Mar 24, 2026; 2y 5m to grant)
Reviewing what changed during prosecution of these cases can inform how to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview: 93% (+18.6%)
Median Time to Grant: 2y 9m
PTA Risk: Low
Based on 635 resolved cases by this examiner; grant probability is derived from the career allow rate.
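These figures appear internally consistent if the interview lift is read as additive in percentage points: a 74.3% career allow rate (472/635) plus the +18.6% lift gives roughly 92.9%, which rounds to the 93% shown. Whether the underlying model actually combines them this way is an assumption.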
