Prosecution Insights
Last updated: April 19, 2026
Application No. 18/521,736

TRAINING DATA SELECTION DEVICE FOR SELECTING TRAINING DATA TO IMPROVE PERFORMANCE OF A DEPTH ESTIMATION NETWORK AND A TRAINING DATA SELECTION METHOD THEREFOR

Status: Non-Final OA (§103)
Filed: Nov 28, 2023
Examiner: PEDAPATI, CHANDHANA
Art Unit: 2669
Tech Center: 2600 — Communications
Assignee: Kia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 64% (14 granted / 22 resolved; +1.6% vs TC avg)
Interview Lift: +32.5% in resolved cases with interview (strong)
Avg Prosecution: 2y 10m typical timeline (26 currently pending)
Total Applications: 48 across all art units (career history)

Statute-Specific Performance

§101: 11.7% (-28.3% vs TC avg)
§103: 47.0% (+7.0% vs TC avg)
§102: 18.1% (-21.9% vs TC avg)
§112: 20.9% (-19.1% vs TC avg)
Tech Center average comparisons are estimates; based on career data from 22 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice to Applicants

Limitations appearing inside of {} are intended to indicate the limitations not taught by said prior art(s)/combinations. Claims 1-16 are pending in the application.

Information Disclosure Statement

No Information Disclosure Statement (IDS) was filed; therefore, no applicant-submitted references were considered.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “depth estimation network” in claims 1 and 7; “vulnerability output device” in claims 1 and 3-5; and “training data acquisition support device” in claims 1 and 6. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-8, 9-11, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over “Guizilini” (Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, and Adrien Gaidon, “Multi-Frame Self-Supervised Depth with Transformers” (2022), arXiv:2204.07616v2) in view of Srinivasan (U.S. Patent No. 11,087,494 B1).
Regarding claim 1, Guizilini teaches a training data selection device, comprising: a depth estimation network configured to apply depth estimation calculation to an input image obtained in real time to output depth distribution information corresponding to the input image (Guizilini, FIG 5 exhibits an input image fed into the depth estimation calculation to output depth distribution information) [image: Guizilini FIG 5]; a vulnerability output device configured to output depth estimation vulnerability corresponding to the input image with reference to the depth distribution information (Guizilini, FIG 4(c) exhibits maximum attention, “normalized attention values which can be used as a measure of confidence” [p 5, §3.1.1, Col 1, ¶1]) [image: Guizilini FIG 4(c)]; a training data acquisition support device configured {to store the input image and specific point cloud data} corresponding to the input image as new training data {in a certain storage space or configured to transmit the input image and the specific point cloud data to another device}, when it is determined that the depth estimation vulnerability is greater than or equal to a predetermined threshold (Guizilini, [p 5, §3.3.1, Col 1, ¶1]; “leverage this novel matching confidence metric by masking out pixels with maximum attention value below a certain threshold _min, both from the high response loss calculation and the decoded features (Figure 4d)”).

Guizilini does not explicitly disclose storing the input image and specific point cloud data corresponding to the input image as new training data in a certain storage space, or transmitting the input image and the specific point cloud data to another device. However, Srinivasan, in a similar field of endeavor of depth estimation, teaches storing the input image and specific point cloud data corresponding to the input image as new training data in a certain storage space, or transmitting the input image and the specific point cloud data to another device (Srinivasan, [Col 2:22-24]; “The machine-learning model can be trained using training image data and training lidar data” (i.e., point cloud) “as a ground truth for training the machine-learning model”; Examiner interprets that image and point cloud data are stored if they are used to train the model). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include storing point cloud data as training data, as taught by Srinivasan, in the invention of Guizilini. The motivation to do so would be to provide supervision for the machine-learning model.

Regarding claim 2, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Guizilini further teaches wherein the depth estimation network performs a process of outputting j_1st to j_Kth probability values corresponding to 1st to Kth default depths for a jth pixel being any one of 1st to nth pixels of the input image with respect to the 1st to nth pixels to output probability values from 1_1st to 1_Kth probability values to n_1st to n_Kth probability values for the 1st to nth pixels as the depth distribution information (Guizilini 2022, [p 4, FIG 3(c) caption]; “Matching probability distribution along depth bins for different pixels relative to their depth-discretized epipolar candidates”).
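As a gloss on the claim-1 mapping above: the claimed device keeps an input frame (and its point cloud) as new training data exactly when a depth-estimation vulnerability score crosses a threshold. Below is a minimal sketch, assuming a network that outputs per-pixel probabilities over K depth bins; the names (vulnerability, THRESHOLD, maybe_keep, store) are illustrative and come from neither the claims nor the cited references.

```python
# Sketch of the claim-1 selection loop. Assumes depth_probs has shape
# (H, W, K): one probability distribution over K depth bins per pixel.
import numpy as np

THRESHOLD = 0.35  # hypothetical value for the "predetermined threshold"

def vulnerability(depth_probs: np.ndarray) -> float:
    """One plausible vulnerability score: 1 minus the mean per-pixel
    confidence, with confidence taken as the maximum bin probability
    (cf. Guizilini's use of maximum attention as a confidence measure)."""
    per_pixel_confidence = depth_probs.max(axis=-1)  # (H, W)
    return 1.0 - float(per_pixel_confidence.mean())

def maybe_keep(image, point_cloud, depth_probs, store: list) -> None:
    """Store the image and its point cloud as new training data when the
    vulnerability is greater than or equal to the threshold."""
    if vulnerability(depth_probs) >= THRESHOLD:
        store.append((image, point_cloud))  # or transmit to another device
```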
Regarding claim 3, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 2. Guizilini further teaches wherein the vulnerability output device is further configured to: generate 1st to nth predicted depth values of the 1st to nth pixels and 1st to nth offsets corresponding to the 1st to nth predicted depth values with reference to the probability values from the 1_1st to 1_Kth probability values to the n_1st to n_Kth probability values (Guizilini 2022, [p 4, §3.1.1, Col 2, ¶1]; “for each pixel p_uv, the argmax operation is used to find the index h_uv of the most probable candidate along its sampled epipolar line ε_{t→c}^{uv}. A 1-dimensional 2s+1 window is placed around h_uv, and a re-normalization step is applied such that its sum is 1” [equation image not reproduced]) and the 1st to Kth default depths ([p 5, §3.1.1, Col 1, ¶1]; “The depth value for p_uv is calculated by multiplying this re-normalized distribution with the corresponding depth bins” [equation image not reproduced]); and output the depth estimation vulnerability with reference to the 1st to nth predicted depth values and the 1st to nth offsets ([p 5, §3.1.1, Col 1, ¶1]; “The normalized attention values can also be used as a measure of matching confidence”).
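The windowed decoding quoted above is easy to miss in prose. Here is a minimal per-pixel sketch of Guizilini's §3.1.1 procedure as the Office Action describes it: argmax over the depth bins, a (2s+1) window around it, re-normalization so the window sums to 1, then depth as the probability-weighted average of the bin depths. Variable names are ours, not the paper's.

```python
# Localized high-response window decoding (cf. Guizilini §3.1.1).
import numpy as np

def decode_depth(probs: np.ndarray, bins: np.ndarray, s: int = 2):
    """probs: (K,) matching distribution over K depth bins for one pixel.
    bins: (K,) depth value of each bin.
    Returns (predicted depth, confidence)."""
    h = int(np.argmax(probs))                    # index of the most probable bin
    lo, hi = max(0, h - s), min(len(probs), h + s + 1)
    window = probs[lo:hi] / probs[lo:hi].sum()   # re-normalize: window sums to 1
    depth = float(window @ bins[lo:hi])          # multiply by the depth bins
    confidence = float(probs[h])                 # max value as matching confidence
    return depth, confidence
```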
Regarding claim 6, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Srinivasan further teaches wherein the training data acquisition support device is further configured to store point cloud data, obtained in a first time interval set on the basis of a time point when the input image is obtained, as the specific point cloud data (Srinivasan, [Col 2:60-64]; “After training” (i.e., first time point) “, the machine-learned model can receive image data captured by image sensor(s) to determine depth data associated with image data. In some instances, the machine-learned model can receive captured depth data captured by depth sensors (e.g., lidar sensors)”) in the certain storage space, or further configured to transmit the point cloud data to the other device (Srinivasan, FIG 12 exhibits memory 1238).

Regarding claim 7, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Guizilini further teaches wherein the depth estimation network is further configured to: apply the depth estimation calculation to the input image to output the depth distribution information corresponding to the input image (Guizilini, see FIG 5 above), in a state where a learning device applies the depth estimation calculation to a beforehand training image to generate predicted depth distribution information corresponding to the beforehand training image (i.e., pre-training); generate a depth loss using the predicted depth distribution information and ground truth (GT) depth distribution information corresponding to the predicted depth distribution information (Guizilini, [p 5, §3.4, Col 2, ¶1]; “We train our self-supervised depth … using reprojection loss and absolute error (L1) terms” [equation image not reproduced]; and [p 8, §4.5, Col 2, ¶1]; “the VKITTI2, PD, and TartanAir models are pre-trained with depth supervision (using a Smooth L1 loss) and use ground-truth relative poses. Real-world datasets (DDAD and Cityscapes) are pre-trained using the self-supervised loss described in Section 3.4”); and perform back propagation of the depth loss to learn a parameter of the depth estimation network (Guizilini 2022, [p 3, §3.1, Col 1, ¶1]; “gradient back-propagation for end-to-end training”).

Regarding claim 8, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 7. Guizilini teaches wherein: the {GT} depth distribution information is generated by applying point cloud data for training to an image coordinate system corresponding to the beforehand training image; {and the point cloud data is obtained in a second time interval set on the basis of a time point when the beforehand training image is obtained}. Guizilini 2022 teaches that point cloud data is reconstructed (see FIG 10 and FIG 11), and that “no ground-truth is used at training or inference time, only videos” (p 11, Appendix D, Col 1, ¶1). Rather, images are used as ground truth (p 8, §4.5, Col 2, ¶1). While recognizing the need for point cloud ground truth ([p 11, Appendix F, Col 2, ¶1]; “Another common limitation of self-supervised monocular depth estimation is scale ambiguity, since models trained purely on image information cannot produce metrically-accurate predictions. Scale-aware results are necessary for downstream tasks that ingest our reconstructed pointclouds”), Guizilini does not explicitly disclose that the GT depth distribution information is generated by applying point cloud data for training to an image coordinate system corresponding to the beforehand training image, or that the point cloud data is obtained in a second time interval set on the basis of a time point when the beforehand training image is obtained. However, Srinivasan teaches these limitations (Srinivasan, [Col 16:15-21]; “ground truth data (e.g., from lidar data and/or other sensor data) associated with the image data 804 can be used to train the machine-learned model” (i.e., beforehand training)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include ground truth point cloud data, as taught by Srinivasan, in the invention of Guizilini. The motivation to do so would be to provide supervision for the machine-learning model.

Claim 9 is analyzed similarly to analogous claim 1. Claim 10 is analyzed similarly to analogous claim 2. Claim 11 is analyzed similarly to analogous claim 3. Claim 14 is analyzed similarly to analogous claim 6. Claim 15 is analyzed similarly to analogous claim 7. Claim 16 is analyzed similarly to analogous claim 8.
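For readers tracking the claim-8 dispute: “applying point cloud data ... to an image coordinate system” is, in the usual lidar-supervision pipeline, a projection of 3D points through the camera intrinsics into a sparse ground-truth depth map. A hedged sketch of that generic step follows; this is our reconstruction of common practice, not code from Guizilini or Srinivasan.

```python
# Project lidar points (camera frame) into image coordinates to build a
# sparse GT depth map. Generic illustration; points that collide on the
# same pixel simply overwrite one another here.
import numpy as np

def point_cloud_to_depth_map(points_cam: np.ndarray, K: np.ndarray,
                             h: int, w: int) -> np.ndarray:
    """points_cam: (N, 3) points in the camera frame; K: (3, 3) intrinsics.
    Returns an (h, w) depth map with 0 where no point projects."""
    depth_map = np.zeros((h, w), dtype=np.float32)
    z = points_cam[:, 2]
    front = z > 0                                  # keep points in front of the camera
    uvw = (K @ points_cam[front].T).T              # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)   # inside the image bounds
    depth_map[v[ok], u[ok]] = z[front][ok]
    return depth_map
```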
Claims 4-5 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Guizilini in view of Srinivasan, and further in view of Dudzik et al., US 20210150278 A, hereinafter Dudzik.

Regarding claim 4, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 3. Guizilini teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith predicted depth value corresponding to an ith default depth {with reference to an ith middle value} determined on the basis of at least one default depth including the ith default depth and a j_ith probability value corresponding to the ith default depth for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth predicted depth values; and perform a process of generating a jth predicted depth value of the jth pixel with reference to the j_1st to j_Kth predicted depth values with respect to the 1st to nth pixels to generate the 1st to nth predicted depth values (Guizilini, [p 4, §3.1.1, Col , ¶1]; “We use a localized high-response window [72] to estimate continuous depth values from discretized bins, thus increasing robustness to multi-modal distributions [48]. A diagram is shown in Figure 4a”; see Eq. 8 cited above; and [p 5, §3.1.1, Col 1, ¶1]; “The depth value for p_uv is calculated by multiplying this re-normalized distribution with the corresponding depth bins”). Srinivasan also teaches this limitation (Srinivasan, [Col 3:25-40]; “the machine-learned model can determine discrete depth portions/bins associated with the image data. For example, output values falling within a range of depths (e.g., within a depth bin) can be associated with a discrete depth bin and output a discrete value.”). The combination does not explicitly disclose an ith default depth with reference to an ith middle value. However, Dudzik, in a similar field of endeavor of depth estimation, teaches an ith default depth with reference to an ith middle value (Dudzik, ¶[0020]; “a combination of binning and offsets may be used (as may be measured from the ‘center’ of the bin). In some instances, the machine-learned algorithm can use a loss function and/or softmax loss that is associated with a depth bin to determine the continuous offset” (i.e., vulnerability depends upon the offset)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include measuring the offset from the “center” of the bin, as taught by Dudzik, in the combined invention of Guizilini and Srinivasan. The motivation to do so would be to output a “coarse” measurement of a bin.

Regarding claim 5, the combination of Guizilini, Srinivasan, and Dudzik teaches the training data selection device of claim 4. Srinivasan further teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith offset {corresponding to the ith default depth with reference to the ith middle value}, the jth predicted depth value, and the j_ith probability value for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth offsets; and perform a process of generating a jth offset of the jth pixel with reference to the j_1st to j_Kth offsets with respect to the 1st to nth pixels to generate the 1st to nth offsets (Srinivasan, [Col 3:44-49]; “a continuous offset can be determined with respect to a binned output. Continuing with the example above, a machine-learned model may output a binned depth value of 10.5 meters with a continuous offset of positive 15 cm from the discrete depth value. In such an example, the depth value would correspond to a depth of 10.65 meters.”). The combination does not explicitly disclose the ith default depth with reference to the ith middle value. However, Dudzik teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith offset corresponding to the ith default depth with reference to the ith middle value, the jth predicted depth value, and the j_ith probability value for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth offsets; and perform a process of generating a jth offset of the jth pixel with reference to the j_1st to j_Kth offsets with respect to the 1st to nth pixels to generate the 1st to nth offsets (Dudzik, ¶[0020]; “the machine-learned model can determine discrete depth portions/bins associated with the image data”; “a machine-learned model can output a continuous depth value as a continuous output, … the continuous offset can provide a graduated transition between depth values regardless of whether the discrete depth bins are used”; “a combination of binning and offsets may be used (e.g., the model may output a ‘coarse’ measurement of a bin in addition to a fine-grained offset (as may be measured from the ‘center’ of the bin))”).

Claim 12 is analyzed similarly to analogous claim 4. Claim 13 is analyzed similarly to analogous claim 5.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Guizilini et al., US 20210350616 A1, would have been relied upon for teaching estimated depth that accounts for a depth uncertainty measurement in the current image and the at least one previous image. Bhat et al., 2022 (Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. 2022. LocalBins: Improving Depth Estimation by Learning Local Distributions. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I. Springer-Verlag, Berlin, Heidelberg, 480-496. https://doi.org/10.1007/978-3-031-19769-7_28) teaches depth estimation from a single image, building on AdaBins to provide an architecture called LocalBins. The reference would have been relied upon for teaching the use of “bin-centers” to provide a coarse-to-fine binning strategy.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHANDHANA PEDAPATI, whose telephone number is 571-272-5325. The examiner can normally be reached M-F 8:30am-6pm (ET). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHANDHANA PEDAPATI/
Examiner, Art Unit 2669

/CHAN S PARK/
Supervisory Patent Examiner, Art Unit 2669
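One last editorial gloss on the technology at issue: the bin-plus-offset scheme the Dudzik combination supplies for claims 4-5 reduces to adding a continuous offset, measured from the bin center, to a coarse binned depth. A minimal sketch using only the worked numbers from Srinivasan’s Col 3:44-49 example; the function name is ours.

```python
# Bin-plus-offset depth decoding (cf. Dudzik ¶[0020], Srinivasan Col 3:44-49).
# Illustrative sketch only; not code from either reference.

def depth_from_bin_and_offset(bin_value_m: float, offset_m: float) -> float:
    """Final depth = coarse binned depth value + fine continuous offset."""
    return bin_value_m + offset_m

# Srinivasan's example: a 10.5 m bin with a +15 cm offset gives 10.65 m.
assert abs(depth_from_bin_and_offset(10.5, 0.15) - 10.65) < 1e-9
```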

Prosecution Timeline

Nov 28, 2023: Application Filed
Jan 09, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications with similar technology granted by the same examiner

Patent 12602896: IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597095: INTELLIGENT SYSTEM AND METHOD OF ENHANCING IMAGES
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12571683: ELEVATED TEMPERATURE SCREENING SYSTEMS AND METHODS
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12548180: HOLE DIAMETER MEASURING DEVICE
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541829: MOTION-BASED PIXEL PROPAGATION FOR VIDEO INPAINTING
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 64%
With Interview: 96% (+32.5%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
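For transparency, the headline numbers appear to reduce to two lines of arithmetic. The sketch below assumes, as the footnote suggests, that grant probability is the raw career allow rate and that the interview lift is additive in percentage points; this is our reading of the dashboard, not a documented formula.

```python
# Reconstruction of the displayed projections from the examiner's counts.
granted, resolved = 14, 22
allow_rate = granted / resolved               # 0.636... -> displayed as 64%
interview_lift = 0.325                        # +32.5 percentage points
with_interview = allow_rate + interview_lift  # 0.961... -> displayed as 96%
print(f"base {allow_rate:.0%}, with interview {with_interview:.0%}")
```

Note that 14/22 ≈ 63.6% rounds to the displayed 64%, and 63.6% plus 32.5 points ≈ 96.1% rounds to the displayed 96%, so the additive reading is at least consistent.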
