Prosecution Insights
Last updated: April 19, 2026
Application No. 18/521,736

TRAINING DATA SELECTION DEVICE FOR SELECTING TRAINING DATA TO IMPROVE PERFORMANCE OF A DEPTH ESTIMATION NETWORK AND A TRAINING DATA SELECTION METHOD THEREFOR

Status: Non-Final OA (§103)
Filed: Nov 28, 2023
Examiner: PEDAPATI, CHANDHANA
Art Unit: 2669
Tech Center: 2600 — Communications
Assignee: Kia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 64% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 64% (14 granted / 22 resolved; +1.6% vs TC avg)
Interview Lift: +32.5% in resolved cases with interview (strong)
Avg Prosecution: 2y 10m typical timeline (26 currently pending)
Total Applications: 48 across all art units (career history)

Statute-Specific Performance

§101: 11.7% (-28.3% vs TC avg)
§103: 47.0% (+7.0% vs TC avg)
§102: 18.1% (-21.9% vs TC avg)
§112: 20.9% (-19.1% vs TC avg)
Tech Center average comparisons are estimates; based on career data from 22 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Notice to Applicants

Limitations appearing inside of {} are intended to indicate the limitations not taught by said prior art(s)/combinations. Claims 1-16 are pending in the application.

Information Disclosure Statement

No Information Disclosure Statement (IDS) was filed; therefore, no applicant-submitted references were considered.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “depth estimation network” in claims 1 and 7; “vulnerability output device” in claims 1 and 3-5; and “training data acquisition support device” in claims 1 and 6. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-8, 9-11, and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over “Guizilini” (Vitor Guizilini, Rares Ambrus, Dian Chen, Sergey Zakharov, and Adrien Gaidon, “Multi-Frame Self-Supervised Depth with Transformers” (2022), arXiv:2204.07616v2) in view of Srinivasan (U.S. Patent No. 11,087,494 B1).
Regarding claim 1, Guizilini teaches a training data selection device, comprising: a depth estimation network configured to apply depth estimation calculation to an input image obtained in real time to output depth distribution information corresponding to the input image (Guizilini, FIG 5 exhibits an input image fed into the depth estimation calculation to output depth distribution information) [image: Guizilini FIG 5]; a vulnerability output device configured to output depth estimation vulnerability corresponding to the input image with reference to the depth distribution information (Guizilini, FIG 4(c) exhibits maximum attention, “normalized attention values which can be used as a measure of confidence” [p 5, §3.1.1, Col 1, ¶1]) [image: Guizilini FIG 4(c)]; a training data acquisition support device configured {to store the input image and specific point cloud data} corresponding to the input image as new training data {in a certain storage space or configured to transmit the input image and the specific point cloud data to another device}, when it is determined that the depth estimation vulnerability is greater than or equal to a predetermined threshold (Guizilini, [p 5, §3.3.1, Col 1, ¶1]; “leverage this novel matching confidence metric by masking out pixels with maximum attention value below a certain threshold _min, both from the high response loss calculation and the decoded features (Figure 4d)”).

Guizilini does not explicitly disclose storing the input image and specific point cloud data corresponding to the input image as new training data in a certain storage space, or transmitting the input image and the specific point cloud data to another device. However, Srinivasan, in a similar field of endeavor of depth estimation, teaches storing the input image and specific point cloud data corresponding to the input image as new training data in a certain storage space, or transmitting the input image and the specific point cloud data to another device (Srinivasan, [Col 2:22-24]; “The machine-learning model can be trained using training image data and training lidar data” (i.e., point cloud) “as a ground truth for training the machine-learning model”; Examiner interprets that image and point cloud data are stored if they are used to train the model). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include storing point cloud data as training data, as taught by Srinivasan, in the invention of Guizilini. The motivation to do so would be to provide supervision for the machine-learning model.

Regarding claim 2, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Guizilini further teaches wherein the depth estimation network performs a process of outputting j_1st to j_Kth probability values corresponding to 1st to Kth default depths for a jth pixel being any one of 1st to nth pixels of the input image with respect to the 1st to nth pixels to output probability values from 1_1st to 1_Kth probability values to n_1st to n_Kth probability values for the 1st to nth pixels as the depth distribution information (Guizilini 2022, [p 4, FIG 3(c) caption]; “Matching probability distribution along depth bins for different pixels relative to their depth-discretized epipolar candidates”).
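As a gloss on the claim-1 mapping above: the claimed device keeps an input frame (and its point cloud) as new training data exactly when a depth-estimation vulnerability score crosses a threshold. Below is a minimal sketch, assuming a network that outputs per-pixel probabilities over K depth bins; the names (vulnerability, THRESHOLD, maybe_keep, store) are illustrative and come from neither the claims nor the cited references.

```python
# Sketch of the claim-1 selection loop. Assumes depth_probs has shape
# (H, W, K): one probability distribution over K depth bins per pixel.
import numpy as np

THRESHOLD = 0.35  # hypothetical value for the "predetermined threshold"

def vulnerability(depth_probs: np.ndarray) -> float:
    """One plausible vulnerability score: 1 minus the mean per-pixel
    confidence, with confidence taken as the maximum bin probability
    (cf. Guizilini's use of maximum attention as a confidence measure)."""
    per_pixel_confidence = depth_probs.max(axis=-1)  # (H, W)
    return 1.0 - float(per_pixel_confidence.mean())

def maybe_keep(image, point_cloud, depth_probs, store: list) -> None:
    """Store the image and its point cloud as new training data when the
    vulnerability is greater than or equal to the threshold."""
    if vulnerability(depth_probs) >= THRESHOLD:
        store.append((image, point_cloud))  # or transmit to another device
```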
Regarding claim 3, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 2. Guizilini further teaches wherein the vulnerability output device is further configured to: generate 1st to nth predicted depth values of the 1st to nth pixels and 1st to nth offsets corresponding to the 1st to nth predicted depth values with reference to the probability values from the 1_1st to 1_Kth probability values to the n_1st to n_Kth probability values (Guizilini 2022, [p 4, §3.1.1, Col 2, ¶1]; “for each pixel p_uv, the argmax operation is used to find the index h_uv of the most probable candidate along its sampled epipolar line ε_{t→c}^{uv}. A 1-dimensional 2s+1 window is placed around h_uv, and a re-normalization step is applied such that its sum is 1” [equation image not reproduced]) and the 1st to Kth default depths ([p 5, §3.1.1, Col 1, ¶1]; “The depth value for p_uv is calculated by multiplying this re-normalized distribution with the corresponding depth bins” [equation image not reproduced]); and output the depth estimation vulnerability with reference to the 1st to nth predicted depth values and the 1st to nth offsets ([p 5, §3.1.1, Col 1, ¶1]; “The normalized attention values can also be used as a measure of matching confidence”).
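The windowed decoding quoted above is easy to miss in prose. Here is a minimal per-pixel sketch of Guizilini's §3.1.1 procedure as the Office Action describes it: argmax over the depth bins, a (2s+1) window around it, re-normalization so the window sums to 1, then depth as the probability-weighted average of the bin depths. Variable names are ours, not the paper's.

```python
# Localized high-response window decoding (cf. Guizilini §3.1.1).
import numpy as np

def decode_depth(probs: np.ndarray, bins: np.ndarray, s: int = 2):
    """probs: (K,) matching distribution over K depth bins for one pixel.
    bins: (K,) depth value of each bin.
    Returns (predicted depth, confidence)."""
    h = int(np.argmax(probs))                    # index of the most probable bin
    lo, hi = max(0, h - s), min(len(probs), h + s + 1)
    window = probs[lo:hi] / probs[lo:hi].sum()   # re-normalize: window sums to 1
    depth = float(window @ bins[lo:hi])          # multiply by the depth bins
    confidence = float(probs[h])                 # max value as matching confidence
    return depth, confidence
```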
Regarding claim 6, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Srinivasan further teaches wherein the training data acquisition support device is further configured to store point cloud data, obtained in a first time interval set on the basis of a time point when the input image is obtained, as the specific point cloud data (Srinivasan, [Col 2:60-64]; “After training” (i.e., first time point) “, the machine-learned model can receive image data captured by image sensor(s) to determine depth data associated with image data. In some instances, the machine-learned model can receive captured depth data captured by depth sensors (e.g., lidar sensors)”) in the certain storage space, or further configured to transmit the point cloud data to the other device (Srinivasan, FIG 12 exhibits memory 1238).

Regarding claim 7, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 1. Guizilini further teaches wherein the depth estimation network is further configured to: apply the depth estimation calculation to the input image to output the depth distribution information corresponding to the input image (Guizilini, see FIG 5 above), in a state where a learning device applies the depth estimation calculation to a beforehand training image to generate predicted depth distribution information corresponding to the beforehand training image (i.e., pre-training); generate a depth loss using the predicted depth distribution information and ground truth (GT) depth distribution information corresponding to the predicted depth distribution information (Guizilini, [p 5, §3.4, Col 2, ¶1]; “We train our self-supervised depth … using reprojection loss and absolute error (L1) terms” [equation image not reproduced]; and [p 8, §4.5, Col 2, ¶1]; “the VKITTI2, PD, and TartanAir models are pre-trained with depth supervision (using a Smooth L1 loss) and use ground-truth relative poses. Real-world datasets (DDAD and Cityscapes) are pre-trained using the self-supervised loss described in Section 3.4”); and perform back propagation of the depth loss to learn a parameter of the depth estimation network (Guizilini 2022, [p 3, §3.1, Col 1, ¶1]; “gradient back-propagation for end-to-end training”).

Regarding claim 8, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 7. Guizilini teaches wherein: the {GT} depth distribution information is generated by applying point cloud data for training to an image coordinate system corresponding to the beforehand training image; {and the point cloud data is obtained in a second time interval set on the basis of a time point when the beforehand training image is obtained}. Guizilini 2022 teaches that point cloud data is reconstructed (see FIG 10 and FIG 11), and that “no ground-truth is used at training or inference time, only videos” (p 11, Appendix D, Col 1, ¶1). Rather, images are used as ground truth (p 8, §4.5, Col 2, ¶1). While recognizing the need for point cloud ground truth ([p 11, Appendix F, Col 2, ¶1]; “Another common limitation of self-supervised monocular depth estimation is scale ambiguity, since models trained purely on image information cannot produce metrically-accurate predictions. Scale-aware results are necessary for downstream tasks that ingest our reconstructed pointclouds”), Guizilini does not explicitly disclose that the GT depth distribution information is generated by applying point cloud data for training to an image coordinate system corresponding to the beforehand training image, or that the point cloud data is obtained in a second time interval set on the basis of a time point when the beforehand training image is obtained. However, Srinivasan teaches these limitations (Srinivasan, [Col 16:15-21]; “ground truth data (e.g., from lidar data and/or other sensor data) associated with the image data 804 can be used to train the machine-learned model” (i.e., beforehand training)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include ground truth point cloud data, as taught by Srinivasan, in the invention of Guizilini. The motivation to do so would be to provide supervision for the machine-learning model.

Claim 9 is analyzed similarly to analogous claim 1. Claim 10 is analyzed similarly to analogous claim 2. Claim 11 is analyzed similarly to analogous claim 3. Claim 14 is analyzed similarly to analogous claim 6. Claim 15 is analyzed similarly to analogous claim 7. Claim 16 is analyzed similarly to analogous claim 8.
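For readers tracking the claim-8 dispute: “applying point cloud data ... to an image coordinate system” is, in the usual lidar-supervision pipeline, a projection of 3D points through the camera intrinsics into a sparse ground-truth depth map. A hedged sketch of that generic step follows; this is our reconstruction of common practice, not code from Guizilini or Srinivasan.

```python
# Project lidar points (camera frame) into image coordinates to build a
# sparse GT depth map. Generic illustration; points that collide on the
# same pixel simply overwrite one another here.
import numpy as np

def point_cloud_to_depth_map(points_cam: np.ndarray, K: np.ndarray,
                             h: int, w: int) -> np.ndarray:
    """points_cam: (N, 3) points in the camera frame; K: (3, 3) intrinsics.
    Returns an (h, w) depth map with 0 where no point projects."""
    depth_map = np.zeros((h, w), dtype=np.float32)
    z = points_cam[:, 2]
    front = z > 0                                  # keep points in front of the camera
    uvw = (K @ points_cam[front].T).T              # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)   # inside the image bounds
    depth_map[v[ok], u[ok]] = z[front][ok]
    return depth_map
```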
Claims 4-5 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Guizilini in view of Srinivasan, and further in view of Dudzik et al., US 20210150278 A, hereinafter Dudzik.

Regarding claim 4, the combination of Guizilini and Srinivasan teaches the training data selection device of claim 3. Guizilini teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith predicted depth value corresponding to an ith default depth {with reference to an ith middle value} determined on the basis of at least one default depth including the ith default depth and a j_ith probability value corresponding to the ith default depth for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth predicted depth values; and perform a process of generating a jth predicted depth value of the jth pixel with reference to the j_1st to j_Kth predicted depth values with respect to the 1st to nth pixels to generate the 1st to nth predicted depth values (Guizilini, [p 4, §3.1.1, Col , ¶1]; “We use a localized high-response window [72] to estimate continuous depth values from discretized bins, thus increasing robustness to multi-modal distributions [48]. A diagram is shown in Figure 4a”; see Eq. 8 cited above; and [p 5, §3.1.1, Col 1, ¶1]; “The depth value for p_uv is calculated by multiplying this re-normalized distribution with the corresponding depth bins”). Srinivasan also teaches this limitation (Srinivasan, [Col 3:25-40]; “the machine-learned model can determine discrete depth portions/bins associated with the image data. For example, output values falling within a range of depths (e.g., within a depth bin) can be associated with a discrete depth bin and output a discrete value.”). The combination does not explicitly disclose an ith default depth with reference to an ith middle value. However, Dudzik, in a similar field of endeavor of depth estimation, teaches an ith default depth with reference to an ith middle value (Dudzik, ¶[0020]; “a combination of binning and offsets may be used (as may be measured from the ‘center’ of the bin). In some instances, the machine-learned algorithm can use a loss function and/or softmax loss that is associated with a depth bin to determine the continuous offset” (i.e., vulnerability depends upon the offset)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include measuring the offset from the “center” of the bin, as taught by Dudzik, in the combined invention of Guizilini and Srinivasan. The motivation to do so would be to output a “coarse” measurement of a bin.

Regarding claim 5, the combination of Guizilini, Srinivasan, and Dudzik teaches the training data selection device of claim 4. Srinivasan further teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith offset {corresponding to the ith default depth with reference to the ith middle value}, the jth predicted depth value, and the j_ith probability value for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth offsets; and perform a process of generating a jth offset of the jth pixel with reference to the j_1st to j_Kth offsets with respect to the 1st to nth pixels to generate the 1st to nth offsets (Srinivasan, [Col 3:44-49]; “a continuous offset can be determined with respect to a binned output. Continuing with the example above, a machine-learned model may output a binned depth value of 10.5 meters with a continuous offset of positive 15 cm from the discrete depth value. In such an example, the depth value would correspond to a depth of 10.65 meters.”). The combination does not explicitly disclose the ith default depth with reference to the ith middle value. However, Dudzik teaches wherein the vulnerability output device is further configured to: perform a process of generating a j_ith offset corresponding to the ith default depth with reference to the ith middle value, the jth predicted depth value, and the j_ith probability value for the jth pixel with respect to the 1st to Kth default depths to generate j_1st to j_Kth offsets; and perform a process of generating a jth offset of the jth pixel with reference to the j_1st to j_Kth offsets with respect to the 1st to nth pixels to generate the 1st to nth offsets (Dudzik, ¶[0020]; “the machine-learned model can determine discrete depth portions/bins associated with the image data”; “a machine-learned model can output a continuous depth value as a continuous output, … the continuous offset can provide a graduated transition between depth values regardless of whether the discrete depth bins are used”; “a combination of binning and offsets may be used (e.g., the model may output a ‘coarse’ measurement of a bin in addition to a fine-grained offset (as may be measured from the ‘center’ of the bin))”).

Claim 12 is analyzed similarly to analogous claim 4. Claim 13 is analyzed similarly to analogous claim 5.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Guizilini et al., US 20210350616 A1, would have been relied upon for teaching estimated depth that accounts for a depth uncertainty measurement in the current image and the at least one previous image. Bhat et al., 2022 (Shariq Farooq Bhat, Ibraheem Alhashim, and Peter Wonka. 2022. LocalBins: Improving Depth Estimation by Learning Local Distributions. In Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part I. Springer-Verlag, Berlin, Heidelberg, 480-496. https://doi.org/10.1007/978-3-031-19769-7_28) teaches depth estimation from a single image, building on AdaBins to provide an architecture called LocalBins. The reference would have been relied upon for teaching the use of “bin-centers” to provide a coarse-to-fine binning strategy.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHANDHANA PEDAPATI, whose telephone number is 571-272-5325. The examiner can normally be reached M-F 8:30am-6pm (ET). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHANDHANA PEDAPATI/
Examiner, Art Unit 2669

/CHAN S PARK/
Supervisory Patent Examiner, Art Unit 2669
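One last editorial gloss on the technology at issue: the bin-plus-offset scheme the Dudzik combination supplies for claims 4-5 reduces to adding a continuous offset, measured from the bin center, to a coarse binned depth. A minimal sketch using only the worked numbers from Srinivasan’s Col 3:44-49 example; the function name is ours.

```python
# Bin-plus-offset depth decoding (cf. Dudzik ¶[0020], Srinivasan Col 3:44-49).
# Illustrative sketch only; not code from either reference.

def depth_from_bin_and_offset(bin_value_m: float, offset_m: float) -> float:
    """Final depth = coarse binned depth value + fine continuous offset."""
    return bin_value_m + offset_m

# Srinivasan's example: a 10.5 m bin with a +15 cm offset gives 10.65 m.
assert abs(depth_from_bin_and_offset(10.5, 0.15) - 10.65) < 1e-9
```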

Prosecution Timeline

Nov 28, 2023: Application Filed
Jan 09, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications with similar technology granted by the same examiner

Patent 12602896: IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12597095: INTELLIGENT SYSTEM AND METHOD OF ENHANCING IMAGES
Granted Apr 07, 2026 (2y 5m to grant)
Patent 12571683: ELEVATED TEMPERATURE SCREENING SYSTEMS AND METHODS
Granted Mar 10, 2026 (2y 5m to grant)
Patent 12548180: HOLE DIAMETER MEASURING DEVICE
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541829: MOTION-BASED PIXEL PROPAGATION FOR VIDEO INPAINTING
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 64%
With Interview: 96% (+32.5%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 22 resolved cases by this examiner. Grant probability derived from career allow rate.
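For transparency, the headline numbers appear to reduce to two lines of arithmetic. The sketch below assumes, as the footnote suggests, that grant probability is the raw career allow rate and that the interview lift is additive in percentage points; this is our reading of the dashboard, not a documented formula.

```python
# Reconstruction of the displayed projections from the examiner's counts.
granted, resolved = 14, 22
allow_rate = granted / resolved               # 0.636... -> displayed as 64%
interview_lift = 0.325                        # +32.5 percentage points
with_interview = allow_rate + interview_lift  # 0.961... -> displayed as 96%
print(f"base {allow_rate:.0%}, with interview {with_interview:.0%}")
```

Note that 14/22 ≈ 63.6% rounds to the displayed 64%, and 63.6% plus 32.5 points ≈ 96.1% rounds to the displayed 96%, so the additive reading is at least consistent.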
