DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claims 6 and 14 are objected to because of the following informalities: the claims recite “comprises a convex upsamping process to generate the full resolution disparity estimate,” in which “upsamping” appears to be a typographical error for “upsampling.” Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-2, 5, 7-10, 13, 15-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Khamis (US 20200099920) in view of Chakravarty (US 20210103745), hereinafter “Chak”.
Regarding claim 1:
Khamis discloses: a method for generating a refined disparity estimate (FIG. 6), the method comprising:
receiving, with a computing device having one or more processors and one or more memories (FIG. 2), a stereo image pair (¶ [0011] “…An infrared stereo camera pair including a first camera and a second camera augmented to perceive infrared spectra captures depth images of the environment while the illumination pattern is being projected.”; ¶ [0045] “…At block 602, the processor 220 receives the left depth image 302 captured by the left depth camera 114 and the right depth image 304 captured by the right depth camera 116”);
generating, with two feature extractors of the learned stereo architecture, a pair of feature maps, wherein each one of the pair of feature maps corresponds to one of the images of the stereo image pair (¶ [0024] “…The downsampler 230 implements a feature network with shared weights between the left depth image and the right depth image (also referred to as a Siamese network)… The downsampler 230 outputs a 32-dimensional feature vector at each pixel in the downsampled image”; ¶ [0031] “…The downsampler 230 produces a downsampled left depth image 312 feature map and a downsampled right depth image 314 feature map at a reduced resolution”; note that the Siamese network comprises two feature extractors with shared weights);
generating, with a cost volume stage of the learned stereo architecture comprising one or more 3D convolution networks, a first disparity estimate (¶ [0025] “…The coarse cost volume calculator 235 generates a downsampled disparity map (e.g., a 160×90 disparity map) using a soft argmin operator.…To aggregate context across the spatial domain as well as the disparity domain, the coarse cost volume calculator 235 filters the cost volume with four 3D convolutions with a filter size of 3×3×3, batch- normalization, and leaky ReLu activations.”; ¶ [0045] “…At block 604, the downsampler 230 downsamples the left depth image 302 and the right depth image 304 to generate downsampled (reduced-resolution) depth images 312, 314. At block 606, the coarse cost volume calculator 235 generates a coarse (downsampled) disparity map 340 based on minimizing a matching cost.”);
upsampling the first disparity estimate to a resolution corresponding to a resolution of the stereo image pair to form a full resolution disparity estimate (¶ [0028] “…The upsampler 240 is a module configured to upsample the downsampled disparity map output from the coarse cost volume calculator 235 to the original resolution. In some embodiments, the upsampler 240 upsamples a 160×90 disparity map using bi-linear interpolation and convolutions to the original resolution of 1280×720.”; ¶ [0047] “At block 608, the upsampler 240 upsamples the coarse disparity map 340 using bilinear interpolation to the original resolution of the left depth image 302 and the right depth image 304”);
refining the full resolution disparity estimate with a disparity residual thereby generating a refined full resolution disparity estimate (¶ [0029] “The depth map generator 245 is a module configured to hierarchically refine the output of the upsampler 240 with a cascade of the network and apply a single refinement that upsamples the coarse output to the full resolution in one shot”; ¶ [0041] “FIG. 5 illustrates the electronic device 100 upsampling and refining a downsampled disparity map 340 output”; ¶ [0043] “…outputs a 1-dimensional disparity residual that is added to the previous prediction”; ¶ [0047] “…At block 610, the depth map generator 245 refines the upsampled disparity map 510 to generate a full-resolution disparity map by retrieving the high frequency details such as edges.”); and
outputting the refined full resolution disparity estimate (¶ [0047] “…At block 612, the depth map generator 245 provides the full-resolution disparity map to the electronic device 100 for computer vision functionality, such as 3D reconstruction, localization and tracking, virtual and augmented reality, and applications such as indoor mapping and architecture, autonomous cars, and human body and face mapping.”).
Khamis does not specifically teach: implementing, with the computing device, a learned stereo architecture trained on fully synthetic image data.
However, in a related field, Chak teaches: implementing, with the computing device, a learned stereo architecture trained on fully synthetic image data (¶ [0010] “…synthetic images can be generated by a synthetic image rendering software and processed using generative adversarial networks to modify the synthetic images to appear to be photorealistic images”; ¶ [0011] “…generating two or more stereo pairs of synthetic images and generating two or more stereo pairs of real images based on the two or more stereo pairs of synthetic images using a generative adversarial network (GAN), wherein the GAN is trained using a six-axis degree of freedom (DoF)”; ¶ [0012] “…The deep neural network can be trained based on ground truth determined by a scene description input to a synthetic rendering engine”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Khamis to incorporate the teachings of Chak by including: implementing, with the computing device, a learned stereo architecture trained on fully synthetic image data in order to increase the amount and quality of diverse datasets without the high computational and manual costs of real-world data collections.
Regarding claim 2:
Khamis in view of Chak teaches the limitations of claim 1 as applied above.
Khamis further teaches: the two feature extractors comprise a first feature extractor configured to generate a first feature map corresponding to a first image of the stereo image pair and a second feature extractor configured to generate a second feature map corresponding to a second image of the stereo image pair (¶ [0024] “…The downsampler 230 implements a feature network with shared weights between the left depth image and the right depth image (also referred to as a Siamese network)”; ¶ [0031] “…The downsampler 230 produces a downsampled left depth image 312 feature map and a downsampled right depth image 314 feature map at a reduced resolution”), and
the first feature extractor and the second feature extractor are configured to share network weights (¶ [0024] “…The downsampler 230 implements a feature network with shared weights between the left depth image and the right depth image (also referred to as a Siamese network)”).
Regarding claim 5:
Khamis in view of Chak teaches the limitations of claim 1 as applied above.
Khamis further teaches: wherein the first disparity estimate comprises a disparity resolution less than the resolution of the stereo image pair (¶ [0012] “A processor of the electronic device downsamples the images captured by the first and second cameras and matches sections (referred to as patches) of the reduced-resolution images from the first and second cameras to each other to generate a low-resolution disparity (also referred to as a coarse depth map).”; ¶ [0025] “…The coarse cost volume calculator 235 generates a downsampled disparity map (e.g., a 160×90 disparity map) using a soft argmin operator”),
wherein the disparity resolution is at least one of a factor of 2, 4, or 8 less than the resolution of the stereo image pair (¶ [0031] “…in some embodiments, the downsampled left depth image 312 and the downsampled right depth image 314 are ⅛ of the input resolutions of the left depth image 302 and the right depth image 304”).
Regarding claim 7:
Khamis in view of Chak teaches the limitations of claim 1 as applied above.
Khamis further teaches: wherein the disparity residual is generated from a residual neural network (¶ [0043] “…The color matcher 520 then passes the 32-dimensional representation through 6 residual blocks that employ 3×3 convolutions, batch-normalization, and leaky ReLu activations (α=0.2)… The color matcher 520 outputs a 1-dimensional disparity residual that is added to the previous prediction”)
based on the full resolution disparity estimate and at least one of the images of the stereo image pair (¶ [0043] “…the color matcher 520 passes the concatenated color of the left depth image 302 [one of the images of the stereo image pair] and the upsampled disparity map 510 [full resolution version]”);
wherein the disparity residual defines an error value (¶ [0012] “…To refine the results, the processor determines an error of the full resolution depth map based on matching patches of the full resolution depth map to patches of one of the images captured by the first or second camera based on color.”; ¶ [0043] “…The color matcher 520 outputs a 1-dimensional disparity residual that is added to the previous prediction”).
Regarding claim 8:
Khamis in view of Chak teaches the limitations of claim 7 as applied above.
Khamis further teaches: wherein refining the full resolution disparity estimate with the disparity residual comprises adjusting one or more disparity values of the full resolution disparity estimate based on the disparity residual (¶ [0043] “…The color matcher 520 outputs a 1-dimensional disparity residual that is added to the previous prediction. The color matcher 520 applies a ReLu to the sum to constrain disparities to be positive.”).
Regarding claims 9 and 17: the claim limitations are similar to those of claim 1 and are therefore rejected in the same manner as applied above. Khamis discloses an apparatus in FIG. 2 and a computer-readable medium (CRM) in ¶ [0048].
Regarding claim 10: the claim limitations are similar to those of claim 2 and are therefore rejected in the same manner as applied above.
Regarding claims 13 and 19: the claim limitations are similar to those of claim 5 and are therefore rejected in the same manner as applied above.
Regarding claims 15 and 20: the claim limitations are similar to those of claim 7 and are therefore rejected in the same manner as applied above.
Regarding claim 16: the claim limitations are similar to those of claim 8 and are therefore rejected in the same manner as applied above.
Claim(s) 3-4, 11-12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Khamis (US 20200099920) in view of Chakravarty (US 20210103745), hereinafter “Chak”, and Smolyanskiy (US 20190295282), hereinafter “Smoly”.
Regarding claim 3:
Khamis in view of Chak teaches the limitations of claim 1 as applied above.
Khamis in view of Chak does not specifically teach: wherein the cost volume stage of the learned stereo architecture further comprises a cross-correlation cost volume to create a cost volume comprising a 4D feature volume at a configurable number of disparities for input into the one or more 3D convolution networks.
However, in a related field, Smoly teaches: wherein the cost volume stage of the learned stereo architecture further comprises a cross-correlation cost volume (¶ [0038] “…In some examples, rather than using concatenation for constructing or computing the cost volumes, one or more of the models may use cross-correlation (e.g., sliding dot product)”)
to create a cost volume comprising a 4D feature volume (¶ [0042] “…the left feature map and the right feature map may be concatenated and copied into a resulting four-dimensional (4D) cost volume (e.g., the first cost volume)”; ¶ [0043] “…the right feature map and the left feature map may be concatenated and copied into a resulting four-dimensional (4D) cost volume (e.g., the second cost volume)”);
at a configurable number of disparities (¶ [0042] “…In some examples, the sliding of the right feature map tensor to the left along the epipolar lines of the left feature map tensor may be after padding the left feature map tensor by the max disparity. The max disparity may be a hyper-parameter of the DNN 100”)
for input into the one or more 3D convolution networks (¶ [0044] “The DNN 100 may include one or more matching layers 108A and 108B.”; ¶ [0045] “…the matching layers 108A and 108B may include 3D convolutional layers followed by deconvolutional layers”; and Khamis in ¶ [0025] “…the coarse cost volume calculator 235 filters the cost volume with four 3D convolutions with a filter size of 3×3×3”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Khamis to incorporate the teachings of Smoly by including: wherein the cost volume stage of the learned stereo architecture further comprises a cross-correlation cost volume to create a cost volume comprising a 4D feature volume at a configurable number of disparities for input into the one or more 3D convolution networks in order to improve real-world performance by optimizing the memory cost.
Regarding claim 4:
Khamis in view of Chak and Smoly teaches the limitations of claim 3 as applied above.
Smoly further teaches: wherein the cost volume is created through one or more shifting operations of a first feature map corresponding to a first image of the stereo image pair with respect to a second feature map corresponding to a second image of the stereo image pair (¶ [0042] “…to generate the first cost volume, the left feature map may be matched against the right feature map by sliding the right feature map tensor (e.g., corresponding to the right feature map) to the left along the epipolar lines of the left feature map tensor (e.g., corresponding to the left feature map).”; ¶ [0043] “Similarly, for example, to generate the second cost volume, the right feature map may be matched against the left feature map by sliding the left feature map tensor to the right along the epipolar lines of the right feature map tensor.”).
Regarding claims 11 and 18: the claim limitations are similar to those of claim 3 and are therefore rejected in the same manner as applied above.
Regarding claim 12: the claim limitations are similar to those of claim 4 and are therefore rejected in the same manner as applied above.
Claim(s) 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Khamis (US 20200099920) in view of Chakravarty (US 20210103745), hereinafter “Chak”, and Changjiang (WO 2023225235), hereinafter “Chang”.
Regarding claim 6:
Khamis in view of Chak teaches the limitations of claim 1 as applied above.
Khamis in view of Chak does not specifically teach: wherein upsampling the first disparity estimate comprises a convex upsamping process to generate the full resolution disparity estimate.
However, in a related field, Chang teaches: wherein upsampling the first disparity estimate comprises a convex upsamping process to generate the full resolution disparity estimate (¶ [0040] “The first depth map DT at iteration T is estimated by sampling the depth hypotheses via linear interpolation given the index field ϕT obtained finally. In one embodiment, assuming the index field is at 1/4 resolution, a upsampling operator U (ex. a convex combination of a 3×3 neighbors) is used to upsample the index field to full resolution. For example, weight mask is predicted from the hidden state ht using two convolutional layers and softmax is performed over the weights of those 9 neighbors. The final high resolution index field (upsampled index field ) is obtained by taking a weighted combination over the 9 neighbors, and reshaping to the resolution H×W. Convex combination can be implemented using the einsum function in PyTorch.”).
Therefore, it would have been obvious to a person of ordinary skill in the art prior to the effective filing date of the claimed invention to have modified Khamis and Chak to incorporate the teachings of Chang by including: wherein upsampling the first disparity estimate comprises a convex upsamping process to generate the full resolution disparity estimate in order to recover high-frequency detail and improve the accuracy of the upsampled, full resolution disparity estimate.
Regarding claim 14: the claim limitations are similar to those of claim 6 and are therefore rejected in the same manner as applied above.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of copending Application No. 18/417249 (reference application) and/or over claims 1-20 of copending Application No. 18/417249 in view of the cited prior art above, including:
Khamis (US 20200099920);
Chakravarty (US 20210103745);
Smolyanskiy (US 20190295282), and
Changjiang (WO 2023225235).
Motivation to combine these references with the reference application is similar to that found throughout the Office action above. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims at issue are broader in scope and are encompassed by the claims of the reference application.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WASSIM MAHROUKA whose telephone number is (571)272-2945. The examiner can normally be reached Monday-Thursday 8:00-5:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Koziol can be reached at (408) 918-7630. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WASSIM MAHROUKA/Primary Examiner, Art Unit 2665