Prosecution Insights
Last updated: April 19, 2026
Application No. 18/658,739

Learning Device, Learning Method And Test Device, Test Method Using The Same

Non-Final OA: §103, §112

Filed: May 08, 2024
Examiner: LEMIEUX, IAN L
Art Unit: 2669
Tech Center: 2600 — Communications
Assignee: Kia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 87% (Favorable)
OA Rounds: 1-2
Time to Grant: 2y 4m
With Interview: 97%

Examiner Intelligence

Career Allow Rate: 87% (496 granted / 569 resolved), above average, +25.2% vs TC avg
Interview Lift: +9.6% across resolved cases with an interview (a moderate lift of roughly +10%)
Typical Timeline: 2y 4m average prosecution; 34 applications currently pending
Career History: 603 total applications across all art units
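For reference, the headline rate follows directly from the career counts above: 496 granted out of 569 resolved is 496 / 569 ≈ 0.872, which rounds to the displayed 87% allow rate.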

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 39.6% (-0.4% vs TC avg)
§102: 19.1% (-20.9% vs TC avg)
§112: 19.4% (-20.6% vs TC avg)
Comparison baseline is the Tech Center average estimate • Based on career data from 569 resolved cases

Office Action

§103 §112
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Claims 1-20 are currently pending in U.S. Patent Application No. 18/658,739 and an Office action on the merits follows. Claim Interpretation The following is a quotation of 35 U.S.C. 112(f): (f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph: An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 
112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “a first network” and “a learning device” in claim(s) 9-13. Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function (e.g. [0046] memory and processor combination for 1000 and encoder/decoder pair for 1200 as a software module that is also executed as a processor and memory combination [0052]), and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Claim Rejections - 35 USC § 112 The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. Claim(s) 2-3 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 2 recites the limitation "via the loss calculation device" in line 4. There is insufficient antecedent basis for this limitation in the claim. Applicant’s specification discloses 1500 as a sub-component of learning device 1000, which discourages interpretation wherein the device being referenced is the learning device itself. For compact prosecution purposes ‘the’ is read ‘a’ so as to establish required antecedent basis therein. As to claim 3, this claim depends on, and thereby inherits and fails to cure that deficiency as identified above for the case of claim 2, and is rejected accordingly. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. 1. Claims 14-15, 17-18, 1-2, 4-5, 9-10 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Guizilini et al. (US 2021/0004976 A1) in view of Petrovai et al. “Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation”. As to claim 14, Guizilini discloses a learning method (Figs. 6-7) comprising: obtaining, by one or more processors (Fig. 2 Processor 110 training module 230), a target image and a source image (It and Is respectively, Fig. 6 610 and 620, Fig. 7 710 as training data 250, [0048] “As shown in FIG. 6, the training data 250 includes a first image (It) 610 and a second image (Is) 620 of a training pair”, etc.,); generating, by the one or more processors, an estimated depth map based on the target image (Fig. 7 720, Fig. 6 D̂t 270 output from Depth Model 280/260 based on input It , Fig. 4, etc.,); generating, by the one or more processors, pose change information corresponding to a pose change between the target image and the source image (Fig. 5 pose change information 290, Fig. 6 output of Pose Model 280 based in input It 610 and Is 620 (training data 250), [0084] “transformation 290 indicates a difference in a frame of reference between the images according to ego-motion (i.e. , motion of the camera) and defines a relationship between the pair of training images”, etc.,); generating, by the one or more processors, a composite image corresponding to the target image by using the estimated depth map, the pose change information, and the source image (Fig. 6 output from ‘view synthesis’ portion, Ît 640, based on pose change info/transformation 290, source images 620 and depth map 270, [0051] “The depth system 170 uses the transformation as a basis for synthesizing the image 640 from which the training module 230 generates the photometric loss 650”, [0052] “training module 230 generates the synthesized image 640. In one embodiment, the synthesized image 640 is, for example, a synthesized version of the second image 620 according to the depth map 270 and the transformation 290”, etc.,); determining, by the one or more processors and based on the composite image and the target image, a first loss (Fig. 6 photometric loss 650 between Ît 640 and It 610, Fig. 7 730, [0074], etc.,); determining, by the one or more processors and based on a pseudo depth map corresponding to the target image and the estimated depth map, a second loss (Fig. 6 supervised loss 660 based on ‘pseudo’ depth map equivalent under BRI (MPEP 2173.01 and 2111) if interpretable to be sparse, Dt 630 and predicted/estimated depth map 270 from 280/260, Fig. 7 750); and back-propagating, by the one or more processors, the first loss and the second loss and updating a parameter of a first network for generating the estimated depth map and a parameter of a second network for generating the pose change information (Fig. 
7 760, [0053], [0065] “As an additional note and as previously indicated, the reprojected distance loss is not appearance-based. Therefore, in general, any transformation matrix T may be applied to produce the reprojections from which the distance is minimized. However, in one embodiment, the training module 230 enforces the transformation T used in the reprojected distance loss as T = Tt→s, which is the transformation 290 produced by the depth model 280. In this way, the training module 230 can back-propagate through the pose model 280 to directly update the pose model 280 in combination with the depth model 260 and remain consistent with the depth model 260. Additionally, enforcing the transformation further provides for operating on the same reprojected distances used by the photometric loss (i.e., self-supervised loss), but in a scale-aware capacity thereby avoiding the inherent ambiguity of the self-supervised loss and forcing the models 260/280 into metrically accurate models. In this way, the reprojected distance loss improves the training of the models 260 and 280 to provide models that are scale aware. The training module 230 imposes the supervised loss to further refine the depth model 260. As noted, the additional supervised loss allows the depth model 260 to learn metrically accurate estimates resulting in the depth model 260 improving predictions”, etc., see Fig. 6 reproduced below for quick reference).

[Image: Guizilini Fig. 6, reproduced in the original Office Action.]

Should Applicant assert that Guizilini Dt 630 is not ‘pseudo’ (even if sparse) because it is not artificially generated and instead sourced from physical sensors e.g. Lidar and alternatives (Guizilini [0039]), Petrovai evidences the obvious nature of a pseudo depth map (ȳi) used in determining a loss based further on a predicted/estimated depth map (p. 1578 “To improve the performance of our estimates, in the second step, we re-train the network with the scale invariant logarithmic loss supervised by pseudo labels”, page 1582 Lsp equation 6 “where yi is the predicted depth and ȳi is the pseudo ground-truth depth”).

Petrovai further discloses obtaining, by one or more processors, a target image and a source image (Fig. 2 at page 1580, It and Is target and source images respectively, for input into depth and pose networks (pose receiving both, depth receives It), page 1581, Section 3.3 “Consider a target image It and adjacent source images Is, where s = {t − 1, t+1} captured by a moving camera”, etc.,); generating, by the one or more processors, an estimated depth map based on the target image (Fig. 2 output from depth network receiving It as input, page 1581 Section 3.2 Depth Network Architecture “Finally, two convolutional layers with [5x5, 32] and [1x1, 1] yield the final depth map”); generating, by the one or more processors, pose change information corresponding to a pose change between the target image and the source image (page 1580 pose network output Rts and Tts); generating, by the one or more processors, a composite image corresponding to the target image by using the estimated depth map, the pose change information, and the source image (It synthesized based on depth map output and output from pose network, page 1581 Section 3.3 “The basic mechanism behind the method relies on geometric projections that allow view synthesis of adjacent frames based on the predicted depth.
… The target image is synthesized Is→t by sampling the source images Is with bilinear interpolation [17,18], which we denote with Is<p’>”, in view of Eq. 3’s basis on Mt→s (see further Equation 1, camera pose Mt→s)); determining, by the one or more processors and based on the composite image and the target image, a first loss (photometric loss Lp; page 1578 Abs “In the first step, our network is trained in a self-supervised regime on high-resolution images with the photometric loss”, p. 1578 Intro “In this way, the target image can be reconstructed from the source images, and the photometric difference between the target and synthesized image will be minimized during training”, equation 4 at p. 1581);

[Image omitted: equation reproduced from Petrovai in the original Office Action.]

determining, by the one or more processors and based on a pseudo depth map corresponding to the target image and the estimated depth map, a second loss (p. 1578 “To improve the performance of our estimates, in the second step, we re-train the network with the scale invariant logarithmic loss supervised by pseudo labels”, page 1582 Lsp equation 6 “where yi is the predicted depth and ȳi is the pseudo ground-truth depth”); and

[Image omitted: equation reproduced from Petrovai in the original Office Action.]

updating a parameter of a first network for generating the estimated depth map (page 1585 Section 5 “We have presented a novel self-distillation based two-stage self-supervised training framework for monocular depth estimation: in the first stage, a self-supervised depth network and camera pose estimation network are trained on monocular sequences, in the second stage, the depth network is trained on high-resolution pseudo labels generated with the first network”) and a parameter of a second network for generating the pose change information (page 1585 Section 5 “We have presented a novel self-distillation based two stage self-supervised training framework for monocular depth estimation: in the first stage, a self-supervised depth network and camera pose estimation network are trained”).

It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to modify the system and method of Guizilini such that depth data 630 is optionally obtained by artificial (and ‘pseudo’ accordingly) means other than physical sensors as taught/suggested by Petrovai, the motivation being as similarly taught/suggested therein and readily recognized by POSITA that such pseudo depth data may be generated in a less costly manner not requiring Lidar calibration and/or the same storage constraints (see Guizilini [0036], and Petrovai at p. 1585 Section 4.3 trained only on the monocular sequences and “without extra data”).

As to claim 15, Guizilini in view of Petrovai teaches/suggests the method of claim 14. Guizilini further discloses the method wherein the determining of the second loss (determining supervised loss 660, Fig. 7 750, etc.,) comprises: determining the second loss further based on at least one of luminance information, contrast information, or structure information, of each of the pseudo depth map and the estimated depth map ([0061] “Thus, the training module 230 further employs the second stage loss (e.g., supervised loss 660) in addition to the first stage loss to refine the depth model 260. As shown in equations (6), (7), and (9) below, the second stage loss may take different forms depending on a particular implementation.
Thus, the second stage loss may be an L1 loss, as shown in equation (6), a Berhu loss as shown in equation (7), or the reprojected distance loss, as shown in equation (9). The L1 loss and the Berhu loss generally illustrate approaches to directly comparing the ground truth data (i.e., the depth data) with the information produced by the depth model 260”; wherein the depth map values are luminance information in view of [0044] “depth map 270 is, in one embodiment, a data structure corresponding to the input image that indicates distances/depths to objects/features represented therein. Additionally, in one embodiment, the depth map 270 is a tensor with separate data values indicating depths for corresponding locations in the image on a per-pixel basis”, regarding ‘structure information’ see [0074] with reference to that depth smoothness loss as disclosed in [0059]; see also Petrovai page 1581 “We also adopt an edge-aware smoothness loss [17, 18] that encourages local smoothness in the presence of low image gradient”). As to claim 17, Guizilini in view of Petrovai teaches/suggests the method of claim 14. Guizilini further discloses the method wherein the target image is generated by an image sensor at a first time, and wherein the source image is generated by the image sensor at a second time within a threshold range around the first time ([0034] “the pairs of training images are monocular images from a monocular video that are separated by some interval of time (e.g., 0.06s) such that a perspective of the camera changes between the pair of training images as a result of motion of the camera through the environment while generating the video”). As to claim 18, Guizilini in view of Petrovai teaches/suggests the method of claim 17. Guizilini further discloses the method wherein the generating of the pose change information comprises: generating the pose change information (290 from pose model 280) based on a first pose at the first time and a second pose at the second time ([0046] “Continuing to FIG . 5, the pose model 280 accepts two monocular images (i.e., a training pair) from the training data 250 of the same scene as an electronic input and processes the monocular images (It, Is) to produce estimates of camera ego-motion in the form of a set of 6 degree-of freedom (DOF) transformations between the two images” in view of the manner in which It and Is are separated by that temporal distance e.g. [0034]; see also Petrovai p. 1581 section 3.3). As to claim 1, this claim is the system claim corresponding to the method of claim 14 and is rejected accordingly. Regarding structural limitations one or more processors and memory, see Guizilini Fig. 2, Processor 110, data store 240, memory 210, training module 230, etc.. As to claim 9, this claim is an alternate system claim corresponding to the method of claim 14, and additionally comprising a ‘test device’ 2000 housing acquisition device 2100 and depth map generation network 2200 (that ‘first network’ of claim 1 implemented as one or more processors in conjunction with memory for the case of method claim 14), and is rejected accordingly under an interpretation that each of learning device 1000 and test device 2000 are implemented by processor(s) in conjunction with memory and differ only in that test device 2000 requires the additional function of “perform[ing] testing based on the updated parameters”. Such a testing is taught/suggested for an inference execution of Guizilini and/or Petrovai as applied above for the case of claim 14 (see e.g. 
Petrovai “Train” and “Test” columns of Table 7), and Official Notice (MPEP 2144.03) is taken to the manner in which the state of the art frequently partitions data samples into training, validation and testing data sets with corresponding phases. As to claim(s) 2 and 10, these claims are the system claims corresponding to the method of claim 15 and is rejected accordingly. As to claim(s) 4 and 12, this claim is the system claim corresponding to the method of claim 17 and is rejected accordingly. As to claim(s) 5 and 13, this claim is the system claim corresponding to the method of claim 18 and is rejected accordingly. 2. Claims 19 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Guizilini et al. (US 2021/0004976 A1) in view of Petrovai et al. “Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation” and Tong et al. (US 2020/0167941 A1). As to claim 19, Guizilini in view of Petrovai teaches/suggests the method of claim 17. Guizilini further discloses the method wherein the generating of the composite image (Ît 640, [0051-0052]) comprises: obtaining first three-dimensional (3D) point cloud information at the first time (Fig. 3A, depth map 300/270; Examiner notes the ‘obtaining’ recited does not exclude per-pixel depth measures sourced from a monocular camera, as distinguished from e.g. LiDAR like point clouds 320 and 310 of Guizilini); converting, based on the pose change information (290), the first 3D point cloud information at the first time (270, [0052]) into second [0052] “training module 230 generates the synthesized image 640. In one embodiment, the synthesized image 640 is, for example, a synthesized version of the second image 620 according to the depth map 270 and the transformation 290”); While pertinent to the second/supervised loss, Guizilini suggests converting, based on an image sensor parameter corresponding to the image sensor (suggested camera pose matrix in view of that ego-motion disclosure, see also Eqn. 1 of Petrovai), [0007] “executed by the one or more processors cause the one or more processors to compute a supervised loss based, at least in part, on reprojecting the depth map and the depth data onto an image space of the second image according to at least the transformation. The training module including instructions to update the depth model and the pose model according to at least the supervised loss”, Fig. 8, [0063]); and generating the composite image based on a pixel value of the source image corresponding to Guizilini 640, [0052]). While Guizilini suggests the manner in which 3D points from either image space of target and/or source images may be projected to that space of the other (Fig. 8), and further considers the predicted depth map in generating the synthesized/composite image, Guizilini fails to explicitly disclose that second 3D point cloud when generating synthesized image 640. Guizilini does however suggest that such data exists as part of 630, in [0071] – since depth data 630 exists for each monocular image in the sequence – to include the second/source (as distinguished from target). Tong evidences the obvious nature of a synthesized image generation based on corresponding 3D point clouds, uplifting and converting/projection steps and explicitly a 3D point cloud corresponding to Is (Fig. 5, see two 3D point clouds 525 and 565, [0055] “the 3-D point cloud 525 can be derived for the source view input image 505. 
The 3-D point cloud 525 is then combined with the predicted rotation and translation parameters R and T 545 to generate a transformed 3-D point cloud 565 for the target”, [0053] “The parameter Îs; is the source image Is, warped to the target view based on the predicted depth image D̂s and the predicted camera transformation R, T from the source view to target view”; Examiner understands 345/545 of Tong to correspond to 290 of Guizilini and that pose change information recited). It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Guizilini in view of Petrovai such that synthesized image Ît 640 is generated in part based on a 3D point cloud associated with Is as taught/suggested by Tong, the motivation as similarly taught/suggested therein that such depth data for the source image may provide additional geometric context better distinguishing between objects/edges for the synthesized 2D image based thereon. As to claim 6, this claim is the system claim corresponding to the method of claim 19 and is rejected accordingly. 3. Claims 20 and 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Guizilini et al. (US 2021/0004976 A1) in view of Petrovai et al. “Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation” and Bian et al. “Auto-Rectify Network for Unsupervised Indoor Depth Estimation”. As to claim 20, Guizilini in view of Petrovai teaches/suggests the method of claim 14. Guizilini discloses the method further comprising: generating, Guizilini discloses depth data 630 as sparse depth data sourced from e.g. Lidar ([0039]), and associated with each of It and Is, [0071], and required for the calculation of that second/supervised loss 660 in comparison to estimated/predicted depth map D̂t 270). As suggested above for the case of claim 1 should ‘pseudo’ require an artificially generated depth map, Guizilini fails to explicitly disclose the pseudo/sparse depth map data 630 as being derived/generated by any pre-trained pseudo depth map generation network. Petrovai however discloses a pseudo depth map (ȳi section 3.5 Equation 6) generated artificially by a pre-trained pseudo depth map generation network (see Fig. 2 stage 2, Pseudo Depth Labels Generation and that depth network producing relative pseudo labels, “PS generated with the self-supervised teacher network”, page 1583 “using our best model… we generate pseudo labels for the entire training set”). Examiner notes that Applicant’s Disclosure regarding 1700 at most requires ([0072]) that 1700 be characterized by a non-specific level of accuracy higher than that of the depth estimation network 1200 currently being trained, and as such that teacher/student disclosure of Petrovai suggests equivalence, however should Applicant assert that the teacher depth estimation network of Petrovai is non-equivalent to Applicant’s 1700, references of record evidence the obvious nature of such a pseudo depth map generation network broadly (see e.g. Sun et al. SC-depthV3 NPL attached, Bian/SC-depthV2 as applied, etc.,). Bian further evidences the obvious nature of generating, by a pre-trained pseudo depth map generation network, the pseudo depth map (Fig. 5 “Qualitative results. Left to right: RGB, TrainFlow [37], Monodepth2 [29], and Ours. The models are trained on NYUv2 [36]”). 
It would have been obvious to a person of ordinary skill in the art, before the effective filing date, to further modify the system and method of Guizilini in view of Petrovai such that depth data 630 used in that second/supervised loss 660 calculation is depth data derived from a pre-trained pseudo depth map generation network as taught/suggested by Petrovai, Bian, etc., in view of that same rationale as presented above for the case of claim 1, in further view of a reasonable expectation of success for an instance wherein the pre-trained model generating such data is of a level of accuracy higher (the accuracy of Bian’s network has also been established/benchmarked therein) than that of the depth network being trained (and accordingly its produced/pseudo depth map is a supervisory signal).

As to claim 7, this claim is the system claim corresponding to the method of claim 20 and is rejected accordingly.

As to claim 8, Guizilini in view of Petrovai and Bian teaches/suggests the system of claim 7. Guizilini in view of Petrovai and Bian further teaches the method wherein the pseudo depth map generation network comprises a parameter in a frozen state (Petrovai’s teacher network (Self) parameters are in a frozen/locked state after that first training – see page 1580 “In the second stage, the camera pose network is fixed and we instantiate a new depth student network having the same or a more lightweight architecture”). Furthermore, for the case of Guizilini as modified and Bian as applied (see rejection to claim 20 above), Bian’s SCDepth-v2 network generating that supervisory pseudo depth map comprises parameters in a locked/fixed/frozen state since it has already, and is not being further, trained/updated.

Additional References

Prior art made of record and not relied upon that is considered pertinent to applicant's disclosure: Additionally cited references (see attached PTO-892) otherwise not relied upon above have been made of record in view of the manner in which they evidence the general state of the art.

Allowable Subject Matter

Claim(s) 16 and 11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claim 3 (corresponding to 16/11) would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims. References of record fail to serve in any obvious combination teaching each and every limitation as required therein, namely that second loss based on luminance, contrast and structure information of each of the pseudo and estimated depth maps, “wherein a first weight corresponding to the luminance information is smaller than a second weight corresponding to the contrast information and smaller than a third weight corresponding to the structure information”. Hafeez et al. “Depth Estimation using Weighted-loss and Transfer Learning” teaches/suggests an equivalent, however does not beat Applicant’s EFD, see at pages 4-5 Equations 4 and 5 of Section 3.2.4, reproduced in part below:

[Images omitted: Hafeez Equations 4 and 5, reproduced in part in the original Office Action.]

Inquiry

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN L LEMIEUX whose telephone number is (571)270-5796.
The examiner can normally be reached Mon - Fri 9:00 - 6:00 EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park can be reached on 571-272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /IAN L LEMIEUX/Primary Examiner, Art Unit 2669
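As a purely technical aside, the pipeline recited in claims 14-16 (and mapped by the Office Action onto Guizilini and Petrovai) can be sketched compactly. The following is a minimal PyTorch sketch under stated assumptions: the module names (depth_net, pose_net, teacher_net), the pinhole-camera helpers, the 3x3 intrinsics K and its inverse K_inv, and all loss weights are hypothetical placeholders, and the optimizer is assumed to already hold the parameters of both trainable networks; this is not the applicant's, Guizilini's, or Petrovai's actual implementation.

```python
import torch
import torch.nn.functional as F


def backproject(depth, K_inv):
    # Lift per-pixel depths of the target view to 3D camera-frame points, shape (B, 3, H*W).
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).float().reshape(3, -1)
    pix = pix.unsqueeze(0).expand(b, -1, -1).to(depth)
    return depth.reshape(b, 1, -1) * (K_inv @ pix)


def synthesize(source_img, depth_t, T_t2s, K, K_inv):
    # Composite image of the target view: warp the source image using the estimated depth
    # map and the 4x4 pose change (the "composite image" step of claim 14).
    b, _, h, w = source_img.shape
    pts = backproject(depth_t, K_inv)
    pts = torch.cat([pts, torch.ones(b, 1, pts.shape[-1], device=pts.device)], dim=1)
    cam = K @ (T_t2s @ pts)[:, :3]                        # project into the source camera
    xy = cam[:, :2] / cam[:, 2:].clamp(min=1e-6)
    grid = torch.stack([2 * xy[:, 0] / (w - 1) - 1,       # normalize to [-1, 1] for grid_sample
                        2 * xy[:, 1] / (h - 1) - 1], dim=-1).reshape(b, h, w, 2)
    return F.grid_sample(source_img, grid, align_corners=True)


def depth_similarity_loss(est, pseudo, w_l=0.2, w_c=0.4, w_s=0.4, C1=1e-4, C2=9e-4):
    # SSIM-style comparison of the estimated and pseudo depth maps via luminance, contrast
    # and structure terms (claim 15). The luminance weight is kept smaller than the other
    # two, mirroring the relationship the examiner indicates is allowable (claims 16/11).
    mu_x, mu_y = F.avg_pool2d(est, 3, 1, 1), F.avg_pool2d(pseudo, 3, 1, 1)
    var_x = F.avg_pool2d(est * est, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(pseudo * pseudo, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(est * pseudo, 3, 1, 1) - mu_x * mu_y
    sd_x, sd_y = var_x.clamp(min=0).sqrt(), var_y.clamp(min=0).sqrt()
    lum = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    con = (2 * sd_x * sd_y + C2) / (var_x + var_y + C2)
    struct = (cov + C2 / 2) / (sd_x * sd_y + C2 / 2)
    sim = w_l * lum + w_c * con + w_s * struct            # simplified weighted sum of the three terms
    return (1 - sim).mean()


def training_step(depth_net, pose_net, teacher_net, optimizer, img_t, img_s, K, K_inv):
    # One update: both losses are back-propagated so the depth-net and pose-net parameters
    # are updated together (the final step of claim 14).
    depth_t = depth_net(img_t)                            # estimated depth map from the target image
    T_t2s = pose_net(img_t, img_s)                        # pose change between target and source (4x4)
    synth_t = synthesize(img_s, depth_t, T_t2s, K, K_inv)

    first_loss = (synth_t - img_t).abs().mean()           # photometric: composite vs. target image
    with torch.no_grad():                                 # pre-trained pseudo-depth network stays frozen (claim 8)
        pseudo_depth = teacher_net(img_t)
    second_loss = depth_similarity_loss(depth_t, pseudo_depth)

    loss = first_loss + 0.1 * second_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

The weighted combination inside depth_similarity_loss keeps the luminance weight below the contrast and structure weights, which is the limitation the examiner identifies as distinguishing claims 16/11 over the combined references; a simple weighted sum is used here only for illustration.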

Prosecution Timeline

May 08, 2024
Application Filed
Mar 11, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602825
Human body positioning method based on multi-perspectives and lighting system
2y 5m to grant • Granted Apr 14, 2026
Patent 12592086
POSE DETERMINING METHOD AND RELATED DEVICE
2y 5m to grant • Granted Mar 31, 2026
Patent 12586397
METHOD AND APPARATUS EMPLOYING FONT SIZE DETERMINATION FOR RESOLUTION-INDEPENDENT RENDERED TEXT FOR ELECTRONIC DOCUMENTS
2y 5m to grant • Granted Mar 24, 2026
Patent 12579840
BEHAVIOR ESTIMATION DEVICE, BEHAVIOR ESTIMATION METHOD, AND RECORDING MEDIUM
2y 5m to grant • Granted Mar 17, 2026
Patent 12573086
CONTROL METHOD, RECORDING MEDIUM, METHOD FOR MANUFACTURING PRODUCT, AND SYSTEM
2y 5m to grant • Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 87%
With Interview: 97% (+9.6%)
Median Time to Grant: 2y 4m
PTA Risk: Low
Based on 569 resolved cases by this examiner. Grant probability derived from career allow rate.
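Reading the projections against the examiner figures above, the with-interview number appears to be the base grant probability plus the interview lift (roughly 87% + 9.6 points ≈ 97%); this is an inference from the displayed values, not a documented formula.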

Free tier: 3 strategy analyses per month