Last updated: May 29, 2026
Application No. 18/178,100
SYSTEMS AND METHODS FOR DETERMINING A COMBINATION OF SENSOR MODALITIES BASED ON ENVIRONMENTAL CONDITIONS

Non-Final OA §103
Filed
Mar 03, 2023
Examiner
SULTANA, DILARA
Art Unit
2858
Tech Center
2800 — Semiconductors & Electrical Systems
Assignee
Caterpillar Inc.
OA Round
3 (Non-Final)
Interview Optional

+16.0% interview lift. An interview has already been conducted in this application's prosecution history. This examiner has an 81% grant rate with a +16.0% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 129 resolved cases, 2023–2026
Examiner Intelligence

SULTANA, DILARA
Grants 81% — above average
Career Allowance Rate
104 granted / 129 resolved
+12.6% vs TC avg
Strong +16% interview lift
Interview Lift: +16.0% (resolved cases with interview compared to without)
Typical timeline
2y 9m
Avg Prosecution
23 currently pending
Career history
173
Total Applications
across all art units
Statute-Specific Performance

§101: 3.1% (-36.9% vs TC avg)
§103: 81.3% (+41.3% vs TC avg)
§102: 12.5% (-27.5% vs TC avg)
§112: 2.8% (-37.2% vs TC avg)
Based on career data from 129 resolved cases
Office Action

§103
DETAILED ACTIONS
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 04/06/2026 has been entered.
Response to Amendment
This office action is in response to the amendments/arguments submitted by the Applicant(s) on 03/26/2026.

Status of the Claims
Claims 1-6 and 8-20 are pending.
Claims 1, 11, 12, 14, and 18 are amended.
Claim 7 is canceled.



Response to Arguments
Rejections Under 35 U.S.C. 103
Applicant’s arguments (see Remarks, pages 3-4, filed 09/26/2025) with respect to the rejection of Claim 1 under 35 U.S.C. 103 have been fully considered and are moot because the amendment has necessitated new grounds of rejection. The new rejections are set forth below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: 

 A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-11, and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Plouzek et al. (US 2020/0363203 A1, hereinafter Plouzek, previously cited) in view of Park et al. (US 2021/0406560 A1, hereinafter Park, previously cited) and in further view of Sporrer et al. (US 2021/0227742 A1, hereinafter Sporrer).

Regarding Claim 1, Plouzek teaches,
A computer-implemented method for determining a wear or loss condition (Plouzek, Figure 32, step 1002) of a ground engaging tool (Plouzek, Figure 1, ground engaging tool 134) comprising: 
receiving, by one or more processors (Plouzek, Figure 14, processor 506, [0067], “the processor 506 may be a plurality of processors arranged, for example, as a processing array”), imaging data (Plouzek, Figure 32, [0098], “acquire an image of the ground engaging tool (see block 1000 of FIG. 32)”) and environmental data from a plurality of sensors (Plouzek, Figure 1, [0051], “a plurality sensors 110 may be mounted on the boom 108' near the joint 106”), wherein the plurality of sensors includes a plurality of imaging sensors of different modalities (Plouzek, Figure, sensors 110, [0052], “the plurality of sensors 110 may be a plurality of cameras 110'”);
predicting, by the one or more processors, one or more environmental conditions associated with a worksite based on the imaging data and the environmental data (Plouzek, Figure 14, [0071], “the output device 132 may continuously display a plurality of three-dimensional scenes on a frame-by-frame basis as provided by the processor 506 to the output device 132 based upon the input signals from the sensors 110 as modified by the processor. Such frame-by-frame representation of the work environment of the machine 102 when used for recognition and monitoring the movement or the condition of the ground engaging tool 134”);
determining, by the one or more processors, the wear or loss condition of the ground engaging tool (Plouzek, Figure 32, [0098], “acquire an image of the ground engaging tool (see block 1000 of FIG. 32); [0099] evaluate the image using an algorithm that compares the acquired image to a database of existing images to determine the damage, the amount of wear, or the absence of the ground engaging tool (see block 1002)”) based on the at least one physical dimension (Plouzek, Figure 14, [0082], “The electronic controller unit 126 may be configured to: [0083] determine a dimension of a ground engaging tool installed on a work tool (see FIG. 16, block 600)”);
determining, by the one or more processors, at least one physical dimension of at least one portion of the ground engaging tool using the selected at least one deep learning network (Plouzek, Figure 31, step 918, [0079], “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a form of machine learning).
 Plouzek is silent on 
determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks based, at least in part, on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions than at least one unselected deep learning network,
 wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs;
However, Park teaches determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks based, at least in part (Park, Figure 7, [0078], “The method 700, at block B702, includes generating a plurality of trained DNNs by computing one or more losses with respect to outputs of individual DNNs and by computing one or more consistency losses with respect to outputs of two or more of the individual DNNs. For example, the source DNNs may be trained using the ground truth data 614 to generate the losses 622 and may be trained using photometric consistency losses 624”), on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions than at least one unselected deep learning network (Park, Figures 6A-6B and 7, [0071], “The 3D signals output by the source DNNs may be compared to ground truth data 614 using one more loss function to compute loss 612 corresponding to the source DNNs. The ground truth data 614 may be generated using map data (e.g., from an HD map, or other map type, such as those used for localization) that may indicate locations of static features or objects such as lane lines, wait conditions, signs, fixed objects, and/or the like. These losses may be used to updated parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”; NOTE: the losses are used to determine the weights and biases of the DNNs),
wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs (Park, Figure 5, [0033], “In embodiments, each feature output or 3D signal may correspond to a respective sensor pipeline or stream. For example, a first sensor pipeline may include a first camera that may generate image data that may be processed by a first DNN to generate the feature outputs F1 and/or the 3D signal 104A, a second sensor pipeline may include a second camera that may generate image data that may be processed by a second DNN to generate the feature outputs F2 and/or the 3D signal 104B, a third sensor pipeline may include a first RADAR sensor that may generate RADAR data that may be processed directly--e.g., using a sensor data pre-processor--and/or may be processed using a DNN to generate the feature outputs F RADAR and/or the 3D signal 108, and so on. Depending on the embodiment, any number of sensor pipelines may be used”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).
Plouzek teaches a plurality of sensors; for example, but not limited to, the sensor 110 may be a monocular camera, a stereo camera, an infrared camera, a high resolution camera, an array of one or more types of cameras, an opto-acoustic sensor, a radar, a laser based imaging sensor, or the like, or combinations thereof, configured to assist recognition and monitoring of the ground engaging tool to detect the work environment (Plouzek, Figure, sensors 110, [0052], “the plurality of sensors 110 may be a plurality of cameras 110 (…) configured to assist recognition, and monitoring of the ground engaging tool 134'”).
Plouzek is silent on environmental data including weather data from a plurality of sensors.
However, Sporrer teaches environmental data including weather data from a plurality of sensors (Sporrer, Figure 1, [0087], “along with the measured frame height (that is measured using sensor 123), along with the frame height measured from the sensor signal generated by sensor 106, along with current soil conditions, weather conditions, or a wide variety of other information”; [0030], “FIG. 1A is a perspective view showing one example of a mobile agricultural machine architecture 100 with a ground-following device 130 and corresponding sensor 106”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify one of Plouzek’s sensors for the ground engaging tool to be a weather sensor, as taught by Sporrer, in order to obtain worksite weather, soil condition, and environmental data (Sporrer, [0087]).

Regarding Claim 2, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 1,
Plouzek teaches wherein the determining of the at least one physical dimension includes applying, by the one or more processors (Plouzek, Figure 31, step 918, [0079], “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a form of machine learning).
Plouzek is silent on applying, by the one or more processors, the network selection weights to object identification probability scores of the plurality of deep learning networks; generating, by the one or more processors, a composite object identification,
wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights; and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification. 
However, Park teaches applying, by the one or more processors (Park, Figure 8C, controllers 836, [0086], “a third controller 836 for artificial intelligence functionality (e.g., computer)”), the network selection weights to object identification probability scores of the plurality of deep learning networks (Park, [007], “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers)”; [0071], “These losses may be used to updated parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”);
generating, by the one or more processors, a composite object identification, wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights (Park, [0071], “in FIGS. 1A-1B. For example, output 3D signals 104 of the source DNNs may correspond to rasterized images, or may be used to generate rasterized images, and the rasterized images may be compared to ground truth rasterized images to compute losses 622 (e.g., losses 622A, 622B, and 622N). These losses may be used to updated parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”); and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification (Park, Figure 8C, [0034], “the input 3D signals may be generated from a perspective of the ego machine 800, and the fused output 122 (Fig. 1B) may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).
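For illustration only, the claimed "weighted combination of the object identification probability scores based on the network selection weights" can be read as a weighted average. The sketch below is an editorial assumption, not code from the application, Plouzek, or Park; the function name `composite_score`, the network names, and all numeric values are hypothetical.

```python
from typing import Dict


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-network probability scores.

    scores:  probability score reported by each network (hypothetical names).
    weights: network selection weight assigned to each network.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("selection weights must sum to a positive value")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical networks, each fed by a different sensor combination.
scores = {"rgb": 0.9, "stereo": 0.6, "lwir": 0.3}
# Hypothetical selection weights, e.g. favoring infrared input in poor visibility.
weights = {"rgb": 0.2, "stereo": 0.2, "lwir": 0.6}
result = composite_score(scores, weights)  # approximately 0.48
```

Under this reading, a network with a selection weight of zero contributes nothing to the composite, which corresponds to an "unselected" network in the claim language.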

Regarding Claim 3, combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 2,
	Plouzek teaches processing the image (Plouzek, Figure 16, step 612) of the ground engaging tool (Plouzek, Figure 1, ground engaging tool 134).
Plouzek is silent on wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores by: 
processing, by the one or more processors, at least one image of the ground engaging tool to determine at least one bounding box for at least one region of interest; and performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box, wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects. 
However, Park teaches wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores (Park, [007], “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers)”; [0071], “These losses may be used to updated parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”) by: processing, by the one or more processors, at least one image to determine at least one bounding box (Park, [0034], “a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors”) for at least one region of interest (Park, [0032], “For training, as described in more detail herein, the sensor data may include original images (e.g., as captured by one or more image sensors), down-sampled images, up-sampled images, cropped or region of interest (ROI) images, otherwise augmented images, and/or a combination thereof”); and
performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box (Park, [0034], “a boundary or encoded values for pixels corresponding to drivable free space (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features”),
wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects (Park, Figures 1A and 2D, [0039], “a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations-e.g., with corresponding confidence values-for where the object 240 may be located. For different sensor modalities, the corresponding ellipses or PDFs 252 may be of different shape”; [0040], “For example, FIG. 2D may represent a subset of the ellipses or PDFs for a field of view 250A of a particular sensor”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).

Regarding Claim 4, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 3,
wherein the composite object identification is based on a weighted average of the detections of the one or more object by the plurality of deep learning networks weighted by the object identification probability scores and the network selection weights (Park, [0071], “in FIGS. 1A-1B. For example, output 3D signals 104 of the source DNNs may correspond to rasterized images, or may be used to generate rasterized images, and the rasterized images may be compared to ground truth rasterized images to compute losses 622 (e.g., losses 622A, 622B, and 622N). These losses may be used to updated parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”; Figures 1A and 2D, [0039], “a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations-e.g., with corresponding confidence values-for where the object 240 may be located”), and
wherein the at least one physical dimension of the at least one portion of the ground engaging tool is based on a measurement of the instance segmentation based on the composite object identification (Park, [0074], “with respect to FIGS. 6A and 6B, the fused outputs 122 of the fusion DNN 120 may be compared to the ground truth data 614 corresponding to the fused outputs to compute loss 630. The ground truth data 614 for the fused output 122 may be generated using similar techniques or data as the ground truth data 614 for the source DNNs. As such, the loss 630 may be used to update parameters of the fusion DNN 120 until the fusion DNN 120 converges to an acceptable level of accuracy or precision”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).
  
Regarding Claim 5, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 3,
Plouzek further teaches wherein a comparison of at least one nominal physical dimension to the at least one physical dimension is indicative of wear or loss rate of the ground engaging tool (Plouzek, Figure 1, [0073], “The electronic controller unit 126' may be configured to acquire an image of the ground engaging tool 134 (see block 900 in FIG. 31), evaluate the image using an algorithm that compares the acquired image to a database of existing images to determine the amount of damage, the amount of wear”; see also Figure 22, [0083], “determine a dimension of a ground engaging tool installed on a work tool (see FIG. 16, block 600); [0084] compare the determined dimension of the ground engaging tool installed on a work tool to a theoretical dimension of a new ground engaging tool installed on the work tool (block 602)”).
 
Regarding Claim 6, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 2,
Plouzek is silent on wherein an adjustment to the composite object identification includes: performing, by the one or more processors, a normalization of the object identification probability scores.
However, Park teaches wherein an adjustment to the composite object identification includes: performing, by the one or more processors, a normalization of the object identification probability scores (Park, [0044], “In embodiments where the fusion DNN 120 and/or one or more of the DNNs or machine learning models used to generate the 3D signal(s) includes a convolutional neural network (CNN), one or more of the layers may include an input layer”; [0050], “fusion DNN 120, this is not intended to be limiting. For example, additional or alternative layers may be used, such as normalization layers, SoftMax layers, and/or other layer types”; Figure 2D, Probability 252a).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).
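As a point of reference for the "normalization of the object identification probability scores" limitation and the SoftMax layers cited from Park, the sketch below shows two common ways to normalize a set of scores so that they sum to one. It is an editorial illustration under assumed inputs; neither function is taken from the application or from Park, and the names `normalize_scores` and `softmax` are chosen here for clarity.

```python
import math
from typing import List


def normalize_scores(scores: List[float]) -> List[float]:
    """Scale non-negative scores so that they sum to 1 (sum normalization)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return [s / total for s in scores]


def softmax(logits: List[float]) -> List[float]:
    """Map arbitrary real-valued outputs to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from three networks.
print(normalize_scores([2.0, 1.0, 1.0]))
print(softmax([1.0, 2.0, 3.0]))
```

Sum normalization preserves the ratios between scores, while softmax exaggerates differences between large and small values; which behavior the claim term covers is a matter of claim construction, not something this sketch resolves.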
 
Regarding Claim 8, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 2,
Plouzek further teaches: receiving, by the one or more processors (Plouzek, Figure 14, processor 506), and from the plurality of sensors (Plouzek, Figure 14, sensors 110), data indicative of one or more operating condition of the ground engaging tool (Plouzek, Figure 31, block 900, [0073], “The electronic controller unit 126' may be configured to acquire an image of the ground engaging tool 134”),
wherein the one or more operating condition includes one or more of usage data, maintenance data, measurement data, or wear data (Plouzek, [0075], “The at least one sensor 216 may be configured to determine at least one of the following variables: a bucket height, a bucket tilt angle, a linkage position, a linkage tilt angle, a length of hydraulic cylinder extension, a force exerted on a hydraulic cylinder, a linkage strain, a cylinder control, a drive power, a wheel or a track velocity, and a steering position or control (see block 910)”); and
comparing, by the one or more processors, the at least one physical dimension to a predetermined safety threshold associated with the one or more operating condition (Plouzek, [0105], “comparing the determined dimension of the ground engaging tool installed on a work tool to a theoretical acceptable dimension of the ground engaging tool (step 804)”), the predetermined safety threshold including a minimum thickness threshold, a minimum wear percentage threshold, or a combination thereof (Plouzek, Figure 14, [0087], “In some embodiments, the electronic controller unit 126 may be configured to compare the difference between the determined dimension 204 and the theoretical dimension 206 to a threshold value”).
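To make the threshold comparison discussed for Claims 5 and 8 concrete, the sketch below compares a measured dimension against a nominal dimension and two thresholds. It is an editorial illustration only: the function `check_wear`, the `WearCheck` record, the millimeter units, and the specific threshold values are all assumptions and do not come from the application or from Plouzek.

```python
from dataclasses import dataclass


@dataclass
class WearCheck:
    wear_percent: float        # material lost relative to the nominal dimension
    below_min_thickness: bool  # measured dimension is under the thickness limit
    exceeds_wear_limit: bool   # wear percentage is over the wear limit


def check_wear(measured_mm: float, nominal_mm: float,
               min_thickness_mm: float, wear_limit_percent: float) -> WearCheck:
    """Compare a measured dimension with a nominal dimension and thresholds."""
    if nominal_mm <= 0:
        raise ValueError("nominal dimension must be positive")
    wear_percent = 100.0 * (nominal_mm - measured_mm) / nominal_mm
    return WearCheck(
        wear_percent=wear_percent,
        below_min_thickness=measured_mm < min_thickness_mm,
        exceeds_wear_limit=wear_percent > wear_limit_percent,
    )


# Hypothetical tooth: 40 mm when new, 30 mm measured, limits of 25 mm and 20 %.
result = check_wear(30.0, 40.0, 25.0, 20.0)
print(result)
```

In this example the tooth is still above the assumed minimum thickness but has lost 25 % of its nominal dimension, so only the wear-percentage check is triggered.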

 Regarding Claim 9, combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 8,
Plouzek further teaches further comprising: generating, by the one or more processors, a notification regarding operable conditions of the at least one ground engaging tool in a user interface of a user device based, at least in part, on the at least one physical dimension of the ground engaging tool. (Plouzek, [0087] In some embodiments, the electronic controller unit 126 may be configured to compare the difference between the determined dimension 204 and the theoretical dimension 206 to a threshold value, then the electronic controller unit 126 may be configured to create an alert that the ground engaging tool 134 needs to be serviced, if the difference is above the threshold value, then the electronic controller unit 126 may be configured to create an alert that the ground engaging tool 134 is damaged or missing, requiring immediate maintenance (block 610).)

Regarding Claim 10, combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 1,
Plouzek further teaches wherein the plurality of imaging sensors includes a visible color (RGB) imager, a stereo camera, and a longwave infrared camera (Plouzek, Figure 14, [0052], “the sensor 110 may be a monocular camera, a stereo camera, an infrared camera, a high-resolution camera, an array of one or more types of cameras, an opto-acoustic sensor, a radar, a laser-based imaging sensor”).

Regarding Claim 11, combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 1,
Plouzek teaches wherein the plurality of sensors include one or more of a weather sensor, a temperature sensor, or an ultrasonic sensor that indicates the environmental data at a worksite (Plouzek, Figures 14 and 26, [0072], “Referring back to FIG. 14, the system 200' may comprise at least one sensor 216 that is configured to monitor the position or the orientation of the work tool 104 or the ground engaging tool 134 (see also FIG. 26), and an electronic controller unit 126' coupled to the at least one sensor 216”; [0075], “The at least one sensor 216 may be configured to determine at least one of the following variables: a bucket height, a bucket tilt angle, a linkage position, a linkage tilt angle, a length of hydraulic cylinder extension, a force exerted on a hydraulic cylinder, a linkage strain, a cylinder control, a drive power, a wheel or a track velocity, and a steering position or control (see block 910)”).

Regarding Claim 13, combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 1,
Plouzek is silent on wherein each of the plurality of deep learning networks includes a respective convolutional neural network (CNN).  
However, Park teaches wherein each of the plurality of deep learning networks includes a respective convolutional neural network (CNN) (Park, [0044], “In embodiments where the fusion DNN 120 and/or one or more of the DNNs or machine learning models used to generate the 3D signal(s) includes a convolutional neural network (CNN)”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks in order to yield the predictable result of generating object identification with higher accuracy (KSR).

Regarding Claim 14, Plouzek teaches,
A system for determining a wear or loss condition of a ground engaging tool (Plouzek, Figure 31, block 902, [0073], “determine the amount of damage, the amount of wear, or the absence of the ground engaging tool 134”), comprising:
one or more processors (Plouzek, Figure 14, processor 506); and
at least one non-transitory computer readable medium storing instructions (Plouzek, [0068], “The memory 508 may be implemented as a non-transitory computer readable medium”) which, when executed by the one or more processors, cause the one or more processors to perform operations (Plouzek, “which are not shown in FIG. 14. In one particular embodiment, the processor 506 is configured to execute various parts of a method 800 illustrated in FIG. 15 by executing computer executable instructions 510 in the memory 508. In yet another embodiment, the processor 506 may be a plurality of processors arranged, for example, as a processing array”) comprising:
receiving, by one or more processors (Plouzek, Figure 14, processor 506, [0067], “the processor 506 may be a plurality of processors arranged, for example, as a processing array”), imaging data (Plouzek, Figure 32, [0098] “acquire an image of the ground engaging tool (see block 1000 of FIG. 32)”) and environmental data from a plurality of sensors (Plouzek, Figure 1, [0051], “a plurality of sensors 110 may be mounted on the boom 108' near the joint 106”), wherein the plurality of sensors includes a plurality of imaging sensors of different modalities (Plouzek, Figure, sensors 110, [0052] “the plurality of sensors 110 may be a plurality of cameras 110'”);
predicting, by the one or more processors, one or more environmental conditions associated with a worksite based on the imaging data and the environmental data (Plouzek, Figure 14, [0071], “the output device 132 may continuously display a plurality of three-dimensional scenes on a frame-by-frame basis as provided by the processor 506 to the output device 132 based upon the input signals from the sensors 110 as modified by the processor. Such frame-by-frame representation of the work environment of the machine 102 when used for recognition and monitoring the movement or the condition of the ground engaging tool 134”);
determining, by the one or more processors, the wear or loss condition of the ground engaging tool (Plouzek, Figure 32, [0098] “acquire an image of the ground engaging tool (see block 1000 of FIG. 32)”; [0099] “evaluate the image using an algorithm that compares the acquired image to a database of existing images to determine the damage, the amount of wear, or the absence of the ground engaging tool (see block 1002)”) based on the at least one physical dimension (Plouzek, Figure 14, [0082] “The electronic controller unit 126 may be configured to:” [0083] “determine a dimension of a ground engaging tool installed on a work tool (see FIG. 16, block 600)”); and
determining, by the one or more processors, at least one physical dimension of at least one portion of the ground engaging tool using the selected at least one deep learning network (Plouzek, Figure 31, Step 918, [0079] “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a type of machine learning).
 Plouzek is silent on 
determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks based, at least in part, on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions than at least one unselected deep learning network,
 wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs;
However, Park teaches determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks (Park, Figure 7, [0078], “The method 700, at block B702, includes generating a plurality of trained DNNs by computing one or more losses with respect to outputs of individual DNNs and by computing one or more consistency losses with respect to outputs of two or more of the individual DNNs. For example, the source DNNs may be trained using the ground truth data 614 to generate the losses 622 and may be trained using photometric consistency losses 624”) based, at least in part, on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions than at least one unselected deep learning network (Park, Figures 6A-6B and 7, [0071] “The 3D signals output by the source DNNs may be compared to ground truth data 614 using one or more loss functions to compute loss 612 corresponding to the source DNNs. The ground truth data 614 may be generated using map data (e.g., from an HD map, or other map type, such as those used for localization) that may indicate locations of static features or objects such as lane lines, wait conditions, signs, fixed objects, and/or the like. These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”; NOTE: the losses are used to determine the weights/biases of the DNNs),
wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs (Park, Figure 5, [0033] “In embodiments, each feature output or 3D signal may correspond to a respective sensor pipeline or stream. For example, a first sensor pipeline may include a first camera that may generate image data that may be processed by a first DNN to generate the feature outputs F1 and/or the 3D signal 104A, a second sensor pipeline may include a second camera that may generate image data that may be processed by a second DNN to generate the feature outputs F2 and/or the 3D signal 104B, a third sensor pipeline may include a first RADAR sensor that may generate RADAR data that may be processed directly--e.g., using a sensor data pre-processor--and/or may be processed using a DNN to generate the feature outputs F RADAR and/or the 3D signal 108, and so on. Depending on the embodiment, any number of sensor pipelines may be used”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).
Plouzek teaches a plurality of sensors; for example, but not limited to, the sensor 110 may be a monocular camera, a stereo camera, an infrared camera, a high resolution camera, an array of one or more types of cameras, an opto-acoustic sensor, a radar, a laser based imaging sensor, or the like, or combinations thereof, configured to assist recognition and monitoring of the ground engaging tool and to detect the work environment (Plouzek, Figure, sensors 110, [0052] “the plurality of sensors 110 may be a plurality of cameras 110' (…) configured to assist recognition, and monitoring of the ground engaging tool 134”).
Plouzek is silent on wherein the environmental data includes data regarding a weather condition at a worksite where the ground engaging tool is located.
However, Sporrer teaches wherein the environmental data includes data regarding a weather condition at a worksite where the ground engaging tool is located (Sporrer, Figure 1, [0087] “along with the measured frame height (that is measured using sensor 123), along with the frame height measured from the sensor signal generated by sensor 106, along with current soil conditions, weather conditions, or a wide variety of other information”; [0030] “FIG. 1A is a perspective view showing one example of a mobile agricultural machine architecture 100 with a ground-following device 130 and corresponding sensor 106”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify one of Plouzek’s sensors associated with the ground engaging tool to include a weather sensor, as taught by Sporrer, in order to obtain worksite weather, soil condition, and environmental data (Sporrer, [0087]).
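
For illustration only, the claimed step of determining network selection weights from predicted environmental conditions, and then selecting the network predicted to be more accurate, might be sketched as follows. This is a minimal sketch under stated assumptions: the accuracy table, the network names, the condition labels, and the function names are hypothetical and are not drawn from the application, Plouzek, Park, or Sporrer.

```python
from typing import Dict, List

# Hypothetical accuracy of each network under each environmental condition.
# Each network uses a different combination of imaging sensors as input.
# All names and numbers here are illustrative assumptions only.
ACCURACY_BY_CONDITION: Dict[str, Dict[str, float]] = {
    "rgb_only": {"clear": 0.95, "fog": 0.60, "night": 0.50},
    "infrared_only": {"clear": 0.80, "fog": 0.75, "night": 0.90},
    "rgb_plus_infrared": {"clear": 0.93, "fog": 0.85, "night": 0.88},
}


def network_selection_weights(condition_probs: Dict[str, float]) -> Dict[str, float]:
    """Return one normalized weight per network from predicted condition probabilities."""
    # Expected accuracy of each network under the predicted conditions.
    expected = {
        name: sum(condition_probs.get(cond, 0.0) * acc for cond, acc in table.items())
        for name, table in ACCURACY_BY_CONDITION.items()
    }
    total = sum(expected.values())
    if total <= 0.0:
        raise ValueError("no known environmental condition has non-zero probability")
    # Normalize so that the weights sum to 1.
    return {name: value / total for name, value in expected.items()}


def select_networks(weights: Dict[str, float], top_k: int = 1) -> List[str]:
    """Return the names of the top_k networks with the largest weights."""
    return sorted(weights, key=lambda name: weights[name], reverse=True)[:top_k]
```

Under these assumed numbers, a prediction of night-time conditions would favor the infrared-only network, while a prediction of fog would favor the network that combines both sensors.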

Regarding Claim 15, combination of Plouzek, Park, and Sporrer teaches the system of claim 14,
Plouzek teaches wherein the determining of the at least one physical dimension includes: applying, by the one or more processors (Plouzek, Figure 31, Step 918, [0079] “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a type of machine learning).
Plouzek is silent on applying, by the one or more processors, the network selection weights to object identification probability scores of the plurality of deep learning networks; generating, by the one or more processors, a composite object identification,
wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights; and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification. 
However, Park teaches applying, by the one or more processors (Park, Figure 8C, controllers 836, [0086], “a third controller 836 for artificial intelligence functionality (e.g., computer”), the network selection weights to object identification probability scores of the plurality of deep learning networks (Park, [007] “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers)”; [0071] “These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”);
generating, by the one or more processors, a composite object identification, wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights (Park, [0071], “in FIGS. 1A-1B. For example, output 3D signals 104 of the source DNNs may correspond to rasterized images, or may be used to generate rasterized images, and the rasterized images may be compared to ground truth rasterized images to compute losses 622 (e.g., losses 622A, 622B, and 622N). These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”); and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification (Park, Figure 8C, [0034] “the input 3D signals may be generated from a perspective of the ego machine 800, and the fused output 122 (Fig. 1B) may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).
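
For illustration only, the claimed composite object identification, a weighted combination of per-network object identification probability scores using the network selection weights, might be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the label names, and the data layout are hypothetical and are not drawn from the application or from the cited references.

```python
from typing import Dict, Tuple


def composite_identification(
    scores_by_network: Dict[str, Dict[str, float]],
    selection_weights: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Combine per-network class probabilities using network selection weights.

    scores_by_network maps a (hypothetical) network name to its probability
    score for each object label. selection_weights maps the same network
    names to their selection weights. Networks without a weight count as 0.
    """
    combined: Dict[str, float] = {}
    total_weight = 0.0
    for network, scores in scores_by_network.items():
        weight = selection_weights.get(network, 0.0)
        total_weight += weight
        for label, probability in scores.items():
            combined[label] = combined.get(label, 0.0) + weight * probability
    if total_weight <= 0.0:
        raise ValueError("at least one network must have a positive weight")
    # Normalize by the total weight so the result stays on a probability scale.
    combined = {label: value / total_weight for label, value in combined.items()}
    best_label = max(combined, key=lambda label: combined[label])
    return best_label, combined
```

In this sketch, a network with a larger selection weight contributes proportionally more to the composite score, so a low-weight network that disagrees is outvoted rather than ignored.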

Regarding Claim 16, combination of Plouzek, Park, and Sporrer teaches the system of claim 15,
Plouzek teaches processing the image (Plouzek, Figure 16, step 612) of the ground engaging tool (Plouzek, Figure 1, ground engaging tool 134).
Plouzek is silent on wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores by: 
processing, by the one or more processors, at least one image of the ground engaging tool to determine at least one bounding box for at least one region of interest; and performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box, wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects. 
	However, Park teaches wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores (Park, [007] “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers)”; [0071] “These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”) by: processing, by the one or more processors, at least one image to determine at least one bounding box (Park, [0034], “a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors”) for at least one region of interest (Park, [0032] “For training, as described in more detail herein, the sensor data may include original images (e.g., as captured by one or more image sensors), down-sampled images, up-sampled images, cropped or region of interest (ROI) images, otherwise augmented images, and/or a combination thereof”); and
performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box (Park, [0034], “a boundary or encoded values for pixels corresponding to drivable free space (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features”),
wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects (Park, Figures 1A and 2D, [0039], “a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations-e.g., with corresponding confidence values-for where the object 240 may be located. For different sensor modalities, the corresponding ellipses or PDFs 252 may be of different shape”; [0040] “For example, FIG. 2D may represent a subset of the ellipses or PDFs for a field of view 250A of a particular sensor”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).

Regarding Claim 17, combination of Plouzek, Park, and Sporrer teaches the system of claim 16,
wherein the composite object identification is based on a weighted average of the detections of the one or more objects by the plurality of deep learning networks weighted by the object identification probability scores and the network selection weights (Park, [0071], “in FIGS. 1A-1B. For example, output 3D signals 104 of the source DNNs may correspond to rasterized images, or may be used to generate rasterized images, and the rasterized images may be compared to ground truth rasterized images to compute losses 622 (e.g., losses 622A, 622B, and 622N). These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”; Figures 1A and 2D, [0039], “a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations-e.g., with corresponding confidence values-for where the object 240 may be located”), and
wherein the at least one physical dimension of the at least one portion of the ground engaging tool is based on a measurement of the instance segmentation based on the composite object identification (Park, [0074] “with respect to FIGS. 6A and 6B, the fused outputs 122 of the fusion DNN 120 may be compared to the ground truth data 614 corresponding to the fused outputs to compute loss 630. The ground truth data 614 for the fused output 122 may be generated using similar techniques or data as the ground truth data 614 for the source DNNs. As such, the loss 630 may be used to update parameters of the fusion DNN 120 until the fusion DNN 120 converges to an acceptable level of accuracy or precision”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).
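
For illustration only, the claimed weighted average of detections, weighted by both the object identification probability scores and the network selection weights, might be sketched as follows for a single measured dimension. This is a minimal sketch under stated assumptions: the function name, the tuple layout, and the use of a length in millimetres measured from each network's segmentation are hypothetical and are not drawn from the application or from the cited references.

```python
from typing import Dict, List, Tuple


def composite_dimension(
    detections: List[Tuple[str, float, float]],
    selection_weights: Dict[str, float],
) -> float:
    """Weighted average of a per-network measurement.

    Each detection is (network_name, confidence, length_mm), where length_mm
    is a hypothetical dimension measured from that network's segmentation.
    The combined weight of a detection is its network selection weight
    multiplied by its confidence (probability score).
    """
    numerator = 0.0
    denominator = 0.0
    for network, confidence, length_mm in detections:
        weight = selection_weights.get(network, 0.0) * confidence
        numerator += weight * length_mm
        denominator += weight
    if denominator <= 0.0:
        raise ValueError("no detection has a positive combined weight")
    return numerator / denominator
```

Multiplying the two factors means a detection counts for little if either the network is poorly suited to the predicted conditions or the detection itself has low confidence.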


Regarding Claim 18, Plouzek teaches
A non-transitory computer readable medium (Plouzek, [0068] “The memory 508 may be implemented as a non-transitory computer readable medium”) for determining a wear or loss condition of a ground engaging tool (Plouzek, Figure 31, block 902, [0073], determine the amount of damage, the amount of wear, or the absence of the ground engaging tool 134), the non-transitory computer readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations (Plouzek, “which are not shown in FIG. 14. In one particular embodiment, the processor 506 is configured to execute various parts of a method 800 illustrated in FIG. 15 by executing computer executable instructions 510 in the memory 508. In yet another embodiment, the processor 506 may be a plurality of processors arranged, for example, as a processing array”) comprising:
receiving imaging data (Plouzek, Figure 32, [0098] “acquire an image of the ground engaging tool (see block 1000 of FIG. 32)”) and environmental data (Plouzek, Figure 1, [0051], “a plurality of sensors 110 may be mounted on the boom 108' near the joint 106”) from a plurality of sensors, wherein the plurality of sensors includes a plurality of imaging sensors of different modalities (Plouzek, Figure, sensors 110, [0052] “the plurality of sensors 110 may be a plurality of cameras 110'”);
predicting one or more environmental conditions associated with the worksite based on the imaging data and the environmental data (Plouzek, Figure 14, [0071], “the output device 132 may continuously display a plurality of three-dimensional scenes on a frame-by-frame basis as provided by the processor 506 to the output device 132 based upon the input signals from the sensors 110 as modified by the processor. Such frame-by-frame representation of the work environment of the machine 102 when used for recognition and monitoring the movement or the condition of the ground engaging tool 134”);
determining, by the one or more processors, the wear or loss condition of the ground engaging tool (Plouzek, Figure 32, [0098] “acquire an image of the ground engaging tool (see block 1000 of FIG. 32)”; [0099] “evaluate the image using an algorithm that compares the acquired image to a database of existing images to determine the damage, the amount of wear, or the absence of the ground engaging tool (see block 1002)”) based on the at least one physical dimension (Plouzek, Figure 14, [0082] “The electronic controller unit 126 may be configured to:” [0083] “determine a dimension of a ground engaging tool installed on a work tool (see FIG. 16, block 600)”); and
determining, by the one or more processors, at least one physical dimension of at least one portion of the ground engaging tool using the at least one deep learning network (Plouzek, Figure 31, Step 918, [0079] “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a type of machine learning).
 Plouzek is silent on 
determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks based, at least in part, on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions, than at least one unselected deep learning network,
wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs; 
However, Park teaches determining, by the one or more processors, network selection weights for each of a plurality of deep learning networks (Park, Figure 7, [0078], “The method 700, at block B702, includes generating a plurality of trained DNNs by computing one or more losses with respect to outputs of individual DNNs and by computing one or more consistency losses with respect to outputs of two or more of the individual DNNs. For example, the source DNNs may be trained using the ground truth data 614 to generate the losses 622 and may be trained using photometric consistency losses 624”) based, at least in part, on the predicted one or more environmental conditions to select at least one deep learning network that is predicted to provide higher accuracy under the predicted one or more environmental conditions than at least one unselected deep learning network (Park, Figures 6A-6B and 7, [0071] “The 3D signals output by the source DNNs may be compared to ground truth data 614 using one or more loss functions to compute loss 612 corresponding to the source DNNs. The ground truth data 614 may be generated using map data (e.g., from an HD map, or other map type, such as those used for localization) that may indicate locations of static features or objects such as lane lines, wait conditions, signs, fixed objects, and/or the like. These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”; NOTE: the losses are used to determine the weights/biases of the DNNs),
wherein each of the plurality of deep learning networks utilizes a different respective combination of one or more of the pluralities of imaging sensors as inputs (Park, Figure 5, [0033] “In embodiments, each feature output or 3D signal may correspond to a respective sensor pipeline or stream. For example, a first sensor pipeline may include a first camera that may generate image data that may be processed by a first DNN to generate the feature outputs F1 and/or the 3D signal 104A, a second sensor pipeline may include a second camera that may generate image data that may be processed by a second DNN to generate the feature outputs F2 and/or the 3D signal 104B, a third sensor pipeline may include a first RADAR sensor that may generate RADAR data that may be processed directly--e.g., using a sensor data pre-processor--and/or may be processed using a DNN to generate the feature outputs F RADAR and/or the 3D signal 108, and so on. Depending on the embodiment, any number of sensor pipelines may be used”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).
Plouzek teaches a plurality of sensors; for example, but not limited to, the sensor 110 may be a monocular camera, a stereo camera, an infrared camera, a high resolution camera, an array of one or more types of cameras, an opto-acoustic sensor, a radar, a laser based imaging sensor, or the like, or combinations thereof, configured to assist recognition and monitoring of the ground engaging tool and to detect the work environment (Plouzek, Figure, sensors 110, [0052] “the plurality of sensors 110 may be a plurality of cameras 110' (…) configured to assist recognition, and monitoring of the ground engaging tool 134”).
Plouzek is silent on environmental data including weather data from a plurality of sensors.
However, Sporrer teaches environmental data including weather data from a plurality of sensors (Sporrer, Figure 1, [0087] “along with the measured frame height (that is measured using sensor 123), along with the frame height measured from the sensor signal generated by sensor 106, along with current soil conditions, weather conditions, or a wide variety of other information”; [0030] “FIG. 1A is a perspective view showing one example of a mobile agricultural machine architecture 100 with a ground-following device 130 and corresponding sensor 106”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify one of Plouzek’s sensors to include a weather sensor, as taught by Sporrer, in order to obtain worksite weather, soil condition, and environmental data (Sporrer, [0087]).

Regarding Claim 19, combination of Plouzek, Park, and Sporrer teaches the non-transitory computer readable medium of claim 18, 
Plouzek teaches wherein the determining of the at least one physical dimension includes: applying, by the one or more processors (Plouzek, Figure 31, Step 918, [0079] “The electronic controller unit may be configured to use machine learning to determine at least one of the following: a bare shape of the work tool, a shape of the work tool with new ground engaging tools attached to the work tool, a shape of a worn work tool necessitating maintenance, and a shape of a worn GET necessitating maintenance (see block 918 of FIG. 31, see also FIG. 21)”; NOTE: deep learning is a type of machine learning).
Plouzek is silent on applying, by the one or more processors, the network selection weights to object identification probability scores of the plurality of deep learning networks; generating, by the one or more processors, a composite object identification,
wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights; and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification. 
However, Park teaches applying, by the one or more processors (Park, Figure 8C, controllers 836, [0086], “a third controller 836 for artificial intelligence functionality (e.g., computer”), the network selection weights to object identification probability scores of the plurality of deep learning networks (Park, [007] “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers)”; [0071] “These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”);
generating, by the one or more processors, a composite object identification, wherein the composite object identification is based on a weighted combination of the object identification probability scores based on the network selection weights (Park, [0071], “in FIGS. 1A-1B. For example, output 3D signals 104 of the source DNNs may correspond to rasterized images, or may be used to generate rasterized images, and the rasterized images may be compared to ground truth rasterized images to compute losses 622 (e.g., losses 622A, 622B, and 622N). These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”); and
determining, by the one or more processors, the at least one physical dimension based on the composite object identification (Park, Figure 8C, [0034] “the input 3D signals may be generated from a perspective of the ego machine 800, and the fused output 122 (Fig. 1B) may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would have been obvious to a person of ordinary skill to include the well-known fused deep learning output using multimodal sensor data along with the other plurality of deep learning networks, in order to yield the predictable result of generating accurate object identification, yet with higher accuracy (KSR).
 
Regarding Claim 20, the combination of Plouzek, Park, and Sporrer teaches the non-transitory computer readable medium of claim 19.
Plouzek teaches processing the image (Plouzek, Figure 16, step 612) of the ground engaging tool (Plouzek, Figure 1, ground engaging tool 134).
Plouzek is silent on wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores by: 
processing, by the one or more processors, at least one image of the ground engaging tool to determine at least one bounding box for at least one region of interest; and performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box, wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects. 
	However, Park teaches wherein each of the plurality of deep learning networks is configured to determine the object identification probability scores (Park, [007], “the multi-sensor fusion network and the select layers of the individual machine learning models may be trained together in an end-to-end training process. As such, updates to weights and biases as a result of one or more loss functions may be back propagated through not only the layers of the multi-sensor fusion network, but also through to the layers of the respective source machine learning models (e.g., the feature extractor layers). [0071] These losses may be used to update parameters (e.g., weights and biases) of the source DNNs using, e.g., backpropagation, to aid in training the DNN(s) until they converge to an acceptable level of accuracy or precision”) by: processing, by the one or more processors, at least one image to determine at least one bounding box (Park, [0034], “a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors”) for at least one region of interest (Park, [0032], “For training, as described in more detail herein, the sensor data may include original images (e.g., as captured by one or more image sensors), down-sampled images, up-sampled images, cropped or region of interest (ROI) images, otherwise augmented images, and/or a combination thereof”); and
performing, by the one or more processors, instance segmentation of the at least one region of interest to detect one or more objects within the at least one bounding box (Park, [0034], “a boundary or encoded values for pixels corresponding to drivable free space (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features”),
wherein the object identification probability scores is indicative of a confidence level of the detection of the one or more objects (Park, Figures 1A and 2D, [0039], “a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations-e.g., with corresponding confidence values-for where the object 240 may be located. For different sensor modalities, the corresponding ellipses or PDFs 252 may be of different shape”; [0040], “For example, FIG. 2D may represent a subset of the ellipses or PDFs for a field of view 250A of a particular sensor”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to modify Plouzek’s image processing method to incorporate a plurality of deep learning networks associated with the multimodal sensors and to generate a fused output, as taught by Park, in order to obtain an accurate estimation of a target object feature from the image analysis (Park, Figures 5 and 7, [0033], [0071]-[0078]). It would also have been obvious to a person of ordinary skill to combine the well-known fused deep learning output based on multimodal sensor data with the plurality of deep learning networks, in order to yield the predictable result of generating object identifications with higher accuracy (KSR).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Plouzek in view of Park and Sporrer, as applied to claim 11, and further in view of Arevalo et al. (WO2022018513A1, previously cited).

Regarding Claim 12, the combination of Plouzek, Park, and Sporrer teaches the computer-implemented method of claim 11.
Plouzek and Park are silent on wherein the characteristic data for the materials include one or more of material type information, material density information, material texture information, material hardness information, material weight information, or moisture content of the material.  
	However, Arevalo teaches wherein the characteristic data for the materials include one or more of material type information, material density information, material texture information, material hardness information, material weight information, or moisture content of the material (Arevalo, [89] The image processing process 211 may also receive a digital representation of a work bench or working area (e.g., the earthen bank being excavated at a mine). The image processing process 211 may predict the remaining life of the ground engaging product using the work bench to determine: fragmentation, material size, hardness, material type, geometric properties, location in mine, and the like).  
It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Plouzek’s method to include a method of predicting the remaining life of the GET based on the material characteristics, as taught by Arevalo, in order to determine the physical condition of the ground engaging tool (Arevalo [89]).
Conclusion
Citation of Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Sporrer et al. (US 20180153088 A1) recites “An agricultural implement includes a transversely extending frame forming a first, a second, and a third frame section. A first actuator is coupled to the first frame section, a second actuator coupled to the second frame section, and a third actuator coupled to the third frame section. Sensors are coupled to each frame section to detect a height of the respective frame section relative to an underlying surface. A control unit is disposed in electrical communication with the sensors and operably controls the actuators to adjust the height of each frame section” (abstract).
Ryota Chiba Kurosawa (US 20220154431 A1) discloses “A shovel includes a hardware processor configured to record log information in a storage. The log information includes at least one of information on conditions surrounding the shovel and information on the conditions of the shovel at each of a time before, the time of, and a time after the occurrence of a predetermined event where at least one of the safety and the security of the shovel is relatively reduced” (abstract).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DILARA SULTANA whose telephone number is (571)272-3861. The examiner can normally be reached Mon-Fri, 9 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, EMAN ALKAFAWI, can be reached on (571) 272-4448. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DILARA SULTANA/Examiner, Art Unit 2858 

04/24/2026    

/EMAN A ALKAFAWI/Supervisory Patent Examiner, Art Unit 2858                                                                                                                                                                                                                                                                                  
5/1/2026
Prosecution Timeline

Sep 17, 2025: Applicant Interview (Telephonic)
Sep 17, 2025: Examiner Interview Summary
Sep 26, 2025: Response Filed
Jan 06, 2026: Final Rejection mailed — §103
Mar 26, 2026: Response after Non-Final Action
Apr 06, 2026: Request for Continued Examination
Apr 16, 2026: Response after Non-Final Action
Apr 24, 2026: Non-Final Rejection (signed) — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/597,666
Patent 12638842
PREDICTIVE MAINTENANCE FOR A DEVICE IN THE FOOD INDUSTRY BY MEANS OF A DIGITAL TWIN, AND OPTIMIZED PRODUCTION PLANNING
4y 4m to grant Granted May 26, 2026
18/259,735
Patent 12638535
MEASUREMENT CORRECTION METHOD AND APPARATUS FOR SENSOR, AND SERVER POWER SUPPLY
2y 11m to grant Granted May 26, 2026
17/387,601
Patent 12618375
SYSTEMS AND METHODS FOR ESTIMATING INTEGRITY AND EFFICIENCY OF AN INLET FILTRATION SYSTEM FOR TURBINE SYSTEMS AND FOR RECOMMENDING MITIGATION ACTIONS
4y 9m to grant Granted May 05, 2026
17/689,359
Patent 12618913
METHOD AND DEVICE WITH BATTERY MODEL OPTIMIZATION
4y 1m to grant Granted May 05, 2026
18/355,544
Patent 12618888
Electrical Grid Edge Event Detection and Mitigation
2y 9m to grant Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 81%
With Interview (+16.0%): 97%
Median Time to Grant: 2y 9m (~0m remaining)
PTA Risk: High
Based on 129 resolved cases by this examiner. Grant probability derived from career allowance rate.