Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/08/2024 was filed before the first action on the merits of the application. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 6, 8-11, and 14-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Park et al (US 20210406560 A1, "SENSOR FUSION FOR AUTONOMOUS MACHINE APPLICATIONS USING MACHINE LEARNING") in view of Wierstra et al (US 20200082227 A1, "IMAGINATION-BASED AGENT NEURAL NETWORKS").
Regarding Claim 1, Park et al teaches "A computer-implemented method for controlling an autonomous machine, comprising: acquiring sensor data streamed via a plurality of sensors" ([0027] teaches a variety of sensors of multiple modalities); "calibrated with respect to a common real world reference frame centered on the autonomous machine, processing the streamed sensor data by a plurality of perception modules to extract perception information from the sensor data in real time" ([0033] teaches 3D signals = multiple sensors + [0034] "Where the process 100 uses the 3D signals, the 3D signals may be computed by each respective DNN in a same format, or may be converted—e.g., using a post-processor—to a same format. For example, in some non-limiting embodiments, the 3D signals that are used as inputs to the fusion DNN 120 may include rasterized images generated from a certain perspective (e.g., top-down birds eye view, projection images, such as range images, side-view images, etc.) that encode any number of input channels. The rasterized images—including the fused output rasterized image computed using the fusion DNN—may be ego-centric, in embodiments, where the ego-machine 800 is at the center of the representation. In other embodiments, the input 3D signals may be generated from a perspective of the ego-machine 800, and the fused output 122 may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment.
In such an example, a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors, for example, and may include lines corresponding to lane markers (e.g., lane dividers, road dividers, solid lines, dashed lines, double lines, yellow lines, white lines, etc.), wait conditions (e.g., cross walks, stop lines, etc.), and/or other driving surface features, a boundary or encoded values for pixels corresponding to drivable free space (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features." This teaches that the rasterized output (from the sensor data) is from an ego-centric point of view, i.e., centered on the autonomous vehicle; [0055] further teaches that the world model is updated continuously (i.e., streamed data/real time) from the variety of perception modules (i.e., calibrated to a common real-world reference frame)); "fusing the extracted real time perception information from the plurality of perception modules by a context awareness module to create a blackboard image, wherein the blackboard image is a representation of an operating environment of the autonomous machine derived from fusion of the extracted perception information using a controlled semantic, which defines a context of the autonomous machine," ([0055] "The fused output 122 may be used by an autonomous driving software stack ('drive stack') 124 to perform one or more operations by the vehicle 800 (and/or other ego-machine type). For example, the drive stack 124 may include a world model manager that may be used to generate, update, and/or define a world model. The world model manager may use information generated by and received from the perception component(s) of the drive stack 124.
The perception component(s) may include an obstacle perceiver, a path perceiver, a wait perceiver, a map perceiver, and/or other perception component(s)." The fused output (blackboard image) is derived from the various perception modules, which, read in light of [0034] cited above, shows that the rasterized images/fused output of the DNN define the context around the vehicle via the fusion of these sensors (DNN = controlled semantic)); "whereby the streamed sensor data is transformed into a stream of blackboard images defining an evolution of context of the autonomous machine with time, and processing the stream of blackboard images by an action evaluation module using a control policy to output a control action to be executed by the autonomous machine," ([0056] "The world model may be used to help inform planning component(s), control component(s), obstacle avoidance component(s), and/or actuation component(s) of the drive stack 124. The obstacle perceiver may perform obstacle perception that may be based on where the vehicle 800 is allowed to drive or is capable of driving, and how fast the vehicle 800 can drive without colliding with an obstacle (e.g., an object, such as a structure, entity, vehicle, etc.) that is sensed by the vehicle 800 (and represented in the fused output 122, for example)." + [0055] "The world model manager may continually update the world model based on newly generated and/or received inputs (e.g., data) from the obstacle perceiver, the path perceiver, the wait perceiver, the map perceiver, and/or other components of the ego-machine 800. For example, the world model manager and/or the perception components may use the fused output 122 to perform one or more operations." "Continually" teaches the streaming of blackboard images (inputs from the various perceivers) and subsequent path planning/output control based on the world model).
Park et al, however, does not teach "the control policy comprising a learned mapping of context to control action using training data in which contexts are represented by blackboard images created using the controlled semantic."
Wierstra et al teaches a vehicle control module which includes "the control policy comprising a learned mapping of context to control action using training data in which contexts are represented by blackboard images created using the controlled semantic" (Abstract: "A neural network system is proposed to select actions to be performed by an agent interacting with an environment to perform a task in an attempt to achieve a specified result. The system may include a controller to receive state data and context data, and to output action data." The neural network takes state and context (blackboard) inputs and outputs action data (i.e., a learned mapping of input to output), which in view of [0007] includes autonomous vehicle navigation; [0017]-[0018] teach the use of training data).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the application, to implement the neural network based vehicle controller as taught by Wierstra as the planners called for generally in Park. One would be motivated to implement the neural network based controller(s) to allow for a system which can adapt to a wide variety of complex situations and leverage similarities between learned states to navigate new/novel scenarios. Wierstra et al teaches this improvement in ([0016] "The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. The system can be used for continuous control applications where there is no finite search tree to consider. The system, in particular the manager module, can learn to decide whether the agent should keep planning by imagining actions or if it is ready to act, and optionally can also decide from which state to imagine. Both these abilities contribute to achieving good performance efficiently…." + [0018] "The system can better distinguish between similar observed states by using the model to roll out forwards to distinguish between the effects of actions. It can also improve handling of examples (states) which are different to those encountered during its training.")
Regarding Claim 2, modified Park teaches "The method according to claim 1, wherein the plurality of sensors comprise multiple modalities of sensors and the plurality of perception modules are associated with multiple modalities of perception." ([0029] + [0033] teach that the plurality of sensors includes multiple sensor types/modalities, with corresponding perception modules)
Regarding Claim 3, modified Park teaches "The method according to claim 1, wherein the plurality of sensors comprises at least one camera and the plurality of perception modules comprises at least one vision perception module configured to process vision data streamed via the at least one camera, wherein the method comprises processing image frames of the streamed vision data by the at least one vision perception module to locate and classify one or more objects of interest on the image frames, which define presence and location information of one or more perceived objects in the operating environment of the autonomous machine, as part of the perception information extracted by the at least one vision perception module." ([0095] "A variety of cameras may be used in a front-facing configuration, including, for example, a monocular camera platform that includes a CMOS (complementary metal oxide semiconductor) color imager. Another example may be a wide-view camera(s) 870 that may be used to perceive objects coming into view from the periphery (e.g., pedestrians, crossing traffic or bicycles). Although only one wide-view camera is illustrated in FIG. 8B, there may be any number of wide-view cameras 870 on the vehicle 800. In addition, long-range camera(s) 898 (e.g., a long-view stereo camera pair) may be used for depth-based object detection, especially for objects for which a neural network has not yet been trained. The long-range camera(s) 898 may also be used for object detection and classification, as well as basic object tracking." This teaches that the cameras are used for object detection (presence and location) and classification)
Regarding Claim 4, modified Park teaches "The method according to claim 3, wherein the vision data streamed via the at least one camera further comprises depth frames, wherein the method comprises processing the depth frames by the at least one vision perception module to extract depth information for the one or more perceived objects, to infer a distance of the one or more perceived objects in relation to the autonomous machine, as part of the perception information extracted by the at least one vision perception module." ([0039] "Where a sensor pipeline corresponds to a camera and/or other sensor type that does not directly compute depth, the 3D signals may still include a predicted depth value that is represented in the rasterized image or other input representation. For example, the individual DNNs used in the sensor data pipelines may be trained to predict depth using a single camera image, or may be trained to predict depth from inputs of two or more sensors with overlapping fields of view (e.g., such as fields of view 404A and 404B that may include overlap region 406 as illustrated in FIG. 4A). In some embodiments, a location prior image 114 may be generated as an additional input to the fusion DNN 120 that indicates to the fusion DNN 120 predicted depths—or a probability distribution function 252 (represented as a 2D Gaussian representation, in embodiments) indicating a distribution of potential depths—of an object based on the predicted depth from the 3D signal." This teaches using camera images to infer depth)
Regarding Claim 6, modified Park teaches "The method according to claim 1, wherein the plurality of sensors comprises at least one laser scanner and the plurality of perception modules comprises at least one laser scan perception module" ([0087] "The controller(s) 836 may provide the signals for controlling one or more components and/or systems of the vehicle 800 in response to sensor data received from one or more sensors (e.g., sensor inputs). The sensor data may be received from, for example and without limitation, global navigation satellite systems sensor(s) 858 (e.g., Global Positioning System sensor(s)), RADAR sensor(s) 860, ultrasonic sensor(s) 862, LIDAR sensor(s) 864, inertial measurement unit (IMU) sensor(s) 866 (e.g., accelerometer(s), gyroscope(s), magnetic compass(es), magnetometer(s), etc.), microphone(s) 896, stereo camera(s) 868, wide-view camera(s) 870 (e.g., fisheye cameras), infrared camera(s) 872, surround camera(s) 874 (e.g., 360 degree cameras), long-range and/or mid-range camera(s) 898, speed sensor(s) 844 (e.g., for measuring the speed of the vehicle 800), vibration sensor(s) 842, steering sensor(s) 840, brake sensor(s) (e.g., as part of the brake sensor system 846), and/or other sensor types."); "wherein the method comprises processing laser sensor data streamed via the at least one laser scanner by the at least one laser scan perception module to perceive a presence of an object within a defined range from the autonomous machine and infer a distance of the perceived object in relation to the autonomous machine, as part of the perception information extracted by the at least one laser scan perception module." ([0158] "The vehicle 800 may include LIDAR sensor(s) 864. The LIDAR sensor(s) 864 may be used for object and pedestrian detection, emergency braking, collision avoidance, and/or other functions. The LIDAR sensor(s) 864 may be functional safety level ASIL B.
In some examples, the vehicle 800 may include multiple LIDAR sensors 864 (e.g., two, four, six, etc.) that may use Ethernet (e.g., to provide data to a Gigabit Ethernet switch)." LIDARs are used to detect objects + [0159] "In some examples, the LIDAR sensor(s) 864 may be capable of providing a list of objects and their distances for a 360-degree field of view. Commercially available LIDAR sensor(s) 864 may have an advertised range of approximately 800 m, with an accuracy of 2 cm-3 cm, and with support for a 800 Mbps Ethernet connection, for example. In some examples, one or more non-protruding LIDAR sensors 864 may be used. In such examples, the LIDAR sensor(s) 864 may be implemented as a small device that may be embedded into the front, rear, sides, and/or corners of the vehicle 800. The LIDAR sensor(s) 864, in such examples, may provide up to a 120-degree horizontal and 35-degree vertical field-of-view, with a 200 m range even for low-reflectivity objects. Front-mounted LIDAR sensor(s) 864 may be configured for a horizontal field of view between 45 degrees and 135 degrees." LIDAR sensors detect objects with their distances within their specific fields of view and ranges)
Regarding Claim 8, modified Park teaches "wherein the blackboard image created by the context awareness module comprises a graphical representation of the operating environment of the autonomous machine including one or more perceived objects and their inferred location in relation to the autonomous machine using the controlled semantic." ([0034] "Where the process 100 uses the 3D signals, the 3D signals may be computed by each respective DNN in a same format, or may be converted—e.g., using a post-processor—to a same format. For example, in some non-limiting embodiments, the 3D signals that are used as inputs to the fusion DNN 120 may include rasterized images generated from a certain perspective (e.g., top-down birds eye view, projection images, such as range images, side-view images, etc.) that encode any number of input channels. The rasterized images—including the fused output rasterized image computed using the fusion DNN—may be ego-centric, in embodiments, where the ego-machine 800 is at the center of the representation. In other embodiments, the input 3D signals may be generated from a perspective of the ego-machine 800, and the fused output 122 may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment. In such an example, a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors, for example, and may include lines corresponding to lane markers (e.g., lane dividers, road dividers, solid lines, dashed lines, double lines, yellow lines, white lines, etc.), wait conditions (e.g., cross walks, stop lines, etc.), and/or other driving surface features, a boundary or encoded values for pixels corresponding to drivable freespace (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features." The fused raster output image (i.e., graphical representation) includes bounding shapes/cuboids corresponding to dynamic actors (perceived objects) and their location relative to the vehicle, given that the representation is ego-centric)
Regarding Claim 9, modified Park teaches "The method according to claim 8, wherein the controlled semantic comprises graphically representing the autonomous machine and different classes of perceived objects using defined colors, or shapes, or icons, or combinations thereof." ([0034] "Where the process 100 uses the 3D signals, the 3D signals may be computed by each respective DNN in a same format, or may be converted—e.g., using a post-processor—to a same format. For example, in some non-limiting embodiments, the 3D signals that are used as inputs to the fusion DNN 120 may include rasterized images generated from a certain perspective (e.g., top-down birds eye view, projection images, such as range images, side-view images, etc.) that encode any number of input channels. The rasterized images—including the fused output rasterized image computed using the fusion DNN—may be ego-centric, in embodiments, where the ego-machine 800 is at the center of the representation. In other embodiments, the input 3D signals may be generated from a perspective of the ego-machine 800, and the fused output 122 may be generated from an ego-centric point of view. For example, the input channels may indicate a shape, orientation, and/or classification for objects or features in the environment.
In such an example, a rasterized image may include bounding shapes or cuboids corresponding to dynamic actors, for example, and may include lines corresponding to lane markers (e.g., lane dividers, road dividers, solid lines, dashed lines, double lines, yellow lines, white lines, etc.), wait conditions (e.g., cross walks, stop lines, etc.), and/or other driving surface features, a boundary or encoded values for pixels corresponding to drivable freespace (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features." This teaches that the rasterized image includes colored lines corresponding to lane markers as well as bounding shapes/cuboids for perceived objects)
Regarding Claim 10, modified Park teaches "The method according to claim 8, wherein the controlled semantic comprises graphically representing a dynamic property of the autonomous machine and/or of the one or more perceived objects." ([0035] "The input channels included in the rasterized input image (or other input representation) may include, without limitation, an object or feature location or occupancy channel that may include a starting height of an object as a channel and/or an ending height of the object as a channel (e.g., where a top-down view rasterized image is generated, the pixel locations may denote an x or y pixel location of the corresponding feature or object indicating a location laterally or longitudinally with respect to an ego-machine 800, and the pixel may be encoded with one or more height channels indicating an elevation of the object or feature), one or more velocity channels (e.g., pixels of the image may be encoded with velocity in the x and/or y directions), an orientation channel corresponding to one or more objects (e.g., encoded as an angle(s)), and/or a classification channel corresponding to one or more objects or features, and/or additional or alternative channels." The graphical representation includes a velocity (dynamic property) associated with the object)
Regarding Claim 11, modified Park teaches "The method according to claim 8, wherein blackboard image comprises a graphical representation of an uncertainty with respect to the inferred location of the one or more perceived objects." ([0039] "For example, as illustrated in FIG. 2C, one or more sensors of an ego-machine 800 may detect objects 240A, 240B, 240C, 240D, etc., as illustrated by the circles in location prior image 114A. In such an example, where the object locations predictions are based on sensor data generated using monocular cameras or other sensor types where the accuracy of the location predictions may be less than ideal, a distribution of potential locations may be fed to the fusion DNN 120 to aid the fusion DNN 120 in generating more accurate predictions in the fused output 122. As such, the ellipses or probability distribution function (PDF) 252 representations corresponding to each object 240 may indicate potential locations—e.g., with corresponding confidence values—for where the object 240 may be located." Read in view of [0035] as cited above, the shapes/ellipses of perceived objects are based on the PDF representations, with confidence (i.e., uncertainty) values for each object)
Regarding Claim 14, modified Park teaches "The method according to claim 1, wherein the control policy comprises a deep neural network." (Wierstra as modified in claim 1 teaches a neural network based control policy + [0005] "Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks are deep neural networks that include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters." This teaches that the neural network can be a deep neural network)
Regarding Claim 15, modified Park teaches "The method according to claim 14, wherein the deep neural network includes a recurrent neural network (RNN) configured to process the stream of blackboard images as time series input data to determine the control action." (Wierstra [0037] + [0039] "The memory 14 is also an adaptive system, such as a long-short-term-memory (LSTM). It recurrently aggregates the external and internal data d.sub.i generated from each iteration i (where d.sub.i is a member of the set D of all possible data), to update the history, i.e. h.sub.i=μ(d.sub.i, h.sub.i−1), where μ is a trained adaptive function." This teaches the use of a long short-term memory that updates the adaptive function over the history/sequence of states, i.e., the neural network of Wierstra is a recurrent neural network (RNN))
Regarding Claim 16, modified Park teaches "claim 1, wherein the control policy is trained by: acquiring real streaming sensor data or simulated streaming sensor data pertaining to multiple sensors in a respectively real or simulated operating environment of the autonomous vehicle, creating a plurality of training blackboard images by extracting perception information from the real streaming sensor data or the simulated streaming sensor data and fusing the extracted perception information using the controlled semantic, wherein the plurality of training blackboard images define different contexts of the autonomous machine, and using the plurality of training blackboard images to learn a mapping of context to control action via a machine learning process." (Wierstra et al [0056]: "The training method is shown in FIG. 4. In step 401 the imagination 13 was taught to make next-step predictions of the state in a supervised fashion, with error gradients computed by backpropagation. The training data is collected from the observations the agent makes when acting in the environment 10. The policy of the imagination 13 in this case was stochastic, so an entropy reward was used during training to encourage exploration. In step 402, the manager 11 was trained. In step 403, the controller 12 and the memory 14 were jointly trained. Note that steps 401, 402 and 403 are independent, so they can be performed in any order or simultaneously." Wierstra et al is trained via observed data from the environment, and [0007] teaches that the environment can be either real or simulated (i.e., the streamed inputs can be real or simulated sensor data), which forms the context of Park as cited in claim 1. Park teaches outputting the rasterized images (world model/blackboard images) for use in path planning/the control policy, and Wierstra was used to teach the neural network control policy; i.e., the blackboard images of Park are fed into the neural network control policy of Wierstra. Thus it is a logical extension that the same is true for the training data: the blackboard images of Park are used to train the neural network control policy of Wierstra.)
Regarding Claim 17, modified Park teaches "A non-transitory computer-readable storage medium including instructions that, when processed by a computer, configure the computer to perform the method according to claim 1" (Park [0026]: "Various functions described herein as being performed by entities may be carried out by hardware, firmware, and/or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.")
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over modified Park as applied to claim 3 above, and further in view of Beye et al (US 20220375189 A1, "VISUAL OBJECT TRACKING METHOD, VISUAL OBJECT TRACKING SYSTEM, MACHINE LEARNING METHOD, AND LEARNING SYSTEM").
Regarding Claim 5, while Park teaches the use of object tracking based on camera images (e.g., [0095]), it does not specifically teach the use of tracking IDs and image frame comparison as recited in claim 5.
Beye et al teaches a camera based object tracking system which includes "wherein the perception information extracted by the at least one vision perception module further comprises a respective tracking ID for the one or more perceived objects, wherein the tracking ID is a newly assigned tracking ID or an old tracking ID based on a comparison of a current image frame with a previous image frame, wherein the method comprises tracking a position of the one or more perceived objects over time based on the respective tracking IDs." (Abstract: "An estimation unit of a visual object tracking apparatus estimates a plurality of estimated bounding boxes and estimated object IDs respectively corresponding to the estimated bounding boxes based on a plurality of predicted bounding boxes and a plurality of detected bounding boxes. For example, the 'detected bounding boxes' are 'bounding boxes (bounded areas)' detected by a detector in each of a plurality of frames in a time series such as moving images. The 'bounding box' is a frame surrounding an image of an object detected in a frame. For example, the 'predicted bounding box' is a 'bounding box' predicted by a predictor based on an estimated bounding box(es) estimated for one or a plurality of frames in the past." + [0039]-[0042] + [0080] teach detecting new objects when a detected object/ID is not matched with previously detected objects/IDs of previous frames)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the application, to modify Park to implement the image based object tracking with object IDs as taught by Beye. One would be motivated to implement Beye's object tracker in order to allow the system to improve object certainty over time, improving operation ([0006]).
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over modified Park as applied to claim 1 above, and further in view of Hafizovic et al (US 20220060818 A1, "MICROPHONE ARRAYS").
Regarding Claim 7, modified Park teaches "The method according to claim 1, wherein the plurality of sensors comprises at least one" (Park [0087] teaches microphones are one of the sensors on the vehicle); "and the plurality of perception modules comprises at least one audio perception module configured to process audio data streamed by the at least one" ([0146] "In another example, a CNN for emergency vehicle detection and identification may use data from microphones 896 to detect and identify emergency vehicle sirens. In contrast to conventional systems, that use general classifiers to detect sirens and manually extract features, the SoC(s) 804 use the CNN for classifying environmental and urban sounds, as well as classifying visual data. In a preferred embodiment, the CNN running on the DLA is trained to identify the relative closing speed of the emergency vehicle (e.g., by using the Doppler Effect)." Here Park teaches using microphone data to detect presence and location information of emergency vehicles via the microphones on the vehicle)
Park, however, does not teach that the microphones are "adaptive" and "directional" (and by extension the directional information of the microphone detection). Hafizovic teaches an adaptive directional microphone array (Abstract: "A system for capturing sound comprising a plurality of discrete microphones (112, 114, 116, 118) and a processing system (408). The plurality of discrete microphones are arranged in a circular array. The processing system (408) arranged to perform a first signal processing algorithm on sound originating from one or more of a first set of directions relative to the array to isolate a first sound source. The processing system (408) is further arranged to perform a second signal processing algorithm on sound originating from one or more of a second set of directions relative to the array to isolate a second sound source therein. A method for receiving sound at a plurality of discrete microphones (112, 114, 116, 118) arranged in a circular array is also described." Here the determining of a first sound source and adjusting of the signal processing to isolate it/determine its direction reads on [0045] of the applicant's specification, in which the changing of the angle of sensitivity constitutes an "adaptive" microphone), which includes application for tracking purposes ([0068] "The Applicant has envisaged a set of embodiments in which the methods and arrangements as described above are implemented in devices for applications such as covert surveillance, video conferencing and detection and tracking of unmanned aerial vehicles (drones).")
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the application, to further modify Park to substitute its generally taught microphones for the adaptive directional microphones as taught by Hafizovic et al. One would be motivated to implement the adaptive directional microphone to improve the quality of sound reception/isolation across a large sweep area compared to a standard microphone ([0014] "Thus it will be seen by those skilled in the art that in accordance with the invention, different processing techniques are employed depending on the originating direction of the incoming sound. The Applicant has appreciated that by changing the signal processing applied to signals received from an array of microphones dependent on the orientation of the array relative to a direction of interest (e.g. to a particular sound source or to one of a range of directions during a sweep), a better response can be achieved than by using a single signal processing approach alone and that it may be possible to use a microphone array to capture sound accurately across a wide sound field, e.g. up to a 180° hemisphere or beyond."), improving the detection function(s) of the microphones.
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over modified Park as applied to claim 8 above, and further in view of US 20210094539 A1, “BLOCKING OBJECT AVOIDANCE”, Beller.
Regarding Claim 12, modified Park does not teach that the rasterized image (graphical representation) includes “claim 8, wherein the blackboard image comprises a graphical representation of safe or unsafe zones for different objects in relation to the autonomous machine, wherein the safe or unsafe zones are determined based on a dynamic property of the autonomous machine.” The closest it comes is teaching the determined drivable areas around the vehicle ([0034]: “… and/or other driving surface features, a boundary or encoded values for pixels corresponding to drivable freespace (e.g., an area of the environment that the ego-machine 800 is able to traverse), and/or other objects or features.”).
Beller et al teaches a vehicle control system in which determining a drivable free space around objects includes determining a safe/unsafe area around each object, which is determined in part based on a dynamic property (speed) of the autonomous vehicle ([0027]: “In various examples, the perception component 112 may determine that the blocking object(s) 106 are blocking the vehicle path 116 based on a determination that the vehicle is not able to proceed around the blocking objects 106 in the lane 118. In such examples, the distance D may be less than a width of the vehicle plus a minimum safe distance (e.g., safety margin) from the blocking object(s) 106. In some examples, the minimum safe distance may be based on the classification 114 associated with the blocking object(s) 106. For example, the minimum safe distance associated with pedestrians may be 1 meter and the minimum safe distance associated with bicyclists may be 1.2 meters. In various examples, the minimum safe distance may be based on a vehicle speed. In such examples, the faster the vehicle 104 travels, the greater the minimum safe distance, or vice versa. For example, a vehicle traveling 10 miles per hour may include a minimum safe distance of 3 feet from a pedestrian and a vehicle traveling 20 miles per hour may include a minimum safe distance of 5 feet. Though the distances and speeds are merely illustrative examples, and other speeds and/or distances may be contemplated.” This teaches that the minimum safe area around a potential object/obstacle is based both on the object’s classification and on the travel speed of the vehicle).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the application, to modify Park to include the setting/displaying of safe/unsafe zones (minimum safe distances) around detected objects, wherein the minimum safe distance is based on the vehicle’s current travel speed as taught by Beller, as part of the drivable area display/detection of Park (i.e., modify the drivable area detection to exclude areas within the minimum safe distances from detected objects). One would be motivated to implement this to allow the system to give a proper safety margin to other objects, improving the safety of operation (Beller [0021]: “Accordingly, the techniques described herein improve the technical field of autonomous and/or semi-autonomous vehicle.”).
Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over modified Park as applied to claim 1 above, and further in view of NPL, Murphy et al, “When to explicitly replan paths for mobile robots”.
Regarding Claim 13, Park et al does not teach the use of perception events and subsequent implementation of the control policy when such an event occurs.
Murphy et al teaches a robotic planning system which includes determining a perception event (detecting that the environment has deviations in the current versus expected map) which causes the robot to trigger path replanning (control policy) (Section 2, Related Work: “Three basic approaches to determining when to re-plan paths have been discussed in the literature: re-plan after an event (e.g., significant deviations in actual vs. expected map, actual path vs. desired path), replan after every n moves, where n ≥ 1, and replan at fixed locations.”).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the application, to modify Park to include replanning based on detecting deviations between the current detected map/model and the expected (previous) map/model as taught by Murphy. One would be motivated to implement this selective replanning, as opposed to continuous replanning, to save on path planning computational costs (Murphy, Introduction: “Continuous path planning is assumed to be more accurate, but it may not be feasible with computationally limited robots such as a planetary rover, and/or may slow reactivity in general.”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 20190228232 A1; US 20190347806 A1; US 20200110416 A1; US 20210004611 A1; US 12183050 B1
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KENNETH MICHAEL DUNNE whose telephone number is (571)270-7392. The examiner can normally be reached Mon-Thurs 8:30-6:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Navid Z Mehdizadeh can be reached at (571) 272-7691. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KENNETH M DUNNE/Primary Examiner, Art Unit 3669