Prosecution Insights
Last updated: April 19, 2026
Application No. 18/066,692

Detecting a Moving Object in the Passenger Compartment of a Vehicle

Non-Final OA §103
Filed: Dec 15, 2022
Examiner: ZAK, JACQUELINE ROSE
Art Unit: 2666
Tech Center: 2600 — Communications
Assignee: Aptiv Technologies AG
OA Round: 3 (Non-Final)
Grant Probability: 67% (Favorable)
OA Rounds: 3-4
To Grant: 2y 10m
With Interview: 55%

Examiner Intelligence

Career Allow Rate: 67% (above average; 8 granted / 12 resolved; +4.7% vs TC avg)
Interview Lift: -11.4% (minimal; based on resolved cases with interview)
Avg Prosecution: 2y 10m (typical timeline)
Total Applications: 58 across all art units (46 currently pending)

Statute-Specific Performance

§101: 5.7% (-34.3% vs TC avg)
§103: 56.3% (+16.3% vs TC avg)
§102: 21.1% (-18.9% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 12 resolved cases.

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/02/2025 has been entered.

Claim Status

Claims 1 and 3-20 are pending for examination following the submission filed 12/02/2025. Claims 1 and 19-20 have been amended and claim 2 was cancelled.

Priority

Acknowledgement is made of Applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copies have been filed in parent applications EP21215194.8 filed on 12/16/2021 and EP22210625.4 filed on 11/30/2022.

Response to Arguments and Amendments

Applicant's arguments, filed 12/02/2025, with respect to claims 1, 19, and 20 have been considered but are moot because the new ground of rejection does not rely on the combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Drawings

Figures 1-2 are objected to as depicting a block diagram without "readily identifiable" descriptors of each block, as required by 37 CFR 1.84(n). Rule 84(n) requires "labeled representations" of graphical symbols, such as blocks; any symbols that are "not universally recognized may be used, subject to approval by the Office, if they are not likely to be confused with existing conventional symbols, and if they are readily identifiable." In the case of Figures 1-2, the blocks are not readily identifiable per se and therefore require the insertion of text that identifies the function of that block. That is, each vacant block should be provided with a corresponding label identifying its function or purpose.

Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as "amended." If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Porta (US20210012126A1) in view of Gronau (US20220114817A1) and Mandal (Mandal, M., Kumar, L. K., & Saran, M. S. (2020). MotionRec: A unified deep framework for moving object recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2734-2743)).

Regarding claim 1, Porta teaches a computer implemented method, the method comprising: illuminating an inside of a passenger compartment of a vehicle using a light source ([0002] The invention relates to computer vision generally and, more particularly, to a method and/or apparatus for implementing detecting illegal use of phone to prevent the driver from getting a fine. [0005] The invention concerns an apparatus comprising an interface and a processor. The interface may be configured to receive video frames corresponding to an interior of a vehicle. [0031] In yet another example, the capture devices 102a-102n may perform depth sensing using structured light); obtaining a video stream of a plurality of consecutive images at a given frame rate from the illuminated inside of the passenger compartment of the vehicle using an infrared camera ([0021] The video frames may be captured by one or more video cameras directed to the interior of a vehicle. [0043] In some embodiments, the sensor 140a may implement an RGB-InfraRed (RGB-IR) sensor.
[0171] For example, the CNN module 150 may perform the computer vision operations on a sequence of the video frames and the processors 106a-106n may determine the length of time based on the frame rate and the number of frames that one of the hands 502a-502b is near the ear); identifying moving and static objects in the video stream based on an object detection algorithm using a processor ([0079] In some embodiment, the processors 106a-106n may be configured to generate motion vectors to track the movement of objects across video frames temporally. The motion vectors may indicate a direction and/or speed of movement of an object between a current video frame and previous video frames. Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object).
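For orientation, the motion-vector technique Porta describes amounts to measuring per-frame displacement inside each detection and thresholding it. A minimal illustrative sketch is below; it uses OpenCV's Farneback dense optical flow as a stand-in for Porta's motion vectors, and the bounding boxes are assumed to come from a separate detector. This is an editor's illustration, not Porta's implementation.

```python
# Illustrative sketch only: split detected objects into "moving" and
# "static" sets by the mean optical-flow magnitude inside each box.
# Boxes are assumed to come from a separate object detector.
import cv2
import numpy as np

def split_moving_static(prev_gray, curr_gray, boxes, motion_thresh=1.0):
    """prev_gray/curr_gray: consecutive grayscale frames (H x W uint8).
    boxes: iterable of (x, y, w, h). Returns (moving, static) lists."""
    # Dense flow between the two frames (positional args per OpenCV docs:
    # pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)  # per-pixel motion, pixels/frame
    moving, static = [], []
    for (x, y, w, h) in boxes:
        if magnitude[y:y + h, x:x + w].mean() > motion_thresh:
            moving.append((x, y, w, h))
        else:
            static.append((x, y, w, h))
    return moving, static
```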
Porta does not teach illuminating an inside of a passenger compartment of a vehicle using an infrared light source. Gronau, in the same field of endeavor of object detection inside a vehicle, teaches illuminating an inside of a passenger compartment of a vehicle using an infrared light source ([0153] FIG. 2A is a flow diagram 296 illustrating steps of detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects, according to one embodiment. As shown in FIG. 2A, a monitoring system 210 includes one or more illuminators such as an illuminator 274 which provides structured light with a specific illumination pattern (e.g., spots or stripes or other patterns). [0157] In particular, the illumination source may be controlled to produce or emit light in a number of spatial or two-dimensional patterns. Illumination may take the form of any of a large variety of wavelengths or ranges of wavelengths of electromagnetic energy. For instance, illumination may include electromagnetic energy of wavelengths in an optical range or portion of the electromagnetic spectrum including wavelengths in a human-visible range or portion (e.g., approximately 390 nm-750 nm) and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the method of Porta with the teachings of Gronau to illuminate an inside of a passenger compartment of a vehicle using an infrared light source because "the 2D imaging device may capture images of the vehicle cabin (e.g. 2D images or images captured approximately 390 nm-750 nm and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm) portions and/or the near-ultraviolet (NUV)), for example from different angels, and generate visual images of the cabin (e.g. 2D video images)" [Gronau 0139] for "detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects" [Gronau 0153].

Porta does not teach improving an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor; and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor. Mandal, in the same field of endeavor of object detection in a video stream, teaches improving an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor ([Abstract] In this paper we present a novel deep learning framework to perform online moving object recognition (MOR) in streaming videos. [3.2. Motion Saliency Estimation Network] Since AsFeat contains contrasting features i.e. background maps and current frame features. It provides crucial encoding to construct coarse motion saliency maps. We then extract base features from AsFeat for higher level feature abstractions. The base features are extracted from three layers (C3, C4, C5) of ResNet residual stage as in [46]. Moreover, in order to delineate semantically accurate shape representation for object categorization, we propose to incorporate certain reinforcements by parallelly extracting ResNet features for the current frame as well. The feature maps at same scales are combined for both temporal and spatial saliency aware feature representation. The MoSENet feature response is computed using Eq. (5). [pg. 2738] MotionRec takes two tensors of shape 608x608xT (past temporal history) and 608x608x3 (current frame) as input and returns the spatial coordinates with class labels for moving object instances); and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor ([3.2. Motion Saliency Estimation Network] In MoSENet block, we assimilate estimated background with a temporal median (MT) and current frame (I). The pixel-wise temporal median of recent observations fortifies the background confidence by supplementing TDR response with statistical estimates. This enhances robustness of background model for different real-world scenarios such a dynamic background changes, bad weather, shadows, etc. These assimilated features are computed using Eq. (4)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the method of Porta with the teachings of Mandal to improve the image quality of only the identified moving objects using a machine learning algorithm and improve the image quality of only the static objects using an image stacking algorithm because "Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial sites monitoring, detection-based tracking, autonomous vehicles, etc. However, numerous real-world scenarios such as dynamic background changes, illumination variations, shadows, challenging environmental conditions such as rainfall, haze, etc. make recognition of relevant moving objects a challenging task" [Mandal pg. 2734].
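The image-stacking idea the rejection maps to the static objects is, at its core, temporal aggregation of co-registered frames. A minimal illustrative sketch of stacking by temporal median is below; it mirrors the idea of Mandal's temporal-median background estimate but is not MoSENet, which operates on learned features rather than raw pixels.

```python
# Illustrative sketch only: denoise a static region by stacking (temporal
# median over) the last T aligned frames. Mirrors the idea of a temporal-
# median background estimate; not Mandal's MoSENet implementation.
import numpy as np

def stack_static_region(frames, box):
    """frames: list of T aligned grayscale frames (H x W uint8 arrays).
    box: (x, y, w, h) of an object identified as static.
    Returns a denoised uint8 patch for that region."""
    x, y, w, h = box
    patches = np.stack(
        [f[y:y + h, x:x + w].astype(np.float32) for f in frames])
    # Median across time suppresses zero-mean sensor noise and transient
    # occlusions while preserving edges, provided the region really is
    # static over the stacking window.
    return np.median(patches, axis=0).astype(np.uint8)
```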
Regarding claim 3, Porta, Gronau, and Mandal teach the method of claim 1. Porta further teaches classifying moving objects in the stream based on a neural network using the processor ([0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 4, Porta, Gronau, and Mandal teach the method of claim 3. Porta further teaches providing a notification based on the classification of the moving objects using the processor ([0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal. The notification signal may be configured to warn the driver about using the electronic device in the vehicle).

Regarding claim 5, Porta, Gronau, and Mandal teach the method of claim 4. Porta further teaches classifying the identified static objects in the stream based on a neural network using the processor ([0079] Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. [0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 6, Porta, Gronau, and Mandal teach the method of claim 4. Porta further teaches taking an action based on the classification of the moving objects using the processor ([0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal. The notification signal may be configured to warn the driver about using the electronic device in the vehicle. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation).

Regarding claim 7, Porta, Gronau, and Mandal teach the method of claim 3. Porta further teaches taking an action based on the classification of the moving objects using the processor ([0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal. The notification signal may be configured to warn the driver about using the electronic device in the vehicle. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation).

Regarding claim 8, Porta, Gronau, and Mandal teach the method of claim 7. Porta further teaches classifying the identified static objects in the stream based on a neural network using the processor ([0079] Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. [0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 9, Porta, Gronau, and Mandal teach the method of claim 3. Porta further teaches classifying the identified static objects in the stream based on a neural network using the processor ([0079] Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. [0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).
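Claims 3-13 all turn on the same pattern: classify an identified object with a neural network, then notify or act on the result. A minimal hedged sketch of that pattern is below; a pretrained torchvision classifier stands in for Porta's "CNN module 150", and the notify/act hooks are hypothetical placeholders, not anything from the references.

```python
# Illustrative sketch only: classify an object crop with a pretrained CNN
# and react to the label. A generic torchvision model stands in for the
# "CNN module 150" of the reference; notify/act are hypothetical hooks.
import torch
from torchvision import models
from torchvision.models import ResNet18_Weights

weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()  # resize/normalize expected by the model

def classify_and_react(crop_pil, notify, act):
    """crop_pil: PIL image of one detected object.
    notify/act: callables, e.g. post a warning or trigger an actuator."""
    with torch.no_grad():
        logits = model(preprocess(crop_pil).unsqueeze(0))
    label = weights.meta["categories"][logits.argmax(1).item()]
    notify(f"detected: {label}")  # e.g., message on an infotainment display
    act(label)                    # e.g., chime only for certain classes
    return label
```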
Regarding claim 10, Porta, Gronau, and Mandal teach the method of claim 1. Porta further teaches classifying the identified static objects in the stream based on a neural network using the processor ([0079] Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. [0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 11, Porta, Gronau, and Mandal teach the method of claim 10. Porta further teaches providing a notification based on the classification of the static objects using the processor ([0079] In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. For example, the windshield may be tracked over time to determine that visibility has been reduced and/or increased (e.g., due to frost forming and/or disappearing). [0160] The processors 106a-106n may perform the computer vision operations on the example video frame 500 to determine whether the driver 402a′ is performing an unauthorized use of the electronic device (e.g., the smartphone 408′ and/or the infotainment system 520). For example, some drivers may attempt to hide usage of the smartphone 408′ by keeping the smartphone 408′ below the level of the windshield 510 (e.g., on their lap) and glancing downwards to read and/or type text message. Hiding the smartphone 408′ below the level of the windshield 510 may prevent traffic enforcement officers from catching the driver 402a′ texting but would not prevent the apparatus 100 from providing the notification signal VCTRL. [0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal).

Regarding claim 12, Porta, Gronau, and Mandal teach the method of claim 11. Porta further teaches taking an action based on the classification of the static objects using the processor ([0079] In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. For example, the windshield may be tracked over time to determine that visibility has been reduced and/or increased (e.g., due to frost forming and/or disappearing). [0160] The processors 106a-106n may perform the computer vision operations on the example video frame 500 to determine whether the driver 402a′ is performing an unauthorized use of the electronic device (e.g., the smartphone 408′ and/or the infotainment system 520). For example, some drivers may attempt to hide usage of the smartphone 408′ by keeping the smartphone 408′ below the level of the windshield 510 (e.g., on their lap) and glancing downwards to read and/or type text message. Hiding the smartphone 408′ below the level of the windshield 510 may prevent traffic enforcement officers from catching the driver 402a′ texting but would not prevent the apparatus 100 from providing the notification signal VCTRL. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation).

Regarding claim 13, Porta, Gronau, and Mandal teach the method of claim 10. Porta further teaches taking an action based on the classification of the static objects using the processor ([0079] In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. For example, the windshield may be tracked over time to determine that visibility has been reduced and/or increased (e.g., due to frost forming and/or disappearing). [0160] The processors 106a-106n may perform the computer vision operations on the example video frame 500 to determine whether the driver 402a′ is performing an unauthorized use of the electronic device (e.g., the smartphone 408′ and/or the infotainment system 520). For example, some drivers may attempt to hide usage of the smartphone 408′ by keeping the smartphone 408′ below the level of the windshield 510 (e.g., on their lap) and glancing downwards to read and/or type text message. Hiding the smartphone 408′ below the level of the windshield 510 may prevent traffic enforcement officers from catching the driver 402a′ texting but would not prevent the apparatus 100 from providing the notification signal VCTRL. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation).

Regarding claim 14, Porta, Gronau, and Mandal teach the method of claim 1. Gronau further teaches wherein the object detection algorithm is a semantic segmentation algorithm ([0195] In some cases, the analysis is performed using one or more of computer vision and machine learning algorithms. These may include but are not limited to Neural Network such as Convolutional Neural Network detection algorithms configured to detect objects such as one or more persons in the vehicle cabin. [0197] Alternatively or in combination, the computer vision and machine learning algorithms may include Neural Network detection algorithms configured to segment the multiple images of the vehicle cabin and specifically the passengers in the multiple images based on for example the 2D and 3D images. For example, computer vision and machine learning algorithms may include comparing 3D data to an empty car 3D data. In some cases, the segmentation can be obtained by a segmentation algorithm such as a semantic segmentation neural network). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the method of Porta with the teachings of Gronau to use semantic segmentation because "Neural Networks such as one or more Convolutional Neural Networks for detecting people, networks that specifically detect the face, hands, torso and other body parts, networks that can segment the image and specifically the passengers in the image based on the 2D and 3D images, algorithms that can calculate the volume of objects and people and algorithms that can determine if there is motion in a certain region of the car" [Gronau 0165].
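Claim 14's semantic segmentation can be illustrated with an off-the-shelf model. The sketch below uses torchvision's pretrained DeepLabV3 purely as a stand-in for the "semantic segmentation neural network" Gronau mentions; it is not the network from the reference.

```python
# Illustrative sketch only: per-pixel semantic segmentation of a cabin
# image with a pretrained DeepLabV3. A stand-in for the segmentation
# network Gronau describes, not the reference's actual model.
import torch
from torchvision import models
from torchvision.models.segmentation import DeepLabV3_ResNet50_Weights

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = models.segmentation.deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

def segment(image_pil):
    """Returns an (H x W) tensor of class indices, one label per pixel."""
    batch = preprocess(image_pil).unsqueeze(0)
    with torch.no_grad():
        out = model(batch)["out"]    # (1, num_classes, H, W) logits
    return out.argmax(1).squeeze(0)  # per-pixel class index map
```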
Regarding claim 15, Porta, Gronau, and Mandal teach the method of claim 1. Porta further teaches classifying moving objects in the stream based on a neural network using the processor ([0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 16, Porta, Gronau, and Mandal teach the method of claim 1. Porta further teaches classifying the identified moving objects in the stream based on a neural network using the processor ([0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)); providing a notification based on the classification of the identified moving objects using the processor ([0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal. The notification signal may be configured to warn the driver about using the electronic device in the vehicle); taking an action based on the classification of the identified moving objects using the processor ([0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal. The notification signal may be configured to warn the driver about using the electronic device in the vehicle. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation); and classifying the identified static objects in the stream based on a neural network using the processor ([0079] Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. [0044] The CNN module 150 may be configured to implement convolutional neural network capabilities. The CNN module 150 may be configured to implement computer vision using deep learning techniques. The CNN module 150 may be configured to implement pattern and/or image recognition using a training process through multiple layers of feature-detection. [0066] The processors 106a-106n may be configured to implement intelligent vision processors. The intelligent vision processors 106a-106n may implement multi-object classification. In one example, multi-object classification may comprise detecting multiple objects in the same video frames using parallel processing that reduces power consumption and/or computational resources compared to detecting multiple objects one object at a time. The multi-object classification may further comprise determining multiple inferences at a time (e.g., compared to first detecting whether an object exists, then detecting that the object is a driver, then determining whether the driving is holding the steering wheel, etc.)).

Regarding claim 17, Porta, Gronau, and Mandal teach the method of claim 16. Porta further teaches providing a notification based on the classification of the static objects using the processor ([0079] In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. For example, the windshield may be tracked over time to determine that visibility has been reduced and/or increased (e.g., due to frost forming and/or disappearing). [0160] The processors 106a-106n may perform the computer vision operations on the example video frame 500 to determine whether the driver 402a′ is performing an unauthorized use of the electronic device (e.g., the smartphone 408′ and/or the infotainment system 520). For example, some drivers may attempt to hide usage of the smartphone 408′ by keeping the smartphone 408′ below the level of the windshield 510 (e.g., on their lap) and glancing downwards to read and/or type text message. Hiding the smartphone 408′ below the level of the windshield 510 may prevent traffic enforcement officers from catching the driver 402a′ texting but would not prevent the apparatus 100 from providing the notification signal VCTRL. [0005] The processor may be configured to perform video operations on the video frames to detect objects in the video frames, detect a driver based on the objects detected in the video frames, detect a use of an electronic device by the driver and generate a notification signal).

Regarding claim 18, Porta, Gronau, and Mandal teach the method of claim 17. Porta further teaches taking an action based on the classification of the static objects using the processor ([0079] In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object. For example, the windshield may be tracked over time to determine that visibility has been reduced and/or increased (e.g., due to frost forming and/or disappearing). [0160] The processors 106a-106n may perform the computer vision operations on the example video frame 500 to determine whether the driver 402a′ is performing an unauthorized use of the electronic device (e.g., the smartphone 408′ and/or the infotainment system 520). For example, some drivers may attempt to hide usage of the smartphone 408′ by keeping the smartphone 408′ below the level of the windshield 510 (e.g., on their lap) and glancing downwards to read and/or type text message. Hiding the smartphone 408′ below the level of the windshield 510 may prevent traffic enforcement officers from catching the driver 402a′ texting but would not prevent the apparatus 100 from providing the notification signal VCTRL. [0135] The notification signal VCTRL may be generated to enable a response that warns the driver 402a about the unauthorized usage. In one example, the response may be to perform an audible noise to alert the driver 402a. One of the actuators 116 may be a speaker and the signal VCTRL may cause the speaker to generate audio. In an example, the audio may be a warning sound (e.g., similar to a no seatbelt warning). In another example, the audio may be a pre-recorded message (e.g., a voice speaking a message such as, "It is against the law to use a mobile phone while driving. Please put the phone down and pay attention to the road, or you may receive a fine."). In another example, the response may be to activate a warning light and/or display a message on an infotainment system. One of the actuators 116 may be a warning light. For example, the dashboard may provide a warning light (e.g., similar to an engine light) that represents a warning about using a smartphone. One of the actuators 116 may be the infotainment system. For example, the infotainment system may print out a message (e.g., "put the phone down") and/or display an icon (e.g., an image of a phone with an X drawn over the phone). The number and/or type of notifications and/or responses performed may be varied according to the design criteria of a particular implementation).

Regarding claim 19, Porta teaches a system comprising at least one processor configured to: illuminate an inside of a passenger compartment of a vehicle using a light source ([0002] The invention relates to computer vision generally and, more particularly, to a method and/or apparatus for implementing detecting illegal use of phone to prevent the driver from getting a fine. [0005] The invention concerns an apparatus comprising an interface and a processor. The interface may be configured to receive video frames corresponding to an interior of a vehicle. [0031] In yet another example, the capture devices 102a-102n may perform depth sensing using structured light); obtain a video stream of a plurality of consecutive images at a given frame rate from the illuminated inside of the passenger compartment of the vehicle using an infrared camera ([0021] The video frames may be captured by one or more video cameras directed to the interior of a vehicle. [0043] In some embodiments, the sensor 140a may implement an RGB-InfraRed (RGB-IR) sensor. [0171] For example, the CNN module 150 may perform the computer vision operations on a sequence of the video frames and the processors 106a-106n may determine the length of time based on the frame rate and the number of frames that one of the hands 502a-502b is near the ear); identify moving and static objects in the video stream based on an object detection algorithm using a processor ([0079] In some embodiment, the processors 106a-106n may be configured to generate motion vectors to track the movement of objects across video frames temporally. The motion vectors may indicate a direction and/or speed of movement of an object between a current video frame and previous video frames. Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object). Porta does not teach illuminate an inside of a passenger compartment of a vehicle using an infrared light source. Gronau, in the same field of endeavor of object detection inside a vehicle, teaches illuminate an inside of a passenger compartment of a vehicle using an infrared light source ([0153] FIG. 2A is a flow diagram 296 illustrating steps of detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects, according to one embodiment. As shown in FIG. 2A, a monitoring system 210 includes one or more illuminators such as an illuminator 274 which provides structured light with a specific illumination pattern (e.g., spots or stripes or other patterns). [0157] In particular, the illumination source may be controlled to produce or emit light in a number of spatial or two-dimensional patterns. Illumination may take the form of any of a large variety of wavelengths or ranges of wavelengths of electromagnetic energy. For instance, illumination may include electromagnetic energy of wavelengths in an optical range or portion of the electromagnetic spectrum including wavelengths in a human-visible range or portion (e.g., approximately 390 nm-750 nm) and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the system of Porta with the teachings of Gronau to illuminate an inside of a passenger compartment of a vehicle using an infrared light source because "the 2D imaging device may capture images of the vehicle cabin (e.g. 2D images or images captured approximately 390 nm-750 nm and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm) portions and/or the near-ultraviolet (NUV)), for example from different angels, and generate visual images of the cabin (e.g. 2D video images)" [Gronau 0139] for "detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects" [Gronau 0153]. Porta does not teach improve an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor; and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor. Mandal, in the same field of endeavor of object detection in a video stream, teaches improve an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor ([Abstract] In this paper we present a novel deep learning framework to perform online moving object recognition (MOR) in streaming videos. [3.2. Motion Saliency Estimation Network] Since AsFeat contains contrasting features i.e. background maps and current frame features. It provides crucial encoding to construct coarse motion saliency maps. We then extract base features from AsFeat for higher level feature abstractions. The base features are extracted from three layers (C3, C4, C5) of ResNet residual stage as in [46]. Moreover, in order to delineate semantically accurate shape representation for object categorization, we propose to incorporate certain reinforcements by parallelly extracting ResNet features for the current frame as well. The feature maps at same scales are combined for both temporal and spatial saliency aware feature representation. The MoSENet feature response is computed using Eq. (5). [pg. 2738] MotionRec takes two tensors of shape 608x608xT (past temporal history) and 608x608x3 (current frame) as input and returns the spatial coordinates with class labels for moving object instances); and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor ([3.2. Motion Saliency Estimation Network] In MoSENet block, we assimilate estimated background with a temporal median (MT) and current frame (I). The pixel-wise temporal median of recent observations fortifies the background confidence by supplementing TDR response with statistical estimates. This enhances robustness of background model for different real-world scenarios such a dynamic background changes, bad weather, shadows, etc. These assimilated features are computed using Eq. (4)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the system of Porta with the teachings of Mandal to improve the image quality of only the identified moving objects using a machine learning algorithm and improve the image quality of only the static objects using an image stacking algorithm because "Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial sites monitoring, detection-based tracking, autonomous vehicles, etc. However, numerous real-world scenarios such as dynamic background changes, illumination variations, shadows, challenging environmental conditions such as rainfall, haze, etc. make recognition of relevant moving objects a challenging task" [Mandal pg. 2734].

Regarding claim 20, Porta teaches a non-transitory computer readable medium comprising instructions that, when executed, configure at least one processor to: illuminate an inside of a passenger compartment of a vehicle using a light source ([0188] The invention thus may also include a computer product which may be a storage medium or media and/or a transmission medium or media including instructions which may be used to program a machine to perform one or more processes or methods in accordance with the invention. [0005] The invention concerns an apparatus comprising an interface and a processor. The interface may be configured to receive video frames corresponding to an interior of a vehicle. [0031] In yet another example, the capture devices 102a-102n may perform depth sensing using structured light); obtain a video stream of a plurality of consecutive images at a given frame rate from the illuminated inside of the passenger compartment of the vehicle using an infrared camera ([0021] The video frames may be captured by one or more video cameras directed to the interior of a vehicle. [0043] In some embodiments, the sensor 140a may implement an RGB-InfraRed (RGB-IR) sensor. [0171] For example, the CNN module 150 may perform the computer vision operations on a sequence of the video frames and the processors 106a-106n may determine the length of time based on the frame rate and the number of frames that one of the hands 502a-502b is near the ear); identify moving and static objects in the video stream based on an object detection algorithm using a processor ([0079] In some embodiment, the processors 106a-106n may be configured to generate motion vectors to track the movement of objects across video frames temporally. The motion vectors may indicate a direction and/or speed of movement of an object between a current video frame and previous video frames. Tracking movements of objects may enable determining gestures (e.g., to receive input commands) and/or determine a vulnerability of an occupant (e.g., a non-moving occupant may be asleep and/or unconscious). In another example, tracking a static object across video frames temporally may be implemented to determine a status of an object). Porta does not teach illuminate an inside of a passenger compartment of a vehicle using an infrared light source. Gronau, in the same field of endeavor of object detection inside a vehicle, teaches illuminate an inside of a passenger compartment of a vehicle using an infrared light source ([0153] FIG. 2A is a flow diagram 296 illustrating steps of detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects, according to one embodiment. As shown in FIG. 2A, a monitoring system 210 includes one or more illuminators such as an illuminator 274 which provides structured light with a specific illumination pattern (e.g., spots or stripes or other patterns). [0157] In particular, the illumination source may be controlled to produce or emit light in a number of spatial or two-dimensional patterns. Illumination may take the form of any of a large variety of wavelengths or ranges of wavelengths of electromagnetic energy. For instance, illumination may include electromagnetic energy of wavelengths in an optical range or portion of the electromagnetic spectrum including wavelengths in a human-visible range or portion (e.g., approximately 390 nm-750 nm) and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the medium of Porta with the teachings of Gronau to illuminate an inside of a passenger compartment of a vehicle using an infrared light source because "the 2D imaging device may capture images of the vehicle cabin (e.g. 2D images or images captured approximately 390 nm-750 nm and/or wavelengths in the near-infrared (NIR) (e.g., approximately 750 nm-1400 nm) or infrared (e.g., approximately 750 nm-1 mm) portions and/or the near-ultraviolet (NUV)), for example from different angels, and generate visual images of the cabin (e.g. 2D video images)" [Gronau 0139] for "detecting occupancy state, including identifying objects, such as hidden objects in a vehicle cabin and in some cases providing information on the identified objects" [Gronau 0153]. Porta does not teach improve an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor; and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor. Mandal, in the same field of endeavor of object detection in a video stream, teaches improve an image quality of only the identified moving objects in the stream based on a machine-learning algorithm using the processor ([Abstract] In this paper we present a novel deep learning framework to perform online moving object recognition (MOR) in streaming videos. [3.2. Motion Saliency Estimation Network] Since AsFeat contains contrasting features i.e. background maps and current frame features. It provides crucial encoding to construct coarse motion saliency maps. We then extract base features from AsFeat for higher level feature abstractions. The base features are extracted from three layers (C3, C4, C5) of ResNet residual stage as in [46]. Moreover, in order to delineate semantically accurate shape representation for object categorization, we propose to incorporate certain reinforcements by parallelly extracting ResNet features for the current frame as well. The feature maps at same scales are combined for both temporal and spatial saliency aware feature representation. The MoSENet feature response is computed using Eq. (5). [pg. 2738] MotionRec takes two tensors of shape 608x608xT (past temporal history) and 608x608x3 (current frame) as input and returns the spatial coordinates with class labels for moving object instances); and improving the image quality of only the static objects in the video stream based on an image stacking algorithm using the processor ([3.2. Motion Saliency Estimation Network] In MoSENet block, we assimilate estimated background with a temporal median (MT) and current frame (I). The pixel-wise temporal median of recent observations fortifies the background confidence by supplementing TDR response with statistical estimates. This enhances robustness of background model for different real-world scenarios such a dynamic background changes, bad weather, shadows, etc. These assimilated features are computed using Eq. (4)). Therefore, it would have been obvious to a person of ordinary skill in the art at the time that the invention was made to modify the medium of Porta with the teachings of Mandal to improve the image quality of only the identified moving objects using a machine learning algorithm and improve the image quality of only the static objects using an image stacking algorithm because "Discriminating moving objects from static objects and background in videos is an essential task for many computer vision applications. MOR has widespread applications in intelligent visual surveillance, intrusion detection, anomaly detection and monitoring, industrial sites monitoring, detection-based tracking, autonomous vehicles, etc. However, numerous real-world scenarios such as dynamic background changes, illumination variations, shadows, challenging environmental conditions such as rainfall, haze, etc. make recognition of relevant moving objects a challenging task" [Mandal pg. 2734].

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jacqueline R Zak whose telephone number is (571) 272-4077. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, please use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Emily Terrell, can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JACQUELINE R ZAK/
Examiner, Art Unit 2666

/EMILY C TERRELL/
Supervisory Patent Examiner, Art Unit 2666

Prosecution Timeline

Dec 15, 2022: Application Filed
Mar 25, 2025: Non-Final Rejection (§103)
Jul 24, 2025: Applicant Interview (Telephonic)
Jul 24, 2025: Response Filed
Jul 24, 2025: Examiner Interview Summary
Aug 29, 2025: Final Rejection (§103)
Nov 03, 2025: Response after Non-Final Action
Dec 02, 2025: Request for Continued Examination
Dec 16, 2025: Response after Non-Final Action
Jan 28, 2026: Non-Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586340: PIXEL PERSPECTIVE ESTIMATION AND REFINEMENT IN AN IMAGE
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12462343: MEDICAL DIAGNOSTIC APPARATUS AND METHOD FOR EVALUATION OF PATHOLOGICAL CONDITIONS USING 3D OPTICAL COHERENCE TOMOGRAPHY DATA AND IMAGES
Granted Nov 04, 2025 (2y 5m to grant)
Patent 12373946: ASSAY READING METHOD
Granted Jul 29, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 3 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 67%
With Interview: 55% (-11.4%)
Median Time to Grant: 2y 10m
PTA Risk: High
Based on 12 resolved cases by this examiner. Grant probability derived from career allow rate.
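The derivation the footnote describes appears to be simple arithmetic on the career data shown above. A minimal sketch follows, under two assumptions not stated by the dashboard: figures are rounded to whole percents only for display, and the interview figure is the allow rate plus the flat -11.4 point lift.

```python
# Illustrative arithmetic only: reproducing the displayed figures from the
# career data above (8 granted of 12 resolved; -11.4 point interview lift).
granted, resolved = 8, 12
allow_rate = granted / resolved                      # 0.6667 -> shown as "67%"
interview_lift = -11.4                               # percentage points
with_interview = allow_rate * 100 + interview_lift   # 66.67 - 11.4 = 55.27
# Note: the unrounded rate must be used; 67 - 11.4 = 55.6 would round to 56%.
print(f"{allow_rate:.0%}", f"{with_interview:.0f}%")  # -> 67% 55%
```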
