Prosecution Insights
Last updated: April 19, 2026
Application No. 18/035,993

MULTI-VIEW MEDICAL ACTIVITY RECOGNITION SYSTEMS AND METHODS

Non-Final OA — §101, §103
Filed: May 09, 2023
Examiner: BROUGHTON, KATHLEEN M
Art Unit: 2661
Tech Center: 2600 — Communications
Assignee: Intuitive Surgical Operations, Inc.
OA Round: 3 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 7m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 83% (219 granted / 263 resolved; +21.3% vs TC avg — above average)
Interview Lift: +8.3% (moderate lift; resolved cases with interview vs. without)
Typical Timeline: 2y 7m average prosecution; 34 applications currently pending
Career History: 297 total applications across all art units

Statute-Specific Performance

§101: 10.9% (-29.1% vs TC avg)
§103: 51.2% (+11.2% vs TC avg)
§102: 24.1% (-15.9% vs TC avg)
§112: 11.4% (-28.6% vs TC avg)
Tech Center averages are estimates. Based on career data from 263 resolved cases.

Office Action

§101, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 28, 2026 has been entered.

Response to Amendment

Receipt is acknowledged of claim amendments with associated arguments/remarks, received January 28, 2026. Claims 1, 3-11, and 13-19 are pending, of which claims 1, 3, 4, 11, 13, 14, and 19 were amended. Claims 2, 12, and 20-26 were cancelled.

Response to Arguments

Applicant's arguments, see Remarks, pg. 10, filed 01/28/2026, with respect to the rejection of claims 1-20 under 35 U.S.C. § 101 have been fully considered but are not persuasive. The applicant disagrees with, and traverses, the examiner's analysis, arguing that the limitations are "significantly more" than an abstract idea. Applicant has amended the independent claims to include a limitation to "fuse" the first sensor image with the second sensor image, as well as to "automate" a task based on the scene and procedural step identified in the images. However, as the examiner has noted in previous Office actions (most recently 10/28/2025) and interviews (most recently 12/16/2025), the claims are recited at a high level of generality: the claims recite a "sensor" rather than a specific type of sensor, so the recited sensors are not distinguishable from a means of capturing images in the manner of the eyes, and the recited processing does not go beyond how the mind processes visual information received from the eyes, recognizing that the eyes are equivalent to a stereo camera taking in visual data simultaneously. Furthermore, as discussed under MPEP § 2106.05(a), a generically recited limitation of a generic task being automated is not considered an improvement to the functioning of a computer or to any other technological field when the specific automation tasks are not integrated into the claim language. The applicant has not explained how the claim language overcomes the rejection. Respectfully, the examiner is not persuaded and the rejection is maintained.

Applicant's arguments, see pgs. 10-13, filed January 28, 2026, with respect to the rejections of claims 1, 3-11, and 13-19 under 35 U.S.C. §§ 102 and 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made under Wu et al (US 2021/0158937) in view of Jarc et al (US 2019/0090969).

Information Disclosure Statement

The information disclosure statement (IDS) submitted on January 30, 2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-11, and 13-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1 recites a system (the mind is a system) comprising: a memory storing instructions (the mind can store instructions); and a processor communicatively coupled to the memory and configured to execute the instructions (the mind can process information, including instructions) to: access a plurality of data streams representing imagery of a scene including a medical setting of a medical session captured by a plurality of sensors from a plurality of viewpoints (the mind can process information captured through sensory organs, such as the eyes (plural sensors and data streams from different viewpoints), during a medical session or procedure in a medical setting), the plurality of sensors including a dynamic sensor capturing the imagery including the medical setting from a dynamic viewpoint that changes during the medical session (the eyes are dynamic, adjusting to the surroundings, including during a medical session or procedure in a medical setting); temporally align the plurality of data streams (the mind aligns image sensory information to gain a single perspective); determine, based on fused data generated by a viewpoint agnostic machine learning model and based on the plurality of data streams, an activity within the scene (the eyes collect information indiscriminately and the mind collectively (fuses) processes the viewed images of a scene, including during a medical session or procedure), the activity comprising a predefined phase of the medical session (the eyes collect information during any and each phase (described as preoperative, operative, postoperative; specification ¶ [0060]) of a medical procedure); and automating, based on the determining the activity, a task associated with the medical session (generic automation is not considered a significant improvement, see MPEP § 2106.05(a); the mind can automatically recognize a next task associated with a familiar procedure when identifying a current task).

Claim 3 recites the system of claim 1 (as described above), wherein: the plurality of data streams comprises a first data stream and a second data stream (the left eye and the right eye each create a visual data stream of a scene); the machine learning model is further configured to: determine, based on the first data stream, a first classification of the activity within the scene, and determine, based on the second data stream, a second classification of the activity within the scene (the left eye and the right eye each create a visual data stream of a scene with visual perspectives unique to each eye, which the mind can recognize as unique to each eye); and the generating the fused data comprises combining the first classification and the second classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene).
Claim 4 recites the system of claim 1 (as described above), wherein: the plurality of data streams comprises a first data stream and a second data stream (the left eye and the right eye each create a visual data stream of a scene); and the generating the fused data (the mind can collectively analyze the visual information captured from the left eye and the right eye) comprises: determining, based on the first data stream and the second data stream, a global classification of the activity within the scene (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene), determining, based on the first data stream and the global classification, a first classification of the activity within the scene (the mind identifies visual information captured from the left eye representing a scene), determining, based on the second data stream and the global classification, a second classification of the activity within the scene (the mind identifies visual information captured from the right eye representing a scene), and combining the first classification, the second classification, and the global classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene and put more emphasis on information from one eye depending on the activity viewed from each eye).

Claim 5 recites the system of claim 4 (as described above), wherein the determining the global classification comprises combining, for points in time, respective temporally aligned data from the first data stream and the second data stream corresponding to the points in time using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene and put more emphasis on information from one eye depending on the activity viewed from each eye).

Claim 6 recites the system of claim 4 (as described above), wherein the determining the global classification (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene) comprises: extracting first features from the data of the first data stream (the mind identifies visual information captured from the left eye representing a scene); extracting second features from the data of the second data stream (the mind identifies visual information captured from the right eye representing a scene); and combining the first features and the second features using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (the mind can collectively analyze the visual information captured from the left eye and the right eye to determine a scene and put more emphasis on information from one eye depending on the activity viewed from each eye).

Claim 7 recites the system of claim 1 (as described above), wherein the determining the activity within the scene is performed during the activity within the scene (the mind processes visual information captured in real time as it occurs).
Claim 8 recites the system of claim 1 (as described above), wherein the plurality of data streams further comprises a data stream representing data captured by a non-imaging sensor (the mind processes sensory information other than visual information, such as hearing and noise).

Claim 9 recites the system of claim 1 (as described above), wherein the viewpoint agnostic model is agnostic to a number of the plurality of sensors (the mind indiscriminately processes visual information captured from each eye).

Claim 10 recites the system of claim 1 (as described above), wherein the viewpoint agnostic model is agnostic to positions of the plurality of sensors (the mind indiscriminately processes visual information captured from each eye).

Claims 11 and 13-18 recite a method (the mind processes information) with claim limitations identical to claims 1 and 3-8 (as described above). Claim 19 recites a non-transitory computer-readable medium storing instructions executable by a processor (the mind can store and process information, including broadly claimed processing steps (instructions) to view and interpret information in a field of view) with claim limitations identical to claim 1 (as described above).

The limitations of capturing image data of a medical procedure with multiple sensors and aligning the images with a machine learning model are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. That is, regarding the method, other than reciting generic placeholder computer components, such as a memory, a processor, or instructions with a generically recited machine learning model, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, in claim 1, language to "access" is a broad means of acquiring the image data; "align" is to align two images of a scene; and "determine" is to identify activity in the aligned image data. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, these claims each recite an abstract idea.

This judicial exception is not integrated into a practical application. In particular, the method claims do not recite any elements which could not be performed in the mind, and the claims only recite generic placeholder computer components, including a memory, a processor, or instructions with a generically recited machine learning model. The computer components are recited at a high level of generality (i.e., generic machine learning for performing the general function of detecting activity in a scene based on aligned images) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the computer components do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the aforementioned claims are directed to abstract ideas. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of generic placeholder computer components (a memory, processor, or instructions with a generically recited machine learning model) used to align images and detect activity in the scene amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claims are not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7-11, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (US 2021/0158937) in view of Jarc et al (US 2019/0090969, cited in Final Rejection – 10/28/2025).
Regarding Claim 1, Wu et al teach a system (sensing device 300 (sensing device 104) of system 100; Fig 1, 3 and ¶ [0045]) comprising: a memory storing instructions (functional unit 304 of sensor 302 includes memory 316; Fig 3 and ¶ [0050]); and a processor communicatively coupled to the memory and configured to execute the instructions (processing unit 310 configured to analyze image data (execute instructions), which is facilitated by computation unit 312; Fig 3 and ¶ [0052]-[0053]) to: access a plurality of data streams representing imagery of a scene including a medical setting of a medical session captured by a plurality of sensors from a plurality of viewpoints (a plurality of camera image sensing devices 104 capture data streams of images representing a medical scene during a given procedure with a patient; Fig 1-3 and ¶ [0029], [0043]-[0045], [0055]), the plurality of sensors including a dynamic sensor capturing the imagery including the medical setting from a dynamic viewpoint that changes during the medical session (the sensor 302 (based on the functional unit) may change the operation of the sensor, such as the FOV (thereby dynamic), by instruction to manipulate the direction or orientation of the sensor while imaging the scene; Fig 1-3 and ¶ [0029], [0048], [0060]); temporally align the plurality of data streams (the multiple images are aligned for the patient by the processing device; Fig 1-3 and ¶ [0031]-[0032]); determine, based on fused data generated by using a viewpoint agnostic machine learning model and based on the plurality of data streams, an activity within the scene (a deep neural network (machine learning) is used for pattern matching decisions based on the image data; Fig 1-3 and ¶ [0023], [0037], [0053]), the activity comprising a predefined phase of the medical session (the neural network is used for visual feature/pattern recognition for constructing the patient shape associated with the given scene in the medical environment for the given services; Fig 1-3 and ¶ [0023], [0037], [0053]); and automating, based on the determining the activity, a task associated with the medical session (based on the patient determination, personalized healthcare services may be determined and implemented by the communicably coupled medical instruments (102) to adjust the given medical device to accommodate the patient through automation (¶ [0041], [0044]); Fig 1-3 and ¶ [0038]-[0039]).

Wu et al does not explicitly teach to temporally align the plurality of data streams. Jarc et al is analogous art pertinent to the technological problem addressed in the current application and teaches to temporally align the plurality of data streams (the video data is matched (stereo imaging) through temporal synchronization to identify the surgery from multiple perspectives; ¶ [0072], [0116]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the current application to combine the teachings of Wu et al with Jarc et al, including to temporally align the plurality of data streams. By matching the video segments from multiple perspectives, the virtual surgical environment is modeled with the given temporal and spatial metrics to allow for surgical stage-specific procedural tasks, which may then be used in the training and development of surgical performance, as recognized by Jarc et al (¶ [0012]).
Regarding Claim 7, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), wherein the determining the activity within the scene is performed during the activity within the scene (Wu et al, the images are taken and analyzed in real time during the medical procedure; ¶ [0032]).

Regarding Claim 8, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), wherein the plurality of data streams further comprises a data stream representing data captured by a non-imaging sensor (Wu et al, sensor 104 data may include non-image (camera) sensor data, such as motion sensor or radar; ¶ [0018]).

Regarding Claim 9, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), wherein the viewpoint agnostic model is agnostic to a number of the plurality of sensors (Wu et al, the deep learning model utilizes the plurality of data stream data to model the patient and the scene; ¶ [0029], [0052]-[0053]).

Regarding Claim 10, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), wherein the viewpoint agnostic model is agnostic to positions of the plurality of sensors (Wu et al, the sensor data selected is to analyze a plurality of features representative of the patient and environment, with the neural networks designed to detect a set of keypoints representing the visual feature or pattern (thereby agnostic to the position of the sensor); ¶ [0052]-[0054]).

Regarding Claim 11, Wu et al teach a method (method of using the sensing device 300 (104) of the medical system 100; Fig 1-5 and ¶ [0066]-[0072]) comprising elements identical to claim 1 (as described above).

Regarding Claim 17, Wu et al in view of Jarc et al teach the method of claim 11 (as described above), further teaching elements identical to claim 7 (as described above).

Regarding Claim 18, Wu et al in view of Jarc et al teach the method of claim 11 (as described above), further teaching elements identical to claim 8 (as described above).

Regarding Claim 19, Wu et al teach a non-transitory computer-readable medium storing instructions executable by a processor (sensing device 300 (sensing device 104) of system 100 includes functional unit 304 with memory 316 storing instructions and processing unit 310 configured to analyze image data (execute instructions), which is facilitated by computation unit 312; Fig 1, 3 and ¶ [0045], [0050]-[0053]) to perform elements identical to claim 1 (as described above).

Claims 3-6 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al (US 2021/0158937) in view of Jarc et al (US 2019/0090969) and Wells et al (US 2015/0294143, cited in Final Rejection – 10/28/2025).
Regarding Claim 3, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), including: the plurality of data streams comprises a first data stream and a second data stream (Wu et al, the plural sensors 302 each generate image data streams for analysis and classification by the deep neural network; ¶ [0051]-[0053]); the plurality of data streams comprises a first data stream and a second data stream (Jarc et al, one or more imaging sensors 528 are located on instrument carriage 530, connected to manipulator arm 512 that allows for surgical instrument motion, and collectively capture stereoscopic video images of the surgical site (captured of the surgical suite) and the interior of the patient's body cavity, captured during a surgical procedure; Fig 5B, 7, 8 and ¶ [0058], [0061], [0065]-[0067]); the machine learning model is further configured to: determine, based on the first data stream, a first classification of the activity within the scene (Jarc et al, the task assessor 1108 may perform one or more classification operations (using a neural network) for tasks (activities within the segmented surgical procedure), which can be based on a first instrument (thereby first stream and first classification) and used for assessment criteria 1406; ¶ [0094]-[0095], [0117]), and determine, based on the second data stream, a second classification of the activity within the scene (Jarc et al, the task assessor 1108 may perform one or more classification operations (using a neural network) for tasks (activities within the segmented surgical procedure), which can be based on a second instrument (thereby second stream and second classification) and used for assessment criteria 1406; ¶ [0094]-[0095], [0117]).

It would have been obvious to one of ordinary skill in the art to combine the teachings of Wu et al with Jarc et al, including the plurality of data streams comprises a first data stream and a second data stream; and the machine learning model is further configured to: determine, based on the first data stream, a first classification of the activity within the scene, and determine, based on the second data stream, a second classification of the activity within the scene. By matching the video segments from multiple perspectives, the virtual surgical environment is modeled with the given temporal and spatial metrics to allow for identification and classification of stage-specific surgical procedural tasks, which may then be used in the training and development of surgical performance, as recognized by Jarc et al (¶ [0012]).

Wu et al in view of Jarc et al does not explicitly teach generating the fused data comprises combining the first classification and the second classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene. Wells et al is analogous art pertinent to the technological problem addressed in this application and teaches generating the fused data comprises combining the first classification and the second classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (a mapping module 50 is used to map the images 80, 82, 84 to a synergy map with a common plane and axis, where the synergy map 88 identifies the axis of the object body in each image, which is then weighted based on an integrity score to determine a common plane and generate the synergy map 88, including considerations of the activity of the object; Fig 6 and ¶ [0062]-[0065]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Jarc et al with Wells et al, including generating the fused data comprises combining the first classification and the second classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene. By applying a weight to each of the identified objects in each of the images when mapping the object to a common plane, the mapped object may be improved by considering the clarity of the original image, as recognized by Wells et al (¶ [0022], [0064]-[0065]).

Regarding Claim 4, Wu et al in view of Jarc et al teach the system of claim 1 (as described above), wherein: the plurality of data streams comprises a first data stream and a second data stream (Wu et al, the plural sensors 302 each generate image data streams for analysis and classification by the deep neural network; ¶ [0051]-[0053]). Wu et al in view of Jarc et al does not explicitly teach determining, based on the first data stream and the second data stream, a global classification of the activity within the scene, determining, based on the first data stream and the global classification, a first classification of the activity within the scene, determining, based on the second data stream and the global classification, a second classification of the activity within the scene, and combining the first classification, the second classification, and the global classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene.

Wells et al is analogous art pertinent to the technological problem addressed in this application and teaches determining, based on the first data stream and the second data stream, a global classification of the activity within the scene (module 122 takes each video feed and detects moving humans (a global classification of detecting moving humans); Fig 4, 8 and ¶ [0081]), determining, based on the first data stream and the global classification, a first classification of the activity within the scene (classifiers are used to determine actions, including a first action, such as a human standing, which may be based on the first image; Fig 4, 8 and ¶ [0081]-[0082]), determining, based on the second data stream and the global classification, a second classification of the activity within the scene (classifiers are used to determine actions, including a second action, such as a human squatting, which may be based on the second image; Fig 4, 8 and ¶ [0081]-[0082]), and combining the first classification, the second classification, and the global classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (projections of the position of the person from the different viewpoints are measured with a weighted integrity score to determine the detected human form and used with different weighted coefficients for data types for detection accuracy; ¶ [0064]-[0065], [0070]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of this application to combine the teachings of Wu et al in view of Jarc et al with Wells et al, including generating fused data comprises: determining, based on the first data stream and the second data stream, a global classification of the activity within the scene, determining, based on the first data stream and the global classification, a first classification of the activity within the scene, determining, based on the second data stream and the global classification, a second classification of the activity within the scene, and combining the first classification, the second classification, and the global classification using a weighting determined based on the first data stream, the second data stream, and the activity within the scene. By applying a weight to each of the identified objects in each of the images when mapping the object to a common plane, the mapped object may be improved by considering the clarity of the original image, as recognized by Wells et al (¶ [0022], [0064]-[0065]).

Regarding Claim 5, Wu et al in view of Jarc et al and Wells et al teach the system of claim 4 (as described above), wherein the determining the global classification comprises combining, for points in time, respective temporally aligned data from the first data stream and the second data stream corresponding to the points in time using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (Wells et al, a mapping module 50 is used to map the images 80, 82, 84 to a synergy map with a common plane and axis, where the synergy map 88 identifies the axis of the object body in each image, which is then weighted based on an integrity score to determine a common plane and generate the synergy map 88, including considerations of the activity of the object, which includes a temporal classifier to determine action and analyze the activity in the scene; Fig 6 and ¶ [0062]-[0065], [0070], [0081]-[0082]).

Regarding Claim 6, Wu et al in view of Jarc et al and Wells et al teach the system of claim 4 (as described above), wherein the determining the global classification (Wells et al, module 122 takes each video feed and detects moving humans (a global classification of detecting moving humans); Fig 4, 8 and ¶ [0081]) comprises: extracting first features from the data of the first data stream (Wells et al, classifiers are used to determine actions, including a first action, such as a human standing, which may be based on the first image and extracting detected features; Fig 4, 8 and ¶ [0081]-[0084]); extracting second features from the data of the second data stream (Wells et al, classifiers are used to determine actions, including a second action, such as a human squatting, which may be based on the second image and extracting detected features; Fig 4, 8 and ¶ [0081]-[0084]); and combining the first features and the second features using a weighting determined based on the first data stream, the second data stream, and the activity within the scene (Wells et al, projections of the position of the person from the different viewpoints (based on the detected features) are measured with a weighted integrity score to determine the detected human form and used with different weighted coefficients for data types for detection accuracy; ¶ [0064]-[0065], [0070], [0081]-[0084]).
Regarding Claim 13, Wu et al in view of Jarc et al teach the method of claim 11 (as described above), further teaching elements identical to claim 3 (as described above).

Regarding Claim 14, Wu et al in view of Jarc et al teach the method of claim 11 (as described above), further teaching elements identical to claim 4 (as described above).

Regarding Claim 15, Wu et al in view of Jarc et al and Wells et al teach the method of claim 14 (as described above), further teaching elements identical to claim 5 (as described above).

Regarding Claim 16, Wu et al in view of Jarc et al and Wells et al teach the method of claim 14 (as described above), further teaching elements identical to claim 6 (as described above).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Tu et al (US 2014/0071287, cited in Final Rejection – 10/28/2025) teach the use of multiple cameras to capture images of a scene from multiple perspectives and the use of the plurality of images to determine analytics of the scene. Naikal et al (US 2014/0333775, cited in Final Rejection – 10/28/2025) teach a method and system to identify an object and event in a scene based on a plurality of image data from multiple cameras, which is fused to create the scene without occlusions.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATHLEEN M BROUGHTON, whose telephone number is (571) 270-7380. The examiner can normally be reached Monday-Friday, 8:00-5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, John Villecco, can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/KATHLEEN M BROUGHTON/
Primary Examiner, Art Unit 2661
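For readers mapping the claim language above to an implementation, the weighted multi-view fusion recited in claims 3-6 (per-stream classifications combined with a global classification using a weighting dependent on the data streams and the activity) can be pictured with the short sketch below. This is only an illustrative reconstruction under assumed inputs: it is not the applicant's disclosed model or the method of any cited reference, and every function name, variable, and weight value here is a hypothetical placeholder. The three predefined phases come from the specification as characterized in the Office Action (¶ [0060]).

    # Illustrative sketch only: one plausible way per-stream activity
    # classifications could be fused with a global classification using a
    # weighting, as the claim language describes. All names, shapes, and
    # weights are assumptions, not the applicant's implementation.
    import numpy as np

    PHASES = ["preoperative", "operative", "postoperative"]  # predefined phases per spec ¶ [0060]

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def classify(stream_features, w, b):
        """Hypothetical per-stream classifier: features -> phase probabilities."""
        return softmax(stream_features @ w + b)

    def fuse(classifications, weights):
        """Combine per-stream and global classifications with a weighting."""
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        return sum(wt * p for wt, p in zip(weights, classifications))

    rng = np.random.default_rng(0)
    w, b = rng.normal(size=(8, len(PHASES))), np.zeros(len(PHASES))

    # Two temporally aligned streams (e.g., features from frames sampled at
    # the same timestamp from two viewpoints) -- placeholder values.
    stream_a = rng.normal(size=8)
    stream_b = rng.normal(size=8)

    p_a = classify(stream_a, w, b)                         # first classification
    p_b = classify(stream_b, w, b)                         # second classification
    p_global = classify((stream_a + stream_b) / 2, w, b)   # global classification

    # Per the claims, the weighting would depend on the streams and the
    # activity; fixed values are used here purely for illustration.
    fused = fuse([p_a, p_b, p_global], weights=[0.4, 0.4, 0.2])
    print("Predicted phase:", PHASES[int(np.argmax(fused))])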

Prosecution Timeline

May 09, 2023 — Application Filed
Jun 21, 2025 — Non-Final Rejection (§101, §103)
Sep 18, 2025 — Examiner Interview Summary
Sep 18, 2025 — Applicant Interview (Telephonic)
Sep 25, 2025 — Response Filed
Oct 26, 2025 — Final Rejection (§101, §103)
Dec 11, 2025 — Applicant Interview (Telephonic)
Dec 11, 2025 — Examiner Interview Summary
Jan 28, 2026 — Request for Continued Examination
Feb 09, 2026 — Response after Non-Final Action
Feb 20, 2026 — Non-Final Rejection (§101, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602915 — FEATURE FUSION FOR NEAR FIELD AND FAR FIELD IMAGES FOR VEHICLE APPLICATIONS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597233 — SYSTEM AND METHOD FOR TRAINING A MACHINE LEARNING MODEL (granted Apr 07, 2026; 2y 5m to grant)
Patent 12586203 — IMAGE CUTTING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM (granted Mar 24, 2026; 2y 5m to grant)
Patent 12567227 — METHOD AND SYSTEM FOR UNSUPERVISED DEEP REPRESENTATION LEARNING BASED ON IMAGE TRANSLATION (granted Mar 03, 2026; 2y 5m to grant)
Patent 12565240 — METHOD AND SYSTEM FOR GRAPH NEURAL NETWORK BASED PEDESTRIAN ACTION PREDICTION IN AUTONOMOUS DRIVING SYSTEMS (granted Mar 03, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 83%
With Interview: 92% (+8.3%)
Median Time to Grant: 2y 7m
PTA Risk: High
Based on 263 resolved cases by this examiner. Grant probability derived from career allow rate.
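As a rough check of how these figures relate, assuming the grant probability is simply the career allow rate (219 of 263 resolved cases) and the interview lift is added as percentage points — an assumption about this dashboard's methodology, not something it documents — the displayed numbers reconcile as follows:

    # Hedged sketch: reconstructs the displayed figures from the stated counts.
    # Assumes grant probability = career allow rate and that the interview
    # lift is added as percentage points; neither assumption is confirmed here.
    granted, resolved = 219, 263          # career counts stated above
    allow_rate = granted / resolved       # 0.8327 -> displayed as 83%
    interview_lift = 0.083                # +8.3% interview lift stated above

    with_interview = allow_rate + interview_lift

    print(f"Career allow rate: {allow_rate:.1%}")      # 83.3%
    print(f"With interview:    {with_interview:.1%}")  # 91.6%, consistent with the displayed 92%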
