Prosecution Insights
Last updated: April 19, 2026
Application No. 18/592,922

SYSTEMS AND METHODS FOR KEYPOINT DETECTION AND TRACKING-BY-PREDICTION FOR MULTIPLE SURGICAL INSTRUMENTS

Final Rejection — §102, §103, §112
Filed
Mar 01, 2024
Examiner
MALDONADO, STEVEN
Art Unit
3797
Tech Center
3700 — Mechanical Engineering & Manufacturing
Assignee
Intuitive Surgical Operations, Inc.
OA Round
2 (Final)
Grant Probability: 30% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
Grant Probability With Interview: 84%

Examiner Intelligence

Career Allow Rate: 30% (6 granted / 20 resolved; -40.0% vs. TC average)
Interview Lift: +54.2% (grant rate among resolved cases with an interview vs. without)
Typical Timeline: 3y 0m average prosecution (51 applications currently pending)
Career History: 71 total applications across all art units

Statute-Specific Performance

§101: 8.4% (-31.6% vs. TC avg)
§103: 49.1% (+9.1% vs. TC avg)
§102: 15.9% (-24.1% vs. TC avg)
§112: 25.8% (-14.2% vs. TC avg)
Tech Center averages are estimates. Based on career data from 20 resolved cases.

Office Action

§102, §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Claim Interpretation

Claims 5, 17, and 25 each recite the limitation "if a determination is made that the surgical instrument will exceed the range of motion." This may be construed as a conditional limitation that is not given full patentable weight in light of the decisions discussed below: in the scenario where the "if" condition does not occur, the claim does not require the limitation as a positive recitation. Claims 6, 18, and 26 each recite the limitation "if a determination is made that the surgical instrument will collide with the second surgical instrument," which may be construed as a conditional limitation in the same way and for the same reason.

In the recent Ex parte Gopalan decision, the PTAB addressed a claim in which all of the features were recited in a conditional manner. A first step of "identifying … an outlier" was performed if "traffic is outside of a prediction interval." A second step of "identifying" was performed "only when a count of outliers … is greater than or equal to two, and exceeds an anomaly threshold." These were the only two elements of the independent claim; thus, if the traffic is never outside Gopalan's prediction interval, the steps of the method are never performed. The PTAB, however, distinguished Schulhauser and noted that this construction "would render the entire claim meaningless." Gopalan at p. 5. The Board went on to state: "Although each of these steps is conditional, they are integrated into one method or path and do not cause the claim to diverge into two methods or paths, as in Schulhauser. Thus, we conclude that the broadest reasonable interpretation of claim 1 requires the performance of both steps…" Id. at p. 6.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-7, 9-13, 16-19, 21-22, and 25-26 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claims 1, 13, and 21 each recite the limitation "based on the second output image, receiving instructions to move the surgical instrument," which renders the claims unclear: it is unclear what structure receives the instructions.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 7, 9, 12-13, 16-17, 19, 21-22, and 25 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Muntion et al. (US20240206989A1; hereinafter "Muntion").

Regarding Claim 1, Muntion discloses a method for detecting a location of a plurality of keypoints of a surgical instrument ("The instrument network 401 can be trained to identify groups of keypoints associated with surgical instruments." [0078]), the method comprising:

receiving, at a first neural network model, a video input including a plurality of video frame images of a surgical procedure ("At block 202, the system 100 can access input data including, for example, video data, spatial data, and/or sensor data temporally associated with a video stream of a surgical procedure. At block 204, the one or more machine-learning models 702 can predict a state of the surgical procedure based on the input data." [0068]);

generating, using the first neural network model, a first output image including a first output location of each keypoint of the plurality of keypoints annotated on a first output image of the surgical instrument ("At block 206, the one or more machine-learning models 702 can detect one or more surgical instruments at least partially depicted in the video stream based on the input data, when such features are present. Detection of surgical instruments can include determining a presence or localization of one or more surgical instruments. The localization can include, for example, a bounding box, a medial axis, and/or any other marker or keypoint identifying the location of one or more surgical instruments… At block 208, a state indicator and/or one or more surgical instrument indicators temporally correlated with the video stream can be output." [0068]);

receiving, at a second neural network model, the first output image generated by the first neural network model ("At block 208, a state indicator and/or one or more surgical instrument indicators temporally correlated with the video stream can be output." [0068]; "At block 210, based on the state as predicted and tracked instrument information of the one or more surgical instruments, the system 100 can output a notification, a request, a visualization, and/or a scheduling update depending on how the state and tracked instrument information are used… The visualization can include displaying one or more motion profiles of surgical instruments that can include tracking observed motion and/or a predicted path of motion." [0069]; "FIG. 12 depicts an example of a process 1200 of using machine learning to generate motion profiles 1102 of surgical instruments 1008 according to one or more examples." [0128]; block 208 outputs the state indicator from the first machine-learning model [0068], which block 210 uses in another machine-learning model to create a motion profile and visualize it to the user [0069]);

receiving, at the second neural network model, historic keypoint trajectory data including a historic trajectory for each keypoint of the plurality of keypoints ([0069], quoted above; "For the frames, e.g., including images 302, between two successive input windows 320, the locations of the anatomical structures and surgical instruments can be predicted based on the locations predicted in the most recent input window 320. For example, a movement vector of the surgical instrument can be computed based on the changes in the location of the surgical instrument in the frames in the prior input window 320. The movement vector can be computed using a machine learning model, such as a deep neural network. The movement vector is used to predict the location of the surgical instrument in the subsequent frames after the input window 320, until a next input window 320 is analyzed." [0097]);

determining, using the second neural network model, a trajectory for each keypoint of the plurality of keypoints ([0069] and [0097], quoted above; "'Keypoints' of surgical instruments can be defined within a bounded region between a starting keypoint and an end, such as a tip of surgical instrument. Keypoints may also represent pivot points, such as joints, shaft starting and end points, and other such features depending upon the physical structure and function of each surgical instrument." [0086]);

wherein the trajectory for each keypoint is based on the first output image generated by the first neural network model and the historic keypoint trajectory data for each keypoint ("Temporal information that is provided by state information can be used to refine confidence of the instrument prediction or anatomy prediction in one or more aspects. In one or more aspects, the temporal information can be fused with a feature space, and the resulting fused information can be used by a decoder to output instrument localization and/or anatomical localization, for example. Other visual or temporal cues may also be fused" [0081]; [0097], quoted above);

generating, using the second neural network model, a second output image including a second output location of each keypoint of the plurality of keypoints annotated on a second output image of the surgical instrument ([0069], quoted above; "The location of structure(s) predicted by the one or more machine-learning models 702 can also be predicted in the frames between two successive input windows 320 in the same manner. Graphical overlays that are used to overlay the images 302 to represent predicted features (e.g., surgical instruments, anatomical structures, etc.) are accordingly adjusted, if required, based on the predicted locations." [0098]); and

based on the second output image, receiving instructions to move the surgical instrument ("the output generator is configured to provide user feedback based on detecting that at least one of the one or more surgical instruments has veered off of a predetermined path by more than a predetermined threshold." [0015]).

Regarding Claim 2, Muntion discloses that the second output location of each keypoint of the plurality of keypoints identifies a location of a corresponding landmark of the surgical instrument ([0069], [0086], and [0098], quoted above).

Regarding Claim 3, Muntion discloses that generating the second output image includes generating, using the second neural network model, a second output image corresponding to each video frame image of the plurality of video frame images received by the first neural network model ("In yet more examples, the motion profiles 1102 can be used to provide real time guidance during surgery." [0133]).
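The claim 1 mapping above pairs a detector (Muntion's blocks 206/208) with a trajectory model (block 210). For readers following the architecture rather than the citations, below is a minimal sketch of the claimed two-stage data flow. All names, shapes, and the constant-velocity step are hypothetical illustration, not the application's or Muntion's actual implementation; the heatmap step mirrors the claim 9 limitation discussed further below.

```python
import numpy as np

def detect_keypoints(frame: np.ndarray, num_keypoints: int = 4) -> np.ndarray:
    """Stage 1, a stand-in for the claimed 'first neural network model':
    emit one heatmap per keypoint and take its argmax as the annotated
    location (cf. the claim 9 heatmap limitation). A real model would be
    a trained CNN; random values stand in for learned inference here."""
    h, w = frame.shape[:2]
    heatmaps = np.random.default_rng(0).random((num_keypoints, h, w))
    return np.array([np.unravel_index(m.argmax(), m.shape) for m in heatmaps])

def predict_next_locations(keypoints: np.ndarray, history: np.ndarray) -> np.ndarray:
    """Stage 2, a stand-in for the 'second neural network model': fuse the
    detector's output with each keypoint's historic trajectory to predict
    the next location; constant-velocity extrapolation replaces the net."""
    velocity = history[:, -1, :] - history[:, -2, :]  # last per-keypoint step
    return keypoints + velocity

# Toy usage: one 64x64 frame, 4 keypoints, 2 historic positions per keypoint.
frame = np.zeros((64, 64, 3))
history = np.stack([np.array([[i, i], [i + 1, i + 2]]) for i in range(4)])
print(predict_next_locations(detect_keypoints(frame), history))
```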
Regarding Claim 4, Muntion further discloses evaluating a performance of the surgical procedure based on the second output images corresponding to each video frame image of the plurality of video frame images ("In yet more examples, the motion profiles 1102 can be used to provide real time guidance during surgery. For example, if the motion of the surgical instrument 1008 is beyond one or more thresholds, a warning can be provided to the operator. For example, a predetermined path of the surgical instrument 1008 can be configured for a surgical procedure. Alternatively, or in addition, the path for a surgical instrument 1008 can be configured per phase. If, during the surgical procedure, the surgical instrument 1008 is detected to veer off of the predetermined path by more than a predetermined threshold, a user feedback can be provided." [0133]).

Regarding Claim 5, Muntion further discloses determining whether the surgical instrument will exceed a range of motion of the surgical instrument based on the determined trajectory of each keypoint of the plurality of keypoints and, if a determination is made that the surgical instrument will exceed the range of motion, generating a warning indicating that the surgical instrument will exceed the range of motion ([0133], quoted above for claim 4).

Regarding Claim 7, Muntion discloses that determining the trajectory for each keypoint of the plurality of keypoints includes matching, using the second neural network model, the first output location for each keypoint of the plurality of keypoints with a corresponding historic trajectory of the historic keypoint trajectory data ("The visualization can include displaying one or more motion profiles of surgical instruments that can include tracking observed motion and/or a predicted path of motion. When observed in real-time, motion profiles can guide usage of surgical instruments during a surgical procedure. When used during post-operative analysis, motion profiles track whether a surgical instrument followed or deviated from a predetermined path." [0069]; "The one or more machine-learning models 702 of FIG. 7 can operate on the surgical data per frame, but can use information from a previous frame, or a window of previous frames." [0070]).

Regarding Claim 9, Muntion discloses that generating the first output image includes generating, using the first neural network model, a heatmap indicating an estimated location of the plurality of keypoints in each video frame image of the plurality of video frame images, and generating the first output image based on the estimated location of the plurality of keypoints ("The one or more machine-learning models can then be used in real-time to process one or more data streams (e.g., video streams, audio streams, RFID data, etc.). The processing can include predicting and characterizing one or more surgical states, instruments, and/or other structures within various instantaneous or block time periods… The localization can be represented as coordinates in images that map to pixels depicting the surgical instrument(s) in the images. Localization of other structures, such as anatomical structures, can be used to provide locations, e.g., coordinates, heatmaps, bounding boxes, boundaries, masks, etc., of one or more anatomical structures identified and distinguish between other structures, such as surgical instruments." [0047]).

Regarding Claim 12, Muntion further discloses displaying the second output image on a display system ("the motion profile can be displayed as an overlay on the video stream." [0012]).

Regarding Claim 13, Muntion discloses a system for detecting a location of a plurality of keypoints of a surgical instrument ([0078], quoted above), the system comprising a memory configured to store a first neural network model and a second neural network model, and a processor coupled to the memory ("a computer program product includes a memory device having computer executable instructions stored thereon, which when executed by one or more processors cause the one or more processors to perform a method." [0018]). The operations the processor is configured to perform parallel the steps of claim 1, and the citations given above for claim 1 ([0015], [0068], [0069], [0081], [0086], [0097], [0098], [0128]) apply to the corresponding limitations.

Regarding Claim 16, Muntion discloses that the processor is further configured to evaluate a performance of the surgical procedure based on the second output image corresponding to each video frame image of the plurality of video frame images ([0133], quoted above for claim 4).

Regarding Claim 17, Muntion discloses that the processor is further configured to determine whether the surgical instrument will exceed a range of motion based on the determined trajectory of each keypoint and, if so, to generate a warning ([0133], quoted above for claim 4).

Regarding Claim 19, Muntion discloses the matching limitation for the reasons given above for claim 7 ([0069], [0070]).

Regarding Claim 21, Muntion discloses a non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations that detect a location of a plurality of keypoints of a surgical instrument ([0078] and [0018], quoted above). The recited operations parallel the steps of claim 1, and the citations given above for claim 1 apply to the corresponding limitations.

Regarding Claim 22, Muntion discloses that the second output location of each keypoint identifies a location of a corresponding landmark of the surgical instrument, for the reasons given above for claim 2 ([0069], [0086], [0098]).

Regarding Claim 25, Muntion discloses the range-of-motion determination and warning, for the reasons given above for claim 5 ([0133]).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 6, 18, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Muntion in view of Bongalon (WO 2020247451 A1).

Regarding Claim 6, Muntion discloses tracking multiple surgical instruments ("The one or more machine-learning models can be trained to perform higher-level predictions and tracking, such as predicting a state of a surgical procedure and tracking one or more surgical instruments used in the surgical procedure, as further described herein." [0059]). Muntion does not specifically disclose determining whether the surgical instrument will collide with a second surgical instrument based on the determined trajectory of each keypoint of the plurality of keypoints and, if a determination is made that the surgical instrument will collide with the second surgical instrument, generating a warning indicating the collision. However, in a similar field of endeavor, Bongalon teaches an operation profile system that collects surgical session data representative of surgical procedure operations performed during a surgical session and accesses operation pattern data representative of multiple historical patterns of surgical procedure operations (Abstract). Bongalon also teaches the collision determination and warning ("Accordingly, such operation pattern data 706-2 may be used to predict when such an event is likely to occur during a current surgical session having a similar operation pattern. Thus, an event may be predicted based on currently tracked surgical session data 702 and/or operation pattern data 706-2. In some examples, the event may be predicted by the computer-assisted surgical system (or by system 400) by comparing surgical session data 702 with operation profile 708-2. For instance, operation profile 708-2 may indicate that, upon the occurrence of a particular trigger event (e.g., fast movement of master controls with large trajectories), a particular response event (e.g., a collision of surgical instruments) is likely to occur." [0103]; "The information associated with a predicted event may be provided by system 400 and/or by the computer-assisted surgical system. In some examples, information associated with a predicted event may include a notification (e.g., a displayed message, a warning tone, etc.) configured to notify a user that the predicted event is likely to occur." [0104]; "For example, supervised machine learning model 1002 may analyze surgical session data 1004 in accordance with one or more decision tree learning algorithms, association rule learning algorithms, artificial neural network learning algorithms, deep learning algorithms, bitmap algorithms, and/or any other suitable data analysis technique as may serve a particular implementation" [0126]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Muntion, as outlined above, with Bongalon's collision prediction and warning, because doing so can help perform surgical procedures more efficiently, more ergonomically, and with better situational awareness (Bongalon [0002]).

Regarding Claims 18 and 26, the corresponding system (claim 18) and machine-readable-medium (claim 26) limitations are rejected over Muntion in view of Bongalon for the same reasons given above for claim 6.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Muntion in view of Zhang (US20230017202A1).

Regarding Claim 10, Muntion discloses all limitations noted above, but does not disclose applying a smoothness filter to the second output image to generate a refined second output location of each keypoint of the plurality of keypoints annotated on the second output image of the surgical instrument. However, in a similar field of endeavor, Zhang teaches systems, methods, and instrumentalities for computer-vision-based surgical workflow recognition (Abstract), including such filtering ("The computing system may perform filtering, for example, on the unfiltered prediction result. Filtering may include noise filtering, for example, such as using predetermined rules (e.g., set by humans or automatically derived over time), a smooth filter (e.g., median filter), and/or the like. Noise filtering may include prior knowledge noise filtering. For example, the unfiltered prediction result may include incorrect predictions. The filtering may remove the incorrect predictions to generate an accurate prediction result, which may include accurate information associated with the video." [0080]). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Muntion with Zhang's smoothness filtering as claimed, because it can generate a more accurate prediction result (Zhang [0080]).
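Claim 10's "smoothness filter" is mapped to Zhang's noise filtering, which names a median filter as one option. As a rough illustration of how a median filter suppresses a spurious per-frame keypoint detection (a sketch only; Zhang does not specify a filter design beyond the passage quoted above):

```python
from statistics import median

def smooth_track(xs: list[float], window: int = 3) -> list[float]:
    """Median-filter one keypoint coordinate across frames; edge frames
    use whatever neighbors fall inside the window."""
    half = window // 2
    return [median(xs[max(0, i - half): i + half + 1]) for i in range(len(xs))]

# The spurious 55.0 at frame 3 is removed from the track:
print(smooth_track([10.0, 11.0, 12.0, 55.0, 13.0, 14.0]))
# -> [10.5, 11.0, 12.0, 13.0, 14.0, 13.5]
```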
Regarding Claim 11, Muntion does not specifically disclose generating, using the second neural network model, a refined output image based on the refined second output location of each keypoint of the plurality of keypoints. However, in a similar field of endeavor, Zhang teaches this refinement ([0080], quoted above for claim 10). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of Muntion accordingly, because it can generate a more accurate prediction result (Zhang [0080]).

Response to Arguments

Applicant's arguments filed 11/17/2025 have been fully considered but are not persuasive. Regarding the 35 U.S.C. 102(a)(2) rejection of claim 1, the applicant argues:

"As discussed during the interview, the Applicant respectfully submits that Muntion does not disclose at least 'determining, using the second neural network model, a trajectory for each keypoint of the plurality of keypoints, wherein the trajectory for each keypoint is based on the first output image generated by the first neural network model and the historic keypoint trajectory data for each keypoint,' as recited by amended claim 1. (Emphasis added.) The Office Action appears to analogize Muntion's 'Block 208' to the 'first neural network model' of claim 1, appears to analogize Muntion's 'Block 210' to the 'second neural network model' of claim 1, appears to analogize Muntion's 'visualization' to the 'second output image' of claim 1, and appears to analogize Muntion's 'motion profiles of surgical instruments' to the 'historic keypoint trajectory data' of claim 1. (Pages 7-9.) Even assuming, without conceding, the Office Action's analogies are accurate, Muntion does not disclose that 'the trajectory for each keypoint is based on the first output image generated by the first neural network model and the historic keypoint trajectory data for each keypoint,' as recited by amended claim 1. (Emphasis added.)"

However, it is noted that Muntion discloses a workflow that analyzes a continuous video stream in frames sampled at a set frequency (defined in the specification as an input window), where each analysis identifies the locations of anatomical structures and surgical instruments (via neural networks) in the images within that input window. Muntion further explains that, as the workflow progresses, each input window depends on previous input windows (the first output image) as well as on previously identified keypoints used to produce a trajectory (the historic keypoint trajectory data); this fills in the gaps between windows and provides a continuous trajectory as well as a potential future trajectory. The process repeats until the procedure is over, continuously using previous input windows to inform the current trajectory tracking ("to facilitate real-time performance, the input window 320 can be analyzed at a predetermined frequency, such as 5 times per second, 3 times per second, 10 times per second, etc. The analysis can result in identification of locations of anatomical structures and surgical instruments in the images 302 that are in the input window 320. It can be appreciated that the video of the surgical procedure includes images 302 that are between two successive input windows 320. For example, if the video is captured at 60 frames per second, and if the input window 320 includes 5 frames, and if the input window 320 is analyzed 5 times per second, then a total of 25 frames from the captured 60 frames are analyzed. The remaining 35 frames are in between two successive input windows 320. It is understood that the capture speed, input window frequency, and other parameters can vary from one aspect to another, and that above numbers are examples." [0096]; [0097], quoted above for claim 1).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: CN112037263A and CN110490906A. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN MALDONADO, whose telephone number is 703-756-1421. The examiner can normally be reached 8:00 am-4:00 pm PST, M-Th. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Christopher Koharski, can be reached at (571) 272-7230. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Steven Maldonado/
Patent Examiner, Art Unit 3797

/SHAHDEEP MOHAMMED/
Primary Examiner, Art Unit 3797
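Both the §102 mapping and the Response to Arguments turn on Muntion's [0096]-[0097] scheme: analyze a short input window of frames several times per second, derive a per-frame movement vector from keypoint motion inside the most recent window, and extrapolate locations for the unanalyzed frames between windows. A minimal sketch of that extrapolation using Muntion's example rates (60 fps capture, 5-frame windows analyzed 5 times per second, leaving 7 gap frames per cycle); the function and data are hypothetical:

```python
def extrapolate_between_windows(window_locs, gap_frames):
    """Fill-in step in the style of Muntion [0097] (a sketch): compute a
    movement vector from the last two analyzed locations and project it
    forward across the frames between two successive input windows."""
    (x0, y0), (x1, y1) = window_locs[-2], window_locs[-1]
    dx, dy = x1 - x0, y1 - y0                      # per-frame movement vector
    return [(x1 + dx * k, y1 + dy * k) for k in range(1, gap_frames + 1)]

# 5 analyzed frames; at 60 fps with 5 windows per second, each 12-frame
# cycle leaves 60 // 5 - 5 = 7 frames to fill in before the next window.
window = [(100, 200), (102, 201), (104, 202), (106, 203), (108, 204)]
print(extrapolate_between_windows(window, gap_frames=7))
```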

Prosecution Timeline

Mar 01, 2024
Application Filed
Jul 11, 2025
Non-Final Rejection — §102, §103, §112
Oct 27, 2025
Applicant Interview (Telephonic)
Oct 27, 2025
Examiner Interview Summary
Nov 17, 2025
Response Filed
Jan 16, 2026
Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications with similar technology granted by this examiner

Patent 12551289
Tracker-Based Surgical Navigation
2y 5m to grant • Granted Feb 17, 2026
Patent 12496034
SYSTEMS AND METHODS FOR PATIENT MONITORING
2y 5m to grant • Granted Dec 16, 2025
Patent 12484796
SYSTEM AND METHOD FOR MEASURING PULSE WAVE VELOCITY
2y 5m to grant • Granted Dec 02, 2025
Patent 12350095
DIAGNOSTIC IMAGING CATHETER AND DIAGNOSTIC IMAGING APPARATUS
2y 5m to grant • Granted Jul 08, 2025
Study what changed to get past this examiner. Based on 4 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 30%
With Interview: 84% (+54.2%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate
Based on 20 resolved cases by this examiner. Grant probability derived from career allow rate.
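The projections above are simple arithmetic on the examiner statistics shown earlier. A sketch of the presumed derivation follows; the additive interview adjustment is an assumption about how the tool combines the two published figures, not a documented formula:

```python
granted, resolved = 6, 20                    # "6 granted / 20 resolved" above
baseline = granted / resolved                # 0.30 -> the 30% grant probability
interview_lift = 0.542                       # the +54.2% interview lift above
with_interview = baseline + interview_lift   # assumed additive combination
print(f"baseline {baseline:.0%}, with interview {with_interview:.0%}")
# -> baseline 30%, with interview 84%
```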
