DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: an output generator configured to generate in claim 9.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 14, 15, 16, 17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf et al. (Publication: US 2020/0237452 A1) in view of Barral et al. (Publication: US 2019/0069957 A1).
Regarding claim 1, Wolf discloses a computer-implemented method comprising ([0156], [0314] - system 1401 may include computer-implemented methods, tangible non-transitory computer-readable mediums, and systems; at least one processor receives instructions from a non-transitory computer-readable storage medium, the instructions causing it to perform the method below):
a plurality of structures in a video of a laparoscopic surgical procedure ([0167] - data structures may comprise a table including video footage pertaining to different surgical procedures; the video footage may include footage of a laparoscopic cholecystectomy surgical procedure.
[0525] - The anatomical region may include cavities (e.g., a surgical cavity), organs, tissues, ducts, arteries, cells, or any other anatomical structures, together with a label indicating an anatomical region within the video.);
identifying, using a second configuration of the neural network, from the plurality of structures, a first type of anatomical structure and a second type of anatomical structure ([0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery, i.e., the first type and the second type.
[0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage, reading on "a second configuration of the neural network".); and
the generating comprising annotating the video with the first type of anatomical structure and the second type of anatomical structure ([0525] - An anatomical region may be any region that includes anatomical structures of a living organism, together with a label indicating an anatomical region within the video, reading on "annotating". The anatomical region may include cavities (e.g., a surgical cavity), organs, tissues, ducts, arteries, cells, or any other anatomical structures, reading on the "first type" and "second type".).
Wolf does not disclose the following limitations; however, Barral discloses:
detecting, using a first configuration of a neural network ([0038] - Block 305 describes identifying anatomical features in the video using a machine learning algorithm stored in a memory in the processing apparatus. The specific anatomical features may be identified using at least one of a deep learning model, reading on "detecting". The machine learning algorithm may be trained with anatomical maps of the human body, other surgical videos, and images of anatomy, and uses these different inputs to change the state of artificial neurons. Thus, the deep learning model will produce a different output based on one of the inputs and the activation of the artificial neurons; one of the inputs reads on the "first configuration", and deep learning focuses on neural networks (https://en.wikipedia.org/wiki/Deep_learning).),
generating an augmented video comprising the structures ([0032], [0034], Fig. 2 - the augmented video, showing different anatomical features, is generated and shown on display 209.
[Barral Fig. 2 reproduced here as media_image1.png, greyscale])
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf with detecting, using a first configuration of a neural network, and generating an augmented video comprising the structures, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
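For illustration only (not part of the claim mapping): a minimal sketch of the two-configuration arrangement recited in claim 1, in which one network configuration detects candidate structures, a second identifies their anatomical type, and the frame is annotated. All function names and the stand-in predictions below are hypothetical.

    # Illustrative sketch only; the two "configurations" are modeled as two
    # callables. A real system would load trained network weights instead.
    import cv2  # OpenCV, used here for drawing the annotations

    def detect_structures(frame):
        # First configuration: return candidate structure boxes (x0, y0, x1, y1).
        return [(50, 60, 200, 180)]  # stand-in detection

    def identify_type(patch):
        # Second configuration: classify the cropped structure, e.g.
        # "cystic artery" vs. "cystic duct" in a laparoscopic cholecystectomy.
        return "cystic duct"  # stand-in prediction

    def annotate_frame(frame):
        for (x0, y0, x1, y1) in detect_structures(frame):
            label = identify_type(frame[y0:y1, x0:x1])
            cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
            cv2.putText(frame, label, (x0, y0 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return frame  # one frame of the "augmented" (annotated) video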
Regarding claim 2, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein the surgical procedure is a laparoscopic cholecystectomy, the first type of anatomical structure is a cystic artery, and the second type of anatomical structure is a cystic duct ([0154] – the surgical phase is laparoscopic cholecystectomy.
[0475], [0483] - the anatomical structures include the cystic duct and the cystic artery, i.e., the recited first and second types.).
Regarding claim 4, Wolf in view of Barral disclose all the limitations of claim 3.
Wolf discloses using one or more temporal models to provide context to the frame ([0116] - a machine learning model may be trained using training examples; each training example may include video footage known to be associated with surgical procedures, surgical phases, intraoperative events, and event characteristics, together with labels indicating locations within the video footage. Using the trained machine learning model, similar phases and events may be identified in other video footage for determining marker locations. A "temporal model" operates on time-dependent data; here, the label indicating a location within the video footage is time-dependent.).
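For illustration only: one simple way a temporal model can provide context to a frame is to smooth per-frame predictions over a trailing window, so an isolated misprediction is overridden by its neighbors. The labels below are hypothetical.

    # Illustrative only: majority vote over a trailing window of frames, so each
    # frame's label reflects temporal context rather than a single frame.
    from collections import Counter, deque

    def temporally_smoothed(labels, window=5):
        recent = deque(maxlen=window)
        smoothed = []
        for label in labels:
            recent.append(label)
            smoothed.append(Counter(recent).most_common(1)[0][0])
        return smoothed

    # A single flickering "artery" frame is suppressed by its neighbors:
    print(temporally_smoothed(["duct", "duct", "artery", "duct", "duct"]))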
Regarding claim 5, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein the neural network is trained to generate the second configuration based on weak labels ([0205], [0279] - a machine learning model may be trained using training examples to identify conditions of anatomical structures from images and/or videos; the trained machine learning model may be used to analyze the first set of frames to identify a first condition of a first anatomical structure and/or to analyze the second set of frames to identify a second condition of a second anatomical structure (which may be the same as the first anatomical structure or a different one), and the first surgical complexity level may be determined based on the identified first condition. [0893] - determining that the first surgical complexity level is less than a selected threshold reads on "weak label"; a machine learning model may be trained using training examples to analyze the video based on whether the complexity level is less than the selected threshold.).
Regarding claim 6, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein the video is a live video of the surgical procedure ([0021] - the received video footage may be analyzed using the image-related data to determine, in real time.).
Barral discloses the video is a live video stream ([0026] - The processing apparatus may then output the annotated video to the display in real time. [0033] - Processing apparatus 207 has recognized the spleen in the incision and has accentuated (bolded its outline, in black-and-white or color) the spleen in the annotated video stream.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with the video being a live video stream, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
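For illustration only: applied to a live stream, the annotation from the earlier sketch runs frame-by-frame so the overlay appears in real time. The capture source below is a hypothetical stand-in for a laparoscope feed, and annotate_frame is the function sketched above.

    # Illustrative only: real-time annotation loop over a live video stream.
    import cv2

    capture = cv2.VideoCapture(0)  # stand-in for the laparoscope feed
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        cv2.imshow("annotated", annotate_frame(frame))
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
            break
    capture.release()
    cv2.destroyAllWindows()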
Regarding claim 7, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein the first type of anatomical structure is annotated differently than the second type of anatomical structure ([0525] An anatomical region may be any region that includes anatomical structures of a living organism together with a label indicating an anatomical region within the video. The anatomical region may include cavities (e.g., a surgical cavity), organs, tissues, ducts, arteries, cells, or any other anatomical structures and together with a label indicating an anatomical region within the video.
[0475], [0483] - the anatomical structures include the cystic duct and the cystic artery, i.e., the recited types; the cystic duct and the cystic artery are different.).
Regarding claim 9, Wolf discloses a system comprising: a training system configured to use a training dataset to train one or more machine learning models ([0156], [0314] - system 1401 may include computer-implemented methods, tangible non-transitory computer-readable mediums, and systems; at least one processor receives instructions from a non-transitory computer-readable storage medium, and the machine learning model is trained using training examples to identify the data, reading on "training dataset". [0167] - a table including video footage pertaining to different surgical procedures, which may include footage of a laparoscopic cholecystectomy surgical procedure.);
a data collection system configured to capture a video of a surgical procedure being performed ([0021] - the system receives video footage of a surgical procedure performed by a surgeon on a patient in an operating room and accesses at least one data structure including image-related data characterizing surgical procedures, reading on "data collection".);
a machine learning model execution system configured to execute the one or more machine learning models to perform a method comprising ([0486] - the system processes a machine learning model trained by training examples to analyze the image data and perform the following):
identifying, from the plurality of structures, at least one type of anatomical structure by using a second configuration of the one or more machine learning models ([0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and detect the interaction between the medical instrument and the anatomical structure. [0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery.); and
configured to generate a video by annotating the video to mark the at least one type of anatomical structure ([0525] - An anatomical region may be any region that includes anatomical structures of a living organism, together with a label indicating an anatomical region within the video, reading on "annotating". The anatomical region may include cavities (e.g., a surgical cavity), organs, tissues, ducts, arteries, cells, or any other anatomical structures.).
Wolf does not disclose the following limitations; however, Barral discloses:
detecting a plurality of structures in the video by using a first configuration of the one or more machine learning models ([0038] - Block 305 describes identifying anatomical features in the video using a machine learning algorithm stored in a memory in the processing apparatus. The specific anatomical features may be identified using at least one of a deep learning model, reading on "detecting". The machine learning algorithm may be trained with anatomical maps of the human body, other surgical videos, and images of anatomy, and uses these different inputs to change the state of artificial neurons. Thus, the deep learning model will produce a different output based on one of the inputs and the activation of the artificial neurons; the first input reads on the "first configuration", and deep learning focuses on neural networks (https://en.wikipedia.org/wiki/Deep_learning).); and
an output generator configured to generate an augmented video ([0032], [0034], Fig. 2 - the graphics processing unit generates the augmented video, showing different anatomical features, on display 209.
[Barral Fig. 2 reproduced here as media_image1.png, greyscale])
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf with detecting a plurality of structures in the video by using a first configuration of the one or more machine learning models, and with an output generator configured to generate an augmented video, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
Regarding claim 10, Wolf in view of Barral disclose all the limitations of claim 9.
Wolf discloses a second machine learning model is trained to identify the at least one type of anatomical structure from the plurality of structures ([0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage, reading on the "second machine learning model", and detect the interaction between the medical instrument and the anatomical structure. [0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery.).
Barral discloses wherein a first machine learning model is trained to detect the plurality of structures ([0038] - Block 305 describes identifying anatomical features in the video using a machine learning algorithm stored in a memory in the processing apparatus. The specific anatomical features may be identified using at least one of a deep learning “detecting”.)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with a first machine learning model trained to detect the plurality of structures, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
Regarding claim 11, Wolf in view of Barral disclose all the limitations of claim 9.
Wolf discloses wherein a same machine learning model is used to detect the plurality of structures and to identify the at least one type of anatomical structure from the plurality of structures ([0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and detect the interaction between the medical instrument and the anatomical structure; all of the detections come from the same machine learning model. [0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery.).
Regarding claim 13, Wolf in view of Barral disclose all the limitations of claim 9.
Wolf discloses wherein the training system is further configured to train a third machine learning model to identify at least one surgical instrument from the plurality of structures ([0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and detect the interaction between the medical instrument and the anatomical structure, reading on the "third machine learning model".).
Regarding claim 14, Wolf discloses a computer program product comprising a memory device having computer executable instructions stored thereon, which when executed by one or more processors cause the one or more processors to perform a method for prediction of features in surgical data using machine learning, the method comprising ([0156], [0314] - system 1401 may include computer-implemented methods, tangible non-transitory computer-readable mediums, and systems; at least one processor receives instructions from a non-transitory computer-readable storage medium, and the machine learning model is trained using training examples to identify the data structure. [0167] - the data structure may comprise a table including video footage pertaining to different surgical procedures, which may include footage of a laparoscopic cholecystectomy.):
detecting, using a neural network model, a plurality of structures comprising one or more images from a video of a surgical procedure, the neural network model being trained using surgical training data ([0257] - identifying the anatomical structure in a first set of frames includes using a machine learning model trained to detect anatomical structures. Videos, along with identifications of anatomical structures ("plurality of structures") known to be depicted in those videos, may be input into a machine learning model as training data. As a result, the trained model may be used to analyze the surgical video footage to identify an anatomical structure in the first set of frames.
[0268] - a machine learning model may be trained using training examples of a surgical procedure ("surgical training data") to detect anatomical structures from images and/or video.);
identifying, using the neural network model, at least one type of anatomical structure in the plurality of structures detected ([0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery. [0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and detect the interaction between the medical instrument and the anatomical structure.); and
generating a visualization of the surgical procedure by displaying a graphical overlay in the video of the surgical procedure ([0006] Accessing at least one video of a surgical procedure and causing the at least one video to be output for display. Generating, overlaying, on the at least one video outputted for display, a surgical timeline on a surgical procedure. The surgical timeline may include markers identifying at least one of a surgical phase, an intraoperative surgical event, and a decision making junction.).
displaying a graphical overlay at a location of the at least one type of data in the video of the surgical procedure ([0006] - Accessing at least one video of a surgical procedure and causing the at least one video to be output for display. Generating and overlaying, on the at least one video outputted for display, a location on the surgical timeline of a surgical procedure. The surgical timeline may include markers identifying at least one of a surgical phase, an intraoperative surgical event, and a decision making junction.).
Wolf does not disclose the following limitations; however, Barral discloses:
detecting a plurality of structures in an input window ([0015] - recognized anatomical features could be circled by a dashed or continuous line, or the annotation could be directly superimposed on the structures. A user interface (e.g., keyboard, mouse, microphone, etc.) could be provided to the surgeon to input additional annotations, reading on "detecting" and the "input window".
[0020] - When there is an anatomical area that does not make sense because it is too large, too diseased, or too damaged for the device to verify its identity, the model could alert the surgeon. The alert can be a mark on the user interface, reading on "detecting" and the "input window".).
displaying a graphical overlay on at least one type of anatomical structure ([0015] - for recognized anatomical features, the annotation could be directly superimposed on the structures, reading on "displaying a graphical overlay"; the annotations could be available in a caption, or a bounding box could follow the anatomical features, i.e., the type.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf with detecting a plurality of structures in an input window and displaying a graphical overlay on at least one type of anatomical structure, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
Regarding claim 15, Wolf in view of Barral disclose all the limitations of claim 14.
Wolf discloses wherein the neural network model detects the location of the at least one type of anatomical structure based on an identification of a phase of the surgical procedure being performed ([0189] - At step 814, in FIG. 8B, process 800 may include storing an event characteristic associated with the particular intraoperative surgical event, reading on "phase". The event characteristic may include information related to an anatomical structure involved in the event (such as the type of the anatomical structure, the condition of the anatomical structure, a change that occurred to the anatomical structure in relation to the event, etc.); a machine learning model may be trained using training examples to identify such information related to anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and determine the information related to the anatomical structure involved in the event.).
Regarding claim 16, Wolf in view of Barral disclose all the limitations of claim 14.
Wolf discloses wherein one or more visual attributes of the graphical overlay are configured based on the at least one type of data ([0006] - includes overlaying, on the at least one video outputted for display, a surgical timeline. The surgical timeline may include markers identifying at least one of a surgical phase, an intraoperative surgical event, and a decision making junction. The surgical timeline may enable a surgeon, while viewing playback of the at least one video, to select one or more markers on the surgical timeline, and thereby cause a display of the video to skip to a location associated with the selected marker.).
Barral discloses configured based on the at least one type of anatomical structure ([0013] The instant disclosure trains a machine learning model (e.g., a deep learning model) to recognize specific anatomical structures within surgical videos, and highlight these structures. For example, in cholecystectomy (removal of gallbladder), the systems disclosed here trains a model on frames extracted from laparoscopic videos (which may, or may not, be robotically assisted) where structures of interest (liver, gallbladder, omentum, etc.) have been highlighted. Once image classification has been learned by the algorithm, the device may use a sliding window approach to find the relevant structures in videos and highlight them, for example by delineating them with a bounding box.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with the configuration based on the at least one type of anatomical structure, as taught by Barral. The motivation for doing so is to reduce the amount of time in the operating room.
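For illustration only: the sliding-window approach cited from Barral [0013] can be pictured as classifying each crop of a window slid across the frame and delineating hits with a bounding box. The classifier below is a hypothetical stand-in for the trained model.

    # Illustrative only: slide a fixed-size window across the frame, classify
    # each crop, and return bounding boxes where the target structure is found.
    def classify_patch(patch):
        return "gallbladder"  # stand-in; a real model returns a structure label

    def sliding_window_boxes(frame, size=128, stride=64, target="gallbladder"):
        height, width = frame.shape[:2]
        boxes = []
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                if classify_patch(frame[y:y + size, x:x + size]) == target:
                    boxes.append((x, y, x + size, y + size))  # bounding box
        return boxes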
Regarding claim 17, Wolf in view of Barral disclose all the limitations of claim 16.
Wolf discloses wherein the one or more visual attributes assigned to the at least one type of anatomical structure are user configurable ([0165] - the video footage may include a representation of one or more anatomical structures of a patient, and an event characteristic identifying the anatomical structures may be determined based on detecting the anatomical structure in the video footage; a user may input the event characteristic to be stored via a user interface, reading on "user configurable". [0162] - a training example may include a video clip together with a label indicating a location of a particular event within the video clip, or an absence of such an event.).
Regarding claim 19, Wolf in view of Barral disclose all the limitations of claim 14.
Wolf discloses a first neural network for semantic image segmentation ([0080] - a trained machine learning algorithm may include an image segmentation model; the input may include an image, and the inferred output may include a segmentation of the image.) and a second neural network for encoding ([0159] - Generating the phase tag may also include using a trained machine learning model or a neural network model (such as a deep neural network, convolutional neural networks, etc.), which may be trained to associate one or more video frames with one or more phase tags. Generating a phase tag associated with the surgical phase may include generating a tag including a binary encoding of a surgical phase identifier.).
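For illustration only: the cited "binary encoding of a surgical phase identifier" (Wolf [0159]) amounts to mapping each phase to an integer and emitting a fixed-width binary tag. The phase table below is hypothetical.

    # Illustrative only: fixed-width binary encoding of a surgical phase id.
    PHASES = {"dissection": 0, "clipping": 1, "cutting": 2}  # hypothetical table

    def phase_tag(phase, width=4):
        return format(PHASES[phase], f"0{width}b")

    print(phase_tag("clipping"))  # -> "0001"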
Regarding claim 20, Wolf in view of Barral disclose all the limitations of claim 14.
Wolf discloses wherein the plurality of structures comprises one or more anatomical structures and one or more surgical instruments ([0115] - visual action recognition algorithms may be used to analyze the video and detect the interactions between the medical instrument and the anatomical structure.
Fig. 3 – surgical instrument.).
Claims 3 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf et al. (Publication: US 2020/0237452 A1) in view of Barral et al. (Publication: US 2019/0069957 A1) and Weinzaepfel et al. (Publication: US 2020/0364509 A1).
Regarding claim 3, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein an anatomical structure, from the plurality of structures, [[occludes]] at least one other anatomical structure from the plurality of structures in a frame of the video ([0205] - a machine learning model may be trained using training examples to detect interactions between medical instruments and anatomical structures from videos, and the trained machine learning model may be used to analyze the video footage and detect the interaction between the medical instrument and the anatomical structure. [0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery.).
Wolf in view of Barral do not however Weinzaepfel discloses
occludes at least one other structure ([0036] - In the description below, an object-of-interest is a discriminative area within a three-dimensional map which can be reliably detected from multiple viewpoints, partly occluded, and under various lighting conditions.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with occluding at least one other structure, as taught by Weinzaepfel. The motivation for doing so is to accurately regress the three-dimensional coordinates to provide vivid scenarios.
Regarding claim 8, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses wherein the annotating comprises adding, to the video, at least one from [[a mask, a bounding box, and a label]] ([0525] - An anatomical region may be any region that includes anatomical structures of a living organism, together with a label indicating an anatomical region within the video. The anatomical region may include cavities (e.g., a surgical cavity), organs, tissues, ducts, arteries, cells, or any other anatomical parts, together with a label indicating an anatomical region within the video.).
Wolf in view of Barral do not disclose the following; however, Weinzaepfel discloses:
performing at least one from a mask, a bounding box, and a label ([0177] - A method, using a data processor, for training a neural network for determining, from a query image generated from a camera pose, localization of the camera pose in a predetermined environment comprises (a) accessing a first frame of training data with two-dimensional pixels having an object-of-interest identified by a bounding box of a manual mask and a class label.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with performing at least one from a mask, a bounding box, and a label, as taught by Weinzaepfel. The motivation for doing so is to accurately regress the three-dimensional coordinates to provide vivid scenarios.
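For illustration only: the three annotation forms recited in claim 8 (a mask, a bounding box, and a label) rendered on a frame with standard OpenCV drawing calls. The box coordinates and label text are hypothetical.

    # Illustrative only: add a mask, a bounding box, and a label to a frame.
    import cv2
    import numpy as np

    def annotate(frame, box, label):
        x0, y0, x1, y1 = box
        # Mask: blend a filled overlay into the frame to tint the region.
        overlay = frame.copy()
        cv2.rectangle(overlay, (x0, y0), (x1, y1), (0, 0, 255), thickness=-1)
        frame = cv2.addWeighted(overlay, 0.3, frame, 0.7, 0)
        # Bounding box: outline the region.
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
        # Label: text caption above the box.
        cv2.putText(frame, label, (x0, y0 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)
        return frame

    frame = annotate(np.zeros((240, 320, 3), np.uint8),
                     (40, 40, 160, 120), "cystic duct")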
Claims 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf et al. (Publication: US 2020/0237452 A1) in view of Barral et al. (Publication: US 2019/0069957 A1) and Feinman et al. (Patent: US 10,572,823 B1).
Regarding claim 12, Wolf in view of Barral disclose all the limitations of claim 1.
Wolf discloses to detect the plurality of structures ([0115] - visual action recognition algorithms may be used to analyze the video and detect the interactions between the medical instrument and the anatomical structure.) and to identify the at least one type of anatomical structure ([0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery, i.e., the type.).
Wolf in view of Barral do not disclose the following; however, Feinman discloses:
to detect using the first configuration, which comprises a first set of hyperparameter values, and to identify using the second configuration, which comprises a second set of hyperparameter values (column 1, lines 55-65 - selecting the set of hyperparameters may include (i) identifying an objective function including a first function that rewards model efficacy and a second function that penalizes model size, and (ii) selecting, from among the candidate hyperparameter sets, the set of hyperparameters that optimizes the objective function. In one embodiment, the objective function may also include a weighting term that adjusts the magnitude of at least one of the first function and the second function. "Identifying" reads on selecting and identifying.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with a first configuration comprising a first set of hyperparameter values and a second configuration comprising a second set of hyperparameter values, as taught by Feinman. The motivation for doing so is to improve efficiency.
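For illustration only: the cited objective from Feinman (column 1, lines 55-65) rewards model efficacy, penalizes model size, and weights the two terms; selection picks the candidate hyperparameter set that optimizes it. The candidate sets and the scoring functions below are hypothetical stand-ins.

    # Illustrative only: pick the hyperparameter set maximizing an objective
    # that rewards efficacy and penalizes model size, with a weighting term.
    def select_hyperparameters(candidates, efficacy, size, weight=1e-5):
        return max(candidates, key=lambda hp: efficacy(hp) - weight * size(hp))

    candidates = [{"width": 64}, {"width": 128}, {"width": 256}]
    best = select_hyperparameters(
        candidates,
        efficacy=lambda hp: 0.8 + 0.0005 * hp["width"],  # hypothetical accuracy
        size=lambda hp: hp["width"] ** 2,                # hypothetical param count
    )
    print(best)  # the set balancing accuracy against size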
Regarding claim 18, Wolf in view of Barral disclose all the limitations of claim 14.
Wolf discloses to detect the plurality of structures ([0115] - visual action recognition algorithms may be used to analyze the video and detect the interactions between the medical instrument and the anatomical structure.);
identify the at least one type of anatomical structure ([0475], [0483] - the identified anatomical structures include the cystic duct and the cystic artery, i.e., the type.).
Wolf in view of Barral do not disclose the following; however, Feinman discloses:
configured with a first set of hyperparameters to detect the data, and with a second set of hyperparameters to identify the at least another data (column 1, lines 55-65 - selecting the set of hyperparameters may include (i) identifying an objective function including a first function that rewards model efficacy and a second function that penalizes model size, and (ii) selecting, from among the candidate hyperparameter sets, the set of hyperparameters that optimizes the objective function. In one embodiment, the objective function may also include a weighting term that adjusts the magnitude of at least one of the first function and the second function. "Identifying" reads on selecting and identifying.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Wolf in view of Barral with a first set of hyperparameters configured to detect the data and a second set of hyperparameters configured to identify the at least another data, as taught by Feinman. The motivation for doing so is to improve efficiency.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ming Wu, whose telephone number is (571) 270-0724. The examiner can normally be reached Monday - Friday, 9:30 am - 6:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Devona Faulk, can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MING WU/
Primary Examiner, Art Unit 2618