Prosecution Insights
Last updated: April 19, 2026
Application No. 18/599,180

Metrics and Event Detection Using Multi-Modal Data

Non-Final OA — §103, §112

Filed: Mar 07, 2024
Examiner: WELLS, HEATH E
Art Unit: 2664
Tech Center: 2600 — Communications
Assignee: Auris Health, Inc.
OA Round: 1 (Non-Final)

Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 5m
Grant Probability With Interview: 93%

Examiner Intelligence

Career Allow Rate: 75% (58 granted / 77 resolved) — above average, +13.3% vs TC avg
Interview Lift: +18.1% — strong; allow rate compared across resolved cases with and without an interview
Typical Timeline: 3y 5m average prosecution; 46 applications currently pending
Career History: 123 total applications across all art units
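The headline figures above can be reproduced from per-case records. Below is a minimal Python sketch, assuming a hypothetical list of resolved-case dictionaries (the field names are invented for illustration, not the analytics provider's actual schema): career allow rate is grants divided by resolved cases, and interview lift is the allow-rate gap between cases with and without an examiner interview.

# Hypothetical resolved-case records; field names are illustrative assumptions.
cases = [
    {"granted": True,  "had_interview": True},
    {"granted": False, "had_interview": False},
    # ... one record per resolved application ...
]

def allow_rate(subset):
    # Fraction of cases in `subset` that ended in a grant.
    return sum(c["granted"] for c in subset) / len(subset) if subset else 0.0

career_rate    = allow_rate(cases)                                    # e.g. 58 / 77 ≈ 0.753
with_interview = allow_rate([c for c in cases if c["had_interview"]])
without        = allow_rate([c for c in cases if not c["had_interview"]])
interview_lift = with_interview - without                             # reported above as +18.1 points

print(f"allow rate {career_rate:.1%}, interview lift {interview_lift:+.1%}")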

Statute-Specific Performance

§101: 17.8% (-22.2% vs TC avg)
§103: 62.8% (+22.8% vs TC avg)
§102: 2.4% (-37.6% vs TC avg)
§112: 13.8% (-26.2% vs TC avg)
Deltas are measured against a Tech Center average estimate • Based on career data from 77 resolved cases
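The deltas above are all consistent with a Tech Center baseline of roughly 40% per statute; the baseline values in the sketch below are back-calculated from the deltas shown, not published figures. A small Python sketch of the comparison:

examiner_rate = {"101": 0.178, "103": 0.628, "102": 0.024, "112": 0.138}
tc_average    = {"101": 0.400, "103": 0.400, "102": 0.400, "112": 0.400}  # assumed baseline estimates

for statute, rate in examiner_rate.items():
    delta = rate - tc_average[statute]               # positive = examiner above TC average
    print(f"§{statute}: {rate:.1%} ({delta:+.1%} vs TC avg)")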

Office Action

Rejections: §103, §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The IDS dated 30 June 2025 has been considered and placed in the application file.

Claim Interpretation

Under MPEP 2143.03, "All words in a claim must be considered in judging the patentability of that claim against the prior art." In re Wilson, 424 F.2d 1382, 1385, 165 USPQ 494, 496 (CCPA 1970). As a general matter, the grammar and ordinary meaning of terms as understood by one having ordinary skill in the art used in a claim will dictate whether, and to what extent, the language limits the claim scope. Language that suggests or makes a feature or step optional but does not require that feature or step does not limit the scope of a claim under the broadest reasonable claim interpretation. In addition, when a claim requires selection of an element from a list of alternatives, the prior art teaches the element if one of the alternatives is taught by the prior art. See, e.g., Fresenius USA, Inc. v. Baxter Int'l, Inc., 582 F.3d 1288, 1298, 92 USPQ2d 1163, 1171 (Fed. Cir. 2009).

Claims 1, 3, 7, 9, 10, 15-18 and 20 recite "at least one of." Since "at least one of" is disjunctive, any one of the elements found in the prior art is sufficient to reject the claim. While citations have been provided for completeness and rapid prosecution, only one element is required. Because, on balance, the disjunctive interpretation appears to enjoy the most specification support, the disjunctive interpretation (one of A, B, or C) is being adopted for the purposes of this Office Action. Applicant's comments and/or amendments relating to this issue are invited to clarify the claim language and the prosecution history.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

This application includes one or more claim limitations that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: "an image repository configured to" in claims 1 and 20; "a log repository configured to" in claims 1 and 20; "control configured to" in claim 1; and "control circuitry communicatively coupled to" in claim 20.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f), applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f).

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. § 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 1, 6, 8, 9 and 15 are rejected under 35 U.S.C. § 112(b) as being indefinite for claiming both an apparatus and a process of using the apparatus. When both an apparatus and a method are claimed in the same claim, it is unclear whether infringement occurs when the apparatus is constructed or when the apparatus is used. Therefore the scope of the claim is indefinite. See MPEP 2173.05(p).

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as obvious over US Patent Publication 2019/0216548 A1 (Ummalaneni) in view of US Patent Publication 2025/0279172 A1 (Wolf et al.).

[Figure: Ummalaneni, Fig. 19, showing a system for extracting object info during a medical procedure.]

Claim 1

Regarding Claim 1, Ummalaneni teaches a system for extracting information of objects from video captured during a medical procedure ("In some implementations, the user 205 can both view data and input commands to the medical robotic system 110 using the integrated displays 202 and control modules," paragraph [0139]), the system comprising: an image repository configured to store image data representing views within a luminal network, the views captured with an imaging device ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]); a log repository configured to store commands and/or states associated with an object within the luminal network ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]); and control circuitry ("The navigation controller 460 can automatically control the instrument according to the determined next movement in some embodiments," paragraph [0174]) configured to: generate change data representing changes of visual states of the object over a time period based at least in part on the image data ("The optical components move along with the tip of the endoscope 115 such that movement of the tip of the endoscope 115 results in corresponding changes to the field of view of the images captured by the imaging devices. The distal end of the endoscope 115 can be provided with one or more EM sensors 125 for tracking the position of the distal end within an EM field generated around the luminal network 140. The distal end of the endoscope 115 is further described with reference to FIG. 18 below," paragraph [0123] where sensors produce change data); and access the log repository to determine logs including at least one command or at least one state associated with the object over the time period ("using the modeling system 420, data representing a number of images of a patient's anatomical luminal network can be analyzed to build a three-dimensional model of a virtual representation of the anatomical luminal network, and this virtual anatomical luminal network," paragraph [0147] where a modeling system has a log repository and state associated with the three-dimensional model).

[Figure: Wolf et al., Fig. 21, showing integrating multiple sources into a repository.]

Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach generate contextual information associated with the object based at least in part on (i) the change data and (ii) the at least one command or the at least one state associated with the object ("For example, the information may provide context that is useful in determining which frames of the particular surgical footage are associated with intraoperative events and/or surgical activity. In some embodiments, distinguishing in the particular surgical footage the first group of frames from the second group of frames may involve the use of a machine learning algorithm," paragraph [0201]).

Therefore, taking the teachings of Ummalaneni and Wolf et al. as a whole, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify "Robotic Systems for Determining a Roll of a Medical Device in Luminal Networks" as taught by Ummalaneni to use "Video used to Automatically Populate a Postoperative Report" as taught by Wolf et al. The suggestion/motivation for doing so would have been that, "Therefore, there is a need for unconventional approaches that efficiently and effectively analyze surgical videos to enable a surgeon to view surgical events, provide decision support, and/or facilitate postoperative activity," as noted by the Wolf et al. disclosure in paragraph [0004], which also motivates the combination because the combination would predictably have higher productivity, as there is a reasonable expectation that surgery needs post-surgical documentation and automation of the documentation will save doctors time; and/or because doing so merely combines prior art elements according to known methods to yield predictable results.

The rejection of system claim 1 above applies mutatis mutandis to the corresponding limitations of method claim 16 and system claim 20, while noting that the rejection above cites to both device and method disclosures. Claims 16 and 20 are mapped below for clarity of the record and to specify any new limitations not included in claim 1.

Claim 2

Regarding claim 2, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach wherein the control circuitry is further configured to: assign, using a machine learning classifier, a semantic label to the object in one or more image frames of the image data ("For example, the information may provide context that is useful in determining which frames of the particular surgical footage are associated with intraoperative events and/or surgical activity. In some embodiments, distinguishing in the particular surgical footage the first group of frames from the second group of frames may involve the use of a machine learning algorithm," paragraph [0201]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 3

Regarding claim 3, Ummalaneni teaches the system of claim 1, wherein the control circuitry is further configured to: determine the object is a LASER, a basket, a Percutaneous Antegrade Urethral Catheter (PAUC), a ureteral access sheath (UAS), a needle, an anatomical feature, or a stone ("For example, the ureteroscope 32 may be directed into the ureter and kidneys to break up kidney stone build up using laser or ultrasonic lithotripsy device deployed down the working channel of the ureteroscope 32. After lithotripsy is complete, the resulting stone fragments may be removed using baskets deployed down the ureteroscope 32," paragraph [0075] and "As described in more detail below, using the system 400, data from a number of different sources is combined and repeatedly analyzed during a surgical procedure to provide an estimation of the real-time movement information and location/orientation information of a surgical instrument (e.g., the endoscope) within the luminal network of the patient and to make navigation decisions." paragraph [0145]).

Claim 4

Regarding claim 4, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of determine a medical procedure. However, Wolf et al. teach wherein the control circuitry is further configured to: determine a medical procedure or a phase of the medical procedure ("the surgical phase that the leakage situation occurred at, etc.), and the trained machine learning model may be used to analyze information related to the fluid leakage situation and determining whether the fluid leakage situation is abnormal," paragraph [0667]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 5

Regarding claim 5, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of change data. However, Wolf et al. teach wherein the control circuitry is further configured to: based on the change data, determine a starting image frame and an ending image frame from one or more image frames of the image data, the change data including at least one of: (i) visibility of the object, (ii) movement of the object, or (iii) a detected size, shape, or count of the object ("some non-limiting examples of such event characteristics may include skill level associated with the event (such as minimal skill level required, skill level demonstrated, skill level of a medical care giver involved in the event, etc.), time associated with the event (such as start time, end time, etc.), type of the event, information related to medical instruments involved in the event, information related to anatomical structures involved in the event, information related to medical outcome associated with the event, one or more amounts (such as an amount of leak, amount of medication, amount of fluids, etc.), one or more dimensions (such as dimensions of anatomical structures, dimensions of incision, etc.), and so forth," paragraph [0167]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 6

Regarding claim 6, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of timestamps. However, Wolf et al. teach wherein the accessing the log repository further comprises: filter the log repository to select the logs including the at least one command or the at least one state associated with the object ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169]); determine a timestamp from the selected logs ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169]); and determine a starting image frame for one or more image frames of the image data based on the timestamp ("The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 7

Regarding claim 7, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of timestamps. However, Wolf et al. teach wherein the control circuitry is further configured to: select an image frame associated with a timestamp from one or more image frames of the image data ("The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]); and index the image frame with the determined contextual information, the contextual information including at least one of: (i) a medical procedure, (ii) a phase of the medical procedure, (iii) a result of the medical procedure, (iv) a result of the phase of the medical procedure, (v) a visual state of the image data, (vi) visibility of the object, or (vii) a relative position of the object in relation to another object ("the historical data may include a machine learning model trained to identify portions of videos corresponding to surgical activity and/or portions of videos corresponding to non-surgical activity, for example based on the historical surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 8

Regarding claim 8, Ummalaneni teaches the system of claim 7, as noted above. Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach the control circuitry is further configured to: receive a selection query including the contextual information ("In this context, characteristic event may include any event commonly occurring within a particular stage of a surgical procedure, any event commonly suggesting a particular complication within a surgical procedure, or any event commonly occurring in response to a particular complication within a surgical procedure," paragraph [0352]); and in response to the receiving of the selection query, provide the timestamp or the image frame ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169] and "The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 9

Regarding claim 9, Ummalaneni teaches the system of claim 1, wherein the commands include at least one of: insertion, retraction, LASER activation, articulation, basket open or closure, aspiration, irrigation, or puncture ("The robotic system's computer-based control system, based in the tower, bed and/or cart, may store computer program instructions, for example, within a non-transitory computer-readable storage medium such as a persistent magnetic storage drive, solid state drive, or the like, that, upon execution, cause the system to receive and analyze sensor data and user commands, generate control signals throughout the system, and display the navigational and localization data, such as the position of the instrument within the global coordinate system, anatomical map, etc," paragraph [0113]).
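For orientation only, and not part of the Office Action: claims 6 and 17 recite a filter-logs / pick-timestamp / find-starting-frame sequence. The Python sketch below is a hypothetical illustration of that kind of pipeline, with invented data structures and field names; it is not code from the application or from either cited reference.

from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEntry:
    timestamp: float           # seconds from start of the procedure (illustrative)
    object_id: str             # e.g. "basket" or "laser" (illustrative values)
    command: Optional[str]     # e.g. "insertion", "laser_activation"
    state: Optional[str]       # e.g. "open", "closed"

def starting_frame(logs, frame_timestamps, object_id):
    # Filter logs to entries for the object of interest that carry a command or
    # state, take the earliest matching timestamp, and return the index of the
    # first image frame at or after it (frame_timestamps is assumed sorted).
    selected = [e for e in logs if e.object_id == object_id and (e.command or e.state)]
    if not selected:
        return None
    t0 = min(e.timestamp for e in selected)
    return bisect_left(frame_timestamps, t0)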
Claim 10

Regarding claim 10, Ummalaneni teaches the system of claim 1, wherein the states include at least one of: kinematics, position, orientation, usage time, number of activations, protrusion length, number of stone retrievals, treatment time, articulation duration, blind driving, backflow, LASER fires or LASER misfires, or successful puncture ("The robotic system's computer-based control system, based in the tower, bed and/or cart, may store computer program instructions, for example, within a non-transitory computer-readable storage medium such as a persistent magnetic storage drive, solid state drive, or the like, that, upon execution, cause the system to receive and analyze sensor data and user commands, generate control signals throughout the system, and display the navigational and localization data, such as the position of the instrument within the global coordinate system, anatomical map, etc," paragraph [0113]).

Claim 11

Regarding claim 11, Ummalaneni teaches the system of claim 1, wherein the control circuitry is further configured to: based on the determined contextual information, enable an operational functionality of the object ("the system 36 may also include a tower (not shown) that divides the functionality of system 36 between table and tower to reduce the form factor and bulk of the table. As in earlier disclosed embodiments, the tower may be provide a variety of support functionalities to table, such as processing, computing, and control capabilities, power, fluidics, and/or optical and sensor processing," paragraph [0082]).

Claim 12

Regarding claim 12, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach wherein the control circuitry is further configured to: cause a display to present a warning based on the determined contextual information ("Additionally or alternatively, the notification may be implemented as a warning signal (e.g., light signal, audio signal, and the like)," paragraph [0514]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 13

Regarding claim 13, Ummalaneni teaches the system of claim 1, wherein the log repository is configured to further store electromagnetic (EM) sensor data and the contextual information is generated based at least in part on the EM sensor data ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]).

Claim 14

Regarding claim 14, Ummalaneni teaches the system of claim 1, as noted above. Ummalaneni is not relied upon to explicitly teach all of voice recordings. However, Wolf et al. teach wherein the control circuitry is further configured to: access a voice recording captured by a recording device ("indicating the marker through a voice interface, indicating the marker with a gesture, or undertaking any other action that causes the marker to be selected. Selection of the marker may thereby cause a display of the video to skip to a location associated with the selected marker," paragraph [0113] where a voice interface teaches a voice recording); converting the voice recording into text ("indicating the marker through a voice interface, indicating the marker with a gesture, or undertaking any other action that causes the marker to be selected. Selection of the marker may thereby cause a display of the video to skip to a location associated with the selected marker," paragraph [0113] where a voice interface is within the interpretation of converting voice into text); and indexing a first segment of the image data with a first segment of the text, the first segment associated with a timestamp ("indicating the marker through a voice interface, indicating the marker with a gesture, or undertaking any other action that causes the marker to be selected. Selection of the marker may thereby cause a display of the video to skip to a location associated with the selected marker," paragraph [0113]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 15

Regarding claim 15, Ummalaneni teaches the system of claim 14, as noted above. Ummalaneni is not relied upon to explicitly teach all of timestamps. However, Wolf et al. teach the control circuitry is further configured to: receive a selection query including the first segment of the text ("In this context, characteristic event may include any event commonly occurring within a particular stage of a surgical procedure, any event commonly suggesting a particular complication within a surgical procedure, or any event commonly occurring in response to a particular complication within a surgical procedure," paragraph [0352]); and in response to the receiving of the selection query, provide the timestamp or the first segment of the image data ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169] and "The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.
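Again for orientation only, and not part of the Office Action: claim 14 recites accessing a voice recording, converting it to text, and indexing a segment of the image data against a segment of the text by timestamp. A hypothetical Python sketch of that flow, where transcribe is a stand-in callable for an unspecified speech-to-text step rather than a real API:

def index_video_by_speech(recording, transcribe, video_index):
    # `transcribe` is assumed to return an iterable of
    # (start_seconds, end_seconds, text) segments; purely illustrative.
    for start, end, text in transcribe(recording):
        # Index the video segment under its transcribed text and timestamps.
        video_index.setdefault(text, []).append((start, end))
    return video_index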
Claim 16

Regarding claim 16, Ummalaneni teaches a method for extracting information of objects from image captured during a medical procedure ("In some implementations, the user 205 can both view data and input commands to the medical robotic system 110 using the integrated displays 202 and control modules," paragraph [0139]), the method comprising: accessing image data representing a view within a luminal network, the image data accessed from an image repository configured to store the image data ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]); accessing commands and/or states associated with a medical tool configured to operate within the luminal network from a log repository ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]); generating change data representing changes of visual states of an object over a time period based at least in part on the image data ("The optical components move along with the tip of the endoscope 115 such that movement of the tip of the endoscope 115 results in corresponding changes to the field of view of the images captured by the imaging devices. The distal end of the endoscope 115 can be provided with one or more EM sensors 125 for tracking the position of the distal end within an EM field generated around the luminal network 140. The distal end of the endoscope 115 is further described with reference to FIG. 18 below," paragraph [0123] where sensors produce change data); and determining logs including at least one command or at least one state associated with the medical tool over the time period ("using the modeling system 420, data representing a number of images of a patient's anatomical luminal network can be analyzed to build a three-dimensional model of a virtual representation of the anatomical luminal network, and this virtual anatomical luminal network," paragraph [0147] where a modeling system has a log repository and state associated with the three-dimensional model). Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach generating contextual information associated with the object based at least in part on (i) the change data and (ii) the at least one command or the at least one state associated with the medical tool ("For example, the information may provide context that is useful in determining which frames of the particular surgical footage are associated with intraoperative events and/or surgical activity. In some embodiments, distinguishing in the particular surgical footage the first group of frames from the second group of frames may involve the use of a machine learning algorithm," paragraph [0201]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 17

Regarding claim 17, Ummalaneni teaches the method of claim 16, as noted above. Ummalaneni is not relied upon to explicitly teach all of timestamps. However, Wolf et al. teach further comprising: filtering the log repository to select the logs including the at least one command or the at least one state associated with the medical tool ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169]); determining a timestamp from the selected logs ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169]); and determining a starting image frame for one or more image frames of the image data based on the timestamp ("The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 18

Regarding claim 18, Ummalaneni teaches the method of claim 16, as noted above. Ummalaneni is not relied upon to explicitly teach all of timestamps. However, Wolf et al. teach further comprising: selecting an image frame associated with a timestamp from one or more image frames of the image data ("The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]); and indexing the image frame with the determined contextual information, the contextual information including at least one of: (i) the at least one command or (ii) the at least one state, the at least one command or the at least one state associated with the medical tool ("the historical data may include a machine learning model trained to identify portions of videos corresponding to surgical activity and/or portions of videos corresponding to non-surgical activity, for example based on the historical surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 19

Regarding claim 19, Ummalaneni teaches the method of claim 18, as noted above. Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach further comprising: receiving a selection query including the contextual information ("In this context, characteristic event may include any event commonly occurring within a particular stage of a surgical procedure, any event commonly suggesting a particular complication within a surgical procedure, or any event commonly occurring in response to a particular complication within a surgical procedure," paragraph [0352]); and in response to the receiving of the selection query, providing the timestamp or the image frame ("Similar to with the phase tags, a user may select video footage based on event tags and event characteristics using search boxes 720 and 730, respectively. User interface 700 may also include dropdown buttons 722 and 732 to access dropdown lists and further filter the results," paragraph [0169] and "The information may also include time information, such as a begin timestamp, an end timestamp, a duration, a timestamp range, or other information related to timing of the surgical footage," paragraph [0198]). Ummalaneni and Wolf et al. are combined as per claim 1.

Claim 20

Regarding claim 20, Ummalaneni teaches a system for determining metrics and events of objects from image captured during a medical procedure ("In some implementations, the user 205 can both view data and input commands to the medical robotic system 110 using the integrated displays 202 and control modules," paragraph [0139]), the system comprising: control circuitry ("The navigation controller 460 can automatically control the instrument according to the determined next movement in some embodiments," paragraph [0174]) communicatively coupled to (i) an image repository configured to store image data representing views within a luminal network, the views captured with an imaging device ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]), and (ii) a log repository configured to store data from sensors other than the imaging device ("The navigation fusion system 400 includes a number of data repositories including depth features data repository 405, endoscope EM sensor data repository 415, registration data repository 475, model data repository 425, endoscope imaging data repository 480, navigation path data repository 445, and robotic position data repository 470," paragraph [0146]), the control circuitry configured to: generate change data representing changes of visual states of an object over a time period based at least in part on the image data ("The optical components move along with the tip of the endoscope 115 such that movement of the tip of the endoscope 115 results in corresponding changes to the field of view of the images captured by the imaging devices. The distal end of the endoscope 115 can be provided with one or more EM sensors 125 for tracking the position of the distal end within an EM field generated around the luminal network 140. The distal end of the endoscope 115 is further described with reference to FIG. 18 below," paragraph [0123] where sensors produce change data); and access the log repository to determine logs including sensor data associated with the object over the time period ("using the modeling system 420, data representing a number of images of a patient's anatomical luminal network can be analyzed to build a three-dimensional model of a virtual representation of the anatomical luminal network, and this virtual anatomical luminal network," paragraph [0147] where a modeling system has a log repository and state associated with the three-dimensional model). Ummalaneni is not relied upon to explicitly teach all of contextual information. However, Wolf et al. teach determine metrics and events associated with the object based at least in part on (i) the change data and (ii) the sensor data associated with the object ("For example, the information may provide context that is useful in determining which frames of the particular surgical footage are associated with intraoperative events and/or surgical activity. In some embodiments, distinguishing in the particular surgical footage the first group of frames from the second group of frames may involve the use of a machine learning algorithm," paragraph [0201]). Ummalaneni and Wolf et al. are combined as per claim 1.
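As a final orientation aid, not part of the Office Action: independent claims 1, 16, and 20 share a common flow of deriving change data from stored image data, pulling commands/states (or other sensor data) for the object from a log repository, and fusing the two into contextual information or metrics. A hypothetical Python sketch of that flow, reusing the invented LogEntry fields from the earlier sketch; detect_change and combine stand in for unspecified analytics:

def generate_contextual_information(frames, logs, detect_change, combine):
    # Change data from consecutive image frames (claim-1-style, illustrative only).
    change_data = [detect_change(prev, cur) for prev, cur in zip(frames, frames[1:])]
    # Commands and states pulled from the log repository for the same time period.
    commands_and_states = [(entry.command, entry.state) for entry in logs]
    # Fuse both streams into contextual information / metrics.
    return combine(change_data, commands_and_states)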
References Cited

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US Patent Publication 2020/0297444 A1 to Camarillo et al. discloses localizing and/or navigating a medical instrument within a luminal network. A medical system can include an elongate body configured to be inserted into the luminal network, as well as an imaging device positioned on a distal portion of the elongate body. The system may include memory and processors configured to receive from the imaging device image data that includes an image captured when the elongate body is within the luminal network.

US Patent Publication 2018/0144186 A1 to Wnuk et al. discloses an activity recognition system. A plurality of temporal features is generated from a digital representation of an observed activity using a feature detection algorithm. An observed activity graph comprising one or more clusters of temporal features generated from the digital representation is established, wherein each one of the one or more clusters of temporal features defines a node of the observed activity graph.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HEATH E WELLS whose telephone number is (703) 756-4696. The examiner can normally be reached Monday-Friday 8:00-4:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ms. Jennifer Mehmood, can be reached on 571-272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Heath E. Wells/
Examiner, Art Unit 2664
Date: 10 February 2026

Prosecution Timeline

Mar 07, 2024
Application Filed
Feb 10, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602755 — DEEP LEARNING-BASED HIGH RESOLUTION IMAGE INPAINTING — granted Apr 14, 2026 (2y 5m to grant)
Patent 12597226 — METHOD AND SYSTEM FOR AUTOMATED PLANT IMAGE LABELING — granted Apr 07, 2026 (2y 5m to grant)
Patent 12591979 — IMAGE GENERATION METHOD AND DEVICE — granted Mar 31, 2026 (2y 5m to grant)
Patent 12588876 — TARGET AREA DETERMINATION METHOD AND MEDICAL IMAGING SYSTEM — granted Mar 31, 2026 (2y 5m to grant)
Patent 12586363 — GENERATION OF PLURAL IMAGES HAVING M-BIT DEPTH PER PIXEL BY CLIPPING M-BIT SEGMENTS FROM MUTUALLY DIFFERENT POSITIONS IN IMAGE HAVING N-BIT DEPTH PER PIXEL — granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 93% (+18.1%)
Median Time to Grant: 3y 5m
PTA Risk: Low
Based on 77 resolved cases by this examiner. Grant probability derived from career allow rate.
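The interview-adjusted figure is consistent with adding the reported interview lift to the baseline career allow rate; whether the tool actually combines the two additively is an assumption, but the arithmetic matches the numbers shown on this page:

baseline       = 58 / 77       # career allow rate from the card above, ~75.3%
interview_lift = 0.181         # +18.1 percentage points, as reported above
with_interview = baseline + interview_lift
print(f"{baseline:.0%} baseline -> {with_interview:.0%} with interview")   # 75% -> 93%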
