DETAILED ACTION
A summary of this action:
Claims 1-17 have been presented for examination.
This action is non-Final.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections - Minor Informalities
The following Claims are objected to because of the following informalities:
Claim 1
"the virtual surgical tool" should be "the virtual reality-based virtual surgical tool"
Claim 2
"the performing of the correspondence matching" should be "a performing of the correspondence matching"
Claim 5
"a difference" should be "the difference"
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
a video acquisition unit in claim 10
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
For purposes of examination of the claim limitations, the Examiner interprets the hardware structure associated with the "video acquisition unit" of independent claim 10 as an "apparatus" containing a memory 120 and a processor 130, as described in the specification at [Figure 1] and paragraphs [0043]-[0044].
The corresponding algorithm of the "video acquisition unit that receives ..." is noted in MPEP 2181(II)(A), which states: "Clearly, a unit which receives digital data, performs complex mathematical computations and outputs the results to a display must be implemented by or on a general or special purpose computer." This is taken to be the structure and algorithm required for the claim, or equivalents thereof. For the remaining units, the specification is devoid of an algorithm to perform the claimed functions.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because the broadest reasonable interpretation (BRI) of the "computer program stored in a computer-readable recording medium" encompasses signals per se. The specification does not include a special definition, nor does it limit the media to only non-transitory media. Section 2106.03(I) of the MPEP states that claims directed to transitory forms of signal transmission (signals per se) are not directed to any of the statutory categories of invention. Therefore, the claim is not directed to a patent eligible category of invention. Examiner suggests amending the claim to recite "a non-transitory computer-readable medium" to overcome this rejection.
Accordingly, Claim 17 fails to recite statutory subject matter under 35 U.S.C. 101. Claims 1-17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea of a mental process or mathematical concept without significantly more.
Step 1: Claims 1-9 are directed to a computer-implemented method, which is a process and is a statutory category of invention. Claims 10-16 are directed to an apparatus, which is a machine and is a statutory category of invention. Claim 17 is directed to a computer program stored in a computer-readable recording medium, which is directed to signals per se as discussed above. Even though this is not a statutory category of invention, in the interest of compact prosecution, the analysis of claim 17 will continue below. Therefore, claims 1-9 and 10-16 are directed to patent eligible categories of invention, and claim 17 is not directed to a statutory category of invention.
Claim 1
Step 2A, Prong 1: Independent claims 1 and 10 similarly recite an abstract idea because the claims recite Mental Processes, i.e., concepts performed in the human mind or with the aid of pencil and paper, or, in the alternative, cover Mathematical Concepts, including mathematical relationships, mathematical formulas or equations, or mathematical calculations.
The limitation "recognizing at least one actual surgical tool included in an actual surgical video for each preset frame in the actual surgical video based on a first artificial intelligence model" covers mental processes, including identifying at least one actual surgical tool included in an actual surgical video for each preset frame in the actual surgical video based on a first artificial intelligence model, as described in [0009] of the specification.
The limitation "performing correspondence matching on at least one identical portion of the actual surgical tool and at least one virtual reality-based virtual surgical tool corresponding to the actual surgical tool" covers mental processes, including comparing at least one identical portion of the actual surgical tool for correspondence, as described in [0019] of the specification.
The limitation "performing calibration such that the virtual surgical tool corresponds to a position of the actual surgical tool according to a result of the correspondence matching" covers mental processes, including calibrating a dataset that corresponds to a position of the actual surgical tool based on the correspondence matching results, as described in [0009] of the specification.
The limitation "calculating calibrated coordinate values of the virtual surgical tool" covers mathematical concepts, using mathematical formulas, equations, or calculations, including calculating first coordinate values (X_v1, Y_v1, and Z_v1) of the virtual surgical tool in the virtual reality-based 1_1-th frame, as described in [0073] of the specification and [Figure 7] of the drawings.
The limitation "calculating a plurality of coordinate values for positions which the virtual surgical tool is calibrated for each preset frame by repeatedly performing steps a) to d)" covers mathematical concepts, including calculating, using mathematical formulas, equations, or calculations, a plurality of coordinate values for positions at which the virtual surgical tool is calibrated for each preset frame by repeatedly performing steps a) to d), as described in [0075] of the specification and [labels S701 to S705 of Figure 7] of the drawings.
The limitation "generating a plurality of pieces of log information based on a difference between the calculated plurality of coordinate values" covers mental processes, including evaluating a dataset that considers pieces of log information and making a determination on the differences between the calculated plurality of coordinate values, as described in [0076] of the specification and [label S706 of Figure 7] of the drawings.
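The claims and cited paragraphs do not recite a particular metric for this "difference"; purely for illustration (an assumption, not a limitation recited in the claims), the difference between the calibrated coordinate values of consecutive preset frames could be expressed as a Euclidean distance:

$$\Delta_k = \sqrt{\left(X_{v(k+1)} - X_{vk}\right)^2 + \left(Y_{v(k+1)} - Y_{vk}\right)^2 + \left(Z_{v(k+1)} - Z_{vk}\right)^2}$$

where (X_vk, Y_vk, Z_vk) denotes the calibrated coordinate values of the virtual surgical tool in the k-th preset frame. Any such formulation is a mathematical calculation of the kind identified above.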
Thus, the claims recite the abstract idea of a mental process performed in the human mind, or with the aid of pencil and paper.
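For illustration only, the recited steps reduce to the following minimal sketch (all names and behaviors below are hypothetical placeholders; neither the claims nor the specification recites any particular implementation):

```python
from dataclasses import dataclass
from math import dist

# Hypothetical stand-ins for the claimed steps; illustrative only.
@dataclass
class Tool:
    position: tuple  # (x, y, z) coordinates of a recognized tool

def recognize_tool(frame):
    """Stand-in for the first artificial intelligence model (step a)."""
    return Tool(position=frame["tool_position"])

def calibrate(virtual_position, actual_position):
    """Stand-in for correspondence matching and calibration (steps b and c):
    move the virtual tool to correspond to the actual tool's position."""
    return actual_position

def generate_log(frames):
    coordinates, log = [], []
    virtual_position = (0.0, 0.0, 0.0)
    for frame in frames:                      # each preset frame
        actual = recognize_tool(frame)        # a) recognize the actual tool
        virtual_position = calibrate(virtual_position, actual.position)  # b), c)
        coordinates.append(virtual_position)  # d) calibrated coordinate values
        if len(coordinates) >= 2:             # e) log the frame-to-frame difference
            log.append(dist(coordinates[-2], coordinates[-1]))
    return log

frames = [{"tool_position": (0.0, 0.0, 0.0)}, {"tool_position": (1.0, 2.0, 2.0)}]
print(generate_log(frames))  # [3.0]
```

Each operation in this sketch is an observation, comparison, or calculation of the kind that could be performed mentally or with pencil and paper, consistent with the analysis above.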
Dependent claims 2-9 and 11-17 further narrow the abstract ideas identified in the independent claims. See the analysis below.
Step 2A, Prong 2: The judicial exception is not integrated into a practical application. The claims recite the additional elements "a video acquisition unit," "a memory," and "a processor" in independent claim 10, "a stereoscopic camera" in dependent claims 7 and 15, and "a computer program" and "a computer-readable recording medium" in claim 17. These limitations do not integrate the judicial exception into a practical application because they do nothing more than generally link the use of the judicial exception to a particular technological environment. See MPEP 2106.05(h). Alternatively, these additional elements merely use a computer device as a tool to perform the abstract idea. (MPEP 2106.05(f)).
The limitation of executing the method for providing a virtual reality-based surgical environment in the apparatus in cooperation with a computer, as recited in claim 17 with reference to the method of claim 1, can be viewed as merely using a computer as a tool to perform the abstract idea. (MPEP 2106.05(f)). Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a mental process or a mathematical concept), does not integrate a judicial exception into a practical application. (MPEP 2106.05(f)(2)).
Dependent claims 2-9 and 11-17 further narrow the abstract ideas identified in the independent claims and do not introduce further additional elements for consideration beyond those addressed above. The additional elements have been considered both individually and as an ordered combination to determine whether they integrate the exception into a practical application. Therefore, the dependent claims do not integrate the claimed invention into a practical application.
Step 2B: The claims do not include additional elements that amount to significantly more than the judicial exception. The claims recite the additional elements "a video acquisition unit," "a memory," and "a processor" in independent claim 10, "a stereoscopic camera" in dependent claims 7 and 15, and "a computer program" and "a computer-readable recording medium" in claim 17. These limitations do not amount to significantly more because they do nothing more than generally link the use of the judicial exception to a particular technological environment. See MPEP 2106.05(h). Alternatively, these additional elements merely use a computer device as a tool to perform the abstract idea. (MPEP 2106.05(f)).
The limitation of executing the method for providing a virtual reality-based surgical environment in the apparatus in cooperation with a computer, as recited in claim 17 with reference to the method of claim 1, can be viewed as merely using a computer as a tool to perform the abstract idea. (MPEP 2106.05(f)). Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data), or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a mental process or a mathematical concept), does not amount to significantly more. (MPEP 2106.05(f)(2)).
Dependent claims 2-9 and 11-17 further narrow the abstract ideas identified in the independent claims and do not amount to significantly more. The additional elements have been considered both individually and as an ordered combination to determine whether they amount to significantly more. Therefore, the dependent claims do not amount to significantly more.
Therefore, the claims as a whole, with the additional elements considered both alone and in combination, do not include anything sufficient to amount to significantly more than the judicial exception.
As stated in Section I.B. of the December 16, 2014 101 Examination Guidelines, “[t]o be patent-eligible, a claim that is directed to a judicial exception must include additional features to ensure that the claim describes a process or product that applies the exception in a meaningful way, such that it is more than a drafting effort designed to monopolize the exception.”
The dependent claims recite the same abstract ideas as the independent claims and merely incorporate additional details that narrow the abstract ideas, failing to add significantly more to the claims.
Dependent claims 2 and 11 similarly recite "setting at least one region of the actual surgical tool as a first reference point; setting a region identical to the at least one region of the actual surgical tool in the virtual surgical tool as a second reference point by using a semantic correspondence matching technique; and performing the correspondence matching using the first and second reference points," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claims 3 and 12 similarly recite “wherein the generating of the log information includes sequentially generating pieces of log information about a difference between a previously-calculated coordinate value and a subsequently-calculated coordinate value whenever the plurality of coordinate values are calculated,” which further narrows the abstract idea identified in the independent claim, which is directed to “Mental Processes.”
Dependent claim 4 recites "accumulating and storing the sequentially generated pieces of log information," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claim 5 recites "wherein the accumulating and storing of the sequentially generated pieces of log information includes accumulating and storing currently generated log information only when a difference between the sequentially generated pieces of log information is equal to or greater than a preset difference," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
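Illustratively, the thresholded accumulation recited in claim 5 amounts to a simple comparison gate. The sketch below uses hypothetical names and scalar log entries for simplicity; no particular implementation is recited in the claim:

```python
def accumulate(log_entries, preset_difference):
    """Store a newly generated entry only when the difference between
    sequentially generated entries meets the preset threshold (illustrative)."""
    stored, previous = [], None
    for entry in log_entries:
        if previous is None or abs(entry - previous) >= preset_difference:
            stored.append(entry)  # accumulate and store the current log information
        previous = entry
    return stored

print(accumulate([0.1, 0.15, 0.9, 0.92, 2.0], preset_difference=0.5))  # [0.1, 0.9, 2.0]
```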
Dependent claims 6 and 14 similarly recite “predicting a movement of the actual surgical tool changed from a current frame to a next frame in the actual surgical video based on the accumulated and stored log information; and displaying a visual effect representing the predicted movement at a position corresponding to the movement on the current frame in the actual surgical video,” which further narrows the abstract idea identified in the independent claim, which is directed to “Mental Processes.”
Dependent claims 7 and 15 similarly recite "wherein the actual surgical video is taken through a stereoscopic camera and includes a three-dimensional depth value for an actual surgical object for each frame, and wherein the performing of the calibration includes rendering the virtual surgical tool as a three-dimensional object by assigning a corresponding three-dimensional depth value to the calibrated position of the virtual surgical tool," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claim 8 recites "wherein the calculating of the coordinate values includes including the three-dimensional depth value in coordinate values corresponding to the calibrated position of the virtual surgical tool after the three-dimensional depth value is assigned," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claim 9 recites "wherein the generating of the log information includes generating a plurality of pieces of log information to which the three-dimensional depth value is assigned based on a difference between a plurality of coordinate values calculated by including the three-dimensional depth value," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claim 13 recites "wherein the processor is configured to: accumulate and store the sequentially generated log information, wherein the processor is configured to: accumulate and store the currently generated log information only when a difference between the sequentially generated log information is equal to or greater than a preset difference when the log information is accumulated and stored," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes."
Dependent claim 16 recites "wherein the processor is configured to: include the three-dimensional depth value in coordinate values corresponding to the calibrated position of the virtual surgical tool after the three-dimensional depth value is assigned when the coordinate values are calculated, and generate a plurality of pieces of log information to which the three-dimensional depth value is assigned based on a difference between a plurality of coordinate values calculated by including the three-dimensional depth value when the log information is generated," which further narrows the abstract idea identified in the independent claim, which is directed to "Mental Processes" or, in the alternative, covers "Mathematical Concepts."
Accordingly, claims 1-17 are ineligible and rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without anything significantly more.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6-12, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over CHOW (US 20210015554 A1), herein CHOW, in view of ZHAO (US 8073528 B2), herein ZHAO.
Claim 1
Claim 1 is rejected because CHOW teaches recognizing at least one actual surgical tool included in an actual surgical video for each preset frame in the actual surgical video based on a first artificial intelligence model; CHOW ([Abstract] “The machine-learning techniques can be executed to train the machine-learning models (first artificial intelligence model) to recognize, classify, and interpret objects (surgical tools) within a live video feed (actual surgical video).”) See also CHOW ([0003] “The machine-learning model is trained, and thus, is configured to recognize patterns or classify objects within image frames (each preset frame in the actual surgical video) of the live video feed.”)
CHOW also teaches generating a plurality of pieces of log information based on a difference between the calculated plurality of coordinate values CHOW ([0049] "In some implementations, video streams from a previous surgical procedure can be processed (e.g., using image-segmentation) to identify, detect, and determine probabilities of a surgical procedure. The video streams can be annotated to include information relevant to different portions of the surgical procedure to generate surgical data structures (generating a plurality of pieces of log information). For example, a video stream from an endoscopic procedure can be segmented to identify surgical tools used during the procedure. A surgical data structure can be generated by using training data with pixel-level labels (i.e., full supervision) from the segmented endoscopic procedure video stream (generating a plurality of pieces of log information). In some implementations, generating a surgical data structure can be produced using other methods. For example, a video stream from an endoscopic procedure can be processed to detect instruments by using three different processes: identification (e.g., identifying which instrument is present in the image), bounding box regression (e.g., localizing each instrument in the image by finding a bounding box that encloses them) (based on a difference between the calculated plurality of coordinate values), and heat map regression (e.g., probability maps of where instruments might be present). This information can be compiled to generate a surgical data structure.")
CHOW does not explicitly teach performing correspondence matching on at least one identical portion of the actual surgical tool and at least one virtual reality-based virtual surgical tool corresponding to the actual surgical tool.
However, ZHAO teaches performing correspondence matching on at least one identical portion of the actual surgical tool and at least one virtual reality-based virtual surgical tool corresponding to the actual surgical tool ZHAO ([Column 29 | Rows 58-65] "A rigid sequence matching (correspondence matching) where the relative kinematics (actual surgical tool) within each sequence are perfectly known or identical across sequences in the ideal case may be employed where just one common motion parameter (at least one identical portion of the actual surgical tool) is used to estimate for all pairs between two sequences (actual surgical tool and one virtual reality-based surgical tool corresponding to the actual surgical tool) such as illustrated in FIG. 13A. The sequences 1301 and 1302 have identical motion relationship (relative kinematics) among the images 1301A to 1301E, and 1302A to 1302E.") See also ZHAO ([Figure 13].)
[Image: ZHAO Figure 13 Reference]
ZHAO also teaches performing calibration such that the virtual surgical tool corresponds to a position of the actual surgical tool according to a result of the correspondence matching ZHAO ([Column 31, Line 67 to Column 32, Lines 1-18] "The two dimensional ultrasound images 1711A may be translated from the two dimensional coordinate system 1703 into a camera coordinate system 1701. The translated ultrasound images may then be overlaid onto video images of the surgical site 1700 displayed by the stereo viewer 312, such as illustrated by the translated ultrasound images 1711B-1711D in the surgical site 1700 illustrated in FIG. 17. Tool tracking may be used to flagpole ultrasound by (1) determining the transformation (performing the calibration) of the ultrasound (virtual surgical tool) images 1711A from the two dimensional coordinate system 1703 to the local ultrasound coordinate system 1702 (corresponds to a position of the actual surgical tool) in response to ultrasound calibration (according to a result of the correspondence matching); (2) at the transducer 1710A, determining the transformation from the ultrasound transducer coordinate system 1702 to the camera coordinate system 1701 by using tool tracking; and then (3) cascading the transformations together to overlay the ultrasound image in the camera coordinate system 1701 onto the surgical site as illustrated by image 1711B.") See also ZHAO ([Figure 17].)
[Image: ZHAO Figure 17 Reference]
ZHAO also teaches calculating calibrated coordinate values of the virtual surgical tool ZHAO ([Column 3 | Lines 7-13] "Known kinematics transformation can be applied to the pose correction to achieve improved pose in any related coordinate system. A camera coordinate system is a coordinate system based on a chosen camera (for example, (X', Y', Z') in FIG. 12B), or a common reference coordinate system for multiple cameras (for example, (Xs, Ys, Zs) in FIG. 12B).") See also ZHAO ([Column 9 | Lines 6-15] "A marker 502 on the robotic instrument 101 may be used to assist in the tool tracking if visible or otherwise sensible. In one embodiment of the invention, the marker 502 is a painted marker minimally altering the robotic instruments. In other embodiments of the invention, markerless tool tracking is provided with no modification of the robotic instruments. For example, natural image features of a robotic tool may be detected as natural markers and/or image appearance of the tools and the CAD model of tools may be used to provide tool tracking.") See also ZHAO ([Figure 5A], which illustrates a robotic instrument (virtual surgical tool) that tracks (calculates) images via a camera coordinate system (calibrated coordinate values of images).)
[Image: ZHAO Figure 5A Reference]
ZHAO also teaches calculating a plurality of coordinate values for positions which the virtual surgical tool is calibrated for each preset frame by repeatedly performing steps a) to d) ZHAO ([Column 22 | Lines 37-54] "In such a case, the observation equation is linear for the vision part and we can construct the observation covariance matrix Co,v,i. For example, we have the following covariance matrix for the case of parallel camera setup (FIG. 12B): where the view-geometry variance matrices Co,v,i (for each 3D point Xs,i) are related to 1) the uncertainty (standard deviation) of matching stereo images, 2) the inverse of image resolution (for example, high-definition camera offers better accuracy than standard-definition camera), and 3) the square of the true values of Xs,i.") See also ZHAO ([Figure 12B] and [Equation 13].)
[Image: ZHAO Figure 12B Reference]
[Image: ZHAO Equation 13 Reference]
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the teachings of ZHAO with CHOW, as the references deal with a method of providing a surgical environment based on virtual reality, and more particularly, with an apparatus and method for providing a surgical environment based on virtual reality by figuring out the movement of a surgical tool in a surgical video. ZHAO would modify CHOW by performing correspondence matching on at least one identical portion of the actual surgical tool and at least one virtual reality-based virtual surgical tool corresponding to the actual surgical tool. The benefits of doing so are improved visualization for easier on-line diagnostics and improved localization for reliable and precise surgery. ZHAO ([Column 31 | Lines 26-28].) Accordingly, claim 1 is rejected based on the combination of these references.
Claim 2
Claim 2 is rejected because the combination of CHOW and ZHAO teaches claim 1. CHOW does not explicitly teach wherein the performing of the correspondence matching includes setting at least one region of the actual surgical tool as a first reference point; setting a region identical to the at least one region of the actual surgical tool in the virtual surgical tool as a second reference point by using a semantic correspondence matching technique; and performing the correspondence matching using the first and second reference points.
However, ZHAO teaches wherein the performing of the correspondence matching includes setting at least one region of the actual surgical tool as a first reference point; setting a region identical to the at least one region of the actual surgical tool in the virtual surgical tool as a second reference point by using a semantic correspondence matching technique; and performing the correspondence matching using the first and second reference points ZHAO ([Column 32 | Lines 31-45] "As shown in FIG. 16, tool tracking may be used to overlay one or more drop virtual point/marks 1650A-1650B (first and second reference points) on images of the tissue surface 1600 in the surgical site (one region) by using one or multiple tools 1610L, 1610R (for example, a surgical tool or an ultrasound tool) to touch point of interest (identical region). For example, in telestration operation (using a semantic correspondence matching technique), teaching surgeons can use tools (actual surgical tools in the virtual surgical tool) to draw virtual marks (performing the correspondence matching using the first and second reference points) to illustrate areas of interest to remote student surgeons on an external display. Another example is that a surgeon can use one type of tracked tool (e.g., an ultrasound tool) to draw marks to indicate regions of interest and then use a different type of tracked tool (e.g., a cautery tool) to operate or perform a surgical or other medical procedure in selected regions of interest.") See also ZHAO ([Figure 16].)
[Image: ZHAO Figure 16 Reference]
See also ZHAO ([Column 17 | Lines 30-35] "The tool pose may be represented by the position P=[Px, Py, Pz]^T of a chosen reference point 931R (setting at least one region of the actual tool as a first reference point) (e.g., the control point before the tool wrist) and the orientation Q of its local coordinate system 920 originated in the reference point 931R with respect to the camera coordinate system 921.") See also ZHAO ([Figure 9C].)
[Image: ZHAO Figure 9C Reference]
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the teachings of ZHAO with CHOW, as the references deal with a method of providing a surgical environment based on virtual reality, and more particularly, with an apparatus and method for providing a surgical environment based on virtual reality by figuring out the movement of a surgical tool in a surgical video. ZHAO would modify CHOW wherein the performing of the correspondence matching includes setting at least one region of the actual surgical tool as a first reference point; setting a region identical to the at least one region of the actual surgical tool in the virtual surgical tool as a second reference point by using a semantic correspondence matching technique; and performing the correspondence matching using the first and second reference points. The benefits of doing so are improved visualization for easier on-line diagnostics and improved localization for reliable and precise surgery. ZHAO ([Column 31 | Lines 26-28].) Accordingly, claim 2 is rejected based on the combination of these references.
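As general background on semantic correspondence matching of reference points (a generic illustration, not the specific method of either reference): given a feature descriptor for a region of the actual tool (the first reference point), the matching region of the virtual tool (the second reference point) can be selected as the nearest neighbour in feature space. All names and descriptor values below are hypothetical:

```python
import numpy as np

def find_second_reference(first_descriptor, virtual_region_descriptors):
    """Pick the virtual-tool region whose descriptor best matches the first
    reference point's descriptor (nearest neighbour in feature space)."""
    distances = [np.linalg.norm(first_descriptor - d) for d in virtual_region_descriptors]
    return int(np.argmin(distances))

actual_tip = np.array([0.9, 0.1, 0.4])            # descriptor of the first reference point
virtual_regions = [np.array([0.1, 0.8, 0.3]),     # e.g., a shaft region
                   np.array([0.88, 0.12, 0.41])]  # e.g., a tip region
print(find_second_reference(actual_tip, virtual_regions))  # 1 (the tip region matches)
```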
Claim 3
Claim 3 is rejected because the combination of CHOW and ZHAO teaches claim 2. CHOW does not explicitly teach wherein the generating of the log information includes sequentially generating pieces of log information about a difference between a previously-calculated coordinate value and a subsequently-calculated coordinate value whenever the plurality of coordinate values are calculated.
However, ZHAO teaches wherein the generating of the log information includes sequentially generating pieces of log information about a difference between a previously-calculated coordinate value and a subsequently-calculated coordinate value whenever the plurality of coordinate values are calculated ZHAO ([Column 14 | Lines 15-30] “Consider now for example, pure image segmentation 710, i.e., segmentation of tools from a 2D image only, is a challenging task when the background is cluttered and/or objects are occluded. To handle this particular technical challenge, prior information (previously-calculated coordinate values) is explored as a known robotic instrument is being tracked. More specifically, model based synthesis techniques 722 may be used. With model based synthesis 722, a CAD model of a robotic instrument may be used to render a clean tool image (sequentially generating pieces of log information) as a pattern to match against or search within a limited region constrained by the pose information of tool. As a result, pure image segmentation from the real images is avoided. Because the states of all robotic instruments are tracked, mutual occlusions of all these robotic instruments can be calculated thereby making image matching more reliable (subsequently-calculated coordinate values).”) See also ZHAO ([Figure 7].)
[Image: ZHAO Figure 7 Reference]
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the teachings of ZHAO with CHOW, as the references deal with a method of providing a surgical environment based on virtual reality, and more particularly, with an apparatus and method for providing a surgical environment based on virtual reality by figuring out the movement of a surgical tool in a surgical video. ZHAO would modify CHOW wherein the generating of the log information includes sequentially generating pieces of log information about a difference between a previously-calculated coordinate value and a subsequently-calculated coordinate value whenever the plurality of coordinate values are calculated. The benefits of doing so are improved visualization for easier on-line diagnostics and improved localization for reliable and precise surgery. ZHAO ([Column 31 | Lines 26-28].) Accordingly, claim 3 is rejected based on the combination of these references.
Claim 4
Claim 4 is rejected because the combination of CHOW and ZHAO teaches claim 3. CHOW also teaches accumulating and storing the sequentially generated pieces of log information CHOW ([0005] "In some implementations, the computer-vision processing system can receive (accumulate) the one or more data streams (sequentially generated pieces of log information) and input (store) the received (accumulating) data stream(s) into the machine-learning model. The computer-vision processing system can train the machine-learning model using machine-learning or artificial intelligence techniques (described in greater detail herein). For example, the computer-vision processing system can store a data set of sample images of surgical tools. The machine-learning or artificial intelligence techniques can be applied to the data set of sample images to train the machine-learning model to recognize patterns and classify objects within the images. The trained machine-learning model can then be used to generate an output that, when received at a procedural control system, can cause one or more surgical tools to be controlled.") Accordingly, claim 4 is rejected based on the combination of these references.
Claim 6
Claim 6 is rejected because the combination of CHOW and ZHAO teaches claim 4. CHOW also teaches predicting a movement of the actual surgical tool changed from a current frame to a next frame in the actual surgical video based on the accumulated and stored log information CHOW ([0036] "The trained machine-learning model can then be used in real-time (current frame to a next frame in the actual surgical video) to process one or more data streams (e.g., video streams, audio streams, image data, haptic feedback streams from a laparoscopic surgical tool, etc.). The processing can include (for example) recognizing and classifying one or more features from the one or more data streams (based on the accumulated and stored log information), which can be used to interpret whether or not a surgical tool is within the field of view of the camera. Further, the feature(s) can then be used to identify a presence, position and/or use of one or more objects (e.g., surgical tool or anatomical structure), identify a stage or phase within a workflow (e.g., as represented via a surgical data structure), predict a future stage (predicting a movement of the actual surgical tool changed) within a workflow, and other suitable features.")
See also CHOW ([0047] "State detector 150 can use the output from execution of the configured machine-learning model to identify a state within a surgical procedure that is then estimated (predict a movement of the actual surgical tool changed) to correspond with the processed image data (from a current frame to a next frame in the actual surgical video). Procedural tracking data structure 155 can identify a set of potential states (predict a movement of the actual surgical tool changed) that can correspond to part of a performance of a specific type of procedure (from a current frame to a next frame in the actual surgical video). Different procedural data structures (e.g., and different machine-learning-model parameters and/or hyperparameters) may be associated with different types of procedures. The data structure can include a set of nodes, with each node corresponding to a potential state. The data structure can include directional connections between nodes that indicate (via the direction) an expected order (predict a movement of the actual surgical tool changed) during which the states will be encountered (from a current frame to a next frame in the actual surgical video) throughout an iteration of the procedure. The data structure may include one or more branching nodes that feeds to multiple next nodes and/or can include one or more points of divergence and/or convergence between the nodes. In some instances, a procedural state indicates a procedural action (e.g., surgical action) that is being performed or has been performed and/or indicates a combination of actions that have been performed.")
CHOW also teaches displaying a visual effect representing the predicted movement at a position corresponding to the movement on the current frame in the actual surgical video CHOW ([0052] "Output generator 160 can also include an augmentor 175 that generates or retrieves one or more graphics and/or text (a visual effect) to be visually presented (displaying) on (e.g., overlaid on) or near (e.g., presented underneath or adjacent to) (representing the predicted movement to a position) real-time capture (on the current frame in the actual surgical video) of a procedure. Augmentor 175 can further identify where the graphics and/or text (visual effect) are to be presented (displaying) (e.g., within a specified size of a display). In some instances, a defined part of a field of view (visual effect) is designated as being a display portion (displaying) to include augmented data (predicted movement at a position corresponding to the movement on the current frame in the actual surgical video). In some instances, the position (predicted movement at a position) of the graphics and/or text (visual effect) is defined (displaying) so as not to obscure view of an important part of an environment for the surgery and/or to overlay particular graphics (e.g., of a tool) with the corresponding real-world representation (corresponding to the movement on the current frame in the actual surgical video).") Accordingly, claim 6 is rejected based on the combination of these references.
Claim 7
Claim 7 is rejected because the combination of CHOW and ZHAO teaches claim 1. CHOW does not explicitly teach wherein the actual surgical video is taken through a stereoscopic camera and includes a three-dimensional depth value for an actual surgical object for each frame, and wherein the performing of the calibration includes rendering the virtual surgical tool as a three-dimensional object by assigning a corresponding three-dimensional depth value to the calibrated position of the virtual surgical tool.
However, ZHAO teaches wherein the actual surgical video is taken through a stereoscopic camera and includes a three-dimensional depth value for an actual surgical object for each frame ZHAO ([Column 5 | Lines 58-65] “The stereo viewer 312 (stereoscopic camera) has two displays where stereo three dimensional images (includes a three-dimensional depth value) of the surgical site (for an actual surgical object for each frame) may be viewed to perform minimally invasive surgery. When using the master control console, the operator O typically sits in a chair, moves his or her head into alignment with the stereo viewer 312 (stereoscopic camera) to view the three-dimensional annotated images (three-dimensional depth value) of the surgical site (actual surgical object for each frame).”)
ZHAO also teaches wherein the performing of the calibration includes rendering the virtual surgical tool as a three-dimensional object by assigning a corresponding three-dimensional depth value to the calibrated position of the virtual surgical tool ZHAO ([Column 5 | Lines 6-23] "Referring now to FIG. 2, the stereo endoscopic camera 101C includes an endoscope 202 for insertion into a patient, a camera head 204, a left image forming device (e.g., a charge coupled device (CCD)) 206L, a right image forming device 206R, a left camera control unit (CCU) 208L, and a right camera control unit (CCU) 208R coupled together as shown. The stereo endoscopic camera 101C generates (renders the virtual surgical tool) a left video channel 220L and a right video channel 220R of frames of images (three-dimensional depth values) of the surgical site (object) coupled to a stereo display device 164 through a video board 218. To initially synchronize left and right frames of data (three-dimensional depth values), a lock reference signal (assigning a corresponding three-dimensional depth value) is coupled between the left and right camera control units 208L, 208R. The right camera control unit generates (renders the virtual surgical tool) the lock signal (assigning the three-dimensional depth value) that is coupled to the left camera control unit to synchronize (calibrated position of the virtual surgical tool) the left video channel to the right video channel. However, the left camera control unit 208L may also generate (render) the lock reference signal (three-dimensional depth values) so that the right video channel synchronizes (calibrated position of the virtual surgical tool) to the left video channel.") See also ZHAO ([Figure 2] and [Figure 3].)
[Image: ZHAO Figure 2 Reference]
[Image: ZHAO Figure 3 Reference]
It would have been obvious to one of ordinary skill in the art, before the effective filing date, to combine the teachings of ZHAO with CHOW, as the references deal with a method of providing a surgical environment based on virtual reality, and more particularly, with an apparatus and method for providing a surgical environment based on virtual reality by figuring out the movement of a surgical tool in a surgical video. ZHAO would modify CHOW wherein the performing of the calibration includes rendering the virtual surgical tool as a three-dimensional object by assigning a corresponding three-dimensional depth value to the calibrated position of the virtual surgical tool. The benefits of doing so are improved visualization for easier on-line diagnostics and improved localization for reliable and precise surgery. ZHAO ([Column 31 | Lines 26-28].) Accordingly, claim 7 is rejected based on the combination of these references.
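As general background on the "three-dimensional depth value" obtainable from a stereoscopic camera (standard stereo geometry, not a formula recited by CHOW or ZHAO): for a parallel camera setup such as ZHAO's FIG. 12B, the depth of a point follows from the disparity between its matched left and right image locations,

$$Z = \frac{f\,B}{d},$$

where f is the focal length, B is the baseline between the two cameras, and d is the disparity. A depth value obtained this way can then be assigned to a rendered object at the corresponding image position.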
Claim 8
Claim 8 is rejected because the combination of CHOW and ZHAO teaches claim 7. CHOW does not explicitly teach wherein the calculating of the coordinate values includes including the three-dimensional depth value in coordinate values corresponding to the calibrated position of the virtual surgical tool after the three-dimensional depth value is assigned.
However, ZHAO teaches wherein the calculating of the coordinate values includes including the three-dimensional depth value in coordinate values corresponding to the calibrated position of the virtual surgical tool after the three-dimensional depth value is assigned ZHAO ([Column 7 | Lines 1-20] "In the stereo viewer, three dimensional maps (three-dimensional depth value) (a depth map with respect to a camera coordinate system (coordinate values) or equivalently a surface map of an object with respect to its local coordinate system is a plurality of three-dimensional points to illustrate a surface in three dimensions) of the anatomy, derived from alternative imaging modalities (e.g., CT scan, XRAY, or MRI) (corresponding to the calibrated position of the virtual surgical tool), may also be provided to a surgeon by overlaying (calculating the coordinate values) them onto the video images of the surgical site. In the right view finder 401R, a right image 410R rendered from a three dimensional map such as from a CT scan, may be merged onto or overlaid on the right image 400R (after the three-dimensional depth value is assigned) being displayed by the display device 402R. In the left viewfinder 401L, a rendered left image 410L is merged into or overlaid on the left image 400L (after the three-dimensional depth value is assigned) of the surgical site provided by the display device 402L. In this manner, a stereo image may be displayed to map out the site to the operator O in the control of the robotic instruments in the surgical site, augmenting the operator's view of the surgical site (corresponding to the calibrated position of the virtual surgical tool after the three-dimensional depth value is assigned).") Accordingly, claim 8 is rejected based on the combination of these references.