DETAILED ACTION
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 12, 13, 19 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 12 and 19 require, “the video segment is recognized by the vision transformer, and extracting the features comprises doing so by EfficientNetV2 featurizer”. It is not clear from the claims what may or may not be included within the metes and bounds of the term “EfficientNetV2 featurizer”. Examiner suggests amending the claims to provide clarifying details to define the metes and bounds of this process.
Claim 8 recites, “collect statistics on a plurality of instances of the detected presence of the surgical instrument”. The claim lacks clear antecedent basis. It appears the term “respective” should be added to refer to “the respective surgical instrument” of claim 1.
Claim 9 recites, “system of claim 1, wherein the one or more processors are further configured to filter the one or more video segments of detected surgical instrument based on filtering rules set by a human actor.” It appears the term “detected surgical instrument” should recite “the detected presence of the respective surgical instrument” or the like in order to clear up the deficient antecedent basis here.
Claim Objections
Claim 17 objected to because of the following informalities: The claim recites, “a respective surgical video in which a respective surgeon is operation”. It appears the term “operation” should be “operating”.
Claim 18 objected to because of the following informalities: It appears the claim should be amended as follows, “instructions that configure a computing device to recognize instruments”, with the underlined term added.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 11 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf (US Pat. No. 10,729,502).
Regarding claim 1, Wolf discloses a system comprising: (Wolf teaches a system for generating surgical summary footage via neural networks grouping frames based on video content. Users then navigate the surgical summary footage with a user interface, See Abstract.)
one or more processors and a memory storing instructions executed by the one or more processors, configured to: (Col. 8, ll. 25-55)
extract a plurality of features including one or more surgical instrument types and a presence of a plurality of surgical instruments, from a surgical video, on a frame by frame basis; and (Col. 55, ¶ 2 and col. 56, ¶ 3 teach surgical instrument detection of a variety of types on a frame by frame basis. Also see Col. 150, last paragraph, col. 77, ¶ 3 and col. 22, ¶ 2.)
for a respective surgical instrument in the plurality of surgical instruments, analyze the surgical video based on the extracted features to recognize one or more video segments, each recognized video segment including a detected presence of the respective surgical instrument, (Col. 55, ¶ 2 and col. 56, ¶ 3 teach detecting video segments and grouping frames based on surgical instrument presence detection. As above, also see Col. 150, last paragraph, col. 77, ¶ 3 and col. 22, ¶ 2.)
wherein the one or more video segments are recognized by a multi-stage temporal convolution network (MS-TCN) or a natural language processing (NLP) module. (Col. 132 ¶ 3 teaches using natural language processing in video segment processing. Also see Col. 36 last paragraph and Col. 37 last paragraph which disclose using a variety of neural network types to perform natural language processing by generating phase tags to assign language labels to surgical phases.)
Wolf does not expressly disclose that all of its above-cited teachings (generating surgical summary footage via neural networks and grouping frames based on the video content) occur in the same embodiment. That is, although the reference clearly discloses these functions, it does not expressly state that all of the details are found in the same embodiment; instead, the reference presents some of the individual detailed disclosures as ‘according to some embodiments.’ It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the various teachings to provide a single system capable of the variety of tasks which are disclosed. In view of these teachings, this cannot be considered a non-obvious improvement over the prior art. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
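For illustration only (this sketch is the editor's, not Wolf's disclosure or the applicant's implementation), the claimed operation of recognizing video segments that each include a detected presence of a respective surgical instrument can be sketched as grouping consecutive frames with a positive per-frame detection. The function name and toy detection data below are hypothetical:

```python
def segments_for_instrument(frame_detections, instrument):
    """Group consecutive frames in which `instrument` is detected into
    inclusive (start_frame, end_frame) segments."""
    segments = []
    start = None
    for i, detected in enumerate(frame_detections):
        if instrument in detected and start is None:
            start = i  # segment opens at first detected frame
        elif instrument not in detected and start is not None:
            segments.append((start, i - 1))  # segment closes before a miss
            start = None
    if start is not None:  # segment runs to the final frame
        segments.append((start, len(frame_detections) - 1))
    return segments

# Hypothetical per-frame detections (sets of instrument labels per frame).
frames = [{"grasper"}, {"grasper", "scissors"}, set(), {"scissors"}, {"scissors"}]
print(segments_for_instrument(frames, "grasper"))   # [(0, 1)]
print(segments_for_instrument(frames, "scissors"))  # [(1, 1), (3, 4)]
```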
Regarding claim 2, the above combination discloses the system of claim 1, wherein the NLP module uses the one or more processors to perform spatial-temporal feature learning. (See Col. 36 last paragraph and Col. 37 last paragraph which disclose using a variety of neural network types, performing spatial-temporal video feature learning, to accomplish natural language processing by generating phase tags to assign language labels to surgical phases.)
Regarding claim 3, the above combination discloses the system of claim 1, wherein the NLP module is based on a transformer model. (See Col. 36 last paragraph and Col. 37 last paragraph which disclose using a variety of neural network types including with transformer functions to perform natural language processing by generating phase tags to assign language labels to surgical phases. Also see Col. 12, last paragraph.)
Regarding claim 4, the above combination discloses the system of claim 3, wherein the transformer model includes an encoder network and a decoder network. (Col. 36 last paragraph teaches using an autoencoder neural network architecture which includes an encoder and decoder.)
Regarding claim 5, the above combination discloses the system of claim 1, wherein the one or more processors are further configured to present a surgical instrument navigation bar illustrating a timeline of usage for the respective surgical instrument detected in the surgical video. (Col. 19, ¶ 2 and col. 22, ¶ 2 teach a surgical timeline navigation bar for surgical events including surgical instrument usage. Also see Fig. 4.)
Regarding claim 6, the above combination discloses the system of claim 1, wherein the one or more processors are further configured to facilitate a search interface where responsive to input keywords, video segments matching the input keywords are presented. (Col. 42, ¶ 2 and Fig. 7 teach a search interface responsive to input keywords to pull up video segments such as surgical phases.)
Regarding claim 7, the above combination discloses the system of claim 6, wherein the input keywords include surgical procedure type, surgical steps, surgical events, and/or surgical instrument types and presence. (Col. 42, ¶ 2 and Fig. 7 teach a search interface responsive to input keywords such as surgical steps to pull up video segments.)
Regarding claim 8, the above combination discloses the system of claim 1, wherein the one or more processors are further configured to: collect statistics on a plurality of instances of the detected presence of the surgical instrument where each instance is from a respective surgical video in which a respective surgeon is operating and present the collected statistics to users. (Col. 49, ¶ 3 and col. 114 ¶ 3.)
Regarding claim 9, the above combination discloses the system of claim 1, wherein the one or more processors are further configured to filter the one or more video segments of detected surgical instrument based on filtering rules set by a human actor. (Col. 42, ¶ 2 and Fig. 7 teach a search interface responsive to input keywords such as surgical instruments and steps as well as other filtering to pull up video segments.)
Claims 11 and 14-17 are the method claims corresponding to the system of claims 1, 3, and 5-8. The system necessarily requires method steps. Remaining limitations are rejected similarly. See detailed analysis above.
Claim 18 is the ‘article of manufacture comprising memory’ claim corresponding to claims 1 and 3. See Wolf Col. 8, ll. 25-55 regarding the teaching for memory. Remaining limitations are rejected similarly. See detailed analysis above.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Wolf (US Pat. No. 10,729,502) in view of Zhang (“SWNet: Surgical Workflow Recognition with Deep Convolutional Network”).
Regarding claim 10, the above combination discloses the system of claim 1, wherein the one or more processors are further configured to filter the one or more video segments of the detected surgical instrument. (See rejection of claim 1.)
In the field of surgical video workflow recognition Zhang teaches using a prior knowledge noise filtering (PKNF) algorithm. (Zhang pg 2, last paragraph, “We concatenate the extracted features to get the full video features and utilize MS-TCN to achieve initial surgical phase segmentation for the full surgical video. We apply the Prior Knowledge Noise Filtering algorithm to the initial surgical phase segmentation results to get the final prediction results.” Prior Knowledge Noise Filtering algorithm improves offline recognition results to filter out incorrect predictions.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Wolf’s surgical video workflow recognition with Zhang’s surgical video workflow recognition. Wolf teaches a system for generating surgical summary footage via neural networks grouping frames based on video content. Users then navigate the surgical summary footage with a user interface for filtering the video content based on surgical instrumentation. Zhang’s Prior Knowledge Noise Filtering algorithm improves offline recognition results to filter out incorrect predictions. The combination constitutes the repeatable and predictable result of simply applying Zhang’s surgical video filtering step in the way in which it was intended. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
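As an illustration of the kind of post-hoc filtering Zhang describes: Zhang’s Prior Knowledge Noise Filtering applies surgical prior knowledge to clean the initial MS-TCN phase predictions; the minimum-run-length rule below is a simplified stand-in for that prior knowledge, not Zhang’s actual algorithm, and all names and data are hypothetical:

```python
def filter_short_segments(labels, min_len=3):
    """Absorb prediction runs shorter than min_len into the preceding run,
    a crude stand-in for prior-knowledge noise filtering of a per-frame
    phase segmentation."""
    # Run-length encode the per-frame label sequence.
    runs = []
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    # Merge short runs into the previous run (keep a short leading run as-is).
    out = []
    for lab, n in runs:
        if n < min_len and out:
            out[-1][1] += n
        else:
            out.append([lab, n])
    # Expand back to a per-frame sequence.
    return [lab for lab, n in out for _ in range(n)]

# A 1-frame "dissect" blip inside "prep" is treated as noise and removed.
labels = ["prep"] * 5 + ["dissect"] * 1 + ["prep"] * 2 + ["dissect"] * 6
print(filter_short_segments(labels))
```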
Claims 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf (US Pat. No. 10,729,502) in view of Tan (“EfficientNetV2: Smaller Models and Faster Training”).
Regarding claim 12, the above combination discloses the method of claim 11, wherein the video segment is recognized by the vision transformer. (See rejection of claim 1.)
In the field of image analysis, Tan teaches that extracting the features comprises doing so by an EfficientNetV2 featurizer. (“EfficientNetV2, a new family of convolutional networks that have faster training speed and better parameter efficiency than previous models”, see Abstract and architecture at pg. 3, section 3.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Wolf’s convolutional neural network-based image analysis with Tan’s convolutional neural network-based image analysis. Wolf Col. 36 last paragraph and Col. 37 last paragraph disclose using a variety of neural network types to perform image processing, extract features and assign language labels. Tan teaches the EfficientNetV2 architecture as an improved convolutional neural network type. The combination constitutes the repeatable and predictable result of simply using Tan’s technique in the way in which it was intended. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
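For illustration of the pipeline shape at issue (a per-frame featurizer feeding a temporal model), the sketch below uses a toy stub in place of a real backbone; in practice an EfficientNetV2 network (for example, torchvision’s `efficientnet_v2_s` with its classifier head removed) would map each frame to a feature vector. All names and the toy 3-value features are the editor’s hypotheticals, not Tan’s or the applicant’s code:

```python
def featurize_frame(pixels):
    """Toy per-frame featurizer standing in for an EfficientNetV2 backbone:
    maps a frame's pixel values to a small feature vector."""
    return [sum(pixels) / len(pixels), min(pixels), max(pixels)]

def featurize_video(frames):
    """Stack per-frame features into a T x D sequence, the form consumed by
    a downstream temporal model such as an MS-TCN or a transformer."""
    return [featurize_frame(f) for f in frames]

# Two hypothetical "frames" given as flat pixel-value lists.
video = [[0, 128, 255], [10, 20, 30]]
feats = featurize_video(video)
print(feats[1])  # [20.0, 10, 30]
```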
Claim 19 is the ‘article of manufacture comprising memory’ claim corresponding to claims 1 and 12. See Wolf Col. 8, ll. 25-55 regarding the teaching for memory. Remaining limitations are rejected similarly. See detailed analysis above.
Claims 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wolf (US Pat. No. 10,729,502) in view of Tan (“EfficientNetV2: Smaller Models and Faster Training”) and Yi (“ASFormer: Transformer for Action Segmentation”).
Regarding claim 13, the above combination discloses the method of claim 12. (See rejection of claim 1.)
In the field of image analysis Yi teaches that the vision transformer is ASFormer. (Abstract, “we design an efficient Transformer-based model for action segmentation task, named ASFormer”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Wolf’s neural network-based image analysis with Yi’s neural network-based image analysis. Wolf Col. 36 last paragraph and Col. 37 last paragraph disclose using a variety of neural network types to perform image processing, extract features and assign language labels. Yi teaches the ASFormer architecture as an improved transformer neural network type for video processing. The combination constitutes the repeatable and predictable result of simply using Yi’s technique in the way in which it was intended. This cannot be considered a non-obvious improvement in view of the relevant prior art here. Using known engineering design, no “fundamental” operating principle of the teachings is changed; they continue to perform the same functions as originally taught prior to being combined.
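For illustration of the core mechanism Yi’s transformer applies to frame sequences: scaled dot-product attention weights every frame’s value vector by its similarity to a query. ASFormer itself adds local attention windows and hierarchical decoder refinement; the single-query sketch below shows only the basic operation, with all names and numbers hypothetical:

```python
import math

def attention(query, keys, values):
    """Single-query scaled dot-product attention over a sequence of
    key/value vectors (one pair per frame)."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

# Identical keys give uniform weights, so the output is the mean of values.
out = attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 2.0], [4.0, 4.0]])
print(out)  # [3.0, 3.0]
```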
Claim 20 is the ‘article of manufacture comprising memory’ claim corresponding to claims 1 and 13. See Wolf Col. 8, ll. 25-55 regarding the teaching for memory. Remaining limitations are rejected similarly. See detailed analysis above.
Additional Prior Art
In addition to the above citations Examiner would like to make note of the following prior art:
Zhang et al. (WO 2022/219555 A1)
Zhang, Bokai, et al. "Towards accurate surgical workflow recognition with convolutional networks and transformers." Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 10.4 (Published online: 24 Nov 2021): 349-356.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Raphael Schwartz whose telephone number is (571)270-3822. The examiner can normally be reached Monday to Friday 9am-5pm CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RAPHAEL SCHWARTZ/ Examiner, Art Unit 2671