Prosecution Insights
Last updated: April 19, 2026
Application No. 18/659,085

METHOD AND SYSTEM FOR MICRO-ACTIVITY IDENTIFICATION

Final Rejection §103
Filed: May 09, 2024
Examiner: HO, THOMAS Y
Art Unit: 3624
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Tata Consultancy Services Limited
OA Round: 2 (Final)
Grant Probability: 15% (At Risk)
OA Rounds: 3-4
To Grant: 3y 10m
With Interview: 47%

Examiner Intelligence

Career Allow Rate: 15% (27 granted / 175 resolved; -36.6% vs TC avg)
Interview Lift: +31.7% (strong), among resolved cases with interview
Avg Prosecution: 3y 10m (typical timeline); 46 applications currently pending
Total Applications: 221 across all art units (career history)
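
The card arithmetic above is simple to verify. A minimal sketch, assuming the 47% with-interview allowance rate shown in the header (the raw with/without-interview case split is not published on this page):

```python
# Reproduce the examiner-card figures from the published counts and rates.
granted, resolved = 27, 175

career_allow_rate = granted / resolved        # 0.154 -> the "15%" card
with_interview_rate = 0.47                    # published header figure, assumed exact

interview_lift = with_interview_rate - career_allow_rate
print(f"Career allow rate: {career_allow_rate:.1%}")  # 15.4%
print(f"Interview lift:    {interview_lift:+.1%}")    # +31.6% (card shows +31.7%,
                                                      # so the 47% is likely rounded)
```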

Statute-Specific Performance

§101: 35.3% (-4.7% vs TC avg)
§103: 41.8% (+1.8% vs TC avg)
§102: 10.5% (-29.5% vs TC avg)
§112: 11.7% (-28.3% vs TC avg)
Deltas are relative to the Tech Center average estimate • Based on career data from 175 resolved cases
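
The deltas can be cross-checked against the baseline: each examiner rate minus its delta should recover the Tech Center average estimate. A short sketch, with values transcribed from the list above, shows all four statutes imply the same 40% baseline:

```python
# Recover the TC-average baseline from each (rate, delta) pair above.
rates  = {"§101": 35.3, "§103": 41.8, "§102": 10.5, "§112": 11.7}   # examiner, %
deltas = {"§101": -4.7, "§103": +1.8, "§102": -29.5, "§112": -28.3} # vs TC avg, %

for statute, rate in rates.items():
    tc_avg = rate - deltas[statute]   # delta = examiner rate - TC average
    print(f"{statute}: examiner {rate:.1f}%, implied TC avg {tc_avg:.1f}%")
# All four pairs imply a TC average of 40.0%.
```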

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Status of the Claims

The pending claims in the present application are claims 1-12 of the “AMENDMENT & RESPONSE UNDER 37 C.F.R. § 1.111” (hereinafter referred to as the “Amendment/Response”) dated 30 December 2025.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 3-5, 7-9, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pat. App. Pub. No. 2021/0313052 A1 to Makrinich et al. (hereinafter referred to as “Makrinich”), in view of U.S. Pat. App. Pub. No. 2019/0038362 A1 to Nash et al. (hereinafter referred to as “Nash”), and further in view of WIPO Int’l Pub. No. 2023/031688 A1 to Shaul (hereinafter referred to as “Shaul”).

Regarding claim 1, Makrinich discloses the following limitations:

“A processor implemented method of scoring a task involving a plurality of micro-activity, the method comprising: ...” - Makrinich discloses, “Computer system 410 may include one or more processors 412 for analyzing the visual data collected by the image sensors, a data storage 413 for storing the visual data and/or other types of information, an input module 414 for entering any suitable input for computer system 410, and software instructions 416 for controlling various aspects of operations of computer system 410” (para. [0130]), and “the stored data based on prior surgical procedures may include a machine learning model trained using a data set based on prior surgical procedures. For example, a machine learning model may be trained to process video frames and generate competency-related scores” (para. [0253]). Operation of the computer systems and associated processors, for using a machine learning model to generate competency-related scores for surgical procedures, in Makrinich, reads on the recited limitation.

“... receiving, via one or more hardware processors, a video stream through a plurality of ...
camera device; ...” - See the aspects of Makrinich that have been referenced above. Makrinich also discloses, “Room 101 may include audio sensors, video/image sensors, chemical sensors, and other sensors, as well as various light sources (e.g., light source 119 is shown in FIG. 1) for facilitating the capture of video and audio data, as well as data from other sensors, during the surgical procedure. For example, room 101 may include one or more microphones (e.g., audio sensor 111, as shown in FIG. 1), several cameras (e.g., overhead cameras 115, 121, and 123, and a tableside camera 125) for capturing video/image data during surgery” (para. [0060]), “video frames within a set may include frames, recorded by the same capture device, recorded at the same facility, recorded at the same time or within the same timeframe, depicting surgical procedures performed on the same patient or group of patients, depicting the same or similar surgical procedures, or sharing any other properties or characteristics” (para. [0085]), “receiving a plurality of video frames from a surgical video feed. A surgical video feed may refer to any video, group of video frames, or video footage including representations of a surgical procedure. For example, the surgical video feed may include one or more video frames captured during a surgical operation” (para. [0143]), and “the operations may be continuously repeated to continuously monitor deviations from the surgical plane. For example, when a video feed is a real time broadcast of the surgical procedure, the operations may be repeated after a certain quantity of video frames from a video feed is received, or upon elapse of a certain pre-determined time interval.” (para. [0165]). Receiving, via the processors of the computer systems, video feeds through several cameras, in Makrinich, reads on the recited limitation.

“... decoding, via the one or more hardware processors, the video stream by an ingestion device, wherein the ingestion device is an interface to receive and decode the video stream into a plurality of frames and extracts the associated audio from the video stream captured from the plurality of ... camera device ...” - See the aspects of Makrinich that have been referenced above. Makrinich also discloses, “audio sensor 111 may include one or more audio sensors configured to capture audio by converting sounds to digital information (e.g., audio sensors 121)” (para. [0070]), “Audio sensors 425 may be any suitable sensors for capturing audio data” (para. [0138]), “image, video, or audio data may be captured during the surgical procedure. In some cases, video data may also include audio data” (para. [0152]), “analyzing image data (as described herein) may include analyzing the image data (in this case, data related to a video frame) to obtain preprocessed image data, and subsequently analyzing the image data and/or the preprocessed image data” (para. [0254]), “analyzing the received plurality of video frames of each surgical video to identify a plurality of surgical events in each of the plurality of surgical videos” (para. [0389]), “wherein each of the identified plurality of surgical events in each of the plurality of surgical videos is defined by a differing subgroup of frames” (para. [0390]), “assigning each differing subgroup of frames to one of the surgical event-related categories to thereby interrelate subgroups of frames from differing surgical procedures under an associated common surgical event-related category” (para. [0391]).
Identifying pluralities of frames from surgical videos, via the processors of the computer systems, such that subgroups of frames are placed in surgical event-related categories, and converting sounds to digital information from the audio data in the video data, in Makrinich, reads on the recited limitation. “... detecting, via the one or more hardware processors, at least one micro-activity in the plurality of frames by a pre-trained Artificial Intelligence, Al model, wherein the pre-trained AI model comprises a plurality of modules for performing one or more tasks captured in each frame of the plurality of frames, wherein the pre-trained AI model, through the plurality of modules, identifies the one or more tasks captured in each frame of the plurality of frames and each module of the plurality of modules is trained to recognize human actions and identify specific objects associated with the at least one micro activity, and a wrapper function determines micro-activity as a result of actioning sub-modules, upon receiving input from the video cum audio captured device and instructions from a micro-activity configuration map tool, the micro-activity configuration map tool comprising one or more pre-defined features, one or more actions, and one or more parameters to be utilized by the plurality of modules for training the AI model, wherein the wrapper function selects features by selecting relevant micro-activity from a set of activities in the dataset; ...” - See the aspects of Makrinich that have been referenced above. Makrinich also discloses, “The figures illustrate a general schematic of the system architecture in accordance embodiments of the presently disclosed subject matter. Each module in the figures can be made up of any combination of software, hardware and/or firmware that performs the functions as defined and explained herein. The modules in the figures may be centralized in one location or dispersed over more than one location” (para. [0046]), “Analyzing the received video frames to identify surgical events may involve any form of electronic analysis using a computing device. In some embodiments, computer image analysis may include using one or more image recognition algorithms to identify features of one or more frames of the video footage. Computer image analysis may be performed on individual frames, or may be performed across multiple frames, for example, to detect motion or other changes between frames. In some embodiments, computer image analysis may include object detection algorithms, such as Viola-Jones object detection, scale-invariant feature transform (SIFT), histogram of oriented gradients (HOG) features, convolutional neural networks (CNN)” (para. [0052]), “the computer image analysis may include using a neural network model trained using example video frames including previously identified surgical events to thereby identify a similar surgical event in a set of frames” (para. [0053]), “camera 115 may be configured to track a surgical instrument (also referred to as a surgical tool) within location 127, an anatomical structure, a hand of surgeon 131, an incision, a movement of anatomical structure, and the like. In various embodiments, camera 115 may be equipped with a laser 137 (e.g., an infrared laser) for precision tracking. In some cases, camera 115 may be tracked automatically via a computer-based camera control application that uses an image recognition algorithm for positioning the camera to capture video/image data of a ROI. 
For example, the camera control application may identify an anatomical structure, identify a surgical tool, hand of a surgeon, bleeding, motion, and the like at a particular location within the anatomical structure, and track that location with camera 115 by rotating camera 115 by appropriate yaw and pitch angles” (para. [0061]), “A surgical procedure may include any set of medical actions associated with or involving manual or operative activity on a patient's body. Surgical procedures may include one or more of surgeries, repairs, ablations, replacements, implantations, implantations, extractions, treatments, restrictions, re-routing, and blockage removal, or may include veterinarian surgeries. Such procedures may involve cutting, abrading, suturing, extracting, lancing or any other technique that involves physically changing body tissues and/or organs” (para. [0077]), “selecting and processing video collections using artificial intelligence to identify relationships. Disclosed systems and methods may involve using artificial intelligence to automatically detect events by analyzing frames of surgical procedures assigning categories to thereby interrelate subgroups of frames from differing surgical procedures under an associated common surgical event-related category” (para. [0078]), “analyzing the plurality of video frames to determine an average skill of a category” (para. [0091]), “presenting an interface enabling the specific physician to self-compare with the average skill” (para. [0092]), “The intraoperative surgical event may be a planned event, such as an incision, administration of a drug, usage of a surgical instrument, an excision, a resection, a ligation, a graft, suturing, stitching, or any other planned event associated with a surgical procedure or phase” (para. [0189]), “the machine learning model may be used to analyze the accessed frames and identify in the accessed frames the at least one specific intraoperative event” (para. [0205]), “a software module or a portion of code” (para. [0361]), and “The various functions, scripts, programs, or modules may be created using a variety of programming techniques. For example, programs, scripts, functions, program sections or program modules may be designed in or by means of languages, including JAVASCRIPT, C, C++, JAVA, PHP, PYTHON, RUBY, PERL, BASH, or other programming or scripting languages. One or more of such software sections or modules may be integrated into a computer system, non-transitory computer readable media, or existing communications software. The programs, modules, or code may also be implemented or replicated as firmware or circuit logic” (para. [0565]). The system detecting events of surgical procedures in frames of recorded video using the trained artificial intelligence (and neural network), wherein the trained AI/NN is represented by software modules for analyzing the frames, wherein the trained AI/NN, via its software modules, identifies the various surgical events in the frames and recognizes the surgeon’s hand and tool and/or tissue manipulations during the surgical events, and software functionality recognizes the types or categories of events, upon receiving the captured frames and audio from the cameras and audio sensors and software instructions from the surgical event characterization function, the function involving surgical actions and parameters of the training of the trained AI/NN, wherein specific types of surgical events can be filtered, in Makrinich, reads on the recited limitation. “... 
verifying, via the one or more hardware processors, the micro-activity detected by the pre-trained AI model, wherein verification involves matching the micro-activity detected with a ground truth sequence previously fed to the AI model; ...” - See the aspects of Makrinich that have been mentioned above. Makrinich also discloses, “the historical data may include a statistical model and/or a machine learning model based on an analysis of information and/or video footage from historical surgical procedures (for example as described above), and the statistical model and/or the machine learning model may be used to analyze the accessed frames and identify deviations in the received video frames from a reference set of frames or images related to prior surgical procedures” (para. [0254]). Determining, via the processors of the computer systems, intraoperative events identified by the machine learning models, wherein the determining involves comparing the identified intraoperative events with reference sets of frames from historical surgical procedures, in Makrinich, reads on the recited limitation.

“... assigning, via the one or more hardware processors, a weightage to the each of the micro-activity detected by the plurality of modules and scoring the task by adding a positive score to each micro-activity performed correctly and assigning a penalty to the micro-activity performed incorrectly; and ...” - See the aspects of Makrinich that have been mentioned above. Makrinich also discloses, “the significant deviations between such a sequence of events and the recommended sequence of events may be recognized. Accordingly, a lower competency-related score may be assigned (e.g., a 1 out of 5). Conversely, if the subject were to perform a cataract surgery performing only the recommended steps in the recommended order and within the recommended timing, a high competency-related score may be assigned (e.g., a 5 out of 5 or 19 out of 20)” (para. [0267]), and “a competency-related score may include a composite score assessing a plurality of scores. A composite score may be an aggregation or combination of multiple scores related to different skills. Composite scores may be calculated through simple summation of the multiple scores, by a weighted average, or other suitable method. As an example, a subject may be assigned four separate scores, one each for tissue handling, economy of motion, depth perception, and surgical procedure flow. Each on a scale of 1 to 5, the subject may receive a tissue handling score of 4, an economy of motion score of 4, a depth perception score of 5, and surgical procedure flow score of 2. Accordingly, a composite for the subject may be 15 (4+4+5+2=15)” (para. [0270]). Assigning, by the processors of the computer systems, weighted averages of different competencies based on the machine learning analyses of the frames, wherein competencies are scored higher or lower based on intraoperative events being closer to or farther from recommended sequences of events, in Makrinich, reads on the recited limitation. Note that assigning a value of 5 is like adding 2 points to a value of 3, while assigning a value of 1 is like subtracting 2 points from a value of 3, which reads on the recited “adding” and “assigning a penalty” limitations.

“... obtaining, via the one or more hardware processors, a quality score for the task based on individual score assigned to the plurality of micro-activities detected.” - See the aspects of Makrinich that have been mentioned above.
Makrinich also discloses, “a plurality of competency-related scores may be generated for a subject. Each of the plurality of scores may associated with a differing skill. The skills may include, but are not limited to, tissue handling, economy of motion, depth perception, or surgical procedure flow described above. For example, for a subject a separate competency score (e.g., out of scale of 0 to 5) may be calculated for each of the subject's tissue handling, economy of motion, depth perception, and surgical procedure flow. Thus, the subject would have four separate competency-related score, one specific to each of the four different skills” (para. [0269]). Obtaining, via the processors of the computer systems, competency scores for surgical procedures, based on scores for each intraoperative event performed, in Makrinich, reads on the recited limitation.

The combination of Makrinich and Nash (hereinafter referred to as “Makrinich/Nash”) teaches limitations below of claim 1 that do not appear to be disclosed in their entirety by Makrinich:

The claimed “camera device” is a “synchronized audio-video camera device” - See the aspects of Makrinich that have been referenced above. Nash discloses, “The computer image analysis system 160 may receive synchronized image captures from multiple camera devices in the camera mesh. Each image may be designated with an identification of the camera device which captured it. The identification of the camera device, and the camera or cameras within the camera device, for each image or for received image data may be used to correlate data about the camera device, such as location and positioning, to the image or image data. At least a portion of the synchronized image captures includes information indicative of a position and orientation of a tracked object within the surgical field. For example, the camera mesh may include two overhead camera devices (e.g., 150 and 155) and a camera (e.g., 120 or 140) on the AR headset (e.g., 110 or 130) worn by the surgeon (e.g., 105 or 125). One of the overhead camera devices and the AR headset camera may capture an instrument being used by the surgeon, but the second overhead camera device may not capture the instrument as the surgeon is blocking the line-of-sight” (para. [0051]). The synchronizing of data captures of multiple camera devices, in Nash, when applied to the cameras and audio sensors, of Makrinich, reads on the recited limitation.

The claimed “camera device” is “synchronized prior to decoding the video stream into the plurality of frames” - See the aspects of Makrinich and Nash that have been cited above. The synchronization being in place before images are captured, in Nash, reads on the recited limitation.

Nash discloses “tracking an object within a surgical field” (Abstract), similar to the claimed invention and to Makrinich. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the camera devices, in Makrinich, to have synchronized image captures, as in Nash, to track objects even if one of the cameras is blocked, per Nash (para. [0051]).

The combination of Makrinich, Nash, and Shaul (hereinafter referred to as “Makrinich/Nash/Shaul”) teaches limitations below of claim 1 that do not appear to be taught in their entirety by Makrinich/Nash:

“...
applying at least one fiducial marker to the plurality of frames for image calibration, remove distortions, wherein the at least one fiducial marker comprises a binary square fiducial marker that determines an object size based on coordinates of an object captured in each frame of the plurality of frames and a pixel to inch conversion ratio; ...” - Shaul discloses, “In a block 306, a location and orientation of the US probe relative to the set of stereo cameras may be determined, optionally using triangulation based on the stereo cameras detection of the landmarks. Triangulation mathematical formulas and/or methods such as bundle adjustment may be used to increase location and orientation accuracy. Prior to determination of the location and orientation, calibration of the stereo cameras may be performed in a two-step process using a pre-defined calibration jig. A calibration grid may be used which may include a checkerboard of a known size optionally including ArUco markers or a more complex predefined pattern. In a first step, stereo images of the calibration may be acquired, followed by a second mathematical calculation step to obtain intrinsic parameters and offset. These parameters may include camera focal length, camera lens distortion, and location of the camera in 3D space. It is noted that calibration may also be performed more than once during the surgical procedure. Alternatively, the US probe and the cameras may include position sensors which may be used to determine the location and the orientation of the US probe relative to the set of stereo cameras” and “In a block 308, the acquired 2D US images are paired with the position data of the US probe and remodelled into a 3D scan. A grid is first defined. As the location of each pixel is known, interpolation may be performed to get measurements for all voxels in a 3D volume. Optionally, a minimal bounding rectangular shape may be fit to all the acquired voxels, for example, by maximizing the overlap between the rectangle and the voxels. The rectangle may define a grid wherein its voxels may be calculated using standard interpolation tools.” (p. 8). Applying the ArUco markers as visual landmarks for calibrating the camera, for addressing camera lens distortion, and for establishing focal length, location in space, position, orientation, and sizing, based on the appearance of the ArUco markers in the images captured, wherein locations of the pixels facilitate interpolation for voxels in 3D, in Shaul, reads on the recited limitation. Shaul discloses “performing surgical procedures” and “combining multi-imaging modalities for use in surgical procedures” (p. 1), similar to the claimed invention and to Makrinich and Nash. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the image capture devices and processes, of Makrinich/Nash, to include use of ArUco markers, as in Shaul, for calibration purposes, per Shaul (p. 8). Regarding claim 3, Makrinich/Nash/Shaul teaches the following limitations: “The method of claim 1, wherein the weightage assigned to each micro-activity is based on a right tool usage and a right tool handling while performing the micro-activity.” - See the aspects of Makrinich that have been mentioned above. 
The weighted averaging of surgical procedures, or intraoperative events thereof, based on surgical instrument movements by users’ hands while performing the procedures or events, in Makrinich, reads on the recited limitation.

Regarding claim 4, Makrinich/Nash/Shaul teaches the following limitations: “The method of claim 1, wherein the output of each module is mapped to the exact time duration of the micro-activity performed.” - Makrinich discloses, “video frames within a set may include frames, recorded by the same capture device, recorded at the same facility, recorded at the same time or within the same timeframe, depicting surgical procedures performed on the same patient or group of patients, depicting the same or similar surgical procedures, or sharing any other properties or characteristics” (para. [0085]), “frames associated with multiple surgeons can be analyzed to identify properties or events, such as hand-eye coordination, excessive bleeding amount, incision techniques, stitching techniques, appropriate incision placement, dissection, hemostasis, tissue handling skills, or a length of time to complete a surgical event. An average skill can be determined for a category of physicians through analysis of the presence or absence of certain surgical events, through the length of time associated with the performance of a surgical event” (para. [0091]). The aspects of the computer system that perform the various analyses of video feeds of surgical events, and track how long they take or when they take place, in Makrinich, read on the recited limitation.

Regarding claims 5, 7, and 8, while the claims are of different scope relative to claims 1, 3, and 4, the claims recite limitations similar to those recited by claims 1, 3, and 4. As such, the rationales applied in the rejection of claims 1, 3, and 4, also apply for purposes of rejecting claims 5, 7, and 8. Limitations recited by claims 5, 7, and 8 that do not appear to have a counterpart in claims 1, 3, and 4, such as the recited computer hardware limitations of independent claim 5, also are disclosed by the cited references (see, e.g., para. [0050] of Makrinich). Claims 5, 7, and 8 are, therefore, also rejected under 35 USC 103 as obvious in view of Makrinich/Nash/Shaul.

Regarding claims 9, 11, and 12, while the claims are of different scope relative to claims 1, 3, 4, 5, 7, and 8, the claims recite limitations similar to those recited by claims 1, 3, 4, 5, 7, and 8. As such, the rationales applied in the rejection of claims 1, 3, 4, 5, 7, and 8, also apply for purposes of rejecting claims 9, 11, and 12. Claims 9, 11, and 12 are, therefore, also rejected under 35 USC 103 as obvious in view of Makrinich/Nash/Shaul.

Claims 2, 6, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Makrinich, in view of Nash, further in view of Shaul, and further in view of U.S. Pat. App. Pub. No. 2021/0350134 A1 to Garg et al. (hereinafter referred to as “Garg”).
Regarding claim 2, the combination of Makrinich, Nash, Shaul, and Garg (hereinafter referred to as “Makrinich/Nash/Shaul/Garg”) teaches limitations below that do not appear to be taught in their entirety by Makrinich/Nash/Shaul: “The method of claim 1, wherein the Al model comprises of an object-hand-association module, a video classification module, an audio analysis module, a hand gesture module, a pose estimation module, an optical character recognition module, a dimension measurement module and an occupied hand module, wherein the object-hand-association module maps movements of the operator’s hand and detects a correct posture of the hand while holding a tool for performing an assembly operation or other task under surveillance, wherein the video classification module is trained on multiple trimmed videos for each activity class, and, during inference, recognizes the activities and their durations and classifies the activity into one of a plurality of predefined classes.” - See the aspects of Makrinich that have been referenced above. Makrinich also discloses, “camera 115 may be configured to track a surgical instrument (also referred to as a surgical tool) within location 127, an anatomical structure, a hand of surgeon 131, an incision, a movement of anatomical structure, and the like” (para. [0061]), “selection may occur through gestures (such as hand gestures), and the gestures may be analyzed using gesture recognition algorithms” (para. [0093]), “statistical information may include the average duration in which the surgeon performs the surgical event” (para. [0098]), “the interaction may include a contact between the medical instrument and the anatomical structure, an action by the surgical instrument on the anatomical structure (such as cutting, clamping, applying pressure, scraping, etc.), a reaction by the anatomical structure (such as a reflex action), a physiological response by the anatomical structure, the surgical tool emitting light towards the anatomical structure (e.g., surgical tool may be a laser that emits light towards the anatomical structure) a sound emitted towards anatomical structure, an electromagnetic field created in a proximity of the anatomical structure, a current induced into an anatomical structure, or any other suitable forms of interaction from which biological material-instrument feedback may be obtained” (para. [0149]), and “by analyzing the surgical video footage of an example surgical procedure, the image analysis model may be configured to determine a distance between the surgical tool and a point (or a set of points) of an anatomical structure” (para. [0150]). The computer system performing machine learning involving objects (like surgical instruments, or anatomical structures) and hands, categorized frames of video feeds, audio data associated with video feeds, hand gestures, and distances between tools and anatomical structures, in Makrinich, reads on the recited “wherein the Al model comprises of an object-hand-association module, a video classification module, an audio analysis module, a hand gesture module, a pose estimation module, ... a dimension measurement module and an occupied hand module” limitation. 
The elements of the system that identify tools, tissue, and hands as surgeons perform surgical procedures, and compare those elements to similar elements of other surgeons, including averages associated with video frames of other surgeons, wherein the system uses AI and NN trained on video frames of surgical events in various categories, to identify the surgical events, how long they take, and the categories they fall into, in Makrinich, reads on the recited “wherein the object-hand-association module maps movements of the operator’s hand and detects a correct posture of the hand while holding a tool for performing an assembly operation or other task under surveillance, wherein the video classification module is trained on multiple trimmed videos for each activity class, and, during inference, recognizes the activities and their durations and classifies the activity into one of a plurality of predefined classes” limitation.

Garg discloses, “The image analysis module 320 extracts text data from the images that can be used to infer activity labels and data values contained in the images. In one embodiment, the image analysis module 320 extracts text data by applying a model to identify one or more text fields in the image and their relative locations in the image. In one instance, the image analysis module 332 applies optical character recognition (OCR) methods to identify the text data” (para. [0051]). The applying of OCR by system elements, in Garg, when applied in the context of the image analyses, of Makrinich, reads on the recited “optical character recognition module” limitation. Garg discloses “generating event logs for processing mining” including “generating event logs from videos streams of worker devices” (para. [0002]), similar to the claimed invention and to Makrinich/Nash/Shaul. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the image analyses, of Makrinich/Nash/Shaul, to include the OCR features, of Garg, for expanded abilities to infer activities of images, as taught by Garg (see p. 9, claim 13).

Regarding claim 6, while the claim is of different scope relative to claim 2, the claim recites limitations similar to those recited by claim 2. As such, the rationales applied to reject claim 2 also apply for purposes of rejecting claim 6. Claim 6 is, therefore, also rejected under 35 USC 103 as obvious in view of the combination of Makrinich/Nash/Shaul/Garg.

Regarding claim 10, while the claim is of different scope relative to claims 2 and 6, the claim recites limitations similar to those recited by claims 2 and 6. As such, the rationales applied to reject claims 2 and 6 also apply for purposes of rejecting claim 10. Claim 10 is, therefore, also rejected under 35 USC 103 as obvious in view of Makrinich/Nash/Shaul/Garg.

Response to Arguments

On pp. 8-28 of the Amendment/Response, the applicant requests reconsideration and withdrawal of the claim rejection under 35 USC 101. In view of the amendments to the claims, and the associated remarks of the applicant, the claim rejection under 35 USC 101 has been withdrawn.

On pp. 28-34 of the Amendment/Response, the applicant requests reconsideration and withdrawal of the claim rejection under 35 USC 102(a)(1) in view of Makrinich.
The applicant’s arguments have been considered, and most are moot because the new grounds of rejection do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. See, for example, the references to Nash and Shaul in the 35 USC 103 section above. The arguments that are not moot are not persuasive. Any portion of software-driven functionality is a module. Thus, the system in Makrinich, with its multitude of software-driven functionality, including analyzing data using AI and NN, is a multi-module pre-trained AI architecture (contrary to p. 32 of the Amendment/Response). Whether Makrinich discloses, teaches, or suggests a plurality of independently trained modules (see Amendment/Response, p. 33) is not relevant because no such feature is claimed. Features upon which the applicant relies (i.e., the plurality of independently trained modules) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Further, the elements of the system in Makrinich that analyze, categorize, and assess the video information read on the wrapper function and the micro-activity configuration map tool, as they place different surgical events into different categories, and the decision points by which this is accomplished constitute instructions provided by a micro-activity configuration map tool (contrary to p. 33 of the Amendment/Response).

On pp. 35-41 of the Amendment/Response, the applicant requests reconsideration and withdrawal of the claim rejection under 35 USC 103 in view of Makrinich/Garg. The examiner finds the applicant’s supporting arguments unpersuasive. Whether Makrinich discloses, teaches, or suggests a specific multi-module AI model, with each module having purpose-built industrial assembly functions and defined outputs/constraints, operating in concert for operator posture/tool-handling compliance and 2D dimensional inspection in an assembly surveillance setting (see Amendment/Response, p. 36) is not relevant because such limitations are not recited in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). This same rebuttal applies to the other arguments about alleged deficiencies of Makrinich and Garg (see Amendment/Response pp. 36 and 37, and pp. 40 and 41).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Such prior art includes the following:

U.S. Pat. App. Pub. No. 2022/0375114 A1 to Hunter et al. discloses, “systems and methods for performing three-dimensional measurements of a surgical space using two-dimensional endoscopic images. According to an aspect, video data taken from an endoscopic imaging device can be used to generate a three-dimensional model of the surgical space represented by the video data. In one or more examples, two-dimensional images from the video data can be used to generate a three-dimensional model of the surgical space. In one or more examples, the one or more two-dimensional images of the surgical space can include a fiducial marker as part of the image.
Using both the depth information and a size reference provided by the fiducial marker, the systems and methods herein can generate a three-dimensional model of the surgical space. The generated three-dimensional model can then be used to perform a variety of three-dimensional measurements in a surgical cavity in an accurate and efficient manner” (Abstract).

WIPO Int’l Pub. No. 2022/125833 A1 to Siewerdsen et al. discloses, “A system for surgical navigation, including an instrument for a medical procedure attached to a camera and having a spatial position relative to the camera, an x-ray system to acquire x-ray images, and multiple fiducial markers detectable by both the camera and x-ray system, having a radio-opaque material arranged as at least one of a line and a point. A computer receives an optical image from the camera and an x-ray image from the x-ray system, identifies fiducial markers visible in both the optical image and x-ray image, determines for each fiducial marker a spatial position relative to the camera based on the optical image and relative to the x-ray system based on the x-ray image, and determines a spatial position for the instrument relative to the x-ray system based on at least the spatial positions relative to the camera and x-ray system” (Abstract).

Hashimoto, Daniel A., et al. “Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy.” Annals of Surgery 270.3 (2019): 414-421.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS Y HO whose telephone number is (571)270-7918. The examiner can normally be reached Monday through Friday, 9:30 AM to 5:30 PM Eastern. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jerry O'Connor, can be reached at 571-272-6787. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/THOMAS YIH HO/
Primary Examiner, Art Unit 3624
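
For context on the claim 1 calibration limitation debated above: the technique turns on a binary square fiducial marker (such as the ArUco markers cited from Shaul) of known physical size, whose detected corners yield a pixel-to-inch ratio for sizing other objects in the frame. A minimal sketch, assuming corner detection (e.g., via OpenCV's aruco module) has already run; the marker size, corner coordinates, and helper names below are illustrative, not from the application:

```python
import numpy as np

# Illustrative pixel-to-inch conversion from a detected square fiducial.
MARKER_SIZE_IN = 2.0  # assumed physical side length of the printed marker, inches

def pixels_per_inch(marker_corners: np.ndarray) -> float:
    """marker_corners: 4x2 array of the marker's corner pixels, in order."""
    # Average the four side lengths to damp perspective and detection noise.
    sides = np.linalg.norm(np.roll(marker_corners, -1, axis=0) - marker_corners,
                           axis=1)
    return sides.mean() / MARKER_SIZE_IN

def object_size_inches(bbox_px, ppi: float):
    """bbox_px: (x_min, y_min, x_max, y_max) of a detected object, in pixels."""
    x0, y0, x1, y1 = bbox_px
    return (x1 - x0) / ppi, (y1 - y0) / ppi

corners = np.array([[100, 100], [260, 102], [258, 262], [98, 260]], dtype=float)
ppi = pixels_per_inch(corners)                        # ~80 px/inch for a 2" marker
print(object_size_inches((300, 120, 540, 280), ppi))  # ~ (3.0, 2.0) inches
```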
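Similarly, the scoring limitation the examiner maps to Makrinich's composite competency scores reduces to a weighted sum with penalties. A hypothetical sketch of that scheme; the activity names, weights, and penalty fraction are invented for illustration and do not come from the claims:

```python
# Weighted micro-activity scoring: correct performance adds the weighted
# score, incorrect performance incurs a penalty; the task's quality score
# is the aggregate over all detected micro-activities.
WEIGHTS = {"pick_tool": 1.0, "align_part": 2.0, "fasten": 3.0}  # hypothetical
PENALTY = 0.5  # fraction of the weight deducted when performed incorrectly

def score_task(detections):
    """detections: list of (micro_activity, performed_correctly) pairs."""
    total = 0.0
    for activity, correct in detections:
        w = WEIGHTS[activity]
        total += w if correct else -PENALTY * w
    return total

detections = [("pick_tool", True), ("align_part", False), ("fasten", True)]
print(score_task(detections))  # 1.0 - 1.0 + 3.0 = 3.0
```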

Prosecution Timeline

May 09, 2024
Application Filed
Sep 27, 2025
Non-Final Rejection — §103
Dec 30, 2025
Response Filed
Mar 13, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12572893: DECISION SUPPORT SYSTEM OF INDUSTRIAL COPPER PROCUREMENT (granted Mar 10, 2026; 2y 5m to grant)
Patent 12456126: SYSTEMS AND PROCESSES THAT AUGMENT TRANSPARENCY OF TRANSACTION DATA (granted Oct 28, 2025; 2y 5m to grant)
Patent 12406215: SCALABLE EVALUATION OF THE EXISTENCE OF ONE OR MORE CONDITIONS BASED ON APPLICATION OF ONE OR MORE EVALUATION TIERS (granted Sep 02, 2025; 2y 5m to grant)
Patent 12393902: CONTINUOUS AND ANONYMOUS RISK EVALUATION (granted Aug 19, 2025; 2y 5m to grant)
Patent 12367438: Parallelized and Modular Planning Systems and Methods for Orchestrated Control of Different Actors (granted Jul 22, 2025; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 15%
With Interview: 47% (+31.7%)
Median Time to Grant: 3y 10m
PTA Risk: Moderate
Based on 175 resolved cases by this examiner. Grant probability derived from career allow rate.
