DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending in this application.
Information Disclosure Statement
The two information disclosure statements (IDS) submitted on 03/26/2024 have been considered and are attached.
Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1, lines 16-17, recites “outputting, via the processor, from the trained contrastive similarity metric, learning model pixels that are similar to reference pixels; and”. Here, the comma after the word “metric” should be removed and a comma should be instead placed after the word “model”.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 9, 11, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong (U.S. Publication No. 2024/0104882 A1) in view of Cersovsky et al. (U.S. Publication No. 2025/0349101 A1), hereinafter Cersovsky; and Liu et al. (“Prototype-oriented contrastive learning for semi-supervised medical image segmentation”), hereinafter Liu.
Regarding claim 1, Jeong teaches a computer-implemented method (Jeong teaches a processor in para. [0123] which carries out the method steps below and a method in para. [0128], FIGS. 1 and 3), comprising:
obtaining, at a processor, a (Jeong teaches processing an input image in para. [0064]);
receiving, at the processor, a selection of both a template image and a region of interest within the template image, wherein the region of interest is marked in the template image and is associated with a label (Jeong teaches selecting a reference (template) image in para. [0053], wherein regions of interest (reference points) are selected based on a boundary of a foreground object in para. [0054]-[0055] and FIG. 2. Jeong additionally teaches that the reference points are each associated with IDs (labels) in the aforementioned sections);
inputting, via the processor, both the (Jeong teaches inputting the input image into trained feature generation model 124 in para. [0056]. Jeong teaches inputting the reference image into the trained feature generation model 124 in para. [0059]);
outputting, via the processor, from the trained the template image (Jeong teaches “feature generation operation 206 provides the reference image R1_1 to trained feature generation model 124 and receives a corresponding feature map that includes a respective feature vector FV for each pixel of the reference image R1_1. Each pixel-level feature vector FV includes a respective set of generated features {f1, . . . , fn}, where n is the number of features (e.g., dimensions) per pixel” in para. [0058] and [0056], wherein this process is similarly implemented for the input image as shown in para. [0065]);
inputting, via the processor, both the pixel level feature vectors and the reference pixel level feature vector into a trained (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], wherein the ML-based model is interpreted as equivalent to the trained contrastive similarity metric learning model. Please note that the ML-based model as utilized in para. [0066] is separate and distinct from the trained feature generation model 124), wherein the trained (Jeong teaches determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]);
outputting, via the processor, from the trained (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067], wherein the POIs are interpreted as equivalent to the claimed similar pixels); and
labeling, via the processor, pixels in the medical image associated with the pixel level feature vectors that are similar to the reference pixel level feature vector, wherein the pixels that are labeled in the medical image correspond to the region of interest (Jeong teaches determining “respective regions-of-interest 314_1, 314_2 and 314_3 overlaid as dashed rectangles on image 302 (which maps directly to pixels of feature map 306). Each region-of-interest 314_1, 314_2 and 314_3 encompasses a respective sub-set of image pixels that include a respective object instance 104I_1, 104I_2, and 104I-3.” in para. [0068] and FIG. 4. Jeong additionally teaches determining pixels of interest based on a similarity between pixels in the reference feature vector and the input image feature vector(s), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302. List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the corresponding point identifiers (e.g., CP_1 as shown in FIG. 4) for each corresponding pixel are interpreted as equivalent to the claimed pixel labels).
While Jeong teaches that “it is possible to mask the image foreground (e.g., regions of interest that correspond to one or more objects) to get training key-points (e.g., corresponding positive key-points) only from the foreground” in a learning phase in para. [0110], Jeong fails to teach the above process in the context of a medical image of a portion of a subject, a trained contrastive similarity metric learning model, a trained vision transformer model, and an initial segmentation mask.
However, Cersovsky teaches the above process in the context of a medical image of a portion of a subject (Cersovsky teaches using a medical image in the practice of the described embodiment wherein “a “medical image” is a visual representation of the human body or a part thereof” in para. [0055]),
a trained vision transformer model (Cersovsky teaches that “the semantic representations of the medical images are embeddings generated with the help of a pre-trained vision transformer” in para. [0107]. Cersovsky further teaches that the semantic representations are used to generate attention maps in para. [0131]-[0132]), and
an initial segmentation mask (Cersovsky teaches “segmentation masks can be generated from the attention maps” in para. [0149]. Here, the segmentation masks are interpreted as the claimed initial segmentation mask because the segmentation mask is further used to create a segmented medical image, which indicates that its masked state is an initial state before a final product is produced).
Jeong and Cersovsky are both considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong to incorporate the teachings of Cersovsky and include “the above process in the context of a medical image of a portion of a subject, a trained vision transformer model, and an initial segmentation mask”. The motivation for doing so would have been “to be able to segment [] images with a high degree of accuracy without the need for manually segmented image”, as suggested by Cersovsky in para. [0007]. Cersovsky additionally suggests that “vision transformers have been shown to achieve excellent performance on various image classification benchmarks, often outperforming traditional convolutional neural networks (CNNs) which have been the dominant approach in computer vision for many years” in para. [0111]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong with Cersovsky to obtain the invention specified in the above claim limitations.
Jeong and Cersovsky fail to teach a trained contrastive similarity metric learning model in the context of the above claim.
However, Liu teaches a trained contrastive similarity metric learning model (Liu teaches a “prototype-oriented contrastive learning framework for semi-supervised medical image segmentation” in FIG. 2, which “is designed to regularize the feature space by mining the semantic relations across images” at the pixel level in Section 3.2).
Jeong, Cersovsky, and Liu are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky) to incorporate the teachings of Liu and include “a trained contrastive similarity metric learning model”. The motivation for doing so would have been that “the proposed method substantially improved the segmentation accuracy in terms of the three metrics compared with the previous semi-supervised method, achieving up to 84.28% in Dice coefficient, 74.43% in Jaccard coefficient and 10.35 mm in 95HD. SCC achieved the highest ASD value of 1.77 mm” and “generat[ing] more accurate boundary predictions and fewer incorrect segmentation regions”, as suggested by Liu in Section 4.2 (1). Additionally, Liu recites that contrastive learning “encourages the positive features to be aligned and the embeddings to match a uniform distribution in a hypersphere, which preserves as much information of the data as possible” in Section 3.2.2. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong and Cersovsky with Liu to obtain the invention specified in claim 1.
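For illustration of the record only, the pixel-correspondence operation mapped above (comparing each pixel level feature vector against a reference pixel level feature vector under a similarity metric) can be expressed as the following minimal sketch. This sketch assumes a cosine-similarity metric; the function name match_reference_pixels, the tensor shapes, and the threshold value are hypothetical assumptions for illustration and are not drawn from the disclosures of Jeong, Cersovsky, or Liu:

    import torch
    import torch.nn.functional as F

    def match_reference_pixels(feature_map, reference_vector, threshold=0.8):
        # feature_map: (C, H, W) per-pixel feature vectors, e.g., produced by a
        # pre-trained vision transformer backbone for the input medical image.
        # reference_vector: (C,) feature vector of a marked reference pixel from
        # the region of interest in the template image.
        c, h, w = feature_map.shape
        pixels = feature_map.reshape(c, -1).T  # (H*W, C), one row per pixel
        # Cosine similarity between every pixel and the reference vector.
        sims = F.cosine_similarity(pixels, reference_vector.unsqueeze(0), dim=1)
        # Pixels above the threshold are treated as matches, analogous to the
        # "points-of-interest" discussed in the mapping above.
        return (sims >= threshold).reshape(h, w)  # boolean (H, W) match mask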
Regarding claim 9, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 1, further comprising:
receiving, at the processor, the selection of a plurality of regions of interest within the template image, wherein each region of interest of the plurality of regions of interest is respectively marked in the template image and is associated with a respective label (Jeong teaches selecting a reference (template) image in para. [0053], wherein regions of interest (reference points) are selected based on a boundary of a foreground object in para. [0054]-[0055] and FIG. 2. Jeong additionally teaches that the reference points are each associated with IDs (labels) in the aforementioned sections. Jeong additionally teaches that “multiple different object types can be processed as respective reference objects (for example, boxes of different sizes and shapes), with a respective set of reference points and reference feature vectors being generated for each object and included in the reference point feature vector (RPFV) list 210” as shown in para. [0060]) (Cersovsky additionally teaches generating semantic representations (i.e. feature vectors as shown in para. [0097]) for multiple elements in a single image in para. [0142]-[0144]);
outputting, via the processor, from the trained vision transformer model respective reference pixel level feature vectors from each region of interest of the plurality of regions of interest of the template image (Jeong teaches that “the corresponding reference features that are generated across the set of 2D reference image RI_1, RI_N_ri can be amalgamated for each reference point to provide a reference feature vector RFV that captures data across multiple 2D-Camera views” in para. [0061]) (Cersovsky teaches generating semantic embeddings using a trained vision transformer model in para. [0107], which are further used to create a cross-attention map wherein “in the cross-attention map, the meaning of an element or a group of elements of the semantic representation for the reconstruction of each image element or group of image elements of the medical image and/or vice versa can be visualized” as shown in para. [0144]);
inputting, via the processor, each respective reference pixel level feature vector into the trained contrastive similarity metric learning model, wherein the trained contrastive similarity metric learning model is configured to automatically determine which of the pixel level feature vectors are similar to each respective reference pixel level feature vector for each region of interest (Jeong teaches that “correspondence module 110 is configured perform a correspondence task that detects and recognizes points in the input images that correspond to the points that are identified in the reference point feature vector list 210” in para. [0063], wherein the identified points are interpreted as equivalent to the regions of interest. See also para. [0067] and the teaching of the contrastive similarity metric learning model in claim 1);
outputting, via the processor, from the trained contrastive similarity metric learning model respective groups of pixels that are similar to each of respective reference pixels for each region of interest (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067]); and
individually labeling, via the processor, the respective groups of pixels in the medical image associated with each group of the respective groups of pixel level feature vectors that are similar to each respective reference pixel level feature vector for each region of interest with a respective initial segmentation mask, wherein the respective groups of pixels that are individually labeled in the medical image correspond to respective regions of interest of the plurality of regions of interest (Jeong teaches detecting points of interest (POI), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302; List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the labeling of the Unique ID for each object instance and the labeling of each corresponding point (see also FIG. 4) are interpreted as equivalent to labeling the individual pixels and the individual objects that correspond to the respective regions of interest) (Cersovsky teaches the respective initial segmentation mask and the medical image as shown in claim 1). Similar motivations as applied to claim 1 can be applied here.
Regarding claim 11, Jeong teaches a system (Jeong, FIG. 1), comprising:
a memory encoding processor-executable routines (Jeong teaches a processor coupled with a memory in para. [0123]-[0124]); and
a processor configured to access the memory and to execute the processor-executable routines, wherein the processor-executable routines, when executed by the processor, cause the processor to:
obtain a (Jeong teaches processing an input image in para. [0064]);
receive a selection of both a template image and a region of interest within the template image, wherein the region of interest is marked in the template image and is associated with a label (Jeong teaches selecting a reference (template) image in para. [0053], wherein regions of interest (reference points) are selected based on a boundary of a foreground object in para. [0054]-[0055] and FIG. 2. Jeong additionally teaches that the reference points are each associated with IDs (labels) in the aforementioned sections);
input both the (Jeong teaches inputting the input image into trained feature generation model 124 in para. [0056]. Jeong teaches inputting the reference image into the trained feature generation model 124 in para. [0059]);
output from the trained (Jeong teaches “feature generation operation 206 provides the reference image R1_1 to trained feature generation model 124 and receives a corresponding feature map that includes a respective feature vector FV for each pixel of the reference image R1_1. Each pixel-level feature vector FV includes a respective set of generated features {f1, . . . , fn}, where n is the number of features (e.g., dimensions) per pixel” in para. [0058] and [0056], wherein this process is similarly implemented for the input image as shown in para. [0065]);
input both the pixel level feature vectors and the reference pixel level feature vector into a trained (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], wherein the ML-based model is interpreted as equivalent to the trained contrastive similarity metric learning model), wherein the trained contrastive (Jeong teaches determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]);
output from the trained (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067], wherein the POIs are interpreted as equivalent to the claimed similar pixels); and
label the pixels in the medical image associated with the pixel level feature vectors that are similar to the reference pixel level feature vector (Jeong teaches determining “respective regions-of-interest 314_1, 314_2 and 314_3 overlaid as dashed rectangles on image 302 (which maps directly to pixels of feature map 306). Each region-of-interest 314_1, 314_2 and 314_3 encompasses a respective sub-set of image pixels that include a respective object instance 104I_1, 104I_2, and 104I-3.” in para. [0068] and FIG. 4. Jeong additionally teaches determining pixels of interest based on a similarity between pixels in the reference feature vector and the input image feature vector(s), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302. List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the corresponding point identifiers (e.g., CP_1 as shown in FIG. 4) for each corresponding pixel are interpreted as equivalent to the claimed pixel labels).
While Jeong teaches that “it is possible to mask the image foreground (e.g., regions of interest that correspond to one or more objects) to get training key-points (e.g., corresponding positive key-points) only from the foreground” in a learning phase in para. [0110], Jeong fails to teach the above process in the context of a medical image of a portion of a subject, a trained contrastive similarity metric learning model, a trained vision transformer model, and an initial segmentation mask.
However, Cersovsky teaches the above process in the context of a medical image of a portion of a subject (Cersovsky teaches using a medical image in the practice of the described embodiment wherein “a “medical image” is a visual representation of the human body or a part thereof” in para. [0055]),
a trained vision transformer model (Cersovsky teaches that “the semantic representations of the medical images are embeddings generated with the help of a pre-trained vision transformer” in para. [0107]. Cersovsky further teaches that the semantic representations are used to generate attention maps in para. [0131]-[0132]), and
an initial segmentation mask (Cersovsky teaches “segmentation masks can be generated from the attention maps” in para. [0149]. Here, the segmentation masks are interpreted as the claimed initial segmentation mask because the segmentation mask is further used to create a segmented medical image, which indicates that its masked state is an initial state before a final product is produced).
Jeong and Cersovsky are both considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong to incorporate the teachings of Cersovsky and include “the above process in the context of a medical image of a portion of a subject, a trained vision transformer model, and an initial segmentation mask”. The motivation for doing so would have been “to be able to segment [] images with a high degree of accuracy without the need for manually segmented image”, as suggested by Cersovsky in para. [0007]. Cersovsky additionally suggests that “vision transformers have been shown to achieve excellent performance on various image classification benchmarks, often outperforming traditional convolutional neural networks (CNNs) which have been the dominant approach in computer vision for many years” in para. [0111]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong with Cersovsky to obtain the invention specified in the above claim limitations.
Jeong and Cersovsky fail to teach a trained contrastive similarity metric learning model in the context of the above claim.
However, Liu teaches a trained contrastive similarity metric learning model (Liu teaches a “prototype-oriented contrastive learning framework for semi-supervised medical image segmentation” in FIG. 2, which “is designed to regularize the feature space by mining the semantic relations across images” at the pixel level in Section 3.2).
Jeong, Cersovsky, and Liu are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky) to incorporate the teachings of Liu and include “a trained contrastive similarity metric learning model”. The motivation for doing so would have been that “the proposed method substantially improved the segmentation accuracy in terms of the three metrics compared with the previous semi-supervised method, achieving up to 84.28% in Dice coefficient, 74.43% in Jaccard coefficient and 10.35 mm in 95HD. SCC achieved the highest ASD value of 1.77 mm” and “generat[ing] more accurate boundary predictions and fewer incorrect segmentation regions”, as suggested by Liu in Section 4.2 (1). Additionally, Liu recites that contrastive learning “encourages the positive features to be aligned and the embeddings to match a uniform distribution in a hypersphere, which preserves as much information of the data as possible” in Section 3.2.2. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong and Cersovsky with Liu to obtain the invention specified in claim 11.
Regarding claim 18, Jeong, Cersovsky, and Liu teach the system of claim 11, wherein the processor-executable routines, when executed by the processor, further cause the processor to:
receive the selection of a plurality of regions of interest within the template image, wherein each region of interest of the plurality of regions of interest is respectively marked in the template image and is associated with a respective label (Jeong teaches selecting a reference (template) image in para. [0053], wherein regions of interest (reference points) are selected based on a boundary of a foreground object in para. [0054]-[0055] and FIG. 2. Jeong additionally teaches that the reference points are each associated with IDs (labels) in the aforementioned sections. Jeong additionally teaches that “multiple different object types can be processed as respective reference objects (for example, boxes of different sizes and shapes), with a respective set of reference points and reference feature vectors being generated for each object and included in the reference point feature vector (RPFV) list 210” as shown in para. [0060]) (Cersovsky additionally teaches generating semantic representations (i.e. feature vectors as shown in para. [0097]) for multiple elements in a single image in para. [0142]-[0144]);
output from the trained vision transformer model respective reference pixel level feature vectors from each region of interest of the plurality of regions of interest of the template image (Jeong teaches that “the corresponding reference features that are generated across the set of 2D reference image RI_1, RI_N_ri can be amalgamated for each reference point to provide a reference feature vector RFV that captures data across multiple 2D-Camera views” in para. [0061]) (Cersovsky teaches generating semantic embeddings using a trained vision transformer model in para. [0107], which are further used to create a cross-attention map wherein “in the cross-attention map, the meaning of an element or a group of elements of the semantic representation for the reconstruction of each image element or group of image elements of the medical image and/or vice versa can be visualized” as shown in para. [0144]);
input each respective reference pixel level feature vector into the trained contrastive similarity metric learning model, wherein the trained contrastive similarity metric learning model is configured to automatically determine which of the pixel level feature vectors are similar to each respective reference pixel level feature vector for each region of interest (Jeong teaches that “correspondence module 110 is configured perform a correspondence task that detects and recognizes points in the input images that correspond to the points that are identified in the reference point feature vector list 210” in para. [0063], wherein the identified points are interpreted as equivalent to the regions of interest. See also para. [0067] and Liu’s teaching of the contrastive similarity metric learning model in claim 11);
output from the trained contrastive similarity metric learning model respective groups of pixels that are similar to each of respective reference pixels for each region of interest (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067]); and
individually label the respective groups of pixels in the medical image associated with each group of the respective groups of pixel level feature vectors that are similar to each respective reference pixel level feature vector for each region of interest with a respective initial segmentation mask, wherein the respective groups of pixels that are individually labeled in the medical image correspond to respective regions of interest of the plurality of regions of interest (Jeong teaches detecting points of interest (POI), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302; List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the labeling of the Unique ID for each object instance and the labeling of each corresponding point (see also FIG. 4) are interpreted as equivalent to labeling the individual pixels and the individual objects that correspond to the respective regions of interest) (Cersovsky teaches the respective initial segmentation mask and the medical image as shown in claim 11). Similar motivations as applied to claim 11 can be applied here.
Regarding claim 20, Jeong teaches a non-transitory computer-readable medium, the computer-readable medium comprising processor-executable code that when executed by a processor (Jeong teaches “a computer readable medium storing a set of non-transitory executable software instructions that, when executed by one or more processing devices, configure the one or more processing devices to perform one or more of the preceding methods” in para. [0025]), causes the processor to:
obtain a (Jeong teaches processing an input image in para. [0064]);
receive a selection of both a template image and a plurality of regions of interest within the template image, wherein each region of interest of the plurality of regions of interest is respectively marked in the template image and is associated with a respective label (Jeong teaches selecting a reference (template) image in para. [0053], wherein regions of interest (reference points) are selected based on a boundary of a foreground object in para. [0054]-[0055] and FIG. 2. Jeong additionally teaches that the reference points are each associated with IDs (labels) in the aforementioned sections);
input both the (Jeong teaches inputting the input image into trained feature generation model 124 in para. [0056]. Jeong teaches inputting the reference image into the trained feature generation model 124 in para. [0059]);
output from the trained (Jeong teaches “feature generation operation 206 provides the reference image R1_1 to trained feature generation model 124 and receives a corresponding feature map that includes a respective feature vector FV for each pixel of the reference image R1_1. Each pixel-level feature vector FV includes a respective set of generated features {f1, . . . , fn}, where n is the number of features (e.g., dimensions) per pixel” in para. [0058] and [0056], wherein this process is similarly implemented for the input image as shown in para. [0065]);
input both the pixel level feature vectors and the respective reference pixel level feature vector into a trained (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], wherein the ML-based model is interpreted as equivalent to the trained contrastive similarity metric learning model), wherein the trained level feature vectors are similar to each respective reference pixel level feature vector for each region of interest (Jeong teaches determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]);
output from the trained (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067], wherein the POIs are interpreted as equivalent to the claimed similar pixels); and
individually label the respective groups of pixels in the (Jeong teaches determining “respective regions-of-interest 314_1, 314_2 and 314_3 overlaid as dashed rectangles on image 302 (which maps directly to pixels of feature map 306). Each region-of-interest 314_1, 314_2 and 314_3 encompasses a respective sub-set of image pixels that include a respective object instance 104I_1, 104I_2, and 104I-3.” in para. [0068] and FIG. 4. Jeong additionally teaches determining pixels of interest based on a similarity between pixels in the reference feature vector and the input image feature vector(s), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302. List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the corresponding point identifiers (e.g., CP_1 as shown in FIG. 4) for each corresponding pixel are interpreted as equivalent to the claimed pixel labels).
While Jeong teaches that “it is possible to mask the image foreground (e.g., regions of interest that correspond to one or more objects) to get training key-points (e.g., corresponding positive key-points) only from the foreground” in a learning phase in para. [0110], Jeong fails to teach the above process in the context of a medical image of a portion of a subject, a trained contrastive similarity metric learning model, a trained vision transformer model, and an initial segmentation mask.
However, Cersovsky teaches the above process in the context of a medical image of a portion of a subject (Cersovsky teaches using a medical image in the practice of the described embodiment wherein “a “medical image” is a visual representation of the human body or a part thereof” in para. [0055]),
a trained vision transformer model (Cersovsky teaches that “the semantic representations of the medical images are embeddings generated with the help of a pre-trained vision transformer” in para. [0107]. Cersovsky further teaches that the semantic representations are used to generate attention maps in para. [0131]-[0132]), and
an initial segmentation mask (Cersovsky teaches “segmentation masks can be generated from the attention maps” in para. [0149]. Here, the segmentation masks are interpreted as the claimed initial segmentation mask because the segmentation mask is further used to create a segmented medical image, which indicates that its masked state is an initial state before a final product is produced).
Jeong and Cersovsky are both considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong to incorporate the teachings of Cersovsky and include “the above process in the context of a medical image of a portion of a subject, a trained vision transformer model, and an initial segmentation mask”. The motivation for doing so would have been “to be able to segment [] images with a high degree of accuracy without the need for manually segmented image”, as suggested by Cersovsky in para. [0007]. Cersovsky additionally suggests that “vision transformers have been shown to achieve excellent performance on various image classification benchmarks, often outperforming traditional convolutional neural networks (CNNs) which have been the dominant approach in computer vision for many years” in para. [0111]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong with Cersovsky to obtain the invention specified in the above claim limitations.
Jeong and Cersovsky fail to teach a trained contrastive similarity metric learning model in the context of the above claim.
However, Liu teaches a trained contrastive similarity metric learning model (Liu teaches a “prototype-oriented contrastive learning framework for semi-supervised medical image segmentation” in FIG. 2, which “is designed to regularize the feature space by mining the semantic relations across images” at the pixel level in Section 3.2).
Jeong, Cersovsky, and Liu are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky) to incorporate the teachings of Liu and include “a trained contrastive similarity metric learning model”. The motivation for doing so would have been that “the proposed method substantially improved the segmentation accuracy in terms of the three metrics compared with the previous semi-supervised method, achieving up to 84.28% in Dice coefficient, 74.43% in Jaccard coefficient and 10.35 mm in 95HD. SCC achieved the highest ASD value of 1.77 mm” and “generat[ing] more accurate boundary predictions and fewer incorrect segmentation regions”, as suggested by Liu in Section 4.2 (1). Additionally, Liu recites that contrastive learning “encourages the positive features to be aligned and the embeddings to match a uniform distribution in a hypersphere, which preserves as much information of the data as possible” in Section 3.2.2. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong and Cersovsky with Liu to obtain the invention specified in claim 20.
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong (U.S. Publication No. 2024/0104882 A1) in view of Cersovsky et al. (U.S. Publication No. 2025/0349101 A1), hereinafter Cersovsky, Liu et al. (“Prototype-oriented contrastive learning for semi-supervised medical image segmentation”), hereinafter Liu, and Morvan et al. (U.S. Publication No. 2023/0207106 A1), hereinafter Morvan.
Regarding claim 4, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 1, further comprising:
inputting, via the processor, each medical image of the plurality of medical images into the trained vision transformer model (Cersovsky teaches putting the medical images into the pre-trained vision transformer model in para. [0107]. See also para. [0096]);
outputting, via the processor, from the trained vision transformer model respective pixel level feature vectors from each medical image of the plurality of medical images (Cersovsky teaches putting the medical images into the pre-trained vision transformer model to create semantic embeddings in para. [0107], wherein “a semantic representation is generated from each image of the variety of medical images” and “the learned semantic representation of a medical image may be a vector or a matrix or a tensor or a set of numerical values that encodes the characteristics and semantic information of the medical image” as shown in para. [0096]-[0097]. See also para. [0101]);
inputting, via the processor, the respective pixel level feature vectors into the trained contrastive similarity metric learning model from each medical image of the plurality of medical images (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], and determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]; See also Liu’s teaching of the contrastive similarity metric learning model in claim 1) (Cersovsky teaches the medical images as shown in para. [0096] and [0092]);
outputting, via the processor, from the trained contrastive similarity metric learning model respective pixels from each medical image of the plurality of medical images that are similar to reference pixels (While Cersovsky teaches a plurality of medical images, Jeong teaches that “correspondence module 110 is configured perform a correspondence task that detects and recognizes points in the input images that correspond to the points that are identified in the reference point feature vector list 210” as shown in para. [0063]; here, Jeong teaches multiple input images. See also the teaching of the similarity metric model as shown in claim 1); and
labeling, via the processor, the respective pixels in each medical image of the plurality of medical images associated with the respective pixel level feature vectors from each medical image of the plurality of medical images that are similar to the reference pixel level feature vector with a respective initial segmentation mask, wherein the respective pixels that are labeled in each medical image of the plurality of medical images correspond to the region of interest (While Cersovsky teaches a plurality of medical images and the initial segmentation mask as shown in claim 1, Jeong teaches determining “respective regions-of-interest 314_1, 314_2 and 314_3 overlaid as dashed rectangles on image 302 (which maps directly to pixels of feature map 306). Each region-of-interest 314_1, 314_2 and 314_3 encompasses a respective sub-set of image pixels that include a respective object instance 104I_1, 104I_2, and 104I-3.” in para. [0068] and FIG. 4. Jeong additionally teaches determining pixels of interest based on a similarity between pixels in the reference feature vector and the input image feature vector(s), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302” in para. [0071]. Here, the unique ID for each object instance is interpreted as equivalent to the claimed pixel label).
While Jeong teaches multiple input images and Cersovsky teaches a plurality of medical images, Jeong, Cersovsky, and Liu fail to teach obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (emphasis added).
However, Morvan teaches obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (Morvan teaches “the patient may have an injured ankle or foot requiring surgery, and for the surgery or possibly as part of the diagnosis, the surgeon may have requested medical images 108 of the ankle and/or foot bones of the patient to plan the surgery” wherein “segmentation system 106 may generate a segmentation mask based on one of medical images 108 or a composite of two or more of medical images 108” in para. [0038]-[0039]).
Jeong, Cersovsky, Liu, and Morvan are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Morvan and include “obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image”. The motivation for doing so would have been to “provide more complete image data in medical images 108” and “so that the surgeon can view anatomical objects (e.g., bones, soft tissue, etc.) and the size, shape, and interconnections of the anatomical objects with other anatomical features of the patient”, as suggested by Morvan in para. [0038]-[0039]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Morvan to obtain the invention specified in claim 4.
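For illustration of the record only, the claimed handling of a medical imaging volume as mapped above can be read as a slice-wise application of the per-image pipeline of claim 1. The following minimal sketch assumes the volume is a stack of 2D slices; label_volume, feature_fn, and match_fn are hypothetical placeholders for the feature-generation and similarity-matching steps discussed with respect to claim 1, and are not functions disclosed by Morvan or the other cited references:

    import numpy as np

    def label_volume(volume, reference_vector, feature_fn, match_fn):
        # volume: (N, H, W) stack of 2D medical images (e.g., CT or MR slices).
        # feature_fn: maps one (H, W) image to a (C, H, W) pixel-level feature map.
        # match_fn: maps (feature_map, reference_vector) to an (H, W) boolean mask.
        masks = np.zeros(volume.shape, dtype=bool)
        for i, image in enumerate(volume):    # each medical image of the volume
            feature_map = feature_fn(image)   # pixel-level feature vectors
            masks[i] = match_fn(feature_map, reference_vector)
        return masks                          # initial segmentation mask per slice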
Regarding claim 14, Jeong, Cersovsky, and Liu teach the system of claim 11, wherein the processor-executable routines, when executed by the processor (Jeong, see claim 11), further cause the processor to:
input each medical image of the plurality of medical images into the trained vision transformer model (Cersovsky teaches putting the medical images into the pre-trained vision transformer model in para. [0107]. See also para. [0096]);
output from the trained vision transformer model respective pixel level feature vectors from each medical image of the plurality of medical images (Cersovsky teaches putting the medical images into the pre-trained vision transformer model to create semantic embeddings in para. [0107], wherein “a semantic representation is generated from each image of the variety of medical images” and “the learned semantic representation of a medical image may be a vector or a matrix or a tensor or a set of numerical values that encodes the characteristics and semantic information of the medical image” as shown in para. [0096]-[0097]. See also para. [0101]);
input the respective pixel level feature vectors into the trained contrastive similarity metric learning model from each medical image of the plurality of medical images (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], and determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]; See also Liu’s teaching of the contrastive similarity metric learning model in claim 11) (Cersovsky teaches the medical images as shown in para. [0096] and [0092]);
output from the trained contrastive similarity metric learning model respective pixels from each medical image of the plurality of medical images that are similar to reference pixels (While Cersovsky teaches a plurality of medical images, Jeong teaches that “correspondence module 110 is configured perform a correspondence task that detects and recognizes points in the input images that correspond to the points that are identified in the reference point feature vector list 210” as shown in para. [0063]; here, Jeong teaches multiple input images. See also Liu’s teaching of the similarity metric model as shown in claim 11); and
label the respective pixels in each medical image of the plurality of medical images associated with the respective pixel level feature vectors from each medical image of the plurality of medical images that are similar to the reference pixel level feature vector with a respective initial segmentation mask, wherein the respective pixels that are labeled in each medical image of the plurality of medical images correspond to the region of interest (While Cersovsky teaches a plurality of medical images and the initial segmentation mask as shown in claim 11, Jeong teaches determining “respective regions-of-interest 314_1, 314_2 and 314_3 overlaid as dashed rectangles on image 302 (which maps directly to pixels of feature map 306). Each region-of-interest 314_1, 314_2 and 314_3 encompasses a respective sub-set of image pixels that include a respective object instance 104I_1, 104I_2, and 104I-3.” in para. [0068] and FIG. 4. Jeong additionally teaches determining pixels of interest based on a similarity between pixels in the reference feature vector and the input image feature vector(s), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302” in para. [0071]. Here, the unique ID for each object instance is interpreted as equivalent to the claimed pixel label).
While Jeong teaches multiple input images and Cersovsky teaches a plurality of medical images, Jeong, Cersovsky, and Liu fail to teach obtaining a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (emphasis added).
However, Morvan teaches obtaining a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (Morvan teaches “the patient may have an injured ankle or foot requiring surgery, and for the surgery or possibly as part of the diagnosis, the surgeon may have requested medical images 108 of the ankle and/or foot bones of the patient to plan the surgery” wherein “segmentation system 106 may generate a segmentation mask based on one of medical images 108 or a composite of two or more of medical images 108” in para. [0038]-[0039]).
Jeong, Cersovsky, Liu, and Morvan are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Morvan and include “obtaining a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image”. The motivation for doing so would have been to “provide more complete image data in medical images 108” and “so that the surgeon can view anatomical objects (e.g., bones, soft tissue, etc.) and the size, shape, and interconnections of the anatomical objects with other anatomical features of the patient”, as suggested by Morvan in para. [0038]-[0039]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Morvan to obtain the invention specified in claim 14.
Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong (U.S. Publication No. 2024/0104882 A1) in view of Cersovsky et al. (U.S. Publication No. 2025/0349101 A1), hereinafter Cersovsky, Liu et al. (“Prototype-oriented contrastive learning for semi-supervised medical image segmentation”), hereinafter Liu, and Ryan (U.S. Publication No. 2023/0186509 A1).
Regarding claim 3, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 1.
While Jeong teaches labeling pixels in the medical image associated with the pixel level feature vectors that are similar to the reference pixel level feature vector, Jeong, Cersovsky, and Liu fail to teach utilizing connected component analysis on the pixels to generate the initial segmentation mask.
However, Ryan teaches utilizing connected component analysis on the pixels to generate the initial segmentation mask (Ryan teaches that “pixels in the foreground mask may be grouped into an area of connected pixels, known as a blob, using known connected component analysis techniques” in para. [0142], wherein the blob (initial segmentation mask) is further used to detect/identify (label) articles).
Jeong, Cersovsky, Liu, and Ryan are all considered to be analogous to the claimed invention because they are in the same field of analyzing images to determine identifiers for objects through feature extraction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Ryan and include “utilizing connected component analysis on the pixels to generate the initial segmentation mask”. The motivation for doing so would have been that the connected component analysis process “advantageously limits the noise and creates a boundary around the entire detected object rather than creating several small ROIs”, as suggested by Ryan in para. [0142]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Ryan to obtain the invention specified in claim 3.
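For illustration of the record only, the connected component analysis mapped from Ryan can be sketched as follows: matched pixels are grouped into connected areas (“blobs”), and small blobs are discarded as noise so that a boundary forms around each detected object, which is one plausible reading of generating the initial segmentation mask. The function name, the 8-connected neighborhood, and the min_size value are hypothetical assumptions for illustration, not details disclosed by Ryan:

    import numpy as np
    from scipy import ndimage

    def mask_from_matched_pixels(matched, min_size=20):
        # matched: (H, W) boolean array of pixels flagged as similar to the
        # reference pixels.
        structure = np.ones((3, 3), dtype=int)  # 8-connected neighborhood
        labeled, num_blobs = ndimage.label(matched, structure=structure)
        mask = np.zeros_like(matched)
        for blob_id in range(1, num_blobs + 1):
            blob = labeled == blob_id
            if blob.sum() >= min_size:          # drop tiny blobs as likely noise
                mask |= blob                    # keep one region per detected object
        return mask                             # initial segmentation mask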
Regarding claim 13, Jeong, Cersovsky, and Liu teach the system of claim 11.
While Jeong teaches labeling pixels in the medical image associated with the pixel level feature vectors that are similar to the reference pixel level feature vector, Jeong, Cersovsky, and Liu fail to teach utilizing connected component analysis on the pixels to generate the initial segmentation mask.
However, Ryan teaches utilizing connected component analysis on the pixels to generate the initial segmentation mask (Ryan teaches that “pixels in the foreground mask may be grouped into an area of connected pixels, known as a blob, using known connected component analysis techniques” in para. [0142], wherein the blob (initial segmentation mask) is further used to detect/identify (label) articles).
Jeong, Cersovsky, Liu, and Ryan are all considered to be analogous to the claimed invention because they are in the same field of analyzing images to determine identifiers for objects through feature extraction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Ryan and include “utilizing connected component analysis on the pixels to generate the initial segmentation mask”. The motivation for doing so would have been that the connected component analysis process “advantageously limits the noise and creates a boundary around the entire detected object rather than creating several small ROIs”, as suggested by Ryan in para. [0142]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Ryan to obtain the invention specified in claim 13.
Claims 6, 8, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Jeong (U.S. Publication No. 2024/0104882 A1) in view of Cersovsky et al. (U.S. Publication No. 2025/0349101 A1), hereinafter Cersovsky, Liu et al. (“Prototype-oriented contrastive learning for semi-supervised medical image segmentation”), hereinafter Liu, Morvan et al. (U.S. Publication No. 2023/0207106 A1), hereinafter Morvan, and Tizhoosh et al. (U.S. Publication No. 2022/0068498 A1), hereinafter Tizhoosh.
Regarding claim 6, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 1, further comprising:
obtaining, at a processor, a medical imaging volume (Jeong teaches obtaining a plurality of images in para. [0063]) (Cersovsky teaches medical images as shown in claim 1);
inputting, via the processor, each medical image of the plurality of medical images into the trained vision transformer model (Jeong teaches inputting the input image into trained feature generation model 124 in para. [0056]. Jeong teaches inputting the reference image into the trained feature generation model 124 in para. [0059]) (Cersovsky additionally teaches inputting medical images into a trained vision transformer model as shown in claim 1);
outputting, via the processor, from the trained vision transformer model respective pixel level feature vectors and respective image level features from each medical image of the plurality of medical images (Jeong teaches “feature generation operation 206 provides the reference image R1_1 to trained feature generation model 124 and receives a corresponding feature map that includes a respective feature vector FV for each pixel of the reference image R1_1. Each pixel-level feature vector FV includes a respective set of generated features {f1, . . . , fn}, where n is the number of features (e.g., dimensions) per pixel” in para. [0058], wherein this process is similarly implemented for the input image as shown in para. [0056]) (Cersovsky teaches a plurality of medical images in para. [0055]-[0057]);
inputting, via the processor, the respective pixel level feature vectors into the trained contrastive similarity metric learning model from the set of most relevant medical images (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], wherein Jeong further teaches determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]. See also Liu’s teaching of the contrastive similarity metric learning model in claim 1);
outputting, via the processor, from the trained contrastive similarity metric learning model respective pixels from the set of most relevant medical images that are similar to reference pixels (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067]; See also Liu’s teaching of the contrastive similarity metric learning model in claim 1); and
labeling, via the processor, the respective pixels in each medical image of the set of most relevant medical images associated with the respective pixel level feature vectors from each medical image of the set of most relevant medical images that are similar to the reference pixel level feature vector with a respective initial segmentation mask, wherein the respective pixels that are labeled in each medical image of the set of most relevant medical images correspond to the region of interest (Jeong teaches detecting points of interest (POI), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302; List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the labeling of the Unique ID for each object instance and the labeling of each corresponding point (see also FIG. 4) are interpreted as equivalent to labeling the individual pixels and the individual objects that correspond to the respective regions of interest) (Cersovsky teaches the respective initial segmentation mask and the medical image as shown in claim 1). Similar motivations as applied to claim 1 can be applied here regarding the combination of Jeong and Cersovsky. An illustrative sketch of the pixel-matching step mapped in these limitations follows.
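The following is a minimal illustrative sketch of the pixel-matching step mapped above, in which each pixel level feature vector of a feature map is compared against a reference pixel level feature vector and sufficiently similar pixels are flagged as points-of-interest (Jeong para. [0058] and [0067]). The cosine similarity measure and the fixed threshold are assumptions made for illustration; Jeong para. [0066] leaves the matching operation open to either a rules-based operation or an ML-based model.

    import numpy as np

    def match_pixels(feature_map: np.ndarray, reference_vector: np.ndarray,
                     threshold: float = 0.8) -> np.ndarray:
        """Return a boolean HxW mask of pixels whose feature vectors are
        similar to the reference pixel level feature vector.

        feature_map: HxWxN array holding one N-dimensional feature vector per pixel.
        reference_vector: N-dimensional feature vector of a reference point.
        """
        # Normalize so that the dot product reduces to cosine similarity.
        fm = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + 1e-8)
        ref = reference_vector / (np.linalg.norm(reference_vector) + 1e-8)
        similarity = fm @ ref           # HxW map of cosine similarities
        return similarity >= threshold  # pixels treated as points-of-interest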
Jeong, Cersovsky, and Liu fail to teach obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image; determining, via the processor, a set of most relevant medical images from the plurality of medical images.
However, Morvan teaches obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (Morvan teaches “the patient may have an injured ankle or foot requiring surgery, and for the surgery or possibly as part of the diagnosis, the surgeon may have requested medical images 108 of the ankle and/or foot bones of the patient to plan the surgery” wherein “segmentation system 106 may generate a segmentation mask based on one of medical images 108 or a composite of two or more of medical images 108” in para. [0038]-[0039]).
Jeong, Cersovsky, Liu, and Morvan are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Morvan and include “obtaining, at a processor, a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image”. The motivation for doing so would have been to “provide more complete image data in medical images 108” and “so that the surgeon can view anatomical objects (e.g., bones, soft tissue, etc.) and the size, shape, and interconnections of the anatomical objects with other anatomical features of the patient”, as suggested by Morvan in para. [0038]-[0039]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Morvan to obtain the invention specified in the above claim limitation.
Jeong, Cersovsky, Liu, and Morvan fail to teach determining, via the processor, a set of most relevant medical images from the plurality of medical images.
However, Tizhoosh teaches determining, via the processor, a set of most relevant medical images from the plurality of medical images (Tizhoosh teaches that “for each intermediate set of images 342, 344, and 346, the processor 112 defines a curated set of images for storage in the image database from the images in the intermediate set based on the relevancy indicator of each image. The curated set of images 360 are a subset of the intermediate set of images 350 with greater relevancy” in para. [0233]).
Jeong, Cersovsky, Liu, Morvan, and Tizhoosh are all considered to be analogous to the claimed invention because they are in the same field of analyzing images to determine identifiers for images through feature extraction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky, Liu, and Morvan) to incorporate the teachings of Tizhoosh and include “determining, via the processor, a set of most relevant medical images from the plurality of medical images”. The motivation for doing so would have been to generate “a subset of the intermediate set of images 350 with greater relevancy” and that “by automatically identifying whether the image contains relevant information, selectively retaining a version of the image in an alternative formats and excluding the image from the database, and deleting the image, storage requirements can be reduced without forgoing relevant information”, as suggested by Tizhoosh in para. [0233] and [0324], respectively. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, Liu, and Morvan with Tizhoosh to obtain the invention specified in claim 6.
Regarding claim 8, Jeong, Cersovsky, Liu, Morvan, and Tizhoosh teach the computer-implemented method of claim 6,
wherein determining the set of most relevant medical images from the plurality of medical images is based on the image level features (Tizhoosh teaches determining information related to identifiers of medical images, which includes feature vectors, in para. [0198], wherein “the relevance value of an image 612a represents the degree of relevancy of the domain knowledge parameters of an image 612a to the domain knowledge parameters of the plurality of images 612” as shown in para. [0271]. This process involves determining image level features (see domain knowledge parameters in para. [0258]) for each medical image and comparing these features with those of the other images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky, Liu, and Morvan) to incorporate the teachings of Tizhoosh and include “wherein determining the set of most relevant medical images from the plurality of medical images is based on the image level features”. The motivation for doing so would have been to generate “a subset of the intermediate set of images 350 with greater relevancy” and that “by automatically identifying whether the image contains relevant information, selectively retaining a version of the image in an alternative formats and excluding the image from the database, and deleting the image, storage requirements can be reduced without forgoing relevant information”, as suggested by Tizhoosh in para. [0233] and [0324], respectively. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, Liu, and Morvan with Tizhoosh to obtain the invention specified in claim 8.
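As an illustration of the limitation mapped above, one straightforward reading of Tizhoosh's relevancy-based curation (para. [0233]) is to rank a plurality of medical images by the similarity of their image level features to a query image and retain the highest-ranked subset. The cosine ranking and the top_k cutoff below are illustrative assumptions, not details disclosed by Tizhoosh.

    import numpy as np

    def most_relevant_images(image_features: np.ndarray, query_feature: np.ndarray,
                             top_k: int = 5) -> np.ndarray:
        """Rank images by the similarity of their image level feature vectors
        to a query feature vector (e.g., from the template image).

        image_features: MxN array holding one N-dimensional feature vector per image.
        query_feature: N-dimensional image level feature vector.
        Returns the indices of the top_k most relevant images.
        """
        feats = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
        query = query_feature / np.linalg.norm(query_feature)
        scores = feats @ query                   # one relevancy score per image
        return np.argsort(scores)[::-1][:top_k]  # most relevant first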
Regarding claim 16, Jeong, Cersovsky, and Liu teach the system of claim 11, wherein the processor-executable routines, when executed by the processor, further cause the processor to:
obtain a medical imaging volume (Jeong teaches obtaining a plurality of images in para. [0063]) (Cersovsky teaches medical images as shown in claim 11);
input each medical image of the plurality of medical images into the trained vision transformer model (Jeong teaches inputting the input image into trained feature generation model 124 in para. [0056]. Jeong teaches inputting the reference image into the trained feature generation model 124 in para. [0059]) (Cersovsky additionally teaches inputting medical images into a trained vision transformer model as shown in claim 11);
output from the trained vision transformer model respective pixel level feature vectors and respective image level features from each medical image of the plurality of medical images (Jeong teaches “feature generation operation 206 provides the reference image R1_1 to trained feature generation model 124 and receives a corresponding feature map that includes a respective feature vector FV for each pixel of the reference image R1_1. Each pixel-level feature vector FV includes a respective set of generated features {f1, . . . , fn}, where n is the number of features (e.g., dimensions) per pixel” in para. [0058], wherein this process is similarly implemented for the input image as shown in para. [0056]) (Cersovsky teaches a plurality of medical images in para. [0055]-[0057]);
input the respective pixel level feature vectors into the trained contrastive similarity metric learning model from the set of most relevant medical images (Jeong teaches “corresponding point detection operation 126 can be a discrete rules-based operation or an ML-based model that has been trained to perform a pixel matching task” in para. [0066], wherein Jeong further teaches determining a correspondence match point by utilizing an ML model (para. [0066]) and analyzing a similarity between the reference feature vector and the input image feature vectors as shown in para. [0067]-[0070]; See also Liu’s teaching of the contrastive similarity metric learning model in claim 11);
output from the trained contrastive similarity metric learning model respective pixels from the set of most relevant medical images that are similar to reference pixels (Jeong teaches “the feature map 306 pixels that are identified as being a corresponding match to a respective reference point feature vector are considered “points-of-interest (POI)” and can be identified in a POI list 310 that is generated in respect of the input image 302” in para. [0067]; See also Liu’s teaching of the contrastive similarity metric learning model in claim 11); and
label the respective pixels in each medical image of the set of most relevant medical images associated with the respective pixel level feature vectors from each medical image of the set of most relevant medical images that are similar to the reference pixel level feature vector with a respective initial segmentation mask, wherein the respective pixels that are labeled in each medical image of the set of most relevant medical images correspond to the region of interest (Jeong teaches detecting points of interest (POI), wherein “the POI data included the POI list 310 can include, among other things, one or more of: Reference Object ID; Reference Point IDs; Input Image ID; Unique ID for each object instance (and/or region-of-interest) in the input image 302; List of corresponding point pixel coordinates in the input image 302 for each of pixels matched to a respective reference point” in para. [0071]. Here, the labeling of the Unique ID for each object instance and the labeling of each corresponding point (see also FIG. 4) are interpreted as equivalent to labeling the individual pixels and the individual objects that correspond to the respective regions of interest) (Cersovsky teaches the respective initial segmentation mask and the medical image as shown in claim 11). Similar motivations as applied to claim 11 can be applied here regarding the combination of Jeong and Cersovsky.
Jeong, Cersovsky, and Liu fail to teach obtaining a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image.
However, Morvan teaches obtaining a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image (Morvan teaches “the patient may have an injured ankle or foot requiring surgery, and for the surgery or possibly as part of the diagnosis, the surgeon may have requested medical images 108 of the ankle and/or foot bones of the patient to plan the surgery” wherein “segmentation system 106 may generate a segmentation mask based on one of medical images 108 or a composite of two or more of medical images 108” in para. [0038]-[0039]).
Jeong, Cersovsky, Liu, and Morvan are all considered to be analogous to the claimed invention because they are in the same field of segmenting objects within images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky and Liu) to incorporate the teachings of Morvan and include “obtain a medical imaging volume of the portion of the subject, wherein the medical imaging volume comprises a plurality of medical images including the medical image”. The motivation for doing so would have been to “provide more complete image data in medical images 108” and “so that the surgeon can view anatomical objects (e.g., bones, soft tissue, etc.) and the size, shape, and interconnections of the anatomical objects with other anatomical features of the patient”, as suggested by Morvan in para. [0038]-[0039]. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, and Liu with Morvan to obtain the invention specified in the above claim limitation.
Jeong, Cersovsky, Liu, and Morvan fail to teach determining a set of most relevant medical images from the plurality of medical images.
However, Tizhoosh teaches determining a set of most relevant medical images from the plurality of medical images (Tizhoosh teaches that “for each intermediate set of images 342, 344, and 346, the processor 112 defines a curated set of images for storage in the image database from the images in the intermediate set based on the relevancy indicator of each image. The curated set of images 360 are a subset of the intermediate set of images 350 with greater relevancy” in para. [0233]).
Jeong, Cersovsky, Liu, Morvan, and Tizhoosh are all considered to be analogous to the claimed invention because they are in the same field of analyzing images to determine identifiers for images through feature extraction. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Jeong (as modified by Cersovsky, Liu, and Morvan) to incorporate the teachings of Tizhoosh and include “determine a set of most relevant medical images from the plurality of medical images”. The motivation for doing so would have been to generate “a subset of the intermediate set of images 350 with greater relevancy” and that “by automatically identifying whether the image contains relevant information, selectively retaining a version of the image in an alternative formats and excluding the image from the database, and deleting the image, storage requirements can be reduced without forgoing relevant information”, as suggested by Tizhoosh in para. [0233] and [0324], respectively. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jeong, Cersovsky, Liu, and Morvan with Tizhoosh to obtain the invention specified in claim 16.
Allowable Subject Matter
Claims 2, 5, 7, 10, 12, 15, 17, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter.
The best prior art of record is Jeong, Cersovsky, Liu, Tizhoosh, Morvan, and Ryan. The prior art, whether applied alone or in combination, fails to anticipate or render obvious claims 2, 5, 7, 10, 12, 15, 17, and 19.
Claim 2
Regarding claim 2, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 1.
Cersovsky further teaches that the segmented medical image was generated based on the different segmentation masks in para. [0149].
Jeong further teaches labeling pixels of input images that correspond to the region of interest in a reference image (see claim 1).
Morvan further teaches utilizing, via the processor, a promptable segmentation model to label the medical image with a refined segmentation mask of a region that corresponds to the region of interest, wherein the initial segmentation mask serves as an automatic prompt for labeling (Jeong teaches labeling pixels of input images that correspond to the region of interest in a reference image in claim 1) (Morvan teaches a method of segmenting medical images wherein “segmentation system 106 generates an initial segmentation mask using a neural network, such as a convolutional neural network (CNN), and then generates a refined segmentation mask based on the initial segmentation mask” as shown in para. [0041]. See also FIGS. 7 and 8. Para. [0084]-[0085] describes the refinement process in the context of labeling, wherein the initial segmentation mask is labeled and serves as a prompt for further refinement labeling. See also para. [0062], which suggests that the initial segmentation mask serves as a prompt for marking voxels with cost values, and FIG. 9. Here, Jeong’s teaching of generating the pixels of interest in an input image that align with the feature vector in the region of interest can be combined with Morvan’s teaching of generating initial segmentation masks of medical images, which serve as an automatic prompt for labeling, and refining said initial masks to teach the above limitation).
However, neither Jeong, nor Cersovsky, nor Liu, nor Morvan, nor Tizhoosh, nor Ryan, nor their combination, teaches a promptable segmentation model to label the medical image with a refined segmentation mask of a region that corresponds to the region of interest, wherein the initial segmentation mask serves as an automatic prompt for labeling, in the context of the pipeline regarding the vision transformer model and the trained contrastive similarity metric learning model as laid out in claim 1, upon which claim 2 depends.
Similar analysis is applicable to claim 12.
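For context, the following sketch illustrates how an initial segmentation mask could serve as an automatic prompt to a promptable segmentation model, as recited in claim 2. The PromptableSegmenter interface, including its predict method and its points and mask_prompt parameters, is hypothetical; it is not disclosed by any reference of record, and real promptable models expose analogous prompt inputs.

    import numpy as np

    def refine_with_mask_prompt(segmenter, image: np.ndarray,
                                initial_mask: np.ndarray) -> np.ndarray:
        """Derive prompts from the initial segmentation mask and request a refined mask.

        segmenter: hypothetical promptable segmentation model exposing a
                   predict(image, points, mask_prompt) method that returns a
                   refined HxW mask.
        """
        ys, xs = np.nonzero(initial_mask)
        if ys.size == 0:
            # Nothing to prompt with; fall back to the (empty) initial mask.
            return initial_mask
        # Use the centroid of the initial mask as an automatic point prompt,
        # so no manual user interaction is required for labeling.
        centroid = np.array([[xs.mean(), ys.mean()]])  # one (x, y) point prompt
        return segmenter.predict(image=image, points=centroid,
                                 mask_prompt=initial_mask)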
Claim 5
Regarding claim 5, Jeong, Cersovsky, Liu, and Morvan teach the computer-implemented method of claim 4.
Jeong and Morvan further teach utilizing, via the processor, a promptable segmentation model to label each medical image of the plurality of medical images with a respective refined segmentation mask of a respective region that corresponds to the region of interest, wherein the respective initial segmentation mask serves as an automatic prompt for labeling (Jeong teaches labeling pixels of input images that correspond to the region of interest in a reference image in claim 1) (Morvan teaches a method of segmenting medical images wherein “segmentation system 106 generates an initial segmentation mask using a neural network, such as a convolutional neural network (CNN), and then generates a refined segmentation mask based on the initial segmentation mask” as shown in para. [0041]. See also FIGS. 7 and 8. Para. [0084]-[0085] describes the refinement process in the context of labeling, wherein the initial segmentation mask is labeled and serves as a prompt for further refinement labeling. See also para. [0062], which suggests that the initial segmentation mask serves as a prompt for marking voxels with cost values, and FIG. 9. Here, Jeong’s teaching of generating the pixels of interest in an input image that align with the feature vector in the region of interest can be combined with Morvan’s teaching of generating initial segmentation masks of medical images, which serve as an automatic prompt for labeling, and refining said initial masks to teach the above limitation).
However, neither Jeong, nor Cersovsky, nor Liu, nor Morvan, nor Tizhoosh, nor Ryan, nor their combination, teaches a promptable segmentation model to label each medical image of the plurality of medical images with a respective refined segmentation mask of a respective region that corresponds to the region of interest, wherein the respective initial segmentation mask serves as an automatic prompt for labeling, in the context of the pipeline regarding the vision transformer model and the trained contrastive similarity metric learning model as laid out in claim 1, from which claim 5 ultimately depends.
Similar analysis is applicable to claim 15.
Claim 7
Regarding claim 7, Jeong, Cersovsky, Liu, Morvan, and Tizhoosh teach the computer-implemented method of claim 6.
Jeong and Morvan further teach utilizing, via the processor, a promptable segmentation model to label each medical image of the set of most relevant medical images with a respective refined segmentation mask of a respective region that corresponds to the region of interest, wherein the respective initial segmentation mask serves as an automatic prompt for labeling (Jeong teaches labeling pixels of input images that correspond to the region of interest in a reference image in claim 1) (Morvan teaches a method of segmenting medical images wherein “segmentation system 106 generates an initial segmentation mask using a neural network, such as a convolutional neural network (CNN), and then generates a refined segmentation mask based on the initial segmentation mask” as shown in para. [0041]. See also FIGS. 7 and 8. Para. [0084]-[0085] describes the refinement process in the context of labeling, wherein the initial segmentation mask is labeled and serves as a prompt for further refinement labeling. See also para. [0062], which suggests that the initial segmentation mask serves as a prompt for marking voxels with cost values, and FIG. 9. Here, Jeong’s teaching of generating the pixels of interest in an input image that align with the feature vector in the region of interest can be combined with Morvan’s teaching of generating initial segmentation masks of medical images, which serve as an automatic prompt for labeling, and refining said initial masks to teach the above limitation) (Tizhoosh additionally teaches a process of generating a set of the most relevant medical images (see citations and motivations as applied to claim 6 regarding the combination of Tizhoosh with Jeong (as modified by Cersovsky, Liu, and Morvan))).
However, neither Jeong, nor Cersovsky, nor Liu, nor Morvan, nor Tizhoosh, nor Ryan, nor their combination, teaches a promptable segmentation model to label each medical image of the set of most relevant medical images with a respective refined segmentation mask of a respective region that corresponds to the region of interest, wherein the respective initial segmentation mask serves as an automatic prompt for labeling, in the context of the pipeline regarding the vision transformer model and the trained contrastive similarity metric learning model as laid out in claim 1, from which claim 7 ultimately depends.
Similar analysis is applicable to claim 17.
Claim 10
Regarding claim 10, Jeong, Cersovsky, and Liu teach the computer-implemented method of claim 9.
Cersovsky further teaches that the segmented medical image was generated based on the different segmentation masks in para. [0149].
Jeong further teaches labeling pixels of input images that correspond to the region of interest in a reference image (see claim 1).
Jeong and Morvan further teach utilizing, via the processor, a promptable segmentation model to label the medical image with respective refined segmentation masks of respective regions that respectively correspond to the respective regions of interest of the plurality of regions of interest, wherein the respective initial segmentation masks serve as automatic prompts for labeling (Jeong teaches labeling pixels of input images that correspond to the regions of interest in a reference image in para. [0071]-[0072] and FIG. 4) (Morvan teaches a method of segmenting medical images wherein “segmentation system 106 generates an initial segmentation mask using a neural network, such as a convolutional neural network (CNN), and then generates a refined segmentation mask based on the initial segmentation mask” as shown in para. [0041]. See also FIGS. 7 and 8, wherein it is apparent that there exist multiple initial segmentation masks corresponding to multiple regions of interest, which are then used to generate multiple refined segmentation masks. Para. [0084]-[0085] describes the refinement process in the context of labeling, wherein the initial segmentation mask is labeled and serves as a prompt for further refinement labeling. See also para. [0062], which suggests that the initial segmentation mask serves as a prompt for marking voxels with cost values, and FIG. 9. Here, Jeong’s teaching of generating the pixels of interest in an input image that align with the feature vector in the region of interest can be combined with Morvan’s teaching of generating initial segmentation masks of medical images, which serve as an automatic prompt for labeling, and refining said initial masks to teach the above limitation).
However, neither Jeong, nor Cersovsky, nor Liu, nor Morvan, nor Tizhoosh, nor Ryan, nor their combination, teaches a promptable segmentation model to label the medical image with respective refined segmentation masks of respective regions that respectively correspond to the respective regions of interest of the plurality of regions of interest, wherein the respective initial segmentation masks serve as automatic prompts for labeling, in the context of the pipeline regarding the vision transformer model and the trained contrastive similarity metric learning model as laid out in claim 1, from which claim 10 ultimately depends.
Similar analysis is applicable to claim 19.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Geiger et al. (U.S. Publication No. 2008/0267468 A1) teaches a method and computer program for segmenting a prostate from a medical image.
Zhang et al. (U.S. Publication No. 2022/0222932 A1) teaches an image region segmentation method and apparatus.
Liang et al. (U.S. Publication No. 2006/0209063 A1) teaches segmenting an object in image data.
Tan et al. (U.S. Publication No. 2021/0334598 A1) teaches segmenting an object in image data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLA G ALLEN whose telephone number is (703)756-5315. The examiner can normally be reached M-F 7:30am - 4:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Villecco, can be reached on (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Kyla Guan-Ping Tiao Allen/
Examiner, Art Unit 2661
/JOHN VILLECCO/Supervisory Patent Examiner, Art Unit 2661