DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-9 and 11-20 have been considered but are moot because the new ground of rejection does not rely on the combination of references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Specifically, see the citation of the Kirillov et al. (“Segment Anything”, copy attached, see PTO-892) reference in the rejection that follows.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 2, 4-7, 9, 11, 12, 14-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Kirillov et al. (“Segment Anything”, copy attached, see PTO-892) in view of Zhao (U.S.P.G. Pub. No. 2025/0061566).
Regarding claim 1, Kirillov et al. (“Segment Anything”, copy attached, see PTO-892) discloses:
An apparatus, comprising:
one or more processors configured to:
obtain an image (see Figure 1, Abstract – segmentation on images);
receive, from a user of the apparatus, a natural language based prompt to tag a target structure in the image (see Figure 1, promptable engine for segmentation; page 4, “2. Segment Anything Task”, prompt from NLP to segmentation);
determine one or more text features associated with the natural language based prompt (pages 2, 4, the prompt includes a variety of information indicating what to segment including, but not limited to free-form text that describes a part of an image);
identify one or more visual features corresponding to the target structure based on the one or more text features associated with the natural language based prompt and a machine-learning (ML) model (page 4, the NLP indicates what to segment in the image; see also pages 5-6 regarding annotation of structures), wherein the ML model has been pre-trained to learn a correspondence between a plurality of text embeddings and a plurality of visual embeddings in an embedding space (page 4, pre-training; see also pages 5-6 regarding assisted-manual, semi-automatic annotation), and wherein the one or more processors are configured to identify the one or more visual features corresponding to the target structure by determining, based on the correspondence between the plurality of text embeddings and the plurality of visual embeddings learned by the ML model, that the one or more visual features are correlated to the one or more text features associated with the natural language based prompt (Figures 1, 4, pages 5-6, the corresponding structures are annotated); and
tag the target structure in the image based on the one or more identified visual features (Figures 1, 4, pages 5-6, the corresponding structures are annotated/tagged/segmented; see additionally the “Masks” that are output)
Kirillov does not explicitly disclose:
obtain a medical image;
tagging a medical image;
Zhao (U.S.P.G. Pub. No. 2025/0061566) discloses:
obtain a medical image (paragraphs [0022]-[0024], medical image data);
tagging a medical image (paragraphs [0023]-[0024], [0028], [0031], a DICOM file, for example, includes text strings indicating content of the medical image data – e.g., “NEURO^HEAD” describes head region data – but other attributes can be ‘Series Description’ or ‘Body Part Examined’; further, the EMR includes findings in medical records in text; see also paragraph [0036], the processing module can further highlight the detected objects of interest or display a text label describing an aspect of the image that can be displayed when the user selects the object)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged as described in Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 2, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
Kirillov additionally discloses:
wherein the ML model comprises a vision transformer configured to encode image features of the image (pages 2, 4, image encoder), the ML model further comprising a text encoder configured to encode the natural language based prompt (pages 2, 4, the prompt includes a variety of information indicating what to segment, including but not limited to free-form text that describes a part of an image);
Kirillov does not explicitly disclose:
wherein the image is a medical image;
Zhao additionally discloses:
wherein the ML model comprises a vision transformer configured to encode image features of the medical image (paragraph [0030], the classifier is used to identify a type of organ), the ML model further comprising a text encoder configured to encode the prompt (paragraphs [0029], [0031]-[0033], the image is segmented via machine learning/AI to parse the image into particular structures using image and textual descriptions)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged as described in Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 4, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
Kirillov additionally discloses:
wherein the one or more processors being configured to tag the target structure in the image comprises the one or more processors being configured to generate a heatmap that indicates a location of the target structure in the image (Figure 5, page 6, a probability associated with the masks is generated; this constitutes a “heatmap” since it correlates location and probability of the masks)
As previously noted, Zhao discloses:
wherein the image is a medical image (paragraphs [0022]-[0024], medical image data);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged as described in Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 5, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
Kirillov additionally discloses:
wherein the one or more processors are further configured to:
obtain multiple textual descriptions associated with a set of image classification labels (page 5, various annotated labels described by text);
pair the medical image with one or more of the multiple text descriptions to obtain one or more image-text pairs (pages 6-7, the images are matched to the text labels); and
determine a class of the image based on the one or more image-text pairs and the correspondence between the plurality of text embeddings and the plurality of visual embeddings learned by the ML model (pages 6-7, the appropriate label is assigned to the image in association with the generated mask)
As previously noted, Zhao discloses:
wherein the image is a medical image (paragraphs [0022]-[0024], medical image data);
Zhao additionally discloses:
obtain multiple textual descriptions associated with a set of image classification labels (paragraphs [0023]-[0024], [0028], [0031], a DICOM file, for example, includes text strings indicating content of the medical image data – e.g., “NEURO^HEAD” describes head region data – but other attributes can be ‘Series Description’ or ‘Body Part Examined’; further, the EMR includes findings in medical records in text);
pair the medical image with one or more of the multiple text descriptions to obtain one or more corresponding image-text pairs (paragraphs [0029]-[0031], the image is used in conjunction with the associated text data from attributes and EMR);
classify the medical image based on a machine-learning (ML) model and the one or more image-text pairs, wherein the ML model is configured to predict respective similarities between the medical image and the corresponding text descriptions in the one or more image-text pairs (paragraphs [0029], [0031]-[0033], the image is segmented via machine learning/AI to parse the image into particular structures using both image and textual descriptions), and wherein the one or more processors are configured to determine a class of the medical image by comparing the similarities predicted by the ML model (paragraphs [0032]-[0033], [0035], the descriptions in the EMR can be used to identify WHAT – a particular abnormality or disease/pathology finding – and WHERE – the anatomical location of the abnormality)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged as described above with regard to Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 6, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 5).
Kirillov does not explicitly disclose:
wherein the set of image classification labels identifies multiple body parts, multiple imaging modalities, multiple image views, or multiple imaging protocols.
Zhao discloses:
wherein the set of image classification labels identifies multiple body parts, multiple imaging modalities, multiple image views, or multiple imaging protocols (paragraphs [0023]-[0024], [0028], [0031], a DICOM file, for example, includes text strings indicating content of the medical image data – e.g., “NEURO^HEAD” describes head region data – but other attributes can be ‘Series Description’ or ‘Body Part Examined’. The DICOM schema allows for classification of a variety of different body parts);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged with a set of image classification labels identifying multiple body parts, multiple imaging modalities, multiple image views, or multiple imaging protocols as described in Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 7, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 5).
Kirillov does not explicitly disclose:
wherein at least one of the multiple textual descriptions includes a negation of an association between the medical image and one of the image classification labels.
Zhao additionally discloses:
wherein at least one of the multiple textual descriptions includes a negation of an association between the medical image and one of the image classification labels (paragraphs [0035]-[0036], for example, the textual data may indicate a nodule is not malignant or non-cancerous)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Zhao with the system of Kirillov such that the image obtained was a medical image and the medical image was then tagged such that at least one of the multiple textual descriptions includes a negation of an association between the medical image and one of the image classification labels, as described in Zhao. The suggestion/motivation would have been in order to implement a system capable of “increasing efficiency” and have a “significant amount of time [be] saved” within the application of “searching for findin[g] matching objects of interest described in the EMR” (paragraph [0018] of the Zhao reference).
Regarding claim 9, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
Kirillov additionally discloses:
wherein the one or more processors are further configured to track the target structure in one or more other images based on the natural language based prompt and the ML model (pages 4-6, the system can segment further images for the same structure based on the NL prompt and the model)
As previously noted, Zhao discloses:
wherein the image is a medical image (paragraphs [0022]-[0024], medical image data);
Regarding claim 11, the structural elements of apparatus claim 1 perform all of the steps of method claim 11. Thus, claim 11 is rejected for the same reasons discussed in the rejection of claim 1.
Regarding claim 12, the structural elements of apparatus claim 2 perform all of the steps of method claim 12. Thus, claim 12 is rejected for the same reasons discussed in the rejection of claim 2.
Regarding claim 14, the structural elements of apparatus claim 4 perform all of the steps of method claim 14. Thus, claim 14 is rejected for the same reasons discussed in the rejection of claim 4.
Regarding claim 15, the structural elements of apparatus claim 5 perform all of the steps of method claim 15. Thus, claim 15 is rejected for the same reasons discussed in the rejection of claim 5.
Regarding claim 16, the structural elements of apparatus claim 6 perform all of the steps of method claim 16. Thus, claim 16 is rejected for the same reasons discussed in the rejection of claim 6.
Regarding claim 17, the structural elements of apparatus claim 7 perform all of the steps of method claim 17. Thus, claim 17 is rejected for the same reasons discussed in the rejection of claim 7.
Regarding claim 19, the structural elements of apparatus claim 9 perform all of the steps of method claim 19. Thus, claim 19 is rejected for the same reasons discussed in the rejection of claim 9.
Regarding claim 20, arguments analogous to claims 1 and 11 are applicable. The computer readable medium is inherently taught as evidenced by the discussion of the computerized models throughout Kirillov and inherently taught in Zhao as evidenced by the electronic computing device executing sequences of instructions designed to implement the disclosed methods (see paragraph [0012] of Zhao).
Claim(s) 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Kirillov in view of Zhao as applied above, further in view of Reicher (U.S.P.G. Pub. No. 2023/0335261).
Regarding claim 3, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
The combination of Kirillov and Zhao does not explicitly disclose:
wherein the natural language based prompt includes a voice prompt provided by the user of the apparatus.
Reicher et al. (U.S.P.G. Pub. No. 2023/0335261) discloses:
wherein the natural language based prompt includes a voice prompt provided by the user of the apparatus (paragraphs [0005], [0060]-[0061], the user can prompt with voice in NL to help determine an area of interest)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Reicher with the combination of Zhao and Kirillov such that the natural language based prompt would include a voice prompt provided by the user of the apparatus as described in Reicher. The suggestion/motivation would have been in order to implement a system capable of “facilitate[ing] more accurate and simplified reporting” of medical images (paragraph [0005] of the Reicher reference).
Regarding claim 13, the structural elements of apparatus claim 3 perform all of the steps of method claim 13. Thus, claim 13 is rejected for the same reasons discussed in the rejection of claim 3.
Claim(s) 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Kirillov in view of Zhao as applied above, further in view of Jain (U.S.P.G. Pub. No. 2024/0028831).
Regarding claim 8, the combination of Kirillov and Zhao discloses the apparatus of the parent claim (claim 1).
Kirillov does not explicitly disclose:
wherein the ML model is trained using a training dataset comprising multiple training image-text pairs, wherein each training image-text pair includes a training image and a training textual description, wherein the ML model is trained to learn a similarity or dissimilarity between the training image and the training textual description in each training image-text pair based on a contrastive learning technique, and wherein the class of the medical image is not present in the training dataset.
Jain discloses:
wherein the ML model is trained using a training dataset comprising multiple training image-text pairs, wherein each training image-text pair includes a training image and a training textual description, wherein the ML model is trained to learn a similarity or dissimilarity between the training image and the training textual description in each training image-text pair based on a contrastive learning technique, and wherein the medical image is not present in the training dataset (paragraphs [0040]-[0041], for example, training data for second associations)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the system of Jain with the combination of Zhao and Kirillov such that the ML model is trained using a training dataset comprising multiple training image-text pairs, wherein each training image-text pair includes a training image and a training textual description, wherein the ML model is trained to learn a similarity or dissimilarity between the training image and the training textual description in each training image-text pair based on a contrastive learning technique, and wherein the class of the medical image is not present in the training dataset as described in Jain. The suggestion/motivation would have been in order to implement a system capable of “ensur[ing] that the generated associations are based on the content similarity of image and textual data” (paragraph [0040] of the Jain reference) to thereby improve accuracy.
Regarding claim 18, the structural elements of apparatus claim 8 perform all of the steps of method claim 18. Thus, claim 18 is rejected for the same reasons discussed in the rejection of claim 8.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN R WALLACE whose telephone number is (571)270-1577. The examiner can normally be reached Monday-Friday from 8:30 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Benny Tieu can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JOHN R WALLACE/Primary Examiner, Art Unit 2682