Prosecution Insights
Last updated: April 19, 2026
Application No. 18/683,595

SALIENCY MAPS FOR MEDICAL IMAGING

Non-Final OA — §102, §103
Filed: Feb 14, 2024
Examiner: SHARIFF, MICHAEL ADAM
Art Unit: 2672
Tech Center: 2600 — Communications
Assignee: Koninklijke Philips N.V.
OA Round: 1 (Non-Final)
Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 82% — above average (94 granted / 115 resolved; +19.7% vs TC avg)
Interview Lift: +22.3% — strong (difference in allowance rate between resolved cases with and without an examiner interview)
Typical Timeline: 2y 10m average prosecution; 16 applications currently pending
Career History: 131 total applications across all art units
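The headline figures can be reproduced from these counts. Below is a minimal sketch (Python) using only the numbers shown above; rounding to whole percents and backing the Tech Center average out of the displayed delta are assumptions about the tool's method, not a documented formula.

```python
# Minimal sketch reproducing the Examiner Intelligence arithmetic from the
# counts shown above. Backing the Tech Center average out of the displayed
# "+19.7% vs TC avg" delta is an assumption about the tool's method.
granted, resolved = 94, 115

allow_rate = granted / resolved
print(f"career allow rate: {allow_rate:.1%}")       # 81.7%, displayed as 82%

delta_vs_tc = 0.197                                  # +19.7% per the dashboard
implied_tc_avg = allow_rate - delta_vs_tc
print(f"implied TC average: {implied_tc_avg:.1%}")   # ~62.0%
```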

Statute-Specific Performance

§101: 17.9% (-22.1% vs TC avg)
§103: 43.1% (+3.1% vs TC avg)
§102: 18.6% (-21.4% vs TC avg)
§112: 16.4% (-23.6% vs TC avg)
Based on career data from 115 resolved cases; Tech Center averages are estimates.
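Each "vs TC avg" delta is simply the examiner's statute-specific rate minus the Tech Center estimate. A quick check (sketch below, numbers taken from the table above) shows the implied TC average works out to 40.0% for every statute, which suggests a single TC-wide baseline estimate rather than per-statute baselines.

```python
# Recover the implied Tech Center baseline from the displayed per-statute
# rates and deltas; all numbers come from the table above.
examiner_rate = {"101": 0.179, "103": 0.431, "102": 0.186, "112": 0.164}
delta_vs_tc = {"101": -0.221, "103": 0.031, "102": -0.214, "112": -0.236}

for statute, rate in examiner_rate.items():
    implied_avg = rate - delta_vs_tc[statute]
    print(f"§{statute}: implied TC average = {implied_avg:.1%}")  # 40.0% each
```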

Office Action

Grounds: §102, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f): (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f), except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f), because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) are: “computational system” in claims 1-2, 6-7, 9-10, and 13, and “eye tracking device” in claim 4.

Because these claim limitation(s) are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f), applicant may: (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5 and 13-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the non-patent literature “Enhanced CNN-based gaze estimation on wireless capsule endoscopy images,” 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2021 (Gatoula et al.) (hereinafter Gatoula).

Regarding claims 1-5, Gatoula teaches a medical system comprising: (Gatoula, abstract: “Wireless capsule endoscopy (WCE) is a modality used for the non-invasive examination of the gastrointestinal (GI) tract. Physicians diagnose pathologies in images derived from Capsule Endoscopy (CE) using specific gaze patterns to observe pathologically related visual cues.”) a memory configured to store machine executable instructions, wherein the memory further stores a trained first machine learning module trained to output in response to receiving a medical image as input a saliency map as output, the saliency map being predictive of a distribution of user attention over the medical image; a computational system, wherein execution of the machine executable instructions causes the computational system to: receive a medical image; provide the medical image as input to the trained first machine learning module; in response to the providing of the medical image, receive a saliency map of the medical image as output from the trained first machine learning module, wherein the saliency map predicts a distribution of user attention over the medical image; provide the saliency map of the medical image (Gatoula, page 191, Section III. Methodology, Section A. Model Architecture, para. 1-4; page 189, para. 6; FIG. 1-3: “The model that was employed for tackling the task of saliency prediction, follows an auto-encoder architecture and thus consisting of two separate models, an encoder and a decoder. The encoder takes as an input an RGB image and outputs its corresponding latent representation. Then the output of the encoder is used as input for the decoder, which uses the latent representation to estimate the respective saliency map of the input image. A visual representation of the proposed CNN model architecture can be seen in Fig. 1 … Thus, we can state that the values of an estimated saliency map express the probability value of the potential eye-fixations of medical experts on the corresponding WCE image … For training the proposed CNN model the Binary Cross Entropy (BCE) function was selected … After the saliency map estimation, we deploy an additional step for further improvement of the saliency map by removing salient values with low probability of occurrence.”; “In detail, a convolutional autoencoder is trained on WCE images, containing normal and abnormal cases, to predict saliency maps that estimate the physicians' eye fixations.”; [Gatoula Figs. 1-3 reproduced in the Office Action]).

Regarding claim 2, Gatoula teaches the medical system of claim 1, wherein execution of the machine executable instructions further causes the computational system to provide the trained first machine learning module, wherein the providing of the trained first machine learning module comprises: providing the first machine learning module; providing first training data comprising first pairs of training medical images and training saliency maps, wherein the training saliency maps are descriptive of distributions of user attention over the training medical images; training the first machine learning module using the first training data, wherein the resulting trained first machine learning module is trained to output the training saliency maps of the first pairs in response to receiving the training medical images of the first pairs (Gatoula, page 192, Section III. B. Fixation Dataset: “The training dataset was created by utilizing a highly experienced physician in the field of WCE. The physician was given the objective to examine the images of the KID dataset while having his eyes movements being tracked. KID dataset contains categorized WCE images for a variety of non-pathological and pathological findings from the gastrointestinal tract. In this work, we used images that depict polypoid, vascular and inflammatory lesions and normal tissue from the oesophagus and the colon. For the fixation dataset a total of 1025 images were used. The eye-fixation data were collected using The Eye Tribe® eye tracker. For the data collection, the physician was instructed to examine a WCE image until he was confident for the diagnostic assessment. During the examination the eye tracker was tracking his eye-movements in terms of points along with the timestamp of each point. The physician was encouraged to not move from his initial position for which the eye-tracker has been calibrated for the maximum eye tracking accuracy. However, slight position adjustments have been occurring during the examination procedure, in order to maintain high accuracy eye-tracking, the eye-tracker was re-calibrated every 30 images. After the image examination step, the data points that were captured by the eye tracker on each image were separated in two categories, saccades and fixations … For each of the 1025 images of the KID dataset used in this implementation, the eye-tracker collected the physician's eye-movements i.e., saccades and eye-fixations. Physician's eye-movements were described in terms of points i.e. pixel coordinates, along with the timestamp referring to each tracked point.”).

Regarding claim 3, Gatoula teaches the medical system of claim 2, wherein the medical system further comprises a display device, wherein the providing of the first training data comprises, for each of the training medical images of the first training data: displaying the respective training medical image using the display device; measuring a distribution of user attention over the displayed training medical image; generating the training saliency map of the first pair of training data comprising the displayed training medical image using the measured distribution of user attention over the training medical image (Gatoula, page 191, Section III. Methodology, Section A. Model Architecture, para. 1-4; page 189, para. 6; FIG. 1-3; see rejection of claim 1 above; Gatoula, page 192, Section III. B. Fixation Dataset; see rejection of claim 2 above; a display is implicit to viewing the saliency maps generated as shown in FIG. 2-3; FIG. 2-3 further show the results of the CNN outputting the saliency map that shows the distribution of user attention (eye tracking), and, as discussed in the rejection of claim 2 above, a training data set is used to train the CNN).

Regarding claim 4, Gatoula teaches the medical system of claim 3, wherein the medical system further comprises an eye tracking device configured for measuring positions and movements of eyes of a user of the medical system, wherein the memory further stores an attention determining module configured for determining the distribution of user attention over the displayed training medical image using the eye tracking device to determine, for the user of the medical system looking at the displayed training medical image, points of attention within the displayed training medical image (Gatoula, page 189, para. 5-6: “Traditionally, in the context of WCE imaging, the identification of visual saliency information is being achieved through eye-tracking data. The eye-tracking data collection is implemented through dedicated devices or conventional cameras, such as webcams”; “In this paper, we propose a CNN model for saliency prediction, based on the physician's eye-fixations, on WCE images. In detail, a convolutional autoencoder is trained on WCE images, containing normal and abnormal cases, to predict saliency maps that estimate the physicians' eye fixations.”).

Regarding claim 5, Gatoula teaches the medical system of claim 1, wherein the trained first machine learning module is trained to output in response to receiving a medical image as input a user individual saliency map predicting a user individual distribution of user attention over the input medical image (Gatoula, FIG. 1-3; see rejection of claim 1 above showing the output saliency maps).

Regarding claim 13, Gatoula teaches a medical system comprising: (Gatoula, abstract; see rejection of claim 1 above) a memory storing machine executable instructions; a computational system, wherein execution of the machine executable instructions causes the computational system to provide a trained machine learning module trained to output in response to receiving a medical image as input a saliency map as output, the saliency map being predictive of a distribution of user attention over the medical image, wherein the providing of the trained machine learning module comprises: providing the machine learning module; providing training data comprising pairs of training medical images and training saliency maps, wherein the training saliency maps are descriptive of distributions of user attention over the training medical images; training the machine learning module using the training data, wherein the resulting trained machine learning module is trained to output the training saliency maps of the pairs in response to receiving the training medical images of the pairs (Gatoula, page 191, Section III. Methodology, Section A. Model Architecture, para. 1-4; page 189, para. 6; FIG. 1-3; Gatoula, page 192, Section III. B. Fixation Dataset; see the rejections of claims 1 and 2 above for the quoted passages, which the Office Action repeats here verbatim; [Gatoula Figs. 1-3 reproduced in the Office Action]).

Regarding claim 14, Gatoula teaches a computer program comprising machine executable instructions stored on a non-transitory computer readable memory for execution by a computational system controlling a medical system, wherein the computer program further comprises (Gatoula, abstract: “In this work, we propose a CNN auto-encoder model, that is capable of predicting saliency maps estimating the gaze-patterns, in terms of eye-fixations, of physicians in CE images. The proposed model outperforms other approaches for visual saliency estimation based on physicians' eye fixation by providing an AUC-J of 0.726 among CE images depicting various pathological and normal cases.”; a CNN must implicitly run on a computer with a processor and a memory storing instructions). With regard to the remaining limitations of claim 14, they recite the functions of the apparatus of claim 1 as a computer program comprising machine executable instructions stored on a non-transitory computer readable memory. Thus, the analysis in rejecting claim 1 is equally applicable to the remaining limitations of claim 14.

With regard to claim 15, it recites the functions of the apparatus of claim 1 as a process. Thus, the analysis in rejecting claim 1 is equally applicable to claim 15.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Gatoula, in view of the non-patent literature “Efficient folded attention for medical image reconstruction and segmentation,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, No. 12, 2021 (Zhang et al.) (hereinafter Zhang).

Regarding claim 12, Gatoula teaches the medical system of claim 1, wherein the medical system is configured to acquire medical imaging data for reconstructing the medical image (Gatoula, page 191, Section III. Methodology, A. Model Architecture, Decoder: “Moreover, after the final convolutional block of the decoder, a reconstruction convolutional block has been added aiming towards a more thorough analysis of the latent representation of the encoder, and a more precise restoration of the ground truth saliency map. The size of the final reconstruction block can affect the quality of the results produced by the model. To maintain low complexity and high reconstruction quality, 3 convolutional layers followed by a pixel-wise convolution for the final saliency prediction was experimentally determined to achieve satisfactory results. Each one the three convolutional layers of the reconstruction block is composed of the same number of kernels, equal to the respective number utilized in the convolutional layers of the block before the reconstruction module.”).

Gatoula fails to teach wherein the medical imaging data is acquired using any one of the following data acquisition methods: magnetic resonance imaging, computed-tomography imaging, positron emission tomography imaging, single photon emission computed tomography imaging. Zhang teaches this limitation (Zhang, page 10874, para. 1; Fig. 5: “To acquire and reconstruct COSMOS data, 6 healthy subjects were recruited to do MRI scan with 5 brain orientations using a 3.0T GE scanner”; [Zhang Fig. 5 reproduced in the Office Action]).

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention of the instant application to modify the medical imaging data, as taught by Gatoula, to be acquired using magnetic resonance imaging, as taught by Zhang. The suggestion/motivation for doing so would have been that saliency maps can enhance explainability by suggesting the anatomic localization of relevant brain features to assign neuroanatomic interpretability to models that estimate biological brain age (BA) from magnetic resonance imaging (MRI). Therefore, it would have been obvious to combine Gatoula with Zhang to obtain the invention as specified in claim 12.

Allowable Subject Matter

Claims 6-11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL ADAM SHARIFF, whose telephone number is 571-272-9741. The examiner can normally be reached M-F, 8:30-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached on 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL ADAM SHARIFF/
Examiner, Art Unit 2672

/SUMATI LEFKOWITZ/
Supervisory Patent Examiner, Art Unit 2672
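The anticipation rejection turns on Gatoula's saliency pipeline as quoted above: a convolutional autoencoder maps a WCE frame to a per-pixel fixation-probability map, is trained with binary cross-entropy against physician eye-fixation maps, and then suppresses low-probability salient values. The sketch below (PyTorch) is an illustrative reconstruction of that pipeline for orientation only, not Gatoula's published code; every layer size, the input resolution, and the 0.1 threshold are assumptions.

```python
# Illustrative sketch of a Gatoula-style saliency autoencoder. Layer widths,
# depths, and the post-processing threshold are assumptions; only the overall
# shape (encoder -> decoder -> reconstruction block -> 1x1 conv, BCE training,
# low-probability suppression) follows the passages quoted in the rejection.
import torch
import torch.nn as nn

class SaliencyAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: RGB image -> latent representation.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: latent representation -> single-channel saliency map,
        # ending in a "reconstruction block" of three same-size conv layers
        # plus a pixel-wise (1x1) convolution, as the quoted passage describes.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=1),
        )

    def forward(self, x):
        # Sigmoid squashes each pixel to [0, 1], so the map reads as the
        # probability of an eye fixation at that location.
        return torch.sigmoid(self.decoder(self.encoder(x)))

def postprocess(saliency, threshold=0.1):
    # The quoted passage describes removing salient values with low
    # probability of occurrence; a simple threshold stands in for that step.
    return torch.where(saliency >= threshold, saliency, torch.zeros_like(saliency))

model = SaliencyAutoencoder()
loss_fn = nn.BCELoss()                      # binary cross-entropy, per Gatoula
image = torch.rand(1, 3, 64, 64)            # stand-in WCE frame
target = torch.rand(1, 1, 64, 64)           # stand-in fixation-density map
loss = loss_fn(model(image), target)
loss.backward()
```

The claim mapping above hinges on exactly these elements: the encoder/decoder split, the fixation-probability output, BCE training, and the post-hoc suppression step.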

Prosecution Timeline

Feb 14, 2024 — Application Filed
Jan 09, 2026 — Non-Final Rejection, §102 and §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602903
Method for Analyzing Image Information Using Assigned Scalar Values
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12579776
DISPLAY DEVICE, DISPLAY METHOD, AND COMPUTER-READABLE STORAGE MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12561959
METHOD, ELECTRONIC DEVICE, AND COMPUTER PROGRAM PRODUCT FOR TARGET IMAGE PROCESSING
Granted Feb 24, 2026 (2y 5m to grant)
Patent 12548293
IMAGE DETECTION METHOD AND APPARATUS
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541976
RELATIONSHIP MODELING AND ANOMALY DETECTION BASED ON VIDEO DATA
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed to get these applications past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 99% (+22.3%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 115 resolved cases by this examiner. Grant probability derived from career allow rate.
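As far as the projection tiles can be reproduced from the displayed figures: the 82% grant probability is the career allow rate (94/115), and the 99% with-interview figure is consistent with adding the +22.3-point lift and capping near certainty. The cap is an assumption made here to match the displayed number; the dashboard does not publish its actual formula.

```python
# Hedged sketch of the projection arithmetic. The 99% cap is an assumption
# made to reproduce the displayed with-interview figure (82% + 22.3 points
# would otherwise exceed 100%); the tool's real method is not documented.
granted, resolved = 94, 115
base = granted / resolved                    # 0.817 -> shown as 82%
interview_lift = 0.223                       # +22.3 points per the dashboard

with_interview = min(base + interview_lift, 0.99)
print(f"grant probability: {base:.0%}")           # 82%
print(f"with interview: {with_interview:.0%}")    # 99%
```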
