Prosecution Insights
Last updated: April 19, 2026
Application No. 18/453,697

CLINICAL WORKFLOW EFFICIENCY USING LARGE LANGUAGE MODELS

Status: Non-Final OA (§103)
Filed: Aug 22, 2023
Examiner: SMITH, SEAN THOMAS
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Siemens Healthineers AG
OA Round: 3 (Non-Final)

Grant Probability: 83% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 8m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 83% (5 granted / 6 resolved), +21.3% vs TC average (above average)
Interview Lift: +33.3% among resolved cases with an interview
Avg Prosecution: 2y 8m (typical timeline)
Career History: 43 total applications across all art units, 37 currently pending

Statute-Specific Performance

§101: 27.9% (-12.1% vs TC avg)
§103: 50.7% (+10.7% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 8.6% (-31.4% vs TC avg)

TC averages are Tech Center estimates • Based on career data from 6 resolved cases

Office Action (§103)

DETAILED ACTION

This Office Action is responsive to Amendments and Arguments filed on December 3rd, 2025. Claims 2, 4, 11, 13 and 16 are cancelled; claims 1, 3, 5-10, 12, 14, 15 and 17-20 are pending and have been examined. Any objection/rejection not mentioned in this Office Action has been withdrawn by the Examiner.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on August 22nd, 2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendments and Arguments

With regard to rejections made under 35 U.S.C. 101, Applicant argues, “The claims are integrated into the practical application of an improvement in the functioning of a computer or other technology. Specifically, the claims are integrated into the practical application of an improved AI architecture for generating a response based on multi-modal input data (tabulated measurements and medical images). Embodiments of the invention provide for a technical solution for addressing the technical problem of summarizing multi-modal patient data. Specifically, embodiments of the invention provide for an improved AI architecture that combines a large language model for processing tabulated measurements with a machine learning based encoder network for encoding medical images… Advantageously, the improved AI architecture of amended claim 1 enables summarizing of both tabulated measurements and medical images together,” (page 9 of Remarks). Applicant’s argument is persuasive. Accordingly, the rejections under 35 U.S.C. 101 are withdrawn.

With regard to rejections made under 35 U.S.C. 103, Applicant argues, “In the rejection of claim 4, the Office Action acknowledges that ‘neither Jhaveri nor Batman explicitly teach ...
wherein the patient data comprises tabulated measurements.’ It follows that Jhaveri and Batman also do not teach or suggest ‘generating a response summarizing the patient data based on the instructions by: transforming the tabulated measurements into a sequence of tokens using a large language model, encoding the one or more medical images into embeddings using a machine learning based encoder network, aligning the embeddings with a text embedding function of the large language model, and generating the response based on the sequence of tokens and the aligned embeddings using the large language model,’ as recited in amended claim 1,” (page 12 of Remarks).

Applicant further argues, “While the cited portions of Lipkova may refer to tabular data, the cited portions of Lipkova do not teach or suggest generating a response summarizing the tabular data as provided in amended claim 1… Further, the cited portions of Lipkova do not teach or suggest at least ‘encoding the one or more medical images into embeddings using a machine learning based encoder network, aligning the embeddings with a text embedding function of the large language model, and generating the response based on the sequence of tokens and the aligned embeddings using the large language model,’ as recited in amended claim 1… Therefore, for at least the reasons discussed above, amended independent claim 1 is allowable over Jhaveri, Batman, and/or Lipkova,” (page 13 of Remarks).

Applicant’s argument is moot, as new grounds of rejection are made under Jhaveri and Batman in view of U.S. Patent Application 2023/0222285 to Zhang et al. Further details are provided below.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f): (f) Element in Claim for a Combination.
– An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f):

(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and

(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f). The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) except as otherwise indicated in an Office action.

Claim 10 recites “means for receiving one or more prompts…” “means for generating a response…” “means for outputting the response,” “means for receiving a question…” “means for generating an additional response…” and “means for outputting the additional response.” Claim 14 recites “means for converting training tabulated data to token sequences,” “means for combining the token sequences with corresponding ground truth summaries,” and “means for fine-tuning the large language model…”. Because these claim limitations are being interpreted under 35 U.S.C. 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof (see paragraphs [0032], [0067] and [0074]).

PLEASE NOTE: This is NOT a rejection. Please do not address this section as a rejection. Should Applicant disagree with the INTERPRETATION, applicant may: (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C.
112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-6, 8, 10, 14-15, 17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/0287783 to Jhaveri (hereinafter, "Jhaveri") in view of U.S. Patent Application Publication 2022/0230714 to Batman et al. (hereinafter, "Batman"), further in view of U.S. Patent Application Publication 2023/0222285 to Zhang et al. (hereinafter, “Zhang”).

Regarding claims 1, 10 and 15, Jhaveri teaches a method, apparatus and computer readable medium comprising receiving one or more prompts comprising 1) patient data retrieved from one or more patient databases and 2) instructions, the patient data comprising tabulated measurements and one or more medical images (paragraph [0148], "The workflow tracker dashboard may be a graphical user interface that facilitates display of patient medical information, as further shown and described with respect to FIGS.
11-13, including: medical history; acquired patient images; detailed acquisition information related to acquired patient images (e.g., where and when the images were acquired, the protocol used in acquisition, who acquired the images, the ordering provider, etc.); contact information for all verified care providers; location information for all current users; completed steps of a current user's workflow with regard to the patient (e.g., images reviewed, patient added to the worklist, etc.); and a patient worklist updated in real-time based on the workflow of current users that may include next steps," paragraphs [0150]-[0151], "Further, verified care providers may send requests to the via the communication thread for various information related to the patient care, including patient medical history, care guidelines, predicted future patient state, recommended lab tests, etc… In some examples, the messages may be automatically generated by a virtual healthcare assistant (VHA) using a data mining algorithm as the patient's worklist is continually updated in real-time. The messages sent may be stored and executed on the server system, a cloud, and/or a remote device," and paragraph [0153], "At 1016, method 1000 includes outputting the communication thread and/or dashboard for display when prompted. In an example, the prompt may include an explicit request to view the communication thread and/or workflow tracker dashboard for the patient, entered by selection of an appropriate link/control button on the workflow tracker dashboard or selection of the patient's communication thread from a collaborative interface, as indicated at 1018."); and outputting the response (paragraph [0153], "For example, a message button may be displayed via the collaborative interface, and selection of the message button may trigger display of the communication thread and/or workflow tracker dashboard for that patient. 
In another example, a patient link may be selected to launch the communication thread or dashboard from a collaborative system interface.").

Jhaveri does not explicitly teach “transforming the tabulated measurements into a sequence of tokens using a large language model,” “encoding the one or more medical images into embeddings using a machine learning based encoder network,” “aligning the embeddings with a text embedding function of the large language model,” or “generating the response based on the sequence of tokens and the aligned embeddings using the large language model,” and thus, Zhang is introduced.

Zhang teaches generating a response summarizing the patient data based on the instructions by: transforming the tabulated measurements into a sequence of tokens using a large language model (paragraph [0111], "The textual input contents can be tokenized by a WordPiece tokenizer," and paragraph [0105], "The block type can include the semantic type of the content presented in the block, such as header, paragraph, images, list (bullet-items), and table.
In some implementations, the document tokenizer can define 14 different block types (e.g., header, paragraph, list, table, image, caption, and padding block)."), encoding the one or more medical images into embeddings using a machine learning based encoder network (paragraph [0112], "The image contents can be first fed to a convolutional neural network (CNN), followed by a transformation fully-connected (FC) layer to align the resulting visual embedding to the same size of the textual token embedding."), aligning the embeddings with a text embedding function of the large language model (paragraph [0112], "The image contents can be first fed to a convolutional neural network (CNN), followed by a transformation fully-connected (FC) layer to align the resulting visual embedding to the same size of the textual token embedding."), and generating the response based on the sequence of tokens and the aligned embeddings using the large language model (paragraph [0123], "In some implementations, the document-level model 208 can also be a visual-linguistic transformer. The resulting document-level representation can be used to generate document summaries.").

Jhaveri and Zhang are considered analogous because they are each concerned with information retrieval and reporting. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri with the teachings of Zhang for the purpose of improving record accessibility. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
The combination of Jhaveri and Zhang does not explicitly teach “receiving a question relating to the response from a user,” “generating an additional response responding to the question using the large language model,” or “outputting the additional response,” and thus, Batman is introduced.

Batman teaches receiving a question relating to the response from a user (paragraph [0012], "Thus, if the second care provider requests information about the patients beyond what is summarized in the handoff report, the handoff system can provide the requested information."); generating an additional response responding to the question using the large language model (paragraph [0012], "The workflow system may generate answers to the inquiries based on the patient data and may provide the answers to the clinical device."); and outputting the additional response (paragraph [0012], "Thus, if the second care provider requests information about the patients beyond what is summarized in the handoff report, the handoff system can provide the requested information.").

Jhaveri, Zhang and Batman are considered analogous because they are each concerned with processing medical information. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri and Zhang with the teachings of Batman for the purpose of improving user experience and summary quality. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.

Regarding claims 5 and 14, Zhang teaches converting training tabulated data to token sequences (paragraph [0045], "The systems or methods may use a tokenizer for the data processing. In some implementations, the system may use a block tokenizer.
The block tokenizer can process block positions, block types (e.g., header, image and text, paragraph, etc.), block attributes (e.g., bold, italic, font size, etc.), and images. For example, the systems and methods may utilize a go/block-tokenizer for block and layout processing. The tokenizer may be able to characterize the data into sub-groups (e.g., header, lists, paragraph, image, table, etc.) based on block type."); combining the token sequences with corresponding ground truth summaries (paragraph [0058], "Ground truth data and other prediction data may be used to evaluate the HVL transformer prediction. The training may be run iteratively to allow the model to make accurate predictions."); and fine-tuning the large language model based on the combined token sequences and corresponding ground truth summaries (paragraph [0066], "In some implementations, the method can include evaluating a pre-training loss function that evaluates a difference between the prediction output and ground truth data associated with the masked training block and the plurality of additional masked training blocks. The method can include adjusting one or more parameters of the machine-learned semantic document encoding model based at least in part on the pre-training loss function.").

Jhaveri and Zhang are considered analogous because they are each concerned with information retrieval and reporting. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri with the teachings of Zhang for the purpose of improving system accuracy. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.
Regarding claims 6 and 17, Jhaveri teaches a method and computer readable medium wherein the patient data is retrieved from a plurality of patient databases (paragraph [0040], "Thus, the user may easily find current (in real-time) and previous information related to patient care from collaborative care providers/users as well as identify where the information came from thereby saving time and increasing operational efficiency. The infrastructure 200 includes a hospital information system (HIS) 204, a radiology information system (RIS) 206, a picture archiving and communication system (PACS) 208, an interface unit 210, a data center 212, and a workstation 214.").

Regarding claims 8 and 19, Jhaveri teaches a method and computer readable medium wherein the one or more patient databases comprise at least one of EHR (electronic health record), EMR (electronic medical record), PHR (personal health record), HIS (health information system), RIS (radiology information system), PACS (picture archiving and communication system), and LIMS (laboratory information management system) (paragraph [0040], "Thus, the user may easily find current (in real-time) and previous information related to patient care from collaborative care providers/users as well as identify where the information came from thereby saving time and increasing operational efficiency. The infrastructure 200 includes a hospital information system (HIS) 204, a radiology information system (RIS) 206, a picture archiving and communication system (PACS) 208, an interface unit 210, a data center 212, and a workstation 214.").

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Jhaveri, Zhang and Batman as applied to claims 1 and 10 above, and further in view of "Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT" by Chaganti et al. (hereinafter, "Chaganti").
Regarding claims 3 and 12, the combination of Jhaveri, Zhang and Batman does not teach “the patient data comprises information automatically extracted from medical images using an artificial intelligence based system,” and thus, Chaganti is introduced. Chaganti teaches the patient data comprises information automatically extracted from medical images using an artificial intelligence based system (see Materials and Methods, Lung and Lobe Segmentation, "Next, the lung ROI image is resampled to a 2-mm isotropic volume and fed into a Deep Image-to-Image Network (DI2IN) (16) to generate the lung segmentation.").

Jhaveri, Zhang, Batman and Chaganti are considered analogous because they are each concerned with processing and reporting information. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri, Zhang and Batman with the teachings of Chaganti for the purpose of improving record accessibility. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.

Claims 7, 9, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jhaveri, Zhang and Batman as applied to claims 1, 10 and 15 above, and further in view of "Artificial Intelligence for Multimodal Data Integration in Oncology" by Lipkova et al. (hereinafter, "Lipkova").

Regarding claims 7 and 18, the combination of Jhaveri, Zhang and Batman does not teach “the patient data comprises unstructured data using different nomenclature,” and thus, Lipkova is referenced.
Lipkova teaches the patient data comprises unstructured data using different nomenclature (see Challenges and Clinical Adoption, Alignment of diverse modalities, "This refers to the integration of data from different scales, time points, or measurements. Often an acquisition of one modality results in the destruction of the sample, preventing collection of multiple measurements from the same system... Here, cross-modal autoencoders can be used to enable integration and translation between arbitrary modalities.").

Jhaveri, Zhang, Batman and Lipkova are considered analogous because they are each concerned with processing and reporting information. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri, Zhang and Batman with the teachings of Lipkova for the purpose of improving record accessibility. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.

Regarding claims 9 and 20, the combination of Jhaveri, Zhang and Batman does not teach “the large language model is constrained to a specific medical domain,” and thus, Lipkova is referenced. Lipkova teaches the large language model is constrained to a specific medical domain (Introduction, "AI models are able to integrate complementary information and clinical context from diverse data sources to provide more accurate patient predictions (Figure 1A) (Hosny et al., 2018)… In this review, we summarize AI methods and strategies for multimodal data fusion, outline prospective on AI driven exploration through multimodal associations and interpretability methods, and conclude with directions for AI adoption in precision oncology.").
Jhaveri, Zhang, Batman and Lipkova are considered analogous because they are each concerned with processing and reporting information. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Jhaveri, Zhang and Batman with the teachings of Lipkova for the purpose of improving language model accuracy. Given that all the claimed elements were known in the prior art, one skilled in the art could have combined the elements by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

U.S. Patent 9,003,319 to Linthicum et al.
U.S. Patent 9,715,576 to Hayter, II.
U.S. Patent 12,136,481 to Baker et al.
U.S. Patent Application Publication 2014/0324469 to Reiner.
U.S. Patent Application Publication 2019/0147988 to Sharifi Sedeh et al.
U.S. Patent Application Publication 2019/0287012 to Celikyilmaz et al.
U.S. Patent Application Publication 2020/0335187 to Lefkofsky et al.
U.S. Patent Application Publication 2021/0398650 to Baker et al.
U.S. Patent Application Publication 2022/0310219 to Sanchez, Jr. et al.
U.S. Patent Application Publication 2023/0187031 to White et al.
U.S. Patent Application Publication 2023/0237773 to Li et al.
U.S. Patent Application Publication 2023/0259544 to Sotudeh Gharebagh et al.
U.S. Patent Application Publication 2024/0290065 to Park et al.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T SMITH whose telephone number is (571)272-6643. The examiner can normally be reached Monday - Friday 8:00am - 5:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, PIERRE-LOUIS DESIR, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEAN THOMAS SMITH/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659
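The limitation at the center of the rejection describes a four-step multi-modal pipeline: tokenize the tabulated measurements with a large language model, encode the medical images with an encoder network, project the image embeddings into the LLM's text embedding space (Zhang's CNN-plus-FC-layer alignment), and generate from the combined sequence. A minimal sketch of that data flow, with toy stand-ins for the tokenizer, encoder, and model — every function name, dimension, and measurement below is hypothetical, not taken from the application or the cited art:

```python
import random

EMBED_DIM = 8  # hypothetical text-embedding width of the "LLM"

def tokenize_tabular(measurements):
    """Toy stand-in for an LLM tokenizer: serialize each (name, value)
    row of the tabulated measurements into integer token ids."""
    vocab = {}
    tokens = []
    for name, value in measurements:
        for piece in (name, str(value)):
            tokens.append(vocab.setdefault(piece, len(vocab)))
    return tokens

def embed_tokens(tokens):
    """Toy text-embedding function: one deterministic vector per token id."""
    return [[((t + 1) * (d + 1)) % 7 / 7.0 for d in range(EMBED_DIM)]
            for t in tokens]

def encode_image(image):
    """Toy stand-in for a CNN encoder: reduce a 2D 'image' to a
    4-dim feature vector (mean, min, max, pixel count)."""
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat), min(flat), max(flat), float(len(flat))]

def align(feature, proj):
    """The claimed 'aligning' step as a linear projection: map the 4-dim
    image feature into the EMBED_DIM-wide text embedding space."""
    return [sum(f * w for f, w in zip(feature, col)) for col in proj]

def generate_summary(token_embs, image_emb):
    """Toy stand-in for LLM generation over the combined sequence."""
    seq = token_embs + [image_emb]
    return f"summary over {len(seq)} embedded inputs"

# Hypothetical patient data: tabulated measurements plus one 'image'.
measurements = [("heart_rate", 72), ("spo2", 98)]
image = [[0.1, 0.4], [0.3, 0.9]]

proj = [[random.random() for _ in range(4)] for _ in range(EMBED_DIM)]  # 4 -> 8
tokens = tokenize_tabular(measurements)
response = generate_summary(embed_tokens(tokens),
                            align(encode_image(image), proj))
print(response)
```

The `align` step is the load-bearing one in the rejection: in Zhang it is the fully-connected layer that makes visual embeddings the same size as textual token embeddings so both can be fed to one model.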
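Claims 5 and 14, mapped above to Zhang, recite preparing fine-tuning data by converting training tabulated data to token sequences and combining them with ground-truth summaries. A hedged sketch of that pairing step — the serialization format and all names are illustrative assumptions, not from the record:

```python
def serialize_row(row):
    """Serialize one tabulated record into flat text (a common, though
    purely illustrative, way to present a table to an LLM tokenizer)."""
    return " | ".join(f"{k}: {v}" for k, v in sorted(row.items()))

def build_finetune_pairs(tables, summaries):
    """Combine each serialized table with its ground-truth summary,
    yielding (prompt, target) pairs for supervised fine-tuning."""
    assert len(tables) == len(summaries), "each table needs a reference summary"
    return [(serialize_row(t), s) for t, s in zip(tables, summaries)]

pairs = build_finetune_pairs(
    [{"heart_rate": 72, "spo2": 98}],       # hypothetical measurements
    ["Vitals within normal limits."],        # hypothetical reference summary
)
print(pairs[0][0])  # heart_rate: 72 | spo2: 98
```

The actual fine-tuning loop (loss over the target summary, parameter updates) would follow from these pairs; Zhang's paragraph [0066] describes the analogous loss-and-adjust step.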

Prosecution Timeline

Aug 22, 2023: Application Filed
Jun 10, 2025: Non-Final Rejection (§103)
Sep 18, 2025: Response Filed
Oct 27, 2025: Final Rejection (§103)
Dec 03, 2025: Response after Non-Final Action
Dec 10, 2025: Request for Continued Examination
Jan 06, 2026: Response after Non-Final Action
Feb 09, 2026: Non-Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602540
LEVERAGING A LARGE LANGUAGE MODEL ENCODER TO EVALUATE PREDICTIVE MODELS
2y 5m to grant • Granted Apr 14, 2026
Patent 12530534
SYSTEM AND METHOD FOR GENERATING STRUCTURED SEMANTIC ANNOTATIONS FROM UNSTRUCTURED DOCUMENT
2y 5m to grant • Granted Jan 20, 2026
Study what changed to get past this examiner. Based on 2 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 83%
With Interview: 99% (+33.3%)
Median Time to Grant: 2y 8m
PTA Risk: High
Based on 6 resolved cases by this examiner. Grant probability derived from career allow rate.
