Last updated: May 29, 2026

Application No. 18/329,904

MULTI-MODAL REPRESENTATION LEARNING FOR ELECTRONIC DATA INTERCHANGE AND TEXT

Non-Final OA §103

Filed

Jun 06, 2023

Examiner

NGUYEN, CHAU T

Art Unit

2145

Tech Center

2100 — Computer Architecture & Software

Assignee

International Business Machines Corporation

OA Round

1 (Non-Final)

Interview Optional

Examiner has a relatively high allowance rate (68%) and a +31.5% interview lift. A written response may suffice.

Based on 552 resolved cases, 2023–2026

Examiner Intelligence

NGUYEN, CHAU T

Grants 68% — above average

Career Allowance Rate

373 granted / 552 resolved

+12.6% vs TC avg

Strong +32% interview lift

+31.5%

Interview Lift

resolved cases with interview

Typical timeline

3y 11m

Avg Prosecution

15 currently pending

Career history

585

Total Applications

across all art units

Statute-Specific Performance

§101: 5.4% (-34.6% vs TC avg)

§103: 75.2% (+35.2% vs TC avg)

§102: 11.6% (-28.4% vs TC avg)

§112: 4.8% (-35.2% vs TC avg)

Tech Center averages are estimates. Based on career data from 552 resolved cases.

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/06/2023 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Drawings
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they do not include the following reference sign(s) mentioned in the description: paragraphs [0058] and [0059] describe item “input 414” in Figure 4, however, Figure 4 does not have any item with label “414”.  Please verify.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Claim Objections
Claims 3-7, 9, 11-14, and 16-20 are objected to because of the following informalities:
Claim 1:
	Line 4 recites “multi-model”, which should be rewritten as “multi-modal” for consistency with the Specification.
	Lines 10-11 recite “multi-model”, which should be rewritten as “multi-modal” for consistency with the Specification.
	Line 13 recites “multi-model”, which should be rewritten as “multi-modal” for consistency with the Specification.

Claims 3-7, 9, 11-14, and 16-20 contain issues similar to those discussed for claim 1 above.  Therefore, claims 3-7, 9, 11-14, and 16-20 are objected to under the same rationale.
Appropriate correction is required.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (Huang), US Patent Application Publication No. US 2023/0112710 A1, and further in view of Morariu et al. (Morariu), US Patent Application Publication No. US 2023/0376687 A1.

As to independent claim 1, Huang discloses a method, comprising:
encoding a text sample with a text encoder, wherein the encoding creates an embedded text sample in a conjoined embedding space (Figure 3 and paragraphs [0085]-[0087], [0090]: machine learning model 350 that is trained on a training dataset that includes a training sample, wherein the training sample includes a textual description of a GUI, for example, a first encoder (text encoder) using bidirectional encoder representations from transformers (BERT) to generate embedding vectors to represent textual descriptions, and embedding the text description into a common embedding space (conjoined embedding space) that is populated with both text embedding and UI embedding);
encoding an electronic data sample with a multi-model encoder, wherein the encoding the electronic data sample creates an embedded electronic data sample in the conjoined embedding space, and wherein the multi-model encoder includes a plurality of expert networks (Figure 3 and paragraphs [0085]-[0088]: machine learning model 350 that is trained on a training dataset that includes a training sample, wherein the training sample includes an image of a GUI (electronic data sample), for example, a second encoder (a multi-model encoder) to generate embedding vectors to represent the image of the GUI (electronic data sample), and embedding the image of the GUI into the common embedding space (conjoined embedding space) that is populated with both text embedding and UI embedding);
training the multi-modal encoder to determine a similarity score between a target text and an electronic data segment, wherein the multi-modal encoder is trained by machine learning based on comparing labeled training samples and corresponding predictions generated by the multi-model encoder (Figure 3 and paragraphs [0088]-[0091]: the training sample includes the image of the GUI that represents different graphical elements (electronic data segments) that make up the GUI; paragraph [0055]: the training process uses a validation set to evaluate the performance of the machine learning model, wherein the validation set includes multiple samples similar to the training samples of the training dataset; paragraphs [0076]-[0077]: generating prediction data for the predicted GUI); and
outputting the multi-model encoder configured to query a data repository of documents for one or more selected documents that match a natural language text input (paragraph [0078]: if the user wants a GUI with a login page with two buttons, the textual description can be “login page with two buttons”).
	Huang discloses encoding an electronic data sample with the second encoder, wherein the electronic data sample includes an image of a GUI, and the image of the GUI includes different segments (Figures 1 & 3 and paragraph [0033]).  Thus, one of ordinary skill in the art would interpret that the image of the GUI can be an EDI document, and the second encoder is a multi-modal EDI model (paragraphs [0084]-[0085]).  To support Examiner’s interpretation, Morariu discloses generating and using a multi-modal multi-granular model to analyze document regions of multiple sizes (e.g., granularities) and generate data (e.g., feature vectors) suitable for use in performing multiple tasks, and the multi-modal multi-granular model (multi-model EDI encoder) can be used in connection with one or more other machine learning models to perform various tasks such as the page-level document extraction, region-level entity recognition, and/or token-level token classification (Morariu, paragraphs [0018], [0021] and Figure 1).  Morariu further discloses the document can be an invoice or receipt, which is an EDI document (paragraph [0022]).
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the system of Huang to include a multi-modal multi-granular model (multi-model EDI encoder) to transform input that includes features extracted from the page level, region level, and word level of the document (EDI document), as taught by Morariu.  Morariu suggests that the multi-modal multi-granular model advantageously generates data that can be used to perform multiple distinct tasks (e.g., entity recognition, document classification, etc.) at multiple granularities, which reduces model storage cost and maintenance and improves performance over conventional systems as a result of the model obtaining information from regions at different granularities (Morariu, paragraph [0022]).
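
As an informal illustration of the claim 1 limitations discussed above (it is not taken from Huang, Morariu, or the application), the following Python sketch shows how a text embedding and electronic data embeddings placed in one shared vector space could be compared with a similarity score and used to rank documents. The embedding values, document names, and function names are hypothetical, and cosine similarity is an assumed choice of score.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, document_embeddings):
    """Return document ids ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), doc_id)
        for doc_id, embedding in document_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored]


# Hypothetical, hand-picked embeddings standing in for encoder outputs.
repository = {
    "invoice_001": [0.9, 0.1, 0.0],
    "purchase_order_007": [0.1, 0.9, 0.2],
    "ship_notice_042": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a natural language query in the same space.
query = [0.8, 0.2, 0.1]

ranking = rank_documents(query, repository)
```

Because both kinds of input live in the same space, one scoring function serves for text-to-document retrieval; a trained encoder pair would replace the hand-picked vectors.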

As to dependent claim 2, Huang and Morariu disclose determining an EDI type of the EDI sample (Huang, paragraph [0040]; Morariu, Figure 1, paragraphs [0018], [0021]-[0022]); and
routing the EDI sample to one of the plurality of expert networks based on the EDI type (Huang, paragraph [0090]; Morariu, paragraph [0035]).

As to dependent claim 3, Huang and Morariu disclose wherein the routing is performed by a router layer of the multi-model EDI encoder, wherein the router layer is configured to route an EDI segment to one of the plurality of expert networks in response to the determining the type of the EDI segment (Morariu, paragraph [0035]).
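
The routing recited in claims 2 and 3 can be pictured with a small Python sketch. It is a simplified, rule-based stand-in, assuming the EDI type is read from an X12-style `ST` segment; the expert functions, the `EXPERTS` table, and the fallback expert are hypothetical, and a learned router layer inside a neural encoder would look different.

```python
def detect_edi_type(segment: str) -> str:
    """Read the transaction-set code from an X12-style ST segment, e.g. 'ST*850*0001~'."""
    fields = segment.strip().rstrip("~").split("*")
    if fields[0] != "ST" or len(fields) < 2:
        raise ValueError("expected an ST segment")
    return fields[1]


# Stand-ins for expert networks: each one just reports its own name.
def invoice_expert(segment: str) -> str:
    return "invoice_expert"


def purchase_order_expert(segment: str) -> str:
    return "purchase_order_expert"


def default_expert(segment: str) -> str:
    return "default_expert"


# Hypothetical mapping from EDI type to the expert that handles it.
EXPERTS = {
    "810": invoice_expert,         # X12 810: invoice
    "850": purchase_order_expert,  # X12 850: purchase order
}


def route(segment: str) -> str:
    """Determine the EDI type, then hand the segment to the matching expert."""
    edi_type = detect_edi_type(segment)
    expert = EXPERTS.get(edi_type, default_expert)
    return expert(segment)
```

For example, `route("ST*850*0001~")` selects the purchase order expert, while an unrecognized type falls through to the default expert.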

As to dependent claim 4, Huang and Morariu disclose wherein the multi-model EDI encoder includes a self-attention layer configured to weight each element of the EDI segment based on a context, the weight corresponding to the relative importance of each element with respect to each other element (Huang, paragraph [0090]; Morariu, paragraphs [0019], [0036]).
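
For readers unfamiliar with the self-attention layer recited in claim 4, the following Python sketch computes attention weights so that each element of a segment receives a weight relative to every other element. It is a bare-bones version under stated assumptions: the element vectors are hypothetical, and the learned query, key, and value projections of a real transformer layer are omitted.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(elements):
    """For each element, weight all elements by scaled dot-product similarity."""
    dim = len(elements[0])
    weights = []
    for query in elements:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in elements
        ]
        weights.append(softmax(scores))
    return weights


# Hypothetical 2-dimensional vectors for three elements of one segment.
segment_elements = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = self_attention_weights(segment_elements)
```

Row `i` of `attention` holds the relative importance of every element from the point of view of element `i`.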

As to dependent claim 5, Huang and Morariu disclose wherein the training includes iteratively adjusting parameters of the multi-modal EDI encoder based on a cross-entropy between the labeled training samples and corresponding predictions generated by the multi-model EDI encoder (Huang, paragraphs [0011], [0053]; Morariu, paragraph [0060]).
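
The cross-entropy referred to in claims 5 and 11 measures how far predicted probabilities are from the labels. The Python sketch below only computes that loss on hypothetical predictions; the parameter update step (for example, gradient descent) is left out, and the function names are illustrative.

```python
import math


def cross_entropy(predicted_probs, label_index):
    """Negative log of the probability assigned to the correct label."""
    return -math.log(predicted_probs[label_index])


def mean_cross_entropy(batch_probs, batch_labels):
    """Average cross-entropy over a batch of labeled samples."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)


# Hypothetical predictions over two classes for two labeled samples.
labels = [0, 1]
confident = mean_cross_entropy([[0.9, 0.1], [0.2, 0.8]], labels)
uncertain = mean_cross_entropy([[0.5, 0.5], [0.5, 0.5]], labels)
```

A lower value means the predictions agree better with the labels, which is the quantity an iterative training loop would try to reduce.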

As to dependent claim 6, Huang and Morariu disclose inputting a target text to the multi-model EDI encoder; and identifying one or more EDI segments embedded in the embedding space in response to the multi-model EDI encoder's determining a match between the target text and one or more EDI segments (Huang, paragraph [0115]).

As to dependent claim 7, Huang and Morariu disclose further comprising: inputting a plurality of new EDI segments to the multi-model EDI encoder (Huang, paragraphs [0064], [0069]); and
generating by the multi-model EDI encoder one or more clusters based on determining similarities between pairs of the new EDI segments (Huang, paragraphs [0064], [0069]).

As to dependent claim 8, Huang and Morariu disclose wherein the text encoder is a transformer-based encoder (Huang, paragraph [0090]).

As to dependent claim 11, Huang and Morariu disclose wherein the training includes computing a cross-entropy based on the prediction generated by the multi-model EDI encoder for each of the plurality of EDI segments (Huang, paragraphs [0043], [0046]; Morariu, paragraph [0060]).

Claims 9-10 and 12-13 are system claims that contain similar limitations of claims 1-2 and 6-7, respectively.  Therefore, claims 9-10 and 12-13 are rejected under the same rationale.

Claims 14-20 are computer program product claims that contain similar limitations of claims 1-7, respectively.  Therefore, claims 14-20 are rejected under the same rationale.

Conclusion


	
Any inquiry concerning this communication should be directed to CHAU T NGUYEN at telephone number (571)272-4092. The examiner can normally be reached on M-F from 8am to 5pm (PT).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cesar Paula, can be reached at telephone number (571)272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR for authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/CHAU T NGUYEN/

Prosecution Timeline

Jun 06, 2023

Application Filed

Apr 23, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/962,463 (Patent 12596765): GENERATION AND USE OF CONTENT BRIEFS FOR NETWORK CONTENT AUTHORING. 3y 6m to grant; granted Apr 07, 2026.

17/533,285 (Patent 12591795): METHOD FOR PROVIDING EXPLAINABLE ARTIFICIAL INTELLIGENCE. 4y 4m to grant; granted Mar 31, 2026.

18/335,832 (Patent 12585722): IMAGE GENERATION SYSTEM, COMMUNICATION APPARATUS, METHODS OF OPERATING IMAGE GENERATION SYSTEM AND COMMUNICATION APPARATUS, AND STORAGE MEDIUM. 2y 9m to grant; granted Mar 24, 2026.

17/934,644 (Patent 12579356): MATHEMATICAL CALCULATIONS WITH NUMERICAL INDICATORS. 3y 5m to grant; granted Mar 17, 2026.

18/462,335 (Patent 12547825): WHITELISTING REDACTION SYSTEMS AND METHODS. 2y 5m to grant; granted Feb 10, 2026.

Based on the 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

68%

Grant Probability

99%

With Interview (+31.5%)

3y 11m (~12m remaining)

Median Time to Grant

Low

PTA Risk

Based on 552 resolved cases by this examiner. Grant probability derived from career allowance rate.
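
The 68% figure is consistent with the counts reported under Career Allowance Rate (373 granted out of 552 resolved). A quick check in Python, with variable names chosen only for illustration:

```python
# Counts reported in the examiner statistics above.
granted = 373
resolved = 552

# Career allowance rate as a fraction, then as a whole-number percentage.
allowance_rate = granted / resolved
allowance_pct = round(allowance_rate * 100)
```

The unrounded rate is about 67.6%, which rounds to the 68% shown as the grant probability.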