DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9 and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20230153532 A1) in view of Kang et al. ("Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge").
Claim 1: He teaches a method for training a machine learning model, the method comprising:
pre-training the lower portion via a generator task and via alternating between inputting of monolingual text data and multilingual text data; ([0070], [0032], [0035]: upstream encoder block comprising monolingual and multilingual text; the upstream encoder block is a generator)
pre-training the upper portion via a discriminator task; ([0035], [0070]: downstream block is a discriminator)
joining the pre-trained lower portion to the pre-trained upper portion to form a trained multilingual machine learning model. (Fig 3: upstream and downstream encoder blocks joined into a single encoder)
However, He does not teach splitting a machine learning model into a lower portion and an upper portion, the lower portion comprising at least one layer, the upper portion comprising at least one layer;
Kang does teach splitting a machine learning model into a lower portion and an upper portion, the lower portion comprising at least one layer, the upper portion comprising at least one layer; (Section 4.3: Layer-granularity computation partitioning; Abstract: a variety of DNN architectures may be partitioned at the layer level).
It would have been obvious to one with ordinary skill in the art before the effective filing date to split a machine learning model into two portions at the layer level as taught by Kang because it allows for the computational overhead of neural network training to be distributed across multiple devices (see Kang Section 1).
Claim 2: Parent claim 1 is addressed above. He further teaches the method wherein the pre-training of the upper portion via the discriminator task comprises:
receiving output from the lower portion performing the generator task during the pre-training of the lower portion, ([0033]-[0035], Fig 3: upstream encoder feeds into downstream encoder)
performing classification of the received output. ([0033]-[0035]: downstream encoder makes a binary classification from the data it receives from the upstream encoder).
Claim 3: Parent claim 2 is addressed above. He further teaches the method wherein classes for the classification are selected from a group consisting of original data and noisy data. ([0035]: “The binary classifier in the discriminator determines whether a corresponding token is an original token or a token replaced by the generator”).
Claim 4: Parent claim 1 is addressed above. He further teaches the method wherein the pre-training of the upper portion via the discriminator task comprises applying a gradient that, during back-propagation, passes through all tokens generated by the lower portion. ([0041]: update all parameters in a single backward pass from the pre-training output).
Claim 5: Parent claim 1 is addressed above. He further teaches the method wherein the discriminator task comprises validating tokens predicted by the lower portion. ([0033]-[0035]: downstream encoder makes a binary classification from the data it receives from the upstream encoder).
Claim 6: Parent claim 1 is addressed above. He further teaches the method wherein the multilingual text data comprises a first portion and a second portion, the first portion comprising first text in a first language, the second portion comprising second text that is a translation of the first text into a second language. ([0017]: machine translation task).
Claim 7: Parent claim 1 is addressed above. Kang further teaches the method further comprising:
evaluating a performance of the pre-trained lower portion; (Section 5.2: dynamically select a partition point depending on per-layer performance)
in response to the evaluation, reallocating a distribution of the layers between the lower portion and the upper portion. (Section 5.2: dynamically select a partition point depending on per-layer performance).
Claim 8: Parent claim 7 is addressed above. Kang further teaches the method wherein the reallocating comprises giving one or more layers of the lower portion to the upper portion. (Section 5.2: dynamically select a partition point depending on per-layer performance, assessing factors such as mobile energy consumption).
Claim 9: Parent claim 7 is addressed above. Kang further teaches the method wherein the reallocating is performed in response to the evaluation indicating that performance of the lower portion for the generator task exceeds a pre-determined threshold. (Algorithm 1: Neurosurgeon DNN partitioning algorithm).
Claim 12: Parent claim 1 is addressed above. He further teaches the method wherein the trained multilingual machine learning model is configured to perform a natural language processing task comprising providing an answer in response to receiving a question and in response to receiving a text passage that comprises the answer. ([0030]: question answering).
Claim 13: Parent claim 1 is addressed above. Kang further teaches the method further comprising:
adding a generator layer to the lower portion for the performing of the generator task of the pre-training of the lower portion; (Section 4.1: dropout layer)
removing the generator layer from the pre-trained lower portion before the joining of the pre-trained lower portion to the pre-trained upper portion to form the trained multilingual machine learning model. (Section 4.1: dropout layer).
Claim 14: Parent claim 1 is addressed above. Kang further teaches the method further comprising:
adding a discriminator layer to the upper portion for the performing of the discriminator task of the pre-training of the upper portion; and (Section 4.1: dropout layer)
removing the discriminator layer from the pre-trained upper portion before the joining of the pre-trained lower portion to the pre-trained upper portion to form the trained multilingual machine learning model. (Section 4.1: dropout layer).
Claim 15: Parent claim 1 is addressed above. He further teaches the method further comprising adding a task-specific layer to the joined pre-trained lower and upper portions to form the trained multilingual machine learning model, wherein the task-specific layer is added to the joined pre-trained lower and upper portions so as to receive output from the pre-trained upper portion. ([0028]: add a task-specific output layer to realize a different NLU operation).
Claim 16: Parent claim 1 is addressed above. He further teaches the method wherein the joining of the pre-trained lower portion to the pre-trained upper portion comprises the pre-trained upper portion being positioned to receive output from the pre-trained lower portion as part of the trained multilingual machine learning model. (Figure 3: upstream encoder feeds into downstream encoder).
Claim 17: Parent claim 1 is addressed above. He and Kang further teach the method wherein the machine learning model that is split comprises a transformer that implements self-attention. (He [0003], [0021]: transformer model with self-attention mechanism; Kang Section 5.1: Partitioning and performance prediction works with arbitrary neural network architectures).
Claim 18: Parent claim 1 is addressed above. He further teaches the method wherein the pre-training of the lower portion via the generator task comprises:
masking portions of the monolingual text data and of the multilingual text data and predicting, via the lower portion, content of the masked portions. ([0034]: masked-language modeling).
Claims 19-20 are analogous to claim 1 addressed above and so are rejected in a similar manner.
Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over He in view of Kang as applied to claim 1 above, and further in view of D’Souza (US 20210110277 A1).
Claim 10: Parent claim 1 is addressed above. Neither He nor Kang teaches the method wherein the trained multilingual machine learning model is configured to perform a sequence classification task comprising sentence pair relationship classification.
However, D’Souza does teach the method wherein the trained multilingual machine learning model is configured to perform a sequence classification task comprising sentence pair relationship classification. ([0020]: use of a neural network to classify sentence entailment).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to configure the machine learning model to classify a sentence pair relationship because the information is advantageous for question answering applications (see D’Souza [0020]).
Claim 11: Parent claim 10 is addressed above. D’Souza further teaches the method wherein classes for the sentence pair relationship classification are selected from a group consisting of an entailment, a contradiction, and neutral. ([0023]: classify entailment, contradiction, or neutral).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALVIN ISKENDER whose telephone number is (703)756-4565. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, HAI PHAN can be reached at (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALVIN ISKENDER/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654