Prosecution Insights
Last updated: April 19, 2026
Application No. 17/764,005

MULTI-TASK ADAPTER NEURAL NETWORKS

Final Rejection §103

Filed: Mar 25, 2022
Examiner: SITIRICHE, LUIS A
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: Google LLC
OA Round: 2 (Final)

Grant Probability: 78% (Favorable)
OA Rounds: 3-4
To Grant: 3y 7m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% (above average; 363 granted / 468 resolved; +22.6% vs TC avg)
Interview Lift: +22.1% on resolved cases with interview (strong)
Typical Timeline: 3y 7m avg prosecution; 24 currently pending
Career History: 492 total applications across all art units

Statute-Specific Performance

§101: 24.2% (-15.8% vs TC avg)
§103: 39.1% (-0.9% vs TC avg)
§102: 12.4% (-27.6% vs TC avg)
§112: 13.5% (-26.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 468 resolved cases.

Office Action

§103
DETAILED ACTION

This Office Action is in response to the remarks entered on 10/21/2025. Claims 1, 10-13, 20, 23-24 are amended. Claims 1-24 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-9, 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Zeng et al. (“Spectrogram based multi-task audio classification”, hereinafter Zeng, as submitted in IDS dated 09/05/2023) in view of Keskar et al. (US 2019/0130273, hereinafter Keskar).

Referring to Claim 1, Zeng teaches a system comprising a multi-task adapter neural network for performing a plurality of machine learning tasks (see Zeng at p. 3711 section 4.2: “Figure 6 illustrates an example of the MTL model which is used for two audio classification tasks”; wherein this MTL model is interpreted as the multi-task adapter neural network), wherein the multi-task adapter neural network is configured to: receive a shared input for the plurality of machine learning tasks (see Zeng at p. 3711 section 4.2 The proposed model for audio classification: “The next several GResNets blocks and a full connected layer are stacked to get the shared representation between the two tasks”. Further at p. 3712: Fig. 6. Zeng teaches a spectrogram sampling of an audio signal being inputted for audio classification, equivalent to the shared input), and process the shared input to generate, for each of the plurality of machine learning tasks, a respective predicted output (see Zeng at p. 3711 section 4.2: “Then, the extracted features are used in the softmax layer to generate predictions for each task” and 3712: Fig. 6. Zeng teaches generate predictions, equivalent to the predicted output);

wherein the multi-task adapter neural network comprises: a shared encoder (see Zeng at p. 3712: Fig. 6. Zeng teaches a Feature abstractor, equivalent to the shared encoder) configured to: receive the shared input (see Zeng at p. 3711 section 4.2 The proposed model for audio classification: “The next several GResNets blocks and a full connected layer are stacked to get the shared representation between the two tasks”. Further at p. 3712: Fig. 6. Zeng teaches a spectrogram sampling of an audio signal being inputted for audio classification, equivalent to the shared input), and process the shared input to extract shared feature representations for the plurality of machine learning tasks (see Zeng at p. 3710 section 4 The proposed model: “We proposed a new CNN-based architecture to extract the shared feature of all tasks”. Further, at p. 3711 section 4.2: “The next several GResNets blocks and a full connected layer are stacked to get the shared representation between the two tasks. Then, the extracted features are used in the softmax layer to generate predictions for each task”. This is equivalent to processing the shared input); and a plurality of task-adapter encoders, wherein each of the plurality of task-adapter encoders is associated with a respective machine learning task in the plurality of machine learning tasks and is configured to (see Zeng at section 4.2 on page 3711: “Our multi-task model is a neural network with different number of the softmax classifiers. Let NT indicate the set of tasks and T be the number of tasks. The classification layer of the multi-task model includes T softmax classifiers”. The multi-task adapter neural network, which is illustrated in figure 6 on page 3712 of Zeng, comprises T such task-adapter encoders. To this end, see "First Task" and "Tth Task" in Fig. 6. Further, see at the far right, multi-task learning for the First Task up until the Tth Task).
receive the shared input (see Zeng at p. 3711 section 4.2 The proposed model for audio classification: “The next several GResNets blocks and a full connected layer are stacked to get the shared representation between the two tasks”. Further at p. 3712: Fig. 6. Zeng teaches a spectrogram sampling of an audio signal being inputted for audio classification, equivalent to the shared input), receive the shared feature representations from the shared encoder (see Zeng at p. 3711 section 4.2: “The next several GResNets blocks and a full connected layer are stacked to get the shared representation between the two tasks. Then, the extracted features are used in the softmax layer to generate predictions for each task”. This is equivalent to processing the shared input. Further, see Fig. 6 on page 3712: The task-adapter encoders all receive the shared input via the convolutional layer, which is located in between the shared input (referred to as "Spectrogram sampling" in figure 6) and the shared encoder (referred to as "Feature abstractor" in figure 6), plus the identity connections which "bypass" the various layers of the shared encoder. Related Fig. 5 on page 3711 also illustrates the functioning of the identity connections in more detail), and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task (see Zeng at p. 3712 Fig. 6. The two arrows which go into the "Pooling,/2" which is located in between the shared encoder (referred to as "Feature abstractor") and the "Classification layer", ultimately producing the classification of the audios (as can be seen from the P(y=0/x)…P(y=n/x), which are equivalent to the predicted output). 
However, Zeng fails to explicitly teach: wherein layers of the plurality of task-adapter encoders are arranged in parallel with corresponding layers of the shared encoder, and wherein at least one layer of each task-adapter encoder is configured to receive as input a combination of (i) an output of a layer preceding the corresponding layer of the shared encoder and (ii) an output of a previous layer of the task-adapter encoder.

Keskar teaches, in an analogous system, wherein layers of the plurality of task-adapter encoders are arranged in parallel with corresponding layers of the shared encoder, and wherein at least one layer of each task-adapter encoder is configured to receive as input a combination of (i) an output of a layer preceding the corresponding layer of the shared encoder and (ii) an output of a previous layer of the task-adapter encoder (see Keskar at [0029]: “Each subsequent layer among branched attention encoder layers 320a-n receives the layer encoded representations 325a-(n-1) generated by a preceding layer among branched attention encoder layers 320a-(n-1). Similarly, each of branched attention decoder layers 330a-(n-1) generates a respective layer decoded representation 335a-(n-1) that is received by a subsequent layer among decoder layers 330b-n. An output layer 340 receives decoded representation 335n from the decoder layer 330n and generates output sequence 304”. Further at [0032]: “As depicted in FIG. 3B, branched attention encoder layer 320f includes a plurality of branches 360a-m arranged in parallel. Each of branches 360a-m receives a copy of layer encoded representation 325e and generates a respective branch output representation (e.g., branch output representations 365a-m). An aggregation node 366 aggregates branch output representations 365a-m to form layer encoded representation 325f”. Examiner interprets Keskar’s plurality of parallel branches in the encoder layer as the claimed “layers of the plurality of task-adapter encoders are arranged in parallel”, and the aggregation of the branch output representations to form the layer encoded representation as the claimed “combination of (i) an output of a layer preceding the corresponding layer of the shared encoder and (ii) an output of a previous layer of the task-adapter encoder”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Zeng with the above teachings of Keskar by generating a respective predicted output for each of a plurality of machine learning tasks given a shared input, as taught by Zeng, wherein layers of the plurality of task-adapter encoders are arranged in parallel with corresponding layers of the shared encoder, as taught by Keskar. The modification would have been obvious because one of ordinary skill in the art would be motivated to develop techniques for training machine learning models faster and/or with less training data and improving neural network models for sequence-to-sequence prediction.

Referring to Claim 2, the combination of Zeng and Keskar teaches the system of claim 1, wherein the plurality of machine learning tasks comprise audio processing tasks (see Zeng at p. 3711 section 4.2: “Figure 6 illustrates an example of the MTL model which is used for two audio classification tasks”).

Referring to Claim 3, the combination of Zeng and Keskar teaches the system of claim 1, wherein each of the plurality of task-adapter encoders comprises a plurality of neural network layers (see Zeng at p. 3712: Fig. 6. 
Zeng teaches a Feature abstractor showing a plurality of neural network layers), and is configured to apply a gating mechanism on channel outputs of a neural network layer of the plurality of neural network layers to select channel inputs for the next neural network layer of the plurality of neural network layers (see Zeng at p. 3709 section 3.4 Gate mechanism: “The gate mechanism proposed in LSTM [16] can be viewed as the milestone for solving the gradient problem. The input/output gates are used to control how much information should be kept in the cell”).

Referring to Claim 4, the combination of Zeng and Keskar teaches the system of claim 1, wherein the task-adapter encoders are arranged in parallel with the shared encoder (see Zeng at p. 3712 Fig. 6, where it can be seen all the convolutional layers of the feature abstractor are arranged in parallel form).

Referring to Claim 5, the combination of Zeng and Keskar teaches the system of claim 1, wherein the shared encoder comprises a plurality of convolutional neural network layers (see Zeng at p. 3712 Fig. 6, where it can be seen all the convolutional layers of the feature abstractor).

Referring to Claim 6, the combination of Zeng and Keskar teaches the system of claim 5, wherein each of the plurality of task-adapter encoders comprises a plurality of convolutional neural network layers (see Zeng at p. 3712 Fig. 6, where it can be seen all the convolutional layers of the feature abstractor).

Referring to Claim 7, the combination of Zeng and Keskar teaches the system of claim 6, wherein the shared encoder and each of the plurality of task-adapter encoders have the same number of convolutional neural network layers (see Zeng at p. 3716 first full paragraph: “Then, we compared the GResNets model with ResNets and CNNs models. To replace the GResNets blocks of our multi-task model, we used ResNets blocks and CNNs that contained the same convolutional layers”).
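Claim 3's gating limitation, mapped above to Zeng's LSTM-style gates, describes a gate applied to one layer's channel outputs to select the channel inputs of the next layer. A minimal numpy sketch under illustrative assumptions (sigmoid gates computed from stand-in random logits, and a 0.5 threshold for hard selection; none of these choices comes from Zeng or the application):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A feature map with 6 channels: (batch, channels, time).
features = rng.standard_normal((2, 6, 10))

# Per-channel gate values; random logits stand in for the learned
# channel-selection variables the claims refer to (illustrative only).
gate_logits = rng.standard_normal(6)
gates = sigmoid(gate_logits)

# Soft gating: scale each channel output; channels with gate near 0
# are effectively dropped before they reach the next layer.
gated = features * gates[None, :, None]

# Hard selection variant: keep only channels whose gate exceeds 0.5,
# i.e. select the channel inputs for the next layer.
selected = features[:, gates > 0.5, :]
print(gated.shape, selected.shape[1], "channels kept")
```

The soft form keeps the operation differentiable for training; the hard form shows the channel selection the gate implies at inference time.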
Referring to Claim 8, the combination of Zeng and Keskar teaches the system of claim 1, wherein the shared input is a two-dimensional channel input (see Zeng at p. 3712: Fig. 6. Zeng teaches a spectrogram sampling of an audio signal being inputted for audio classification, equivalent to the shared input. Further, at p. 3708 section 3.2 Spectrogram: “A spectrogram is regarded as a very detailed and accurate representation of audio information. A common spectrogram is an image where one axis represents time, the other axis is frequency and the color of each point indicates the amplitude of those points”. Therefore, time and frequency are the dimensions, being 2 dimensions).

Referring to Claim 9, the combination of Zeng and Keskar teaches the system of claim 8, wherein the shared input is an audio recording that has a two-dimensional channel for time and frequency (see Zeng at p. 3712: Fig. 6. Zeng teaches a spectrogram sampling of an audio signal being inputted for audio classification, equivalent to the shared input. Further, at p. 3708 section 3.2 Spectrogram: “A spectrogram is regarded as a very detailed and accurate representation of audio information. A common spectrogram is an image where one axis represents time, the other axis is frequency and the color of each point indicates the amplitude of those points”. Therefore, time and frequency are the dimensions, being 2 dimensions).

Referring to Claim 21, the combination of Zeng and Keskar teaches the system of claim 1, wherein the shared encoder and the plurality of task-adapter encoders are jointly trained to optimize a loss function that represents performance of the multi-task adapter neural network on the plurality of machine learning tasks and computational cost to perform the plurality of machine learning tasks (see Zeng at p. 3708 first paragraph: “We assume that the cost function is J(·), and the parameters of the multi-task model are learned by minimizing the following formula: [equation image]”. Therefore, this formula is given by Zeng for the purpose of minimizing the cost function, equivalent to optimizing a loss function).

Referring to Claim 22, the combination of Zeng and Keskar teaches the system of claim 21, wherein the loss function is a weighted sum of cross-entropy losses for the plurality of machine learning tasks and the computational cost of computing the predicted outputs by the plurality of task-adapter encoders for a given set of channel selection variables (see Zeng at p. 3708 first paragraph: “We assume that the cost function is J(·), and the parameters of the multi-task model are learned by minimizing the following formula: [equation image]”. Further, see Zeng at p. 3714 section 5.2 Experimental settings: “The learning rate is 0.001 during training and the cost function is cross-entropy loss”).

Referring to independent Claim 23 and Claim 24, they are rejected on the same basis as independent claim 1 since they are analogous claims.

Claims 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Zeng in view of Keskar and further in view of Soldevila et al. (US Pub. No. 2017/0011280, hereinafter Soldevila).

Referring to Claim 10, the combination of Zeng and Keskar teaches the system of claim 8; however, it fails to teach wherein the shared encoder comprises a plurality of neural network layers, and wherein each of the plurality of neural network layers of the shared encoder outputs a three-dimensional tensor which is a stack of two-dimensional channel outputs. 
Soldevila teaches, in an analogous system, wherein the shared encoder comprises a plurality of neural network layers, and wherein each of the plurality of neural network layers of the shared encoder outputs a three-dimensional tensor which is a stack of two-dimensional channel outputs (see Soldevila at [0052]: “During a forward pass of the NN 56 (in direction A indicated by the bold arrows), the filters 96, 98 are run in a sliding window fashion across the output of the previous layer (or the image itself for the first layer 82) in order to produce a 3D tensor 108, 106, etc., which is a stack of per-filter activation maps”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zeng and Keskar with the above teachings of Soldevila by generating a respective predicted output for each of a plurality of machine learning tasks given a shared input, as taught by the combination of Zeng and Keskar, wherein the output is a 3D tensor, as taught by Soldevila. The modification would have been obvious because one of ordinary skill in the art would be motivated to run the filters in a CNN in a sliding window fashion across the output of the previous layer (or the image itself for the first layer) in order to produce a 3D tensor, which is a stack of per-filter activation maps.

Referring to Claim 11, the combination of Zeng and Keskar teaches the system of claim 6; however, it fails to teach wherein each of the plurality of neural network layers of each of the plurality of task-adapter encoders outputs a three-dimensional tensor which is a stack of two-dimensional channel outputs. 
Soldevila teaches, in an analogous system, wherein each of the plurality of neural network layers of each of the plurality of task-adapter encoders outputs a three-dimensional tensor which is a stack of two-dimensional channel outputs (see Soldevila at [0052]: “During a forward pass of the NN 56 (in direction A indicated by the bold arrows), the filters 96, 98 are run in a sliding window fashion across the output of the previous layer (or the image itself for the first layer 82) in order to produce a 3D tensor 108, 106, etc., which is a stack of per-filter activation maps”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Zeng and Keskar with the above teachings of Soldevila by generating a respective predicted output for each of a plurality of machine learning tasks given a shared input, as taught by the combination of Zeng and Keskar, wherein the output is a 3D tensor, as taught by Soldevila. The modification would have been obvious because one of ordinary skill in the art would be motivated to run the filters in a CNN in a sliding window fashion across the output of the previous layer (or the image itself for the first layer) in order to produce a 3D tensor, which is a stack of per-filter activation maps.

Referring to Claim 12, the combination of Zeng, Keskar and Soldevila teaches the system of claim 11, wherein each of the plurality of neural network layers in the shared encoder receives as input an output of a previous neural network layer in the shared encoder (see Zeng at p. 3712 Fig. 6. The two arrows which go into the "Pooling,/2" which is located in between the shared encoder (referred to as "Feature abstractor") and the "Classification layer", wherein it can be seen the processing occurring from the left direction passing through the layers towards the right direction; therefore, the outputs of the previous layers are the inputs of the next layers). 
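For orientation, the claim-1 arrangement that the rejection reads onto Keskar's parallel branches (task-adapter layers running layer-for-layer in parallel with the shared encoder, each adapter layer combining the output of the preceding shared-encoder layer with the output of the previous adapter layer) can be sketched in a few lines. This is an illustrative sketch only: dense ReLU layers stand in for the references' convolutional blocks, and all sizes, the concatenation-based combination, and the per-task heads are assumptions, not taken from Zeng, Keskar, or the application.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """One fully connected ReLU layer, standing in for the convolutional
    blocks described in the claims (illustrative only)."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    def layer(x):
        return np.maximum(x @ W, 0.0)
    return layer

D, L, T = 8, 3, 2  # feature width, layers per encoder, number of tasks

# Shared encoder: a stack of L layers.
shared_layers = [dense(D, D) for _ in range(L)]

# One task-adapter encoder per task, arranged layer-for-layer in
# parallel with the shared encoder. Each adapter layer consumes the
# concatenation of (i) the output of the *preceding* shared layer and
# (ii) the output of the previous adapter layer.
adapter_layers = [[dense(2 * D, D) for _ in range(L)] for _ in range(T)]
heads = [dense(D, 4) for _ in range(T)]  # per-task prediction heads

def forward(x):
    # Run the shared encoder, keeping every intermediate output.
    shared_outs = [x]
    for layer in shared_layers:
        shared_outs.append(layer(shared_outs[-1]))

    preds = []
    for t in range(T):
        h = x  # each adapter also receives the shared input
        for i, layer in enumerate(adapter_layers[t]):
            # Combine preceding shared-layer output with previous adapter output.
            h = layer(np.concatenate([shared_outs[i], h], axis=-1))
        preds.append(heads[t](h))
    return preds

outputs = forward(rng.standard_normal((5, D)))  # batch of 5 shared inputs
print(len(outputs), outputs[0].shape)  # prints: 2 (5, 4)
```

Here each adapter layer mixes the preceding shared-layer output with its own previous output, which is the limitation the Action finds missing from Zeng alone and supplied by Keskar's aggregated parallel branches.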
Allowable Subject Matter

Claims 13-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments

Applicant's arguments filed 10/21/2025 have been fully considered.

In reference to Applicant’s arguments regarding the claim rejections under 35 USC 101: the rejections are withdrawn in view of the amendments and applicant’s arguments.

In reference to Applicant’s arguments regarding the claim rejections under 35 USC 103: Applicant’s arguments about the 103 prior art rejections for the independent claims are mainly directed to the newly added limitation in independent claims 1, 23 and 24. These arguments have been fully considered but are moot in view of the new grounds of rejection. Examiner respectfully suggests amending the independent claims to include all the limitations from claim 13 in order to better reflect the alleged differences between the prior art of record and the instant application, as this claim contains allowable subject matter over the prior art.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. 
In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LUIS A SITIRICHE whose telephone number is (571) 270-1316. The examiner can normally be reached M-F 9am-6pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LUIS A SITIRICHE/
Primary Examiner, Art Unit 2126
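The loss recited in claims 21-22 above (a weighted sum of per-task cross-entropy losses plus the computational cost of computing the predicted outputs for a given set of channel selection variables) can be sketched as follows. All weights, the scalar cost stand-in, and the function names are illustrative assumptions; the application's actual formulation is not reproduced in the record above.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits; labels are integer classes."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multi_task_loss(per_task_logits, per_task_labels, task_weights,
                    compute_cost, cost_weight):
    """Weighted sum of per-task cross-entropy losses plus a penalty on
    computational cost. `compute_cost` is a stand-in for whatever cost
    measure the application uses (e.g. channels kept active by the
    gating mechanism of claim 3) -- a hypothetical choice here."""
    task_term = sum(w * cross_entropy(lo, la)
                    for w, lo, la in zip(task_weights, per_task_logits,
                                         per_task_labels))
    return task_term + cost_weight * compute_cost

rng = np.random.default_rng(0)
logits = [rng.standard_normal((4, 3)) for _ in range(2)]   # two tasks
labels = [np.array([0, 1, 2, 1]), np.array([2, 0, 1, 1])]
loss = multi_task_loss(logits, labels, task_weights=[1.0, 0.5],
                       compute_cost=12.0, cost_weight=0.01)
print(round(float(loss), 4))
```

Raising `cost_weight` pushes training toward cheaper channel selections at some expense in task accuracy, which is the trade-off the claim language describes.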

Prosecution Timeline

Mar 25, 2022: Application Filed
Jul 30, 2025: Non-Final Rejection — §103
Oct 21, 2025: Response Filed
Feb 10, 2026: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12585947: MODIFYING COMPUTATIONAL GRAPHS (2y 5m to grant; granted Mar 24, 2026)
Patent 12579476: ADAPTIVE LEARNING FOR IMAGE CLASSIFICATION (2y 5m to grant; granted Mar 17, 2026)
Patent 12579445: MODELS FOR PREDICTING RESISTANCE TRENDS (2y 5m to grant; granted Mar 17, 2026)
Patent 12572791: METHOD, DEVICE AND COMPUTER PROGRAM FOR PREDICTING A SUITABLE CONFIGURATION OF A MACHINE LEARNING SYSTEM FOR A TRAINING DATA SET (2y 5m to grant; granted Mar 10, 2026)
Patent 12572857: Adaptive Probabilistic Latent Semantic Analysis System For Automated Document Coding And Review In Electronic Discovery (2y 5m to grant; granted Mar 10, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 78%
With Interview: 99% (+22.1%)
Median Time to Grant: 3y 7m
PTA Risk: Moderate

Based on 468 resolved cases by this examiner. Grant probability derived from career allow rate.
