Prosecution Insights
Last updated: April 19, 2026
Application No. 17/899,913

CONTROLLABLE DYNAMIC MULTI-TASK ARCHITECTURES

Non-Final OA: §101, §103
Filed: Aug 31, 2022
Examiner: BRACERO, ANDREW ANGEL
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: NEC Laboratories America Inc.
OA Round: 1 (Non-Final)
Grant Probability: 100% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 3m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 100% (5 granted / 5 resolved; +45.0% vs TC avg) — grants above average
Interview Lift: +0.0% (minimal lift, based on resolved cases with interview)
Typical Timeline: 3y 3m average prosecution (26 currently pending)
Career History: 31 total applications across all art units

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)
Tech Center averages are estimates. Based on career data from 5 resolved cases.

Office Action

Grounds of rejection: §101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-18 are presented for examination in this application, 17/899,913, filed 2022-08-31 and having an effective filing date of 2021-09-03 via provisional application 63/240,522.

The Examiner cites particular sections in the references as applied to the claims below for the convenience of the applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant(s) fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.

Drawings

The drawings submitted on 2022-08-31 have been considered and accepted.

Information Disclosure Statement

Acknowledgement is made of the information disclosure statement filed 2022-08-31. All patents and non-patent literature have been considered.

Claim Rejections - 35 U.S.C. § 101

35 U.S.C. 101 reads as follows: "Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title."

Claims 1-18 are rejected under 35 U.S.C. 101 as being unpatentable because the claimed invention in these claims is directed to an abstract idea without significantly more. The analysis of the claims follows the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50-57 (January 7, 2019) ("2019 PEG").

Regarding claim 1:

Step 1 – Is the claim directed to a process, machine, manufacture, or a composition of matter? Yes, the claim is directed to a method.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites an abstract idea: optimizing a branching regularized loss function to train an edge hypernet — this limitation amounts to mathematical calculations (see MPEP 2106.04(a)(2) I.C.).

Step 2A, Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

- a method for building a dynamic multi-task network — this limitation is directed to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2));
- generating a hypernetwork configured to be trained for a plurality of tasks — this limitation amounts to merely indicating a field of use, specifically that of multi-task learning, or a technological environment in which to apply a judicial exception (see MPEP 2106.05(h));
- receiving a task preference vector identifying a hierarchical priority for the plurality of tasks, and a resource constraint as a tuple — this limitation amounts to data gathering, which the courts have recognized as insignificant extra-solution activity (see MPEP 2106.05(g); Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754);
- finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network — this limitation is directed to mere instructions to apply an exception, invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2));
- training a weight hypernet, keeping the anchor network and the edge hypernet fixed — this limitation is likewise directed to mere instructions to apply an exception (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. Any additional elements determined to be insignificant extra-solution activity in Step 2A, Prong 2 are further evaluated in Step 2B as to whether they are well-understood, routine, and conventional activities. The "receiving a task preference vector identifying a hierarchical priority for the plurality of tasks, and a resource constraint as a tuple" limitation was found to be insignificant extra-solution activity. It is recited at a high level of generality and amounts to transmitting data over a network, which is a well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.). As discussed above with respect to integration into a practical application, the elements used to perform the abstract idea amount to no more than a field of use in which to apply the exception, and generally linking the use of a judicial exception to a particular technological environment or field of use cannot provide an inventive concept. Thus, the claim is not patent eligible.
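Purely as orientation, here is a minimal sketch of the 2019 PEG flow the Office Action walks through (Step 1, Step 2A Prongs 1 and 2, Step 2B). The function and its arguments are illustrative assumptions, not part of the record:

```python
# Hedged, illustrative encoding of the 2019 PEG eligibility framework as
# the Office Action applies it; names and structure are assumptions.
def peg_eligible(statutory: bool, recites_exception: bool,
                 practical_application: bool, significantly_more: bool) -> bool:
    if not statutory:               # Step 1: process/machine/manufacture/composition?
        return False
    if not recites_exception:       # Step 2A, Prong 1: judicial exception recited?
        return True
    if practical_application:       # Step 2A, Prong 2: integrated into a practical application?
        return True
    return significantly_more       # Step 2B: significantly more than the exception?

# Claim 1 per the OA: a statutory method reciting mathematical calculations,
# with no integration and nothing significantly more.
print(peg_eligible(True, True, False, False))  # -> False (ineligible)
```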
Regarding claim 2: Claim 2 recites a machine learning process at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Claim 11 is analogous.

Regarding claim 3: Claim 3 recites a machine learning process at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Claim 12 is analogous.

Regarding claim 4: Claim 4 recites a machine learning process at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Claim 13 is analogous.

Regarding claim 5: Claim 5 recites optimizing a task loss, which amounts to the abstract idea of a mental process (an observation, evaluation, judgment, or opinion) that can be performed in the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III.C.). Claim 14 is analogous.

Regarding claim 6: Claim 6 recites that tasks with higher preferences have a greater influence, which amounts to the abstract idea of a mental process that can be performed in the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III.C.). Claim 15 is analogous.

Regarding claim 7: Claim 7 recites calculating an active loss and an inactive loss, which amounts to mathematical calculations (see MPEP 2106.04(a)(2) I.C.). Claim 16 is analogous.

Regarding claim 8: Claim 8 recites how the active loss is weighted, which amounts to mathematical relationships (see MPEP 2106.04(a)(2) I.A.). Claim 17 is analogous.

Regarding claim 9: Claim 9 recites the active loss formula, which amounts to mathematical formulas or equations (see MPEP 2106.04(a)(2) I.B.). Claim 18 is analogous.

Regarding claim 10:

Step 1 – Is the claim directed to a process, machine, manufacture, or a composition of matter? Yes, the claim is directed to a method.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes, the claim recites an abstract idea: optimizing a branching regularized loss function to train an edge hypernet — this limitation amounts to mathematical calculations (see MPEP 2106.04(a)(2) I.C.).

Step 2A, Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application? No, the claim recites additional elements that do not integrate the judicial exception into a practical application:

- a method for building a dynamic multi-task network — mere instructions to apply an exception, invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2));
- generating a hypernetwork configured to be trained for a plurality of tasks — merely indicating a field of use, specifically that of multi-task learning, or a technological environment in which to apply a judicial exception (see MPEP 2106.05(h));
- receiving a task preference vector identifying a hierarchical priority for the plurality of tasks, and a resource constraint as a tuple — data gathering, which the courts have recognized as insignificant extra-solution activity (see MPEP 2106.05(g); Ultramercial, 772 F.3d at 715, 112 USPQ2d at 1754);
- finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network — mere instructions to apply an exception (see MPEP 2106.05(f)(2));
- training a weight hypernet, keeping the anchor network and the edge hypernet fixed — mere instructions to apply an exception (see MPEP 2106.05(f)(2)).

Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself? No, there are no additional elements that amount to significantly more than the judicial exception. Any additional elements determined to be insignificant extra-solution activity in Step 2A, Prong 2 are further evaluated in Step 2B as to whether they are well-understood, routine, and conventional activities. The "receiving a task preference vector identifying a hierarchical priority for the plurality of tasks, and a resource constraint as a tuple" limitation was found to be insignificant extra-solution activity in claim 1. It is recited at a high level of generality and amounts to transmitting data over a network, which is a well-understood, routine, and conventional activity (see MPEP 2106.05(d) II.). As discussed above with respect to integration into a practical application, the elements used to perform the abstract idea amount to no more than a field of use in which to apply the exception, and generally linking the use of a judicial exception to a particular technological environment or field of use cannot provide an inventive concept. Thus, the claim is not patent eligible.

Claim Rejections - 35 U.S.C. § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: "A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made."

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-6 are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. ("Controllable Pareto Multi-Task Learning," hereinafter Lin) in view of Sarafian et al. ("Recomposing the Reinforcement Learning Building Blocks with Hypernetworks," hereinafter Sarafian), in view of Mahabadi et al. ("Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks," hereinafter Mahabadi), and further in view of Guo et al. ("Learning to Branch for Multi-Task Learning," hereinafter Guo).

Regarding claim 1: Lin teaches a method for building a dynamic multi-task network, comprising generating a hypernetwork configured to be trained for a plurality of tasks (see section 2, 'Related Work': "The hypernetwork is initially proposed for dynamic modeling and model compression."; also see section 3, 'MTL as Multi-Objective Optimization': "An MTL problem involves learning multiple related tasks at the same time.") … receiving a task preference vector identifying a hierarchical priority for the plurality of tasks (see section 4, 'Preference-Based Solution Generator': "As shown in Fig. 2, we want to build a solution generator to map a preference vector p to its corresponding solution θp. If an optimal generator θp = g(p|φ*) is obtained, MTL practitioners can assign their preference via the preference vector p, and directly obtain the corresponding solution θp with the specific trade-off among tasks"; also see fig. 4)
… and optimizing a branching regularized loss function to train an edge hypernet (see section 2: "A Hypernetwork (Ha et al., 2016) is a neural-network architecture designed to process a tuple (z, x) ∈ Z × X and output a value y ∈ Y").

Lin does not explicitly mention the use of resource constraints as a tuple. Sarafian, however, explicitly teaches resource constraints as a tuple (see section 2: "A Hypernetwork (Ha et al., 2016) is a neural-network architecture designed to process a tuple (z, x) ∈ Z × X and output a value y ∈ Y").

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin and Sarafian before him or her, to modify the method of claim 1 to include attributes of resource constraints as a tuple in order to process weights for a dynamic network (see section 2: "Hypernetwork (Ha et al., 2016) is a neural-network architecture designed to process a tuple (z, x) ∈ Z × X and output a value y ∈ Y. It is comprised of two networks, a primary network wθ : Z → R^nw which produces weights wθ(z) for a dynamic network fwθ(z) : X → Y.").
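As a technical aside, a minimal sketch of the hypernetwork pattern the quoted Sarafian passage describes: a primary network maps a conditioning input z to the weights of a dynamic network that is then applied to x. This is an assumed PyTorch illustration, not code from any cited reference:

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    """Primary network w_theta: Z -> R^{n_w} whose output parameterizes a
    dynamic linear map f_{w(z)}: X -> Y, per the quoted Sarafian passage."""
    def __init__(self, z_dim: int, x_dim: int, y_dim: int):
        super().__init__()
        self.x_dim, self.y_dim = x_dim, y_dim
        # emits one flat vector holding a weight matrix plus a bias
        self.primary = nn.Linear(z_dim, x_dim * y_dim + y_dim)

    def forward(self, z: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        w = self.primary(z)                                   # weights conditioned on z
        W = w[: self.x_dim * self.y_dim].view(self.y_dim, self.x_dim)
        b = w[self.x_dim * self.y_dim:]
        return x @ W.T + b                                    # dynamic network applied to x

net = HyperLinear(z_dim=4, x_dim=8, y_dim=3)
y = net(torch.randn(4), torch.randn(8))                       # y has shape (3,)
```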
Lin in view of Sarafian does not teach an edge hypernet, or training a weight hypernet while keeping the anchor network and the edge hypernet fixed. Mahabadi, however, analogously teaches an edge hypernet (see pg. 1, fig. 1: "Left: Adapter integration in the T5 model. Right: Our HYPERFORMER adapter architecture. Following Houlsby et al. (2019), we include adapter modules after the two feed-forward layers. The Adapter hypernetwork h^l_A produces the weights (U^l_τ and D^l_τ) for task-specific adapter modules conditioned on an input task embedding I_τ.") … and training a weight hypernet, keeping the anchor network and the edge hypernet fixed (see pg. 1, section 1: "During training, we only train hypernetwork parameters ν, task embeddings {I_τ}_{τ=1}^T, and layer normalizations in fθ(.), while the rest of the pretrained model parameters θ are fixed").

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, and Mahabadi before him or her, to modify the method of claim 1 to include attributes of training a weight hypernet, keeping the anchor network and the edge hypernet fixed, in order to fine-tune parameters efficiently (see pg. 2, section 2: "In summary, we make the following contributions: (1) We propose a parameter-efficient method for multi-task fine-tuning based on hypernetworks and adapter layers.").

Neither Lin nor Sarafian nor Mahabadi teaches finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network. Guo, however, analogously teaches this limitation (see fig. 2: "Illustrations of the proposed learning to branch pipeline. (a) We initialize the sampling probability with a uniform distribution so each parent node has an equal chance to send its activation values to a child node. (b) The computed update gradients then increase the probability of sampling certain paths that are more likely to reduce the overall loss. (c) Once the overall validation loss converges, each child node selects one parent node with the highest sampling probability while removing unselected paths and current nodes. (d) We can construct a deeper tree-structured multi-task neural network by stacking such branching blocks.").

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, and Guo before him or her, to modify the method of claim 1 to include attributes of finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network in order to allow for effective and efficient network configuration sampling (see pg. 3, section 3.2: "The key ingredient for effective and efficient network configuration sampling is our proposed differentiable tree-structured network topology.").
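A short NumPy illustration of the parent-selection step in Guo's quoted figure caption: sampling probabilities start uniform, training shifts mass toward low-loss paths, and each child then keeps its highest-probability parent. The code is a hedged sketch, not taken from Guo:

```python
import numpy as np

rng = np.random.default_rng(0)
n_parents, n_children = 3, 4

# (a) uniform sampling distribution: each parent equally likely per child
probs = np.full((n_children, n_parents), 1.0 / n_parents)

# (b) training would shift mass toward paths that reduce the overall loss;
# a random perturbation stands in for those gradient updates here
probs += rng.uniform(0.0, 0.1, size=probs.shape)
probs /= probs.sum(axis=1, keepdims=True)

# (c) after convergence, each child keeps its highest-probability parent,
# pruning the unselected paths
chosen_parent = probs.argmax(axis=1)
print(chosen_parent)  # one parent index per child node
```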
Regarding claim 2: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 1. Lin further teaches wherein the N-stream anchor network has fixed weights for finding the tree sub-structures (pg. 6, section 7: "In this section, we validate the performance of the proposed controllable Pareto MTL method to generate trade-off curves for different MTL problems. We compare it with the following MTL algorithms: 1) Linear Scalarization: simple linear combination of different tasks with fixed weights.").

Regarding claim 3: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 2. Lin in view of Sarafian in further view of Mahabadi does not teach wherein finding the tree sub-structures includes selecting a parent from every node. Guo, however, analogously teaches wherein finding the tree sub-structures includes selecting a parent from every node (see fig. 2: "(a) We initialize the sampling probability with a uniform distribution so each parent node has an equal chance to send its activation values to a child node. (b) The computed update gradients then increase the probability of sampling certain paths that are more likely to reduce the overall loss. (c) Once the overall validation loss converges, each child node selects one parent node with the highest sampling probability while removing unselected paths and parent nodes. (d) We can construct a deeper tree-structured multi-task neural network by stacking such branching blocks."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, and Guo before him or her, to modify the method of claim 3 to include attributes wherein finding the tree sub-structures includes selecting a parent from every node, in order to sample parent nodes, select one, and remove unselected paths and parent nodes (see Guo at fig. 2, quoted above).

Regarding claim 4: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 3. Lin in view of Sarafian in further view of Guo does not explicitly teach wherein the edge hypernet predicts the branching parameters within the anchor network. Mahabadi, however, analogously teaches wherein the edge hypernet predicts the branching parameters within the anchor network (see fig. 1: "Similarly, the layer normalization hypernetwork h^l_LN generates the conditional layer normalization parameters (β_τ and γ_τ)."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, and Guo before him or her, to modify the method of claim 4 to include attributes wherein the edge hypernet predicts the branching parameters within the anchor network, in order to fine-tune parameters efficiently (see pg. 2, section 1: "In summary, we make the following contributions: (1) We propose a parameter-efficient method for multi-task fine-tuning based on hypernetworks and adapter layers.").

Regarding claim 5: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 4. Lin further teaches optimizing a task loss, Ltask, by taking into account the individual task performances without considering a computational cost (see pg. 3, section 3: "For training a deep multi-task neural network, it is to minimize the losses for multiple tasks: [equation image omitted] where θ is the neural network parameters and Li(θ) is the empirical loss of the i-th task.").

Regarding claim 6: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 5. Lin further teaches wherein tasks with higher preferences have a greater influence (see pg. 4, section 4: "Preference-Based Linear Scalarization: A simple and straightforward approach is to define the preference vector p and the corresponding solution θp via the weighted linear scalarization: [equation image omitted] where the preference vector p = (p1, p2, ..., pm) is the weight for each task, and θp is the optimal solution for the weighted linear scalarization").
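The two equation images in the quoted Lin passages did not survive extraction. Judging from the surrounding text, they are most likely the plain multi-task objective and its preference-weighted scalarization, reconstructed here as an aid and not verified against Lin's rendering:

```latex
% Claim 5 passage: plain multi-task objective over m task losses
\min_{\theta} \; \sum_{i=1}^{m} L_i(\theta)

% Claim 6 passage: preference-based linear scalarization
\theta_p = \arg\min_{\theta} \sum_{i=1}^{m} p_i \, L_i(\theta),
\qquad p = (p_1, p_2, \dots, p_m)
```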
Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Sarafian, in view of Mahabadi, in further view of Guo, and further in view of Bruggemann et al. ("Automated Search for Resource-Efficient Branched Multi-Task Networks," hereinafter Bruggemann).

Regarding claim 7: Lin in view of Sarafian in further view of Mahabadi in further view of Guo teaches the method of claim 6, but does not teach calculating an active loss and an inactive loss to determine branching within the anchor network. Bruggemann, however, analogously teaches calculating an active loss and an inactive loss to determine branching within the anchor network (see fig. 1: "To form a branching structure, the subgraphs are combined according to the sampling consensus, yielding task groupings at each layer. During the architecture search, the masks z(·) are learned by minimizing a resource loss Lresource (computed using a look-up table) and task performance losses LA to LD simultaneously."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, and Bruggemann before him or her, to modify the method of claim 7 to include attributes of calculating an active loss and an inactive loss to determine branching within the anchor network in order to actively navigate the efficiency vs. performance trade-off (see Bruggemann at pg. 5, section 3.3: "To obtain more compact encoder structures and to actively navigate the efficiency vs. performance trade-off").

Regarding claim 8: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Bruggemann teaches the method of claim 7. Lin in view of Sarafian in further view of Mahabadi in further view of Guo does not teach wherein the active loss is additionally weighted by a cost preference, c, of the resource constraint tuple to enable control of the total computational cost. Bruggemann, however, analogously teaches this limitation (see pgs. 5-6, section 3.3: "To obtain more compact encoder structures and to actively navigate the efficiency vs. performance trade-off, we introduce a resource-aware term Lresource in the objective function: [equation image omitted] … The derivations presented in this section yield a proxyless resource loss function, i.e., it encourages solutions which directly minimize the expected resource cost of the final model."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, and Bruggemann before him or her, to modify the method of claim 8 to include attributes wherein the active loss is additionally weighted by a cost preference, c, of the resource constraint tuple to enable control of the total computational cost, in order to actively navigate the efficiency vs. performance trade-off (see Bruggemann at pg. 5, section 3.3).
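A hedged sketch of the shape of such a resource-aware objective: per-task losses plus a resource term weighted by the cost preference c from the resource-constraint tuple. The names below are illustrative assumptions rather than Bruggemann's notation beyond Lresource:

```python
def total_loss(task_losses, resource_cost, c):
    """Combine per-task losses with a resource term weighted by the cost
    preference c; larger c favors cheaper branching structures over raw
    task performance (illustrative only, not the cited formulation)."""
    return sum(task_losses) + c * resource_cost

# e.g. two task losses, a table-lookup resource estimate, and c = 0.5
print(total_loss([0.42, 0.31], resource_cost=1.7, c=0.5))  # 1.58
```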
Claims 10-15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Sarafian, in view of Mahabadi, in further view of Guo, and further in view of Meyerson et al. (US 2019/0130257 A1, hereinafter Meyerson).

Regarding claim 10: Lin teaches a method for building a dynamic multi-task network, comprising generating a hypernetwork configured to be trained for a plurality of tasks (see section 2, 'Related Work': "The hypernetwork is initially proposed for dynamic modeling and model compression."; also see section 3, 'MTL as Multi-Objective Optimization': "An MTL problem involves learning multiple related tasks at the same time.") … receiving a task preference vector identifying a hierarchical priority for the plurality of tasks (see section 4, 'Preference-Based Solution Generator': "As shown in Fig. 2, we want to build a solution generator to map a preference vector p to its corresponding solution θp. If an optimal generator θp = g(p|φ*) is obtained, MTL practitioners can assign their preference via the preference vector p, and directly obtain the corresponding solution θp with the specific trade-off among tasks"; also see fig. 4) … and optimizing a branching regularized loss function to train an edge hypernet (see section 2: "A Hypernetwork (Ha et al., 2016) is a neural-network architecture designed to process a tuple (z, x) ∈ Z × X and output a value y ∈ Y").

Lin does not explicitly mention the use of resource constraints as a tuple. Sarafian, however, explicitly teaches resource constraints as a tuple (see section 2, quoted above for claim 1). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin and Sarafian before him or her, to modify the system of claim 10 to include attributes of resource constraints as a tuple in order to process weights for a dynamic network (see section 2: "Hypernetwork (Ha et al., 2016) is a neural-network architecture designed to process a tuple (z, x) ∈ Z × X and output a value y ∈ Y. It is comprised of two networks, a primary network wθ : Z → R^nw which produces weights wθ(z) for a dynamic network fwθ(z) : X → Y.").

Lin in view of Sarafian does not teach an edge hypernet, or training a weight hypernet while keeping the anchor network and the edge hypernet fixed. Mahabadi, however, analogously teaches an edge hypernet (see pg. 1, fig. 1, quoted above for claim 1) … and training a weight hypernet, keeping the anchor network and the edge hypernet fixed (see pg. 1, section 1: "During training, we only train hypernetwork parameters ν, task embeddings {I_τ}_{τ=1}^T, and layer normalizations in fθ(.), while the rest of the pretrained model parameters θ are fixed"). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, and Mahabadi before him or her, to modify the system of claim 10 to include attributes of training a weight hypernet, keeping the anchor network and the edge hypernet fixed, in order to fine-tune parameters efficiently (see pg. 2, section 2: "In summary, we make the following contributions: (1) We propose a parameter-efficient method for multi-task fine-tuning based on hypernetworks and adapter layers.").

Neither Lin nor Sarafian nor Mahabadi teaches finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network. Guo, however, analogously teaches this limitation (see fig. 2, quoted above for claim 1). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, and Guo before him or her, to modify the system of claim 10 to include attributes of finding tree sub-structures and the corresponding modulation of features for every tuple within an N-stream anchor network in order to allow for effective and efficient network configuration sampling (see pg. 3, section 3.2: "The key ingredient for effective and efficient network configuration sampling is our proposed differentiable tree-structured network topology.").

Neither Lin, Sarafian, Mahabadi, nor Guo teaches one or more processors, and a memory in communication with the one or more processors. Meyerson, however, analogously teaches one or more processors and a memory in communication with the one or more processors (see: "A third exemplary embodiment is a computer-implemented learning process for training and sharing generic functional modules across multiple diverse (architecture, task) pairs for solving multiple diverse problems. The computer-implemented process includes: means for decomposing by one or more specially programmed processors each of the multiple (architecture, task) pairs into equally sized pseudo-tasks; means for aligning by the one or more specially programmed processors pseudo-tasks across the multiple diverse architectures; and means for sharing by the one or more specially programmed processors learned parameters across the aligned pseudo-tasks, wherein each diverse architecture is preserved in performance of its paired task."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, and Meyerson before him or her, to modify the system of claim 10 to include attributes of one or more processors, and a memory in communication with the one or more processors, in order to perform the methods of the disclosure (see para. [0187]: "Yet another implementation may include a system including memory and one or more processors operable to execute instructions, stored in the memory, to perform the method described above.").

Regarding claim 11: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 10. Lin further teaches wherein the N-stream anchor network has fixed weights for finding the tree sub-structures (pg. 6, section 7: "In this section, we validate the performance of the proposed controllable Pareto MTL method to generate trade-off curves for different MTL problems. We compare it with the following MTL algorithms: 1) Linear Scalarization: simple linear combination of different tasks with fixed weights.").
Regarding claim 12: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 11. Lin in view of Sarafian in further view of Mahabadi in further view of Meyerson does not teach wherein finding the tree sub-structures includes selecting a parent from every node. Guo, however, analogously teaches wherein finding the tree sub-structures includes selecting a parent from every node (see fig. 2, quoted above for claim 3). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, and Meyerson before him or her, to modify the system of claim 12 to include attributes wherein finding the tree sub-structures includes selecting a parent from every node, in order to sample parent nodes, select one, and remove unselected paths and parent nodes (see Guo at fig. 2).

Regarding claim 13: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 12. Lin in view of Sarafian in further view of Guo and further in view of Meyerson does not explicitly teach wherein the edge hypernet predicts the branching parameters within the anchor network. Mahabadi, however, analogously teaches wherein the edge hypernet predicts the branching parameters within the anchor network (see fig. 1: "Similarly, the layer normalization hypernetwork h^l_LN generates the conditional layer normalization parameters (β_τ and γ_τ)."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, and Meyerson before him or her, to modify the system of claim 13 to include attributes wherein the edge hypernet predicts the branching parameters within the anchor network, in order to fine-tune parameters efficiently (see pg. 2, section 1: "In summary, we make the following contributions: (1) We propose a parameter-efficient method for multi-task fine-tuning based on hypernetworks and adapter layers.").

Regarding claim 14: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 13. Lin further teaches optimizing a task loss, Ltask, by taking into account the individual task performances without considering a computational cost (see pg. 3, section 3: "For training a deep multi-task neural network, it is to minimize the losses for multiple tasks: [equation image omitted] where θ is the neural network parameters and Li(θ) is the empirical loss of the i-th task.").

Regarding claim 15: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 14. Lin further teaches wherein tasks with higher preferences have a greater influence (see pg. 4, section 4: "Preference-Based Linear Scalarization: A simple and straightforward approach is to define the preference vector p and the corresponding solution θp via the weighted linear scalarization: [equation image omitted] where the preference vector p = (p1, p2, ..., pm) is the weight for each task, and θp is the optimal solution for the weighted linear scalarization").

Claims 16 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lin in view of Sarafian, in view of Mahabadi, in further view of Guo, in further view of Meyerson, and further in view of Bruggemann.

Regarding claim 16: Lin in view of Sarafian in further view of Mahabadi in further view of Guo and further in view of Meyerson teaches the system of claim 15, but does not teach calculating an active loss and an inactive loss to determine branching within the anchor network. Bruggemann, however, analogously teaches calculating an active loss and an inactive loss to determine branching within the anchor network (see fig. 1: "To form a branching structure, the subgraphs are combined according to the sampling consensus, yielding task groupings at each layer. During the architecture search, the masks z(·) are learned by minimizing a resource loss Lresource (computed using a look-up table) and task performance losses LA to LD simultaneously."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, Meyerson, and Bruggemann before him or her, to modify the system of claim 16 to include attributes of calculating an active loss and an inactive loss to determine branching within the anchor network in order to actively navigate the efficiency vs. performance trade-off (see Bruggemann at pg. 5, section 3.3: "To obtain more compact encoder structures and to actively navigate the efficiency vs. performance trade-off").
Regarding claim 17: Lin in view of Sarafian in further view of Mahabadi in further view of Guo in further view of Meyerson and further in view of Bruggemann teaches the system of claim 16. Lin in view of Sarafian in further view of Guo in further view of Mahabadi and further in view of Meyerson does not teach wherein the active loss is additionally weighted by a cost preference, c, of the resource constraint tuple to enable control of the total computational cost. Bruggemann, however, analogously teaches this limitation (see pgs. 5-6, section 3.3: "To obtain more compact encoder structures and to actively navigate the efficiency vs. performance trade-off, we introduce a resource-aware term Lresource in the objective function: [equation image omitted] … The derivations presented in this section yield a proxyless resource loss function, i.e., it encourages solutions which directly minimize the expected resource cost of the final model."). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lin, Sarafian, Mahabadi, Guo, Meyerson, and Bruggemann before him or her, to modify the system of claim 17 to include attributes wherein the active loss is additionally weighted by a cost preference, c, of the resource constraint tuple to enable control of the total computational cost, in order to actively navigate the efficiency vs. performance trade-off (see Bruggemann at pg. 5, section 3.3).

Allowable Subject Matter

Claims 9 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form, including all of the limitations of the base claim and any intervening claims, provided the §101 rejections are overcome. Regarding claims 9 and 18, the closest prior art of record to the limitations of the aforementioned claims, Bruggemann ("Automated Search for Resource-Efficient Branched Multi-Task Networks"), recites training a network on tasks to calculate task groupings, using a loss based on a similarity analysis matrix for finding branching structures, and utilizing a loss for edge detection (appendix B). The Examiner has found that the distinct feature of the applicant's claimed invention over the prior art is the explicit claiming of the aforementioned limitations specified in claims 9 and 18. When viewed individually or in combination with the other prior art of record, the limitations specified in claims 9 and 18 are distinct.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew A. Bracero, whose telephone number is (571) 270-0592. The examiner can normally be reached Monday through Friday, 9:00 a.m. to 5:00 p.m. ET. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached Monday through Friday, 9:00 a.m. to 5:00 p.m. ET, at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/ANDREW BRACERO/
Examiner, Art Unit 2126

/DAVID YI/
Supervisory Patent Examiner, Art Unit 2126

Prosecution Timeline

Aug 31, 2022 — Application Filed
Jan 14, 2026 — Non-Final Rejection (§101, §103)
Mar 30, 2026 — Interview Requested
Apr 07, 2026 — Applicant Interview (Telephonic)
Apr 13, 2026 — Examiner Interview Summary


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 100%
With Interview: 99% (+0.0%)
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 5 resolved cases by this examiner; the grant probability is derived from the career allow rate.
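Concretely, the headline figure appears to be the career allow rate applied as a point estimate; the numbers below come from the cards above, and the tool's exact model is not disclosed:

```latex
P(\text{grant}) \approx \frac{\text{applications allowed}}{\text{applications resolved}} = \frac{5}{5} = 100\%
```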
