Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is responsive to Applicants' Amendment filed on November 19, 2025, in which claims 1, 3-5, 14, 15, 20, 21, and 28-30 are currently amended. Claims 1-17, 20-22, 25, and 28-36 are currently pending.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on November 19, 2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Arguments
Applicant’s arguments with respect to rejection of claims 1-17, 20-22, 25, and 28-36 under 35 U.S.C. 101 based on amendment have been considered and are persuasive. The rejection under 35 U.S.C. 101 is withdrawn in view of Applicant’s amendments and the Remarks submitted 11/19/2025.
Applicant’s arguments with respect to rejection of claims 1-17, 20-22, 25, and 28-36 under 35 U.S.C. 103 based on amendment have been considered. However, these arguments are moot in view of the new ground of rejection set forth below.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“means for storing” in claim 28
“means for processing” in claim 28
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The instant specification provides explicit support for “means for processing” at least at [¶0118] and for “means for storing” at least at [¶0054].
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 6, 7, 9, 14, 20-22, 25, 28-30, and 34-36 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Moschitti (“Transfer Learning for Sequence Labeling Using Source Model and Target Data”, 2019) and Fukuda (US20200034703A1).
Regarding claim 1, Moschitti teaches A device comprising: a memory configured to store a trained first neural network including a first set of parameters, the trained first neural network trained to detect a first set of [sound] classes; ([p. 3] "Training of a source model We supposed that a sequence labeling model is trained on source data until the optimal parameters ˆθS are obtained. These will be saved and reused for transfer learning" Saving interpreted as synonymous with storing)
and one or more processors configured to:([pp. 10-11] "https://github.com/liah-chan/transferNER" See the Moschitti source code on GitHub, which is provided as Python source files, i.e., instructions that must necessarily be executed on a processor)
after a second neural network is initialized based on the trained first neural network, wherein the trained first neural network is configured to output a first output having a first number of classifications, wherein the second neural network is configured to output a second output having a second number of classifications greater than the first number of classifications([p. 3] "In the initial phase, a sequence labeling model, MS, is trained on a source dataset, DS, which has E classes. Then, in the next phase, a new model, MT, needs to be learned on target dataset, DT, which contains new input examples and E+M classes, where M is the number of new classes" See also "copy from trained parameter" in Algorithm 2 for how second neural network is initialized based on the trained first neural network)
after generation of one or more coupling networks configured to combine classification results generated by the second neural network and the trained first neural network, wherein the one or more coupling networks are configured to output a third output having the second number of classifications, the third output based on a combination of the first output and the second output([p. 4] "We design a neural adapter, shown in Fig. 2, to solve the problem of disagreement in annotations between the source and target data. This is essentially a component that helps to map the predictions from the output space of the source domain into that of the target domain" See FIG. 1 and FIG. 2. Neural Adapter interpreted as coupling network configured to combine classification results generated by the second neural network and the trained first neural network. More specifically bridging the separate models respective first and second outputs is interpreted as combining classification results to generate the third output. (source model generates first output, target model pre-adapter generates second output, target model after adapter generates third output))
and while the second output of the second neural network and the first output of the trained first neural network are linked as input to the one or more coupling networks, adapt the second neural network and the one or more coupling networks to generate a trained second neural network and trained one or more coupling networks;([p. 4] "The parameters of the adapter are jointly learned in the subsequent step with the rest of the target model parameters. The parameters of the source model is, however, not updated")
based on a comparison between an accuracy of [sound] classes assigned by the trained second neural network to [audio] data samples of the first set of [sound] classes and an accuracy of [sound] classes assigned by the trained first neural network to the audio data samples of the first set of [sound] classes select the trained second neural network or a combination of the trained first neural network, the trained second neural network, and the trained one or more coupling networks, as an active [sound] event classifier([p. 6] "the comparison between the results of the transferred models with adapter and those without adapter shows a consistent improvement on F1 score over the original NE categories. In some cases, for example, while using the adapter on the I-CAB dataset, the transfer model performance of the original NE categories even surpasses the F1 source of the model [...] we show in detail the improvement obtained by using the adapter in Figure 4" Moschitti explicitly selects models for experimental comparison of the accuracy and explicitly selects the transfer model based on an improvement in accuracy. See also FIG. 3 and 4).
However, Moschitti does not explicitly teach perform, based on the active sound event classifier, a sound event classification operation configured to recognize a sound event in an audio signal.
Fukuda, in the same field of endeavor, teaches perform, based on the active sound event classifier, a sound event classification operation configured to recognize a sound event in an audio signal([¶0030] "The teacher neural network 1 (210A) may receive “input data 1” as a teacher input data, and output a soft label output corresponding to the input data 1 from its output layer (shown as 220A). The soft label output may be a classification of the audio data identifying phonemes." See also FIG. 2).
Moschitti and Fukuda are both directed towards transfer learning for machine learning and are therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Moschitti with the teachings of Fukuda by applying the transfer learning method of Moschitti to sound classification. While it would have already been an obvious design choice to use the model of Moschitti for sound classification, this is explicitly reinforced by Fukuda, which provides additional motivation for the combination ([¶0069] “The client computer 610 may obtain audio data 640 of a speech from a person (e.g., a user of the client computer 610) as teacher input data.”). This motivation for combination also applies to the remaining claims which depend on this combination.
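For illustration only, the examiner provides the following minimal sketch of the mapping applied above (examiner-created; the numpy arrays, dimensions, linear layers, and softmax classifier are assumptions for illustration and are not reproduced from Moschitti or Fukuda): a stored source classifier, a target classifier initialized from the source parameters with additional outputs, and a coupling (adapter) network whose output is combined with the target output.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    N, K, D = 5, 2, 16                    # N original classes, K new classes, D feature size

    # Trained first (source) neural network: a single linear layer stored in memory.
    W_source = rng.normal(size=(N, D))

    # Second (target) neural network initialized from the source parameters, with K
    # additional output rows so that it outputs N + K classifications.
    W_target = np.vstack([W_source.copy(), rng.normal(size=(K, D))])

    # Coupling network (adapter): maps the N source logits into the N + K output space
    # so the two classification results can be combined.
    A_adapter = rng.normal(scale=0.1, size=(N + K, N))

    def classify(x):
        first_out = W_source @ x                          # first output, N classifications
        second_out = W_target @ x                         # second output, N + K classifications
        third_out = second_out + A_adapter @ first_out    # combined third output
        return softmax(third_out)

    x = rng.normal(size=D)                # stand-in for an audio feature vector
    print(int(classify(x).argmax()))      # index of the recognized sound class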
Regarding claim 2, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein, the one or more processors are configured to: perform the comparison between the accuracy of sound classes assigned by the trained second neural network to the audio data samples of the first set of sound classes and the accuracy of sound classes assigned by the trained first neural network to the audio data samples of the first set of sound classes;(Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data.")
determine a value of a metric based on the comparison (Fukuda [¶0052] "the student training section 150 may determine θ such that L(θ) is minimized. L(θ) is defined as follows: L(θ) = −Σi qi log pi, where θ represents the student neural network including weights between nodes and trainable parameters of the student neural network, i represents an index of nodes in the output layer of the student neural network and the teacher neural networks, qi represents a value of i-th index" The loss L(θ) is interpreted as a determined value of a metric based on the comparison.)
determine whether to discard the trained first neural network further based on the value of the metric(Fukuda [¶0056] "At block 350, the student training section 150 may determine whether to continue the training of the student neural network. The student training section 150, at block 350, may go back to block 320 to begin a new iteration starting at block 320 with selecting new teacher input data, unless the student training section 150 determines, at block 350, to end the training, in which point the student training section150 may end the training of the student neural network." ending training with teacher network interpreted as synonymous with deciding to discard the trained first neural network.).
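For illustration only, the following minimal sketch (examiner-created; the labels, predictions, and decision threshold are hypothetical and not taken from Fukuda) shows an accuracy comparison on audio data samples of the first set of sound classes, a metric value derived from that comparison, and a determination of whether to discard the trained first neural network:

    import numpy as np

    def accuracy(predictions, labels):
        return float(np.mean(predictions == labels))

    # Hypothetical ground-truth labels for audio data samples of the first set of
    # sound classes, and the classes assigned by each trained network.
    labels       = np.array([0, 1, 2, 2, 1, 0, 3, 3])
    first_preds  = np.array([0, 1, 2, 2, 1, 0, 3, 1])   # trained first neural network
    second_preds = np.array([0, 1, 2, 2, 1, 0, 3, 3])   # trained second neural network

    acc_first  = accuracy(first_preds, labels)
    acc_second = accuracy(second_preds, labels)

    # Value of a metric based on the comparison of the two accuracies.
    metric = acc_second - acc_first

    # Determine whether to discard the trained first neural network.
    discard_first = metric >= 0.0
    active_classifier = ("trained second network only" if discard_first
                         else "first + second + coupling networks")
    print(metric, active_classifier)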
Regarding claim 3, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein the first output of the trained first neural network indicates a sound class assigned to particular audio data samples by the trained first neural network and the second output of the trained second neural network indicates a sound class assigned to the particular audio data samples by the trained second neural network.(Fukuda [¶0027] "FIG. 2 shows an exemplary framework, according to an embodiment of the present invention. In a specific embodiment, the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output.")
the second neural network trained to detect the first set of sound classes and a second set of sound classes, (Fukuda [¶0025] " the student training section 150 may use common teacher input data and each soft label output from the plurality teacher neural networks for each training." [¶0027] " the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output.")
and the trained first neural network trained independently of the second set of sound classes(Fukuda See FIG. 3 block 310).
Regarding claim 4, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein the first output of the trained first neural network includes a first count of data elements corresponding to the first number of [sound] classes of the first set of [sound] classes, (Moschitti [p. 1] "(i) a source model, MS, already trained to recognize a certain number of categories on the source data" [p. 4] "the fully-connected layer after the word BLSTM maps the output h to a vector p of size nE")
the second output of the trained second neural network includes a second count of data elements corresponding to the second number of sound classes of a second set of sound classes, (Moschitti [p. 1] "training a new model, MT, on the target data, DT, where new categories appear, in addition to those of the DS" [p. 4] "we extend the output layer by size nM, where M is the number of new categories")
and the trained one or more coupling networks include a neural adapter comprising one or more adapter layers (Moschitti [p. 1] "we pro pose to use a neural adapter: it connects MS to MT, also enabling the latter to use the features from the former" See also FIG. 1 and FIG. 2)
configured to generate, based on the first output of the trained first neural network, a fourth output having the second count of data elements.(Moschitti [p. 4] "We design a neural adapter, shown in Fig. 2, to solve the problem of disagreement in annotations between the source and target data. This is essentially a component that helps to map the predictions from the output space of the source domain into that of the target domain [...] the neural adapter connects each output of the fully-connected layer in MS to the corresponding output of MT [...] obtained by pT_t = a_t ⊕ pT_t, where a_t = [→a_t ⊕ ←a_t] and ⊕ is the element-wise summation" The adapter's input is explicitly the source model's fully-connected output, and it produces an adapter output which is then combined element-wise with the target vector. Element-wise summation requires the vectors to have the same dimensionality, so a_t must match the target output vector size (the second count)).
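For illustration only, the following minimal sketch (examiner-created; a single linear map stands in for Moschitti's BLSTM adapter, and all dimensions are assumed) shows why the adapter output must have the second count of data elements for the element-wise summation to be defined:

    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 5, 2                               # first count of data elements, added classes

    first_out  = rng.normal(size=N)           # first output (first count of data elements)
    second_out = rng.normal(size=N + K)       # second output (second count of data elements)

    # Adapter (linear stand-in for the BLSTM adapter): maps the N source outputs
    # into the N + K target output space.
    A = rng.normal(size=(N + K, N))
    adapter_out = A @ first_out               # fourth output, second count of data elements

    combined = second_out + adapter_out       # element-wise summation requires equal sizes
    assert combined.shape == (N + K,)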
Regarding claim 6, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein an output layer of the trained first neural network includes N output nodes, and an output layer of the second neural network includes N+K output nodes, where N is an integer greater than or equal to one, and K is an integer greater than or equal to one.(Moschitti [p. 3] "In the initial phase, a sequence labeling model, MS, is trained on a source dataset, DS, which has E classes. Then, in the next phase, a new model, MT, needs to be learned on target dataset, DT, which contains new input examples and E+M classes, where M is the number of new classes" [p. 4] "the fully-connected layer after the word BLSTM maps the output h to a vector p of size nE [...] the associated weight matrix of the fully-connected layer Ws O also updates from the original shape nC ×p to a new matrix Ws O of shape (nC +nM)×p." See also "copy from trained parameter" in Algorithm 2 for how second neural network is initialized based on the trained first neural network)
and the N output nodes correspond to N [sound] event classes that the trained first neural network is to recognize and the N+K output nodes include N output nodes that correspond to the N [sound] event classes and K output nodes that correspond to K additional [sound] event classes.(Moschitti [p. 3] "In the initial phase, a sequence labeling model, MS, is trained on a source dataset, DS, which has E classes. Then, in the next phase, a new model, MT, needs to be learned on target dataset, DT, which contains new input examples and E+M classes, where M is the number of new classes" The combination with Fukuda is relied upon for the sound specific classes).
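For illustration only, the following minimal sketch (examiner-created; the hidden size and initialization values are assumptions) shows an output layer expanded from N to N + K output nodes, with the N trained rows copied from the first network in the manner of Moschitti's parameter copying:

    import numpy as np

    rng = np.random.default_rng(2)
    p, N, K = 8, 5, 2                         # hidden size, original classes, added classes

    W_old = rng.normal(size=(N, p))           # output weights of the trained first network
    b_old = np.zeros(N)

    # Output layer of the second network: the N trained rows are copied and K newly
    # initialized rows are appended for the K additional event classes.
    W_new = np.vstack([W_old, rng.normal(scale=0.01, size=(K, p))])
    b_new = np.concatenate([b_old, np.zeros(K)])

    h = rng.normal(size=p)                    # hidden representation from earlier layers
    assert (W_new @ h + b_new).shape == (N + K,)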
Regarding claim 7, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein: the set of parameters of the trained first neural network include linking weights, the second neural network includes a second set of parameters including linking weights, (Moschitti [p. 4] "all the other parameters, θ¯O, i.e, those not in the output layer, are initialized with the corresponding parameters from the source model, i.e., ˆθS¯O . This way, the associated weight matrix of the fully-connected layer Ws O also updates from the original shape nC ×p to a new matrix Ws O of shape (nC +nM)×p. Note that the parameters ˆθS O and all the other parameters, ˆθS¯O , are essentially the weights in the matrix Ws O and the weights in the other layers" Weights interpreted as synonymous with linking weights)
the one or more coupling networks include a third set of parameters including linking weights, (Moschitti [p. 4] "More precisely, we use − → A and ←− A to denote the forward → and backward adapter (i.e., a BLSTM). It takes the output of the fully-connected layer pS t as input at each time step t" BLSTM has weights by definition. BLSTM weights interpreted as third set of parameters including linking weights.)
and to adapt the second neural network and the one or more coupling networks, the one or more processors are configured to iteratively update the linking weights of the second neural network and the linking weights of the one or more coupling networks, (Moschitti [p. 4] "for e = 1 →n epochs do […] θT := θT −α∆θTL […] The parameters of the adapter are jointly learned" See Algorithm 3 Target Model Training)
until a terminal condition is satisfied, to generate the trained second neural network and the trained one or more coupling networks(Moschitti [p. 4] "Training the target model In Algorithm 3, the new parameters are updated as a standard training cycle (we use a validation set and early stopping as in the source model training" early stopping interpreted as terminal condition).
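For illustration only, the following minimal sketch (examiner-created; the linear models, squared-error objective, learning rate, and patience value are assumptions and do not reproduce Moschitti's Algorithm 3) shows the linking weights of the second network and of the coupling network being updated iteratively while the first network's weights remain fixed, with early stopping as the terminal condition:

    import numpy as np

    rng = np.random.default_rng(3)
    D, N, K = 8, 3, 1
    W_src = rng.normal(size=(N, D))                              # trained first network: never updated
    W_tgt = np.vstack([W_src.copy(), rng.normal(size=(K, D))])   # second-network linking weights
    A     = rng.normal(scale=0.1, size=(N + K, N))               # coupling-network linking weights

    X = rng.normal(size=(64, D))
    Y = rng.normal(size=(64, N + K))                             # toy targets for illustration
    lr, best, patience = 1e-2, np.inf, 0

    for epoch in range(200):
        out = X @ W_tgt.T + (X @ W_src.T) @ A.T                  # source output routed through adapter
        err = out - Y
        # Iteratively update only the second network and the coupling network.
        W_tgt -= lr * (err.T @ X) / len(X)
        A     -= lr * (err.T @ (X @ W_src.T)) / len(X)
        loss = float((err ** 2).mean())
        # Terminal condition: early stopping when the loss stops improving.
        if loss < best - 1e-6:
            best, patience = loss, 0
        else:
            patience += 1
            if patience >= 5:
                break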
Regarding claim 9, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein, the one or more processors are configured to: prior to initialization of the second neural network, designate the trained first neural network as the active sound event classifier (Fukuda [¶0039] "At block 312, a processing section, such as the processing section 140, may evaluate the plurality of teacher neural networks. The processing section 140 may evaluate an accuracy of each of the plurality of teacher neural networks using test data." See FIG. 3 312)
and based on a result of the comparison designate the combination of the trained first neural network, the trained second neural network, and the trained one or more coupling networks together as the active sound event classifier based on a determination not to discard the trained first neural network.(Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data.").
Regarding claim 14, claim 14 is directed towards the method performed by the device of claim 1. Therefore, the rejection applied to claim 1 also applies to claim 14.
Regarding claim 20, the combination of Moschitti and Fukuda teaches The method of claim 14, further comprising: training a first neural network to detect the first set of sound classes to generate the trained first neural network, the trained first neural network trained independently of a second set of sound classes; (Fukuda [¶0028] "In the embodiment of FIG. 2, one teacher neural network 1 (shown as 210A) receives input data 205, such as “input data 1”, and another teacher neural network 2 (shown as 210B) receives the same or substantially the same input data for each training. For example, Teacher neural networks 210A and 210B both receive “input data 1.” in a training, and then both receive “input data 2” in another training" See FIG. 2, output of second teacher network interpreted as second set of sound classes.)
wherein adapting the second neural network to generate the trained second neural network comprises training the second neural network to detect the first set of sound classes and the second set of sound classes, and wherein the first output of the trained first neural network indicates a sound class assigned to particular audio data samples by the trained first neural network and the second output of the trained second neural network indicates a sound class assigned to the particular audio data samples by the trained second neural network.(Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data.").
Regarding claim 21, the combination of Moschitti and Fukuda teaches The method of claim 20, wherein the one or more coupling networks are configured to generate the third output that indicates a [sound] class assigned to the particular [audio] data samples by the one or more coupling networks based on the first output of the trained first neural network and the second output of the second neural network. (Moschitti [p. 4] "We design a neural adapter, shown in Fig. 2, to solve the problem of disagreement in annotations between the source and target data. This is essentially a component that helps to map the predictions from the output space of the source domain into that of the target domain [...] the neural adapter connects each output of the fully-connected layer in MS to the corresponding output of MT [...] obtained by pT_t = a_t ⊕ pT_t, where a_t = [→a_t ⊕ ←a_t] and ⊕ is the element-wise summation" Moschitti explicitly merges adapter output with target model output using element-wise summation, which is an aggregation layer performing the required merge).
Regarding claim 22, the combination of Moschitti and Fukuda teaches The method of claim 14, further comprising: determining a first value indicating the accuracy of sound classes assigned by the trained first neural network to audio data samples of the first set of sound classes; determining a second value indicating the accuracy of the sound classes assigned by the trained second neural network to the audio data samples of the first set of sound classes,(Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data." Examiner notes that alternatively the loss calculated in Fukuda [¶0052] could also read on this limitation.)
and determining whether to discard the trained first neural network is based on a comparison of the first value and the second value.(Fukuda [¶0056] "At block 350, the student training section 150 may determine whether to continue the training of the student neural network. The student training section 150, at block 350, may go back to block 320 to begin a new iteration starting at block 320 with selecting new teacher input data, unless the student training section 150 determines, at block 350, to end the training, in which point the student training section150 may end the training of the student neural network." ending training with teacher network interpreted as synonymous with deciding to discard the trained first neural network.).
Regarding claim 25, the combination of Moschitti and Fukuda teaches The method of claim 14, further comprising: training the second neural network and the one or more coupling networks, and (Moschitti [p. 4] "Training the target model In Algorithm 3, the new parameters are updated as a standard training cycle (we use a validation set and early stopping as in the source model training")
wherein link weights of the trained first neural network are maintained and unchanged during training of the second neural network and the one or more coupling networks.(Fukuda See FIG. 3 310, teacher networks are fully trained (weights are learned and static after training) before training student network.).
Regarding claim 28, claim 28 is substantially similar to claim 1. Therefore the rejection applied to claim 1 also applies to claim 28.
Regarding claim 29, the combination of Moschitti and Fukuda teaches The device of claim 28, further comprising: the means for processing is configured to determine whether to discard the trained first neural network (Fukuda [¶0056] "At block 350, the student training section 150 may determine whether to continue the training of the student neural network. The student training section 150, at block 350, may go back to block 320 to begin a new iteration starting at block 320 with selecting new teacher input data, unless the student training section 150 determines, at block 350, to end the training, in which point the student training section 150 may end the training of the student neural network." ending training with teacher network interpreted as synonymous with deciding to discard the trained first neural network.)
based on a value of a metric, (Fukuda [¶0052] "the student training section 150 may determine θ such that L(θ) is minimized. L(θ) is defined as follows: L(θ) = −Σi qi log pi, where θ represents the student neural network including weights between nodes and trainable parameters of the student neural network, i represents an index of nodes in the output layer of the student neural network and the teacher neural networks, qi represents a value of i-th index" The loss L(θ) is interpreted as a determined value of a metric based on the comparison.)
the value of the metric indicative of the accuracy of the sound classes assigned by the second neural network to the audio data samples of the first set of sound classes as compared to the accuracy of the sound classes assigned by the first neural network to the audio data samples of the first set of sound classes.(Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data." Examiner notes that alternatively the loss calculated in Fukuda [¶0052] could also read on this limitation.).
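For illustration only, the following short numeric example (examiner-created; the probability values are hypothetical) evaluates the loss L(θ) = −Σi qi log pi quoted from Fukuda [¶0052] for a single input, treating qi as the teacher (first) network's soft label output and pi as the student (second) network's output:

    import numpy as np

    # Hypothetical soft label output q from the teacher (first) network and output
    # probabilities p from the student (second) network for a single input.
    q = np.array([0.7, 0.2, 0.1])
    p = np.array([0.6, 0.3, 0.1])

    # L(theta) = -sum_i q_i * log(p_i), per the definition quoted from Fukuda [0052].
    L = -np.sum(q * np.log(p))
    print(round(L, 3))   # 0.829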
Regarding claim 30, claim 30 is substantially similar to claim 1. Therefore the rejection applied to claim 1 also applies to claim 30.
Regarding claim 34, the combination of Moschitti and Fukuda teaches The device of claim 2, wherein, to discard the trained first neural network, the one or more processors are configured to: archive the base trained first neural network.(Fukuda [¶0019] "The storing section 100 may be implemented by a volatile or non-volatile memory of the apparatus 10. In some embodiments, the storing section 100 may store training data, test data, teacher input data, student and/or teacher neural networks, parameters and other data related thereto.").
Regarding claim 35, the combination of Moschitti and Fukuda teaches The device of claim 2, wherein, to discard the trained first neural network, the one or more processors are configured to: move the trained first neural network to a memory location configured to store inactive or unused resources.(Fukuda [¶0019] "The storing section 100 may be implemented by a volatile or non-volatile memory of the apparatus 10. In some embodiments, the storing section 100 may store training data, test data, teacher input data, student and/or teacher neural networks, parameters and other data related thereto.").
Regarding claim 36, the combination of Moschitti and Fukuda teaches The device of claim 2, wherein, to discard the trained first neural network, the one or more processors are configured to: retain the trained first neural network in the memory but refrain from sound event classification using the trained first neural network.(Fukuda [¶0047] "the student training section 150 may, at block 332, select a teacher neural network in a predetermined order in each iteration of block 320 to block 350. For example, the student training section 150, at block 332, may select a teacher neural network in an ascending order (such as Teacher Neural Network (or TNN) 1 at a first loop of block 332 to block 342, TNN2 at a second loop, TNN3 at a third loop for each iteration of block 320 to block 350)." Fukuda teaches that the student training section selectively refrains from using particular teacher neural networks based on an order.).
Claims 5, 15, and 16 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Moschitti and Fukuda in further view of Sun (US20180357736A1).
Regarding claim 5, the combination of Moschitti and Fukuda teaches The device of claim 4, wherein the trained one or more coupling networks include a merger adapter including one or more aggregation layers configured to merge the third output from the neural adapter and the second output of the trained second neural network and including an output layer to generate the third output.(Moschitti [p. 4] "We design a neural adapter, shown in Fig. 2, to solve the problem of disagreement in annotations between the source and target data. This is essentially a component that helps to map the predictions from the output space of the source domain into that of the target domain [...] the neural adapter connects each output of the fully-connected layer in MS to the corresponding output of MT [...] obtained by pT_t = a_t ⊕ pT_t, where a_t = [→a_t ⊕ ←a_t] and ⊕ is the element-wise summation" Moschitti explicitly merges adapter output with target model output using element-wise summation, which is an aggregation layer performing the required merge).
However, the combination of Moschitti and Fukuda doesn't explicitly teach and the one or more processors is an application specific integrated circuit (ASIC).
Sun, in the same field of endeavor, teaches and the one or more processors is an application specific integrated circuit (ASIC)([¶0046] "[t]he processing engine 112 may include […] an application-specific integrated circuit (ASIC)").
Moschitti, Fukuda, and Sun are directed towards neural network optimization methods and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Moschitti and Fukuda with the teachings of Sun by selectively discarding networks based on a comparison of classification accuracies using an ASIC. While Moschitti and Fukuda explicitly teach discarding a first coupled model based on the optimization (minimized error/maximized accuracy), and while it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select a model with improved accuracy or performance and discard alternative models with relatively lower performance, Sun explicitly reinforces this motivation for combination ([¶0108] "The processing engine 112 may determine the at least one second ETA model based on the ranking result. For example, the processing engine 112 may identify a first accuracy score (e.g., the highest accuracy score, the second highest accuracy score, the lowest accuracy score, etc.) of the accuracy scores based on the ranking result. The processing engine 112 may select, from the plurality of first ETA models, at least one first ETA model associated with the first accuracy score (e.g., the highest accuracy score, the second highest accuracy score, the lowest accuracy score, etc.) as the at least one second ETA model."). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 15, the combination of Moschitti and Fukuda teaches The method of claim 14, further comprising initializing the second neural network based on the trained first neural network; (Fukuda [¶0054] "the student neural network may be smaller than the plurality of neural networks such as having a small number of nodes and/or layers than the plurality of teacher neural networks. For example, the student neural network may be a compact CNN that has 2 convolutional layers with 64 and 128 hidden nodes each, 2 fully connected layers with 768 hidden units per layer, and 256 hidden-unit bottleneck layer." See also FIG. 2)
linking the first output of the trained first neural network and the second output of the second neural network to the one or more coupling networks;(Moschitti [p. 4] "We design a neural adapter, shown in Fig. 2, to solve the problem of disagreement in annotations between the source and target data. This is essentially a component that helps to map the predictions from the output space of the source domain into that of the target domain [...] the neural adapter connects each output of the fully-connected layer in MS to the corresponding output of MT [...] obtained by pT_t = a_t ⊕ pT_t, where a_t = [→a_t ⊕ ←a_t] and ⊕ is the element-wise summation" Moschitti explicitly merges adapter output with target model output using element-wise summation, which is an aggregation layer performing the required merge)
during training of the second neural network and the one or more coupling networks: providing the audio data samples of the first set of sound classes to the trained first neural network and the second neural network; (Fukuda See FIG. 2 input data 1 is fed into Teacher 210A and Student 250)
assigning, by the second neural network, the sound classes to the audio data samples of the first set of sound classes;(Fukuda [¶0027] "FIG. 2 shows an exemplary framework, according to an embodiment of the present invention. In a specific embodiment, the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output.")
and determining a value of a metric indicative of the accuracy of the sound classes assigned by the second neural network to the audio data samples of the first set of sound classes as compared to the accuracy of the sound classes assigned by the trained first neural network to the audio data samples of the first set of sound classes, (Fukuda [¶0052] "the student training section 150 may determine θ such that L(θ) is minimized. L(θ) is defined as follows: L(θ) = −Σi qi log pi, where θ represents the student neural network including weights between nodes and trainable parameters of the student neural network, i represents an index of nodes in the output layer of the student neural network and the teacher neural networks, qi represents a value of i-th index" The loss L(θ) is interpreted as a determined value of a metric based on the comparison.).
However, the combination of Moschitti and Fukuda doesn't explicitly teach and determining whether to discard the trained first neural network is based on the value of the metric.
Sun, in the same field of endeavor, teaches and determining whether to discard the trained first neural network is based on the value of the metric. ([¶0107] "the processing engine 112 may select, from the plurality of first ETA models, at least one second ETA model (also referred to as the “selected ETA model”) from the plurality of first models based on a selection mechanism. For example, the processing engine 112 may make the selection based on the accuracy scores associated with the first ETA models. More particularly, for example, the processing engine 112 may select one or more first ETA models associated with particular accuracy scores as the at least one second model. In some embodiments, the processing engine 112 may select one or more first ETA models associated with accuracy scores that are greater than a threshold as the at least one second model. In some embodiments, the processing engine 112 may rank the first ETA models based on the accuracy scores and select a certain number of first models based on the ranking (e.g., the top five first models, top 10% of the first ETA models) the at least one second model.").
Moschitti, Fukuda, and Sun are directed towards neural network optimization methods and are therefore reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Moschitti and Fukuda with the teachings of Sun by selectively discarding networks based on a comparison of classification accuracies. While Moschitti and Fukuda explicitly teach discarding a first coupled model based on the optimization (minimized error/maximized accuracy), and while it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select a model with improved accuracy or performance and discard alternative models with relatively lower performance, Sun explicitly reinforces this motivation for combination ([¶0108] "The processing engine 112 may determine the at least one second ETA model based on the ranking result. For example, the processing engine 112 may identify a first accuracy score (e.g., the highest accuracy score, the second highest accuracy score, the lowest accuracy score, etc.) of the accuracy scores based on the ranking result. The processing engine 112 may select, from the plurality of first ETA models, at least one first ETA model associated with the first accuracy score (e.g., the highest accuracy score, the second highest accuracy score, the lowest accuracy score, etc.) as the at least one second ETA model."). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 16, the combination of Moschitti, Fukuda, and Sun teaches The method of claim 15, wherein the second neural network is initialized and linking is performed automatically based on detecting a trigger event.(Fukuda See FIG. 3 342 and 350 interpreted as trigger events.).
Claims 8, 31, and 33 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Moschitti and Fukuda in further view of Chen (US10990850B1).
Regarding claim 8, the combination of Moschitti and Fukuda teaches The device of claim 1, wherein, prior to initialization of the second neural network, designate the trained first neural network as the active sound event classifier (Fukuda [¶0039] "At block 312, a processing section, such as the processing section 140, may evaluate the plurality of teacher neural networks. The processing section 140 may evaluate an accuracy of each of the plurality of teacher neural networks using test data." See FIG. 3 312)
and based on a result of the comparison designate the trained second neural network as the active sound event classifier (Fukuda [¶0027] "the teacher neural networks and the student neural networks may receive audio data as the input data and output a classification of the audio data as the soft label output." [¶0028] "Input data 1 and input data 2 may be audio data of human speech sampled at a particular sampling frequency" [¶0048] "the student training section 150, at block 332, may select a teacher neural network based on the accuracy evaluated at block 312. In the embodiment, the student training section 150 may select, at block 332, a less accurate teacher neural network earlier than other teacher neural networks among the plurality of teacher neural networks in terms of iterations of block 320 to block 350. In a specific embodiment, the student training section 150 may select, at block 332, the most accurate teacher neural network among the plurality of teacher neural networks last (e.g., the last loop of block 332 to block 342)" [¶0061] "the student training section 150 may use a soft label output of the most accurate teacher neural network (e.g., Teacher NN3) among the plurality of teacher neural networks at the last iteration of block 320 to block 350 for each input data.").
However, the combination of Moschitti and Fukuda doesn't explicitly teach and discard the trained first neural network.
Chen, in the same field of endeavor, teaches and discard the trained first neural network.([Col. 17 l. 54-Col. 18 l. 5] "The model training system 620 can modify the machine learning model accordingly. For example, the model training system 620 can cause the virtual machine instance 622 to optionally delete an existing ML training container [...] The model training system 620 can then instruct the virtual machine instance 622 to delete the ML training container 630 and/or to delete any model data stored in the training model data store 675.").
The combination of Moschitti and Fukuda as well as Chen are directed towards neural network knowledge distillation. Therefore, the combination of Moschitti and Fukuda as well as Chen are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Chen by expanding the output layer of the parent network to generate a student network that can output more class labels. Chen provides as additional motivation for combination ([p. 7 §4] "We found that using hint loss can help incremental model improve the value of all classes mAP and FmAP, and only loss a little new classes information in incremental learning of multi classes"). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 31, the combination of Moschitti and Fukuda teaches The device of claim 2.
However, the combination of Moschitti and Fukuda doesn't explicitly teach wherein, to discard the trained first neural network, the one or more processors are configured to: delete the trained first neural network from the memory.
Chen, in the same field of endeavor, teaches The device of claim 2, wherein, to discard the trained first neural network, the one or more processors are configured to: delete the trained first neural network from the memory.([Col. 17 l. 54-Col. 18 l. 5] "The model training system 620 can modify the machine learning model accordingly. For example, the model training system 620 can cause the virtual machine instance 622 to optionally delete an existing ML training container [...] The model training system 620 can then instruct the virtual machine instance 622 to delete the ML training container 630 and/or to delete any model data stored in the training model data store 675.").
The combination of Moschitti and Fukuda as well as Chen are directed towards neural network knowledge distillation. Therefore, the combination of Moschitti and Fukuda as well as Chen are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Chen by expanding the output layer of the parent network to generate a student network that can output more class labels. Chen provides as additional motivation for combination ([p. 7 §4] "We found that using hint loss can help incremental model improve the value of all classes mAP and FmAP, and only loss a little new classes information in incremental learning of multi classes"). This motivation for combination also applies to the remaining claims which depend on this combination.
Regarding claim 33, the combination of Moschitti and Fukuda teaches The device of claim 2.
However, the combination of Moschitti and Fukuda doesn't explicitly teach, wherein, to discard the trained first neural network, the one or more processors are configured to: mark the trained first neural network for deletion from the memory.
Chen, in the same field of endeavor, teaches to discard the trained first neural network, the one or more processors are configured to: mark the trained first neural network for deletion from the memory. ([Col. 17 l. 54-Col. 18 l. 5] "The model training system 620 can modify the machine learning model accordingly. For example, the model training system 620 can cause the virtual machine instance 622 to optionally delete an existing ML training container [...] The model training system 620 can then instruct the virtual machine instance 622 to delete the ML training container 630 and/or to delete any model data stored in the training model data store 675.").
The combination of Moschitti and Fukuda as well as Chen are directed towards neural network knowledge distillation. Therefore, the combination of Moschitti and Fukuda as well as Chen are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Chen by expanding the output layer of the parent network to generate a student network that can output more class labels. Chen provides as additional motivation for combination ([p. 7 §4] "We found that using hint loss can help incremental model improve the value of all classes mAP and FmAP, and only loss a little new classes information in incremental learning of multi classes"). This motivation for combination also applies to the remaining claims which depend on this combination.
Claims 10, 11, and 13 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Moschitti and Fukuda in further view of Lee (US20200035233A1).
Regarding claim 10, the combination of Moschitti and Fukuda teaches The device of claim 1.
However, the combination of Moschitti and Fukuda doesn't explicitly teach, wherein the one or more processors are integrated within a mobile computing device.
Lee, in the same field of endeavor, teaches The device of claim 1, wherein the one or more processors are integrated within a mobile computing device.([¶0142] "The at least one voice recognition device 10 may include a mobile phone").
The combination of Moschitti and Fukuda as well as Lee are directed towards neural network transfer learning. Therefore, the combination of Moschitti and Fukuda as well as Lee are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Lee by using a mobile device, integrated circuit, or device in a vehicle. Lee provides as additional motivation for combination ([¶0158] “The data learning unit 22 may further include a learning data preprocessor (not shown) and a learning data selector (not shown) to improve the analysis result of a recognition model or reduce resources or time for generating a recognition model.”).
Regarding claim 11, the combination of Moschitti and Fukuda teaches The device of claim 1.
However, the combination of Moschitti and Fukuda doesn't explicitly teach, wherein the one or more processors are integrated within a vehicle.
Lee, in the same field of endeavor, teaches The device of claim 1, wherein the one or more processors are integrated within a vehicle.([¶0164] "the AI device 20 may be implemented by being functionally embedded in an autonomous module included in a vehicle").
The combination of Moschitti and Fukuda as well as Lee are directed towards neural network transfer learning. Therefore, the combination of Moschitti and Fukuda as well as Lee are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Lee by using a mobile device, integrated circuit, or device in a vehicle. Lee provides as additional motivation for combination ([¶0158] “The data learning unit 22 may further include a learning data preprocessor (not shown) and a learning data selector (not shown) to improve the analysis result of a recognition model or reduce resources or time for generating a recognition model.”).
Regarding claim 13, the combination of Moschitti and Fukuda teaches The device of claim 1.
However, the combination of Moschitti and Fukuda does not explicitly teach wherein the one or more processors are included in an integrated circuit.
Lee, in the same field of endeavor, teaches wherein the one or more processors are included in an integrated circuit ([¶0259] "The various modules shown in FIGS. 7 and 8 may be implemented in hardware, software instructions for execution by one or more processors, firmware, including one or more signal processing and/or application specific integrated circuits, or a combination thereof.").
Lee, like the combination of Moschitti and Fukuda, is directed towards neural network transfer learning and is therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Lee by using a mobile device, integrated circuit, or device in a vehicle. Lee provides additional motivation for the combination ([¶0158] “The data learning unit 22 may further include a learning data preprocessor (not shown) and a learning data selector (not shown) to improve the analysis result of a recognition model or reduce resources or time for generating a recognition model.”).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Moschitti and Fukuda in view of Sinha (US 20210192357 A1).
Regarding claim 12, the combination of Moschitti and Fukuda teaches The device of claim 1.
However, the combination of Moschitti and Fukuda does not explicitly teach wherein the one or more processors are integrated within one or more of an augmented reality headset, a mixed reality headset, a virtual reality headset, or a wearable device.
Sinha, in the same field of endeavor, teaches the one or more processors are integrated within one or more of an augmented reality headset, a mixed reality headset, a virtual reality headset, or a wearable device ([¶0007] "Gradient adversarial training techniques can be used to train multitask networks, knowledge distillation networks, adversarial defense networks, or any other type of neural network. Gradient adversarial training techniques can be used to train neural networks for computer vision tasks and such training may be advantageous for augmented, mixed, or virtual reality systems.").
Sinha, like the combination of Moschitti and Fukuda, is directed towards knowledge distillation and is therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Sinha by using knowledge distillation to generate a model that can run on an augmented or virtual reality device. Sinha provides additional motivation for the combination ([¶0007] “such training may be advantageous for augmented, mixed, or virtual reality systems”).
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Moschitti, Fukuda, and Sun in view of Draelos (US 20170177993 A1).
Regarding claim 17, the combination of Moschitti, Fukuda, and Sun teaches The method of claim 16.
However, the combination of Moschitti, Fukuda, and Sun does not explicitly teach wherein the trigger event is based on encountering a threshold quantity of unrecognized sound classes, is specified by a user setting, or a combination thereof.
Draelos, in the same field of endeavor, teaches that the trigger event is based on encountering a threshold quantity of unrecognized sound classes, is specified by a user setting, or a combination thereof ([¶0146] "After again calculating the reconstruction error on samples from the new class, additional nodes are added until either the reconstruction error for all samples falls below the threshold or a user-specified maximum number of new nodes is reached for the current layer. Once neurogenic deep learning is complete for the first layer, the same process repeats for each of the succeeding layers of the encoding network using outputs from the previous layer").
Draelos, like the combination of Moschitti, Fukuda, and Sun, is directed towards neural network transfer learning and is therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti, Fukuda, and Sun with the teachings of Draelos by using a user-specified condition for triggering neural network linking. Draelos provides additional motivation for the combination ([¶0169] "The speed of training on new kinds of data may be increased when comparing an ability to perform incremental learning versus full network learning. Further, with the neural network being adaptable to new data, the neural network may be selected, trained with and sized correctly for the current problem at hand. The technical effect is particularly useful for embedded system applications.").
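As an illustrative aside only, and not drawn from Draelos or the claims, the following minimal Python sketch shows one way a trigger event of the kind recited in claim 17 could be checked: it fires once a threshold quantity of distinct unrecognized sound classes has been encountered, with the threshold supplied by a user setting. The class name, field names, and default value are hypothetical.

from dataclasses import dataclass, field

@dataclass
class UpdateTrigger:
    # Threshold quantity of unrecognized sound classes; could come from a user setting.
    max_unrecognized_classes: int = 5
    _unseen: set = field(default_factory=set)

    def observe(self, class_label, recognized):
        # Record one classification result; return True once the number of
        # distinct unrecognized sound classes reaches the threshold.
        if not recognized:
            self._unseen.add(class_label)
        return len(self._unseen) >= self.max_unrecognized_classes

For example, trigger = UpdateTrigger(max_unrecognized_classes=3) followed by trigger.observe(label, recognized) after each inference would report when an update (e.g., network expansion or linking) should begin.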
Claim 32 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Moschitti and Fukuda in view of Nguyen (US 10496884 B1).
Regarding claim 32, the combination of Moschitti and Fukuda teaches The device of claim 2.
However, the combination of Moschitti and Fukuda does not explicitly teach wherein, to discard the trained first neural network, the one or more processors are configured to: reallocate a portion of the memory allocated to the trained first neural network.
Nguyen, in the same field of endeavor, teaches that, to discard the trained first neural network, the one or more processors are configured to reallocate a portion of the memory allocated to the trained first neural network ([Col. 25, ll. 15-20] "the system may detect that the new study comprises 70 slides because the laboratory technician took 70 images. Therefore, the batch size is changed to 70 for this specific case. The memory required to handle the new batch size is calculated, and the network is rebound to reallocate the new memory requirements").
Nguyen, like the combination of Moschitti and Fukuda, is directed towards neural network knowledge distillation and is therefore analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Moschitti and Fukuda with the teachings of Nguyen by reallocating a portion of memory allocated to a neural network. Nguyen provides a particular use case as additional motivation for the combination ([Col. 25, ll. 15-20] "the system may detect that the new study comprises 70 slides because the laboratory technician took 70 images. Therefore, the batch size is changed to 70 for this specific case. The memory required to handle the new batch size is calculated, and the network is rebound to reallocate the new memory requirements").
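As an illustrative aside only, and not drawn from Nguyen's actual system, the following Python sketch shows the general idea of recomputing a network's activation-memory requirement when the batch size changes and allocating a correspondingly sized buffer, as in the quoted passage where the batch size becomes 70. The function names, layer sizes, and use of NumPy are hypothetical.

import numpy as np

def activation_bytes(batch_size, layer_output_sizes, dtype=np.float32):
    # Rough per-batch activation footprint: batch size times the summed layer output sizes.
    return batch_size * sum(layer_output_sizes) * np.dtype(dtype).itemsize

def rebind_network_buffer(batch_size, layer_output_sizes, dtype=np.float32):
    # Allocate a buffer sized for the new batch; a previously held buffer would
    # simply be released so its memory can be reclaimed and reused.
    n_elems = batch_size * sum(layer_output_sizes)
    return np.empty(n_elems, dtype=dtype)

# e.g. a new study with 70 images changes the batch size to 70:
buffer = rebind_network_buffer(batch_size=70, layer_output_sizes=[1024, 512, 256])

Here activation_bytes(70, [1024, 512, 256]) reports the new requirement, and rebind_network_buffer replaces the old allocation with one sized for the new batch, which is one plausible reading of "rebound to reallocate the new memory requirements."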
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK whose telephone number is (571)272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached on (571)270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SIDNEY VINCENT BOSTWICK/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124