Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is in response to applicant’s amendment filed on March 18, 2026. Claims 1-20 are pending and under consideration.
Response to Arguments
Applicant’s amendments have overcome the previous claim rejections under § 112. Therefore, the previous § 112 rejections have been withdrawn.
Applicant’s amendments to claim 1 have overcome the previous § 102 rejection of that claim over Zunino. However, claim 1 is now rejected under § 103 over Zunino in view of Ganin, which was previously applied to various dependent claims. Specifically, Ganin is relied upon to teach the limitation of repeating the training processes.
Applicant’s arguments directed to the prior art rejections are moot because they do not specifically address Ganin, which is now cited for the new limitations.
However, with regard to applicant’s general observation that “utilization of a gradient reversal layer removes the need to decide whether to train the first or second MLM, as both are trained at the same time” (applicant’s response, page 8), the Examiner notes that the current claim language does not include a negative limitation of not using a gradient reversal layer, nor does it distinguish training both models at the same time. As discussed in the updated rejections below, the phrase “perform one of the following” is open-ended and does not have the close-ended meaning of “perform only one of the following and not the other” in the manner of a negative limitation. Even if the claim did recite the latter, Ganin, Algorithm 1, nonetheless teaches that the weights/biases of the two models are updated in separate steps (separate lines in the algorithm). Thus, in each step, only one model is being “trained” and not the other. Please see the rejections below for further details. Therefore, if applicant intends to distinguish over the cited art on the basis of the concepts in FIG. 6B of this application, those concepts would need to be reflected with more particularity in the claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
1. Claims 1-12, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zunino et al., “Predicting Intentions from Motion: The Subject-Adversarial Adaptation Approach,” International Journal of Computer Vision (2020) 128:220–239, published 18 September 2019 (“Zunino”) in view of Ganin et al., “Domain-Adversarial Training of Neural Networks,” arXiv:1505.07818v4 [stat.ML] 26 May 2016, Journal of Machine Learning Research 17 (2016) 1-35 (“Ganin”).
As to claim 1, Zunino teaches a method of training a first machine learning-based model (MLM) for action recognition, [Abstract: “This paper aims at investigating the action prediction problem from a pure kinematic perspective. Specifically, we address the problem of recognizing future actions, indeed human intentions, underlying a same initial (and apparently unrelated) motor act.” § 1, last paragraph: “we propose an original approach derived from the domain adaptation research which considers each subject as a domain and adopt a novel subject-adversarial training pipeline to generalize better among the subjects.”] said method comprising:
a) obtaining training data comprising time sequences of data samples, wherein the time sequences of data samples represent predefined subjects which are performing predefined actions; [§ 4.1, paragraph 4: “We consider each grasping a bottle movement in our dataset D as a triplet [x, s, y], where x is an arbitrary feature vector encoding it, s is the subject’s label, and y is the intention’s label (see Fig. 4).” That is, y is a labeled (predefined) action, and s is a labeled (predefined) subject performing that action. The dataset is specifically a set of time sequences. See § 3, paragraphs 1-2: “The dataset was designed as follows. Seventeen naïve volunteers were seated …and participants were asked to grasp it in order to perform one of the following 4 different intentions… After a training session, the final dataset is composed by 253 trials of pouring, 262 of passing, 300 of drinking and 283 of placing - 1098 in total. For each, both video and 3D data have been collected.” The videos are “time sequences” because they are 3D kinematic data with time markers (§ 3.1, left column, last full paragraph: “the acquisition of each trial is automatically ruled by a thresholding of the wrist velocity v(t) at time t, acquired by the corresponding marker. Being ε = 20 mm/s, at the first instant t0 when v(t0) > ε, the acquisition starts and it is stopped at time tf, when the wrist velocity v(tf) < ε”) or are “video sequences” with frames (see next paragraph).]
b) determining to perform one of the following steps: [The instant claim recites either i) or ii) in the alternative. While only one step is required to meet the limitations of the claims, Zunino teaches training both models. Thus, both steps are mapped below. The Examiner notes that the instant claim language of “perform one of the following” is open-ended and does not have the close-ended meaning of “perform only one of the following and not the other” in the manner of a negative limitation. That is, the current claim language does not require a negative limitation of determining not to perform the other step, or a limitation where the other step is not actually performed. Furthermore, the act of “determining” is met because performing a process in a computing context implies determining to perform that process.] i) training the first MLM based on the training data, to discriminate between the predefined actions; [As shown in FIG. 4, the model parts labeled “Feature Representation” and “Intention Prediction” correspond to a first MLM. As shown in this figure, this part of the model predicts (discriminates between) various intention (predefined actions as defined in § 3, paragraph 1), i.e., pouring, passing, drinking, placing. This model is trained, as stated in § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Note that the function g represents the first model.] or ii) training a second MLM based on feature data that is extracted by the first MLM for the training data, to discriminate between the predefined subjects; [As shown in FIG. 4, the model parts labeled “Subject Confusion” correspond to a second MLM. 
As shown in this figure, this part of the model predicts (discriminates between) various subject identities. This model is trained, as stated in § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Note that function h represents the second model.] […]
wherein the training of the first MLM is performed to be adversarial to the discrimination between the predefined subjects by the second MLM, [§ 4.1, paragraph 5, text below equation (2): “This second loss function is minimized with respect to the weights Wf which defines the feature encoding f, being at the same time maximized at the classifier level—that is, the weights Ws. The whole idea is to deploy an adversarial game in which we want to train at our best an effective feature encoding f which is effective in predicting intentions, without suffering of the retrieved subjects’ related biases.” That is, Wf is trained in a manner adversarial to Ws with respect to the second loss function.]
wherein the training of the first MLM comprises: determining parameter values of the first MLM that minimizes a first loss function that represents a difference between action data generated by the first MLM and action reference data, which is predefined and associated with the training data, [§ 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” As shown in equation (1), the weights Wf and Wi of the first model are trained to minimize the loss function ℓi of equation (1), which corresponds to a “first loss function” of the instant claim and which defines a loss between the predefined actions y and the action data g(f(x|Wf), Wi) that is generated by the first model represented by the function g.] and that minimizes a second loss function that represents how much subject-related information is contained in the feature data. [§ 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s. This second loss function is minimized with respect to the weights Wf which defines the feature encoding f,… The whole idea is to deploy an adversarial game in which we want to train at our best an effective feature encoding f which is effective in predicting intentions, without suffering of the retrieved subjects’ related biases… We try to get rid of the subjects’ biases by achieving a random chance classifier for subject identities:” That is, ℓs corresponds to a “second loss function” of the instant claim.
This loss function represents how much subject-related information is contained in the feature data because the minimization of it minimizes subject-related bias, which corresponds to “subject-related information.” This is also described in § 4.1, paragraph 5: “We look for a feature representation f(x|Wf ), depending on some parameters Wf , which is trained to be intention-discriminative and subject-invariant,” where the invariance refers to removal of subject-related biases.]
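For illustration only, the adversarial objective mapped above can be sketched numerically. The following is a hedged toy with hypothetical scalar stand-ins for the weights Wf, Wi, Ws, and squared errors standing in for the losses ℓi and ℓs of equations (1) and (2); it is not the implementation of any cited reference.

```python
# Toy scalar sketch of the adversarial objective discussed above.
# w_f is a shared feature weight; w_i and w_s are the intention and
# subject heads. Squared errors stand in for the losses l_i and l_s.

def l_i(w_f, w_i, x, y):
    # "first loss function": difference between generated action data
    # and the action reference label y
    return (w_i * (w_f * x) - y) ** 2

def l_s(w_f, w_s, x, s):
    # "second loss function": how much subject information survives in
    # the feature w_f * x (how well w_s can recover the subject label s)
    return (w_s * (w_f * x) - s) ** 2

def update_direction_wf(w_f, w_i, w_s, x, y, s, lam=0.5):
    # Adversarial game: descend l_i in w_f while ascending l_s in w_f;
    # the sign flip on the l_s term is the adversarial part
    d_li = 2 * (w_i * w_f * x - y) * w_i * x
    d_ls = 2 * (w_s * w_f * x - s) * w_s * x
    return d_li - lam * d_ls
```

The minus sign in the last line corresponds to training Wf to be intention-discriminative while degrading the subject classifier, mirroring the “adversarial game” language quoted from Zunino.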
Zunino does not explicitly teach “c) repeating b) at least one, wherein the other step between i) and ii) is performed.”
Ganin teaches the above limitation. In general, Ganin teaches the general technique of domain-adversarial machine learning, and the disclosure of Zunino is an extension of Ganin to multiple domains (see Zunino, § 4.2, paragraph 4: “We accommodate the publicly available code3 of Ganin et al. (2016) to deal with a different number of subjects to perform adaptation. Indeed, Ganin et al. (2016) considers a simplified setting of one target domain only, whereas, differently, we consider multiple domains.”). Therefore, the techniques of Ganin are generally relevant to those of Zunino.
In particular, Ganin teaches “c) repeating b) at least one, wherein the other step between i) and ii) is performed.” [In general, FIG. 1 (page 12) teaches an analogous first model in the form of feature extractor Gf and label predictor Gy, and an analogous second model in the form of domain classifier Gd, whose parameters θf, θy, θd are trained in accordance with the update process shown in equations 13-15, which itself is an extension of Algorithm 1, on page 10. Page 10, Algorithm 1, teaches the iterative (repeating) process in lines 5-6 (“while stopping criterion is not met, do” and “for i from 1 to n do”). In each iteration, both models are trained, as shown in lines 33-36 of Algorithm 1, where the weights and biases of the two models are updated sequentially. Therefore, in any arbitrary repetition, “the other step” (i.e., the process training another one of the two models) is performed.]
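For illustration only, the loop structure relied upon above can be sketched as follows. This is a hedged toy with hypothetical scalar parameters and squared-error losses; only the structure of the repetition (nested loops, with the two models’ parameters updated on separate, sequential lines) mirrors Ganin, Algorithm 1, not its actual update equations.

```python
def toy_alternating_training(data, lr=0.1, lam=0.5, epochs=200):
    """Toy scalar stand-ins: w1 for the first MLM's parameters, w2 for
    the second MLM's. Each pass through the loop body repeats the
    training steps; within the body, w1 and w2 are updated on separate
    lines, so each individual step trains only one model."""
    w1, w2 = 0.0, 0.0
    for _ in range(epochs):            # "while stopping criterion is not met"
        for x, y, d in data:           # "for i from 1 to n do"
            # separate step: update the first model only (adversarial in lam)
            g1 = 2 * (w1 * x - y) * x - lam * 2 * (w2 * w1 * x - d) * w2 * x
            w1 = w1 - lr * g1
            # separate step: update the second model only
            g2 = 2 * (w2 * w1 * x - d) * (w1 * x)
            w2 = w2 - lr * g2
    return w1, w2
```

In any arbitrary repetition of the loop body, the step that trains “the other” model is performed, which is the structural point relied upon in the mapping above.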
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the training process of Ganin in the process of Zunino, so as to arrive at the claimed invention, including “c) repeating b) at least one, wherein the other step between i) and ii) is performed.” One of ordinary skill in the art would have been motivated to do so because incorporating the training process of Ganin enables implementation of the method of Zunino. Specifically, Ganin teaches specific implementational details that enable and implement the model that is described in Zunino, since Zunino’s method is an extension of Ganin (see parts of Zunino, § 4.2, paragraph 4 cited above).
As to claim 2, the combination of Zunino and Ganin teaches the method of claim 1, as set forth above.
Zunino alone does not explicitly teach the further limitations of “wherein all parameter values of the second MLM are fixed during the training of the first MLM.” [The Examiner notes that § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).” This description is consistent with the instant limitations at issue, but does not explicitly teach the detail of whether parameter values are fixed.]
However, Ganin further teaches “wherein all parameter values of the second MLM are fixed during the training of the first MLM” [FIG. 1 (page 12) teaches an analogous first model in the form of feature extractor Gf and label predictor Gy, and an analogous second model in the form of domain classifier Gd, whose parameters θf, θy, θd are trained in accordance with the update process shown in equations 13-15, which itself is an extension of Algorithm 1, on page 10. As shown in equations 13-15, the update (training) of the parameters θf, θy of the first model does not affect the parameters θd of the second model, which are therefore “fixed.” Note that in Algorithm 1, the parameters of both models are, in general, fixed in each training iteration over the course of the training process in lines 8-31.]
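For illustration only, the “fixed” relationship mapped above can be sketched as follows. This is a hedged toy using hypothetical parameter names (theta_f, theta_y, theta_d, after Ganin, FIG. 1); only the pattern that each update step leaves the other model’s parameters untouched mirrors equations 13-15.

```python
def update_first_model(params, grad_f, grad_y, mu=0.1):
    # Only theta_f and theta_y change; theta_d is untouched, i.e. "fixed."
    out = dict(params)
    out["theta_f"] -= mu * grad_f
    out["theta_y"] -= mu * grad_y
    return out

def update_second_model(params, grad_d, mu=0.1):
    # Conversely, only theta_d changes; theta_f and theta_y are "fixed."
    out = dict(params)
    out["theta_d"] -= mu * grad_d
    return out
```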
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 3, the combination of Zunino and Ganin teaches the method of claim 2, as set forth above.
Ganin further teaches “wherein all parameter values of the first MLM are fixed during the training of the second MLM.” [In equations 13-15, the update (training) of the parameters θd of the second model does not affect the parameters θf, θy of the first model, which are therefore “fixed.” Note that in Algorithm 1, the parameters of both models are, in general, fixed in each training iteration over the course of the training process in lines 8-31.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 4, the combination of Zunino and Ganin teaches the method of claim 2, wherein the second loss function represents a difference between subject identity data generated by the second MLM and target data, which is predefined and associated with the training data. [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two.]
As to claim 5, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the second MLM comprises determining parameter values of the second MLM that minimizes a third loss function that represents a difference between the subject identity data generated by the second MLM and further target data, which is predefined and associated with the training data. [Zunino, § 4.2, paragraph 4: “…Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ).” As shown in FIG. 4, the subject loss function ℓs (i.e., second loss function, which “represents a difference between the subject identity data generated by the second MLM and further target data”) is transformed into a negative in the operation of computing the gradient reversal operation denoted by -λ∂ℓs/∂Wf. Here, the factor -ℓs in this term corresponds to a third loss function that is being minimized during the model training (note that the maximization of ℓs in equation (2) also refers to the minimization of -ℓs with the λ coefficient).]
As to claim 6, the combination of Zunino and Ganin teaches the method of claim 5, wherein the second loss function is a negation of the third loss function. [As discussed in the rejection of the parent dependent claim, in Zunino, ℓs and -ℓs correspond to the second and third loss functions. Mathematically, they are negations of one another.]
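For illustration only, the relationship between the gradient reversal layer and the negated (third) loss function relied upon in the two rejections above can be checked numerically. This is a hedged toy with hypothetical scalar values and a squared-error stand-in for ℓs; it is not the cited implementations.

```python
def l_s(w_f, w_s=2.0, x=1.5, s=0.5):
    # squared-error stand-in for the subject loss l_s
    return (w_s * (w_f * x) - s) ** 2

def num_grad(f, w, eps=1e-6):
    # central-difference numeric gradient with respect to w
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def reversed_grad(w_f, lam=0.25):
    # what the gradient reversal layer backpropagates into w_f:
    # the gradient of l_s scaled by -lambda
    return -lam * num_grad(l_s, w_f)

def third_loss_grad(w_f, lam=0.25):
    # direct gradient of the "third loss" -lambda * l_s (the negation)
    return num_grad(lambda w: -lam * l_s(w), w_f)
```

The two quantities coincide, which is the sense in which maximizing ℓs with respect to Wf is the same as minimizing its negation.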
As to claim 7, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the first MLM results in a probability distribution over the predefined subjects, and wherein the target data comprises a reference probability distribution that represents fractional occurrences of the predefined subjects in the training data, and wherein the second loss function operates on the probability distribution and the reference probability distribution. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
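As a hedged numeric aside on the cross-entropy reasoning above (toy three-way distribution, not the 17- or 16-way loss of the cited reference): a cross-entropy loss operates on a pair of probability distributions and shrinks as the predicted distribution approaches the reference one.

```python
import math

def cross_entropy(reference, predicted):
    # H(reference, predicted): defined over two probability
    # distributions; terms with zero reference mass contribute nothing
    return -sum(r * math.log(p) for r, p in zip(reference, predicted) if r > 0)
```

For example, with a one-hot reference over three subjects, a prediction concentrated on the correct subject yields a smaller loss than a uniform prediction.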
As to claim 8, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the first MLM, for a time sequence associated with a predefined action, results in a probability distribution over the predefined subjects, wherein the target data comprises a reference probability distribution that represents fractional occurrences of the predefined subjects in the training data for each predefined action, wherein the second loss function operates on a difference between the probability distribution and a corresponding reference probability distribution, wherein the corresponding reference probability distribution is associated with the predefined action. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
As to claim 9, the combination of Zunino and Ganin teaches the method of claim 8, wherein the second loss function aggregates, for the time sequences, differences between the probability distribution generated for each time sequence and the corresponding reference probability distribution. [In Zunino, equation (2), the subscript [x, s, y]∈D indicates that the loss is aggregated across each time sequence x and corresponding subject labels s (hence, also the probability distribution given the context of cross-entropy loss) in the dataset D.]
As to claim 10, the combination of Zunino and Ganin teaches the method of claim 4, wherein the subject identity data comprises a second probability value for at least one of the predefined subjects, and wherein the second loss function operates on the second probability value. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
As to claim 11, the combination of Zunino and Ganin teaches the method of claim 4, further comprising: […]
training the first MLM on at least part of the training data, to discriminate between the predefined actions, […] [Zunino, § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Zunino, § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).”]
training the second MLM based on feature data extracted by the first MLM for said at least part of the training data, to discriminate between the predefined subjects; [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Note that function h represents the second model.] and
evaluating the subject identity data and/or the action data generated by the training of the first MLM and the second MLM. [This limitation is met by the definitions of the loss functions in Zunino as discussed above. That is, in equations (1) and (2) of Zunino, g and h correspond to the action data and the subject identity data, which are evaluated in the loss function. See Zunino, § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.”]
Ganin further teaches “obtaining deployment data comprising additional time sequences of data samples, wherein the additional time sequences represent additional predefined subjects performing non-categorized actions, and wherein the additional predefined subjects are included among the predefined subjects;” [Abstract: “The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary).” That is, unlabeled data correspond to non-categorized actions in the context of the base reference Zunino. § 3, paragraph 1: “Moreover, we have two different distributions over X × Y, called the source domain DS and the target domain DT. An unsupervised domain adaptation learning algorithm is then provided with a labeled source sample S drawn i.i.d. from DS, and an unlabeled target sample T drawn i.i.d. from DXT.” Note that the concept of “target domain” is analogous to additional predefined subjects when applied to the context of Zunino. The unlabeled data is regarded as “deployment data” because the model is being used (deployed) for domain adaptation. See also § 5.1.1, paragraph 1 for additional examples.] “including the deployment data in the training data;” [As shown in Algorithm 1 on page 10, S and T are included in the training process, thus collectively constituting training data.] “while excluding from the first loss function the action data that is generated by the first MLM for the additional time sequences.” [See § 4.1, text above and below equation (5): “Training the neural network then leads to the following optimization problem on the source domain…” As shown, the loss function Ly, which is analogous to the “first loss function” of Zunino and the instant claim, is based on the samples of index n, which corresponds to the samples S. That is, the samples T, which have index n’, are excluded from the loss function Ly.
See also Algorithm 1, which teaches that the samples T with index n’ are handled separately in lines 24-31, after the samples S are used in lines 7-9 and 11-14, where line 11 corresponds to the use of the loss function based on samples S.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the techniques of Ganin discussed above, including the use of unlabeled samples and the associated training process, so as to arrive at the limitations of the instant dependent claim. One of ordinary skill in the art would have been motivated to do so in order to “learn a mapping between domains in the situation when the target domain data are either fully unlabeled (unsupervised domain annotation) or have few labeled samples (semi-supervised domain adaptation)” (Ganin, § 1, paragraph 2).
As to claim 12, the combination of Zunino and Ganin teaches the method of claim 11, as set forth above.
Ganin further teaches “wherein said evaluating comprises:
determining, based on the subject identity data generated by the second MLM, at least one selected subject among the additional predefined subjects;” [Algorithm 1, line 26, where Gd is the determination of the domain (subject) from xj, which is one among the n’-index samples T represented by line 24. See also equation (7) and the text above it.] and
“indicating at least one of the additional time sequences that is performed by said at least one selected subject as a candidate to be categorized by action.” [This limitation is met by the determined association between xj (analogous to a time sequence) and Gd (analogous to the selected subject), whereby xj is categorized as Gd based on the features of xj (the action).]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 16, the combination of Zunino and Ganin teaches the method of claim 1, wherein the first MLM comprises a [… (processing layer)] and an action classification layer, which is directly or indirectly connected to the [… (processing layer)], wherein said feature data represents output data of at least one of the processing layers. [§ 4.2, paragraph 4: “A multi-layer perceptron (MLP) network with one hidden layer of dimension 200 was designed as the shared feature representation f(x|Wf ). For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi.” That is, the hidden layer constitutes a processing layer, while the softmax constitutes an action classification layer.]
Ganin further teaches a “sequence of processing layers” [In general, Ganin teaches the general technique of domain-adversarial machine learning, of which Zunino is an extension. In particular, a sequence of processing layers is shown in FIG. 1. See FIG. 1 caption: “The proposed architecture includes a deep feature extractor (green).” Specifically, § 5.2.2 teaches: “In general, we compose feature extractor from two or three convolutional layers, picking their exact configurations from previous works. More precisely, four different architectures were used in our experiments. The first three are shown in Figure 4. For the Office domains, we use pre-trained AlexNet from the Caffe-package (Jia et al., 2014).”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino with the teachings of Ganin by implementing the first model to include a sequence of processing layers, including a plurality of convolutional layers, as taught in Ganin, so as to arrive at the limitations of the instant claim. The motivation for doing so would have been to implement a neural network architecture, namely a convolutional neural network, that is known in the art as being suitable for performing image classification (see Ganin § 1, second-to-last paragraph: “We further evaluate the approach extensively for an image classification task, and present results on traditional deep learning image data sets”).
As to claim 17, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above.
Ganin further teaches “wherein one or more of the processing layers is a convolutional layer.” [§ 5.2.2 teaches: “In general, we compose feature extractor from two or three convolutional layers, picking their exact configurations from previous works. More precisely, four different architectures were used in our experiments. The first three are shown in Figure 4. For the Office domains, we use pre-trained AlexNet from the Caffe-package (Jia et al., 2014).”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for combining the teachings of the references as set forth in the rejection of the parent independent claim also covers the motivation for doing so to arrive at the instant dependent claim.
As to claim 19, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above.
Ganin further teaches “wherein the second MLM is trained based on the output data of two or more processing layers in the sequence of processing layers.” [The entire model is trained based on all layers of the model, since the losses Ld and Ly in Ganin (analogous to the losses ℓs and ℓi in Zunino) are based on the features extracted by the feature extractor. In other words, this limitation flows as a consequence of the feature extractor being a sequence of processing layers. Furthermore, the limitation of “two or more” is disclosed by the fact that Ganin teaches multiple convolutional layers, as discussed above.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for combining the teachings of the references as set forth in the rejection of the parent independent claim also covers the motivation for doing so to arrive at the instant dependent claim.
2. Claims 13-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zunino in view of Ganin, and further in view of Pei et al., “Multi-Adversarial Domain Adaptation,” The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018 (“Pei”).
As to claim 13, the combination of Zunino and Ganin teaches the method of claim 1, further comprising: […]
training the first MLM based on at least part of the training data, to discriminate between the predefined actions, […]; [Zunino, § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Zunino, § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).”]
training a […] MLM based on feature data extracted by the first MLM for said at least part of the training data, to determine if the feature data originates from the deployment data; [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.”] and
evaluating output data generated by the […] MLM during the training of the first MLM and/or the […] MLM. [This limitation is met by the definitions of the loss functions in Zunino as discussed above. That is, in equations (1) and (2) of Zunino, g and h correspond to the action data and the subject identity data, which are evaluated in the loss functions. See Zunino, § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.”].
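The training procedure quoted from Zunino (separate parameter updates on the two branches, with the subject-loss gradient sign-flipped and re-scaled by λ before reaching the shared extractor Wf) can be sketched in simplified form; the linear branches, squared losses, and dimensions below are illustrative assumptions standing in for the actual networks and cross-entropy losses:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)          # one input sample
y_i, y_s = 1.0, 0.0                 # intention label, subject label (scalars for simplicity)

Wf = rng.standard_normal((3, 4))    # shared feature extractor
Wi = rng.standard_normal(3)         # intention-prediction branch
Ws = rng.standard_normal(3)         # subject-confusion branch
lam, lr = 0.5, 0.1                  # reversal scale and learning rate

f = Wf @ x                          # shared features
err_i = Wi @ f - y_i                # squared-loss residuals on each branch
err_s = Ws @ f - y_s

# Branch parameters are updated separately (separate gradient steps).
grad_Wi = err_i * f
grad_Ws = err_s * f

# Back-propagation to the shared extractor: the intention gradient keeps its
# sign, while the subject gradient is sign-flipped (gradient reversal) and
# re-scaled by lam, pushing Wf toward subject-confusing features.
grad_Wf = np.outer(err_i * Wi, x) - lam * np.outer(err_s * Ws, x)

Wi -= lr * grad_Wi
Ws -= lr * grad_Ws
Wf -= lr * grad_Wf
```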
Ganin further teaches “obtaining deployment data comprising additional time sequences of data samples, wherein the additional time sequences represent additional predefined subjects performing non-categorized actions;” [Abstract: “The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary).” That is, unlabeled data correspond to non-categorized actions in the context of the base reference Zunino. § 3, paragraph 1: “Moreover, we have two different distributions over X × Y, called the source domain DS and the target domain DT. An unsupervised domain adaptation learning algorithm is then provided with a labeled source sample S drawn i.i.d. from DS, and an unlabeled target sample T drawn i.i.d. from DXT.” Note that the concept of “target domain” is analogous to additional predefined subjects when applied to the context of Zunino. The unlabeled data is regarded as “deployment data” because the model is being used (deployed) for domain adaptation. See also § 5.1.1, paragraph 1 for additional examples.] “including the deployment data in the training data;” [As shown in Algorithm 1 on page 10, S and T are included in the training process, thus collectively constituting training data.] “while excluding from the first loss function the action data that is generated by the first MLM for the additional time sequences.” [See § 4.1, text above and below equation (5): “Training the neural network then leads to the following optimization problem on the source domain…” As shown, the loss function Ly, which is analogous to the “first loss function” of Zunino and the instant claim, is based on the samples of index n, which corresponds to the samples S. That is, the samples T, which have index n’, are excluded from the loss function Ly.
See also Algorithm 1, which teaches that the samples T with index n’ are handled separately in lines 24-31, after the samples S are used in lines 7-9 and 11-14, where line 11 corresponds to the use of the loss function based on samples S.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the techniques of Zunino discussed above, including the use of unlabeled samples and the associated training process, so as to arrive at the above limitations of the instant dependent claim. One of ordinary skill in the art would have been motivated to do so in order to “learn a mapping between domains in the situation when the target domain data are either fully unlabeled (unsupervised domain annotation) or have few labeled samples (semi-supervised domain adaptation)” (Ganin, § 1, paragraph 2).
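The mechanism mapped above — the label loss computed over the labeled source samples only, while every sample (source and target) contributes to the domain loss — can be sketched as follows; the array shapes, class counts, and function name are illustrative assumptions:

```python
import numpy as np

def losses(logits_y, logits_d, labels_y, labels_d, is_source):
    """Label loss over source samples only; domain loss over all samples."""
    def xent(logits, labels):
        # Numerically stable softmax cross-entropy, per sample.
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[np.arange(len(labels)), labels])

    per_sample_y = xent(logits_y, labels_y)
    loss_y = per_sample_y[is_source].mean()   # unlabeled target samples excluded
    loss_d = xent(logits_d, labels_d).mean()  # every sample has a domain label
    return loss_y, loss_d

rng = np.random.default_rng(2)
n = 6
is_source = np.array([True, True, True, False, False, False])
logits_y = rng.standard_normal((n, 4))                       # 4 action classes
labels_y = np.where(is_source, rng.integers(0, 4, n), 0)     # dummy labels for targets
logits_d = rng.standard_normal((n, 2))                       # source vs. target
labels_d = (~is_source).astype(int)
loss_y, loss_d = losses(logits_y, logits_d, labels_y, labels_d, is_source)
```

Because the target rows are masked out, their (dummy) action labels have no effect on the label loss, mirroring the exclusion of the unlabeled samples T from Ly.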
The combination of references thus far does not teach a “third” MLM.
Pei, which also pertains to the general problem of domain-adversarial machine learning (see title), teaches “training a third MLM based on the feature data extracted by the first MLM for said at least part of the training data, to determine if the feature data originates from the deployment data” [FIG. 2, showing different domain (analogous to subject in the context of Zunino) classification networks, as described on page 3937, left column, middle: “K domain discriminators Gdk, k = 1, … K.” That is, a second domain discriminator (e.g., at k = 2) corresponds to a third MLM. Furthermore, the discriminators, including the third model, are trained in accordance with the loss function Ldk of equation (3) on page 3937, left column, which is based on the feature data extracted by the first MLM, i.e., Gf(xi). The limitation of “to determine if the feature data originates from the deployment data” is met because the deployment data is used in the training process, and outputs based thereon are determined as originating from it.] and “evaluating output data generated by the third MLM during the training of the first MLM and/or the third MLM.” [This limitation is disclosed by equation (3), where Gdk represents the output data generated by the third MLM.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino and Ganin with the teachings of Pei by implementing the subject classifier in Zunino as a plurality of subject classification networks and training the plurality of subject classification networks, in accordance with the techniques taught by Pei, so as to arrive at the limitations of the instant dependent claim. The motivation for doing so would have been to implement a model architecture that captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators, as suggested by Pei (abstract: “we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators”).
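Pei’s construction of the inputs to the K domain discriminators — each discriminator receiving the extracted features weighted by the predicted probability of its class, ŷik Gf(xi) — can be sketched as follows; the dimensions and function name are illustrative assumptions:

```python
import numpy as np

def mada_discriminator_inputs(features, class_probs):
    """Probability-weighted inputs for K discriminators: yhat_i^k * G_f(x_i).

    features: (n, d) extracted features; class_probs: (n, K) softmax outputs.
    Returns an array of shape (K, n, d), one weighted copy per discriminator.
    """
    return class_probs.T[:, :, None] * features[None, :, :]

rng = np.random.default_rng(3)
n, d, K = 5, 8, 3
features = rng.standard_normal((n, d))
logits = rng.standard_normal((n, K))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
weighted = mada_discriminator_inputs(features, probs)
```

Since the class probabilities sum to one for each sample, summing the K weighted copies recovers the original features; each discriminator effectively sees only the portion of each sample attributable to its class.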
As to claim 14, the combination of Zunino, Ganin, and Pei teaches the method of claim 13, as set forth above.
Pei further teaches “wherein the training of the first MLM is performed to be adversarial to the determination by the third MLM.” [FIG. 2 caption: “The blue part shows the multiple adversarial networks (each for a class, K in total) crafted in this paper.” Note that in this context, “adversarial” has the same meaning as that in Zunino, which is that the multiple adversarial networks are adversarial to the data label predictor, analogous to the first MLM.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino and Ganin with the teachings of Pei, so as to also arrive at the limitations of the instant dependent claim. Since the teachings of Pei discussed for the instant claim are part of the techniques already discussed in the rejection of the parent dependent claim, the motivation for combining the teachings of the references as set forth in the rejection of the parent dependent claim also covers the limitations of the instant dependent claim.
As to claim 15, the combination of Zunino, Ganin, and Pei teaches the method of claim 13, as set forth above.
Pei further teaches “wherein said evaluating comprises: determining, based on the output data generated by the third MLM, at least one of the additional time sequences; and indicating the at least one of the additional time sequences as a candidate to be categorized by action.” [As shown in FIG. 2, each domain discriminator determines an output Gd, and determination of this output also determines and categorizes the corresponding time sequence x that was used to generate the output.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino and Ganin with the teachings of Pei, so as to also arrive at the limitations of the instant dependent claim. Since the teachings of Pei discussed for the instant claim are part of the techniques already discussed in the rejection of the parent dependent claim, the motivation for combining the teachings of the references as set forth in the rejection of the parent dependent claim also covers the limitations of the instant dependent claim.
As to claim 20, the combination of Zunino and Ganin teaches the method of claim 1, as set forth above, but does not teach the further limitations of the instant dependent claim.
Pei, which also pertains to the general problem of domain-adversarial machine learning (see title), teaches “wherein the second MLM comprises a plurality of subject classification networks which are operable in parallel, and wherein the subject classification networks differ by one or more of initialization values, network structure, or input data.” [FIG. 2, showing different domain (analogous to subject in the context of Zunino) classification networks, as described on page 3937, left column, middle: “K domain discriminators Gdk, k = 1, … K.” As shown in the figure, the different domain discriminators receive different input data ŷik Gf(xi) and are also trained to have different weights (i.e., different neural network initialization values and structures, noting that weights are part of the structure of a neural network) as described on page 3937, left column, bottom: “The multiple domain discriminators are trained with probability-weighted data points ŷik Gf(xi), which naturally learn multiple domain discriminators with different parameters θkd; discriminators with different parameters promote positive transfer for each instance.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino and Ganin with the teachings of Pei by implementing the subject classifier in Zunino as a plurality of subject classification networks in accordance with the techniques taught by Pei, so as to arrive at the limitations of the instant dependent claim. The motivation for doing so would have been to implement a model architecture that captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators, as suggested by Pei (abstract: “we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators”).
3. Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Zunino in view of Ganin, and further in view of Nguyen et al., “Weakly Supervised Action Localization by Sparse Temporal Pooling Network,” CVPR 2018, pp. 6752-6761 (“Nguyen”).
As to claim 18, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above, but does not teach the further limitations of the instant dependent claim.
Nguyen teaches “further comprising time-averaging the output data of said at least one of the processing layers, wherein the second MLM is trained based on the time-averaged output data.” [§ 3.1, paragraph 2: “Formally, let xt ∈ Rm be the m dimensional feature representation extracted from a video segment centered at time t, and λt be the corresponding attention weight. The video level representation, denoted by x̄, corresponds to an attention weighted temporal average pooling, which is given by [see equation in text].” § 3.1, paragraph 2: “The loss function in the proposed network is composed of two terms, the classification loss and the sparsity loss, which is given by… Lsparsity is the sparsity loss on the attention weights.” See also FIG. 2. Note that training is performed based on this model architecture, as disclosed in § 4.2, paragraph 2: “We sample 400 segments at uniform interval from each video in both training and testing.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Nguyen by implementing the use of weighted temporal average pooling and an attention module, so as to arrive at the limitations of the instant dependent claim. The motivation would have been to implement a model architecture that enables prediction of temporal intervals of human actions with no requirement of temporal localization annotations, as suggested by Nguyen (see abstract: “Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations.”).
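The attention-weighted temporal average pooling quoted above, together with an L1 sparsity term on the attention weights, can be sketched as follows; the segment count, feature dimension, and function names are illustrative assumptions, not Nguyen’s exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_pool(x, attn_logits):
    """Attention-weighted temporal average pooling over T segment features.

    x: (T, m) per-segment features; attn_logits: (T,) raw attention scores.
    Returns the pooled video-level feature and an L1 sparsity penalty.
    """
    lam = sigmoid(attn_logits)             # per-segment attention weights in (0, 1)
    x_bar = (lam[:, None] * x).mean(axis=0)
    sparsity_loss = np.abs(lam).sum()      # L1 sparsity term on the weights
    return x_bar, sparsity_loss

rng = np.random.default_rng(4)
T, m = 10, 6
segments = rng.standard_normal((T, m))     # one feature vector per video segment
logits = rng.standard_normal(T)
x_bar, sparsity = temporal_pool(segments, logits)
```

When every attention weight saturates at one, the pooled feature reduces to a plain temporal average, which is the “time-averaging” sense relied on in the rejection.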
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references depict the state of the art.
Crawshaw, “Multi-task learning with deep neural networks: a survey,” arXiv:2009.09796v1 [cs.LG] 10 Sep 2020 (Crawshaw) teaches various multi-head/multi-task model architectures and task scheduling (§ 3.3) which is applicable to determining which head/task is being selected for training.
Smolyanskiy et al. (US 2021/0150230 A1) teaches that “multiple heads may be co-trained together with a common trunk, or may be trained separately” (see [0126]).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Y.D.H./Examiner, Art Unit 2124
/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124