Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This Office Action is in response to applicant’s amendment filed on March 18, 2026. Claims 1-20 are pending and under consideration.
Response to Arguments
Applicant’s amendments have overcome the previous claim rejections under § 112. Therefore, the previous § 112 rejections have been withdrawn.
Applicant’s amendments to claim 1 have overcome the previous § 102 rejection of that claim over Zunino. However, claim 1 is now rejected under § 103 over Zunino in view of Ganin, which was previously applied to various dependent claims. Specifically, Ganin is relied upon to teach the limitation of repeating the training processes.
Applicant’s arguments directed to the prior art rejections are moot because they do not specifically address Ganin, which is now cited for the new limitations.
However, with regard to applicant’s general observation that “utilization of a gradient reversal layer removes the need to decide whether to train the first or second MLM, as both are trained at the same time” (applicant’s response, page 8), the Examiner notes that the current claim language does not include a negative limitation of not using a gradient reversal layer, nor does it distinguish training both models at the same time. As discussed in the updated rejections below, the phrase “perform one of the following” is open-ended and does not have the close-ended meaning of “perform only one of the following and not the other” in the manner of a negative limitation. Even if the claim did recite the latter, Ganin, Algorithm 1, nonetheless teaches that the weights/biases of the two models are updated in separate steps (separate lines in the algorithm). Thus, in each step, only one model is being “trained” and not the other. Please see the rejections below for further details. Therefore, if applicant intends to distinguish over the cited art on the basis of the concepts in FIG. 6B of this application, those concepts would need to be reflected with more particularity in the claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
1. Claims 1-12, 16-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zunino et al., “Predicting Intentions from Motion: The Subject-Adversarial Adaptation Approach,” International Journal of Computer Vision (2020) 128:220–239, published 18 September 2019 (“Zunino”) in view of Ganin et al., “Domain-Adversarial Training of Neural Networks,” arXiv:1505.07818v4 [stat.ML] 26 May 2016, Journal of Machine Learning Research 17 (2016) 1-35 (“Ganin”).
As to claim 1, Zunino teaches a method of training a first machine learning-based model (MLM) for action recognition, [Abstract: “This paper aims at investigating the action prediction problem from a pure kinematic perspective. Specifically, we address the problem of recognizing future actions, indeed human intentions, underlying a same initial (and apparently unrelated) motor act.” § 1, last paragraph: “we propose an original approach derived from the domain adaptation research which considers each subject as a domain and adopt a novel subject-adversarial training pipeline to generalize better among the subjects.”] said method comprising:
a) obtaining training data comprising time sequences of data samples, wherein the time sequences of data samples represent predefined subjects which are performing predefined actions; [§ 4.1, paragraph 4: “We consider each grasping a bottle movement in our dataset D as a triplet [x, s, y], where x is an arbitrary feature vector encoding it, s is the subject’s label, and y is the intention’s label (see Fig. 4).” That is, y is a labeled (predefined) action, and s is a labeled (predefined) subject performing that action. The dataset is specifically a set of time sequences. See § 3, paragraphs 1-2: “The dataset was designed as follows. Seventeen naïve volunteers were seated …and participants were asked to grasp it in order to perform one of the following 4 different intentions… After a training session, the final dataset is composed by 253 trials of pouring, 262 of passing, 300 of drinking and 283 of placing - 1098 in total. For each, both video and 3D data have been collected.” The videos are “time sequences” because they are 3D kinematic data with time markers (§ 3.1, left column, last full paragraph: “the acquisition of each trial is automatically ruled by a thresholding of the wrist velocity v(t) at time t, acquired by the corresponding marker. Being ε = 20 mm/s, at the first instant t0 when v(t0) > ε, the acquisition starts and it is stopped at time tf, when the wrist velocity v(tf) < ε”) or are “video sequences” with frames (see next paragraph).]
b) determining to perform one of the following steps: [The instant claim recites either i) or ii) in the alternative. While only one step is required to meet the limitations of the claims, Zunino teaches training both models. Thus, both steps are mapped below. The Examiner notes that the instant claim language of “perform one of the following” is open-ended and does not have the close-ended meaning of “perform only one of the following and not the other” in the manner of a negative limitation. That is, the current claim language does not require a negative limitation of determining not to perform the other step, or a limitation where the other step is not actually performed. Furthermore, the act of “determining” is met because performing a process in a computing context implies determining to perform that process.] i) training the first MLM based on the training data, to discriminate between the predefined actions; [As shown in FIG. 4, the model parts labeled “Feature Representation” and “Intention Prediction” correspond to a first MLM. As shown in this figure, this part of the model predicts (discriminates between) various intention (predefined actions as defined in § 3, paragraph 1), i.e., pouring, passing, drinking, placing. This model is trained, as stated in § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Note that the function g represents the first model.] or ii) training a second MLM based on feature data that is extracted by the first MLM for the training data, to discriminate between the predefined subjects; [As shown in FIG. 4, the model parts labeled “Subject Confusion” correspond to a second MLM. 
As shown in this figure, this part of the model predicts (discriminates between) various subject identities. This model is trained, as stated in § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Note that function h represents the second model.] […]
wherein the training of the first MLM is performed to be adversarial to the discrimination between the predefined subjects by the second MLM, [§ 4.1, paragraph 5, text below equation (2): “This second loss function is minimized with respect to the weights Wf which defines the feature encoding f, being at the same time maximized at the classifier level—that is, the weights Ws. The whole idea is to deploy an adversarial game in which we want to train at our best an effective feature encoding f which is effective in predicting intentions, without suffering of the retrieved subjects’ related biases.” That is, Wf is trained in a manner adversarial to Ws with respect to the second loss function.]
wherein the training of the first MLM comprises: determining parameter values of the first MLM that minimizes a first loss function that represents a difference between action data generated by the first MLM and action reference data, which is predefined and associated with the training data, [§ 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” As shown in equation (1), the weights Wf and Wi of the first model are trained to minimize the loss function ℓi of equation (1), which corresponds to a “first loss function” of the instant claim and which defines a loss between the predefined actions y and the action data g(f(x|Wf), Wi) that is generated by the first model represented by the function g.] and that minimizes a second loss function that represents how much subject-related information is contained in the feature data. [§ 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s. This second loss function is minimized with respect to the weights Wf which defines the feature encoding f,… The whole idea is to deploy an adversarial game in which we want to train at our best an effective feature encoding f which is effective in predicting intentions, without suffering of the retrieved subjects’ related biases… We try to get rid of the subjects’ biases by achieving a random chance classifier for subject identities:” That is, ℓs corresponds to a “second loss function” of the instant claim.
This loss function represents how much subject-related information is contained in the feature data because the minimization of it minimizes subject-related bias, which corresponds to “subject-related information.” This is also described in § 4.1, paragraph 5: “We look for a feature representation f(x|Wf ), depending on some parameters Wf , which is trained to be intention-discriminative and subject-invariant,” where the invariance refers to removal of subject-related biases.]
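For illustration only, the adversarial objective mapped above can be sketched numerically. The following is a hedged toy with hypothetical scalar stand-ins for the weights Wf, Wi, Ws, and squared errors standing in for the losses ℓi and ℓs of equations (1) and (2); it is not the implementation of any cited reference.

```python
# Toy scalar sketch of the adversarial objective discussed above.
# w_f is a shared feature weight; w_i and w_s are the intention and
# subject heads. Squared errors stand in for the losses l_i and l_s.

def l_i(w_f, w_i, x, y):
    # "first loss function": difference between generated action data
    # and the action reference label y
    return (w_i * (w_f * x) - y) ** 2

def l_s(w_f, w_s, x, s):
    # "second loss function": how much subject information survives in
    # the feature w_f * x (how well w_s can recover the subject label s)
    return (w_s * (w_f * x) - s) ** 2

def update_direction_wf(w_f, w_i, w_s, x, y, s, lam=0.5):
    # Adversarial game: descend l_i in w_f while ascending l_s in w_f;
    # the sign flip on the l_s term is the adversarial part
    d_li = 2 * (w_i * w_f * x - y) * w_i * x
    d_ls = 2 * (w_s * w_f * x - s) * w_s * x
    return d_li - lam * d_ls
```

The minus sign in the last line corresponds to training Wf to be intention-discriminative while degrading the subject classifier, mirroring the “adversarial game” language quoted from Zunino.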
Zunino does not explicitly teach “c) repeating b) at least one, wherein the other step between i) and ii) is performed.”
Ganin teaches the above limitation. In general, Ganin teaches the general technique of domain-adversarial machine learning, and the disclosure of Zunino is an extension of Ganin to multiple domains (see Zunino, § 4.2, paragraph 4: “We accommodate the publicly available code3 of Ganin et al. (2016) to deal with a different number of subjects to perform adaptation. Indeed, Ganin et al. (2016) considers a simplified setting of one target domain only, whereas, differently, we consider multiple domains.”). Therefore, the techniques of Ganin are generally relevant to those of Zunino.
In particular, Ganin teaches “c) repeating b) at least one, wherein the other step between i) and ii) is performed.” [In general, FIG. 1 (page 12) teaches an analogous first model in the form of feature extractor Gf and label predictor Gy, and an analogous second model in the form of domain classifier Gd, whose parameters θf, θy, θd are trained in accordance with the update process shown in equations 13-15, which itself is an extension of Algorithm 1, on page 10. Page 10, Algorithm 1, teaches the iterative (repeating) process in lines 5-6 (“while stopping criterion is not met, do” and “for i from 1 to n do”). In each iteration, both models are trained, as shown in lines 33-36 of Algorithm 1, where the weights and biases of the two models are updated sequentially. Therefore, in any arbitrary repetition, “the other step” (i.e., the process training another one of the two models) is performed.]
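For illustration only, the loop structure relied upon above can be sketched as follows. This is a hedged toy with hypothetical scalar parameters and squared-error losses; only the structure of the repetition (nested loops, with the two models’ parameters updated on separate, sequential lines) mirrors Ganin, Algorithm 1, not its actual update equations.

```python
def toy_alternating_training(data, lr=0.1, lam=0.5, epochs=200):
    """Toy scalar stand-ins: w1 for the first MLM's parameters, w2 for
    the second MLM's. Each pass through the loop body repeats the
    training steps; within the body, w1 and w2 are updated on separate
    lines, so each individual step trains only one model."""
    w1, w2 = 0.0, 0.0
    for _ in range(epochs):            # "while stopping criterion is not met"
        for x, y, d in data:           # "for i from 1 to n do"
            # separate step: update the first model only (adversarial in lam)
            g1 = 2 * (w1 * x - y) * x - lam * 2 * (w2 * w1 * x - d) * w2 * x
            w1 = w1 - lr * g1
            # separate step: update the second model only
            g2 = 2 * (w2 * w1 * x - d) * (w1 * x)
            w2 = w2 - lr * g2
    return w1, w2
```

In any arbitrary repetition of the loop body, the step that trains “the other” model is performed, which is the structural point relied upon in the mapping above.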
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the training process of Ganin in the process of Zunino, so as to arrive at the claimed invention, including “c) repeating b) at least one, wherein the other step between i) and ii) is performed.” One of ordinary skill in the art would have been motivated to do so because incorporating the training process of Ganin enables implementation of the method of Zunino. Specifically, Ganin teaches specific implementational details that enable and implement the model that is described in Zunino, since Zunino’s method is an extension of Ganin (see parts of Zunino, § 4.2, paragraph 4 cited above).
As to claim 2, the combination of Zunino and Ganin teaches the method of claim 1, as set forth above.
Zunino alone does not explicitly teach the further limitations of “wherein all parameter values of the second MLM are fixed during the training of the first MLM.” [The Examiner notes that § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).” This description is consistent with the instant limitations at issue, but does not explicitly teach the detail of whether parameter values are fixed.]
However, Ganin further teaches “wherein all parameter values of the second MLM are fixed during the training of the first MLM” [FIG. 1 (page 12) teaches an analogous first model in the form of feature extractor Gf and label predictor Gy, and an analogous second model in the form of domain classifier Gd, whose parameters θf, θy, θd are trained in accordance with the update process shown in equations 13-15, which itself is an extension of Algorithm 1, on page 10. As shown in equations 13-15, the update (training) of the parameters θf, θy of the first model does not affect the parameters θd of the second model, which are therefore “fixed.” Note that in Algorithm 1, the parameters of both models are, in general, fixed in each training iteration over the course of the training process in lines 8-31.]
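For illustration only, the “fixed” relationship mapped above can be sketched as follows. This is a hedged toy using hypothetical parameter names (theta_f, theta_y, theta_d, after Ganin, FIG. 1); only the pattern that each update step leaves the other model’s parameters untouched mirrors equations 13-15.

```python
def update_first_model(params, grad_f, grad_y, mu=0.1):
    # Only theta_f and theta_y change; theta_d is untouched, i.e. "fixed."
    out = dict(params)
    out["theta_f"] -= mu * grad_f
    out["theta_y"] -= mu * grad_y
    return out

def update_second_model(params, grad_d, mu=0.1):
    # Conversely, only theta_d changes; theta_f and theta_y are "fixed."
    out = dict(params)
    out["theta_d"] -= mu * grad_d
    return out
```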
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 3, the combination of Zunino and Ganin teaches the method of claim 2, as set forth above.
Ganin further teaches “wherein all parameter values of the first MLM are fixed during the training of the second MLM.” [In equations 13-15, the update (training) of the parameters θd of the second model does not affect the parameters θf, θy of the first model, which are therefore “fixed.” Note that in Algorithm 1, the parameters of both models are, in general, fixed in each training iteration over the course of the training process in lines 8-31.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 4, the combination of Zunino and Ganin teaches the method of claim 2, wherein the second loss function represents a difference between subject identity data generated by the second MLM and target data, which is predefined and associated with the training data. [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two.]
As to claim 5, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the second MLM comprises determining parameter values of the second MLM that minimizes a third loss function that represents a difference between the subject identity data generated by the second MLM and further target data, which is predefined and associated with the training data. [Zunino, § 4.2, paragraph 4: “…Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ).” As shown in FIG. 4, the subject loss function ℓs (i.e., second loss function, which “represents a difference between the subject identity data generated by the second MLM and further target data”) is transformed into a negative in the operation of computing the gradient reversal operation denoted by -λ∂ℓs/∂Wf. Here, the factor -ℓs in this term corresponds to a third loss function that is being minimized during the model training (note that the maximization of ℓs in equation (2) also refers to the minimization of -ℓs with the λ coefficient).]
As to claim 6, the combination of Zunino and Ganin teaches the method of claim 5, wherein the second loss function is a negation of the third loss function. [As discussed in the rejection of the parent dependent claim, in Zunino, ℓs and -ℓs correspond to the second and third loss functions. Mathematically, they are negations of one another.]
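For illustration only, the relationship between the gradient reversal layer and the negated (third) loss function relied upon in the two rejections above can be checked numerically. This is a hedged toy with hypothetical scalar values and a squared-error stand-in for ℓs; it is not the cited implementations.

```python
def l_s(w_f, w_s=2.0, x=1.5, s=0.5):
    # squared-error stand-in for the subject loss l_s
    return (w_s * (w_f * x) - s) ** 2

def num_grad(f, w, eps=1e-6):
    # central-difference numeric gradient with respect to w
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def reversed_grad(w_f, lam=0.25):
    # what the gradient reversal layer backpropagates into w_f:
    # the gradient of l_s scaled by -lambda
    return -lam * num_grad(l_s, w_f)

def third_loss_grad(w_f, lam=0.25):
    # direct gradient of the "third loss" -lambda * l_s (the negation)
    return num_grad(lambda w: -lam * l_s(w), w_f)
```

The two quantities coincide, which is the sense in which maximizing ℓs with respect to Wf is the same as minimizing its negation.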
As to claim 7, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the first MLM results in a probability distribution over the predefined subjects, and wherein the target data comprises a reference probability distribution that represents fractional occurrences of the predefined subjects in the training data, and wherein the second loss function operates on the probability distribution and the reference probability distribution. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
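As a hedged numeric aside on the cross-entropy reasoning above (toy three-way distribution, not the 17- or 16-way loss of the cited reference): a cross-entropy loss operates on a pair of probability distributions and shrinks as the predicted distribution approaches the reference one.

```python
import math

def cross_entropy(reference, predicted):
    # H(reference, predicted): defined over two probability
    # distributions; terms with zero reference mass contribute nothing
    return -sum(r * math.log(p) for r, p in zip(reference, predicted) if r > 0)
```

For example, with a one-hot reference over three subjects, a prediction concentrated on the correct subject yields a smaller loss than a uniform prediction.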
As to claim 8, the combination of Zunino and Ganin teaches the method of claim 4, wherein the training of the first MLM, for a time sequence associated with a predefined action, results in a probability distribution over the predefined subjects, wherein the target data comprises a reference probability distribution that represents fractional occurrences of the predefined subjects in the training data for each predefined action, wherein the second loss function operates on a difference between the probability distribution and a corresponding reference probability distribution, wherein the corresponding reference probability distribution is associated with the predefined action. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
As to claim 9, the combination of Zunino and Ganin teaches the method of claim 8, wherein the second loss function aggregates, for the time sequences, differences between the probability distribution generated for each time sequence and the corresponding reference probability distribution. [In Zunino, equation (2), the subscript [x, s, y]∈D indicates that the loss is aggregated across each time sequence x and corresponding subject labels s (hence, also the probability distribution given the context of cross-entropy loss) in the dataset D.]
As to claim 10, the combination of Zunino and Ganin teaches the method of claim 4, wherein the subject identity data comprises a second probability value for at least one of the predefined subjects, and wherein the second loss function operates on the second probability value. [Zunino § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.” As stated in equation (2), ℓs is a function of the subject identity data h and the predefined target data s. The fact that this function is a cross-entropy loss means that it represents a difference between the two in the form of respective probability distributions, since a cross-entropy loss is by definition a difference between two probability distributions.]
As to claim 11, the combination of Zunino and Ganin teaches the method of claim 4, further comprising: […]
training the first MLM on at least part of the training data, to discriminate between the predefined actions, […] [Zunino, § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Zunino, § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).”]
training the second MLM based on feature data extracted by the first MLM for said at least part of the training data, to discriminate between the predefined subjects; [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.” Note that function h represents the second model.] and
evaluating the subject identity data and/or the action data generated by the training of the first MLM and the second MLM. [This limitation is met by the definitions of the loss functions in Zunino as discussed above. That is, in equations (1) and (2) of Zunino, g and h correspond to the action data and the subject identity data, which are evaluated in the loss function. See Zunino, § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.”]
Ganin further teaches “obtaining deployment data comprising additional time sequences of data samples, wherein the additional time sequences represent additional predefined subjects performing non-categorized actions, and wherein the additional predefined subjects are included among the predefined subjects;” [Abstract: “The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary).” That is, unlabeled data correspond to non-categorized actions in the context of the base reference Zunino. § 3, paragraph 1: “Moreover, we have two different distributions over X × Y, called the source domain DS and the target domain DT. An unsupervised domain adaptation learning algorithm is then provided with a labeled source sample S drawn i.i.d. from DS, and an unlabeled target sample T drawn i.i.d. from DXT.” Note that the concept of “target domain” is analogous to additional predefined subjects when applied to the context of Zunino. The unlabeled data is regarded as “deployment data” because the model is being used (deployed) for domain adaptation. See also § 5.1.1, paragraph 1 for additional examples.] “including the deployment data in the training data;” [As shown in Algorithm 1 on page 10, S and T are included in the training process, thus collectively constituting training data.] “while excluding from the first loss function the action data that is generated by the first MLM for the additional time sequences.” [See § 4.1, text above and below equation (5): “Training the neural network then leads to the following optimization problem on the source domain…” As shown, the loss function Ly, which is analogous to the “first loss function” of Zunino and the instant claim, is based on the samples of index n, which corresponds to the samples S. That is, the samples T, which have index n’, are excluded from the loss function Ly.
See also Algorithm 1, which teaches that the samples T with index n’ are handled separately in lines 24-31, after the samples S are used in lines 7-9 and 11-14, where line 11 corresponds to the use of the loss function based on samples S.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the techniques of Ganin discussed above, including the use of unlabeled samples and the associated training process, so as to arrive at the limitations of the instant dependent claim. One of ordinary skill in the art would have been motivated to do so in order to “learn a mapping between domains in the situation when the target domain data are either fully unlabeled (unsupervised domain annotation) or have few labeled samples (semi-supervised domain adaptation)” (Ganin, § 1, paragraph 2).
As to claim 12, the combination of Zunino and Ganin teaches the method of claim 11, as set forth above.
Ganin further teaches “wherein said evaluating comprises:
determining, based on the subject identity data generated by the second MLM, at least one selected subject among the additional predefined subjects;” [Algorithm 1, line 26, where Gd is the determination of the domain (subject) from xj, which is one among the n’-index samples T represented by line 24. See also equation (7) and the text above it.] and
“indicating at least one of the additional time sequences that is performed by said at least one selected subject as a candidate to be categorized by action.” [This limitation is met by the determined association between xj (analogous to a time sequence) and Gd (analogous to the selected subject), whereby xj is categorized as Gd based on the features of xj (the action).]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for doing so is the same as that which was given for the teachings of Ganin in the rejection of the parent independent claim.
As to claim 16, the combination of Zunino and Ganin teaches the method of claim 1, wherein the first MLM comprises a [… (processing layer)] and an action classification layer, which is directly or indirectly connected to the [… (processing layer)], wherein said feature data represents output data of at least one of the processing layers. [§ 4.2, paragraph 4: “A multi-layer perceptron (MLP) network with one hidden layer of dimension 200 was designed as the shared feature representation f(x|Wf ). For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi.” That is, the hidden layer constitutes a processing layer, while the softmax constitutes an action classification layer.]
Ganin further teaches a “sequence of processing layers” [In general, Ganin teaches the general technique of domain-adversarial machine learning, of which Zunino is an extension. In particular, a sequence of processing layers is shown in FIG. 1. See FIG. 1 caption: “The proposed architecture includes a deep feature extractor (green).” Specifically, § 5.2.2 teaches: “In general, we compose feature extractor from two or three convolutional layers, picking their exact configurations from previous works. More precisely, four different architectures were used in our experiments. The first three are shown in Figure 4. For the Office domains, we use pre-trained AlexNet from the Caffe-package (Jia et al., 2014).”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino with the teachings of Ganin by implementing the first model to include a sequence of processing layers, including a plurality of convolutional layers, as taught in Ganin, so as to arrive at the limitations of the instant claim. The motivation for doing so would have been to implement a neural network architecture, namely a convolutional neural network, that is known in the art as being suitable for performing image classification (see Ganin § 1, second-to-last paragraph: “We further evaluate the approach extensively for an image classification task, and present results on traditional deep learning image data sets”).
As to claim 17, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above.
Ganin further teaches “wherein one or more of the processing layers is a convolutional layer.” [§ 5.2.2 teaches: “In general, we compose feature extractor from two or three convolutional layers, picking their exact configurations from previous works. More precisely, four different architectures were used in our experiments. The first three are shown in Figure 4. For the Office domains, we use pre-trained AlexNet from the Caffe-package (Jia et al., 2014).”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for combining the teachings of the references as set forth in the rejection of the parent independent claim also covers the motivation for doing so to arrive at the instant dependent claim.
As to claim 19, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above.
Ganin further teaches “wherein the second MLM is trained based on the output data of two or more processing layers in the sequence of processing layers.” [The entire model is trained based on all layers of the model, since the losses Ld and Ly in Ganin (analogous to the losses ℓs and ℓi in Zunino) are based on the features extracted by the feature extractor. In other words, this limitation flows as a consequence of the feature extractor being a sequence of processing layers. Furthermore, the limitation of “two or more” is disclosed by the fact that Ganin teaches multiple convolutional layers, as discussed above.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin to have also arrived at the limitations of the instant dependent claim. The motivation for combining the teachings of the references as set forth in the rejection of the parent independent claim also covers the motivation for doing so to arrive at the instant dependent claim.
2. Claims 13-15 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zunino in view of Ganin, and further in view of Pei et al., “Multi-Adversarial Domain Adaptation,” The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 2018 (“Pei”).
As to claim 13, the combination of Zunino and Ganin teaches the method of claim 1, further comprising: […]
training the first MLM based on at least part of the training data, to discriminate between the predefined actions, […]; [Zunino, § 4.1, paragraph 5, text below equation (2): “Precisely, Eq. (1) promotes an accurate prediction of intentions: the loss function ℓi is minimized as to penalize discrepancies between the actual intention label y and the high-level embedding g which is trained to be discriminative for the sake of the intention prediction task.” Zunino, § 4.2, paragraph 4 states: “The optimization of (1) and (2) is carried out by using a joint back-propagation[.] In particular, we compute the updates on the parameters Ws and Wi separately on the two branches. Then, we used the gradient reversal layer (Ganin et al. 2016) to change the sign of the derivative of the subject loss ℓs with respect to Wf (after a re-scaling by a parameter λ). The derivative of ℓi with respect to Wf is instead back-propagated with the correct sign (see Fig. 4).”]
training a […] MLM based on feature data extracted by the first MLM for said at least part of the training data, to determine if the feature data originates from the deployment data; [Zunino, § 4.1, paragraph 5, text below equation (2): “In (2), we still consider a similar setup in which we train a high-level encoding h by mean of a loss function ℓs which consider the subjects’ identity s.”] and
evaluating output data generated by the […] MLM during the training of the first MLM and/or the […] MLM. [This limitation is met by the definitions of the loss functions in Zunino as discussed above. That is, in equations (1) and (2) of Zunino, g and h correspond to the action data and the subject identity data, which are evaluated in the loss functions. See Zunino, § 4.2, paragraph 4: “For the intention prediction module, we trained a four-way softmax function using a cross entropy loss for ℓi. Similarly, for the subject confusion module, a 17- or 16-way cross-entropy loss is used for ℓs in SADA and Blind-SADA, respectively.”].
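The training procedure quoted from Zunino (separate parameter updates on the two branches, with the subject-loss gradient sign-flipped and re-scaled by λ before reaching the shared extractor Wf) can be sketched in simplified form; the linear branches, squared losses, and dimensions below are illustrative assumptions standing in for the actual networks and cross-entropy losses:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4)          # one input sample
y_i, y_s = 1.0, 0.0                 # intention label, subject label (scalars for simplicity)

Wf = rng.standard_normal((3, 4))    # shared feature extractor
Wi = rng.standard_normal(3)         # intention-prediction branch
Ws = rng.standard_normal(3)         # subject-confusion branch
lam, lr = 0.5, 0.1                  # reversal scale and learning rate

f = Wf @ x                          # shared features
err_i = Wi @ f - y_i                # squared-loss residuals on each branch
err_s = Ws @ f - y_s

# Branch parameters are updated separately (separate gradient steps).
grad_Wi = err_i * f
grad_Ws = err_s * f

# Back-propagation to the shared extractor: the intention gradient keeps its
# sign, while the subject gradient is sign-flipped (gradient reversal) and
# re-scaled by lam, pushing Wf toward subject-confusing features.
grad_Wf = np.outer(err_i * Wi, x) - lam * np.outer(err_s * Ws, x)

Wi -= lr * grad_Wi
Ws -= lr * grad_Ws
Wf -= lr * grad_Wf
```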
Ganin further teaches “obtaining deployment data comprising additional time sequences of data samples, wherein the additional time sequences represent additional predefined subjects performing non-categorized actions;” [Abstract: “The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary).” That is, unlabeled data correspond to non-categorized actions in the context of the base reference Zunino. § 3, paragraph 1: “Moreover, we have two different distributions over X × Y, called the source domain DS and the target domain DT. An unsupervised domain adaptation learning algorithm is then provided with a labeled source sample S drawn i.i.d. from DS, and an unlabeled target sample T drawn i.i.d. from DXT.” Note that the concept of “target domain” is analogous to additional predefined subjects when applied to the context of Zunino. The unlabeled data is regarded as “deployment data” because the model is being used (deployed) for domain adaptation. See also § 5.1.1, paragraph 1 for additional examples.] “including the deployment data in the training data;” [As shown in Algorithm 1 on page 10, S and T are included in the training process, thus collectively constituting training data.] “while excluding from the first loss function the action data that is generated by the first MLM for the additional time sequences.” [See § 4.1, text above and below equation (5): “Training the neural network then leads to the following optimization problem on the source domain…” As shown, the loss function Ly, which is analogous to the “first loss function” of Zunino and the instant claim, is based on the samples of index n, which corresponds to the samples S. That is, the samples T, which have index n’, are excluded from the loss function Ly.
See also Algorithm 1, which teaches that the samples T with index n’ are handled separately in lines 24-31, after the samples S are used in lines 7-9 and 11-14, where line 11 corresponds to the use of the loss function based on samples S.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino with the teachings of Ganin by implementing the techniques of Zunino discussed above, including the use of unlabeled samples and the associated training process, so as to arrive at the above limitations of the instant dependent claim. One of ordinary skill in the art would have been motivated to do so in order to “learn a mapping between domains in the situation when the target domain data are either fully unlabeled (unsupervised domain annotation) or have few labeled samples (semi-supervised domain adaptation)” (Ganin, § 1, paragraph 2).
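The mechanism mapped above — the label loss computed over the labeled source samples only, while every sample (source and target) contributes to the domain loss — can be sketched as follows; the array shapes, class counts, and function name are illustrative assumptions:

```python
import numpy as np

def losses(logits_y, logits_d, labels_y, labels_d, is_source):
    """Label loss over source samples only; domain loss over all samples."""
    def xent(logits, labels):
        # Numerically stable softmax cross-entropy, per sample.
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.log(p[np.arange(len(labels)), labels])

    per_sample_y = xent(logits_y, labels_y)
    loss_y = per_sample_y[is_source].mean()   # unlabeled target samples excluded
    loss_d = xent(logits_d, labels_d).mean()  # every sample has a domain label
    return loss_y, loss_d

rng = np.random.default_rng(2)
n = 6
is_source = np.array([True, True, True, False, False, False])
logits_y = rng.standard_normal((n, 4))                       # 4 action classes
labels_y = np.where(is_source, rng.integers(0, 4, n), 0)     # dummy labels for targets
logits_d = rng.standard_normal((n, 2))                       # source vs. target
labels_d = (~is_source).astype(int)
loss_y, loss_d = losses(logits_y, logits_d, labels_y, labels_d, is_source)
```

Because the target rows are masked out, their (dummy) action labels have no effect on the label loss, mirroring the exclusion of the unlabeled samples T from Ly.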
The combination of references thus far does not teach a “third” MLM.
Pei, which also pertains to the general problem of domain-adversarial machine learning (see title), teaches “training a third MLM based on the feature data extracted by the first MLM for said at least part of the training data, to determine if the feature data originates from the deployment data” [FIG. 2, showing different domain (analogous to subject in the context of Zunino) classification networks, as described on page 3937, left column, middle: “K domain discriminators Gdk, k = 1, … K.” That is, a second domain discriminator (e.g., at k = 2) corresponds to a third MLM. Furthermore, the discriminators, including the third model, are trained in accordance with the loss function Ldk of equation (3) on page 3937, left column, which is based on the feature data extracted by the first MLM, i.e., Gf(xi). The limitation of “to determine if the feature data originates from the deployment data” is met because the deployment data is used in the training process, and outputs based thereon are determined as originating from it.] and “evaluating output data generated by the third MLM during the training of the first MLM and/or the third MLM.” [This limitation is disclosed by equation (3), where Gdk represents the output data generated by the third MLM.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino and Ganin with the teachings of Pei by implementing the subject classifier in Zunino as a plurality of subject classification networks and training the plurality of subject classification networks, in accordance with the techniques taught by Pei, so as to arrive at the limitations of the instant dependent claim. The motivation for doing so would have been to implement a model architecture that captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators, as suggested by Pei (abstract: “we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators”).
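Pei’s construction of the inputs to the K domain discriminators — each discriminator receiving the extracted features weighted by the predicted probability of its class, ŷik Gf(xi) — can be sketched as follows; the dimensions and function name are illustrative assumptions:

```python
import numpy as np

def mada_discriminator_inputs(features, class_probs):
    """Probability-weighted inputs for K discriminators: yhat_i^k * G_f(x_i).

    features: (n, d) extracted features; class_probs: (n, K) softmax outputs.
    Returns an array of shape (K, n, d), one weighted copy per discriminator.
    """
    return class_probs.T[:, :, None] * features[None, :, :]

rng = np.random.default_rng(3)
n, d, K = 5, 8, 3
features = rng.standard_normal((n, d))
logits = rng.standard_normal((n, K))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
weighted = mada_discriminator_inputs(features, probs)
```

Since the class probabilities sum to one for each sample, summing the K weighted copies recovers the original features; each discriminator effectively sees only the portion of each sample attributable to its class.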
As to claim 14, the combination of Zunino, Ganin, and Pei teaches the method of claim 13, as set forth above.
Pei further teaches “wherein the training of the first MLM is performed to be adversarial to the determination by the third MLM.” [FIG. 2 caption: “The blue part shows the multiple adversarial networks (each for a class, K in total) crafted in this paper.” Note that in this context, “adversarial” has the same meaning as that in Zunino, which is that the multiple adversarial networks are adversarial to the data label predictor, analogous to the first MLM.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino and Ganin with the teachings of Pei, so as to also arrive at the limitations of the instant dependent claim. Since the teachings of Pei discussed for the instant claim are part of the techniques already discussed in the rejection of the parent dependent claim, the motivation for combining the teachings of the references as set forth in the rejection of the parent dependent claim also covers the limitations of the instant dependent claim.
As to claim 15, the combination of Zunino, Ganin, and Pei teaches the method of claim 13, as set forth above.
Pei further teaches “wherein said evaluating comprises: determining, based on the output data generated by the third MLM, at least one of the additional time sequences; and indicating the at least one of the additional time sequences as a candidate to be categorized by action.” [As shown in FIG. 2, each domain discriminator determines an output Gd, and determination of this output also determines and categorizes the corresponding time sequence x that was used to generate the output.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of Zunino and Ganin with the teachings of Pei, so as to also arrive at the limitations of the instant dependent claim. Since the teachings of Pei discussed for the instant claim are part of the techniques already discussed in the rejection of the parent dependent claim, the motivation for combining the teachings of the references as set forth in the rejection of the parent dependent claim also covers the limitations of the instant dependent claim.
As to claim 20, the combination of Zunino and Ganin teaches the method of claim 1, as set forth above, but does not teach the further limitations of the instant dependent claim.
Pei, which also pertains to the general problem of domain-adversarial machine learning (see title), teaches “wherein the second MLM comprises a plurality of subject classification networks which are operable in parallel, and wherein the subject classification networks differ by one or more of initialization values, network structure, or input data.” [FIG. 2, showing different domain (analogous to subject in the context of Zunino) classification networks, as described on page 3937, left column, middle: “K domain discriminators Gdk, k = 1, … K.” As shown in the figure, the different domain discriminators receive different input data ŷik Gf(xi) and are also trained to have different weights (i.e., different neural network initialization values and structures, noting that weights are part of the structure of a neural network) as described on page 3937, left column, bottom: “The multiple domain discriminators are trained with probability-weighted data points ŷik Gf(xi), which naturally learn multiple domain discriminators with different parameters θkd; discriminators with different parameters promote positive transfer for each instance.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zunino and Ganin with the teachings of Pei by implementing the subject classifier in Zunino as a plurality of subject classification networks in accordance with the techniques taught by Pei, so as to arrive at the limitations of the instant dependent claim. The motivation for doing so would have been to implement a model architecture that captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators, as suggested by Pei (abstract: “we present a multi-adversarial domain adaptation (MADA) approach, which captures multimode structures to enable fine-grained alignment of different data distributions based on multiple domain discriminators”).
3. Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Zunino in view of Ganin, and further in view of Nguyen et al., “Weakly Supervised Action Localization by Sparse Temporal Pooling Network,” CVPR 2018, pp. 6752-6761 (“Nguyen”).
As to claim 18, the combination of Zunino and Ganin teaches the method of claim 16, as set forth above, but does not teach the further limitations of the instant dependent claim.
Nguyen teaches “further comprising time-averaging the output data of said at least one of the processing layers, wherein the second MLM is trained based on the time-averaged output data.” [§ 3.1, paragraph 2: “Formally, let xt ∈ Rm be the m dimensional feature representation extracted from a video segment centered at time t, and λt be the corresponding attention weight. The video level representation, denoted by x̄, corresponds to an attention weighted temporal average pooling, which is given by [see equation in text].” § 3.1, paragraph 2: “The loss function in the proposed network is composed of two terms, the classification loss and the sparsity loss, which is given by… Lsparsity is the sparsity loss on the attention weights.” See also FIG. 2. Note that training is performed based on this model architecture, as disclosed in § 4.2, paragraph 2: “We sample 400 segments at uniform interval from each video in both training and testing.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Nguyen by implementing the use of weighted temporal average pooling and an attention module, so as to arrive at the limitations of the instant dependent claim. The motivation would have been to implement a model architecture that enables prediction of temporal intervals of human actions with no requirement of temporal localization annotations, as suggested by Nguyen (see abstract: “Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations.”).
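The attention-weighted temporal average pooling quoted above, together with an L1 sparsity term on the attention weights, can be sketched as follows; the segment count, feature dimension, and function names are illustrative assumptions, not Nguyen’s exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def temporal_pool(x, attn_logits):
    """Attention-weighted temporal average pooling over T segment features.

    x: (T, m) per-segment features; attn_logits: (T,) raw attention scores.
    Returns the pooled video-level feature and an L1 sparsity penalty.
    """
    lam = sigmoid(attn_logits)             # per-segment attention weights in (0, 1)
    x_bar = (lam[:, None] * x).mean(axis=0)
    sparsity_loss = np.abs(lam).sum()      # L1 sparsity term on the weights
    return x_bar, sparsity_loss

rng = np.random.default_rng(4)
T, m = 10, 6
segments = rng.standard_normal((T, m))     # one feature vector per video segment
logits = rng.standard_normal(T)
x_bar, sparsity = temporal_pool(segments, logits)
```

When every attention weight saturates at one, the pooled feature reduces to a plain temporal average, which is the “time-averaging” sense relied on in the rejection.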
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references depict the state of the art.
Crawshaw, “Multi-task learning with deep neural networks: a survey,” arXiv:2009.09796v1 [cs.LG] 10 Sep 2020 (Crawshaw) teaches various multi-head/multi-task model architectures and task scheduling (§ 3.3) which is applicable to determining which head/task is being selected for training.
Smolyanskiy et al. (US 2021/0150230 A1) teaches that “multiple heads may be co-trained together with a common trunk, or may be trained separately” (see [0126]).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Y.D.H./Examiner, Art Unit 2124
/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124