Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-28 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
Claims 15-21 invoke means-plus-function interpretation. Paragraph 100 of the specification teaches “The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to, a circuit, an application specific integrated circuit (ASIC), or processor.” The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 6-11, 13-18, 20-25 and 27-28 are rejected under 35 U.S.C. 103 as being unpatentable over “Cross-Domain Sentiment Classification with In-Domain Contrastive Learning” by Li et al. (Li) and “Overcoming catastrophic forgetting in neural networks” by Kirkpatrick et al. (Kirk).
Claims 5, 12, 19 and 26 are rejected under 35 U.S.C. 103 as being unpatentable over “Cross-Domain Sentiment Classification with In-Domain Contrastive Learning” by Li et al. (Li), “Overcoming catastrophic forgetting in neural networks” by Kirkpatrick et al. (Kirk), and US 2019/0325861 A1 to Singh et al. (Singh).
Li teaches claims 1, 8, 15 and 22. A processor-implemented method, comprising: (Li Fig. 1 below)
[Li Fig. 1]
training a machine learning model on a source domain; (Li Fig. 1 below. Source data is the claimed source domain. Li Fig. 1 desc. “The model is jointly trained on the three objectives.”)
[Li Fig. 1]
testing the machine learning model on a target domain, after training the machine learning model on the source domain; and (Li sec. 4.1 p. 5 “We also train BERT on the source labeled data and directly test on the target labeled data…” Target data is the target domain.)
training the machine learning model on the target domain by regularizing weights of the machine learning model such that (Li Fig. 1 desc. “The model is jointly trained on the three objectives.” The third objective is the target data/domain. Li teaches training differently based on shift in the data set.1 Part of Li’s training includes regularizing/updating weights: Li sec. 4.1 p. 5 teaches that their weights are regularized2 during training, “On the neural network training, we use the AdamW [24] optimizer with learning rate 2e − 5, linear learning rate scheduler, linear learning rate warm up, warm up steps 0.1 of total training steps and weight decay 0.01.” Weight decay is a form of weight regularization.)
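For illustration only (the function name is hypothetical, not from Li), the decoupled weight-decay term of Li's AdamW optimizer (learning rate 2e−5, weight decay 0.01) can be sketched as a uniform shrinkage of every weight toward zero, which is the sense in which weight decay regularizes weights:

```python
def adamw_decay_step(weights, lr=2e-5, weight_decay=0.01):
    """Apply only the decoupled weight-decay portion of an AdamW update:
    each weight shrinks toward zero by lr * weight_decay * weight."""
    return [w - lr * weight_decay * w for w in weights]
```

Note that this penalty is uniform across all weights; it does not distinguish shift-agnostic from shift-biased weights.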
Li doesn’t teach applying differing penalties to individual weights.
However, Kirk teaches a processor, memory, computer readable media… (Kirk p. 5 “the EWC approach presented here makes use of a single network with fixed resources (i.e. network capacity) and has minimal computational overhead.” Computational overhead includes memory and processors.) individual shift-agnostic weights that are insensitive to domain shifts are subjected to a higher penalty than individual shift-biased weights that are sensitive to domain shifts. (Kirk sec. 1 p. 2 “elastic weight consolidation (EWC for short). This algorithm slows down learning on certain weights based on how important they are to previously seen tasks.” Kirk’s “importance” is shift sensitivity. Kirk sec. 2 p. 3 “when we use EWC, and thus take into account how important each weight is to task A, the network can learn task B well without forgetting task A (red curves)…. When moving to a third task, task C, EWC will try to keep the network parameters close to the learned parameters of both task A and B. This can be enforced either with two separate penalties, or…” The higher penalty for important weights is taught throughout Kirk, including sec. 1 p. 2, “This algorithm slows down learning on certain weights based on how important they are to previously seen tasks…. While learning task B, EWC therefore protects the performance in task A by constraining the parameters to stay in a region of low error for task A centered around θ∗ A, as shown schematically in Figure 1. This constraint is implemented as a quadratic penalty, and can therefore be imagined as a spring anchoring the parameters to the previous solution, hence the name elastic. Importantly, the stiffness of this spring should not be the same for all parameters; rather, it should be greater for those parameters that matter most to the performance during task A.”)
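For illustration only (the function name and numbers are hypothetical, not from Kirk), Kirk's per-weight quadratic penalty can be sketched as follows: weights important to earlier tasks (the claimed shift-agnostic weights) receive a larger stiffness, so an equal deviation from the anchored value incurs a higher penalty than it does for an unimportant (shift-biased) weight.

```python
def ewc_penalty(weights, anchor_weights, importances, lam=1.0):
    """Quadratic EWC-style penalty: (lam/2) * sum_i F_i * (w_i - w*_i)^2,
    where F_i is the per-weight importance (spring stiffness) and w*_i is
    the weight's value after the previous task."""
    total = 0.0
    for w, w_star, f in zip(weights, anchor_weights, importances):
        total += f * (w - w_star) ** 2
    return 0.5 * lam * total

# Two weights deviate equally (0.5) from their anchors, but the first is
# important (F = 10.0) and the second is not (F = 0.1); the first dominates
# the penalty.
penalty = ewc_penalty([1.5, 1.5], [1.0, 1.0], [10.0, 0.1])
```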
Li, Kirk and the claims are all directed to learning algorithms. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to use Kirk for weight regularization because Kirk’s “EWC approach presented here makes use of a single network with fixed resources (i.e. network capacity) and has minimal computational overhead.” Kirk p. 5.
Li teaches claims 2, 9, 16 and 23. The processor-implemented method of claim 1, further comprising training the machine learning model based on a main task loss and an auxiliary task loss. (Li fig. 1 main task loss is entropy loss and auxiliary task loss is contrastive loss, see below.)
[Li Fig. 1]
Li teaches claims 3, 10, 17 and 24. The processor-implemented method of claim 2, in which the main task loss is based on an entropy derived from a predicted probability, by the machine learning model, on test samples from the target domain. (Li fig. 1 main task loss is entropy loss. The samples are the “Unlabeled target data”, see below.)
[Li Fig. 1]
Li teaches claims 4, 11, 18 and 25. The processor-implemented method of claim 3, in which the entropy comprises an entropy minimization loss. (Li Abstract: "we introduce in-domain contrastive learning and entropy minimization" Section 3.3 entropy minimization: "we minimize the entropy of the model’s prediction")
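For illustration only (the function name is hypothetical; Li's exact formulation is in Li sec. 3.3), an entropy-minimization loss over a model's predicted class probabilities can be sketched as the Shannon entropy of the prediction, which is small when the model is confident:

```python
import math

def entropy_loss(probs):
    """H(p) = -sum_c p_c * log(p_c) over the predicted class probabilities.
    Confident (peaked) predictions give low loss; uncertain ones give high loss."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Minimizing this quantity on unlabeled target samples drives the model toward confident, low-entropy predictions on the target domain.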
Li teaches claims 5, 12, 19 and 26. The processor-implemented method of claim 3, in which the entropy comprises an entropy maximization loss. (Li sec. 3.3 p. 4 “Applying the entropy loss too hastily asks the model to early decide labels for uncertain instances before the model has learnt good representations and the classification boundary on source domain. Therefore, we apply the entropy loss from the second epoch and it works out as expected.”) Li doesn’t teach the maximization.
However, Singh teaches entropy maximization loss. (Singh para 20 “minimize cross-entropy based label classification loss on the labeled source domain data and at the same time to maximize cross-entropy domain classification loss on the supervised source domain data and unsupervised target domain data.”)
Li, Singh and the claims all use entropy loss. It would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, to combine a maximization loss so that “domain invariant features can be learned directly… with significant improvement over the [other]… models trained without domain adaptation.” Singh para. 9.
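For illustration only (the function name is hypothetical; Singh's actual method maximizes a cross-entropy domain-classification loss, Singh para. 20), one common way to express an entropy-maximization term is to subtract a domain classifier's prediction entropy from the total loss, so that minimizing the total maximizes that entropy and pushes the domain prediction toward uniform, i.e., toward domain-invariant features:

```python
import math

def loss_with_entropy_maximization(label_loss, domain_probs, alpha=1.0):
    """Total loss = label loss minus alpha * entropy of the domain prediction.
    Minimizing the total therefore maximizes the domain-prediction entropy,
    rewarding features from which the domain cannot be distinguished."""
    domain_entropy = -sum(p * math.log(p) for p in domain_probs if p > 0)
    return label_loss - alpha * domain_entropy
```

A domain prediction of (0.5, 0.5) yields a lower total loss than a confident (0.9, 0.1) prediction, all else equal.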
Li teaches claims 6, 13, 20 and 27. The processor-implemented method of claim 2, in which the auxiliary task loss moves target representations to a nearest class centroid of the source domain. (Li sec. 4.2 p. 7 “In-domain contrastive learning focuses on pushing red points away from yellow and green points away from blue, which desirably enlarge the margin between positive and negative.” Li Figure 2 desc. “The red and yellow dots respectively represent the feature of positive and negative samples on the source domain, and the blue and green dots represents those on the target domain. Note that the margin between positive cluster and negative cluster on target domain becomes clearer from left to right.” In Li Fig. 2, below, the target-domain positive samples (blue dots) are pushed toward the source-domain positive samples (red dots). Pushing two clusters together in this way necessarily moves the target cluster toward the centroid of the source cluster.)
[Li Fig. 2]
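For illustration only (the function name is hypothetical; Li instead achieves this effect through its contrastive loss, as quoted above), the claimed behavior of moving a target representation toward the nearest source-class centroid can be sketched as:

```python
def step_toward_nearest_centroid(z, centroids, step=0.1):
    """Find the source-class centroid nearest to target representation z
    (by squared Euclidean distance) and nudge z a fraction of the way toward it."""
    nearest = min(centroids,
                  key=lambda c: sum((zi - ci) ** 2 for zi, ci in zip(z, c)))
    return [zi + step * (ci - zi) for zi, ci in zip(z, nearest)]
```

Repeating such a step over training draws the target cluster toward the centroid of the closest source-domain class.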
Li teaches claims 7, 14, 21 and 28. The processor-implemented method of claim 6, in which the auxiliary task loss is computed on a representation mapped into an embedding space. (Li Figure 1 shows an embedding space with representation z mapped into it, from which the contrastive/auxiliary task loss is computed, see below.)
[Li Fig. 1]
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571)270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mariela Reyes can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/AUSTIN HICKS/Primary Examiner, Art Unit 2142
1 Li abs. “we find through ablation studies that these two techniques behaviour differently in case of large label distribution shift and conclude that the best practice is to choose one of them adaptively according to label distribution shift.”
2 Footnote 24 in Li points to a paper titled “Fixing weight decay regularization in adam.”