Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on March 8, 2023 and April 11, 2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: Page 5, lines 24-25 read “Then the first score corresponding to the first one of the prompt vectors is determined based om the difference, which is not limited in the disclosure” but should read “Then the first score corresponding to the first one of the prompt vectors is determined based on the difference, which is not limited in the disclosure.”
Appropriate correction is required.
Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claims 1, 9, and 17, the second to last limitation reads “determining a second one of the prompt vectors by modifying, based on the first score, the first one of the prompt vectors.” The specification does not clearly teach how the second prompt vector is determined by modifying the first prompt vector in a way that would enable one of ordinary skill in the art to reproduce this method without undue experimentation, as there is no clear teaching of how the first vector is being modified.
Additionally, the final limitation reads “based on the second one of the prompt vectors, returning to obtaining the first score until determining a target prompt vector corresponds to the sample data.” Returning to obtaining the first score based on the second one of the prompt vectors is not clearly described in the specification; one skilled in the art would not readily be able to reproduce this method, as there is no clear teaching of what this basis entails in order to determine whether or not to return to obtaining the first score.
Claims 2-8, 10-16, and 18-20 are dependent on claims 1, 9, and 17 and are therefore rejected under the same rationale.
Regarding claims 3-5, 11-13, and 19-20, each of the claims recites “determining a first difference between first scores corresponding to each two adjacent prompt vectors of the L prompt vectors,” with claims 3, 11, and 19 stating “when a number of positive values included in each first difference is one” while claims 4, 5, 12, 13, and 20 state “when a number of positive values included in each difference is multiple.” A score is commonly understood in the art to be a single value, so each difference between first scores would return a single value. Claims 3, 11, and 19 would therefore always hold, as the number of positive values would be at most one. However, claims 4, 5, 12, 13, and 20 state that the number of positive values included in each difference is multiple, which implies that the first scores would be vector values rather than single values. Nothing in the specification describes the first scores or how they are calculated; therefore, one skilled in the art would not readily be able to reproduce this method without undue experimentation, as there is no clear teaching within the disclosure.
Regarding claims 7 and 15, the first limitation reads “recording a sequence of candidate prompt vectors, wherein a third difference between serial number values corresponding to each two adjacent candidate prompt vectors in the sequence of candidate prompt vectors is K, where K is a positive integer.” The specification does not clearly teach the serial number values and, as “serial number value” is not a term commonly used in the art, there is no clear teaching of what the serial number is, where it can be found, or how it is used to find a third difference between those values in order to record a sequence of candidate prompt vectors. Therefore, one skilled in the art would not readily be able to reproduce this method without undue experimentation, as there is no clear teaching within the disclosure.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 1, 9, and 17, the second to last limitation reads “determining a second one of the prompt vectors by modifying, based on the first score, the first one of the prompt vectors.” It is not clear how the prompt vectors are being modified in order to determine a second prompt vector. The specification provides no definitive clarification as to how the vectors are being modified to create the second vector.
Additionally, the final limitation reads “based on the second one of the prompt vectors, returning to obtaining the first score until determining a target prompt vector corresponds to the sample data.” The examiner understands that the decision to return to the obtaining limitation is based on the second one of the prompt vectors; however, it is unclear what this basis is. Whether it is the similarity, the size, or some other property of the second one of the prompt vectors is not made clear, and the specification provides no definitive clarification as to what this basis is. Additionally, it is not clear whether returning to obtaining the first score means returning to that limitation and moving through each subsequent limitation again. For prior art purposes, the examiner is interpreting this as repeating the last three limitations until some convergence criterion is met.
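For illustration of the examiner's interpretation only, the repeated loop can be sketched as below. The score function, the modification rule, and the convergence test are all hypothetical assumptions; neither the claims nor the specification defines them.

```python
# Hypothetical sketch of the examiner's interpretation: the last three
# limitations repeat until a convergence criterion is met. The score
# function, update rule, and tolerance are all assumed for illustration.

def find_target_prompt_vector(first_vector, prompt_vector, score_fn,
                              step=0.1, tolerance=1e-3, max_iters=100):
    """Iteratively modify a prompt vector until its score converges."""
    prev_score = None
    for _ in range(max_iters):
        # "obtaining a first score" (fusion and model details omitted).
        score = score_fn(first_vector, prompt_vector)
        # Assumed convergence criterion: the score stops changing.
        if prev_score is not None and abs(score - prev_score) < tolerance:
            return prompt_vector  # treated as the "target prompt vector"
        # "determining a second one of the prompt vectors by modifying,
        # based on the first score" -- the modification rule is assumed.
        prompt_vector = [v - step * score for v in prompt_vector]
        prev_score = score
    return prompt_vector
```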
Claims 2-8, 10-16, and 18-20 are dependent on claims 1, 9, and 17 and are therefore rejected under the same rationale.
Regarding claims 3-5, 11-13, and 19-20, determining a difference between each corresponding element in two prompt values depends on the number of positive values in the difference between first scores corresponding to each two adjacent prompt vectors. It is unclear how the number of positive values could be either one or multiple. The first score is not clearly defined in the specification; however, if the first score is a number, then it would not be possible to have multiple positive values when looking at the difference, and if the first score is a vector, then it would not be possible to have both multiple positive values and a single value when looking at the difference. The specification does not provide definitive clarification on either the first scores or the differences between the first scores.
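The scalar-versus-vector point above can be illustrated with arbitrary values; the numbers below are hypothetical and appear nowhere in the disclosure.

```python
# Illustration only: a difference between two scalar scores is a single
# value (at most one positive value), whereas an elementwise difference
# between vector-valued scores can contain multiple positive values.

def positive_count(diff):
    """Count the positive entries in a scalar or elementwise difference."""
    values = diff if isinstance(diff, (list, tuple)) else [diff]
    return sum(1 for v in values if v > 0)

# Scalar first scores: the difference is one number.
scalar_diff = 0.9 - 0.4
# Vector-valued first scores: the difference may hold several positives.
vector_diff = [a - b for a, b in zip([0.9, 0.8, 0.2], [0.4, 0.1, 0.5])]
```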
Regarding claims 7 and 15, the first limitation reads “recording a sequence of candidate prompt vectors, wherein a third difference between serial number values corresponding to each two adjacent candidate prompt vectors in the sequence of candidate prompt vectors is K, where K is a positive integer.” It is not clear what “serial number values” refers to within the claim, and the specification does not provide definitive clarification. The examiner understands that the serial number values are integer values of some kind and that the difference between them is a positive integer K used to determine the sequence of candidate prompt vectors. However, it is not clear where this serial number value can be found or how it can be generated to be used.
Examiner’s Note: Due to the breadth of issues presented in the 35 U.S.C. § 112(a) and (b) rejections above, the application underwent search and consideration to the best of the examiner’s ability, and the closest prior art, as best understood by the examiner, is presented below. Upon submission of an amended set of claims remedying the issues presented above, the examiner will be able to perform an updated search and consideration of the claimed invention.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 1 is rejected under 35 U.S.C. 103 as being unpatentable over Lester et al. (The Power of Scale for Parameter-Efficient Prompt Tuning), hereinafter Lester, in view of Liang et al. (Super Tickets in Pre-Trained Language Models), hereinafter Liang.
Lester was cited in applicant’s IDS dated 4/11/2024.
Regarding claim 1, Lester teaches:
obtaining a first one of prompt vectors and a first vector corresponding to sample data; (Lester, page 3, column 1, paragraph 2: “Our soft-prompts are represented as a parameter Pe ∈ R^(p×e), where p is the length of the prompt.”)
obtaining a first score corresponding to the first one of the prompt vectors by fusing the first vector and the first one of the prompt vectors and inputting the fused first vector and first one of the prompt vectors into the N pruned models respectively; (Lester, page 3, column 1, paragraphs 1-2: “Our new conditional generation is now Pr_{θ;θP}(Y | [P; X]) and can be trained by maximizing the likelihood of Y via backpropagation, while only applying gradient updates to θP. Our soft-prompts are represented as a parameter Pe ∈ R^(p×e), where p is the length of the prompt. Our prompt is then concatenated to the embedded input forming a single matrix [Pe; Xe] ∈ R^((p+n)×e) which then flows through the encoder-decoder as normal.” And page 4, column 1, paragraph 2: “Our frozen models are built on top of pre-trained T5 checkpoints of all sizes (Small, Base, Large, XL, XXL).”)
determining a second one of the prompt vectors by modifying, based on the first score, the first one of the prompt vectors; and (Lester, page 3, column 1, paragraph 2: “Our models are trained to maximize the probability of Y, but only the prompt parameters Pe are updated.”)
based on the second one of the prompt vectors, returning to obtaining the first score until determining a target prompt vector corresponding to the sample data. (Lester, page 3, column 1, paragraph 1: “Finding an optimal prompt thus requires the selection of prompt tokens, through either manual search or non-differentiable search methods (Jiang et al., 2020; Shin et al., 2020). Prompt tuning removes the restriction that the prompt P be parameterized by θ; instead the prompt has its own dedicated parameters, θP, that can be updated.”)
Lester does not explicitly teach:
obtaining N pruned models by N different pruning processing on the pre-trained model, where N is any integer greater than 1;
However, Liang teaches:
obtaining N pruned models by N different pruning processing on the pre-trained model, where N is any integer greater than 1; (Liang, page 1, column 1, paragraph 2: “The Lottery Ticket Hypothesis (LTH, Frankle and Carbin (2018)) suggests that an over-parameterized network consists of “lottery tickets”, and training a certain collection of them (i.e., a subnetwork) can 1) match the performance of the full model; and 2) outperform randomly sampled subnetworks of the same size (i.e., “random tickets”).”)
Liang is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Lester, which already teaches the method of determining a prompt vector but does not explicitly teach obtaining N pruned models by N different pruning processing on the pre-trained model, to include the teachings of Liang, which does teach obtaining N pruned models by N different pruning processing on the pre-trained model, since the “generalization performance of the winning tickets selected at appropriate compression ratios can not only match, but also exceed that of the full model.” (Liang, page 1, column 2, paragraph 3)
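For illustration of the combined teaching only, one way to obtain N distinct pruned models is sketched below; the flat weight-list representation and the random masking are assumptions for illustration and are not drawn from Lester or Liang.

```python
import random

# Hypothetical sketch of "obtaining N pruned models by N different pruning
# processing on the pre-trained model": each pruned copy zeroes a different
# set of m neurons, so every pair of pruned models differs in at least one
# neuron, consistent with the claim 8 limitation discussed below.

def prune_n_models(weights, n, m, seed=0):
    """Return n pruned copies of `weights`, each with m neurons zeroed."""
    rng = random.Random(seed)
    pruned_models, used_masks = [], set()
    while len(pruned_models) < n:
        mask = tuple(sorted(rng.sample(range(len(weights)), m)))
        if mask in used_masks:  # force the N pruning processes to differ
            continue
        used_masks.add(mask)
        pruned = [0.0 if i in mask else w for i, w in enumerate(weights)]
        pruned_models.append(pruned)
    return pruned_models
```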
Regarding claim 8, Lester and Liang teach the method of claim 1, as cited above.
Lester does not explicitly teach:
determining a number m of neurons to be pruned, where m is any positive integer; and
obtaining the N pruned models by the N different pruning processing on the pre-trained model based on the number m of neurons to be pruned, wherein at least one neuron between every two pruned models is different.
However, Liang further teaches:
determining a number m of neurons to be pruned, where m is any positive integer; and obtaining the N pruned models by the N different pruning processing on the pre-trained model based on the number m of neurons to be pruned, wherein at least one neuron between every two pruned models is different. (Liang, page 5, column 1, paragraph 2: “Specifically, we prune BERT-base/large in unit of 10% heads and 10% feed-forward layers (FFN) at 8 different sparsity levels (10% heads and 10% FFN, 20% heads and 20%FFN, etc).”)
Regarding claim 9, claim 9 has all the same limitations as claim 1, which are taught by Lester and Liang – see claim 1 above.
Lester does not explicitly mention a processor or memory; however, Liang further teaches:
a processor; and a memory communicatively coupled to the processor; (Liang, page 13, column 1, paragraph 2: “All experiments are conducted on Nvidia V100 GPUs.”)
Regarding claim 16, Lester and Liang teach the device of claim 9, as cited above.
Claim 16 additionally has the same limitations as claim 8, which are taught by Lester and Liang – see claim 8 above.
Regarding claim 17, claim 17 has all the same limitations as claim 1, which are taught by Lester and Liang – see claim 1 above.
Lester does not explicitly teach a storage medium; however, Liang further teaches:
A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method for determining a prompt vector of a pre-trained model (Liang, page 13, column 1, paragraph 2: “All experiments are conducted on Nvidia V100 GPUs.”)
Regarding claims 2-5, 10-13, and 18-20, in view of the 35 U.S.C. § 112 issues cited above, the examiner generally understands these claims as teaching finding and modifying similar prompt vectors. Lester appears to teach this on page 9, section 7, when discussing neighboring prompts.
Claims 6, 7, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Lester in view of Liang, and further in view of Ma et al. (XPrompt: Exploring the Extreme of Prompt Tuning), hereinafter Ma.
Regarding claim 6, Lester and Liang teach the method of claim 1, as cited above.
Lester further teaches:
obtaining a predictive tag output by each of the pruned models by fusing the first vector and the first one of the prompt vectors and inputting the fused first vector and first one of the prompt vectors into the N pruned models respectively; (Lester, page 3, column 1, paragraphs 1-2: “Our new conditional generation is now Pr_{θ;θP}(Y | [P; X]) and can be trained by maximizing the likelihood of Y via backpropagation, while only applying gradient updates to θP. Our soft-prompts are represented as a parameter Pe ∈ R^(p×e), where p is the length of the prompt. Our prompt is then concatenated to the embedded input forming a single matrix [Pe; Xe] ∈ R^((p+n)×e) which then flows through the encoder-decoder as normal.” And page 4, column 1, paragraph 2: “Our frozen models are built on top of pre-trained T5 checkpoints of all sizes (Small, Base, Large, XL, XXL).”)
Lester does not explicitly teach:
determining a second score corresponding to the first one of the prompt vectors under each of the pruned models based on a difference between each predictive tag and a tagging tag; and performing mean value processing on a plurality of second scores to determine the first score corresponding to the first one of the prompt vectors.
However, Ma teaches:
determining a second score corresponding to the first one of the prompt vectors under each of the pruned models based on a difference between each predictive tag and a tagging tag; and performing mean value processing on a plurality of second scores to determine the first score corresponding to the first one of the prompt vectors. (Ma, page 11036, column 2, paragraph 1: “We then calculate the importance score (Michel et al., 2019) of each token to distinguish the negative prompt tokens from the other ones. The importance score is defined as the expected sensitivity of the model outputs to the mask variables. Formally, the importance score I_{p_i} of each soft prompt token p_i is calculated as: I_{p_i} = E_{x∼D_x} |∂L(x)/∂γ_i| (3), where L is the loss function and D_x is the training data distribution.” – where the loss function is analogous to the second score and the importance score is analogous to the first score taught by Lester.)
Ma is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Lester and Liang, which already teach a method for determining a prompt vector of a pre-trained model but do not explicitly teach a second score based on a difference between the predictive tag and tagging tag by performing mean value processing, to include the teachings of Ma, which does teach a second score based on a difference between the predictive tag and tagging tag by performing mean value processing, in order to yield “a more parameter-efficient prompt yet with a competitive performance.” (Ma, abstract)
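For illustration of the claimed “mean value processing” only: a second score is computed per pruned model from the difference between its predictive tag and the tagging (ground-truth) tag, and the mean of those second scores is taken as the first score. The absolute-difference scoring below is an assumption for illustration; the claims do not specify how the difference is measured.

```python
# Hypothetical sketch of the claim 6 limitation: one second score per
# pruned model, then mean value processing to obtain the first score.

def second_score(predictive_tag, tagging_tag):
    """Score one pruned model by how far its prediction is from the tag."""
    return abs(predictive_tag - tagging_tag)

def first_score(predictive_tags, tagging_tag):
    """Mean of the second scores across the N pruned models."""
    scores = [second_score(p, tagging_tag) for p in predictive_tags]
    return sum(scores) / len(scores)
```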
Regarding claim 14, Lester and Liang teach the device of claim 9, as cited above.
Claim 14 additionally has the same limitations as claim 6, which are taught by Lester, Liang, and Ma – see claim 6 above.
Regarding claims 7 and 15, in view of the 35 U.S.C. § 112 issues cited above, the examiner generally understands that a set of candidate prompt vectors is first determined before being input into the models to obtain a predicted tag, which is compared to a tagging tag in order to determine the target prompt vector. Ma and Lester appear to teach this: Ma discusses a collection of prompt tokens on page 11034, column 1; Lester teaches obtaining a predictive tag in the manner described in claim 6 above; and Ma then appears to teach determining the first score and determining a candidate prompt on page 11036, section 4.2.1.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lester et al. (US20230325725)
Chen et al. (The Lottery Ticket Hypothesis for Pre-trained BERT Networks)
Tan et al. (End-to-End Supermask Pruning: Learning to Prune Image Captioning Models)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACQUELINE MEYER whose telephone number is (703)756-5676. The examiner can normally be reached M-F 8:00 am - 4:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.C.M./Examiner, Art Unit 2144
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144