Prosecution Insights
Last updated: April 19, 2026
Application No. 18/587,008

GRADIENT CONTROL DEVICE AND GRADIENT CONTROL METHOD OF LANGUAGE MODEL

Status: Final Rejection (§102)
Filed: Feb 26, 2024
Examiner: PASHA, ATHAR N
Art Unit: 2657
Tech Center: 2600 (Communications)
Assignee: Seoul National University R&DB Foundation
OA Round: 2 (Final)
Grant Probability: 90% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 8m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 90% (above average; 138 granted / 154 resolved; +27.6% vs TC avg)
Interview Lift: strong, +17.0% across resolved cases with interview
Typical Timeline: 2y 8m average prosecution; 18 applications currently pending
Career History: 172 total applications across all art units

Statute-Specific Performance

§101: 21.9% (-18.1% vs TC avg)
§103: 49.4% (+9.4% vs TC avg)
§102: 16.9% (-23.1% vs TC avg)
§112: 5.2% (-34.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 154 resolved cases.

Office Action

§102
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed on 12/23/25 have been received. In light of the amendments, the examiner withdraws the claim objections and the §112 rejections.

Regarding the §102 rejections, the applicant argues beginning on page 7: "The Office Action relies on Formula (8) of… Thus, the ratio ak/ār is always less than 1…increase the degree of pushing relative to an original degree before the scale of the second gradient part is reduced." The gradient is prevented from dropping below a reference value. This is prescribed in Equation 8: where ak > ār, g2k will never drop below the reference value of 1, which is the second argument of the min function. The min function only starts to kick in when vk ∈ Vr; otherwise g2k is constrained to 1 per the second line of Eq. 8, and thus increases the degree of the pushing relative to an original degree, which the examiner maps to the first part of Equation 4 in the reference. In light of this, the examiner respectfully disagrees with the applicant and maintains the rejection.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C.
102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-3, 5-6, 9-11, and 13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yu (Yu, Sangwon, et al. "Rare tokens degenerate all tokens: Improving neural text generation via adaptive gradient gating for rare token embeddings." arXiv preprint arXiv:2109.03127v2, 16 Mar 2022).

With respect to claim 1, Yu teaches (claim 1) a gradient control device of a language model, the gradient control device comprising: one or more processors; and memory storing instructions that, when executed by the one or more processors, cause the gradient control device to (¶ p5 Sec 4.2 para 1: This can be easily implemented by detach() function of Pytorch… All model and training configurations are the same as in the previous section): (claim 9) a method of controlling a gradient of a language model, the method comprising:

calculate a number of occurrences of each token, of a plurality of tokens, in batch data at each training step of a plurality of training steps ranging from a current training step to a set previous training step (¶ p5 Sec 4.1 para 2: Therefore, it is necessary to dynamically group rare tokens based on token appearances in recent batch samples.);

group rare tokens based on a comparison of the calculated number of occurrences of each token, of the plurality of tokens, with a threshold value (¶ p13 Sec D ll. 1-6: In this section we show how the metrics used on language modeling task change with the hyperparameter in Figure 5.
We observed an interesting phenomenon about the non-rare token group when rare token group size increases over a specific threshold),

wherein the rare tokens are grouped by grouping first rare tokens and second rare tokens according to degrees of rarity (¶ p6 col 1 para 1: We define the two rarity levels based on the average number of appearances of the entire rare tokens: if the token appearance ak is smaller than the mean of ar where r ∈ Vr, the corresponding token is a very rare token);

calculate a gate tensor on embedding vectors of the grouped rare tokens (¶ p5 Sec 4.2 ll. 1-6: With T context feature vectors hi (i ∈ [1, T]) from the training sample, the negative log-likelihood loss gradient for the rare token embedding wr is calculated as follows… where xgated is a new parameter whose value is the same as x, and g ∈ [0, 1] is a gate tensor. When the xgated is fed to the function f(·) as input, the gradient for x is gated by g… where g1k denotes a k-th component of g1. g1 controls the degree to which rare token embeddings move away from non-rare feature vectors whose targets differ from each rare token embedding. Also, each component of g1 is calculated based on the rarity of each rare token, ak, so gradient gating for part (b) of Eq. 4 is adaptive for each rare token.);

scale a gradient part that pushes the embedding vectors of the grouped rare tokens away from feature vectors having relatively non-rare target tokens and feature vectors having relatively rare target tokens, among gradients of a loss function for the embedding vectors of the grouped rare tokens in a training step (¶ p5 Sec 4.2 para 2: As we described in section 3, part (b) of Eq. 4 should mainly be handled to solve the degeneration problem. To address part (b) of Eq. 4, given a context feature vector of the i-th position hi, we introduce a gate vector g1 ∈ R^N as follows: g1k = ak/K if vk ∈ Vr and vk ≠ yi, 1 otherwise (Eq. 7), where g1k denotes a k-th component of g1.
g1 controls the degree to which rare token embeddings move away from non-rare feature vectors whose targets differ from each rare token embedding. Also, each component of g1 is calculated based on the rarity of each rare token, ak, so gradient gating for part (b) of Eq. 4 is adaptive for each rare token.);

calculate, using a second gate tensor application, a second gate tensor on a second gradient part, wherein the second gradient part is configured to push the embedding vectors of the second rare tokens away from feature vectors having the rare target tokens, with a smaller number of occurrences than the non-rare target tokens, when applied to training (¶ p4 col 2 last para: Part (c) [second gradient part] pushes away wr from the feature vectors whose target tokens are rare.); and

control a degree of the pushing (¶ p5 Sec 4.2 para 1: To solely control the gradient for rare token embeddings, we introduce a gradient gating method for a parameter x),

wherein the second gate tensor application is configured to keep a scale of the second gradient part from dropping below a reference value by calculating the second gate tensor on the second gradient part, and configured to increase the degree of the pushing relative to an original degree before the scale of the second gradient part is reduced (¶ p6 col 1 para 1: For the very rare token embeddings, part (c) of the gradient about embeddings pushes them away from the feature vectors whose targets are less rare tokens that are relatively frequent compared to them. This means that part (c) acts like part (b) in the above situation, which becomes the cause of the degeneration problem. Therefore, we need to handle part (c) of Eq. 4 for very rare tokens. To address part (c) of Eq. 4 for the very rare token embeddings, we introduce another gate vector g2: g2k = min(ak/ār, 1) if vk ∈ Vr, 1 otherwise (Eq. 8), where g2k is the k-th component of g2 and ār is the mean of ar where r ∈ Vr.
g2 controls the degree to which very rare token embeddings move away from less rare feature vectors whose targets differ from each very rare token embedding.)

Examiner Note: The min function prevents the gate value from dropping below the reference value of 1, and the ratio ak/ār maps to the pushing relative to an original degree. Where ak > ār, g2k will never drop below 1, which is the second argument of the min function. The min function only starts to kick in when vk ∈ Vr; otherwise g2k is constrained to 1 per the second row of Eq. 8, and thus increases the degree of the pushing relative to an original degree, which the examiner maps to the first part of Equation 4 in the reference.

With respect to claims 2 and 10, Yu teaches wherein the memory further stores the calculated number of occurrences of each token of the plurality of tokens (¶ Sec 4.1 para 2: To consider the token appearances in recent batch samples, we introduce the token counter memory that remembers the number of the appearances of each token during the previous K training).

With respect to claims 3 and 11, Yu teaches calculate an average number of occurrences of each token of the plurality of tokens by summing all numbers, stored in the memory, of occurrences of each token of the plurality of tokens, and wherein the instructions, when executed by the one or more processors, cause the gradient control device to group the rare tokens by determining one or more tokens, having an average number of occurrences less than the threshold value, to be the rare tokens (¶ p6 col 1 para 1: We define the two rarity levels based on the average number of appearances of the entire rare tokens: if the token appearance ak is smaller than the mean of ar where r ∈ Vr, the corresponding token is a very rare token.).
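The token counter memory and average-occurrence grouping cited for claims 2/10 and 3/11 above can be sketched in a few lines of Python. This is a rough illustration only, assuming counts are summed over a sliding window of the last K batches; the class and method names (`TokenCounterMemory`, `appearances`, `rare_tokens`, `very_rare_tokens`) are illustrative, not from the reference or the claims:

```python
from collections import Counter, deque

class TokenCounterMemory:
    """Remembers per-token appearance counts over the previous K training steps."""

    def __init__(self, K):
        self.window = deque(maxlen=K)  # one Counter per recent batch

    def update(self, batch_tokens):
        # Record the tokens of the current batch; the oldest batch falls out
        # automatically once more than K steps have been stored.
        self.window.append(Counter(batch_tokens))

    def appearances(self, token):
        # a_k: total appearances of `token` across the stored steps.
        return sum(counts[token] for counts in self.window)

    def rare_tokens(self, vocab, threshold):
        # Rare group: tokens whose recent appearance count is below the threshold.
        return {t for t in vocab if self.appearances(t) < threshold}

    def very_rare_tokens(self, vocab, threshold):
        # Very rare group: rare tokens below the mean appearance count of the
        # rare group (the "mean of a_r where r in V_r" grouping quoted above).
        rare = self.rare_tokens(vocab, threshold)
        if not rare:
            return set()
        mean_rare = sum(self.appearances(t) for t in rare) / len(rare)
        return {t for t in rare if self.appearances(t) < mean_rare}
```

A token seen often in recent batches stays out of the rare group, while a token absent from the window lands in the very rare group. If the claimed "average number of occurrences" is a per-step mean rather than this sum, dividing both the counts and the threshold by K yields the same grouping.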
With respect to claims 5 and 13, Yu teaches calculate, using a first gate tensor application, a first gate tensor on a first gradient part, wherein the first gradient part is configured to push the embedding vectors of the grouped rare tokens away from the feature vectors having the non-rare target tokens when applied to training, among the gradients of the loss function (¶ p5 col 1 last para: the degeneration problem could be solved to a large extent by mainly addressing the part of the gradient for rare embeddings that pushes away rare token embeddings from non-rare feature vectors; ¶ p4 Sec 3.3 para 1: With T context feature vectors hi (i ∈ [1, T]) from the training sample, the negative log-likelihood loss gradient for the rare token embedding wr is calculated as follows… We divide the gradient for wr into 3 parts in Eq. 4. Part (a) pulls wr close to the feature vectors whose target tokens are vr. Part (b) pushes away wr from the feature vectors whose target tokens are not rare. Part (c) pushes away wr from the feature vectors whose target tokens are rare. As an extension of the analysis in the previous subsection, we freeze these parts of the gradient with various settings during training to identify the key cause of the degeneration problem. In other words, depending on the settings, the specific gradient parts that will not be used for embedding training are detached from the computation graph during the training stage; ¶ p5 Sec 4.2 para 1: where xgated is a new parameter whose value is the same as x, and g is a gate tensor.);

With respect to claim 6, Yu teaches reduce, using the first gate tensor application, a scale of the first gradient part according to a reference value by calculating the first gate tensor on the first gradient part (¶ p5 Sec 4.2 para 1: where xgated is a new parameter whose value is the same as x, and g is a gate tensor.
When the xgated is fed to the function f(·) as input, the gradient for x is gated by g; ¶ p5 Sec 4.2 para 2: To address part (b) of Eq. 4, given a context feature vector of the i-th position hi, we introduce a gate vector g1 ∈ R^N as follows: g1k = ak/K if vk ∈ Vr and vk ≠ yi, 1 otherwise (Eq. 7), where g1k denotes a k-th component of g1. g1 controls the degree to which rare token embeddings move away from non-rare feature vectors whose targets differ from each rare token embedding. Also, each component of g1 is calculated based on the rarity of each rare token, ak, so gradient gating for part (b) of Eq. 4 is adaptive for each rare token); and

reduce the degree of the pushing (¶ p5 Sec 4.2 para 1: where g1k denotes a k-th component of g1. g1 controls the degree to which rare token embeddings move away from non-rare feature vectors whose targets differ from each rare token embedding.).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675.
The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/
Examiner, Art Unit 2657
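The gating mechanics the rejection relies on, Eq. 7 (g1), Eq. 8 (g2), and the detach()-based gating, reduce to a few lines of arithmetic. A minimal Python sketch follows; the function names are illustrative, the equations are transcribed from the passages quoted above, and the last function mimics numerically what PyTorch's x.detach() achieves inside g * x + (1 - g) * x.detach():

```python
def g1_component(a_k, K, is_rare, is_target):
    # Eq. 7: for a rare, non-target token, scale the push away from
    # non-rare feature vectors by its rarity a_k / K; otherwise leave it at 1.
    return a_k / K if (is_rare and not is_target) else 1.0

def g2_component(a_k, mean_rare, is_rare):
    # Eq. 8: for a rare token, gate by min(a_k / mean_rare, 1). Once a_k
    # exceeds the rare-group mean, the min pins the gate at the reference
    # value 1 (the examiner's point); for non-rare tokens the gate is 1.
    return min(a_k / mean_rare, 1.0) if is_rare else 1.0

def gated(x, g, x_detached):
    # Gradient gating: the forward value equals x (g*x + (1-g)*x == x),
    # but only the g*x term carries gradient, so d(gated)/dx == g when
    # the detached copy is held fixed.
    return g * x + (1.0 - g) * x_detached
```

For example, a rare non-target token seen twice over K = 10 steps gets g1 = 0.2, and a rare token whose count exceeds the rare-group mean gets g2 = 1, matching the examiner's reading that, once ak exceeds ār, g2k is held at the reference value of 1.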

Prosecution Timeline

Feb 26, 2024
Application Filed
Sep 18, 2025
Non-Final Rejection — §102
Dec 23, 2025
Response Filed
Jan 16, 2026
Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596882
COMPLIANCE DETECTION USING NATURAL LANGUAGE PROCESSING
2y 5m to grant · Granted Apr 07, 2026
Patent 12586563
Method, System and Apparatus for Understanding and Generating Human Conversational Cues
2y 5m to grant · Granted Mar 24, 2026
Patent 12579173
SYSTEMS AND METHODS FOR DYNAMICALLY PROVIDING INTELLIGENT RESPONSES
2y 5m to grant · Granted Mar 17, 2026
Patent 12566921
GAZETTEER INTEGRATION FOR NEURAL NAMED ENTITY RECOGNITION
2y 5m to grant · Granted Mar 03, 2026
Patent 12547844
INTELLIGENT MODEL SELECTION SYSTEM FOR STYLE-SPECIFIC DIGITAL CONTENT GENERATION
2y 5m to grant · Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 90%
With Interview: 99% (+17.0%)
Median Time to Grant: 2y 8m
PTA Risk: Moderate
Based on 154 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month