Prosecution Insights
Last updated: April 18, 2026
Application No. 18/372,900

Scalable Feature Selection Via Sparse Learnable Masks

Non-Final OA §103
Filed
Sep 26, 2023
Examiner
VAUGHAN, MICHAEL R
Art Unit
2431
Tech Center
2400 — Computer Networks
Assignee
Google LLC
OA Round
1 (Non-Final)
Grant Probability: 78% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 78% (626 granted / 799 resolved; +20.3% vs TC avg, above average)
Interview Lift: +31.1% among resolved cases with interview
Typical Timeline: 3y 0m avg prosecution; 23 applications currently pending
Career History: 822 total applications across all art units
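The headline allow rate follows directly from the career counts above (a trivial check, nothing more):

```python
# Career allow rate: grants over resolved cases
granted, resolved = 626, 799
allow_rate = round(granted / resolved * 100, 1)
print(allow_rate)  # 78.3, displayed as 78% in the summary
```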

Statute-Specific Performance

§101: 16.3% (-23.7% vs TC avg)
§103: 35.5% (-4.5% vs TC avg)
§102: 23.2% (-16.8% vs TC avg)
§112: 19.2% (-20.8% vs TC avg)
Tech Center averages are estimates • Based on career data from 799 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

The instant application having Application No. 18/372,900 is presented for examination by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5-7, 11-16, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication 2021/0034977 to Arik et al. (hereinafter Arik) in view of U.S. Patent Application Publication 2017/0061328 to Majumdar et al. (hereinafter Majumdar) and further in view of the NPL entitled "Learning to Explain: An Information-Theoretic Perspective on Model Interpretation," published in 2018 by Chen et al. (hereinafter Chen).
As per claims 1, 13, and 20, Arik teaches a method for training a machine learning model with scalable feature selection, comprising: receiving, by one or more processors, a plurality of features for training the machine learning model [receives a set of features 206; 0030]; initializing, by the one or more processors, a learnable mask vector representing the plurality of features [the feature mask 214 is a learnable mask for soft selection of salient features 206; in some examples, the feature mask 214 uses sparse selection (referred to as a sparse mask) to select the most salient features 206 (e.g., shown as relevant features 206R); 0034]; generating, by the one or more processors, a sparse mask vector from the learnable mask vector [the feature mask 214 uses sparse selection; to obtain a sparse mask 214, the attentive transformer 212 may use sparsemax normalization; 0034 and 0035]; selecting, by the one or more processors, a selected set of features of the plurality of features based on the sparse mask vector and the number of features to be selected [selects the salient features 206 from the plurality of features 206 that correspond to the desired decision output to form a subset (0033), and sparsemax generates non-zero probabilities for only the relevant features 206R of the subset 216 (0035)]; and updating, by the one or more processors, the learnable mask vector based on the mutual information based error [the attentive transformer 212 obtains the trainable mask by using a trainable function, and sparsity realization may then be combined with the overall loss; 0035].
Arik is silent in explicitly teaching (i) receiving, by the one or more processors, a number of features to be selected, the selected features being based on the number of features to be selected, and (ii) computing, by the one or more processors, a mutual information based error based on the selected set of features being input into the machine learning model, and updating the learnable mask vector based on the mutual information based error.

In regards to (i), Arik already teaches that feature selection generally refers to a process of selecting a subset of features from a larger pool of features (0020). Majumdar teaches receiving a number of features to be selected (0061): the top K data elements having the highest value may be kept. Arik teaches that the feature selection should choose the most useful features; Majumdar is essentially selecting the top K most useful features. Thus, the feature selection of Arik could have chosen a given number of the most useful features with predictable results. The claim is obvious because one of ordinary skill in the art can combine methods known before the effective filing date which produce predictable results.

In regards to (ii), Chen teaches a mutual information based error based on the selected set of features being input into the machine learning model, and updating the learnable mask vector based on the mutual information based error [maximize the mutual information between the selected subset of features and the response variable with respect to the instancewise feature selector, §1; we aim to maximize the mutual information between the response variable from the model and the selected features, as a function of the choice of selection rule, §2]. Chen teaches computing an information-theoretic error by evaluating the expected prediction penalty E[log 1/P_m(Y | x_S)] for the selected feature subset, where minimizing that quantity corresponds to maximizing I(X_S; Y).
The error is the expected penalty for how poorly the selected feature subset predicts the output. Arik already teaches selecting sparse feature subsets with sparse feature masks that improve efficiency; Chen teaches a selection criterion based on how informative the chosen subset is about the output. Implementing Chen's selection criterion into the feature selection of Arik would improve the quality of the selected subsets. The claim is obvious because one of ordinary skill in the art can combine methods known before the effective filing date which produce predictable results.

As per claim 5, Arik teaches that training the machine learning model further comprises gradient-descent based learning (0021). As per claims 6 and 15, Arik teaches removing non-selected features of the plurality of features [set to zero; 0035]. As per claims 7 and 16, Arik teaches applying a sparsemax normalization to the learnable mask vector (0035). As per claims 11 and 19, the combined system of Arik and Chen teaches that computing the mutual information based error is based on maximizing mutual information between a distribution of the selected set of features and a distribution of labels for the selected set of features [Chen: §2.2]. As per claim 12, the combined system of Arik and Chen teaches that updating the learnable mask vector is based on minimizing the mutual information based error [Chen minimizes the information based error by maximizing MI; §2.2].

Claims 2-4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Arik, Majumdar, and Chen as applied to claims 1 and 13 above, and further in view of U.S. Patent Application Publication 2023/0351190 to Mishra et al. (hereinafter Mishra). As per claim 2, Arik, Majumdar, and Chen are silent in explicitly teaching receiving a total number of training steps. On the other hand, Mishra teaches receiving, by the one or more processors, a total number of training steps (0081).
As in the combination of Arik, Majumdar, and Chen, where the number of features to be selected is received, Mishra teaches that the number of steps can be specified. This gives greater control over the training because Mishra says that the number of training steps is dependent on model performance. The claim is obvious because one of ordinary skill in the art can combine methods known before the effective filing date which produce predictable results.

As per claim 3, the combined system of Arik, Majumdar, Chen, and Mishra teaches that the receiving, generating, selecting, computing, and updating [as taught by Arik, Majumdar, and Chen, above] is iterative for the total number of training steps [Mishra: training proceeds in specified steps that iteratively, via an index number, progress to the next step; 0081]. As per claim 4, Arik teaches that the learnable mask vector updated after the total number of training steps comprises a final selected set of features to be utilized by the machine learning model (0035). As per claim 14, it is rejected for the same reasons as claims 3 and 4.

Allowable Subject Matter

Claims 8-10, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and all intervening claims.

Conclusion

The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed on the enclosed PTO-892 form. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL R. VAUGHAN, whose telephone number is (571) 270-7316. The examiner can normally be reached Monday - Friday, 9:30am - 5:30pm, EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Lynn Feild, can be reached at (571) 272-2092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MICHAEL R VAUGHAN/
Primary Examiner, Art Unit 2431

Prosecution Timeline

Sep 26, 2023
Application Filed
Mar 31, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12598464
POLICIES RELATED TO NON-PUBLIC NETWORKS
2y 5m to grant • Granted Apr 07, 2026
Patent 12580933
CORRELATING FIREWALL AND ZERO TRUST DATA TO MONITOR REMOTE AND HYBRID WORKER SESSIONS
2y 5m to grant • Granted Mar 17, 2026
Patent 12561488
SYSTEMS AND METHODS FOR CONTEXTUAL ACTIVATION OF ONLOOKER DETECTION
2y 5m to grant • Granted Feb 24, 2026
Patent 12563100
RESOURCE-MONITORING TELEMETRY IN A ZERO-TRUST COMPUTING ENVIRONMENT
2y 5m to grant • Granted Feb 24, 2026
Patent 12556587
SYSTEM AND METHOD FOR MANAGING SECURITY MODELS THROUGH SCENARIO GENERATION AND EVALUATION
2y 5m to grant • Granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 78%
With Interview: 99% (+31.1%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 799 resolved cases by this examiner. Grant probability derived from career allow rate.
