DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner Note
Examiner strongly suggests conducting an interview to help expedite or advance prosecution by discussing the limitations below. Examiner thanks the applicant for their patience and cooperation and hopes to discuss the instant application further as soon as possible.
Response to Amendment
The amendment filed on 02/24/2026 has been entered. Claims 1, 5-7, 10 and 12-17 have been amended. No claims have been canceled. Claims 1-20 remain pending in the application.
Response to Arguments
Regarding Applicant’s arguments on pages 8-13 of the remarks filed on 02/24/2026, directed to the newly amended limitations of independent claim 1: “training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter, to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human responses; and provisioning the second model in an enterprise network to perform the phishing detection task on extracted text of email bodies for emails associated with the enterprise network.”, the arguments are not persuasive.
Applicant argues on Page 8 that “Perumalla is directed to "Techniques for machine learning (ML) model adaptation via segment replacement and student-teacher training." According to Perumalla, "when there are one or more characteristics of the source ML model 120A that are not supported, at circle (4) the model optimizer 108 can determine substitutions that it can use to 'adapt' the source ML model 120A into an adapted ML model 122 that can be executed by the edge device(s)."2 As described by Perumalla, "The model optimizer 108 may then construct an adapted ML model 122 to be effectively the same as the source ML model 120A but with problematic source segments (corresponding to the source characteristics) replaced with corresponding replacement segments (corresponding to the replacement characteristics)."3 In this context, Perumalla discloses that student-teacher training procedures, as is known to those of skill in the art, are typically used to "compress" a large machine learning model into a smaller one. However, embodiments disclosed herein can adapt the technique to instead be used to train an adapted model-which may be of similar size/complexity, may already have various model weights set, etc.- to effectively 'smooth' out any differences in the two models, to thereby cause the adapted model to perform as similar as possible to the teacher model.” Examiner states on the record that the remarks are a subjective and brief summary of Perumalla as it pertains to applicant’s claims.
Applicant argues on Page 9 “However "construct[ing] an adapted ML model 122 to be effectively the same as the source ML model 120A but with problematic source segments [replaced]," as described by Perumalla is different than "training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter, to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human response" as recited in claim 1”. Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that Kutt was relied upon to meet applicant’s argued limitation(s) of “training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses”. Kutt supports the above limitation at Par. (0100), (0096), (0099), (0120), and (0125).
Applicant argues on Page 9 “"[Constructing] an adapted ML model to provoke human response" as recited in claim 1. "[Constructing] an adapted ML model 122 to be effectively the same as the source ML model 120A but with problematic source segments [replaced]," as described by Perumalla is also different than "training the second model, which further compresses the first model by replacing at least one of the plurality of trained layers with the reduced-parameter adapter, to perform a phishing detection task in a natural language classification context based on a body and text of an email communication to provide a trained second model" as recited in claim 10.” Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that Kutt was relied upon to meet applicant’s argued limitation(s) of “training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses”. Kutt supports the above limitation at Par. (0100), (0096), (0099), (0120), and (0125).
Applicant argues on Page 9 “[Constructing] an adapted ML model 122 to be effectively the same as the source ML model 120A but with problematic source segments [replaced]," as described by Perumalla is also different than "training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses" as recited in claim 17”. Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that Kutt was relied upon to meet applicant’s argued limitation(s) of “training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses”. Kutt supports the above limitation at Par. (0100), (0096), (0099), (0120), and (0125).
Applicant argues on Page 10 “Kutt does not fill the void left by Perumalla. Kutt is directed to "Techniques for building multi-representational learning models for static analysis of source."5 In an embodiment, Kutt describes a "full model with fully connected layers for performing multi-representational learning applied to malware classification in accordance with some embodiments. In an example implementation, this model architecture can be trained end-to-end on a single training/validation split."6 In this context, as described by Kutt: When we substitute the final fully connected layers for an ensemble algorithm, we split our data in half to train our feature extraction and classification code separately to avoid biasing the ensemble. As such, in this example implementation, one half of the training data is used to train all of the CNN feature extractors independently with a single linear decision boundary layer at the end. In this way, each CNN is tasked with discovering linearly separating features independent of everything else”. Examiner states that applicant’s arguments are not persuasive, as the above remarks appear to be a subjective understanding of Kutt. Examiner notes on the record that the remarks are a subjective and brief summary of Kutt as it pertains to applicant’s claims.
Applicant argues on Page 10 “However, splitting "data in half" so that "each CNN is tasked with discovering linearly separating features independent of everything else," as described by Kutt is different than "training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter, to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human response" as recited in claim 1.” Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Kutt teaches at Par. (0100) training models in a model architecture by replacing or substituting connected layers that are reduced in parameter by being split in half and trained with small batch sizes. Kutt further teaches at Par. (0047-0050) that phishing detection, or checking for malware within the body of an email, is performed by identifying a website link or download within the email along with an email attachment. Kutt discloses at Par. (0050) that an action is performed once the malicious email is identified. Kutt supports the above limitation at Par. (0100), (0096), (0099), (0120), and (0125).
Applicant further argues on Page 11 “Wang does not fill the void left by Perumalla and Kutt. Wang is directed to "Selecting annotations for training images using a neural network."8 In this context, Wang discloses: In at least one embodiment, model training 3514 may include retraining or updating an initial model 3904 (e.g., a pre-trained model) using new training data (e.g., new input data, such as customer dataset 3906, and/or new ground truth data associated with input data). In at least one embodiment, to retrain, or update, initial model 3904, output or loss layer(s) of initial model 3904 may be reset, or deleted, and/or replaced with an updated or new output or loss layer(s).” Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that the above remarks appear to be a subjective understanding of Wang. Examiner notes on the record that the remarks are a subjective and brief summary of Wang as it pertains to applicant’s claims.
Applicant further argues on Page 11 “However, "retraining or updating an initial model 3904 (e.g., a pre-trained model) using new training data" in which "output or loss layer(s) of initial model 3904 may be reset, or deleted, and/or replaced with an updated or new output or loss layer" as described by Wang is different than "training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter, to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human response" as recited in claim 1.” Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that claim 1 was rejected over Perumalla in view of Kutt, and notes this on the record. Claim 14 notes Wang, but Wang is not used by the Examiner to meet any of applicant’s Claim 14 limitation(s).
Applicant further argues on Page 11 “[R]etraining or updating an initial model 3904 (e.g., a pre-trained model) using new training data" in which "output or loss layer(s) of initial model 3904 may be reset, or deleted, and/or replaced with an updated or new output or loss layer" as described by Wang is also different than "training the second model, which further compresses the first model by replacing at least one of the plurality of trained layers with the reduced-parameter adapter, to perform a phishing detection task in a natural language classification context based on a body and text of an email communication to provide a trained second model" as recited in claim 10.” Applicant’s interpretation of the reference has been noted; however, Examiner respectfully disagrees. Examiner states that Wang was not cited for any limitation(s) in claim 10. Examiner notes on the record that Kutt was relied upon to meet applicant’s argued limitation(s) of “training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses”. Furthermore, Examiner notes on the record that Wang et al. was used to meet the following limitation in claim 17: “adding an untrained classifier” (Par. (0123)), and that Kutt was relied upon to meet applicant’s argued limitation(s) of “training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with a reduced-parameter adapter, to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses”. Therefore, the rejection is maintained.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 7, 10, 11, 13, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S No. 12353971, hereinafter referred to as “Perumalla”) further in view of Kutt et al. (U.S Pub. No. 20210240826, hereinafter referred to as “Kutt”).
In regards to Claim 1, Perumalla teaches a computer program product comprising computer executable code stored in a non-transitory computer readable medium that, when executing on one or more computing devices, performs steps of: (Figure 9 labels 910A, 920, 900; device with processor and memory)
training a teacher network including a first plurality of transformer layers to perform natural language processing using a large-scale natural language data set; (Col. 2 lines 5-30, Col. 5 lines 15-67 and Col. 6 lines 1-6; teacher student network of models with plurality of ML layers)
generating a second model, the second model further compressing the compressed model by removing at least one of the second plurality of transformer layers from the student network and (Col. 9 lines 30-67; creating second ML model by replacing/substituting first layer corresponding to teacher and student model), (Col. 7 lines 25-67; compressing the large machine learning model associated with teacher and student by replacing layers), (Col. 11 lines 54-67; deleting layers in ML model)
replacing the at least one of the second plurality of transformer layers of the student network with a reduced-parameter adapter in place of the at least one of the second plurality of transformer layers, the adapter configured for performing natural language processing, (Col. 7 lines 25-67; replacing of teacher student layers to compress ML model), (Col. 9 lines 30-67; replacing of layers of teacher student network by substituting layers), (Figure 2 labels 204, 224; a reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL)), (Col. 6 lines 45-67; size of replacement layers with characteristic parameters is smaller)
wherein the adapter includes a parameter size smaller than the at least one of the second plurality of transformer layers replaced by the adapter; (Col. 7 lines 25-67; compressing the ML model to a smaller size corresponding to layers of teacher and student ML model), (Figure 2 labels 204, 224; parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL), (Col. 6 lines 45-67; size of replacement layers with characteristic parameters is smaller)
Perumalla does not explicitly teach training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human responses; and provisioning the second model in an enterprise network to perform the phishing detection task on extracted text of email bodies for emails associated with the enterprise network.
Wherein Kutt teaches training the second model, which further compresses the student network compressed from the teacher network by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter (Par. (0100); training the second model (training models in model architecture) by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter (by substituting connected layers that are split in half with small batch sizes)), (Par. (0096); “These MRL model building operations can be implemented using open source tools, such as SKLearn (e.g., available at https://pypi.org/project/sklearn/). At 628, the above system processing can be performed on fully connected layers (training end-to-end) as shown and/or training an independent ensemble algorithm on top of the learned CNN features can also be performed (e.g., implemented using an open source ensemble tool, such as XGBoost, available at https://github.com/dmlc/xgboost)”), (Par. (0099); “compression operation as shown at 804,”), (Par. (0120); “generating an MRL model for static classification of source code as malware or benign can be implemented to scale down to a computing environment with less resources than would typically be available in a cloud computing/server environment (e.g., limit number of source code representations therefore reducing the total number of models contained inside the MRL instance, set a hard threshold on length of source code representation sequences sent to CNNs to reduce the total amount of memory used and computation performed, use a higher degree of sequence compression in the CNNs to reduce the total amount of memory used and computation performed”), (Par. (0125); “generated (e.g., and/or periodically updated/replaced) based on training and validation data”)
to perform a phishing detection task by fine-tuning the second model with a labelled target dataset specific to phishing detection based on text of a body of an email communication selected to provoke human responses; and (Par. (0047-0048); a phishing detection task by fine-tuning the second model (detecting malware in email of set of machine learning models) with a labelled target dataset specific to phishing detection based on text of a body of an email (body of email (email attachment) with malware that is detected) an email communication selected to provoke human responses (detected malware in email attachment that leads to clicking attachment website or download to malicious IP address)), (Par. (0102); “We can see in FIG. 10A that the learned features are able to linearly separate our classes (e.g., the MRL model is specifically tuned to separate malicious and benign classes of source code malware,”), (Par. (0141); “various MRL models for one or more programming/scripting language can be built using open source or other tools, and as applicable, performing hyperparameter tuning as described above, which can, for example, be tuned for efficiently performing these MRL models for static source code classification to be performed/executed on various computing environments”), (Par. (0085); “system/process/computer program product for multi-representational learning applied to malware classification includes receiving training data, wherein the training data includes a set of source code files for training a multi-representational learning (MRL) model for classifying malicious source code and benign source code based on a static analysis;”), (Par. (0088); “train a supervised statistical model to automatically extract learned features, and perform classification on those features, from three different representations of the JavaScript source code at different levels of abstraction: (1) a stream of characters; (2) a stream of tokens; and (3) an Abstract Syntax Tree (AST). The disclosed statistical model can produce a malicious class score on unseen JavaScript files with unknown class labels. A threshold on the malicious class score is imposed to predict class membership and, equivalently, maliciousness”)
provisioning the second model in an enterprise network to perform the phishing detection task on extracted text of email bodies for emails associated with the enterprise network. (Par. (0047) and (0050); action performed on malicious email with attachment and website link in email)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt to utilize the above feature because of the analogous concept of natural language processing, with the motivation of compressing models to save space and processing time for users. By removing layers, the trained model can be made more effective and efficient, training and machine learning can be tailored to more useful areas, and risks or vulnerabilities associated with outdated layers can be mitigated. (Kutt Par. (0002) and (0100))
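For illustration only, and not drawn from either reference, the parameter-size relationship recited in claim 1 (an adapter having a smaller parameter size than the transformer layer it replaces) can be sketched with simple arithmetic. The sketch assumes a standard transformer encoder layer and a bottleneck-style adapter made of a down-projection and an up-projection; the function names and the dimensions 768, 3072, and 64 are hypothetical and do not appear in the application, Perumalla, or Kutt.

```python
def transformer_layer_params(d_model: int, d_ff: int) -> int:
    """Count weights and biases in one transformer encoder layer (assumed standard layout)."""
    attention = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)  # scale and shift for two layer norms
    return attention + feed_forward + layer_norms


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Count weights and biases in a bottleneck adapter (assumed down- and up-projection)."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


# Hypothetical dimensions chosen only for illustration.
layer = transformer_layer_params(d_model=768, d_ff=3072)
adapter = adapter_params(d_model=768, bottleneck=64)
print(f"transformer layer: {layer:,} parameters")
print(f"adapter:           {adapter:,} parameters")
print(f"adapter is smaller: {adapter < layer}")
```

Under these assumed dimensions the adapter holds roughly two orders of magnitude fewer parameters than the layer it would replace, which is the kind of size relationship the limitation describes.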
In regards to Claim 10, Perumalla teaches a method, comprising: training a first model to perform a natural language processing task to form a plurality of trained layers; (Col. 2 lines 5-30, Col. 5 lines 15-67 and Col. 6 lines 1-6; teacher student network of models with plurality of ML layers)
generating a second model, the second model further compressing the first model by removing at least one of the plurality of trained layers from the first model and (Col. 9 lines 30-67; creating second ML model by replacing/substituting first layer corresponding to teacher and student model), (Col. 7 lines 25-67; compressing the large machine learning model associated with teacher and student by replacing layers), (Col. 11 lines 54-67; deleting layers in ML model)
replacing the at least one of the plurality of trained layers in the first model with an reduced-parameter adapter in place of the at least one of the plurality of trained layers, (Col. 7 lines 25-67; replacing of teacher student layers to compress ML model), (Col. 9 lines 30-67; replacing of layers of teacher student network by substituting layers), (Figure 2 labels 204, 224; reduced parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
the reduced-parameter adapter configured for performing natural language processing, wherein the reduced parameter adapter includes a parameter size smaller than the at least one of the plurality of trained layers replaced by the reduced-parameter adapter, and (Col. 7 lines 25-67; compressing the ML model to a smaller size corresponding to layers of teacher and student ML model), (Figure 2 labels 204, 224; parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL), (Col. 6 lines 45-67; size of replacement layers with characteristic parameters is smaller)
a residual connector; (Col. 5 lines 46-65; connected layers between ML models and adapters)
Perumalla does not explicitly teach training the second model, which further compresses the first model by replacing at least one of the plurality of trained layers with the reduced-parameter adapter to perform a phishing detection task in a natural language classification context based on a body and text of an email communication to provide a trained second model; and provisioning the trained second model in a system to perform the phishing detection task on extracted text of email bodies for emails.
Wherein Kutt teaches training the second model, which further compresses the first model by replacing at least one of the plurality of trained layers with the reduced-parameter adapter (Par. (0100); training the second model (training models in model architecture) by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter (by substituting connected layers that are split in half with small batch sizes))
to perform a phishing detection task in a natural language classification context based on a body and text of an email communication to provide a trained second model; and (Par. (0047-0048); a phishing detection task by fine-tuning the second model (detecting malware in email of set of machine learning models) with a labelled target dataset specific to phishing detection based on text of a body of an email (body of email (email attachment) with malware that is detected) an email communication selected to provoke human responses (detected malware in email attachment that leads to clicking attachment website or download to malicious IP address))
provisioning the trained second model in a system to perform the phishing detection task on extracted text of email bodies for emails. (Par. (0047) and (0050); action performed on malicious email with attachment and website link in email)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt to utilize the above feature because of the analogous concept of natural language processing, with the motivation of compressing models to save space and processing time for users. By removing layers, the trained model can be made more effective and efficient, training and machine learning can be tailored to more useful areas, and risks or vulnerabilities associated with outdated layers can be mitigated. (Kutt Par. (0002) and (0100))
In regards to Claim 11, the combination of Perumalla and Kutt teaches the method of claim 10. Kutt further teaches using the trained second model in the system to classify malicious communications. (Par. (0047-0050); determining whether an email is malicious based on an attached website link or attachment, and classifying malicious content using an ML model)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt for the reasons discussed in independent claim 1 stated above.
In regards to Claim 15, the combination of Perumalla and Kutt teaches the method of claim 10. Kutt further teaches wherein training the second model to perform the phishing detection task comprises training the second model using labeled email data. (Par. (0047-0050); determining whether an email is malicious based on an attached website link or attachment labeled in the email)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt for the reasons discussed in independent claim 1 stated above.
In regards to Claim 16, the combination of Perumalla and Kutt teaches the method of claim 10. Kutt further teaches providing message header features of the email communication to the trained second model including one or more of: (Par. (0047-0050); determining whether an email is malicious based on an attached website link or attachment)
a first indication of whether a first domain of a sender matches a second domain of a receiver; (Par. (0047-0050); checking the domain and IP address of the sender and recipient of an email for malware using an ML model)
a second indication of whether the first domain of the sender matches a reply-to address;
a first number of recipients in a 'To' field; and a second number of recipients in a 'CC' field.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt for the reasons discussed in independent claim 1 stated above.
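For illustration only, the message header features recited in claim 16 can be sketched as follows. This is a minimal sketch using Python's standard `email` package; the function names, the dictionary keys, and the choice to compare the sender's domain against every domain in the 'To' field are assumptions made for the example and are not taken from the application or from Kutt.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def _domain(address: str) -> str:
    """Return the lower-cased part of an address after the last '@'."""
    return address.rpartition("@")[2].lower()


def header_features(raw_email: str) -> dict:
    """Extract the four header features listed in claim 16 (hypothetical key names)."""
    msg = message_from_string(raw_email)

    sender_domain = _domain(parseaddr(msg.get("From", ""))[1])
    reply_to = parseaddr(msg.get("Reply-To", ""))[1]

    # getaddresses handles comma-separated lists and repeated header fields.
    to_addrs = [addr for _, addr in getaddresses(msg.get_all("To", [])) if addr]
    cc_addrs = [addr for _, addr in getaddresses(msg.get_all("Cc", [])) if addr]
    receiver_domains = {_domain(addr) for addr in to_addrs}

    return {
        # Assumption: a match with any 'To' recipient counts as a match.
        "sender_matches_receiver": bool(sender_domain) and sender_domain in receiver_domains,
        "sender_matches_reply_to": bool(reply_to) and _domain(reply_to) == sender_domain,
        "num_to": len(to_addrs),
        "num_cc": len(cc_addrs),
    }


# Hypothetical message used only to exercise the function.
sample = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, carol@other.org\n"
    "Cc: dave@example.com\n"
    "Reply-To: attacker@evil.test\n"
    "Subject: Urgent request\n"
    "\n"
    "Please review the attached invoice.\n"
)
features = header_features(sample)
print(features)
```

In a pipeline of the kind the claim describes, a dictionary like this would be supplied to the trained model alongside the tokenized body text; how the features are encoded for the model is left open here.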
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S Pub. No. 20210240826, hereinafter referred to as “Kutt”) further in view of Lai et al. (U.S Pub. No. 20210182662, hereinafter referred to as “Lai”).
In regards to Claim 2, the combination of Perumalla and Kutt does not explicitly teach wherein the natural language processing includes next sentence prediction.
Wherein Lai teaches wherein the natural language processing includes next sentence prediction. (Par. (0071-0072); next sentence prediction)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Lai to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of using next sentence prediction to enhance the machine learning network and achieve a wide range of results within the available storage capacity and space, promoting high efficiency. (Lai Par. (0002))
Claims 3 and 14, is/are rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S Pub. No. 20210240826, hereinafter referred to as “Kutt”) further in view of Wagner et al. (U.S Pub. No. 20210365723, hereinafter referred to as “Wagner”)
In regards to Claim 3, the combination of Perumalla and Kutt do not explicitly teach wherein the natural language processing includes masked word prediction.
Wherein Wagner teaches wherein the natural language processing includes masked word prediction. (Par. (0021) ”; masked word prediction (sequence of words in a sentence corresponding to masked position values)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Wagner to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of masking word prediction to prevent susceptibility to harm or risk in the machine learning system. By masking the words users can be assured no predictions exposures or unnecessary risk can be performed by entities without access to the system and in return maintaining the integrity of the system as a whole. (Wagner Par. (0015-0017))
In regards to Claim 14, the combination of Perumalla and Kutt teaches the method of claim 10. Kutt further teaches wherein the phishing detection task includes: extracting words from the body and the text of the email communication; (Par. (0047-0050); determining if email is malicious based on attachment of website or attachment inside email)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla to incorporate the teaching of Kutt to utilize the above feature because of the analogous concept of natural language processing, with the motivation of compressing models to save space and processing time for the users. By removing layers, the training model can be more effective and efficient, tailor the training and machine learning to more useful areas, and mitigate risks or vulnerabilities associated with outdated layers. (Kutt Par. (0002 and 0100))
Perumalla and Kutt do not explicitly teach tokenizing one or more words into sub-word tokens; and providing the sub-word tokens as input to an embedding layer of the second model.
Wherein Wagner teaches tokenizing one or more words into sub-word tokens; and (Par. (0016) “or training transformer models using position masking. In some embodiments, a system receives a set of input data that includes a sequence (e.g., a set of sentences) of tokens (e.g., words) and position values for each token in the sequence of tokens. In some embodiments, a position value represents the relative position of a particular token in a sequence of tokens.”)
providing the sub-word tokens as input to an embedding layer of the second model. (Par. (0040); providing sub-word token (subset of sequence of tokens))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Wagner to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of using sub-word tokens provided to the embedding layer as a criterion for detecting mismatches and irregularities. This adds another layer of verification to trained models by correlating a sequence of tokens as input to distinguish authentic trained models, and allows the machine learning system to produce efficient results. (Wagner Par. (0015-0017))
Claims 4, 7, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”), further in view of Wang et al. (U.S. Pub. No. 20210374547, hereinafter referred to as “Wang”).
In regards to Claim 4, the combination of Perumalla and Kutt do not explicitly teach wherein the teacher network includes a Bidirectional Encoder Representation from Transformers model.
Wherein Wang teaches wherein the teacher network includes a Bidirectional Encoder Representation from Transformers model. (Par. (0066) “embedding generator 204 comprises one or more neural networks that generate text embeddings, such as a word2vec embedding generator, various bidirectional long short-term memory (LSTM) networks, a bidirectional encoder representations from transformers for biomedical text mining (BioBERT) model,”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Wang to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of securely protecting the training of machine learning models by detecting phishing early, before possible compromise of and harm to the models, and of giving users who create the models confidence that appropriate measures and mediation would be put in place based on security detection, in turn yielding highly credible models with effective capabilities and without tampering. (Wang Par. (0055-0057))
In regards to Claim 7, the combination of Perumalla and Kutt teaches the computer program product of claim 1. Perumalla further teaches a reduced-parameter adapter (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
Perumalla and Kutt do not explicitly teach wherein the ... adapter includes an activation function for scaling inputs to outputs.
Wherein Wang teaches wherein the ... adapter includes an activation function for scaling inputs to outputs. (Par. (0529); activation function for scaling inputs to outputs (model training corresponding to scaling data))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Wang to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of securely protecting the training of machine learning models by detecting phishing early, before possible compromise of and harm to the models, and of giving users who create the models confidence that appropriate measures and mediation would be put in place based on security detection, in turn yielding highly credible models with effective capabilities and without tampering. (Wang Par. (0055-0057))
In regards to Claim 13, the combination of Perumalla and Kutt teaches the method of claim 10. Perumalla further teaches a reduced-parameter adapter (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
Perumalla and Kutt do not explicitly teach wherein training the second model comprises modifying parameters in the ... adapter.
Wherein Wang teaches wherein training the second model comprises modifying parameters in the ... adapter. (Par. (0073); modifying parameters in adapter (updated parameter values))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Wang to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of securely protecting the training of machine learning models by detecting phishing early, before possible compromise of and harm to the models, and of giving users who create the models confidence that appropriate measures and mediation would be put in place based on security detection, in turn yielding highly credible models with effective capabilities and without tampering. (Wang Par. (0055-0057))
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”), further in view of Suwelack et al. (U.S. Pub. No. 20210124982, hereinafter referred to as “Suwelack”).
In regards to Claim 5, the combination of Perumalla and Kutt teaches the computer program product of claim 1. Perumalla further teaches a reduced-parameter adapter (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
Perumalla and Kutt do not explicitly teach wherein the ... adapter includes a randomly initialized, trainable adapter block interconnecting two of the second plurality of transformer layers.
Wherein Suwelack teaches wherein the ... adapter includes a randomly initialized, trainable adapter block interconnecting two of the second plurality of transformer layers. (Par. (0110) “One possibility may be that the parameters of the completely connected neuron layers are again randomly initialized before retraining, but the other parameters are taken over from the previous training.”; randomly initialized, interconnecting two transformer layers (connected layers corresponding to randomly initialized))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Suwelack to utilize the above feature because of the analogous concept of neural networks and machine learning, with the motivation of creating a sense of randomness and unpredictability with adapters, weights and layers to further discourage entities from attempting to forge or modify the network. (Suwelack Par. (0004-0009))
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”), Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”) and Wang et al. (U.S. Pub. No. 20210374547, hereinafter referred to as “Wang”), further in view of Colby et al. (U.S. Pub. No. 20200176087, hereinafter referred to as “Colby”).
In regards to Claim 6, the combination of Perumalla and Kutt teaches the computer program product of claim 1. Perumalla further teaches a reduced-parameter adapter (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
Perumalla and Kutt do not explicitly teach wherein the ... adapter includes a fully connected dense layer having a same dimensionality as the second plurality of transformer layers.
Wherein Colby teaches wherein the ... adapter includes a fully connected dense layer having a same dimensionality as the second plurality of transformer layers. (Par. (0032) “The decoder connects directly to the latent dense layer and includes three convolutional ReLU layers with [10, 10, 11] filters and kernel size [9, 9, 10], respectively, as in the encoder portion of the network. Finally, a softmax-activated dense layer, reshaped to match the dimensionality of the one-hot encoded targets, was added to predict final character sequences”; fully connected dense layer (latent dense layer) having same dimensionality (match the dimensionality))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Colby to utilize the above feature because of the analogous concept of natural language processing, with the motivation of implementing layers with the same dimensions in a connected system to create synchronization and uniform criteria for detecting irregularities and possible tampering across different layers, which in turn promotes effective results in the machine learning system. (Colby Par. (0005-0009))
Claims 8 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”), further in view of Levy et al. (U.S. Pub. No. 20190215329, hereinafter referred to as “Levy”).
In regards to Claim 8, the combination of Perumalla and Kutt do not explicitly teach wherein provisioning the second model includes deploying the second model on a threat management facility for the enterprise network.
Wherein Levy teaches wherein provisioning the second model includes deploying the second model on a threat management facility for the enterprise network. (Par. (0018) “While the model 130 is illustrated as associated with the threat management facility 108, it will be appreciated that the model 130 may be deployed at the threat management facility 108, at the firewall 104, at the endpoint 102 (e.g., with the endpoint threat detection 120)”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Levy to utilize the above feature because of the analogous concept of natural language processing, with the motivation of implementing threat management and deployment to further securely protect the natural language processing system and produce authentic results without concerns of vulnerabilities or harm. (Levy Par. (0004))
In regards to Claim 9, the combination of Perumalla and Kutt do not explicitly teach wherein provisioning the second model includes deploying the second model on an endpoint associated with the enterprise network.
Wherein Levy teaches wherein provisioning the second model includes deploying the second model on an endpoint associated with the enterprise network. (Par. (0018) “While the model 130 is illustrated as associated with the threat management facility 108, it will be appreciated that the model 130 may be deployed at the threat management facility 108, at the firewall 104, at the endpoint 102 (e.g., with the endpoint threat detection 120)”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Levy to utilize the above feature because of the analogous concept of natural language processing, with the motivation of implementing threat management and deployment to further securely protect the natural language processing system and produce authentic results without concerns of vulnerabilities or harm. (Levy Par. (0004))
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”) and Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”), further in view of Li et al. (U.S. Pub. No. 20200372345, hereinafter referred to as “Li”).
In regards to Claim 12, the combination of Perumalla and Kutt do not explicitly teach wherein at least some of the trained layers from the first model are not modified during training of the second model.
Wherein Li teaches wherein at least some of the trained layers from the first model are not modified during training of the second model. (Par. (0047); not modified (original model unchanged while new layers are trained))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perumalla and Kutt to incorporate the teaching of Li to utilize the above feature because of the analogous concept for natural language processing, with the motivation of having unmodified trained layers to build consistency in the machine learning system and create more efficiency by having trained data unchanged while new models are changed. This creates a frame of reference to new models by identifying data that is consistent and in return producing more cohesive results. (Li Par. (0006-0009))
Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kutt et al. (U.S. Pub. No. 20210240826, hereinafter referred to as “Kutt”) and Perumalla et al. (U.S. Patent No. 12353971, hereinafter referred to as “Perumalla”), further in view of Wang et al. (U.S. Pub. No. 20210374547, hereinafter referred to as “Wang”).
In regards to Claim 17, Kutt teaches a system, comprising a security classifier comprising computer executable code stored in a memory and executing on a processor of a threat management resource of an enterprise network, the security classifier performing a classification task, and the security classifier generated by performing the steps of: (Par. (0036); threat engine and network scanning for incoming or outgoing emails corresponding to machine learning classification), (Figure 2; CPU and storage), (Figure 4; threat engine detection), (Par. (0050); classification of malware using machine learning models)
training the second model, which further compresses the first model by replacing at least one of the plurality of transformer layers with the reduced-parameter adapter (Par. (0100); training the second model (training models in model architecture) by replacing at least one of the second plurality of transformer layers with the reduced-parameter adapter (by substituting connected layers that are split in half with small batch sizes))
to perform the classification task including phishing detection and malware detection in a natural language classification context based on text of human-readable messages selected to provoke human responses. (Par. (0047-0048); a phishing detection task by fine-tuning the second model (detecting malware in email of set of machine learning models) with a labelled target dataset specific to phishing detection based on text of a body of an email communication (body of email (email attachment) with malware that is detected) selected to provoke human responses (detected malware in email attachment that leads to clicking attachment website or download to malicious IP address))
Kutt does not explicitly teach storing a first model including a plurality of transformer layers configured to perform a natural language processing task; generating a second model, the second model further compressing the first model by removing at least one of the plurality of transformer layers from the first model and replacing the at least one of the plurality of transformer layers in the first model with a reduced-parameter adapter in place of the at least one of the plurality of transformer layers, the reduced-parameter adapter configured for performing natural language processing, wherein the reduced-parameter adapter includes a parameter size smaller than the at least one of the plurality of transformer layers replaced by the reduced-parameter adapter, and adding an untrained classifier; and
Wherein Perumalla teaches storing a first model including a plurality of transformer layers configured to perform a natural language processing task; (Col. 2 lines 5-30, Col. 5 lines 15-67 and Col. 6 lines 1-6; teacher student network of models with plurality of ML layers), (Col. 13 lines 57-67 and Col. 14 lines 1-25; storing of model and model data)
generating a second model, the second model further compressing the first model by removing at least one of the plurality of transformer layers from the first model and (Col. 9 lines 30-67; creating second ML model by replacing/substituting first layer corresponding to teacher and student model), (Col. 7 lines 25-67; compressing the large machine learning model associated with teacher and student by replacing layers), (Col. 11 lines 54-67; deleting layers in ML model)
replacing the at least one of the plurality of transformer layers in the first model with a reduced-parameter adapter in place of the at least one of the plurality of transformer layers, (Col. 7 lines 25-67; replacing of teacher student layers to compress ML model), (Col. 9 lines 30-67; replacing of layers of teacher student network by substituting layers), (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
the reduced-parameter adapter configured for performing natural language processing, wherein the reduced-parameter adapter includes a parameter size smaller than the at least one of the plurality of transformer layers replaced by the reduced-parameter adapter, (Col. 7 lines 25-67; compressing the ML model to a smaller size corresponding to layers of teacher and student ML model), (Col. 6 lines 45-67; size of replacement layers with characteristic parameters is smaller), (Figure 2 labels 204, 224; reduced-parameter adapter (parameter size of label 204; X.ADD, X.AVGPOOL etc. is replaced with replacement layer 224 with smaller size parameters i.e. MATMUL))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kutt to incorporate the teaching of Perumalla to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of replacing layers in machine learning models to promote high effectiveness and create more space for models to adapt and train data without constraints. (Perumalla Par. (0002-0003))
Kutt and Perumalla do not explicitly teach adding an untrained classifier; and
Wherein Wang teaches adding an untrained classifier; and (Par. (0123) “wherein untrained neural network 706 attempts to train itself using unlabeled data. In at least one embodiment, unsupervised learning training dataset 702 will include input data without any associated output data or “ground truth” data. In at least one embodiment, untrained neural network 706 can learn groupings within training dataset 702 and can determine how individual inputs are related to untrained dataset 702.”; untrained classifier (untrained grouping with input data))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kutt and Perumalla to incorporate the teaching of Wang to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of securely protecting the training of machine learning models by detecting phishing early, before possible compromise of and harm to the models, and of giving users who create the models confidence that appropriate measures and mediation would be put in place based on security detection, in turn yielding highly credible models with effective capabilities and without tampering. (Wang Par. (0055-0057))
In regards to Claim 18, the combination of Kutt, Perumalla and Wang teaches the system of claim 17. Kutt further teaches wherein the classification task comprises classification of maliciousness of messages. (Par. (0047-0050); determining if email is malicious based on attachment of website or attachment)
In regards to Claim 19, the combination of Kutt, Perumalla and Wang teach the system of claim 17, Kutt further teaches wherein the classification task comprises identification of phishing email messages. (Par. (0047-0051); classifying malicious content in emails)
In regards to Claim 20, the combination of Kutt, Perumalla and Wang teach the system of claim 17, Wang further teaches wherein the first model includes a Bidirectional Encoder Representation from Transformers model. (Par. (0066) “embedding generator 204 comprises one or more neural networks that generate text embeddings, such as a word2vec embedding generator, various bidirectional long short-term memory (LSTM) networks, a bidirectional encoder representations from transformers for biomedical text mining (BioBERT) model,”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kutt and Perumalla to incorporate the teaching of Wang to utilize the above feature because of the analogous concept of machine learning and natural language processing, with the motivation of utilizing a Bidirectional Encoder Representation in the model to produce more efficient results from trained data, as well as securely protecting the training of machine learning models by detecting phishing early, before possible compromise of and harm to the models, and of giving users who create the models confidence that appropriate measures and mediation would be put in place based on security detection, in turn yielding highly credible models with effective capabilities and without tampering. (Wang Par. (0055-0057))
Relevant Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Thomas; Andrew J. (U.S Pub. No. 20220014522) “FEDERATED SECURITY FOR MULTI-ENTERPRISE COMMUNICATIONS”. Considered this reference because it addressed threat management using training models.
KRAUS; Naama (U.S. Pub. No. 20200285737) “DYNAMIC CYBERSECURITY DETECTION OF SEQUENCE ANOMALIES”. Considered this reference because it relates to machine learning models trained with security classifiers.
Schmidtler; Mauritius (U.S. Patent No. 10599844) “Automatic Threat Detection Of Executable Files Based On Static Data Analysis”. Considered this reference because it addressed natural language processing based on threat.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HASSAN A HUSSEIN whose telephone number is (571)272-3554. The examiner can normally be reached on 7:30am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eleni Shiferaw can be reached on (571)272-3867. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-y.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.A.H./
Examiner, Art Unit 2497

/ELENI A SHIFERAW/
Supervisory Patent Examiner, Art Unit 2497