DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-20 are under examination.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Applicant’s claim of foreign priority to the following foreign application is acknowledged: KR 10-2024-0004562 (filed January 11, 2024).
Drawings
The drawing, Figure 2, is objected to because:
Specification ¶125 reads “210” instead of “S210” as shown in Fig. 2
Specification ¶125 reads “220” instead of “S220” as shown in Fig. 2
Specification ¶125 reads “230” instead of “S230” as shown in Fig. 2
Specification ¶125 reads “240” instead of “S240” as shown in Fig. 2
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “neural network layer for detecting” (claims 16 and 18).
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections – 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claims 16 and 18-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The limitations in claims 16 and 18 described under the “Claim Interpretation” section above invoke interpretation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. A review of the specification reveals that no sufficient structure is disclosed to perform the claimed functions. Thus, the claims are indefinite under 35 U.S.C. 112(b) (see related rejection herein, infra). When functional claim language is found indefinite, it typically lacks an adequate written description under § 112(a), because an indefinite, unbounded functional limitation would cover all ways of performing a function and indicate that the inventor has not provided sufficient disclosure to show possession of the invention. Thus, a 112(b) rejection that is based on functional language having unclear claim boundaries should be accompanied by a rejection under 112(a) based on failure to provide a written description for the claim.
The dependent claims 19 and 20 are similarly rejected based on their dependency on the rejected claims.
Claim Rejections – 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 16 and 18-20 are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 16 and 18, their limitations described under the “Claim Interpretation” section above invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
The dependent claims 19 and 20 are similarly rejected based on their dependency on the rejected claims.
Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5 and 11-16 are rejected under 35 U.S.C. 103 as being unpatentable over Pan et al. (IEEE Publication “FlowBERT: An Encrypted Traffic Classification Model Based on Transformers Using Flow Sequence”), hereinafter Pan, in view of Shah et al. (United States Patent Publication 2024/0396913), hereinafter Shah.
Regarding claim 1, Pan teaches A method for detecting anormal behavior in encrypted network traffic (Pan Section 1: “This paper proposes an encrypted traffic classification model based on Transformers using a Flow Sequence called FlowBERT.”), the method being performed by an anormal behavior detection apparatus (Pan Section 3: “This section aims to describe the proposed Transformer-based end-to-end multimodal encrypted traffic classification framework, FlowBERT...”), the method comprising:
collecting encrypted network traffic from a network (Pan Section 3: “In the data processing module, we extract and clean the unlabeled data and then slice and encode it… The data processing module is critical in preparing the input data for subsequent stages, ensuring that our model can effectively learn and represent the complex relationships in the encrypted traffic data.”);
generating training data in which [header] information for each packet is preprocessed in a format of a sequence, based on the encrypted network traffic (Pan Section 3 subsection A: “Within each segmented flow, we perform feature extraction to capture pertinent information. Precisely, we extract payload and length sequence information from the sliced flows. By concatenating payload and length data obtained from network packets, we form a comprehensive textual representation, serving as the input sequence for BERT.”);
training a network traffic classification model based on a Bidirectional Encoder Representations from Transformers (BERT) language model using the training data (Pan Section 3 subsection B: “FlowBERT uses BERT as a pre-training model… Specifically, we first convert network traffic data into text sequences, each character repre-senting a byte. Next, we feed the text sequences into BERT, where each character is embedded into a high-dimensional vector. BERT encodes the input through a multi-head self-attention mechanism, and the output of each layer is a vector containing contextual information from the input sequence.”);
and classifying anormal behavior traffic (Pan section 3 subsection A: “…such as detecting abnormal transmission patterns within specific application-layer traffic or identifying concealed malicious behavior within regular communications.”) in encrypted network traffic based on the trained network traffic classification model (Pan Section 3 subsection B: “The capability of FlowBERT to effectively integrate pre trained and task-specific representations facilitates quick adaptation to new encryption protocols, novel communication patterns, and emerging threats, contributing to its strong performance and applicability in practical encrypted traffic classification scenarios.”). However, Pan fails to explicitly teach generating training data in which header information for each packet is preprocessed in a format of a sequence.
However, Shah teaches generating training data in which header information for each packet is preprocessed in a format of a sequence (Shah ¶64-66: “The packet parser component takes the network traffic (in real-time or pcap files) as input data. Each packet transmitted through the TCP contains up to 1594 information bytes. Information related to the environment and protocols can bias the model and make it less applicable to different environments. Hence, to remove this bias, the Ethernet (ETH) header information (14 bytes), the IP version (one byte), the differentiated services field (one byte), the protocol (one byte), and the source and destination IP addresses information (four bytes each) from the IP header were eliminated in these examples. The source and destination ports information bytes (two bytes each) from the TCP header of each packet data were also removed. Additionally, the IP options and TCP options, which can cause misalignment between two packets of the same flow and introduce noise in the model, were removed… After removing these information bytes (a total of 109 bytes, shown in red in FIG. 4) and encoding the temporal correlation feature into the feature space in one experiment, the resulting packet-based feature representation, V.sub.packet, contained a maximum of 1486 bytes of information. Each byte represents a feature in the packet-based feature representation.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Shah to utilize packet header information for training data since it is useful for detecting anormal header-level activity (Shah ¶3: “Additionally, most packet-based NIDS capture only payload data, neglecting crucial information from packet headers. This oversight can impair the ability to identify header-level attacks, such as denial-of-service attacks.”).
Claim 12 is substantially similar to claim 1 and is rejected under the same rationale.
Regarding claim 2, Pan teaches wherein the training data corresponds to a packet [header] information sequence (Pan Section 3 subsection A: “By concatenating payload and length data obtained from network packets, we form a comprehensive textual representation, serving as the input sequence for BERT.”) but fails to teach packet header information and in which normal traffic and anormal behavior traffic are labeled.
However, Shah teaches packet header information [as in claim 1] and in which normal traffic and anormal behavior traffic are labeled (Shah ¶50: “the CNN may be trained using a set of training data comprising network packet flows that have been labeled as indicative of malicious or benign activity.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan’s training data in view of Shah to label the normal and anormal traffic to improve accuracy of the classification system (Shah ¶50: “These labels may be leveraged to improve accuracy of a binary output system.”).
Claim 13 is substantially similar to claim 2 and is rejected under the same rationale.
Regarding claim 3, Pan fails to teach wherein the packet header information sequence is generated to correspond to a maximum size represented by single field information of a packet header for each protocol, and is generated such that respective pieces of configuration information are separated from each other in the packet header.
However, Shah teaches wherein the packet header information sequence is generated to correspond to a maximum size represented by single field information of a packet header for each protocol (Shah ¶66: “After removing these information bytes (a total of 109 bytes, shown in red in FIG. 4) and encoding the temporal correlation feature into the feature space in one experiment, the resulting packet-based feature representation, V.sub.packet, contained a maximum of 1486 bytes of information. Each byte represents a feature in the packet-based feature representation.”), and is generated such that respective pieces of configuration information are separated from each other in the packet header (Shah ¶64: “The packet parser component takes the network traffic (in real-time or pcap files) as input data. Each packet transmitted through the TCP contains up to 1594 information bytes. Information related to the environment and protocols can bias the model and make it less applicable to different environments. Hence, to remove this bias, the Ethernet (ETH) header information (14 bytes), the IP version (one byte), the differentiated services field (one byte), the protocol (one byte), and the source and destination IP addresses information (four bytes each) from the IP header were eliminated in these examples. The source and destination ports information bytes (two bytes each) from the TCP header of each packet data were also removed. Additionally, the IP options and TCP options, which can cause misalignment between two packets of the same flow and introduce noise in the model, were removed.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan’s header information in view of Shah to avoid misalignment (Shah ¶64: “(Misalignment occurs when the bytes in two feature representations of packets with and without options are not aligned, leading to a decrease in model performance and interpretability.) Alternatively, a padding or other normalization scheme could be utilized.”).
Claim 14 is substantially similar to claim 3 and is rejected under the same rationale.
Regarding claim 4, Pan fails to teach wherein a portion that is not filled with data in the packet header information sequence is padded with an arbitrary value.
However, Shah teaches wherein a portion that is not filled with data in the packet header information sequence is padded with an arbitrary value (Shah ¶64-¶66: “The packet parser component takes the network traffic (in real-time or pcap files) as input data. Each packet transmitted through the TCP contains up to 1594 information bytes… Each byte represents a feature in the packet-based feature representation… Since the number of bytes varies depending on the packet type, zero-padding is applied to the feature space to maintain a standard structure, resulting in a fixed number of features (N) for each packet.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Shah to pad the information sequence to avoid misalignment (Shah ¶64: “(Misalignment occurs when the bytes in two feature representations of packets with and without options are not aligned, leading to a decrease in model performance and interpretability.) Alternatively, a padding or other normalization scheme could be utilized.”).
Claim 15 is substantially similar to claim 4 and is rejected under the same rationale.
Regarding claim 5, Pan teaches wherein the network traffic classification model corresponds to a form in which an anormal behavior detection (Pan section 3 subsection A: “…such as detecting abnormal transmission patterns within specific application-layer traffic or identifying concealed malicious behavior within regular communications.”) neural network layer for detecting anormal behavior in the encrypted network traffic is added to a pre-trained BERT language model (Pan Section 3 subsection B: “FlowBERT uses BERT as a pre-training model. BERT is a widely applied pre-training model in the natural language processing domain. … For network traffic classification tasks, it is necessary to convert network traffic data into textual form and process it using BERT for classification purposes.”), (Pan Fig.1 shows the neural network layers).
Claim 16 is substantially similar to claim 5 and is rejected under the same rationale.
Regarding claim 11, Pan in view of Shah teaches wherein the training data is generated by utilizing [header] information of a single packet or pieces of [header] information of multiple packets (Pan Section 3 subsection A: “By concatenating payload and length data obtained from network packets, we form a comprehensive textual representation, serving as the input sequence for BERT.”) (Shah, as in claim 1, teaches header information). However, Pan fails to explicitly teach wherein the training data is generated by utilizing header information of a single packet or pieces of header information of multiple packets.
However, Shah teaches wherein the training data is generated by utilizing header information of a single packet or pieces of header information of multiple packets. (Shah ¶64-66: “The packet parser component takes the network traffic (in real-time or pcap files) as input data. Each packet transmitted through the TCP contains up to 1594 information bytes. Information related to the environment and protocols can bias the model and make it less applicable to different environments. Hence, to remove this bias, the Ethernet (ETH) header information (14 bytes), the IP version (one byte), the differentiated services field (one byte), the protocol (one byte), and the source and destination IP addresses information (four bytes each) from the IP header were eliminated in these examples. The source and destination ports information bytes (two bytes each) from the TCP header of each packet data were also removed. Additionally, the IP options and TCP options, which can cause misalignment between two packets of the same flow and introduce noise in the model, were removed… After removing these information bytes (a total of 109 bytes, shown in red in FIG. 4) and encoding the temporal correlation feature into the feature space in one experiment, the resulting packet-based feature representation, V.sub.packet, contained a maximum of 1486 bytes of information. Each byte represents a feature in the packet-based feature representation.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Shah to utilize packet header information for training data since it is useful for detecting anormal header-level activity (Shah ¶3: “Additionally, most packet-based NIDS capture only payload data, neglecting crucial information from packet headers. This oversight can impair the ability to identify header-level attacks, such as denial-of-service attacks.”).
Claims 6-9 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Pan in view of Shah, and further in view of Koo et al. (United States Patent Publication 2023/0161879 – provided by Applicant), hereinafter Koo.
Regarding claim 6, Pan and Shah teach the method of claim 2, wherein training the network traffic classification model comprises: performing Masked Language Model (MLM) training on the network traffic classification model using the packet header information sequence (Pan Section 3 subsection B: “We employed a self-supervised learning approach, Masked Language Modeling (MLM) [24] to adapt to the network traffic classification task. During the pre-training process, we randomly selected a certain proportion of payload and length sequences in the input sequences and represented them using a special token ‘[MASK].’”) (Shah, as in claim 1, teaches header information); and [performing Next Sentence Prediction (NSP)] training on the network traffic classification model using the packet header information sequence (Pan Section 3 subsection B: “FlowBERT uses BERT as a pre-training model… Specifically, we first convert network traffic data into text sequences, each character repre-senting a byte. Next, we feed the text sequences into BERT, where each character is embedded into a high-dimensional vector. BERT encodes the input through a multi-head self-attention mechanism, and the output of each layer is a vector containing contextual information from the input sequence.”). However, Pan and Shah fail to explicitly teach performing Next Sentence Prediction (NSP) training on the network traffic classification model using the packet header information sequence.
However, Koo teaches and performing Next Sentence Prediction (NSP) training on the network traffic classification model using the packet header information sequence (Koo ¶89: “A second task for prelearning the assembly language model 231 is a next sentence prediction (NSP) task using an instruction code sequence.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan and Shah in view of Koo for the training of the classification model to include NSP training to better understand a contextual relation of an instruction sequence (Koo ¶89: “…a binarized NSP task is performed to understand a contextual relation of an instruction sequence.”).
Claim 17 is substantially similar to claim 6 and is rejected under the same rationale.
Regarding claim 7, Pan and Shah teach the method of claim 5, but fail to teach wherein training the network traffic classification model comprises: adding a new malware detection neural network layer for detecting new malware that is previously unknown to the network traffic classification model,
However, Koo teaches wherein training the network traffic classification model comprises: adding a new malware detection neural network layer for detecting new malware that is previously unknown to the network traffic classification model (Koo ¶91: “As for the step (240) of learning a malware classifier based on an assembly language model, in the step (240) of learning a malware classifier based on an assembly language model, a malware binary classifier 241 is generated by adding a neural network layer for malicious code classification based on the prelearned assembly language model 231…”),
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan and Shah in view of Koo to increase the accuracy of the classification model (Koo ¶106: “In addition, as a method according to embodiments of the present disclosure is capable of differentiating embedding according to a context within an instruction sequence even for a same instruction, it is capable of representing a further detail of the behavior of an instruction sequence, and thus since behavior patterns of sub-divided instruction sequences can be learned, the detection accuracy of a malware classifier may be enhanced.”).
Pan in view of Shah further teaches and training the new malware detection neural network layer using the packet header information sequence (Pan section 3 subsection B: “In FlowBERT, we use BERT architecture to encode payload and packet length sequences. Specifically, we first convert network traffic data into text sequences, each character representing a byte. Next, we feed the text sequences into BERT, where each character is embedded into a high-dimensional vector.”) (Shah, as in claim 1, teaches header information), (Pan Fig.1 shows the neural network layers).
Claim 18 is substantially similar to claim 7 and is rejected under the same rationale.
Regarding claim 8, Pan in view of Shah teaches wherein training the new malware detection neural network layer comprises: tokenizing the packet header information sequence [based on an instruction code] used for pre-training of the BERT language model (Pan section 3 subsection A: “By concatenating payload and length data obtained from network packets, we form a comprehensive textual representation, serving as the input sequence for BERT. 4) Encoding: To effectively process the textual input using BERT, we employ BERT’s Tokenizer to tokenize the concatenated string into a series of tokens, including both words and subwords.”) (Shah, as in claim 1, teaches header information) (Pan Fig.1 shows the neural network layers), but fails to teach and training the network traffic classification model to detect new malware that is previously unknown by inputting a tokenized packet header information sequence to the network traffic classification model.
However, Koo teaches wherein training the new malware detection neural network layer comprises: tokenizing the packet header information sequence based on an instruction code used for pre-training of the BERT language model, and training the network traffic classification model to detect new malware that is previously unknown by inputting a tokenized packet header information sequence to the network traffic classification model (Koo ¶105: “When the instruction collector 410 generates a segmented instruction code sequence file of an instruction for an unknown file, after the instruction code tokenizer 420 tokenizes (indexes) an instruction code sequence by using an instruction code dictionary that is used for prelearning of an assembly language model, an instruction code sequence is embedded by inputting an indexed instruction code sequence into the assembly language model 430 that is completely learned, and then an embedding result of the instruction code sequence is output.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Koo to increase the accuracy of the classification model (Koo ¶106: “In addition, as a method according to embodiments of the present disclosure is capable of differentiating embedding according to a context within an instruction sequence even for a same instruction, it is capable of representing a further detail of the behavior of an instruction sequence, and thus since behavior patterns of sub-divided instruction sequences can be learned, the detection accuracy of a malware classifier may be enhanced.”).
Claim 19 is substantially similar to claim 8 and is rejected under the same rationale.
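For illustration only (this sketch is not code from Pan, Shah, or Koo, and the vocabulary, header fields, and token indices shown are hypothetical), the tokenize-and-index technique quoted above — treating each header value as an individual token, indexing every token appearing in the training data, and falling back to a special token for unknown values — can be reduced to a short Python sketch:

```python
# Illustrative sketch (not from any cited reference) of the tokenize-and-index
# step described in Pan sec. 3(A) and Koo ¶¶75, 105: each header field value is
# treated as an individual token and mapped to an integer index via a
# dictionary built over the whole training data.

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]

def build_vocab(training_sequences):
    """Index every token appearing in the whole training data (cf. Koo ¶75)."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for seq in training_sequences:
        for tok in seq.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def tokenize(sequence, vocab):
    """Map a header-value sequence to indices, adding [CLS]/[SEP] delimiters
    and using [UNK] for values absent from the dictionary (cf. Koo ¶76)."""
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in sequence.split()]
    ids.append(vocab["[SEP]"])
    return ids

# Hypothetical header-value sequences (protocol, length, flag):
train = ["TCP 1500 SYN", "UDP 512 NONE"]
vocab = build_vocab(train)
print(tokenize("TCP 9999 SYN", vocab))  # unseen "9999" maps to [UNK]: [2, 5, 1, 7, 3]
```

The indexed sequence would then be input to the language model for embedding, as in the Koo ¶105 passage quoted above.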
Regarding claim 9, Pan and Shah fail to teach wherein the instruction code includes instruction codes capable of indexing values of all header information appearing in the training data, and includes instruction codes related to special tokens for exception handling in token indexing.
However, Koo teaches wherein the instruction code includes instruction codes capable of indexing values of all header information appearing in the training data (Koo ¶75: “Herein, as illustrated in FIG. 11, the instruction code dictionary is for indexing an instruction code by an integer for a learning dataset consisting of a segmented instruction code sequence file. That is, in an embodiment of the present disclosure, an instruction code dictionary is built up which is capable of considering each instruction code as an individual token and indexing every instruction code that appears in whole learning data.”), and includes instruction codes related to special tokens for exception handling in token indexing (Koo ¶76: “The instruction code tokenizer 221 may utilize a special token for model learning and token indexing exception processing.”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Koo to increase the accuracy of the classification model (Koo ¶106: “In addition, as a method according to embodiments of the present disclosure is capable of differentiating embedding according to a context within an instruction sequence even for a same instruction, it is capable of representing a further detail of the behavior of an instruction sequence, and thus since behavior patterns of sub-divided instruction sequences can be learned, the detection accuracy of a malware classifier may be enhanced.”).
Pan in view of Shah further teaches wherein header information of each packet is recognized as an individual token (Pan section 3 subsection B: “During the pre-training process, we randomly selected a certain proportion of payload and length sequences in the input sequences and represented them using a special token ‘[MASK].’”) (Shah, as in claim 1, teaches header information).
Claim 20 is substantially similar to claim 9 and is rejected under the same rationale.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Pan in view of Shah in view of Koo in view of Sethi et al. (United States Patent Publication 2023/0221974), hereinafter Sethi.
Regarding claim 10, Pan and Shah fail to teach wherein the instruction codes related to the special tokens correspond to a code indicating a space, a code signifying an individual token, a code representing token indexing not found in a dictionary, a code indicating start of a sequence, a code indicating separation between two sequences, and a code indicating a padding token.
However, Koo teaches wherein the instruction codes related to the special tokens correspond to a code indicating a space, a code signifying an individual token, a code representing token indexing not found in a dictionary, a code indicating start of a sequence, a code indicating separation between two sequences, [and a code indicating a padding token] (Koo ¶76: “Examples of special tokens may include ‘ ’ for marking a blank, ‘[UNK]’ for indexing a token not in the dictionary, ‘[MASK]’ for masking an individual token, ‘[CLS]’ for indicating a start of a sequence, and ‘[SEP]’ for distinguishing two sequences (sentences).”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan in view of Koo to increase the accuracy of the classification model (Koo ¶106: “In addition, as a method according to embodiments of the present disclosure is capable of differentiating embedding according to a context within an instruction sequence even for a same instruction, it is capable of representing a further detail of the behavior of an instruction sequence, and thus since behavior patterns of sub-divided instruction sequences can be learned, the detection accuracy of a malware classifier may be enhanced.”).
Koo fails to explicitly teach a code indicating a padding token.
However, Sethi teaches a code indicating a padding token (Sethi ¶123: “In addition to performing tokenization, the state processor may also pad the sequences of tokens. In other words, the state processor may include one or more generic or null tokens (e.g., zero) in sequences of tokens…”).
It would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Pan and Koo in view of Sethi to include a padding token for increased usability of the sequences (Sethi ¶123: “in order to ensure that all sequences of tokens include the same length (i.e., each sequence includes the same number of tokens). As a result, the padded sequences of tokens may be in a form usable by a prediction algorithm to generate a state prediction model.”).
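For illustration only (this sketch is not code from Sethi; the [PAD] index of zero and the example batch are assumptions), the padding step described in the Sethi ¶123 passage quoted above can be sketched as:

```python
# Illustrative sketch of the padding step described in Sethi ¶123: append a
# generic/null padding token (index 0 assumed here) to each sequence of token
# indices so that all sequences in a batch share the same length before being
# supplied to the prediction model.

PAD_ID = 0  # assumed index of the padding token

def pad_sequences(sequences, pad_id=PAD_ID):
    """Right-pad each token-index sequence to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

batch = [[2, 5, 7, 3], [2, 8, 3]]  # hypothetical tokenized packet sequences
print(pad_sequences(batch))        # → [[2, 5, 7, 3], [2, 8, 3, 0]]
```

As Sethi ¶123 notes, equalizing sequence lengths in this way puts the token sequences in a form usable by the prediction algorithm.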
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALEC ANKRUM whose telephone number is (571)272-9209. The examiner can normally be reached M-F 7:15am-3:15pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ali Shayanfar can be reached at 571-270-1050. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/A.C.A./Examiner, Art Unit 2434
/NOURA ZOUBAIR/Primary Examiner, Art Unit 2434