DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
2. The Amendment filed September 16, 2025 has been entered and made of record. Claims 1, 2, 17, 19 and 20 have been amended. Claim 8 is cancelled. Claims 1-7 and 9-20 are presented for examination. Applicant’s amendment to claim 2 has overcome the claim objection previously set forth in the Non-Final Office Action mailed June 16, 2025. The objection to claim 2 has been withdrawn.
Response to Arguments
3. Applicant’s amendment to claim 17 has overcome the rejection under 35 U.S.C. § 112(b) or 35 U.S.C. § 112 (pre-AIA), second paragraph, previously set forth in the Non-Final Office Action mailed June 16, 2025. The rejection of claim 17 under 35 U.S.C. § 112(b) has been withdrawn.
4. Applicant’s arguments, see pages 7-9, filed September 16, 2025, with respect to the rejection of claim 1 under 35 U.S.C. § 102(a)(2) have been considered but are moot in view of the new grounds of rejection. The claim (as amended) does not overcome the new ground of rejection made in view of newly found prior art references.
5. Applicant’s arguments, see page 9, filed September 16, 2025, with respect to the rejection of claims 2-7 and 9-18 under 35 U.S.C. § 103 have been considered but are moot in view of the new grounds of rejection. The claims do not overcome the new ground of rejection made in view of newly found prior art references.
6. Applicant’s arguments, see pages 9-10, filed September 16, 2025, with respect to the rejection of claim 19 under 35 U.S.C. § 102(a)(2) have been considered but are moot in view of the new grounds of rejection. The claim (as amended) does not overcome the new ground of rejection made in view of newly found prior art references.
7. Applicant’s arguments, see pages 10-11, filed September 16, 2025, with respect to the rejection of claim 20 under 35 U.S.C. § 103 have been considered but are moot in view of the new grounds of rejection. The claim (as amended) does not overcome the new ground of rejection made in view of newly found prior art references.
Claim Objections
8. Claims 1, 15 and 19 are objected to because of the following informalities:
Claim 15 recites “wherein the first clustering assignment matrix and the second assignment matrix being for adjacent time periods”, which should read “wherein the first clustering assignment matrix and the second assignment matrix are for adjacent time periods”.
Claim 1 recites “… projecting the plurality of sets onto a representation space …”, which should read “… projecting the plurality of sets of features onto a representation space …”. This objection applies equally to claim 19.
Claim Rejections - 35 USC § 103
9. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
10. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
11. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
12. Claims 1-3, 5, 7, 9-13 and 19 are rejected under 35 U.S.C. § 103 as being unpatentable over Doron et al. (US 2019/0182274 A1), hereafter Doron, in view of Bilge et al. (US 11,025,649 B1), hereafter Bilge.
Noted that indicates what the cited art does not teach.
Regarding claim 1, Doron discloses a method for computer scanning activity detection, comprising: {Doron [Para. 0066] “A method for predicting subsequent attacks in an attack campaign.”}
receiving Darknet data associated with scanning activities of a plurality of scanners; {Doron [Para. 0068] “At S310, incoming events data is received or retrieved from the data sources (e.g., sources 115).” [Para. 0031] “The data sources 115 generate or collect events data related to attacks. The events data may include events, event parameters, or both. To this end, the data sources 115 may include, but are not limited to, databases, security devices, DDoS attack detectors or mitigators, SIEM systems, WAF services, user devices, sources of NetFlow data (e.g., from switch routers), sources of DPI data, case management systems (of, e.g., SOC or NOC) combinations thereof, and the like. The data sources 115 may be third party services such as threat intelligence, DarkNet intelligence, known attacks repositories, and so on. The data sources 115 may be configured to store the events parameters in the database 140, to send the events to the attack predictor 150, or both.”} Data sources 115 collect activity events from multiple sources including DarkNet intelligence.
determining a plurality of sets of features corresponding to the plurality of scanners based on the Darknet data; {Doron [Fig. 3, Para. 0043] “The sequencer 220 is configured to perform the embedding processing using a neural network-based process or an n-grams based process. When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes. The attack attributes may include an attack type, source IP address, geo location information, the number of bytes, and the number packets involved in the attack, the duration of the attack, the time gap since the previous attack, and so on.” [Para. 0069] “At S320, at least one sequence is extracted from the events data. The events data represents individual attacks that occurred during a predefined time window and targeted the same protected object. Thus, the sequence includes a series of attacks that occurred during the same time window and against the same target. The extraction of sequences can be performed in real time, in near real time, or based on the stored events data.”} Doron’s system extracts attack attributes (features) from event data. Event data includes event activities from Darknet intelligence, and represents individual attacks that occurred during a predefined time window.
generating a plurality of embeddings based on a deep autoencoder, by projecting the plurality of sets onto a representation space having a lower dimensionality than the Darknet data via a nonlinear autoencoder function, the plurality of embeddings corresponding to the plurality of sets of features to reduce dimensionality of the plurality of sets of features; {Doron [Fig. 3, Para. 0043] “When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes.” [Para. 0044] “The sequence is represented using a vector of numbers or arrays of numbers that represent the attacks. A deep learning neural network model is used to transform the vector or array of numbers into a sequence signature.” [Para. 0045] “The sequence signature is a vector of real numbers. A layers of LSTM Neural Network is used to create the sequence signature. The vector of numbers (or arrays of numbers) representing the sequence is fed forward into the LSTM network. The values appear in one of the hidden layers of the Neural Network (e.g., the last one) and are used as the embedded representation of the sequence. This embedded representation is the sequence signature.” [Para. 0048] “The embedding process includes aggregating all the attacks that occurred in a predefined time window (e.g., 5 minutes or one hour) into a time window vector. Each position of the time window vector encodes a distinct attack type of an attack that occurred during the time window. The attack in the vector can be represented using a bit, a number, or an array of numbers. The bit or number represents the attack type. 
The array of numbers can represent the attack type and attributes of the attack type, in that time window such as a source IP address or geo-location information, the number of bytes and the number packets involved in the attack, the duration of the attack, the time gap from the previous attack, and so on.” [Para. 0049] “ When embedding into vectors, the sequence is represented using a vector of time window vectors, where each element in each time window vector represents a distinct attack that occurred in the time window (as a number or array of numbers). A deep learning neural network model is then used to transform the vector of time window vectors into a sequence signature.”} See paras. 0043-0049 and 0070. Doron’s system sequencer uses a neural network to generate embeddings. In addition, a neural network autoencoder can also be used to generate embeddings (see para. 0050).
generating a plurality of clusters based on the plurality of embeddings using a clustering technique; {Doron [Fig. 3, Para. 0049] “When embedding into vectors, the sequence is represented using a vector of time window vectors, where each element in each time window vector represents a distinct attack that occurred in the time window (as a number or array of numbers). As described above, a deep learning neural network model is then used to transform the vector of time window vectors into a sequence signature. In an embodiment, the sequence signature is a vector of real numbers created using a LSTM Neural Network.” [Para. 0070] “At S330, for each sequence created at S320, a sequence signature is created. The sequence signature is generated using an embedding process discussed in detail above.”} Doron’s system generates a plurality of sequence signatures (clusters) based on the embedding process.
and detecting a temporal change in the plurality of clusters. {Doron [Fig. 3, Para. 0059] “The search is performed using matching clusters. That is, a search for a matching cluster of historic sequences is performed, and subsequent attacks are predicted based on the continuation attacks of the historic sequences in the matching cluster. The prediction engine 230 may match between a current sequence and a cluster of historic sequences. The matching uses the sequence signature representation of the current sequence and of the cluster centroid. The prediction engine 230 may search for a close enough cluster to the current sequence with respect to a distance metric. The distance metric may be, for example, a cosine similarity between vectors, where the vectors represent individual sequence signatures or the cluster centroid.” [Para. 0061] “When at least one historic sequence is found, the continuation attacks of the respective historic sequence are predicted as potential continuations for the current sequence (attack campaign). A match may be defined when the distance metric is less than a predefined threshold embodiment. A confidence score or probability may be assigned to the predicted attacks, based on the rate of sequences in the cluster that pointed on an attack as a possible continuation.”} Also see para. 0071. Doron’s system compares a cluster of historic time sequences to a cluster of current time sequences to determine if any change has occurred.
However, Doron does not teach generating a plurality of embeddings based on a deep autoencoder by projecting the plurality of sets onto a representation space having a lower dimensionality than the Darknet data via a nonlinear autoencoder function, the plurality of embeddings corresponding to the plurality of sets of features to reduce dimensionality of the plurality of sets of features;
However, Bilge teaches generating a plurality of embeddings based on a deep autoencoder by projecting the plurality of sets onto a representation space having a lower dimensionality than the Darknet data via a nonlinear autoencoder function, the plurality of embeddings corresponding to the plurality of sets of features to reduce dimensionality of the plurality of sets of features; {Bilge [Col 10, line 48-59]“At step 302 one or more of the systems described herein may receive data describing behavior (events) performed by known malware. For example, receiving module 104 may, as part of computing device 202 in FIG. 2, receive dynamic analysis traces (e.g., text strings) describing actions performed by malware operating in a sandbox environment.” [Col. 10, line 60-62] “At step 304, the received traces may be analyzed.” [Col. 11, line 1-13] “In the second process, the text strings may be mapped to a second set of vectors, and the sequence of events is taken into account. This second process may include an N-grams analysis and/or one or more recurrent neural networks, and may be performed by Ordered Event Module 108, depicted in FIG. 1. This process may include an LSTM word embedding performed over strings of events extracted from malware samples to generate non-linear abstract features.” [Col. 11, line 14-22] “At step 306, the vectors output in step 304 (e.g., by modules 106 and 108) may be input to an Autoencoder Module 110 (see FIG.1), along with supervised information.” [Col. 11, line 23-36] “At step 308, the autoencoder is trained and outputs a feature representation. Feature extraction transforms data from an original, high-dimensional space to a relatively low-dimensional space, and this transformation may be linear or nonlinear.” [Col. 7, line 14-20] “A hidden layer of an autoencoder may include a series of neurons. Each neuron may calculate a linearly weighted combination of outputs from the neurons of the preceding layer. 
The linear combination calculated at each hidden layer neuron may be mapped to a scalar value using, for example, a non-linear transformation function, such as a sigmoid function or tan h function.”} As disclosed in Bilge, the system receives data related to events, performs n-gram analysis to extract features from the received data, and generate feature vectors. Subsequently, these vectors are input into an autoencoder, which generates embeddings that have a lower dimension than the original received data.
Bilge is analogous art because each of Doron and Bilge pertains to implementing machine learning techniques to identify cyber-attacks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron to include Bilge’s teaching of the limitations of claim 1, listed above. Doing so would “detect threats that are difficult to detect with regular methods such as while/blacklisting, reputation, AV engines” (Bilge, col. 14 line 15-19).
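For illustration only, the cluster matching quoted above from Doron paras. 0059 and 0061 (comparing a current sequence signature to cluster centroids and declaring a match when the distance is below a threshold) can be sketched as follows. This is a minimal sketch and not code from either reference: the function names, the use of one minus cosine similarity as the distance, and the example threshold are all assumptions.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; treated here as the distance metric (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # no direction to compare, so report no similarity
    return 1.0 - dot / (norm_u * norm_v)

def match_cluster(signature, centroids, threshold):
    """Return the id of the closest centroid if it is closer than threshold, else None.

    centroids is a hypothetical dict mapping a cluster id to its centroid vector.
    """
    best_id, best_dist = None, float("inf")
    for cluster_id, centroid in centroids.items():
        dist = cosine_distance(signature, centroid)
        if dist < best_dist:
            best_id, best_dist = cluster_id, dist
    return best_id if best_dist < threshold else None

# Hypothetical historic cluster centroids and a current sequence signature.
historic = {"campaign_a": [1.0, 0.0], "campaign_b": [0.0, 1.0]}
matched = match_cluster([0.9, 0.1], historic, threshold=0.1)
```

A signature that points in nearly the same direction as a centroid matches that cluster; a signature equidistant from all centroids fails the threshold and yields no match.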
Claim 2:
Regarding claim 2, Doron and Bilge teach the elements of claim 1 as outlined above.
Doron further teaches wherein a set of features of the plurality of sets of features corresponds to a scanner of the plurality of scanners, {Doron [Para. 0043] “When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes. The attack attributes may include an attack type, source IP address, geo location information, the number of bytes, and the number packets involved in the attack, the duration of the attack, the time gap since the previous attack, and so on.”} Doron’s system identifies attack attributes that correspond to the Darknet intelligence scanners and uses them to retrieve Darknet data associated with the attack attributes.
wherein the scanning activities of the plurality of scanners are within a predetermined period of time, {Doron [Para. 0068] “At S310, incoming events data is received or retrieved from the data sources (e.g., sources 115). In another embodiment, the events data are received in batches.”} Event data are received in batches.
and wherein the set of features comprises at least one of: a traffic volume, a scanning scheme, a targeted application, or a scanner type of the scanner. {Doron [Para. 0043] “When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes. The attack attributes may include an attack type, source IP address, geo location information, the number of bytes, and the number packets involved in the attack, the duration of the attack, the time gap since the previous attack, and so on.”} Attack attributes include the number of bytes and the number of packets (traffic volume) involved in an attack.
Claim 3:
Regarding claim 3, Doron and Bilge teach the elements of claim 2 as stated above.
Doron further teaches wherein the traffic volume of the scanner within the predetermined period of time comprises at least one of a total number of packets transmitted, a total amount of bytes transmitted, or an average inter-arrival time between packets transmitted. {Doron [Para. 0043] “When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes. The attack attributes may include an attack type, source IP address, geo location information, the number of bytes, and the number packets involved in the attack, the duration of the attack, the time gap since the previous attack, and so on.”} Attack attributes include the number of bytes and the number of packets involved in an attack.
Claim 5:
Regarding claim 5, Doron and Bilge teach the elements of claim 2 as outlined above.
Doron further teaches wherein the targeted application within the predetermined period of time comprises at least one of a set of ports scanned, or a set of protocol request types scanned. {Doron [Para. 0064] “The decision engine 240 is configured to determine mitigation actions to be performed. The determination may be based on a security policy of a protected object, a rank of the predictive sequence signature, the potential continuations determined for the current attack campaign. The mitigation actions may include actions for preventing or reducing harm from the predicted attacks. When the predicted attacks include an upcoming network time protocol (NTP) flood attack, the determined mitigation actions may include techniques for filtering traffic from forged source addresses (e.g., ingress filtering).”} Doron’s system predicts an upcoming NTP flood attack. During the attack, network time protocol requests that occurred during a specific time period can be scanned.
Claim 7:
Regarding claim 7, Doron and Bilge teach the elements of claim 1 as stated above.
Doron further teaches wherein the plurality of sets of features comprises heterogeneous data containing at least one categorical dataset for a feature and at least one numerical dataset for the feature. {Doron [Para. 0043] “When utilizing the neural network based embedding process, the sequencer 220 may represent each attack in the sequence as a number or an array of numbers. The number (or an array of numbers) may encode, for example, attack attributes. The attack attributes may include an attack type, source IP address, geo location information, the number of bytes, and the number packets involved in the attack, the duration of the attack, the time gap since the previous attack, and so on.”} Attack attributes include an attack type and the number of bytes involved in an attack.
Claim 9:
Regarding claim 9, Doron and Bilge teach the elements of claim 1 as outlined above.
Doron further teaches wherein the deep autoencoder comprises a fully-connected multilayer perceptron neural network. {Doron [Para. 0050] “Neural network autoencoders, LSTM autoencoders, or convolutional autoencoders can also be used to train a network that has its hidden layer (e.g., central layer), being used for the embedding process.”} A neural network autoencoder is a fully-connected multilayer perceptron neural network.
Claim 10:
Regarding claim 10, Doron and Bilge teach the elements of claim 9 as outlined above.
Doron further teaches wherein the fully-connected multilayer perceptron neural network uses two layers. {Doron [Para. 0045] “A layers of LSTM Neural Network is used to create the sequence signature. The vector of numbers (or arrays of numbers) representing the sequence is fed forward into the LSTM network. The values appear in one of the hidden layers of the Neural Network (e.g., the last one) and are used as the embedded representation of the sequence.” [Para. 0047] “It should be noted that by utilizing this training, a neural network is created. In such network one of its hidden layers (e.g., the last layer), is used as the embedded representation of sequences.”} Also see para. 0050 in Doron. A neural network has a first hidden layer and a last hidden layer, and therefore uses two layers. In addition, an autoencoder (see para. 0050) has an encoder (which can have multiple hidden layers) and a decoder (which also can have multiple hidden layers).
Claim 11:
Regarding claim 11, Doron and Bilge teach the elements of claim 1 as stated above.
Doron further teaches further comprising: training the deep autoencoder. {Doron [Para. 0050] “Neural network autoencoders, LSTM autoencoders, or convolutional autoencoders can also be used to train a network that has its hidden layer (e.g., central layer), being used for the embedding process.”}
However, Doron does not teach training the deep autoencoder by minimizing a reconstruction loss based on the plurality of sets of features and the plurality of embeddings.
However, Bilge teaches training the deep autoencoder by minimizing a reconstruction loss based on the plurality of sets of features and the plurality of embeddings. {Bilge [Col. 11, line 37-56] “An autoencoder may include two parts, an encoder and a decoder. Considering a data sample X with n samples and m features, the output X′ of the encoder may represent the reduced representation of X and the decoder may be tuned to reconstruct the original dataset X from the encoder's representation Y by minimizing the difference between X and X′. The encoder (a neural network) may map an input X to a hidden representation Y, and the decoder (another neural network) may map hidden representation Y to a “reconstruction” X′. Training an autoencoder may entail finding parameters that minimize the “reconstruction loss”—i.e., the difference (using some accepted measure) between X and X′.”}
Bilge is analogous art because each of Doron and Bilge pertains to implementing machine learning techniques to identify cyber-attacks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron to include Bilge’s teaching of the limitations of claim 11, listed above. Doing so would “detect threats that are difficult to detect with regular methods” (Bilge, col. 14 line 15-19).
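For illustration only, the training described in the quoted Bilge passage (an encoder maps input X to a hidden representation Y, a decoder maps Y to a reconstruction X′, and training finds parameters that minimize the difference between X and X′) can be sketched as follows. This is a toy sketch, not Bilge's implementation: the single hidden unit, the tanh non-linearity, the mean squared error measure, the finite-difference gradient, and all names and values are assumptions chosen to keep the example small.

```python
import math

def reconstruct(params, x):
    """Encode a 2-D sample to a 1-D embedding, then decode it back to 2-D."""
    w0, w1, v0, v1 = params
    y = math.tanh(w0 * x[0] + w1 * x[1])  # encoder: non-linear map to hidden representation Y
    return [v0 * y, v1 * y]               # decoder: map Y to the reconstruction X'

def reconstruction_loss(params, data):
    """Mean squared difference between each sample X and its reconstruction X'."""
    total = 0.0
    for x in data:
        x_rec = reconstruct(params, x)
        total += sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return total / len(data)

def train(params, data, lr=0.1, steps=300, eps=1e-6):
    """Gradient descent on the reconstruction loss using central finite differences."""
    params = list(params)
    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            up, down = params[:], params[:]
            up[i] += eps
            down[i] -= eps
            grads.append((reconstruction_loss(up, data)
                          - reconstruction_loss(down, data)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

# Hypothetical 2-D feature sets that lie along a single direction.
samples = [[0.1, 0.2], [0.2, 0.4], [0.3, 0.6], [-0.1, -0.2]]
initial_params = [0.5, 0.5, 0.5, 0.5]
trained_params = train(initial_params, samples)
```

Comparing `reconstruction_loss(initial_params, samples)` with `reconstruction_loss(trained_params, samples)` shows the effect of training; the one-dimensional value `y` plays the role of the lower-dimensional embedding.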
Claim 12:
Regarding claim 12, Doron and Bilge teach the elements of claim 11 as stated above.
However, Doron does not teach generating a plurality of decoded input datasets by decoding the plurality of embeddings to map the plurality of decoded input datasets to the plurality of sets of features.
However, Bilge teaches generating a plurality of decoded input datasets by decoding the plurality of embeddings to map the plurality of decoded input datasets to the plurality of sets of features. {Bilge [Col. 11, line 14-22] “At step 306, the vectors output in step 304 (e.g., by modules 106 and 108) may be input to an Autoencoder Module 110 (see FIG.1), along with supervised information.” [Col. 11, line 23-36] “At step 308, the autoencoder is trained and outputs a feature representation.” [Col. 11, line 37-56] “An autoencoder may include two parts, an encoder and a decoder. Considering a data sample X with n samples and m features, the output X′ of the encoder may represent the reduced representation of X and the decoder may be tuned to reconstruct the original dataset X from the encoder's representation Y by minimizing the difference between X and X′. The encoder (a neural network) may map an input X to a hidden representation Y, and the decoder (another neural network) may map hidden representation Y to a “reconstruction” X′. Training an autoencoder may entail finding parameters that minimize the “reconstruction loss”—i.e., the difference (using some accepted measure) between X and X′.”} The decoder learns to take the encoding and properly reconstruct it.
Bilge is analogous art because each of Doron and Bilge pertains to implementing machine learning techniques to identify cyber-attacks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron to include Bilge’s teaching of the limitations of claim 12, listed above. Doing so would “detect threats that are difficult to detect with regular methods” (Bilge, col. 14 line 15-19).
Claim 13:
Regarding claim 13, Doron and Bilge teach the elements of claim 12 as outlined above.
However, Doron does not teach wherein the reconstruction loss is minimized by minimizing distances between the plurality of sets of features and the plurality of decoded input datasets, the plurality of sets of features corresponding to the plurality of decoded input datasets.
However, Bilge teaches wherein the reconstruction loss is minimized by minimizing distances between the plurality of sets of features and the plurality of decoded input datasets, the plurality of sets of features corresponding to the plurality of decoded input datasets. {Bilge [Col. 11, line 37-56] “An autoencoder may include two parts, an encoder and a decoder. Considering a data sample X with n samples and m features, the output X′ of the encoder may represent the reduced representation of X and the decoder may be tuned to reconstruct the original dataset X from the encoder's representation Y by minimizing the difference between X and X′. The encoder (a neural network) may map an input X to a hidden representation Y, and the decoder (another neural network) may map hidden representation Y to a “reconstruction” X′. Training an autoencoder may entail finding parameters that minimize the “reconstruction loss”—i.e., the difference (using some accepted measure) between X and X′.”} Also see col. 11 line 14-36 in Bilge.
Bilge is analogous art because each of Doron and Bilge pertains to implementing machine learning techniques to identify cyber-attacks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron to include Bilge’s teaching of the limitations of claim 13, listed above. Doing so would “detect threats that are difficult to detect with regular methods” (Bilge, col. 14 line 15-19).
Claim 19:
Regarding claim 19, the claim is directed to a system for malicious activity detection, and the system implements the method recited by claim 1. Therefore, the rejection applied to claim 1 also applies to claim 19. Claim 19 is rejected under the same rationale as claim 1.
Claim 19 further recites a system for malicious activity detection, comprising: at least one processor; a communication device connected to the processor and configured to receive data reflective of network activity; a memory having stored thereon a set of instructions which, when executed by the processor, cause the processor to: perform operations of claim 1. {Doron [Fig. 4, para. 0016] “The system comprises a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: receive events data related to cyber-attacks occurring in a network during a predefined time window;…and determine, based on the matching sequence, at least one subsequent cyber-attack in a respective sequence.” [Para. 0073] “The attack predictor 150 includes a processing circuitry 410 coupled to a memory 415, a storage 420, and a network interface 440. The components of the attack predictor 150 may be communicatively connected via a bus 450.”} See para. 0078 for more details on the network interface.
13. Claims 4 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over Doron and Bilge as applied to claims 1 and 2, and further in view of Cohen et al. (US 2020/0322368 A1), hereafter Cohen.
Regarding claim 4, Doron and Bilge teach the elements of claim 2 as outlined above.
However, Doron and Bilge do not teach wherein the scanning scheme within the predetermined period of time comprises at least one of: a number of distinct destination ports, a number of distinct destination addresses, a prefix destiny, or a destination scheme.
However, Cohen teaches wherein the scanning scheme within the predetermined period of time comprises at least one of: a number of distinct destination ports, a number of distinct destination addresses, a prefix destiny, or a destination scheme. {Cohen [Fig. 2, Para. 0066] “The present invention models the ongoing activities in the darknet by examining the D-Ports of packets arriving to the darknet and cluster them into groups.” [Para. 0068] “First, the data is split into sliding time windows, resulting in multiple windows with length L. For each time window, the destination port (D-port records of the same S-IP are grouped into a port sequence.” [Para. 0069] “By using a word embedding algorithm on the port sequences extracted from the previous stage and treating ports as words and port sequence as sentences, one can transform the port sequences into a meaningful numerical feature vectors.”} Also see para. 0110 in Cohen. Port sequences include a number of distinct destination ports. Cohen’s system transforms port sequences into feature vectors.
Cohen is analogous art because each of Doron, Bilge and Cohen pertains to generating embeddings representing network traffic data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron and Bilge to include Cohen’s teaching of the limitations of claim 4, listed above. Doing so “can detect the reoccurrence of previously observed complex attacks as well as novel attack patterns which were not encountered before” (Cohen, para. 0019).
Claim 14:
Regarding claim 14, Doron and Bilge teach the elements of claim 1 as outlined above.
However, Doron and Bilge do not teach wherein the clustering technique comprises a k-means clustering technique clustering the plurality of embeddings into the plurality of clusters, and wherein a number of the plurality of clusters is smaller than a number of the plurality of embeddings.
However, Cohen teaches wherein the clustering technique comprises a k-means clustering technique clustering the plurality of embeddings into the plurality of clusters, and wherein a number of the plurality of clusters is smaller than a number of the plurality of embeddings. {Cohen [Para. 0065] “FIG. 1 illustrates a schematic view of the proposed DANTE framework. A clustering module performs temporal clustering of the feature vectors over time.” [Para. 0086] “At the first step, the most recent data is sorted and aggregated into overlapping time windows.” [Para. 0088] “At the next step, a clustering algorithm is applied to the data of each time window to group the observations, while any batch clustering algorithm can be used. For example: K-means, Fuzzy C-means, Gaussian mixture models, hierarchical clustering, spectral clustering and more.”} Also see paras. 0065-0069. Cohen’s system performs temporal clustering, which clusters the feature vectors over time. For k-means clustering, the number of clusters (denoted by k) can be predefined by a user before running the algorithm (see para. 0088). Therefore, when k is smaller than the total number of embeddings, the total number of clusters will be smaller than the total number of embeddings.
Cohen is analogous art because each of Doron, Bilge and Cohen pertains to generating embeddings representing network traffic data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron and Bilge to include Cohen’s teaching of a k-means clustering technique that clusters embeddings into clusters. Doing so “can detect the reoccurrence of previously observed complex attacks as well as novel attack patterns which were not encountered before” (Cohen, para. 0019).
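By way of illustration only, the relationship noted above (k clusters for n embeddings, with k smaller than n) can be sketched as follows. This is a minimal sketch of plain Lloyd's k-means and is not taken from Cohen or the application; the function name `kmeans`, the two-dimensional sample points, and the choice to initialize centroids from the first k points are all assumptions made for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption for the example: deterministic initialization from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Hypothetical embeddings: six points forming two visually separate groups.
embeddings = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels = kmeans(embeddings, k=2)
num_clusters = len(set(labels))
```

Because k is fixed before the algorithm runs, the number of distinct labels can never exceed k, so choosing k below the number of embeddings bounds the cluster count accordingly.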
14. Claim 6 is rejected under 35 U.S.C. § 103 as being unpatentable over Doron and Bilge as applied to claims 1 and 2, and further in view of Roesch et al., (US 7,317,693 B1), hereafter Roesch.
Regarding claim 6, Doron teaches the elements of claim 2 as outlined above.
However, Doron and Bilge do not teach wherein the scanner type of the scanner within the predetermined period of time comprises at least one of: a set of time-to-live (TTL) values of the scanner, or a device operating system (OS) type.
However, Roesch teaches wherein the scanner type of the scanner within the predetermined period of time comprises at least one of: a set of time-to-live (TTL) values of the scanner, or a device operating system (OS) type. {Roesch [Col. 17, line 4-8] “A method 1100 for identifying a router on a network from one packet identifying a primary MAC address and a second packet identifying a network device at least one hop away.” [Col. 17, line 31-46] “In step 1140, the number of hops traveled by the second packet is determined from the second plurality of protocol fields. The number of hops traveled is determined by identifying the operating system transmitting the packet and calculating the difference between the time-to-live value of the second packet and the time-to-live default value of the operation system. The second plurality of protocol fields is compared to an operating system identifying structure. A matched operating system is selected. The default starting time-to-live value for the matched operating system is read from the operating system identifying structure. The packet time-to-live value is read from the second plurality of protocol fields. The number of hops traveled is found by comparing the default starting time-to-live value to the packet time-to-live value.”}
Roesch is analogous art because each of Roesch, Doron and Bilge pertains to analyzing data moving across a network. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron and Bilge to include Roesch’s teaching of determining a set of TTL values of a scanner, or a device OS type. Doing so would “improve existing intrusion detection systems or real-time network reporting mechanisms” (Roesch, Col. 33 line 58-62).
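For illustration, the hop-count computation quoted from Roesch (comparing the default starting time-to-live value of the matched operating system with the time-to-live value read from the packet) can be sketched as below. This is a simplified sketch, not Roesch's implementation: the table `DEFAULT_TTL`, its keys, and the function `hops_traveled` are hypothetical, the operating system is assumed to have been matched already, and the listed defaults are commonly cited values rather than values taken from the reference.

```python
# Hypothetical lookup table standing in for Roesch's "operating system
# identifying structure"; the values are commonly cited defaults.
DEFAULT_TTL = {
    "linux": 64,
    "windows": 128,
    "network_device": 255,
}

def hops_traveled(os_name, packet_ttl):
    """Hops = default starting TTL of the matched OS minus the observed packet TTL."""
    default_ttl = DEFAULT_TTL[os_name]
    if not 0 <= packet_ttl <= default_ttl:
        raise ValueError("packet TTL is inconsistent with the matched OS default")
    return default_ttl - packet_ttl

# A packet from a host matched as Linux that arrives with TTL 57 crossed 7 hops.
example_hops = hops_traveled("linux", 57)
```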
15. Claims 15-18 are rejected under 35 U.S.C. § 103 as being unpatentable over Doron, Bilge and Cohen as applied to claims 1 and 14, and further in view of Fan et al., (US 2021/0049452 A1), hereafter Fan.
Regarding claim 15, Cohen teaches the elements of claim 14 as outlined above.
However, Doron, Bilge and Cohen do not teach wherein the plurality of clusters comprises a first clustering assignment matrix and a second clustering assignment matrix, wherein the first clustering assignment matrix and the second clustering assignment matrix being for adjacent time periods.
However, Fan teaches wherein the plurality of clusters comprises a first clustering assignment matrix and a second clustering assignment matrix, wherein the first clustering assignment matrix and the second clustering assignment matrix being for adjacent time periods. {Fan [Para. 0028] “As shown in FIG. 3A, anomaly detection service 120 may calculate a pairwise inner product of time series within a segment 304 to produce an n*n*3 “image” matrix 306. Matrix 306 may be suitable for processing by GAN (generative adversarial network) 200. In some embodiments, as shown in FIG. 3B, matrix 306 may be further modified into a final input shape 308 for processing by GAN 200. This modification may include appending at least one matrix from at least one adjacent segment 304 to matrix 306 as shown. By appending an adjacent matrix, it may be possible to assemble a time sequence of the output corresponding to the time sequence of the multivariate time series data input. For example, this calculation may proceed as follows. First, it may be assumed that the entire time series related to training (or at least the entire time series for a time period of interest) is pulled from monitored service 110. Anomaly detection service 120 may generate signature (covariance) matrices (n*n) per each time step in training (every 5 minutes in the illustrated example) and per each predefined window size. Then, for a single time step, anomaly detection service 120 may generate three signature matrices associated with different window sizes. These three signature matrices may be used as three channels of image input. However, considering a single time step as input might not reflect the temporal dependency that exist between time steps. Therefore, anomaly detection service 120 may also append previous immediate h steps to the current time step as input, in order to reflect temporal dependencies. 
The final input of shape (h+1)*n*n*3 may be stored per time step and fed to GAN 200.”} As disclosed in Fan, anomaly detection service 120 generates a first matrix and a second matrix associated with adjacent time periods and different window sizes.
Fan is analogous art because each of Doron, Bilge, Cohen and Fan pertains to analyzing network traffic to detect network anomalies. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron, Bilge and Cohen to include Fan’s teaching of the limitations of claim 15, listed above. One could use this combination to implement features of claim 15. Doing so would “improve anomaly detection” (Fan, para. 0019).
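For illustration, the signature matrices described in the quoted passage of Fan (pairwise inner products of n time series over a window, computed for three window sizes to form three channels) can be sketched as follows. This is a sketch under stated assumptions and not Fan's implementation: the function names, the sample series, and the window sizes (2, 3, 4) are hypothetical, and no normalization is applied.

```python
def signature_matrix(series, end, window):
    """n x n matrix of pairwise inner products over the last `window` samples before `end`."""
    if window <= 0 or end < window:
        raise ValueError("window must be positive and fit before `end`")
    segments = [s[end - window:end] for s in series]
    return [
        [sum(a * b for a, b in zip(si, sj)) for sj in segments]
        for si in segments
    ]

def signature_channels(series, end, windows=(2, 3, 4)):
    """One signature matrix per window size, analogous to three image channels."""
    return [signature_matrix(series, end, w) for w in windows]

# Hypothetical input: n = 2 time series with four samples each.
series = [[1, 2, 3, 4], [0, 1, 0, 1]]
channels = signature_channels(series, end=4)
```

Stacking the channel sets of the current time step and the h preceding time steps would give an input of shape (h+1)*n*n*3, as the quoted passage describes.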
Claim 16:
Regarding claim 16, Fan teaches the elements of claim 15 as outlined above.
However, Doron, Bilge and Cohen do not teach generating a first probability density function capturing the first clustering assignment matrix; and generating a second probability density function capturing the second clustering assignment matrix.
However, Fan teaches generating a first probability density function capturing the first clustering assignment matrix; and generating a second probability density function capturing the second clustering assignment matrix. {Fan [Para. 0031] “Once GAN 400 has been trained, it may be applied to score anomalies in data input as final input shape 308. As shown in FIG. 4B, this may be performed by fixing both encoder 204 settings, decoder 206 settings, and discriminator 208 setting to the trained settings and passing input data x through GAN 400, where input data x is the final input shape 308 being analyzed. The output of GAN 400 may include a residual matrix representing a difference between input data x and output data x′ and/or a residual matrix representing a difference between z and z′. An anomaly score may be generated based on these matrices, and a threshold difference may be established, where data having an anomaly score below (or equal or below) the threshold are judged as not likely being anomalous, and data have an anomaly score equal or above the threshold are judged as being anomalous.”} Fan’s system assigns an anomaly score to input data and uses two probability density functions for anomaly scoring. The first probability density function captures a first matrix x, and the second probability density function captures a second matrix z.
Fan is analogous art because each of Doron, Bilge, Cohen and Fan pertains to analyzing network traffic to detect network anomalies. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron, Bilge and Cohen to include Fan’s teaching of the limitations of claim 16, listed above. One could use this combination to implement features of claim 16. Doing so would “improve anomaly detection” (Fan, para. 0019).
Claim 17:
Regarding claim 17, Fan teaches the elements of claim 16 as outlined above.
However, Doron, Bilge and Cohen do not teach wherein the detecting the temporal change comprises transmitting an alert based on a distance between the first probability density function and the second probability density function.
However, Fan teaches wherein the detecting the temporal change comprises transmitting an alert based on a distance between the first probability density function and the second probability density function. {Fan [Para. 0032] “Anomalous data may refer to time steps in final input shape 308 with abnormal values and/or abnormal correlations between time series in final input shape 308. The trained GAN 400 may be used for testing new samples and detecting anomalous time steps. For each input x of the final input shape 308 in a test set, an output z, x′, and z′ may be generated by the generator's network. The L2 distance between x and x′ and the L2 distance between z and z′ may be calculated and used for score assignment. Abnormal patterns in input data may result in large reconstruction error that is reflected in contextual and latent loss.” [Para. 0046] “At 910, anomaly detection service 120 and/or troubleshooting service 130 may perform troubleshooting (e.g., a remedial action) to address any anomalies detected at 908. After anomaly detection service 120 detects an anomaly, troubleshooting service 130 may alert analysts and data engineers for troubleshooting.”} As disclosed in Fan, anomaly detection service 120 includes a GAN (see para. 0022). The anomaly detection service 120 detects anomalies when the distance between the functions is above a threshold, and sends an alert.
Fan is analogous art because each of Doron, Bilge, Cohen and Fan pertains to analyzing network traffic to detect network anomalies. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron, Bilge and Cohen to include Fan’s teaching of the limitations of claim 17, listed above. One could use this combination to implement features of claim 17. Doing so would “improve anomaly detection” (Fan, para. 0019).
Claim 18:
Regarding claim 18, Fan teaches the elements of claim 17 as outlined above.
However, Doron, Bilge and Cohen do not teach wherein the distance is a 2-Wasserstein distance on the first probability density function and the second probability density function.
However, Fan teaches wherein the distance is a 2-Wasserstein distance on the first probability density function and the second probability density function. {Fan [Para. 0036] “The performance and/or trainability of discriminator 208 may be enhanced by configuring discriminator 208 to use a Wasserstein function. FIGS. 7A-7D describe a Wasserstein function used by a discriminator 208 of a GAN 400. Wasserstein is a loss function defined to calculate the distance between two distributions. On the other hand, the role of discriminator 208 is to maximize the distance between two distributions of real and fake data. Therefore, the whole objective of discriminator 208 (previously adversarial loss) may be performed by the Wasserstein distance function.”} Fan teaches a N-Wasserstein function, which is used to calculate the distance between two probability distributions.
Fan is analogous art because each of Doron, Bilge, Cohen and Fan pertains to analyzing network traffic to detect network anomalies. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron, Bilge and Cohen to include Fan’s teaching of a 2-Wasserstein distance that measures the distance between two probability distributions. Doing so would “improve anomaly detection” (Fan, para. 0019).
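For reference, a 2-Wasserstein distance between two distributions can be illustrated in the simplest setting of one-dimensional empirical distributions with equal sample counts, where it reduces to the root-mean-square difference between the sorted samples. The sketch below is offered under those assumptions only and is not drawn from Fan or from the application; the function names, the sample values, and the alert threshold are hypothetical.

```python
import math

def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D empirical distributions."""
    if not xs or len(xs) != len(ys):
        raise ValueError("both samples must be non-empty and of equal size")
    # In one dimension the optimal coupling pairs the sorted samples in order.
    squared = sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys)))
    return math.sqrt(squared / len(xs))

def exceeds_threshold(xs, ys, threshold):
    """Hypothetical alert rule: flag a change when the distance is above the threshold."""
    return wasserstein2_1d(xs, ys) > threshold

# Hypothetical samples for two adjacent time periods.
period_one = [1.0, 2.0, 3.0]
period_two = [3.0, 4.0, 2.0]
distance = wasserstein2_1d(period_one, period_two)
```

In this example every sorted sample shifts by exactly one unit between the two periods, so the distance equals the size of the shift.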
16. Claim 20 is rejected under 35 U.S.C. § 103 as being unpatentable over Doron et al. (US 2019/0182274 A1), hereafter Doron, in view of Cohen et al., (US 2020/0322368 A1), hereafter Cohen, and further in view of Howard et al. (US 2019/0199736 A1), hereafter Howard.
Noted that indicates what the cited art does not teach.
Regarding claim 20, Doron teaches a system for detecting malicious computer activity, comprising: {Doron [Para. 0023] “Techniques for detecting future or subsequent cyber-attacks which may be part of an attack campaign. The prediction is based on processing of events associated with a detected attack.”}
at least one processor; and at least one memory having stored thereon a set of instructions which, when executed by the processor cause the processor to: {Doron [Fig. 4, Para. 0016] “The system comprises a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: receive events data related to cyber-attacks occurring in a network during a predefined time window;”}
at least one network connection in communication with the at least one processor; {Doron [Fig. 4, Para. 0073] “The attack predictor 150 includes a processing circuitry 410 coupled to a memory 415, a storage 420, and a network interface 440. The components of the attack predictor 150 may be communicatively connected via a bus 450.” [Para. 0078] “The network interface 440 allows the attack predictor 150 to communicate with the data sources 115, the protected objects 130, the database 140,… for example, receiving and retrieving events data.”}
receive a first set of Darknet data via the at least one network connection, corresponding to a first temporal period; {Doron [Para. 0068] “At S310, incoming events data is received or retrieved from the data sources (e.g., sources 115). The events data may be received continuously. In another embodiment, the events data are received in batches.” [Para. 0031] “The data sources 115 generate or collect events data related to attacks. The events data may include events, event parameters, or both. The data sources 115 may be third party services such as threat intelligence, DarkNet intelligence, known attacks repositories, and so on. The data sources 115 may be configured to store the events parameters in the database 140, to send the events to the attack predictor 150, or both.”} Data sources 115 collect activity events from multiple sources including DarkNet intelligence during a time period (e.g., a first historic time period).
cluster the first set of Darknet data to create first cluster data; {Doron [Fig. 3, Para. 0059] “The search is performed using matching clusters. That is, a search for a matching cluster of historic sequences is performed, and subsequent attacks are predicted based on the continuation attacks of the historic sequences in the matching cluster. The prediction engine 230 may match between a current sequence and a cluster of historic sequences. The matching uses the sequence signature representation of the current sequence and of the cluster centroid.”} Doron’s system generates a historic cluster from historic sequences of Darknet data.
receive a second set of Darknet data via the at least one network connection, corresponding to a second temporal period; {Doron [Para. 0068] “At S310, incoming events data is received or retrieved from the data sources (e.g., sources 115).” [Para. 0031] “The data sources 115 generate or collect events data related to attacks. The events data may include events, event parameters, or both. The data sources 115 may be third party services such as threat intelligence, DarkNet intelligence, known attacks repositories.”} Doron’s system receives current Darknet data during a current time period.
cluster the second set of Darknet data to create second cluster data; {Doron [Fig. 3, Para. 0059] “The search is performed using matching clusters. That is, a search for a matching cluster of historic sequences is performed, and subsequent attacks are predicted based on the continuation attacks of the historic sequences in the matching cluster. The prediction engine 230 may match between a current sequence and a cluster of historic sequences. The matching uses the sequence signature representation of the current sequence and of the cluster centroid.”} Doron’s system generates a current cluster from current sequences of Darknet data.
generate a plurality of similarity scores using a plurality of Jaccard measures to compare the first cluster data and the second cluster data; {Doron [Fig. 3, Para. 0059] “The search is performed using matching clusters. That is, a search for a matching cluster of historic sequences is performed, and subsequent attacks are predicted based on the continuation attacks of the historic sequences in the matching cluster. The prediction engine 230 may match between a current sequence and a cluster of historic sequences. The matching uses the sequence signature representation of the current sequence and of the cluster centroid. That is, the prediction engine 230 may search for a close enough cluster to the current sequence with respect to a distance metric. The distance metric may be, for example, a cosine similarity between vectors, where the vectors represent individual sequence signatures or the cluster centroid.”} Doron’s system compares a historic cluster to a current cluster using a cosine similarity metric.
determine at least one of: (i) an existence of a cluster within the second cluster data that is not within a similarity threshold of any clusters of the first cluster data; or (ii) a change in characteristics of a given cluster from the first cluster data to the second cluster data; {Doron [Para. 0061] “When at least one historic sequence is found, the continuation attacks of the respective historic sequence are predicted as potential continuations for the current sequence (attack campaign). A match may be defined when the distance metric is less than a predefined threshold embodiment. A confidence score or probability may be assigned to the predicted attacks, based on the rate of sequences in the cluster that pointed on an attack as a possible continuation.”} Also see para. 0071. Doron’s system compares a historic cluster to a current cluster to determine if any change has occurred. The prediction engine 230 determines whether a cluster within the current cluster is not within a similarity threshold.
However, Doron does not teach generate a plurality of similarity scores using a plurality of Jaccard measures to compare the first cluster data and the second cluster data; and alerting a user to the determination of (i) or (ii).
However, Cohen teaches generate a plurality of similarity scores using a plurality of Jaccard measures to compare the first cluster data and the second cluster data; {Cohen [Para. 0089] “Between time window Ti and time window Ti+1, the number of clusters and their types can change. Moreover, a cluster in Ti+1 can be a current cluster (also found in Ti), an old cluster (found in Tj where j<i), or a new cluster.” [Para. 0090] “To annotate the clusters in Ti+1, first there is a need to find the current clusters by comparing Ti and Ti+1. A cluster in Ti+1 is mapped to a cluster in Ti if there is a significant overlap of observations between them. The overlap is been measured using the Jaccard similarity metric (a percentage of how many objects two sets have in common out of how many objects they have total).” [Para. 0091] “Clusters which have a high Jaccard Similarity Score have a large number of overlapping observations and thus are considered to be the same pattern. By using the distributed system, the Jaccard similarity of all of the clusters in Ti+1 with the clusters in Ti is simultaneously calculated. If the Jaccard similarity is above a certain threshold for two clusters, then the cluster from Ti+1 is considered to be the same as the cluster from Ti (i.e., current cluster)”} See para. 0089-0093 for additional details. Cohen’s system generates a Jaccard similarity score to compare two clusters.
and alerting a user to the determination of (i) or (ii). {Cohen [Para. 0032] “f) upon identifying clusters that have been appeared and classified as malicious in the past or clusters that have never seen before, issuing an alert.” [Para. 0072] “In the first type, the attacker is trying to conceal himself by adding dummy port access as noise. A simple way to deal with this attack group is to include an alert rule that issues alert when a cluster that has never seen before is seen, as those attacks will create a new cluster. In the second type, the attacker will try to disguise himself as a pattern that belongs to a known cluster, such as a cluster that consists of a popular port sequence pattern. To deal with this type, it is possible to create an alert rule to issue an alert when a cluster dramatically increases in size.”} Also see paras. 0026-32, 0065, and 0071-0072.
Cohen is analogous art because each of Doron and Cohen pertains to analyzing and clustering darknet traffic data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron to include Cohen’s teaching of the limitations of claim 20, listed above. Doing so “can detect the reoccurrence of previously observed complex attacks as well as novel attack patterns which were not encountered before” (Cohen, para. 0019).
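For illustration, the cluster comparison described in the quoted passages of Cohen (mapping a cluster in time window Ti+1 to a cluster in Ti when their Jaccard similarity is above a threshold, and otherwise treating it as unmatched) can be sketched as follows. This is a simplified, single-machine sketch and not Cohen's distributed implementation; the function names, the cluster labels, the observation identifiers, and the threshold of 0.5 are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: shared members divided by total distinct members."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def map_clusters(previous, current, threshold=0.5):
    """Map each cluster in the newer window to its best match in the older window.

    A cluster maps to None when no older cluster has similarity above the
    threshold, which is the case an alert rule for unseen clusters could act on.
    """
    mapping = {}
    for name, members in current.items():
        best_name, best_score = None, 0.0
        for prev_name, prev_members in previous.items():
            score = jaccard(members, prev_members)
            if score > best_score:
                best_name, best_score = prev_name, score
        mapping[name] = best_name if best_score > threshold else None
    return mapping

# Hypothetical observation IDs per cluster in two overlapping time windows.
window_i = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
window_next = {"X": {2, 3, 4, 8}, "Y": {9, 10}}
mapping = map_clusters(window_i, window_next)
```

Here cluster X shares three of five distinct observations with cluster A (similarity 0.6) and is treated as the same pattern, while cluster Y overlaps nothing in the earlier window and remains unmatched.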
However, Cohen also does not teach generate a plurality of similarity scores using a plurality of Jaccard measures to compare the first cluster data and the second cluster data.
However, Howard teaches generate a plurality of similarity scores using a plurality of Jaccard measures to compare the first cluster data and the second cluster data. {Howard [Para. 0126] “On the cloud, a received binary is first passed through the Featurizer 610. The features go through the Autoencoder to produce autoencoded features. The autoencoded features go through the Classifier 630, which this time uses the Complete Sample Database that contains all samples that have been seen previously. Again, the search for similar samples is made efficient using LSH (locality-sensitive hashing). Similarly to on the endpoint, the binary is classified to its nearest neighbor, and a confidence value is calculated. The near-neighbors that are returned by the LSH search are each compared to the target binary by their actual features, and that is the distance metric that is used. The distance measurement used is the Jaccard index between the two sets of features from the two binaries. The Jaccard indexes from the various types of features are weighted and combined to get an overall distance measurement.”} Howard’s system generates a plurality of Jaccard similarity scores to compare two pieces of data.
Howard is analogous art because each of Doron, Cohen and Howard pertains to generating embeddings representing network traffic data. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Doron and Cohen to include Howard’s teaching of the limitations of claim 20, listed above. One could use the combination to implement features of claim 20. Doing so would “provide for preemptive defense, enabling a defender to predict potential future attacks” (Howard, para. 0067).
Conclusion
17. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
18. Any inquiry concerning this communication or earlier communications from the examiner should be directed to BIN QING ZHENG whose telephone number is (703)756-1535. The examiner can normally be reached on M-F 9:30 am - 5:30 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Philip J. Chea, can be reached on 571-272-3951. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BIN QING ZHENG/
Examiner, Art Unit 2499
/PHILIP J CHEA/
Supervisory Patent Examiner, Art Unit 2499