DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
No restrictions are warranted at the time of filing.
Priority
The instant application is a continuation of, and claims domestic priority under 35 U.S.C. 120 to, non-provisional application No. 17/448,267, filed on 09/21/2021, now U.S. Patent No. 12,135,802, which further claims domestic priority under 35 U.S.C. 119(e) to provisional application No. 63/203,683, filed on 07/28/2021.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/27/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.
Drawings
Applicant’s drawings filed on 09/27/2024 have been inspected and are in compliance with MPEP 608.02.
Specification
Applicant’s specification filed on 09/27/2024 has been inspected and is in compliance with MPEP 608.01.
Claim Objections
No claim objections are warranted at the time of filing of this continuation.
Claim Interpretation – 35 USC 112(f)
In the examiner’s opinion, claims 1-22 do not invoke means-plus-function or step-plus-function claim language within the meaning of the statute.
Claim Rejections – 35 USC § 112
No rejections under 35 U.S.C. 112 are warranted at the time of filing of this continuation.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Applying the Alice/Mayo framework:
Step 1 – Statutory category: Claim 1 is drawn to a “method” and therefore recites a process, which is one of the four statutory categories of invention under 35 U.S.C. 101.
Step 2A, Prong I – Whether the claim recites a judicial exception (abstract portions only): Under Step 2A, Prong I, the claim is analyzed to determine whether it recites a law of nature, a natural phenomenon, or an abstract idea, focusing here on the abstract portions of the claim.
Stripped to its abstract aspects, claim 1 recites the following limitations:
“searching a plurality of maps for [a] set of tokens,” i.e., looking up information (tokens) in stored structures (maps).
“the plurality of maps associate hash values of tokens … to locations of the tokens within [a] dataset,” i.e., storing and organizing information about where particular content appears in a dataset.
“determining whether token matches for a record of the dataset complete a data field pattern defined by a … rule,” i.e., evaluating whether the retrieved information satisfies a predefined pattern or condition.
“prioritizing token matches … [from one map] over token matches [from another map]” when making that determination, i.e., applying a heuristic about which information to consider first in the decision.
“indicating violation of the … rule based on a determination that token matches for a record … complete the data field pattern,” i.e., outputting the result of the rule-based evaluation.
These limitations describe receiving or identifying information, comparing that information to stored information, applying a rule-defined pattern to decide whether conditions are met, ordering the evaluation of certain information, and then reporting the decision. Such operations are mental steps and information-analysis activities that can, in principle, be performed conceptually by a person with access to the data and rules, and they also amount to managing and enforcing policies, which is a form of organizing human activity. Therefore, under Step 2A, Prong I, claim 1 recites an abstract idea.
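For purposes of illustration only, the information-analysis steps identified above (search maps for tokens, consult lower-frequency maps with priority, determine pattern completion per record, report the result) can be captured in a short sketch. All names, data structures, and values below are hypothetical and are not drawn from the claims as filed; ordinary dictionaries stand in for the claimed maps.

```python
from collections import defaultdict

def check_dlp_violation(tokens, maps_by_class, required_fields):
    """Hypothetical sketch: search per-frequency-class maps for tokens
    and report records whose matches complete the rule-defined
    data field pattern. Consulting the maps in ascending frequency
    order models the claimed prioritization of lower-frequency matches."""
    matched_fields = defaultdict(set)  # record id -> matched field indexes
    for token_map in maps_by_class:    # ordered: unique, infrequent, ...
        for token in tokens:
            for record_id, field in token_map.get(hash(token), []):
                matched_fields[record_id].add(field)
    # A violation is indicated for any record whose pattern is complete.
    return [r for r, fields in matched_fields.items()
            if required_fields <= fields]

# Hypothetical dataset: record 0 has an SSN in field 0, a name in field 1.
unique_map = {hash("123-45-6789"): [(0, 0)]}
infrequent_map = {hash("alice"): [(0, 1)]}
violations = check_dlp_violation(
    ["alice", "123-45-6789"], [unique_map, infrequent_map], {0, 1})
print(violations)  # → [0]
```

As the sketch shows, each step is a generic lookup, set-membership, or comparison operation of the kind a person could in principle perform with access to the data and the rule.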
Step 2A, Prong II – Whether the claim is “directed to” the judicial exception (additional elements / practical application): Under Step 2A, Prong II, the claim is evaluated as a whole to determine whether the judicial exception identified in Prong I is integrated into a practical application. In this prong, the analysis considers the additional elements in the claim beyond the abstract portions listed above.
In claim 1, those additional elements include:
The context that the “set of tokens” is “generated from a data-in-motion object,” and that the “data field pattern” is “defined by a data leakage prevention rule for the dataset,” tying the abstract evaluation to data-loss-prevention (DLP) for data-in-motion.
The requirement that “a first and a second of the plurality of maps each corresponds to a different class of token frequency within the dataset,” and that different token-frequency classes (e.g., unique, infrequent, frequent) are defined based on a configurable threshold, as explained in the specification.
As detailed in the specification and related claims, the maps can be constructed using perfect or minimal perfect hashing functions and ordered sets, and the method is implemented using generic hardware components (e.g., processor, memory, non-transitory computer-readable media) and software modules (e.g., lexer, normalizer, encoder, MPHF query functions).
These additional elements limit the abstract data-matching and rule-evaluation concept to the particular technological field of DLP in cybersecurity and specify particular data-structure choices (frequency-class maps, configurable thresholds, MPHF-based maps) and generic computer implementation for performing the abstract analysis. However, they do not reflect an improvement to the functioning of the computer itself or another technology, do not effect a transformation of an article to a different state or thing, and do not impose a meaningful limit beyond using a general-purpose computer in a specific environment to implement the abstract steps. Under MPEP 2106.05 and cases such as Electric Power Group and Recentive, field-of-use limitations, data-structure optimizations, and generic computer implementation do not, without more, integrate an abstract idea into a practical application.
Accordingly, under Step 2A, Prong II, claim 1 is “directed to” the abstract idea identified in Prong I and is not integrated into a practical application.
Step 2B – Whether the claim recites “significantly more” than the judicial exception (inventive concept): Under Step 2B, the claim is evaluated to determine whether any additional element, or combination of elements, amounts to significantly more than the judicial exception itself.
As discussed, the additional elements include: (1) restricting the method to DLP for data-in-motion via a DLP rule; (2) organizing dataset tokens into frequency classes (unique, infrequent, frequent) using a configurable threshold; (3) constructing multiple maps using perfect or minimal perfect hashing functions and ordered sets for the different frequency classes; and (4) implementing the method on conventional computer hardware with standard software components. The specification acknowledges that minimal perfect hashing and related data-structure techniques are available through known libraries (e.g., emphf, CMPH, CHD, BBHash) and are used to build compact and efficient lookup tables. Using such known hashing and indexing techniques, combined with straightforward frequency-based splitting of keys, represents well-understood, routine, and conventional computer activity to optimize performance when implementing an otherwise abstract matching and rule-evaluation process.
The ordering of evaluation (e.g., prioritizing lower-frequency classes) and related algorithmic heuristics merely implement the abstract idea more efficiently and do not constitute a technological improvement or unconventional computer implementation. Under Alice, Electric Power Group, and Recentive, such efficiency-focused optimizations, when performed on generic computer components using known techniques, do not supply an “inventive concept” that transforms the abstract idea into patent-eligible subject matter.
Considering the claim elements individually and in combination, claim 1 amounts to no more than implementing the abstract concept of token-based matching and DLP rule enforcement using standard hashing, mapping, and indexing techniques on conventional computing hardware. The claim therefore does not recite significantly more than the judicial exception under Step 2B.
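The frequency-class partitioning characterized above as routine (unique, infrequent, and frequent tokens separated by a configurable threshold) reduces to ordinary counting, as the following hypothetical sketch illustrates; the token values and threshold are illustrative only.

```python
from collections import Counter

def classify_by_frequency(dataset_tokens, threshold=5):
    """Partition dataset tokens into unique, infrequent, and frequent
    classes using a configurable frequency threshold (illustrative only)."""
    counts = Counter(dataset_tokens)
    classes = {"unique": set(), "infrequent": set(), "frequent": set()}
    for token, n in counts.items():
        if n == 1:
            classes["unique"].add(token)
        elif n <= threshold:          # configurable split point
            classes["infrequent"].add(token)
        else:
            classes["frequent"].add(token)
    return classes

tokens = ["ssn1"] + ["acct7"] * 3 + ["the"] * 100
c = classify_by_frequency(tokens, threshold=5)
print(sorted(c["unique"]), sorted(c["infrequent"]), sorted(c["frequent"]))
# → ['ssn1'] ['acct7'] ['the']
```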
Regarding claim 2, the additional limitation of “invoking a security action based on indication of violation of the data leakage prevention rule” merely specifies what to do with the abstract determination (e.g., block, track, notify) and thus remains within the abstract idea of managing and enforcing policies; it does not integrate the exception into a practical application or add significantly more.
Regarding claim 3, the additional limitation of first attempting to complete the data-field pattern using the first and second maps (unique and infrequent tokens) and subsequently using a third map (frequent tokens), with classification based on a configurable frequency threshold, is an algorithmic refinement to the order in which information is evaluated and how tokens are grouped; this remains part of the abstract information-analysis idea and does not constitute a technological improvement or unconventional computer implementation.
Regarding claim 4, the additional limitation that at least a first of the maps is constructed based on perfect hashing or minimal perfect hashing merely recites using known hashing techniques and data structures to implement the abstract matching more efficiently and does not integrate the abstract idea into a practical application or amount to significantly more than the judicial exception.
Regarding claim 5, the additional limitation that token frequency classes comprise unique, infrequent, and frequent tokens separated by a configurable threshold simply further characterizes how the abstract matching algorithm partitions tokens; this remains within the abstract data-organization and analysis idea and does not add a non-abstract technological feature.
Regarding claim 6, the additional limitation that each map comprises a minimal perfect hashing function and an ordered set of tokens ordered according to the hash positions recites a particular known way of implementing a map structure; it is a conventional computer-science technique for efficient key-value lookup and, as such, does not integrate the abstract idea into a practical application or add an inventive concept.
Regarding claim 7, the claim recites program code stored on a non-transitory computer-readable medium to perform the same abstract operations as claim 1 (generating tokens from a data-in-motion object, querying minimal perfect hashing functions built from unique and infrequent token sets, determining data-field indexes and record indicators, and determining whether the indexes complete a DLP pattern); expressing the abstract method as instructions on generic storage does not alter the 101 analysis or add significantly more than the abstract idea.
Regarding claim 8, the additional instructions to instantiate and update a tracking data structure and to base the pattern-completion determination on that structure mirror the abstract bookkeeping already described for the method; this remains information organization within the abstract idea and is implemented on generic hardware, so it does not integrate the exception into a practical application.
Regarding claim 9, the additional limitation that a third minimal perfect hashing function was created from a key set of frequent tokens that are neither unique nor infrequent recites another MPHF instance over a different token subset; this is a straightforward application of known MPHF techniques and frequency-based partitioning for the same abstract matching, and therefore does not add significantly more.
Regarding claim 10, the additional instructions to determine data-field indexes for tokens that hit in the third MPHF, track those matches, and then decide whether those matches complete the pattern after results from the first and second MPHFs do not suffice is an algorithmic refinement for using frequent tokens as “fill-ins” in the abstract matching logic; it remains within the abstract idea and does not provide a technological improvement.
Regarding claim 11, the additional instructions to verify that selected frequent-token matches actually occur for records by querying a fourth MPHF with token+record combinations describe a validation step implemented with another known hash-based structure; this is still part of the abstract data-analysis scheme and does not integrate the exception into a practical application.
Regarding claim 12, the additional limitation that the verification step comprises querying a fourth MPHF and determining whether the result is a hit is a specific data-structure choice for performing that validation; as with other MPHF uses, it is a known CS technique for efficient membership checking and does not add significantly more than the abstract idea.
Regarding claim 13, the additional instructions to handle multiple-term tokens by querying a third MPHF keyed by record indicator and data-field for unmatched fields, obtaining multi-term tokens, and then searching the first set of tokens for those multi-term tokens simply extend the abstract matching logic to multi-term strings using another hash-based map; it remains information analysis and does not provide a technical improvement beyond the abstract idea.
Regarding claim 14, the additional limitation that generating the tokens includes parsing the data-in-motion object and encoding the initial tokens according to encoding techniques applied to the dataset describes conventional pre-processing (lexing, normalizing, hashing) for implementing the abstract matching; this is generic computer processing and does not integrate the exception into a practical application.
Regarding claim 15, the additional instructions to determine token frequency in the dataset and construct the set of MPHFs based on those frequencies mirror the method-claim additions and represent routine CS techniques for building efficient hash-based indices; they remain part of the abstract matching scheme and do not add significantly more.
Regarding claim 16, the additional instructions to indicate violation of the DLP rule when the data-field indexes complete the pattern simply restate the abstract notion of reporting the policy-evaluation result and do not provide a technological improvement beyond the abstract idea.
Regarding claim 17, the claim recites a processor and non-transitory computer-readable medium storing instructions to perform the same operations as claim 7/claim 1; this is a standard “system” formulation of the same abstract method, implemented on generic hardware, and therefore does not change the 101 analysis or add significantly more.
Regarding claim 18, the additional instructions to instantiate and update a tracking data structure are the system counterpart to the method/medium tracking limitations; they describe information organization on generic hardware within the abstract matching logic and do not integrate the abstract idea into a practical application.
Regarding claim 19, the additional instructions to use a third MPHF for frequent tokens, determine data-field indexes for hits in that MPHF, and track those matches mirror the medium/method frequent-token handling and remain algorithmic refinements; they do not add a non-abstract technological feature.
Regarding claim 20, the additional instructions to, for each record with a partially complete pattern, select frequent-token matches that would complete the pattern and verify them via a fourth MPHF are apparatus-form versions of the validation scheme; this still implements the same abstract analysis using known data-structure techniques on generic hardware, without supplying significantly more.
Regarding claim 21, the additional instructions to, when first/second MPHFs cannot complete the pattern and the dataset includes multi-term tokens, query a third MPHF keyed by record+field and match multi-term tokens, simply implement multi-term matching within the same abstract DLP evaluation on generic hardware; it does not integrate the exception into a practical application.
Regarding claim 22, the additional instructions to determine token frequency in the dataset and construct the initial set of MPHFs based on frequency mirror earlier claims; again, they describe routine computer-science techniques for building indexes and thus do not add an inventive concept beyond the abstract idea.
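For additional context on the map structure recited in claims 6 and 16 (a minimal perfect hashing function paired with an ordered token set), such structures are conventionally built with known libraries (e.g., CMPH, BBHash). The following hypothetical sketch substitutes an ordinary dictionary for the hashing function but preserves the verification of a hit against the ordered key set, which a real MPHF requires because an MPHF maps non-keys to arbitrary positions.

```python
class MphfStyleMap:
    """Illustrative stand-in for the MPHF-plus-ordered-set map of claims
    6/16. A real minimal perfect hash computes key positions without
    storing the keys; a dict plays that role here, but lookups still
    verify the token against the ordered key set, mirroring how MPHF
    hits must be confirmed before their stored locations are used."""

    def __init__(self, key_to_locations):
        keys = list(key_to_locations)
        self._position = {k: i for i, k in enumerate(keys)}  # stand-in MPHF
        self._ordered_keys = keys      # ordered per the hash positions
        self._values = [key_to_locations[k] for k in keys]

    def get(self, token):
        pos = self._position.get(token)
        if pos is None or self._ordered_keys[pos] != token:
            return None                # miss: token is not in the key set
        return self._values[pos]

m = MphfStyleMap({"alpha": [(0, 1)], "beta": [(2, 0)]})
print(m.get("beta"), m.get("gamma"))  # → [(2, 0)] None
```

This is a conventional key-value lookup pattern; nothing in it departs from well-understood computer-science technique.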
Appropriate action required.
Double Patenting
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A non-statutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on non-statutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a non-statutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based e-Terminal Disclaimer may be filled out completely online using web-screens. An e-Terminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about e-Terminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-19, 21, and 22 are rejected on the ground of non-statutory double patenting as being unpatentable over claims 1-16 of U.S. Patent No. 12,135,802.
Although the claims at issue are not identical, they are not patentably distinct from each other because the subject matter of the pending claims and the reference patent claims is the same or substantially similar and is not distinct in any material manner:
Generating one or more tokens from a data-in-motion object; then, based on the tokens, determining whether the data-in-motion object violates a data leakage prevention rule for a dataset. The violation is detected by querying a first and a second minimal perfect hashing function built from the tokens of the dataset, where the dataset tokens are classified by frequency (i.e., unique, infrequent, frequent, etc.). A violation is indicated for a record of the dataset when token hash matches complete a pattern in a data field of the record.
Also, see the table below for a claim-by-claim comparison.
Pending U.S. Application No. 18/899,226
U.S. Patent No. 12,135,802 (reference)
1. A method comprising:
for a set of tokens generated from a data-in-motion object, searching a plurality of maps for the set of tokens, wherein the plurality of maps associate hash values of tokens occurring in a dataset to locations of the tokens within the dataset, wherein a first and a second of the plurality of maps each corresponds to a different class of token frequency within the dataset; and
based on the search of the plurality of map for the set of tokens, determining whether token matches for a record of the dataset complete a data field pattern defined by a data leakage prevention rule for the dataset, wherein determining whether token matches complete a data field pattern comprises prioritizing token matches in the map corresponding to the least frequency class over token matches in the map corresponding to a greater frequency class; and
indicating violation of the data leakage prevention rule based on a determination that token matches for a record of the dataset complete the data field pattern.
11. (Currently Amended) A non-transitory, computer-readable medium having program code stored thereon that are executable by a computing device, the program code comprising instructions to:
for a set of tokens generated from a data-in-motion object, search a plurality of maps for the set of tokens, wherein the plurality of maps map hash values of tokens occurring in a dataset to locations of the tokens within the dataset, wherein a first and a second of the plurality of maps each corresponds to a different classes of token frequency of occurrence for the tokens within the dataset; and
based on the search of the plurality of maps for the set of tokens, determine whether token matches for a record of the dataset complete a data field pattern defined by a data leakage prevention rule for the dataset, wherein the instructions to determine whether token matches complete a data field pattern comprise instructions to prioritize token matches in with priority for completion being from the map corresponding to the least frequency class of occurrence over token matches into the map corresponding to a greatest greater frequency class occurrent; and
indicate violation of the data leakage prevention rule based on a determination that token matches for a record of the dataset complete the data field pattern.
2. The method of claim 1 further comprising invoking a security action based on indication of violation of the data leakage prevention rule.
12. (Original) The computer-readable medium of claim 11, wherein the program code further comprises instructions to invoke a security action based on indication of violation of the data leakage prevention rule.
3. The method of claim 1, wherein determining whether token matches for a record of the dataset complete the data field pattern defined by the data leakage prevention rule comprises attempting to complete the data field pattern for records of the dataset with token matches based on searches of the first and the second map of the plurality of maps and subsequently attempting to complete the data field pattern with token matches based on searches of a third map of the plurality of maps for records indicated from the searches of the first and second maps, wherein the first map corresponds to unique tokens within the dataset, the second map corresponds to tokens within the dataset in an infrequent frequency class, and
the third map corresponds to tokens within the dataset in a frequent frequency class, wherein the classification of token frequency as infrequent or frequent is based on a configurable frequency threshold.
13. (Currently Amended) The computer-readable medium of claim 11, wherein the program code to determine whether token matches for a record of the dataset complete the data field pattern defined by the data leakage prevention rule comprises program code to attempt to
complete the data field pattern for records of the dataset with token matches based on searches of the first and the second map of the plurality of maps and subsequently attempt to complete the data field pattern with token matches based on searches of a third
map of the plurality of maps for records indicated from the searches of the first and second maps, wherein the first map corresponds to unique tokens within the dataset, the second map corresponds to infrequent-tokens within the dataset in an infrequent frequency class, and
the third map corresponds to frequent tokens within the dataset in a frequent frequency class,
wherein the classification of token frequency as infrequent or frequent is based on a configurable frequency threshold.
4. The method of claim 1, wherein at least a first of the plurality of maps is constructed based on perfect hashing or minimal perfect hashing.
14. (Original) The computer-readable medium of claim 11, wherein at least a first of the plurality of maps is constructed based on perfect hashing or minimal perfect hashing.
5. The method of claim 1, wherein the different token frequency classes comprise unique tokens, infrequent tokens, and frequent tokens, wherein a configurable threshold separates the class of infrequent tokens from the class of frequent tokens.
15. (Currently Amended) The computer-readable medium of claim 11, wherein the different token frequency classes of frequency of occurrence-comprise unique tokens, infrequent tokens, and frequent tokens, wherein a configurable threshold separates the class of infrequent tokens from the class of frequent tokens.
6. The method of claim 1, wherein each of the maps comprises a minimal perfect hashing function constructed from a key set of tokens corresponding to the one of the different token frequency classes and an ordered set of the tokens forming the key set that is ordered according to the positions determined from the minimal perfect hashing function.
16. (Currently Amended) The computer-readable medium of claim 11, wherein each of the maps comprises a minimal perfect hashing function constructed from a key set of tokens corresponding to the one of the different token frequency classes of token frequency and an ordered set of the tokens forming the key set that is ordered according to the positions determined from the minimal perfect hashing function.
7. A non-transitory, computer-readable medium having program code stored thereon that are executable by a computing device, the program code comprising instructions to:
generate a first set of one or more tokens from a data-in-motion object; and
based at least partly on the first set of tokens, determine whether the data-in- motion object violates a data leakage prevention rule for a dataset, wherein the instructions to determine whether the data-in-motion object violates the data leakage prevention rule for the dataset comprise instructions to,
query each of a first set of minimal perfect hashing functions with each of the first set of tokens, wherein the first set of minimal perfect hashing functions at least includes a first minimal perfect hashing function created from a key set of unique tokens within the dataset and a second minimal perfect hashing function created from a key set of infrequent tokens that occur within the dataset at a frequency that satisfies a defined frequency criterion for a token to be classified as infrequent;
determine one or more data field indexes and one or more record indicators of the dataset for those of the first set of tokens that hit in at least one of the first and second minimal perfect hashing functions; and
determine whether the data field indexes complete a data field pattern specified by the data leakage prevention rule for at least one record of the dataset indicated by one of the record indicators.
1. (Currently Amended) A method comprising:
generating a first set of one or more tokens from a data-in-motion object; and
based at least partly on the first set of tokens, determining whether the data-in-motion object violates a data leakage prevention rule for a dataset, wherein determining whether the data-in-motion object violates the data leakage prevention rule for the dataset comprises,
querying each of a first set of plurality of minimal perfect hashing functions with each of the first set of tokens, wherein each of the plurality of minimal perfect hashing functions corresponds to a different class of token frequency within the dataset, wherein the first set of minimal perfect hashing functions at least includes a first of the plurality of minimal perfect hashing functions was created from a key set of unique tokens within the dataset and a second of the plurality of minimal perfect hashing functions was created from a key set of infrequent tokens that occur infrequently within the dataset at a frequency that satisfies according to a defined frequency criterion for a token to be classified as infrequent;
determining one or more data field indexes and one or more record indicators of the dataset for those of the first set of tokens that hit in at least one of the first and second minimal perfect hashing functions; and
determining whether the data field indexes complete a data field pattern specified by the data leakage prevention rule for at least one record of the dataset indicated by one of the record indicators.
8. The computer-readable medium of claim 7, wherein the program code further comprises instructions to:
instantiate a tracking data structure according to the data field pattern specified by the data leakage prevention rule to track matches of tokens of the first set of tokens to tokens of the dataset; and
update the tracking data structure based, at least in part, on results of querying and determination of the data field indexes, wherein the instructions to determine whether the data field indexes complete the data field pattern is based, at least in part, on the tracking data structure.
2. (Original) The method of claim 1 further comprising:
instantiating a tracking data structure according to the data field pattern specified by the data leakage prevention rule to track matches of tokens of the first set of tokens to tokens of the dataset; and
updating the tracking data structure based, at least in part, on results of the querying and determination of the data field indexes, wherein determining whether the data field indexes complete the data field pattern is based, at least in part, on the tracking data structure.
9. The computer-readable medium of claim 7, wherein a third of the plurality of minimal perfect hashing functions was created from a key set of frequent tokens that are neither unique nor infrequent.
3. (Currently Amended) The method of claim 1, wherein the first set of minimal perfect hashing functions also includes a third of the plurality of minimal perfect hashing functions was created from a key set of frequent tokens that are neither unique nor infrequent occurrence at a frequency that satisfies the defined frequency criterion.
10. The computer-readable medium of claim 9, wherein the program code further comprises instructions to:
determine one or more data field indexes of the dataset for those of the first set of tokens that hit in the third minimal perfect hashing function;
track those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function; and
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset, determine whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records.
4. (Currently Amended) The method of claim 3 further comprising:
determining one or more data field indexes of the dataset for those of the first set of tokens that hit in the third minimal perfect hashing function;
tracking those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function; and
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset, determining whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records.
11. The computer-readable medium of claim 10, wherein the instructions to determine whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records comprise instructions to:
for each record indicator with a partially complete data field pattern based on results of querying the first and second minimal perfect hashing functions, select each of those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function and that correspond to a data field index that would complete the partially complete data field pattern; and
verify that the selected token occurs for a record indicated by the record indicator.
5. (Currently Amended) The method of claim 4, wherein determining whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records comprises:
for each record indicator with a partially complete data field pattern based on results of querying the first and second minimal perfect hashing functions, selecting each of those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function and that correspond to a data field index that would complete the partially complete data field pattern; and
verifying that the selected token occurs for a record indicated by the record indicator.
12. The computer-readable medium of claim 11, wherein the instructions to verify that the selected token occurs for a record indicated by the record indicator comprise instructions to query a fourth of the plurality of minimal perfect hashing functions with a combination of the selected token and the record indicator and to determine whether a result of the querying is a hit.
6. (Currently Amended) The method of claim 5, wherein verifying that the selected token occurs for a record indicated by the record indicator comprises querying a fourth of the plurality of minimal perfect hashing functions with a combination of the selected token and the record indicator and determining whether a result of the querying is a hit.
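The selection-and-verification logic of claims 4-6 can be sketched as follows. This is an illustrative sketch only, not applicant's implementation: `frequent_hits` is a dict standing in for the third minimal perfect hashing function (token to its data field indexes), and `occurs_in_record(token, record)` stands in for querying a fourth minimal perfect hashing function keyed on a combination of token and record indicator; all names are hypothetical.

```python
def complete_with_frequent_hits(partial, pattern, frequent_hits, occurs_in_record):
    """For each record indicator whose data field pattern is only partially
    complete, select frequent-token hits whose field index would fill a
    missing field, then verify each selected token actually occurs in that
    record before treating the pattern as completed."""
    completed = set()
    for record, fields in partial.items():
        missing = set(pattern) - fields
        if not missing:
            continue  # already complete from the first and second functions
        confirmed = {f for token, fidxs in frequent_hits.items()
                     for f in (set(fidxs) & missing)
                     if occurs_in_record(token, record)}
        if missing <= confirmed:
            completed.add(record)
    return completed
```

With pattern {0, 1, 2}, a record that already matched fields {0, 1} completes only if some frequent-token hit at field 2 is verified to occur in that same record.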
13. The computer-readable medium of claim 7, wherein the program code further comprises instructions to:
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset and a determination that the dataset includes multiple term tokens, query a third of the plurality of minimal perfect hashing functions with a combination of a record indicator and a data field of the data field pattern not yet matched, wherein the third minimal perfect hashing function was created from a key set of record indicators combined with data field indexes for each multiple term token of the dataset; and
for each multiple term token obtained based on querying the third minimal perfect hashing function, search the first set of tokens for a match with the multiple term token.
7. (Currently Amended) The method of claim 1 further comprising:
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset and a determination that the dataset includes multiple term tokens, querying a third of the first set of minimal perfect hashing functions with a combination of a record indicator and a data field of the data field pattern not yet matched, wherein the third minimal perfect hashing function was created from a key set of record indicators combined with data field indexes for each multiple term token of the dataset; and
for each multiple term token obtained based on querying the third minimal perfect hashing function, searching the first set of tokens for a match with the multiple term token.
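The multiple-term-token fallback of claim 7 can be sketched as follows, again purely for illustration and with hypothetical names: `multi_term_index` is a dict standing in for the hashing function keyed on record indicators combined with data field indexes, and the match step searches the joined token stream of the data-in-motion object for each multi-term token obtained.

```python
def complete_with_multi_term(tokens, partial, pattern, multi_term_index):
    """When single-token hits leave the data field pattern incomplete, look up
    each (record indicator, missing field index) pair and search the
    data-in-motion token stream for the multi-term token obtained."""
    stream = " ".join(tokens)
    completed = set()
    for record, fields in partial.items():
        missing = set(pattern) - fields
        found = {f for f in missing
                 if (mt := multi_term_index.get((record, f))) and mt in stream}
        if missing and missing <= found:
            completed.add(record)
    return completed
```

For example, if record 3 has matched fields {0, 1} of pattern {0, 1, 2} and the index maps (3, 2) to the multi-term token "new york", the pattern completes only when that phrase appears in the object's token stream.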
14. The computer-readable medium of claim 7, wherein the instructions to generate the first set of one or more tokens from the data-in-motion object comprise instructions to parse the data-in-motion object to generate initial tokens and then to encode the initial tokens according to one or more encoding techniques applied to the dataset.
8. (Original) The method of claim 1, wherein generating the first set of one or more tokens from the data-in-motion object comprises parsing the data-in-motion object to generate initial tokens and then encoding the initial tokens according to one or more encoding techniques applied to the dataset.
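The parse-then-encode step of claim 8 can be sketched as below. This is an illustrative assumption, not applicant's encoding: a salted hash is used here merely as an example of an encoding technique that would also have been applied to the dataset, so that tokens compare in encoded form.

```python
import hashlib
import re

def tokenize_and_encode(text, salt="demo-salt"):
    """Parse the data-in-motion object into initial tokens, then encode each
    token with the same (illustrative, salted-hash) encoding assumed to have
    been applied to the dataset when its indexes were built."""
    initial = re.findall(r"[\w@.\-]+", text.lower())
    return [hashlib.sha256((salt + ":" + t).encode()).hexdigest()[:16]
            for t in initial]
```

Because the encoding is deterministic, repeated occurrences of the same initial token encode identically and can be matched against the encoded dataset tokens.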
15. The computer-readable medium of claim 7, wherein the program code further comprises instructions to determine frequency of occurrence of tokens within the dataset and construct the first set of minimal perfect hashing functions based, at least in part, on the frequency of occurrence of the tokens.
9. (Currently Amended) The method of claim 1 further comprising determining frequency of occurrence of tokens within the dataset and constructing the first set of minimal perfect hashing functions based, at least in part, on the frequency of occurrence of the tokens.
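The frequency determination and per-class construction of claim 9 can be sketched as follows; plain dicts stand in for the minimal perfect hashing functions that would be built from each key set, and the threshold `infrequent_max` is a hypothetical frequency criterion, not one drawn from the application.

```python
from collections import defaultdict

def partition_tokens_by_frequency(dataset, infrequent_max=3):
    """Tally token occurrences across the dataset, then partition tokens into
    the recited frequency classes: unique (one occurrence), infrequent
    (occurrence count satisfies the frequency criterion), and frequent.
    Each class would seed one minimal perfect hashing function."""
    positions = defaultdict(list)  # token -> [(record index, field index)]
    for r, record in enumerate(dataset):
        for f, token in enumerate(record):
            positions[token].append((r, f))
    unique = {t: p for t, p in positions.items() if len(p) == 1}
    infrequent = {t: p for t, p in positions.items()
                  if 1 < len(p) <= infrequent_max}
    frequent = {t: p for t, p in positions.items() if len(p) > infrequent_max}
    return unique, infrequent, frequent
```

Partitioning by frequency keeps each key set, and hence each hashing function, small and disjoint, which is consistent with querying them independently per token.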
16. The computer-readable medium of claim 7, wherein the program code further comprises instructions to indicate violation of the data leakage prevention rule based on a determination that the data field indexes complete the data field pattern for at least one of the records of the dataset.
10. (Original) The method of claim 1 further comprising indicating violation of the data leakage prevention rule based on determining that the data field indexes complete the data field pattern for at least one of the records of the dataset.
17. An apparatus comprising:
a processor; and
a non-transitory computer-readable medium having instructions stored thereon that are executable by the processor to cause the apparatus to,
generate a first set of one or more tokens from a data-in-motion object; and
based at least partly on the first set of tokens, determine whether the data-in-motion object violates a data leakage prevention rule for a dataset, wherein the instructions to determine whether the data-in-motion object violates the data leakage prevention rule for the dataset comprise instructions to, query each of a first set of minimal perfect hashing functions with each of the first set of tokens, wherein the first set of minimal perfect hashing functions at least includes a first minimal perfect hashing function created from a key set of unique tokens within the dataset and a second minimal perfect hashing function created from a key set of infrequent tokens that occur within the dataset at a frequency that satisfies a defined frequency criterion for a token to be classified as infrequent;
determine one or more data field indexes and one or more record indicators of the dataset for those of the first set of tokens that hit in at least one of the first and second minimal perfect hashing functions; and
determine whether the data field indexes complete a data field pattern specified by the data leakage prevention rule for at least one record of the dataset indicated by one of the record indicators.
1. (Currently Amended) A method comprising:
generating a first set of one or more tokens from a data-in-motion object; and
based at least partly on the first set of tokens, determining whether the data-in-motion object violates a data leakage prevention rule for a dataset, wherein determining whether the data-in-motion object violates the data leakage prevention rule for the dataset comprises, querying each of a first set of minimal perfect hashing functions with each of the first set of tokens, wherein each of the first set of minimal perfect hashing functions corresponds to a different class of token frequency within the dataset, wherein the first set of minimal perfect hashing functions at least includes a first minimal perfect hashing function created from a key set of unique tokens within the dataset and a second minimal perfect hashing function created from a key set of infrequent tokens that occur within the dataset at a frequency that satisfies a defined frequency criterion for a token to be classified as infrequent;
determining one or more data field indexes and one or more record indicators of the dataset for those of the first set of tokens that hit in at least one of the first and second minimal perfect hashing functions; and
determining whether the data field indexes complete a data field pattern specified by the data leakage prevention rule for at least one record of the dataset indicated by one of the record indicators.
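The querying and pattern-completion steps of claim 1 can be sketched end to end as below. This is an illustrative sketch only: dicts mapping a token to the (record indicator, data field index) pairs where it occurs stand in for the first and second minimal perfect hashing functions, and all names are hypothetical.

```python
def completes_field_pattern(tokens, unique, infrequent, pattern):
    """Query each hashing-function stand-in with each token of the
    data-in-motion object, resolve hits to (record indicator, field index)
    pairs, and test whether the matched field indexes cover the rule's data
    field pattern within any single record of the dataset."""
    matched = {}  # record indicator -> set of matched data field indexes
    for token in tokens:
        for index in (unique, infrequent):
            for record, field in index.get(token, ()):
                matched.setdefault(record, set()).add(field)
    return any(set(pattern) <= fields for fields in matched.values())
```

Note that tokens drawn from different records do not complete the pattern: a name from record 0 combined with a number from record 1 matches no single record, which is the point of resolving hits to record indicators.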
18. The apparatus of claim 17, wherein the non-transitory computer-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to:
instantiate a tracking data structure according to the data field pattern specified by the data leakage prevention rule to track matches of tokens of the first set of tokens to tokens of the dataset; and
update the tracking data structure based, at least in part, on results of querying and determination of the data field indexes, wherein the instructions to determine whether the data field indexes complete the data field pattern is based, at least in part, on the tracking data structure.
2. (Original) The method of claim 1 further comprising:
instantiating a tracking data structure according to the data field pattern specified by the data leakage prevention rule to track matches of tokens of the first set of tokens to tokens of the dataset; and
updating the tracking data structure based, at least in part, on results of the querying and determination of the data field indexes, wherein determining whether the data field indexes complete the data field pattern is based, at least in part, on the tracking data structure.
19. The apparatus of claim 17, wherein the non-transitory computer-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to:
determine one or more data field indexes of the dataset for those of the first set of tokens that hit in a third minimal perfect hashing function, wherein the third of the plurality of minimal perfect hashing functions was created from a key set of frequent tokens that are neither unique nor infrequent;
track those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function; and
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset, determine whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records.
4. (Currently Amended) The method of claim 3 further comprising:
determining one or more data field indexes of the dataset for those of the first set of tokens that hit in the third minimal perfect hashing function;
tracking those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function; and
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset, determining whether those of the first set of tokens that match tokens of the dataset based, at least in part, on hits in the third minimal perfect hashing function can complete the data field pattern for at least one of the records.
21. The apparatus of claim 17, wherein the non-transitory computer-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to:
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset and a determination that the dataset includes multiple term tokens, query a third of the plurality of minimal perfect hashing functions with a combination of a record indicator and a data field of the data field pattern not yet matched, wherein the third minimal perfect hashing function was created from a key set of record indicators combined with data field indexes for each multiple term token of the dataset; and
for each multiple term token obtained based on querying the third minimal perfect hashing function, search the first set of tokens for a match with the multiple term token.
7. (Currently Amended) The method of claim 1 further comprising:
based on a determination that results from querying the first and second minimal perfect hashing functions do not complete the data field pattern for at least one record of the dataset and a determination that the dataset includes multiple term tokens, querying a third of the first set of minimal perfect hashing functions with a combination of a record indicator and a data field of the data field pattern not yet matched, wherein the third minimal perfect hashing function was created from a key set of record indicators combined with data field indexes for each multiple term token of the dataset; and
for each multiple term token obtained based on querying the third minimal perfect hashing function, searching the first set of tokens for a match with the multiple term token.
22. The apparatus of claim 17, wherein the non-transitory computer-readable medium further has stored thereon instructions executable by the processor to cause the apparatus to
determine frequency of occurrence of tokens within the dataset and construct the first set of minimal perfect hashing functions based, at least in part, on the frequency of occurrence of the tokens.
9. (Currently Amended) The method of claim 1 further comprising
determining frequency of occurrence of tokens within the dataset and constructing the first set of minimal perfect hashing functions based, at least in part, on the frequency of occurrence of the tokens.
Claim Rejections - 35 USC § 102
No rejections warranted at applicant’s time of filing for continuation.
Claim Rejections - 35 USC § 103
No rejections warranted at applicant’s time of filing for continuation.
Allowable Subject Matter
Claims 1-22 contain allowable subject matter. As allowable subject matter has been indicated, applicant’s reply must either comply with all formal requirements or specifically traverse each requirement not complied with. See 37 CFR 1.111(b) and MPEP § 707.07(a).
The examiner notes that reasons for allowance can be set forth in the next Office action once all formal requirements identified above have been overcome.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANT SHAIFER-HARRIMAN, whose telephone number is (571) 272-7910. The examiner can normally be reached M-F, 9am to 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kambiz Zand, can be reached at (571) 272-3811. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DANT B SHAIFER HARRIMAN/ Primary Examiner, Art Unit 2434