DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 11-15 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. (US 2020/0327445 A1) in view of JALALI et al. (US 2021/0406474 A1).
Re Claims 1 & 19, Yu teaches an apparatus to update a set of clusters representative of a classification of a text-based dataset into a plurality of different text types for use in a cyber security system, the apparatus comprising:
a receiving module configured to receive text data associated with an entity; (Yu; FIG. 1-7; Summary, ¶ [0013]-[0019], [0028]; The transmission of text data associated with an entity.)
a generating module coupled to the receiving module, wherein the generating module is configured to generate one or more vector embeddings representative of the text data; (Yu; FIG. 1-7; Summary, ¶ [0013]-[0019], [0028]-[0038]; Generating embedding vectors of the text data.)
Yu does not explicitly suggest a learning module coupled to the generating module, wherein the learning module is configured to use incremental learning to update the set of clusters based on the one or more vector embeddings; and wherein instructions implemented in software for the receiving module, the generating module, and the learning module are configured to be stored in one or more non-transitory storage mediums to be executed by one or more processing units.
However, in analogous art, JALALI teaches a learning module coupled to the generating module, wherein the learning module is configured to use incremental learning to update the set of clusters based on the one or more vector embeddings; and (JALALI; FIG. 1-6; Summary, ¶ [0072]-[0093]; The embodiment(s) detail incremental k-means learning to update clusters based on vector embeddings.)
wherein instructions implemented in software for the receiving module, the generating module, and the learning module are configured to be stored in one or more non-transitory storage mediums to be executed by one or more processing units. (JALALI; FIG. 1-6; Summary, ¶ [0044], [0072]-[0093], [0106]-[0110]; Non-transitory storage medium for storing instructions, which are implemented in software to execute the embodiment(s)' various operations.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu in view of JALALI to use incremental k-means to update clusters, in order to use machine learning to generate K-anonymity models based on vectors and clusters. (JALALI Abstract)
Re Claim 2, Yu-JALALI discloses the apparatus of claim 1, wherein the learning module is configured to use incremental k-means clustering to update the set of clusters. (JALALI; FIG. 1-6; Summary, ¶ [0072]-[0093]; The embodiment(s) detail incremental k-means learning to update clusters based on vector embeddings.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu in view of JALALI to use incremental k-means to update clusters, in order to use machine learning to generate K-anonymity models based on vectors and clusters. (JALALI Abstract)
Re Claim 4, Yu-JALALI discloses the apparatus of claim 1, wherein a number of clusters forming the set of clusters is a hyperparameter specified for the learning module based on the entity. (JALALI; FIG. 6-9; ¶ [0075]-[0081], [0092]-[0102]; A number of desired clusters (hyperparameter) based on a number set by an entity.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu in view of JALALI to use incremental k-means to update clusters, in order to use machine learning to generate K-anonymity models based on vectors and clusters. (JALALI Abstract)
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. (US 2020/0327445 A1), in view of JALALI et al. (US 2021/0406474 A1), and further in view of Newman et al. (US 2023/0083512 A1).
Re Claim 3, Yu-JALALI discloses the apparatus of claim 1, yet does not explicitly suggest wherein the generating module is configured to use a large language model (LLM) to generate the one or more vector embeddings.
However, in analogous art, Newman teaches wherein the generating module is configured to use a large language model (LLM) to generate the one or more vector embeddings. (Newman; FIG. 1-9; ¶ [0015]-[0030]; A large language model to generate vector related embeddings.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu-JALALI in view of Newman to use an LLM, in order to extract factual information via LLM training and embeddings. (Newman Abstract)
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. (US 2020/0327445 A1), in view of JALALI et al. (US 2021/0406474 A1), and further in view of Dang et al. (US 2016/0140208 A1).
Re Claim 5, Yu-JALALI discloses the apparatus of claim 4, yet does not explicitly suggest wherein the learning module is configured to determine whether to modify the number of clusters based on a fit metric obtained for the set of clusters derived from text data collected over a specified time period.
However, in analogous art, Dang teaches wherein the learning module is configured to determine whether to modify the number of clusters based on a fit metric obtained for the set of clusters derived from text data collected over a specified time period. (Dang; FIG. 1-6; Background, Summary, ¶ [0012]-[0030]; The embodiment(s) detail modifying the number of clusters based on various metrics derived from clustered data collected over time periods.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu-JALALI in view of Dang to limit the number of clusters, in order to analyze and group data sets more efficiently. (Dang Abstract)
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Yu et al. (US 2020/0327445 A1), in view of JALALI et al. (US 2021/0406474 A1), and further in view of Betser (US 2021/0406366 A1).
Re Claim 6, Yu-JALALI discloses the apparatus of claim 1, yet does not explicitly suggest wherein the learning module is configured to obtain the set of clusters using text within the text-based dataset that has been received within a specified time period, and wherein the text within the text-based dataset that has been received prior to the specified time period is disregarded by the learning module.
However, in analogous art, Betser teaches wherein the learning module is configured to obtain the set of clusters using text within the text-based dataset that has been received within a specified time period, and wherein the text within the text-based dataset that has been received prior to the specified time period is disregarded by the learning module. (Betser; FIG. 1-4; ¶ [0007]-[0012], [0014]-[0034]; The cited embodiment(s) describe a comparable methodology in which text-based clusters are received in various time periods and older text-related data sets are disregarded by a learning computing system.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Yu-JALALI in view of Betser to limit the number of clusters, in order to cluster large sets of categorical data by ordering data points. (Betser Abstract)
Claims 7-8, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Frazier et al. (US 2023/0169103 A1) in view of McManis, JR. et al. (US 2016/0232226 A1).
Re Claims 7 & 20, Frazier teaches an apparatus to classify text data for a cyber security system, the apparatus comprising:
a receiving module configured to receive text data associated with an entity; (Frazier; FIG. 1-15; Background, Summary, ¶ [0061], [0083]-[0113]; Receiving text data from various entities/sources.)
a generating module coupled to the receiving module, wherein the generating module is configured to generate one or more vector embeddings representative of the text data; (Frazier; FIG. 1-15; Background, Summary, ¶ [0061], [0083]-[0113]; Vector associated embedding of text related data.)
an identifying module coupled to the generating module, wherein the identifying module is configured to identify one or more clusters of a set of clusters that the one or more vector embeddings are associated with based on a similarity search, (Frazier; FIG. 1-15; Background, Summary, ¶ [0061], [0083]-[0113], [0143]-[0157]; Cluster, embedding, query searches (similarity search).)
wherein the set of clusters is representative of a classification of a text-based dataset into a plurality of different text types associated with the entity; and (Frazier; FIG. 1-15; Background, Summary, ¶ [0061], [0083]-[0113], [0143]-[0157]; Text based classification.)
an updating module coupled to the identifying module, wherein the updating module is configured to update a database based on the one or more vector embeddings being identified as being associated with the one or more clusters, (Frazier; FIG. 1-15; Background, Summary, ¶ [0060]-[0067], [0080]-[0113], [0143]-[0157]; Identifying and updating clusters associated with vector related embedding.)
Frazier does not explicitly suggest wherein the database is indicative of a frequency of occurrence of each text type of the plurality of different text types within the text-based dataset; and wherein instructions implemented in software for the receiving module, the generating module, the identifying module, and the updating module are configured to be stored in one or more non-transitory storage mediums to be executed by one or more processing units.
However, in analogous art, McManis, JR. teaches wherein the database is indicative of a frequency of occurrence of each text type of the plurality of different text types within the text-based dataset; and (McManis, JR.; FIG. 1-6; Summary, ¶ [0071]-[0021], [0030]-[0061], [0077]-[0091]; Frequency of various categories/types of textual data.)
wherein instructions implemented in software for the receiving module, the generating module, the identifying module, and the updating module are configured to be stored in one or more non-transitory storage mediums to be executed by one or more processing units. (McManis, JR.; FIG. 1-6; Summary, ¶ [0071]-[0021], [0030]-[0061], [0077]-[0091]; The identifying, updating, and configuring operations are stored in a non-transitory storage medium and executed by a processor.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Frazier in view of McManis, JR. to count the frequency of data types, in order to analyze and identify textual data. (McManis, JR. Abstract)
Re Claim 8, Frazier-McManis, JR. discloses the apparatus of claim 7, wherein the set of clusters is obtained based on incremental k-means clustering of the text-based dataset. (McManis, JR.; FIG. 1-6; Summary, ¶ [0058], [0088], [0139]; K-means clustering.)
Re Claim 18, Frazier-McManis, JR. discloses the apparatus of claim 7, wherein the text-based dataset comprises text derived from:
a message header; or
a message body; (McManis, JR.; FIG. 1; ¶ [0038]; Textual data may include paragraphs.) or
a message attachment; or
metadata associated with a message; or
any combination thereof.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Frazier in view of McManis, JR. to count the frequency of data types, in order to analyze and identify textual data. (McManis, JR. Abstract)
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Frazier et al. (US 2023/0169103 A1), in view of McManis, JR. et al. (US 2016/0232226 A1), and further in view of Newman et al. (US 2023/0083512 A1).
Re Claim 9, Frazier-McManis, JR. discloses the apparatus of claim 7, yet does not explicitly suggest wherein the generating module is configured to use a large language model (LLM) to generate the one or more vector embeddings.
However, in analogous art, Newman teaches wherein the generating module is configured to use a large language model (LLM) to generate the one or more vector embeddings. (Newman; FIG. 1-9; ¶ [0015]-[0030]; A large language model to generate vector related embeddings.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Frazier-McManis, JR. in view of Newman to use an LLM, in order to extract factual information via LLM training and embeddings. (Newman Abstract)
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Frazier et al. (US 2023/0169103 A1), in view of McManis, JR. et al. (US 2016/0232226 A1), and further in view of TAN et al. (US 2023/0035337 A1).
Re Claim 10, Frazier-McManis, JR. discloses the apparatus of claim 7, yet does not explicitly suggest wherein the similarity search is based on hierarchical navigable small world (HNSW) searching.
However, in analogous art, TAN teaches wherein the similarity search is based on hierarchical navigable small world (HNSW) searching. (TAN; FIG. 1-2; Background, ¶ [0029]-[0034], [0055]-[0068]; Similarity search based on HNSW.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Frazier-McManis, JR. in view of TAN to include HNSW searching, in order to create an efficient data retrieval search method. (TAN Abstract)
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Frazier et al. (US 2023/0169103 A1), in view of McManis, JR. et al. (US 2016/0232226 A1), and further in view of Angel et al. (US 2021/0081831 A1).
Re Claim 16, Frazier-McManis, JR. discloses the apparatus of claim 7, yet does not explicitly suggest wherein the set of clusters are labelled with a textual representation of each cluster in the set of clusters, wherein the textual representation is based on a natural language based classification of one or more portions of text associated with each cluster.
However, in analogous art, Angel teaches wherein the set of clusters are labelled with a textual representation of each cluster in the set of clusters, wherein the textual representation is based on a natural language based classification of one or more portions of text associated with each cluster. (Angel; FIG. 3-11; Summary, ¶ [0025]-[0030], [0036]-[0054]; Clusters, textual representations of clusters, and natural language-based classification of text associated with the clusters.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention (AIA) to modify Frazier-McManis, JR. in view of Angel to name clusters, in order to assess, label, and classify clusters of a neural model. (Angel Abstract)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHRISTOPHER B ROBINSON whose telephone number is (571)270-0702. The examiner can normally be reached M-F 7:00-3:00 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nicholas R Taylor can be reached at 571-272-3889. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHRISTOPHER B ROBINSON/Primary Examiner, Art Unit 2443