Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This non-final office action is responsive to the U.S. patent application no. 18/902,630 filed on September 30, 2024.
Claims 1-20 are pending.
Claims 1-20 are rejected.
Priority
The application claims priority under 35 U.S.C. 120 to U.S. non-provisional application No. 17/936,624 filed on September 29, 2022, which claims priority under 35 U.S.C. 120 to U.S. non-provisional application No. 17/357,904 filed on June 24, 2021, which claims priority under 35 U.S.C. 120 to U.S. non-provisional application No. 16/006,511 filed on June 12, 2018, which claims priority under 35 U.S.C. 119(e) to U.S. Provisional application No. 62/570,616.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 30, 2024 is compliant with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.
Allowable Subject Matter
Claims 9, 11 and 12 are rejected under nonstatutory double patenting but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, provided that the double patenting rejection has also been overcome.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1, 3-4, 8-16 and 19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5 and 8-12 of U.S. Patent No. 11,574,287. Although the claims at issue are not identical, they are not patentably distinct from each other as shown below.
Claim comparison (the claims of Application No. 18/902,630 are listed first, followed by the corresponding claims of Patent No. 11,574,287):
1. A method to automatically classify documents, the method comprising:
generating, by a system that includes a processor and memory, a plurality of entity data objects representing entities identified in a plurality of documents such that each entity data object of the plurality of entity data objects represents a different one of the entities identified in the plurality of documents;
(claim 7. The method of claim 1, wherein the plurality of documents are emails and the entities identified in the plurality of documents are entities identified in receiver and sender fields of the emails.)
extracting, by the system, tokens from the plurality of documents, each token being a word or phrase from the plurality of documents; determining, by the system, feature vectors for each of the plurality of entity data objects based on the extracted tokens from the documents associated with each of the plurality of entity data objects;
obtaining a first classification for a first subset of the plurality of entity data objects and a second classification for a second subset of the plurality of entity data objects, wherein a third subset of the plurality of entity data objects does not include a classification;
training a machine-learning model to apply one of the first classification or the second classification to an entity data object using feature vectors of the first subset of the plurality of entity data objects and feature vectors of the second subset of the plurality of entity data objects; and
(claim 3. The method of claim 2, wherein the determining comprises:
calculating, for each of the candidate entity data objects using a document network graph, a degree of separation between the candidate entity data objects and one or more entities identified in the first document, the document network graph constructed to represent patterns between the entities identified in the plurality of documents; and selecting, as the particular entity data object, a candidate entity data object that includes the lowest degree of separation from the entities identified in the first document.)
after training the machine-learning model, applying one of the first classification and the second classification to each of the third subset of the plurality of entity data objects using the machine-learning model.
4. The method of claim 3, wherein the plurality of documents are emails and the entities identified in the plurality of documents are entities identified in receiver and sender fields of the emails and the determining the particular entity data object further comprises:
in response to multiple candidate entity data objects including the lowest degree of separation, calculating an email volume of each of the multiple candidate entity data objects, wherein the email volume is a number of emails sent from entities of the multiple candidate entity data object to each entity identified in the receiver and sender fields of the first document; and
selecting the particular entity data object from the multiple candidate entity data objects based on the particular entity data object including the highest volume.
8. The method of claim 7, wherein generating the plurality of entity data objects comprises:
generating, by the system, a plurality of initial entity data objects using the entities identified in receiver and sender fields of the emails; and merging two or more of the plurality of initial entity data objects to form an entity data object, wherein the merging comprises:
determining whether an initial entity data object is similar to a first entity data object of the plurality of initial entity data objects;
identifying second entity data objects of the plurality of initial entity data objects that relate to the first entity data object based on the second entity data objects including a name that is included in the first entity data object or a variant of a name included in the first entity data object; and
merging the initial entity data object into the first entity data object in response to all of the second entity data objects being domain compatible with the first entity data object.
9. The method of claim 8, wherein generating the plurality of entity data objects comprises:
identifying a level set for each initial entity data object based on a number of tokens in the initial entity data object associated with names; and
performing the merging of the initial entity data objects by level set in descending order of number of tokens.
10. The method of claim 7, further comprising:
classifying one or more the emails as spam emails; and removing entity data objects from the plurality of entity data objects that are senders of the spam emails.
11. The method of claim 7, further comprising:
identifying disclaimers in the emails, wherein searching the extracted tokens does not comprise searching tokens from the disclaimers in the emails.
12. The method of claim 11, wherein identifying disclaimers further comprises marking a set of paragraphs in the emails as disclaimers and using the set of disclaimer paragraphs to calculate a coverage score to identify additional disclaimers in the emails.
1. A method to automatically classify emails, the method comprising:
(limitation 1, originally recited here, has been moved below)
the email data set configured for training a machine learning model and the first shared characteristic being mutually exclusive of the second shared characteristic;
obtaining, by the system, emails from an email database;
generating, by the system, a plurality of entity data objects representing entities identified in receiver and sender fields of the emails such that each entity data object of the plurality of entity data objects representing a different one of the entities identified in the receiver and sender fields of the emails;
(limitation 2, originally recited here, has been moved below)
extracting, by the system, tokens from the emails from the email database, each token being a word or phrase from an email and the words or phrases of the tokens corresponding to the entities identified in the receiver and sender fields of the emails from the email database;
(limitation 2: categorizing, by the system, the plurality of entity data objects into a first set of entity data objects and a second set of entity data objects using the machine learning model, the first set of entity data objects associated with the first category for classification of emails;)
searching, by the system, the extracted tokens for tokens potentially corresponding with the entities represented by the first set of entity data objects;
identifying, by the system, the emails that include the extracted tokens that potentially correspond with the entities represented by the first set of entity data objects;
(limitation 1: obtaining, …, a machine learning model configured to classify entity data objects representing entities into two categories by distinguishing between entity data objects representing first entities having a first shared characteristic associated with a first category for classification of emails and entity data objects representing second entities having a second shared characteristic using an email data set of the first entities and the second entities,)
determining, by the system, a particular entity data object of the first set of entity data objects to which an identified email corresponds, wherein the determining comprises:
determining candidate entity data objects of the first set of entity data objects based on the candidate entity data objects including data that corresponds to an extracted token of the identified email;
calculating a joint distance for each of the candidate entity data objects, the joint distance for one of the candidate entity data objects comprising a sum of minimum graph distances in an email network graph from the one of the candidate entity data objects to each entity identified in the receiver and sender fields of the identified email, the email network graph representing email communication patterns between the entities in the receiver and sender fields of the emails from the email database and the email network graph constructed using the emails from the email database; and
identifying the particular entity data object in response to the particular entity data object including a smallest joint distance, the smallest joint distance comprising the fewest degrees of separation in the email network graph between an entity corresponding to the particular entity data object and each entity identified in the receiver and sender fields of the identified email; and
automatically classifying, by the system, the identified email in the first category in response to determining that the identified email corresponds to the particular entity data object.
5. The method of claim 1, wherein the determining the particular entity data object further comprises:
in response to multiple candidate entity data objects including the same joint distance, calculating a volume of each of the multiple candidate entity data objects, wherein the volume is a number of emails sent from entities of the multiple candidate entity data objects to each entity identified in the receiver and sender fields of the identified email; and
selecting the particular entity data object from the multiple candidate entity data objects based on the particular entity data object including the highest volume.
8. The method of claim 1, wherein generating the plurality of entity data objects comprises:
generating, by the system, a plurality of initial entity data objects using the entities identified in the receiver and sender fields of the emails; and merging two or more of the plurality of initial entity data objects to form an entity data object, wherein the merging comprises:
determining whether an initial entity data object is similar to a first entity data object of the plurality of initial entity data objects;
identifying second entity data objects of the plurality of initial entity data objects that relate to the first entity data object based on the second entity data objects including a name that is included in the first entity data object or a variant of the name included in the first entity data object; and
merging the initial entity data object into the first entity data object in response to all of the second entity data objects being domain compatible with the first entity data object.
9. The method of claim 8, wherein generating the plurality of entity data objects comprises:
identifying a level set for each initial entity data object based on a number of tokens in the initial entity data object associated with names; and
performing the merging of the initial entity data objects by level set in descending order of number of tokens.
10. The method of claim 1, further comprising:
identifying emails from the email database as spam emails; and removing entity data objects that send spam emails from the plurality of entity data objects.
11. The method of claim 1, further comprising:
identifying disclaimers in the emails, wherein searching the extracted tokens does not comprise searching tokens from the disclaimers in the emails.
12. The method of claim 11, wherein identifying disclaimers further comprises marking a set of paragraphs in the emails as disclaimers and using the set of disclaimer paragraphs to calculate a coverage score to identify additional disclaimers in the emails.
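For illustration only, the joint-distance determination recited in claim 1 of Patent No. 11,574,287, together with the volume tie-break of claim 5, can be sketched as follows. This is a minimal sketch and not the patentee's implementation: the adjacency-set graph, the function names `min_graph_distance`, `joint_distance` and `select_entity`, the `volume` mapping, and all entity names are assumptions made for the example, and an unreachable entity is assumed to rule a candidate out.

```python
from collections import deque


def min_graph_distance(graph, source, target):
    """Breadth-first search: fewest hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def joint_distance(graph, candidate, email_entities):
    """Sum of minimum graph distances from the candidate to each sender/receiver entity."""
    total = 0
    for entity in email_entities:
        d = min_graph_distance(graph, candidate, entity)
        if d is None:
            return float("inf")  # assumption: an unreachable entity rules the candidate out
        total += d
    return total


def select_entity(graph, candidates, email_entities, volume):
    """Pick the candidate with the smallest joint distance; break ties by highest volume."""
    scored = [(joint_distance(graph, c, email_entities), c) for c in candidates]
    best = min(score for score, _ in scored)
    tied = [c for score, c in scored if score == best]
    return max(tied, key=lambda c: volume.get(c, 0))


# Hypothetical email network graph (undirected, stored as adjacency sets).
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}
email_entities = ["alice", "dave"]          # sender and receiver of the identified email
candidates = ["bob", "carol", "erin"]       # entities matching an extracted token
volume = {"bob": 3, "carol": 7, "erin": 1}  # hypothetical email counts

chosen = select_entity(graph, candidates, email_entities, volume)
```

In this hypothetical data, `bob` and `carol` tie at a joint distance of 2 while `erin` has a joint distance of 4, so the tie is resolved on volume and `carol` is selected.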
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 10 and 13-20 are rejected under 35 U.S.C. 103 as obvious over Olmstead (U.S. 2019/0057310).
Regarding claim 1, Olmstead disclosed a method to automatically classify documents, the method comprising:
generating, by a system that includes a processor and memory, a plurality of entity data objects representing entities identified in a plurality of documents such that each entity data object of the plurality of entity data objects represents a different one of the entities identified in the plurality of documents (Olmstead, [0020, 0048], “The system 100 processes the communication data using the entity recognition model 104 to extract one or more named entities and one or more topics.” Said “one or more named entities” and “one or more topics” anticipate the “plurality of entity data objects” in the claim; said “communication data” includes emails and therefore anticipates the “plurality of documents” in the claim);
extracting, by the system, tokens from the plurality of documents, each token being a word or phrase from the plurality of documents (Olmstead, [0057], “The text can be broken down into different tokens and one or more tokens can be assigned a label that indicates a named entity.”);
determining, by the system, feature vectors for each of the plurality of entity data objects based on the extracted tokens from the documents associated with each of the plurality of entity data objects (Olmstead, [0071], “An example expert classifier is based on the concepts of Word2vec or another model that converts words to vector representations.” Said vector representations anticipate the “feature vectors” in the claim);
obtaining a first classification for a first subset of the plurality of entity data objects and a second classification for a second subset of the plurality of entity data objects, wherein a third subset of the plurality of entity data objects does not include a classification (Olmstead, [0057], “Named-entity recognition unit 112 processes electronic communication data to identify named entities and assign labels or tags (such as person, place, organization) to words” and “The text can be broken down into different tokens and one or more tokens can be assigned a label that indicates a named entity.” Said labels or tags (such as person, place, organization) anticipate the first classification and second classification, and their associated named entity objects anticipate the first subset of entity data objects and second subset of entity data objects in the claim. The extracted named entities not yet assigned a label or tag anticipate the third subset of entity data objects.),
training a machine-learning model (Olmstead, [0057], “The named entity recognition unit 112 can be trained using a domain specific model or multiple domain models.”), and
after training the machine-learning model, applying one of the first classification and the second classification to each of the third subset of the plurality of entity data objects using the machine-learning model (Olmstead, [0057], “Named-entity recognition unit 112 processes electronic communication data to identify named entities and assign labels or tags (such as person, place, organization) to words” and “The named entity recognition unit 112 can detect the names and then classify the names by the type of entity they refer to”).
Olmstead might not have explicitly disclosed that
the machine-learning model is trained to apply one of the first classification or the second classification to an entity data object using feature vectors of the first subset of the plurality of entity data objects and feature vectors of the second subset of the plurality of entity data objects.
However, Olmstead disclosed in [0057] that “The named entity recognition unit 112 can be trained using a domain specific model or multiple domain models.” Olmstead further disclosed in [0061] that “The entity recognition unit 104 can identify people, organizations, and places within electronic communication data. An example embodiment can involve the Spacy Dependency parser to identify topics which includes words like {law, oil, stocks, energy, market, fuel} and other entities of interest in datasets;” and in [0079] that “The system 100 can pass an email through the neural network in order to classify emails in this way.”
Olmstead’s disclosures in various places would have made it obvious to one of ordinary skill in the art that the machine learning models in Olmstead could be trained using training data comprising feature vectors and classification labels, as is recited in this claim.
Claim 13 lists the same elements as claim 1, in computer readable medium form rather than method form. Therefore, the rejection rationale for claim 1 applies equally to claim 13.
Claim 14 lists substantially the same elements as claim 1, in system form rather than method form. Therefore, the rejection rationale for claim 1 applies equally to claim 14.
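For illustration, the train-then-apply sequence recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from the application or from Olmstead: bag-of-words counts stand in for the feature vectors, a nearest-centroid rule stands in for the machine-learning model, and all entity names, tokens and labels are hypothetical.

```python
from collections import Counter


def feature_vector(tokens, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (an assumed feature choice)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def train_centroids(vectors_by_label):
    """'Train' by averaging the feature vectors of each labeled subset."""
    centroids = {}
    for label, vectors in vectors_by_label.items():
        n = len(vectors)
        centroids[label] = [sum(col) / n for col in zip(*vectors)]
    return centroids


def classify(vector, centroids):
    """Apply the label whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))


# Hypothetical tokens extracted from the documents associated with each entity.
entity_tokens = {
    "entity_a": ["contract", "privilege", "counsel"],
    "entity_b": ["counsel", "litigation", "privilege"],
    "entity_c": ["invoice", "shipment", "order"],
    "entity_d": ["order", "invoice", "delivery"],
    "entity_e": ["counsel", "contract", "order"],      # unlabeled (third subset)
    "entity_f": ["shipment", "delivery", "invoice"],   # unlabeled (third subset)
}
# First and second subsets: entities that already carry a classification.
labels = {"entity_a": "first", "entity_b": "first",
          "entity_c": "second", "entity_d": "second"}

vocabulary = sorted({t for tokens in entity_tokens.values() for t in tokens})
vectors = {e: feature_vector(t, vocabulary) for e, t in entity_tokens.items()}

# Train on the feature vectors of the first and second subsets only.
training = {"first": [], "second": []}
for entity, label in labels.items():
    training[label].append(vectors[entity])
centroids = train_centroids(training)

# After training, apply a classification to each entity in the third subset.
predicted = {e: classify(v, centroids) for e, v in vectors.items() if e not in labels}
```

With this hypothetical data, `entity_e` receives the first classification and `entity_f` receives the second.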
Regarding claim 2, Olmstead disclosed the method of claim 1.
Olmstead further disclosed that
after applying one of the first classification and the second classification, selecting, by the system, a first document of the plurality of documents in response to the first document including an extracted token that corresponds with data from two or more of the entity data objects of the plurality of entity data objects; identifying the two or more of the entity data objects as candidate entity data objects; determining, by the system, a particular entity data object of the candidate entity data objects to which the first document corresponds; and automatically assigning, by the system, the first document to a category corresponding to a classification of the particular entity data object (Olmstead disclosed in [0066-0067] that “Once the topics of the electronic communication data are identified, LDA 116 can be used to cluster the document into topic categories. Some examples of topics include privacy, compliance, transmission, information, credit, energy, gas, fuel, oil, and so on,” which would involve extracting entities and topics from emails and other communication data such as chats as disclosed in paragraphs [0057-0060]).
Regarding claim 3, Olmstead disclosed the method of claim 2.
Olmstead further disclosed that
wherein the determining comprises: calculating, for each of the candidate entity data objects using a document network graph, a degree of separation between the candidate entity data objects and one or more entities identified in the first document, the document network graph constructed to represent patterns between the entities identified in the plurality of documents; and selecting, as the particular entity data object, a candidate entity data object that includes the lowest degree of separation from the entities identified in the first document (Olmstead, [0055], “Knowledge engine 108 creates and updates a dictionary of named entities which includes People, Organization, Places, Topics and Subjects of discussion (also referred to as named entities). Knowledge engine 108 creates and updates a graph of entities and topics, including relationship scores between the entities and topics.” and “For example the edges can represent a similarity between two topics or entities, for example. As another example the edges can represent a distance metrics between two topics or entities”).
Claim 15 lists substantially the same elements as claims 2 and 3 combined, in system form rather than method form. Therefore, the rejection rationale for claims 2 and 3 applies equally to claim 15.
Regarding claims 4 and 16, Olmstead disclosed the subject matter of claims 3 and 15, respectively.
Olmstead further disclosed that
wherein the plurality of documents are emails and the entities identified in the plurality of documents are entities identified in receiver and sender fields of the emails (Olmstead, [0051], “The system 100 implements AI techniques to process input data 102 including electronic communication data such as emails, chats and other forms of textual communication and surface topics of interest and subjects of discussion, as named entities.” And [0052], “The contacts can be a sender and a recipient of an electronic communication.”) and the determining the particular entity data object further comprises:
in response to multiple candidate entity data objects including the lowest degree of separation (Olmstead, [0048], “The system 100 can generate, using a relationship model 106, a relationship score indicating strength of a relationship between the requestor and the expert entity.”), calculating an email volume of each of the multiple candidate entity data objects, wherein the email volume is a number of emails sent from entities of the multiple candidate entity data object to each entity identified in the receiver and sender fields of the first document (Olmstead, [0052], “The system 100 can implement a relationship model 106 to take into account sentiment of exchanges, formality of the conversation, the number of exchanges and duration two contacts have known each other”); and
selecting the particular entity data object from the multiple candidate entity data objects based on the particular entity data object including the highest volume (Subject matter in this limitation can be implied from Olmstead’s disclosure in [0052] about using the number of exchanges between two contacts to determine the closeness of their relationship).
Regarding claims 5 and 17, Olmstead disclosed the subject matter of claims 1 and 14, respectively.
Olmstead further disclosed that
after applying one of the first classification and the second classification to each of the third subset of the plurality of entity data objects, retraining the machine-learning model using feature vectors of the third subset of the plurality of entity data objects with the first classification; and after retraining the machine-learning model, reapplying one of the first classification and the second classification to each of the third subset of the plurality of entity data objects with the second classification using the retrained machine-learning model (Olmstead disclosed in [0055] that “Knowledge engine 108 creates and updates a graph of entities and topics, including relationship scores between the entities and topics,” where updating the graph of entities and topics is essentially the same as retraining the graph, which is a machine learning model).
Regarding claims 6 and 18, Olmstead disclosed the subject matter of claims 5 and 17, respectively.
Olmstead further disclosed that
iterating the steps of retraining the machine-learning model and reapplying one of the first classification and second classification until the retrained machine-learning model does not apply a first classification to one of the third subset of the plurality of entity data objects (Olmstead disclosed in [0055] that “The knowledge engine 108 can be continually updated as the system 100 processes new electronic data to extract more entities and topics or to further update scores computed between entities and topics.”).
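For illustration, the retrain-and-reapply iteration recited in claims 5 and 6 can be sketched as a self-training style loop. This is a minimal sketch under stated assumptions and is not taken from the application or from Olmstead: a nearest-centroid rule stands in for the machine-learning model, the centroid of the second classification is held fixed, the one-dimensional feature vectors are hypothetical, and the names `centroid`, `nearest` and `iterate_classification` are invented for the example.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(vector, centroids):
    """Label of the centroid nearest to the vector (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(vector, centroids[label]))


def iterate_classification(first_vecs, second_vecs, unlabeled):
    """Apply, then retrain and reapply until no further first classification is applied."""
    first_train = list(first_vecs)
    centroids = {"first": centroid(first_train), "second": centroid(second_vecs)}

    # Initial application to the third (unlabeled) subset.
    assigned = {name: nearest(vec, centroids) for name, vec in unlabeled.items()}
    first_train += [unlabeled[n] for n, lab in assigned.items() if lab == "first"]

    rounds = 0
    while True:
        # Retrain using third-subset vectors that received the first classification.
        centroids["first"] = centroid(first_train)
        # Reapply only to third-subset entities that hold the second classification.
        newly_first = [n for n, lab in assigned.items()
                       if lab == "second" and nearest(unlabeled[n], centroids) == "first"]
        if not newly_first:
            return assigned, rounds  # stop: no first classification was applied
        for n in newly_first:
            assigned[n] = "first"
            first_train.append(unlabeled[n])
        rounds += 1


# Hypothetical one-dimensional feature vectors.
first_vecs = [[0.0], [1.0]]
second_vecs = [[10.0], [11.0]]
unlabeled = {"u1": [4.0], "u2": [6.0], "u3": [9.0]}

assigned, rounds = iterate_classification(first_vecs, second_vecs, unlabeled)
```

With this hypothetical data, `u1` receives the first classification on the initial pass, `u2` is reclassified from second to first after one retraining round, and `u3` keeps the second classification, at which point the loop stops.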
Regarding claims 7 and 19, Olmstead disclosed the subject matter of claims 1 and 14, respectively.
Olmstead further disclosed that
wherein the plurality of documents are emails and the entities identified in the plurality of documents are entities identified in receiver and sender fields of the emails (Olmstead, [0051], “The system 100 implements AI techniques to process input data 102 including electronic communication data such as emails, chats and other forms of textual communication and surface topics of interest and subjects of discussion, as named entities.” And [0052], “The contacts can be a sender and a recipient of an electronic communication.”).
Regarding claim 8, Olmstead disclosed the method of claim 7.
Olmstead further disclosed that
wherein generating the plurality of entity data objects comprises: generating, by the system, a plurality of initial entity data objects using the entities identified in receiver and sender fields of the emails; and merging two or more of the plurality of initial entity data objects to form an entity data object, wherein the merging comprises: determining whether an initial entity data object is similar to a first entity data object of the plurality of initial entity data objects; identifying second entity data objects of the plurality of initial entity data objects that relate to the first entity data object based on the second entity data objects including a name that is included in the first entity data object or a variant of a name included in the first entity data object; and merging the initial entity data object into the first entity data object in response to all of the second entity data objects being domain compatible with the first entity data object (Olmstead, [0066], “The “topics” produced by topic modeling techniques can be clusters of similar words.”; [0067], “Once the topics of the electronic communication data are identified, LDA 116 can be used to cluster the document into topic categories”. Said clustering anticipates the “merging” in the claim).
Regarding claim 10, Olmstead disclosed the method of claim 7.
Olmstead did not explicitly disclose
classifying one or more the emails as spam emails; and removing entity data objects from the plurality of entity data objects that are senders of the spam emails.
However, Olmstead disclosed in [0104] that “an organization can use the tool to detect breaches of security or privacy in an organization as well as internal threats such as insider trading and loss of confidential information in capital markets.” Said disclosure about detecting breaches of security using the tool, when combined with Olmstead’s disclosure elsewhere in the document that the tool can be used to process emails, would have made it obvious to one of ordinary skill in the art that using the tool to classify email as spam and removing the entity data from the spam emails is simply an example use of the tool.
Claim 20 lists substantially the same elements as claims 1-6 combined, in the same method form. Therefore, the rejection rationale for claims 1-6 applies equally to claim 20.
Related Prior Art
Venkatraman et al. (US 2016/0253679) is directed to a brand abuse monitoring system with infringement detection engine using graphs to represent entities and their relationships.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY X ZHANG whose telephone number is (571)270-5012. The examiner can normally be reached 8:30am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Joon H Hwang, can be reached at 571-272-4036. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHIRLEY X ZHANG/Primary Examiner, Art Unit 2447