DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment/Status of Claims
The specification was amended.
Claims 1 and 10 were amended.
Claims 19 and 20 are new.
Claims 1-20 are pending and examined herein.
Claims 1-20 are rejected under 35 U.S.C. 103.
Response to Arguments
Applicant’s arguments, see pages 7-8, filed 12/09/2025, with respect to the objection to the specification have been fully considered and are persuasive. The objection to the specification has been withdrawn.
Applicant’s arguments, see pages 8-10, filed 12/09/2025, with respect to the rejection(s) of claim(s) 1-18 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Lippi (“CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service”, 2019), Xie (“Self-training with Noisy Student improves ImageNet classification”, 2020), Nguyen (“Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution”, 2021), and Brock (US 2008/0059425 A1).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lippi (“CLAUDETTE: an Automated Detector of Potentially Unfair Clauses in Online Terms of Service”, 2019), Xie (“Self-training with Noisy Student improves ImageNet classification”, 2020), Nguyen (“Data Augmentation by Concatenation for Low-Resource Translation: A Mystery and a Solution”, 2021), and Brock (US 2008/0059425 A1).
Xie was made available by the applicant through the IDS.
Regarding claim 1, Lippi teaches
A computer system for predicting compliance of text documents with a ruleset using self-supervised machine learning, the system comprising a server computing device having a memory for storing computer-executable instructions and a processor that executes the computer-executable instructions to: (Page 2 states "To address this problem, we propose a machine learning-based method and tool for partially automating the detection of potentially unfair clauses." Page 2 states "According to art. 3 of the Directive 93/13 on Unfair Terms in Consumer Contracts, a contractual term is unfair if: 1) it has not been individually negotiated; and 2) contrary to the requirement of good faith, it causes a significant imbalance in the parties' rights and obligations, to the detriment of the consumer. This general definition is further specified in the Annex to the Directive, containing an indicative and non-exhaustive list of the terms which may be regarded as unfair, as well in a few dozen judgments of the Court of Justice of the EU (Micklitz and Reich 2014). Examples of unfair clauses encompass taking jurisdiction away from the consumer, limiting liability for damages on health and/or gross negligence, imposing obligatory arbitration in a country different from consumer's residence etc." Therefore, detecting potentially unfair clauses is interpreted as predicting compliance of documents. Page 5 states "The corpus consists of 50 relevant on-line consumer contracts, i.e. the Terms of Service of on-line platforms." Therefore, the documents are text documents. Page 10 states "We address two different tasks: a detection task, aimed at predicting whether a given sentence contains a (potentially) unfair clause, and a classification task, aimed at predicting the category an unfair clause belongs to, which indeed could be a valuable piece of information to a potential user." 
One of ordinary skill in the art would realize that machine learning, as is present in Lippi, would be executed on a computer. Modern computers, as would be used for executing machine learning, are capable of running a server, and therefore, the computer used to execute the machine learning model is interpreted as the server computing device. A computer, as one of ordinary skill in the art would understand, in order to execute the method taught by Lippi, must have a memory storing computer-readable instructions and a processor to execute those instructions.)
[training a] natural language processing (NLP) [model that is] using as input a first plurality of … sentences from each of a first plurality of text documents (Page 10 states “We address the problem of detecting potentially unfair contract clauses as a sentence classification task. Such a task could be tackled by treating sentences independently of one another (sentence-wide classification).” Page 10 further states "In sentence-wide classification the problem can be formalized as follows. Given a sentence, the goal is to classify it as positive if it contains a potentially unfair clause, or negative otherwise. Within this setting, a machine learning classifier is trained with a data set D = {(x_i, y_i)}_{i=1}^{N}, which consists of a collection of N pairs, where x_i encodes some representation of a sentence, and y_i is its corresponding (positive or negative) class." The machine learning classifier is a natural language processing model, as page 5 states "The corpus consists of 50 relevant on-line consumer contracts, i.e. the Terms of Service of on-line platforms." The contracts are natural language, and because the classifier processes the contracts, the model is a natural language processing model. Page 12 states "We obtained a total of 9,414 sentences, 1,032 of which (11.0%) were labeled as positive, thus containing a potentially unfair clause. We run experiments following the leave-one-document-out (LOO) procedure, in which each document in the corpus, in turn, is used as test set, leaving the remaining documents for training set (4/5) and validation set (1/5) for model selection." The training set is interpreted as the first plurality of text documents, and the sentences of the training set of documents are interpreted as the first plurality of sentences. One of ordinary skill in the art would realize that the training set is used to train the NLP model.)
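For illustration only, the sentence-wide classification formulation quoted above (a classifier trained on sentence/label pairs, with positive meaning potentially unfair) can be sketched as follows. The bag-of-words perceptron, the example clauses, and all identifiers are the examiner's simplification for explanatory purposes, not Lippi's actual classifiers (which include CNNs and LSTMs, as discussed below in regards to claim 2):

```python
from collections import Counter

def featurize(sentence):
    # Bag-of-words representation: x_i encodes the sentence
    return Counter(sentence.lower().split())

def train(dataset, epochs=10):
    # dataset D = [(x_i, y_i)] with y_i in {+1, -1}
    # (+1 = sentence contains a potentially unfair clause)
    weights = Counter()
    for _ in range(epochs):
        for sentence, label in dataset:
            feats = featurize(sentence)
            score = sum(weights[w] * c for w, c in feats.items())
            pred = 1 if score > 0 else -1
            if pred != label:  # perceptron update on mistakes only
                for w, c in feats.items():
                    weights[w] += label * c
    return weights

def classify(weights, sentence):
    score = sum(weights[w] * c for w, c in featurize(sentence).items())
    return 1 if score > 0 else -1

# Hypothetical training sentences (not drawn from Lippi's corpus)
D = [
    ("we may terminate your account at any time without notice", 1),
    ("any dispute shall be resolved by mandatory arbitration", 1),
    ("you can cancel your subscription at any time", -1),
    ("we will notify you of changes to these terms", -1),
]
model = train(D)
```

A positive prediction corresponds to a sentence containing a potentially unfair clause; a negative prediction corresponds to a compliant sentence.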
execute the trained NLP … model, using as input a second plurality of … sentences from each of a second plurality of text documents, to generate a second compliance pseudo-label for each unlabeled sentence in the second plurality of unlabeled sentences; and (Page 12 states "We obtained a total of 9,414 sentences, 1,032 of which (11.0%) were labeled as positive, thus containing a potentially unfair clause. We run experiments following the leave-one-document-out (LOO) procedure, in which each document in the corpus, in turn, is used as test set, leaving the remaining documents for training set (4/5) and validation set (1/5) for model selection." The test set is interpreted as the second plurality of text documents, and the sentences in the second plurality of text documents are interpreted as the second plurality of unlabeled sentences. One of ordinary skill in the art would realize that a test set is entered into the model without a label in order to compare the output label to the ground-truth label, and thus, is unlabeled. Page 12 states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." The output of classification, as one of ordinary skill in the art would understand, is a label; these output labels are interpreted as pseudo-labels because they are not ground-truth labels. Therefore, the classification outputs, representing unfair sentences, are compliance pseudo-labels. As the output identifies all of the unfair sentences, each sentence is effectively labeled as either non-compliant (included in the unfair sentences) or compliant (not included). 
This is supported by page 10, which states "We address two different tasks: a detection task, aimed at predicting whether a given sentence contains a (potentially) unfair clause, and a classification task, aimed at predicting the category an unfair clause belongs to, which indeed could be a valuable piece of information to a potential user.")
determine whether each text document in the second plurality of text documents is in compliance with one or more rulesets using the second compliance pseudo-labels generated for the text document; and (Page 12 states "We run experiments following the leave-one-document-out (LOO) procedure, in which each document in the corpus, in turn, is used as test set, leaving the remaining documents for training set (4/5) and validation set (1/5) for model selection." Page 12 further states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." Therefore, as each test document is input into the model, the binary classification of the negative class (no sentences labeled as unfair) is determining that the text document is in compliance with one or more rulesets. The categories of clause unfairness in Table 1, described in pages 5-9 are interpreted as the rulesets.)
Lippi does not appear to explicitly teach
execute a … teacher model to generate a first … pseudo-label for each unlabeled [input data];
train a … student model using the first plurality of unlabeled [data] and associated first … pseudo-labels, including injecting input noise during the training process by aggregating each unlabeled sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks as input to train the NLP student model;
[executing the trained] student model
for each text document in the second plurality of text documents that is determined to be non-compliant:
generate a document fingerprint based upon digital attributes of the text document,
match the document fingerprint to one or more text documents in document storage, and
deactivate access to the one or more text documents in document storage.
However, Xie—directed to analogous art—teaches
execute a … teacher model to generate a first … pseudo-label for each unlabeled [input data]; (Page 2 states that step 2 is "Use an unnoised teacher model to generate soft or hard pseudo labels for unlabeled images.")
train a … student model using the first plurality of unlabeled [data] and associated first … pseudo-labels, including injecting input noise during the training process (Page 2 states that step 3 is "Learn an equal-or-larger student model θ*_s which minimizes the cross entropy loss on labeled images and unlabeled images with noise added to the student model." Learning a model is training a model.)
[executing the trained] student model (Page 3 states "We first report the validation set accuracy on the ImageNet 2012 ILSVRC challenge prediction task as commonly done in literature [45, 79, 30, 82] (see also [66])." Therefore, the trained student model was executed.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie, because as Xie states in the introduction, "Deep learning has shown remarkable successes in image recognition in recent years [45, 79, 74, 30, 82]. However state-of-the-art (SOTA) vision models are still trained with supervised learning which requires a large corpus of labeled images to work well. By showing the models only labeled images, we limit ourselves from making use of unlabeled images available in much larger quantities to improve accuracy and robustness of SOTA models. Here we use unlabeled images to improve the SOTA ImageNet accuracy and show that the accuracy gain has an outsized impact on robustness (out-of-distribution generalization)." Additionally, Xie states "When applied to unlabeled data, noise has a compound benefit of enforcing local smoothness in the decision function on both labeled and unlabeled data." As Lippi teaches the use of an NLP model architecture for compliance prediction, and Xie teaches the use of unlabeled data and the self-training teacher-student training method, it would have been obvious for one of ordinary skill in the art to substitute the training method and unlabeled data of Xie for the training method and labeled data of Lippi for the predictable result of better performance.
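The teacher/pseudo-label/noised-student steps quoted from Xie can be sketched, at a high level, as follows. The toy keyword-evidence trainer, the noise function, and the example sentences are the examiner's stand-ins for illustration (Xie trains EfficientNet image models and applies noise such as data augmentation and dropout); only the three-step structure is drawn from Xie:

```python
import random

random.seed(0)

def noise(sentence):
    # Hypothetical input noise: randomly drop one word. This is a
    # stand-in for the noise Xie adds when training the student.
    words = sentence.split()
    if len(words) > 1:
        words.pop(random.randrange(len(words)))
    return " ".join(words)

def train_model(labeled):
    # Stand-in trainer: accumulate signed keyword evidence per class.
    # Not Xie's actual training; illustrative only.
    evidence = {}
    for sentence, label in labeled:
        for w in sentence.split():
            evidence[w] = evidence.get(w, 0) + label
    def predict(sentence):
        score = sum(evidence.get(w, 0) for w in sentence.split())
        return 1 if score > 0 else -1
    return predict

def noisy_student_step(labeled, unlabeled):
    # Step 1: learn an unnoised teacher on labeled data
    teacher = train_model(labeled)
    # Step 2: teacher generates hard pseudo-labels for clean unlabeled data
    pseudo = [(s, teacher(s)) for s in unlabeled]
    # Step 3: learn a student on labeled plus pseudo-labeled data,
    # with input noise injected into the student's training inputs
    student = train_model(labeled + [(noise(s), y) for s, y in pseudo])
    return student

labeled = [("unfair arbitration clause", 1), ("fair cancellation clause", -1)]
unlabeled = ["mandatory arbitration required", "simple cancellation allowed"]
student = noisy_student_step(labeled, unlabeled)
```

Per step 4 of Xie, the returned student could then serve as the teacher in a further iteration.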
The combination of Lippi and Xie does not appear to explicitly teach
[noising data] by aggregating each unlabeled sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks as input to train the … model;
for each text document in the second plurality of text documents that is determined to be non-compliant:
generate a document fingerprint based upon digital attributes of the text document,
match the document fingerprint to one or more text documents in document storage, and
deactivate access to the one or more text documents in document storage.
However, Nguyen—directed to analogous art—teaches
[noising data] by aggregating each … sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks as input to train the … model; (The introduction states "Instead, we view concatenation as a kind of data augmentation or noising method (one which pleasantly requires no alteration to the text, unlike data augmentation methods that disturb word order (Belinkov and Bisk, 2018; Anastasopoulos et al., 2019) or replace words with automatically-selected words (Gao et al., 2019; Fadaee et al., 2017; Wang et al., 2018))." Page 287 states “Let D_orig = {(x_i, y_i)}_{i=1,...,N} be the original training data. We consider two concatenation strategies: CONSEC Concatenate consecutive sentence-pairs: D_new = {(x_i x_{i+1}, y_i y_{i+1})}_{i=1,...,N-1}.” Page 288 states "For baseline, the training data is D_orig. For concatenation, we first create D_new, then combine it with D_orig to create the training data." Therefore, the aggregated sentences are used as input to train the model.)
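The CONSEC concatenation strategy quoted from Nguyen can be sketched directly. The example pairs are the examiner's illustration, with x representing a source sentence and y its translation as in Nguyen:

```python
def consec_concat(d_orig):
    # CONSEC: concatenate consecutive sentence-pairs, forming
    # D_new = {(x_i x_{i+1}, y_i y_{i+1})} for i = 1, ..., N-1
    return [
        (x1 + " " + x2, y1 + " " + y2)
        for (x1, y1), (x2, y2) in zip(d_orig, d_orig[1:])
    ]

# Hypothetical source/target pairs
d_orig = [("a b", "A B"), ("c d", "C D"), ("e f", "E F")]
d_new = consec_concat(d_orig)
# Per page 288, the augmented training data combines D_new with D_orig
training_data = d_orig + d_new
```

Each element of d_new is a sentence block formed by aggregating a sentence with its adjacent sentence.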
Lippi teaches the use of an NLP machine learning model for predicting compliance of documents. Xie teaches the use of noise and self-supervised learning to improve performance of the student model. Nguyen teaches a specific method of noising data by concatenating adjacent sentences. It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to substitute the noising method of Xie with the concatenation-based noising taught by Nguyen for the predictable result of improving accuracy of the student model.
The combination of Lippi, Xie, and Nguyen does not appear to explicitly teach
for each text document in the second plurality of text documents that is determined to be non-compliant:
generate a document fingerprint based upon digital attributes of the text document,
match the document fingerprint to one or more text documents in document storage, and
deactivate access to the one or more text documents in document storage.
However, Brock—directed to analogous art—teaches
for each text document in the second plurality of text documents that is determined to be non-compliant: ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." The database of known non-compliant documents are interpreted as the second plurality of text documents determined to be non-compliant. [0025] states "As used herein, a unit of content may be referred to as a content object. Content objects can include any object type. Examples of content objects include a text document, an image, Video, audio, flash, animation, game, lyrics, code, or portions thereof (e.g., a phrase/sentence/paragraph, a subimage, or a video clip)." Therefore, content objects are interpreted as text documents.)
generate a document fingerprint based upon digital attributes of the text document, ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." Therefore, as the content can be looked up by fingerprint, the non-compliant text documents must have had a document fingerprint generated. [0034] states "A fingerprint includes a signature of an object that can be used to detect a copy of an object as a whole or in part. A content object may have more than one fingerprint. A fingerprint may be associated with more than one content object. A fingerprint may be associated with a whole or part of a content object. A fingerprint may be multidimensional. For example, there may be multiple features associated with a fingerprint. A fingerprint may contain multiple fingerprints or subfingerprints." Therefore, the fingerprint is based upon digital attributes of the document.)
match the document fingerprint to one or more text documents in document storage, and ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." Therefore, the fingerprint of a text document is matched against the stored documents by fingerprint lookup. [0034] states "Controlled content store 116 includes controlled content. In some embodiments, controlled content store 116 includes the following information: a copy of the content, an index of fingerprints associated with the content, and meta data about the content (e.g., filename, URL, fetch date, etc.). In some embodiments, the copy of the content is stored in a separate cache." Therefore, the documents are stored in document storage.)
deactivate access to the one or more text documents in document storage. ([0104] states "At 266, it is determined whether the content object is non-compliant according to the database. If it is non-compliant according to the database, the content object is removed at 272." Therefore, as the content object is removed, access to the text document is deactivated.)
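The fingerprint-generate/match/deactivate flow taught by Brock can be sketched as follows. The exact-hash fingerprint and the class interface are the examiner's simplifications for illustration; Brock's fingerprints are signatures that may be multidimensional and may match partial copies, not a simple content hash:

```python
import hashlib

class DocumentStore:
    """Minimal sketch of a fingerprint-indexed document store."""

    def __init__(self):
        self.docs = {}    # fingerprint -> document text
        self.active = {}  # fingerprint -> access flag

    @staticmethod
    def fingerprint(text):
        # Fingerprint derived from digital attributes of the document
        # (here, simply its whitespace-normalized bytes)
        normalized = " ".join(text.split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def add(self, text):
        fp = self.fingerprint(text)
        self.docs[fp] = text
        self.active[fp] = True
        return fp

    def deactivate_matches(self, non_compliant_text):
        # Match the non-compliant document's fingerprint against storage
        # and deactivate access to any matching stored document
        fp = self.fingerprint(non_compliant_text)
        if fp in self.docs:
            self.active[fp] = False
        return self.active.get(fp)

store = DocumentStore()
store.add("terms of service text")
```

Here a document determined to be non-compliant is fingerprinted, matched against the store, and the matching stored copy has its access flag cleared.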
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi, Xie, and Nguyen with the teachings of Brock because, as Brock states in [0004] "From an OSP's perspective, monitoring for content that does not comply with the OSP's host policy is also typically a manual process. When OSPs monitor content as it is uploaded, typically a human views and approves content before (or after) it is displayed and non-compliant content is rejected (or removed). OSPs also must manually review and compare content when they receive DMCA notices, and often have little information to determine if content is out of compliance and no automated way to determine the identity or reputation of the complaining party. As the amount of content on the Internet grows, manual content monitoring and enforcement processes are becoming increasingly impractical. Therefore, improved methods for monitoring content and managing enforcement of non-compliant content are needed."
Regarding claim 2, the rejection of claim 1 is incorporated herein. Lippi teaches
[an] NLP model [comprising a deep learning] NLP [architecture] (Page 13 states, in description of the models used for classification "C4: a CNN trained from plain word sequences; C5: an LSTM trained from plain word sequences;". A CNN and an LSTM are deep learning NLP architectures, as they process the natural language of the corpus.)
Lippi does not appear to explicitly teach
wherein the … teacher model and the … student model each comprises a deep learning … model architecture.
However, Xie—directed to analogous art—teaches
wherein the … teacher model and the … student model each comprises a deep learning … model architecture. (Page 4 states "We first trained an EfficientNet-B7 on ImageNet as the teacher model. Then by using the B7 model as the teacher, we trained an EfficientNet-L2 model with the unlabeled batch size set to 14 times the labeled batch size. Then, we trained a new EfficientNet-L2 model with the EfficientNet-L2 model as the teacher." One of ordinary skill would recognize EfficientNet architectures as deep learning model architectures.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie for the reasons given above in regards to claim 1.
Regarding claim 3, the rejection of claim 1 is incorporated herein. Lippi teaches
wherein the NLP … model is trained using a corpus of text documents where each sentence is associated with a compliance label. ("In analyzing the Terms of Service of the selected on-line platforms, we identified eight different categories of unfair clauses. For each type of clause we defined a corresponding XML tag, as shown in Table 1. Notice that not necessarily all the documents contain all clause categories. For example, Twitter provides two different ToS, the first one for US and non-US residents and the second one for EU residents. The tagged version is the version applicable in the EU and it does not contain any choice of law, arbitration or jurisdiction clauses. We assumed that each type of clause could be classified as clearly fair, potentially unfair, or clearly unfair. In order to mark the different degrees of (un)fairness we appended a numeric value to each XML tag, with 1 meaning clearly fair, 2 potentially unfair, and 3 clearly unfair. Nested tags were used to annotate text segments relevant to more than one type of clause. If one clause covers more than one paragraph, we chose to tag each paragraph separately, possibly with different degrees of (un)fairness." As the clause tag labels which paragraphs are potentially unfair, each sentence is associated with a compliance label. As the classifier learns on the tags, the lack of a tag is a negative label.)
Lippi does not appear to explicitly teach
[training the] teacher [model with labeled data]
However, Xie—directed to analogous art—teaches
[training the] teacher [model with labeled data] (Page 2 states that step 1 is "Learn teacher model θ*_t which minimizes the cross entropy loss on labeled images". Learning the model is training the model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie for the reasons given above in regards to claim 1.
Regarding claim 4, the rejection of claim 3 is incorporated herein. Lippi teaches
wherein the compliance label is an indicator of whether the corresponding sentence is in compliance with one or more rulesets. ("In analyzing the Terms of Service of the selected on-line platforms, we identified eight different categories of unfair clauses. For each type of clause we defined a corresponding XML tag, as shown in Table 1. Notice that not necessarily all the documents contain all clause categories. For example, Twitter provides two different ToS, the first one for US and non-US residents and the second one for EU residents. The tagged version is the version applicable in the EU and it does not contain any choice of law, arbitration or jurisdiction clauses. We assumed that each type of clause could be classified as clearly fair, potentially unfair, or clearly unfair. In order to mark the different degrees of (un)fairness we appended a numeric value to each XML tag, with 1 meaning clearly fair, 2 potentially unfair, and 3 clearly unfair. Nested tags were used to annotate text segments relevant to more than one type of clause. If one clause covers more than one paragraph, we chose to tag each paragraph separately, possibly with different degrees of (un)fairness." The clauses, as explained in the explanation of claim 1, are interpreted as the ruleset. As the tags relate to the unfairness of the clauses, the tags are an indicator of whether the corresponding sentences are in compliance with the ruleset.)
Regarding claim 5, the rejection of claim 1 is incorporated herein. Lippi teaches
wherein the ... compliance [output label] is a prediction of whether the corresponding sentence is in compliance with one or more rulesets. (Page 12 states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." As explained in regards to claim 1, the clauses are interpreted as the ruleset. As the output is the unfair sentences, the output label is a prediction of whether the corresponding sentences are in compliance with one or more rulesets.)
Lippi does not appear to explicitly teach
first … pseudo-label
However, Xie—directed to analogous art—teaches
first … pseudo-label (Page 2 states that step 2 is "Use a normal (i.e., not noised) teacher model to generate soft or hard pseudo labels for clean (i.e., not distorted) unlabeled images".)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie for the reasons given above in regards to claim 1.
Regarding claim 6, the rejection of claim 1 is incorporated herein. Lippi teaches
wherein the second compliance pseudo-label is a prediction of whether the corresponding sentence is in compliance with one or more rulesets. (Page 12 states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." As explained in regards to claim 1, the clauses are interpreted as the ruleset. As the output is the unfair sentences, the output label is a prediction of whether the corresponding sentences are in compliance with one or more rulesets. As explained in regards to claim 1, the output label is interpreted as the second compliance pseudo-label.)
Regarding claim 7, the rejection of claim 1 is incorporated herein. Lippi teaches
determining that the text document in the second plurality of text documents is not in compliance with the one or more rulesets when at least one sentence in the text document is labeled as being non-compliant. (Page 12 states "We run experiments following the leave-one-document-out (LOO) procedure, in which each document in the corpus, in turn, is used as test set, leaving the remaining documents for training set (4/5) and validation set (1/5) for model selection." Page 12 further states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." Therefore, as each test document is input into the model, the binary classification of the negative class (no sentences labeled as unfair) is determining that the text document is in compliance with one or more rulesets. The categories of clause unfairness in Table 1, described in pages 5-9 are interpreted as the rulesets. When the binary class is negative, the text document is found compliant. If one sentence is labeled non-compliant, it will be included in the union of all potentially unfair sentences, making the binary classification positive, meaning that the document is non-compliant.)
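The document-level determination described above reduces to an any-sentence rule, sketched below with illustrative label strings chosen by the examiner:

```python
def document_compliant(sentence_labels):
    # A document is non-compliant when at least one of its sentences
    # is labeled non-compliant, i.e. when the union of potentially
    # unfair sentences is non-empty (the positive binary class);
    # otherwise the document is in compliance (the negative class).
    return not any(label == "non-compliant" for label in sentence_labels)
```

With this rule, a single non-compliant sentence suffices to render the entire document non-compliant.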
Regarding claim 8, the rejection of claim 1 is incorporated herein. Lippi teaches
sentences and associated second compliance pseudo-labels (As explained in regards to claim 1, the output of the model is interpreted as the second compliance pseudo-labels. The corpus is composed of sentences, as explained in regards to claim 1.)
[a] NLP [model] (As the model processes text documents that are written in natural language, the model is an NLP model.)
compliance [pseudo-labels] (As explained in regards to claim 1, the output of the model is interpreted as the compliance pseudo-labels.)
and determines whether each text document in the … plurality of text documents is in compliance with one or more rulesets using the … compliance pseudo-labels generated for the text document. (Page 12 states "We run experiments following the leave-one-document-out (LOO) procedure, in which each document in the corpus, in turn, is used as test set, leaving the remaining documents for training set (4/5) and validation set (1/5) for model selection." Page 12 further states "For the first task (potentially unfair clause detection) we compared several systems. The problem is formulated as a binary classification task, where the positive class is either the union of all potentially unfair sentences, or the set of potentially unfair clauses of a single category, as described below." Therefore, as each test document is input into the model, the binary classification of the negative class (no sentences labeled as unfair) is determining that the text document is in compliance with one or more rulesets. The categories of clause unfairness in Table 1, described in pages 5-9 are interpreted as the rulesets.)
Lippi does not appear to explicitly teach
trains a second … student model using the second plurality of unlabeled [data], including injecting input noise during the training process by aggregating each unlabeled sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks as input to train the second … student model;
executes the trained second … student model, using as input a third plurality of [data], to generate a third … pseudo-label for each unlabeled sentence in the third plurality of [data];
[use the] third [data using the] third pseudo-labels
However, Xie—directed to analogous art—teaches
trains a second … student model using the second plurality of [data], including injecting input noise during the training process [and using the noise] as input to train the second … student model (Page 4 states "The best model in our experiments is a result of three iterations of putting back the student as the new teacher. We first trained an EfficientNet-B7 on ImageNet as the teacher model. Then by using the B7 model as the teacher, we trained an EfficientNet-L2 model with the unlabeled batch size set to 14 times the labeled batch size. Then, we trained a new EfficientNet-L2 model with the EfficientNet-L2 model as the teacher. Lastly, we iterated again and used an unlabeled batch size of 28 times the labeled batch size." Therefore, there is a second student model. Page 2 states that step 4 is "Iterative training: Use the student as a teacher and go back to step 2." Step 3 is "Learn an equal-or-larger student model
θ*_s which minimizes the cross entropy loss on labeled images and unlabeled images with noise added to the student model". Therefore, the second student model also has noise added to it.)
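The iterative teacher-student loop Xie describes (train a teacher, pseudo-label unlabeled data, train a noised student, then promote the student to teacher and repeat) can be sketched as below. All function parameters are hypothetical stand-ins for real training code, not Xie's API.

```python
# Minimal sketch of iterative Noisy Student training: the student from each
# round is "put back" as the teacher for the next round, as Xie describes
# (three iterations in their best model).
def noisy_student(labeled, unlabeled, train, pseudo_label, add_noise, iterations=3):
    teacher = train(labeled)                       # step 1: teacher on labeled data
    for _ in range(iterations):                    # step 4: iterate
        pseudo = pseudo_label(teacher, unlabeled)  # step 2: pseudo-label unlabeled data
        noisy = add_noise(labeled + pseudo)        # step 3: noise injected into student input
        student = train(noisy)                     # equal-or-larger student model
        teacher = student                          # student becomes the new teacher
    return teacher
```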
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie for the reasons given above in regards to claim 1. Additionally, page 4 states "The best model in our experiments is a result of three iterations of putting back the student as the new teacher."
The combination of Lippi and Xie does not appear to explicitly teach
injecting input noise during the training process by aggregating each unlabeled sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks [as input to train a model]
However, Nguyen—directed to analogous art—teaches
injecting input noise during the training process by aggregating each unlabeled sentence with one or more sentences adjacent to each unlabeled sentence into a sentence block and providing the aggregated sentence blocks [as input to train a model] (The introduction states "Instead, we view concatenation as a kind of data augmentation or noising method (one which pleasantly requires no alteration to the text, unlike data augmentation methods that disturb word order (Belinkov and Bisk, 2018; Anastasopoulos et al., 2019) or replace words with automatically-selected words (Gao et al., 2019; Fadaee et al., 2017; Wang et al., 2018))." Page 287 states "Let D_orig = {(x_i, y_i)}, i = 1, ..., N, be the original training data. We consider two concatenation strategies: CONSEC Concatenate consecutive sentence-pairs: D_new = {(x_i x_{i+1}, y_i y_{i+1})}, i = 1, ..., N-1." Page 288 states "For baseline, the training data is D_orig. For concatenation, we first create D_new, then combine it with D_orig to create the training data." Therefore, the aggregated sentences are used as input to train the model.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie with the teachings of Nguyen for the reasons given above in regards to claim 1.
Regarding claim 9, the rejection of claim 1 is incorporated herein. Lippi teaches
text documents [and] sentences (The corpus, as explained in regards to claim 1, includes text documents comprising sentences.)
Lippi does not appear to explicitly teach
[increasing the number of sentences by data augmentation]
wherein the second plurality of [data] comprises a larger number of [data] than the first plurality of [data].
However, Xie—directed to analogous art—teaches
wherein the second plurality of [data] comprises a larger number of [data] than the first plurality of [data]. (As the data is augmented when the student model is trained, the student training data is interpreted as the second plurality of data and the teacher training data is interpreted as the first plurality of data. Page 2 states "For input noise, we use data augmentation with RandAugment [18]." Page 2 further states "First, data augmentation is an important noising method in Noisy Student Training because it forces the student to ensure prediction consistency across augmented versions of an image (similar to UDA [91])." Data augmentation is therefore used to increase the second plurality of data (training data for the student model) compared to the first plurality of data (training data used to train the teacher model).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi and Xie for the reasons given above in regards to claim 1. Additionally, as Xie states on page 2, "When applied to unlabeled data, noise has an important benefit of enforcing invariances in the decision function on both labeled and unlabeled data. First, data augmentation is an important noising method in Noisy Student Training because it forces the student to ensure prediction consistency across augmented versions of an image (similar to UDA [91])."
The combination of Lippi and Xie does not appear to explicitly teach
[increasing the number of sentences by data augmentation]
However, Nguyen—directed to analogous art—teaches
[increasing the number of sentences by data augmentation] (The introduction states "Instead, we view concatenation as a kind of data augmentation or noising method (one which pleasantly requires no alteration to the text, unlike data augmentation methods that disturb word order (Belinkov and Bisk, 2018; Anastasopoulos et al., 2019) or replace words with automatically-selected words (Gao et al., 2019; Fadaee et al., 2017; Wang et al., 2018))." Page 287 states "Let D_orig = {(x_i, y_i)}, i = 1, ..., N, be the original training data. We consider two concatenation strategies: CONSEC Concatenate consecutive sentence-pairs: D_new = {(x_i x_{i+1}, y_i y_{i+1})}, i = 1, ..., N-1." Page 288 states "For baseline, the training data is D_orig. For concatenation, we first create D_new, then combine it with D_orig to create the training data." Therefore, the number of sentences is increased by data augmentation.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi, Xie, and Nguyen for the reasons given above in regards to claim 1.
Regarding claim 10, Lippi teaches
A computerized method of predicting compliance of text documents with a ruleset using self-supervised machine learning, the method comprising: (Page 2 states "To address this problem, we propose a machine learning-based method and tool for partially automating the detection of potentially unfair clauses." Page 2 states "According to art. 3 of the Directive 93/13 on Unfair Terms in Consumer Contracts, a contractual term is unfair if: 1) it has not been individually negotiated; and 2) contrary to the requirement of good faith, it causes a significant imbalance in the parties rights and obligations, to the detriment of the consumer. This general definition is further specified in the Annex to the Directive, containing an indicative and non-exhaustive list of the terms which may be regarded as unfair, as well in a few dozen judgments of the Court of Justice of the EU (Micklitz and Reich 2014). Examples of unfair clauses encompass taking jurisdiction away from the consumer, limiting liability for damages on health and/or gross negligence, imposing obligatory arbitration in a country different from consumers residence etc." Therefore, detecting potentially unfair clauses is interpreted as predicting compliance of documents. Page 5 states "The corpus consists of 50 relevant on-line consumer contracts, i.e. the Terms of Service of on-line platforms." Therefore, the documents are text documents. Page 10 states "We address two different tasks: a detection task, aimed at predicting whether a given sentence contains a (potentially) unfair clause, and a classification task, aimed at predicting the category an unfair clause belongs to, which indeed could be a valuable piece of information to a potential user." One of ordinary skill in the art would realize that machine learning, as is present in Lippi, would be executed on a computer.)
The remainder of claim 10 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claims 11-18 recite substantially similar subject matter to claims 2-9 respectively and are rejected with the same rationale, mutatis mutandis.
Regarding claim 19, the rejection of claim 8 is incorporated herein. The combination of Lippi, Xie, and Nguyen does not appear to explicitly teach
for each text document in the third plurality of text documents that is determined to be non-compliant:
generate a document fingerprint based upon digital attributes of the text document,
match the document fingerprint to one or more text documents in document storage, and
deactivate access to the one or more text documents in document storage.
However, Brock—directed to analogous art—teaches
for each text document in the third plurality of text documents that is determined to be non-compliant: ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." The database of known non-compliant documents is interpreted as the third plurality of text documents determined to be non-compliant. [0025] states "As used herein, a unit of content may be referred to as a content object. Content objects can include any object type. Examples of content objects include a text document, an image, video, audio, flash, animation, game, lyrics, code, or portions thereof (e.g., a phrase/sentence/paragraph, a subimage, or a video clip)." Therefore, content objects are interpreted as text documents.)
generate a document fingerprint based upon digital attributes of the text document, ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." Therefore, as the content can be looked up by fingerprint, the non-compliant text documents must have had a document fingerprint generated. [0034] states "A fingerprint includes a signature of an object that can be used to detect a copy of an object as a whole or in part. A content object may have more than one fingerprint. A fingerprint may be associated with more than one content object. A fingerprint may be associated with a whole or part of a content object. A fingerprint may be multidimensional. For example, there may be multiple features associated with a fingerprint. A fingerprint may contain multiple fingerprints or subfingerprints." Therefore, the fingerprint is based upon digital attributes of the document.)
match the document fingerprint to one or more text documents in document storage, and ([0104] states "At 264, the fingerprint (or another analysis result) is checked against a database of known non-compliant (or known compliant) content objects. In some embodiments, the database includes a common pool of content that has previously been identified either manually or automatically as non-compliant or compliant. The content can be looked up by fingerprint or any other appropriate index." Therefore, incoming text documents are matched to stored text documents by fingerprint. [0034] states "Controlled content store 116 includes controlled content. In some embodiments, controlled content store 116 includes the following information: a copy of the content, an index of fingerprints associated with the content, and meta data about the content (e.g., filename, URL, fetch date, etc.). In some embodiments, the copy of the content is stored in a separate cache." Therefore, the documents are held in document storage.)
deactivate access to the one or more text documents in document storage. ([0104] states "At 266, it is determined whether the content object is non-compliant according to the database. If it is non-compliant according to the database, the content object is removed at 272." Therefore, as the content object is removed, the access to the text document is deactivated.)
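The fingerprint-match-and-deactivate flow Brock describes can be sketched as below. This is an illustrative sketch, not Brock's implementation: a simple content hash stands in for the richer, possibly multidimensional fingerprints of [0034], and the store layout is hypothetical.

```python
# Hypothetical sketch: generate a fingerprint from a document's digital
# attributes (here, a SHA-256 hash of its text), look it up in a store of
# known non-compliant content, and deactivate access to any match.
import hashlib

def fingerprint(text):
    """A stand-in fingerprint based on the document's bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def deactivate_matches(document, store):
    """store: dict mapping fingerprint -> {"text": ..., "active": bool}."""
    fp = fingerprint(document)
    if fp in store:                  # match against known non-compliant content
        store[fp]["active"] = False  # deactivate access to the stored document
        return True
    return False
```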
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Lippi, Xie, and Nguyen with the teachings of Brock because, as Brock states in [0004] "From an OSPs perspective, monitoring for content that does not comply with the OSPs host policy is also typically a manual process. When OSPs monitor content as it is uploaded, typically a human views and approves content before (or after) it is displayed and non-compliant content is rejected (or removed). OSPs also must manually review and compare content when they receive DMCA notices, and often have little information to determine if content is out of compliance and no automated way to determine the identity or reputation of the complaining party. As the amount of content on the Internet grows, manual content monitoring and enforcement processes are becoming increasingly impractical. Therefore, improved methods for monitoring content and managing enforcement of non-compliant content are needed."
Claim 20 recites substantially similar subject matter to claim 19 and is rejected with the same rationale, mutatis mutandis.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.P./ Examiner, Art Unit 2121
/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121