Prosecution Insights
Last updated: April 19, 2026
Application No. 18/069,073

CONTRASTIVE LEARNING METHOD BASED ON IMPLICATIONS FOR DETECTING IMPLICIT HATE EXPRESSION, APPARATUS AND COMPUTER PROGRAM FOR PERFORMING THE METHOD

Non-Final OA: §101, §103, §112
Filed: Dec 20, 2022
Examiner: HAN, BYUNGKWON
Art Unit: 2121
Tech Center: 2100 — Computer Architecture & Software
Assignee: UIF (University Industry Foundation), Yonsei University
OA Round: 1 (Non-Final)

Grant Probability: 0% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (grants 0% of cases; 0 granted / 1 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift in resolved cases with interview)
Typical Timeline: 3y 3m average prosecution
Career History: 29 total applications across all art units, 28 currently pending

Statute-Specific Performance

§101: 34.7% (-5.3% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§102: 2.0% (-38.0% vs TC avg)
§112: 19.3% (-20.7% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 1 resolved case

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant's claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. KR 10-2022-0136485, filed on 10/21/2022.

Status of Claims

Claims 1-13 are pending and examined herein. Claims 1-13 are rejected under 35 U.S.C. 112(b). Claims 1-13 are rejected under 35 U.S.C. 101. Claims 1-13 are rejected under 35 U.S.C. 103.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 11-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1-13 repeatedly recite the limitation "an implicit hate expression" in line 1 of each claim and multiple times through claim 9. The indefinite article "an" is used multiple times in each claim as though new instances of an implicit hate expression were being introduced, rather than consistently referring back to the original introduction. There is insufficient antecedent basis for the limitation in the claims.

Claim 1 recites the limitation "the hate expression detection model" in line 7 and "the training dataset" in line 5.
Claim 9 also recites the limitation "the training dataset" in line 10. These limitations are never introduced prior to their recitation. There is insufficient antecedent basis for these limitations in the claims. For examination purposes, "the training dataset" recited in claims 1 and 9 will be read as "a training dataset," as though the term were being introduced for the first time in each claim.

Claims 2 and 10 recite the limitation "superficially different" in line 3. This is a subjective term of degree that does not provide clear guidance as to the scope of the claimed invention. The specification does not appear to provide an objective standard by which one of ordinary skill in the art could determine whether an expression is "superficially different."

Claims 2-3 recite the limitation "a training dataset" in line 2. Claims 3 and 11 recite the limitation "a text" in line 3 and "a predetermined hate expression" in line 4. These limitations were already introduced in the claims from which they depend. There is insufficient antecedent basis for these limitations in the claims. For examination purposes, these limitations will be read as referring back to the previously introduced limitations, i.e., "the training dataset," "the text," and "the predetermined hate expression."

Claims 4-7 recite the limitation "a hate expression detection model" (claim 4, lines 8 and 11; claims 5-7, line 2). The limitation "hate expression detection model" was already introduced in claim 1, from which they depend. There is insufficient antecedent basis for the limitation in these claims. For examination purposes, "a hate expression detection model" will be read as referring back to the previously introduced limitation.

Claims 5 and 13 recite the limitation "an encoded expression value of the input text" in line 6. These limitations were already introduced in the claims from which they depend. There is insufficient antecedent basis for these limitations in the claims. For examination purposes, these limitations will be read as "the encoded expression value of the input text."

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Claim 8 does not fall within at least one of the four categories of patent-eligible subject matter because the broadest reasonable interpretation (BRI) of the "computer readable storage medium" encompasses signals per se. The specification discloses: "The computer readable storage medium indicates an arbitrary medium which participates to provide a command to a processor for execution. The computer readable storage medium may include solely a program command, a data file, and a data structure or a combination thereof. For example, the computer readable medium may include a magnetic medium, an optical recording medium, and a memory. The computer program may be distributed on a networked computer system so that the computer readable code may be stored and executed in a distributed manner." (Paragraph [0084].) A claim whose BRI covers both statutory and non-statutory embodiments embraces subject matter that is not eligible for patent protection and therefore is directed to non-statutory subject matter. See MPEP 2106.03(II). It is suggested that claim 8 be amended to recite a "non-transitory" computer readable medium.

Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter.
The analysis of claims 1-13, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-7 are directed to a contrastive learning method and thus to the statutory category of process. Claim 8 is directed to a computer readable storage medium, which falls within the statutory category of manufacture. Claims 9-13 are directed to a contrastive learning apparatus, which falls within the statutory category of machine.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks whether the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, the analysis proceeds to Step 2A Prong Two, which asks whether the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, the analysis proceeds to Step 2B, which asks whether the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.

Regarding claim 1, the following claim element is an abstract idea: "and training the hate expression detection model using a contrastive loss function together with a cross entropy loss function," (Training a model using a contrastive loss function with a cross entropy loss function is merely mathematical calculation, which is a mathematical concept.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

"acquiring a plurality of input texts to be used as training data;" (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

"acquiring a positive sample for each of the plurality of input texts and acquiring the training dataset based on the plurality of input texts and the positive sample for each of the plurality of input texts;" (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

"based on the training dataset." (This falls under mere instructions to apply general training of said model. See MPEP 2106.05(f). Therefore, it does not amount to significantly more than the judicial exception.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following abstract idea: "which is semantically similar to the input text, but is superficially different from the input text as the positive sample for the input text when the input text is not a predetermined hate expression." (This is practical to perform in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.) Claim 2 recites the following additional element: "the acquiring of a training dataset is configured by acquiring a text" (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

Regarding claim 3, the rejection of claim 2 is incorporated herein. Further, claim 3 recites the following abstract idea: "which represents implication of the input text as the positive sample for the input text when the input text is a predetermined hate expression." (This is practical to perform in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.) Claim 3 recites the following additional element: "the acquiring of a training dataset is configured by acquiring a text" (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, claim 4 recites the following additional elements:

"a first encoder which outputs an encoded expression value of the input text when the input text is input;" (This is mere data gathering and outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

"a second encoder which outputs an encoded expression value of the positive sample when the positive sample for the input text is input;" (This is mere data gathering and outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

"and a classifier which outputs a value representing whether the input text is a hate expression or not when the encoded expression value of the input text which is an output of the first encoder is input," (This is mere data gathering and outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g). Therefore, it does not amount to significantly more than the judicial exception.)

"and the training of a hate expression detection model is configured by training the hate expression detection model based on the training dataset and removing the second encoder from the hate expression detection model when the training of the hate expression detection model is completed." (This falls under mere instructions to apply general training of said model. See MPEP 2106.05(f). Therefore, it does not amount to significantly more than the judicial exception.)

Regarding claim 5, the rejection of claim 4 is incorporated herein. Further, claim 5 recites the following abstract idea: "the training of a hate expression detection model is configured by training the hate expression detection model by repeatedly training based on the cross entropy loss function which uses an output value of the classifier for the input text and a correct answer label corresponding to the input text and the contrastive loss function which uses an encoded expression value of the input text which is an output value of the first encoder for the input text and the encoded expression value of the positive sample which is an output value of the second encoder for the positive sample." (This falls under mere instructions to apply general training of said model. See MPEP 2106.05(f). Therefore, it does not amount to significantly more than the judicial exception.) Claim 5 does not recite additional elements.
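For context on the training objective the claims recite (a first encoder for the input text, a second encoder for the positive sample, a classifier, and a cross entropy loss combined with a temperature-scaled contrastive loss), the loss arithmetic can be sketched in NumPy. This is a minimal illustration, not the applicant's implementation: the random embeddings stand in for encoder outputs, the binary cross-entropy form and the in-batch negative set are assumptions, and all function names are illustrative.

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy over N inputs (assumed binary hate/not-hate labels)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def contrastive_loss(h_x, h_pos, tau=0.1):
    """For each anchor h(x_i), pull its positive h(x_i+) close and push the other
    in-batch anchors away, with temperature tau scaling the dot products."""
    h_x = h_x / np.linalg.norm(h_x, axis=1, keepdims=True)      # cosine similarities
    h_pos = h_pos / np.linalg.norm(h_pos, axis=1, keepdims=True)
    pos_sim = np.exp(np.sum(h_x * h_pos, axis=1) / tau)         # exp(h(x_i)·h(x_i+)/tau)
    all_sim = np.exp(h_x @ h_x.T / tau)                          # exp(h(x_i)·h(x_j)/tau)
    np.fill_diagonal(all_sim, 0.0)                               # indicator 1[j != i]
    denom = pos_sim + all_sim.sum(axis=1)
    return -np.mean(np.log(pos_sim / denom))

rng = np.random.default_rng(0)
N, H = 8, 16                                    # batch size, hidden dimension
h_x = rng.normal(size=(N, H))                   # stand-in first-encoder outputs
h_pos = h_x + 0.05 * rng.normal(size=(N, H))    # stand-in second-encoder outputs
y = rng.integers(0, 2, size=N).astype(float)    # correct answer labels
y_hat = np.clip(y + 0.1 * rng.normal(size=N), 0.01, 0.99)  # classifier probabilities

# Combined objective: the two losses are simply added, as in the cited Shi reference.
total = cross_entropy(y, y_hat) + contrastive_loss(h_x, h_pos)
```

At inference, only the first encoder and the classifier would be used, which is consistent with the claimed removal of the second encoder once training completes.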
Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, claim 6 recites the following abstract idea: "the training of a hate expression detection model is configured by training the hate expression detection model using the cross entropy loss function Lce as represented in Equation 1," where Equation 1 is

Lce = -(1/N) Σi [ yi·log(ŷi) + (1 - yi)·log(1 - ŷi) ],

yi is a correct answer label corresponding to the i-th input text, and ŷi is a prediction probability which is an output of the classifier for the input text. (The claim merely recites a mathematical equation for the cross entropy loss function, which is a mathematical concept.) Claim 6 does not recite additional elements.

Regarding claim 7, the rejection of claim 6 is incorporated herein. Further, claim 7 recites the following abstract idea: "the training of a hate expression detection model is configured by training the hate expression detection model using the contrastive loss function Lcl as represented in Equation 2," where Equation 2 is

Lcl = -(1/N) Σi log[ exp(h(xi)·h(xi⁺)/τ) / ( exp(h(xi)·h(xi⁺)/τ) + Σj 1[j ≠ i]·exp(h(xi)·h(xj)/τ) ) ],

N is a number of input texts, xi is the i-th input text, xi⁺ is a positive sample for the i-th input text, h(xi) is an encoded expression value of the i-th input text, h(xi) ∈ R^H, H is a hidden dimension size, 1[·] is an indicator function, and τ is a temperature hyperparameter which adjusts the scaling of the dot product. (The claim merely recites a mathematical equation for the contrastive loss function, which is a mathematical concept.) Claim 7 does not recite additional elements.

Regarding claim 8, the rejection of claim 1 is incorporated herein.
Further, claim 8 recites the following additional element: "A computer program stored in a computer readable storage medium to cause a computer to execute the contrastive learning method based on implications for detecting an implicit hate expression according to claim 1." (This falls under mere instructions to apply general training of said model. See MPEP 2106.05(f). Therefore, it does not amount to significantly more than the judicial exception.)

Regarding claim 9, the following claim elements are additional elements: "a memory which stores one or more programs to execute contrastive learning based on implications of an implicit hate expression to detect an implicit hate expression;" and "one or more processors which perform an operation to perform contrastive learning based on implications of an implicit hate expression to detect an implicit hate expression according to one or more programs stored in the memory." (These fall under mere instructions to apply general training of said model. See MPEP 2106.05(f). Therefore, they do not amount to significantly more than the judicial exception.) The rest of claims 9-13 recite substantially similar subject matter to claims 1-5, respectively, and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 8, and 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (U.S. Pub. 2023/0153629 A1) in view of Shi et al. (NPL: "Emotion Detection with Deep Neural Network and Contrastive Learning").

Regarding claim 1, Krishnan teaches "A contrastive learning method based on implications for detecting an implicit hate expression, comprising: acquiring a plurality of input texts to be used as training data;" ([0056] of Krishnan states: "Although the present disclosure focuses on data examples from the image domain for ease of explanation, the framework is extensible to data examples of different domains, including text and/or audio domains. Example types of images that can be used include video frames, LiDAR point clouds, computed tomography scans, X-ray images, hyper-spectral images, and/or various other forms of imagery." [0081] of Krishnan states: "Other example user input components include a microphone, a traditional keyboard, or other means by which a user can provide user input." Krishnan's disclosure therefore deals with collecting input texts to be used.)

Krishnan also teaches "acquiring a positive sample for each of the plurality of input texts and acquiring the training dataset based on the plurality of input texts and the positive sample for each of the plurality of input texts;" ([0029] of Krishnan states: "However, in some implementations, the proposed supervised contrastive loss (shown generally at FIG. 2C) has two stages; in the first stage labels are used to choose the images for a contrastive loss, including multiple positive examples and one or more negative examples." For each input (anchor), the label is used to acquire positive samples. The first stage itself would be acquiring the training dataset consisting of inputs and positive samples.)

Krishnan does not explicitly teach "and training the hate expression detection model using a contrastive loss function together with a cross-entropy loss function, based on the training dataset." However, Shi teaches this limitation. (Pg. 85, Section IV ("Experiment Setup") of Shi states: "In this experiment, I use a tweets dataset that contains 40000 tweet texts with 12 emotions (neutral, worry, happiness, sadness, love, surprise, fun, relief, hate, empty, enthusiasm, boredom and anger) from Kaggle." And Pg. 85, Section D ("Combined Loss Used in Models") of Shi states: "To account for both cross entropy and contrastive loss, I integrated two loss functions with adding them up while generating the models.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Krishnan with Shi because both utilize a contrastive learning framework in which a cross-entropy loss function and a contrastive loss function are used. Krishnan teaches the use of multiple positive and negative examples in contrastive learning and explicitly notes that the disclosure is applicable to text data. Shi teaches a classifier trained with a contrastive learning framework for emotion detection tasks, including hate as one of the emotions to detect. One of ordinary skill in the art would have been motivated to incorporate the teachings of Shi into Krishnan to effectively perform the hate expression detection task in text with the known contrastive learning method.

Regarding claim 2, the rejection of claim 1 is incorporated herein. The combination of Krishnan and Shi teaches "the acquiring of a training dataset is configured by acquiring a text which is semantically similar to the input text, but is superficially different from the input text as the positive sample for the input text when the input text is not a predetermined hate expression." ([0029] of Krishnan states: "However, in some implementations, the proposed supervised contrastive loss (shown generally at FIG. 2C) has two stages; in the first stage labels are used to choose the images for a contrastive loss, including multiple positive examples and one or more negative examples." Positive samples are already selected based on labels. Conditioning this on non-hate (not a predetermined hate expression) inputs is a predictable and routine extension of the known approach.)

Regarding claim 3, the rejection of claim 2 is incorporated herein.
The combination of Krishnan and Shi teaches "the acquiring of a training dataset is configured by acquiring a text which represents implication of the input text as the positive sample for the input text when the input text is a predetermined hate expression." (Pg. 83, Section I ("Introduction") of Shi states: "Contrastive loss functions intend to learn similarity functions that measure the similarity or distance between a pair of objects. In the context of classification, the desired metric would render a pair of examples with the same label more similar than a pair of examples with different labels. Typically, it is used in supervised learning to uncover sample connections, which are determined by comparing sample pairings, both positive and negative." Referring to specification [0070] of the current disclosure, an "implication" of a hate expression may be represented by semantically similar but superficially different text. Accordingly, Shi acquires such semantically similar text as the positive sample for the input text.)

Regarding claim 8, the rejection of claim 1 is incorporated herein. The combination of Krishnan and Shi teaches "A computer program stored in a computer readable storage medium to cause a computer to execute the contrastive learning method based on implications for detecting an implicit hate expression according to claim 1." ([0008] of Krishnan states: "Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devices.")

Claims 9-11 recite substantially similar subject matter as claims 1-3, respectively, and are rejected with the same rationale, mutatis mutandis.

Claims 4-5 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (U.S. Pub. 2023/0153629 A1) in view of Shi et al. (NPL: "Emotion Detection with Deep Neural Network and Contrastive Learning"), further in view of Chen et al. (NPL: "A Simple Framework for Contrastive Learning of Visual Representations").

Regarding claim 4, the rejection of claim 3 is incorporated herein. The combination of Krishnan and Shi teaches "and a classifier which outputs a value representing whether the input text is a hate expression or not when the encoded expression value of the input text which is an output of the first encoder is input," (Pg. 1, Section I ("Introduction") of Shi states: "The cross-entropy loss function has a number of significant advantages, one of which is that it minimizes the distance between predicted and actual probability distributions. However, when considering the spatial distribution of samples, intuitively displaying the relationship between sample data is insufficient. Reducing the distance between samples belonging to the same category while increasing the distance between samples belonging to different categories can help clarify the space relationship between the samples. Meanwhile, because samples with similar labels are closer together and those with dissimilar labels are further apart, this can benefit both the performance of emotion classification models and their accuracy. After investigation, the contrastive loss function can theoretically satisfy the requirement.")

The combination also teaches "and the training of a hate expression detection model is configured by training the hate expression detection model based on the training dataset" (Pg. 85, Section IV ("Experiment Setup") of Shi states: "In this experiment, I use a tweets dataset that contains 40000 tweet texts with 12 emotions (neutral, worry, happiness, sadness, love, surprise, fun, relief, hate, empty, enthusiasm, boredom and anger) from Kaggle." And Pg. 85, Section D ("Combined Loss Used in Models") of Shi states: "To account for both cross entropy and contrastive loss, I integrated two loss functions with adding them up while generating the models.")

The combination of Krishnan and Shi does not teach "a first encoder which outputs an encoded expression value of the input text when the input text is input; a second encoder which outputs an encoded expression value of the positive sample when the positive sample for the input text is input;" or "removing the second encoder from the [model] when the training of the [model] is completed."

However, Chen teaches "a first encoder which outputs an encoded expression value of the input text when the input text is input;" (Pg. 2, Section 2 ("Method") of Chen states: "A neural network base encoder f(·) that extracts representation vectors from augmented data examples. Our framework allows various choices of the network architecture without any constraints.")

Chen also teaches "a second encoder which outputs an encoded expression value of the positive sample when the positive sample for the input text is input;" (Pg. 2, Fig. 2 of Chen states: "Two separate data augmentation operators are sampled from the same family of augmentations (t ∼ T and t′ ∼ T) and applied to each data example to obtain two correlated views. A base encoder network f(·) and a projection head g(·) are trained to maximize agreement using a contrastive loss." Pg. 2, Section 2 of Chen states: "A contrastive loss function defined for a contrastive prediction task. Given a set {x̃k} including a positive pair of examples x̃i and x̃j, the contrastive prediction task aims to identify x̃j in {x̃k}k≠i for a given x̃i.")

Chen further teaches "removing the second encoder from the [model] when the training of the [model] is completed." (Pg. 2, Fig. 2 of Chen states: "After training is completed, we throw away the projection head g(·) and use encoder f(·) and representation h for downstream tasks.")

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Chen with the combination of Krishnan and Shi because Chen likewise uses differently augmented views of the same data example in contrastive learning and uses two encoders for paired inputs. One of ordinary skill in the art would have been motivated to incorporate the teachings of Chen into the combination of Krishnan and Shi in order to adapt contrastive text embeddings for downstream hate expression detection.

Regarding claim 5, the rejection of claim 4 is incorporated herein. The combination of Krishnan, Shi, and Chen teaches "the training of a hate expression detection model is configured by training the hate expression detection model by repeatedly training based on the cross entropy loss function which uses an output value of the classifier for the input text and a correct answer label corresponding to the input text and the contrastive loss function which uses an encoded expression value of the input text which is an output value of the first encoder for the input text and the encoded expression value of the positive sample which is an output value of the second encoder for the positive sample." (All details of these methods are explained in the previous limitations. Repeatedly training the model based on these methods is stated in Algorithm 1 on Pg. 3 of Chen [Algorithm 1: SimCLR's main learning algorithm]. Chen's algorithm loops over sampled minibatches to repeatedly train the network to minimize the contrastive loss.)

Claims 12-13 recite substantially similar subject matter as claims 4-5, respectively, and are rejected with the same rationale, mutatis mutandis.

Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan et al. (U.S. Pub. 2023/0153629 A1), Shi et al. (NPL: "Emotion Detection with Deep Neural Network and Contrastive Learning"), and Chen et al. (NPL: "A Simple Framework for Contrastive Learning of Visual Representations"), further in view of Shankar (NPL: "Introduction to Contrastive Loss - Similarity Metric as an Objective Function").

Regarding claim 6, the rejection of claim 5 is incorporated herein. The combination of Krishnan, Shi, and Chen does not explicitly teach "the training of a hate expression detection model is configured by training the hate expression detection model using the cross-entropy loss function Lce as represented in Equation 1," where yi is a correct answer label corresponding to the i-th input text and ŷi is a prediction probability which is an output of the classifier for the input text. However, Shankar teaches this limitation. (Pg. 3 of Shankar shows the same cross-entropy equation as shown in the disclosure.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to combine the teachings of Shankar with the combination of Krishnan, Shi, and Chen because Shankar provides the explicit mathematical formulation of contrastive loss used in the art.
One of ordinary skill in the art would have been motivated to incorporate the teachings of Shankar into the combination of Krishnan, Shi, and Chen in order to apply the explicit loss formulation within the contrastive learning framework, as such equations were a well-known implementation of that framework.

Regarding claim 7, the rejection of claim 6 is incorporated herein. The combination of Krishnan, Shi, Chen, and Shankar explicitly teaches that the training of the hate expression detection model is configured by training the model using the contrastive loss function Lcl as represented in Equation 2 [equation reproduced as an image in the original], where N is a number of input texts, xi is an i-th input text, xi+ is a positive sample for the i-th input text, h(xi) is an encoded expression value of the i-th input text, h(xi) ∈ R^H, H is a hidden dimension size, 1[·] is an indicator function, and τ is a temperature hyperparameter which adjusts the scaling of the dot product. (Pg. 2 of Chen states "Then the loss function for a positive pair of examples (i, j) is defined as [Equation 1]… τ denotes a temperature parameter. The final loss is computed across all positive pairs, both (i, j) and (j, i), in a mini-batch." Chen's Equation 1 is ℓ(i,j) = −log( exp(sim(zi, zj)/τ) / Σk 1[k ≠ i] exp(sim(zi, zk)/τ) ), where the sum runs over all 2N augmented examples k in the mini-batch.)

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN, whose telephone number is (571) 272-5294. The examiner can normally be reached M-F, 8:30 AM - 6 PM PST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li B. Zhen, can be reached at (571) 272-3768. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/BYUNGKWON HAN/
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121
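For reference, the NT-Xent contrastive loss quoted from Chen for claim 7 can be sketched in plain Python. This is an illustrative sketch only: the function names and the convention that views 2k and 2k+1 form a positive pair are assumptions, not code from Chen or from the application.

```python
import math

def cosine(u, v):
    """Cosine similarity sim(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent(z, tau=0.5):
    """NT-Xent loss over a batch of 2N augmented views.

    Assumes z[2k] and z[2k+1] are the two views of example k, so the
    loss is averaged over both orderings (i, j) and (j, i) of each
    positive pair, as described in Chen's Equation 1.
    """
    n2 = len(z)
    loss = 0.0
    for i in range(n2):
        j = i + 1 if i % 2 == 0 else i - 1   # index of i's positive view
        # Denominator sums over all k != i (the indicator 1[k != i]).
        denom = sum(
            math.exp(cosine(z[i], z[k]) / tau)
            for k in range(n2) if k != i
        )
        loss += -math.log(math.exp(cosine(z[i], z[j]) / tau) / denom)
    return loss / n2
```

As a sanity check, a batch whose positive pairs point in the same direction incurs a much lower loss than one whose pairs are orthogonal, which is the behavior the contrastive objective is designed to reward.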

Prosecution Timeline

Dec 20, 2022
Application Filed
Sep 12, 2025
Non-Final Rejection — §101, §103, §112 (current)


Prosecution Projections

1-2
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 1 resolved case by this examiner. Grant probability derived from career allow rate.
