Last updated: May 29, 2026
Application No. 18/315,931
SYSTEM AND METHOD FOR CONTEXT INSERTION FOR CONTRASTIVE SIAMESE NETWORK TRAINING

Non-Final OA §103
Filed
May 11, 2023
Priority
May 25, 2022 — provisional 63/345,614
Examiner
LERNER, MARTIN
Art Unit
2658
Tech Center
2600 — Communications
Assignee
Samsung Electronics Co., Ltd.
OA Round
3 (Non-Final)
Interview Optional

— +13.3% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 78% grant rate with +13.3% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 988 resolved cases, 2023–2026
Examiner Intelligence

LERNER, MARTIN View full profile →
Grants 78% — above average
Career Allowance Rate
771 granted / 988 resolved
+16.0% vs TC avg
Moderate +13% lift
Without
With
+13.3%
Interview Lift
resolved cases with interview
Typical timeline
2y 11m
Avg Prosecution
22 currently pending
Career history
1008
Total Applications
across all art units
Statute-Specific Performance

§101
10.0%
-30.0% vs TC avg
§103
74.1%
+34.1% vs TC avg
§102
3.5%
-36.5% vs TC avg
§112
8.7%
-31.3% vs TC avg
Black line = Tech Center average estimate • Based on career data from 988 resolved cases
Office Action

§103
DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5 to 6, 9, 13 to 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143).
Concerning independent claims 1, 9, and 17, Jiang et al. discloses a method, system, and computer-program product for detangling conversations, comprising:
“receiving an input utterance that is a continuation of a previous utterance” – a new post 105 (“a input utterance”) is combined with a plurality of earlier posts 110 to determine post pairs 115; process 200 uses active learning to identify pairs of posts that should be part of the same conversation or thread (“a continuation of a previous utterance”); one or more new posts 205 are received and added to a corpus of earlier posts 210 already a part of the chat or messaging platform; earlier posts 210 within a specific window relative to the new post 205 are identified (¶[0029] - ¶[0030]: Figures 1 and 2: Steps 205 to 215); broadly, a post in a conversation is an ‘utterance’ (Figures 5 to 14); a new post that is part of a same conversation thread as earlier posts is “a continuation of a previous utterance”;
“using a trained Siamese network: determining input utterance embeddings representing tokens from the input utterance” – a neural network, e.g., a Siamese network, may be used to estimate a textual similarity between the new post and each of the posts that have been identified to be within the window at 215; word-embedding using deep learning techniques may be employed to represent the words in a post; to predict which thread, if any, a post in a chat platform is a member of, a Siamese network may be used to estimate the textual similarity of the new post and each of the posts that have been identified to be within the window; a Siamese subnetwork may encode each word in a post into an embedding (¶[0032] - ¶[0036]: Figures 2 and 4); an embedding may optionally be generated to represent each word of each post being considered; an embedding may be performed using a neural network; a Siamese neural network may be used to generate the embeddings to represent the words of each post (¶[0049]: Figure 3); Siamese neural network 400 may encode each word of post 405A, 405B into an embedding 410 (¶[0061]: Figure 4);
“pooling the input utterance embeddings with a context token embedding [representing a class] associated with the previous utterance to generate a representative input utterance embedding” – a content feature is extracted from an embedding associated with each message (Abstract); attention may be used to extract features based on the textual and contextual data representing a post; this textual similarity feature may be combined with other features based on meta-data and may also include one or more of: distance in time between the two posts, number of other posts between the two posts, authors of the two posts, or whether they are the same (¶[0036] - ¶[0040]; Figure 4); an embedding may be generated to represent each word of each post being considered for threading; detection of content features may include concatenation of embeddings to identify both local features associated with a portion of the post and global features associated with the entire post (“pooling the input utterance embeddings”) (¶[0049]: Figure 3A); a neural network may be used to extract features based on textual and contextual data representing a post (¶[0061]: Figure 4); here, concatenating embeddings to identify both local features and global features of a post is “pooling the input utterance embeddings with a context token embedding [representing a class] associated with the previous utterance to generate a representative input utterance embedding””;
“determining a representative utterance [associated with each of multiple possible classes, wherein each possible class is] associated with (i) a first threshold boundary that encompasses embeddings of utterances [that specify that possible class] and (ii) a second threshold boundary that encompasses embeddings of continuation utterances with context” – features may then be used to compute the similarity score between the two posts published closely in time or number of posts at 225; scores for a given new post are then examined at 230; if a similarity score is above a threshold at 230, the new post 210 is connected to the highest score posts (e.g., the most similar posts); for each new post and the links of the new post to recent earlier posts, the model predicts (1) the thread to which a post is most similar and (2) whether the post is the start of a new thread (¶[0041]: Figure 2: Step 230); conversely, if the similarity score is below the threshold at 230, a new thread may be hypothesized (¶[0043]: Figure 2: Step 230); if the total similarity score is in a middle range below a threshold determined to be clearly indicative of a link between posts but above a threshold determined to be clearly indicative of no link (“(i) a first threshold boundary” and “(ii) a second threshold boundary”), a user may be queried to confirm whether the posts are related (¶[0052]: Figure 2); an algorithm has two parameters r and h, where r is a high threshold of similarity ranks and h is a lower threshold of similarity scores (¶[0055]); retrieved high-confidence pairs (e.g., the messages having the highest similarity scores or the top-r pairs) are treated as the edges in a message graph G at 340 (¶[0055] - ¶[0058]: Figure 3); here, r and h are two thresholds (“(i) a first threshold boundary that encompasses embeddings of utterances . . . and (ii) a second threshold boundary that encompasses embeddings of continuation utterances with context”) that determine if a message pair that includes contextual meta-data belongs to a same thread or a new thread;
“determining a similarity score [for each possible class] based on a distance between the representative input utterance embedding and a selected threshold boundary of the representative embedding [for that possible class], the selected threshold boundary comprising the first threshold boundary or the second threshold boundary” – based on extracted content features, a text similarity score between the pair of messages is generated using a neural network and the generated text similarity score is combined with additional data associated with the messages to generate a total similarity score; a message thread is generated based on the generated total similarity score for the pair of messages selected from the plurality of unstructured messages (Abstract); if a similarity score is below the threshold, a new thread may be hypothesized (¶[0043]: Figure 2); if the similarity score is above a threshold, the new post 210 is connected to the highest score posts (e.g., the most similar posts) (¶[0044]: Figure 2); an algorithm has two parameters r and h, where r is a high threshold of similarity ranks and h is a lower threshold of similarity scores; (¶[0055] - ¶[0056]: Figure 3); pairs of posts with high similarity greater than a threshold r (“a second threshold boundary”) “encompass embeddings of continuation utterances with context” and pairs of posts with low similarity less than threshold h (“a first threshold boundary”) “encompasses embeddings of utterances” of a new thread; implicitly, a similarity score represents “a distance” between embeddings of two posts with context as compared to a threshold; that is, a higher similarity corresponds to lesser distance and a lower similarity corresponds to a greater distance;
“performing an action [corresponding to the identified class]” – in some implementations, a user may be infrequently queried when the similarity score is below the threshold to improve the performance of the detangling process; a determination may be made whether to query the user or not; with similarity scores closer to 0.5 and with no post with high similarity, a determination to perform a query may be made; alternatively, a post with high confidence could be also infrequently presented as well for retaining the trust of users to the system; a query may be provided (¶[0043] - ¶[0044]: Figure 2: Steps 240 to 255); after the total similarity scores have been calculated between posts, a user may optionally be queried to improve reliability of detection of a link between pairs of posts; if the total similarity score is in a middle range below a threshold determined to be clearly indicative of a link between posts but above a threshold determined to be clearly indicative of no link, a user may be queried to confirm whether the posts are related (¶[0052]: Figure 3: Step 325); broadly, querying a user is “performing an action”.
Concerning independent claims 1, 9, and 17, Jiang et al. discloses all of the limitations of these claims with the exception of those directed to “a class” of “multiple possible classes” and “wherein determining the representative embedding includes selecting the representative embedding associated with each of the multiple possible classes as a class target, and wherein the class target represents a centroid embedding for an associated class of the multiple possible classes” for “identifying a class for the input utterance based on the determined similarity score”.  That is, Jiang et al. does not expressly disclose that utterances and contextual meta-data are associated with “classes” and “a centroid embedding for an associated class” so that embeddings of utterances and similarity scores are determined for “identifying a class” with “a centroid”.  
Concerning independent claims 1, 9, and 17, Choi et al. teaches classifying a category of data that determines a category with respect to at least one sentence of the data based on a determined centroid.  (Abstract; ¶[0019] and ¶[0027])  A computing device may perform sentence embedding for each category and determine a sentence vector using an average weight of word vectors.  (¶[0061]: Figure 1)  Classifying unstructured data may perform sentence embedding on a new sentence based on a similarity with a centroid for each category, and may predict a category of unstructured data based on the centroid for each category.  (¶[0062]: Figure 1)  Sentence embedding may be performed for each category by calculating an average weight of the words of a modified sentence, and a centroid may be calculated for each category using the generated sentence vector.  (¶[0085] - ¶[0086]: Figure 3)  Here, determining a sentence embedding using an average weight of word embeddings is “pooling the input utterance embeddings”.  Compare Specification, ¶[0055].  A centroid of a corresponding category from all sentences belonging to the i-th category may be determined based on the sentence embedding result, and a category may be predicted by determining a centroid that is most similar to an input sentence.  (¶[0090] - ¶[0091]: Figure 3)  Choi et al., then, teaches “wherein determining the representative embedding includes selecting the representative embedding associated with each of the multiple possible classes as a target class, and wherein the class target represents a centroid embedding for an associated class of the multiple possible classes” and “determining a similarity score for each possible class”.  An objective is to comprehensively analyze and accurately understand data with a technology that is capable of classifying data while using unsupervised learning.  (¶[0004])  It would have been obvious to one having ordinary skill in the art to determine a class for input data based on a similarity to a centroid of a representative embedding as taught by Choi et al. to generate a similarity score for a pair of messages in Jiang et al. for a purpose of comprehensively analyzing and accurately understanding data with a technology that uses unsupervised learning.

Concerning claims 5 to 6 and 13 to 14, Jiang et al. discloses that an algorithm has two parameters r and h, where r is a high threshold of similarity ranks and h is a lower threshold of similarity scores (¶[0055]); retrieved high-confidence pairs (e.g., the messages having the highest similarity scores or the top-r pairs) are treated as the edges in a message graph G at 340; any remaining message pairs that have a total similarity score greater than the threshold (h) but less than the highest similarity scores (e.g., not top-r pairs) may optionally also be added to the message graph G at 345; r and h may be set as 5 and 0.5, respective (¶[0055] - ¶[0058]: Figure 3).  Here, if a similarity of a pair of posts is less than a lowest threshold of similarity h as compared to a similarity of a pair of posts that is greater than a highest threshold of similarity r, then “the distance between the representative input utterance embedding and the second threshold boundary is greater than the distance between the representative input utterance embedding and the first threshold boundary.”  That is, distance is an inverse of similarity, so that a distance between post embeddings and a threshold h is greater than a distance between post embeddings and a threshold r implies that a new post is a new thread.  Correspondingly, if a similarity is greater than a first threshold r, then this implies the new post is part of a same thread (“the second threshold boundary is used as the selected threshold boundary when the input utterance is in context with that possible class”), and if a similarity is less than a second threshold h, then, this implies the new post is a new thread (“the first threshold boundary is used for the selected threshold boundary when the input utterance is not in context with the possible class”).  

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143) as applied to claims 1, 9, and 17 above, and further in view of Debnath et al. (U.S. Patent Publication 2020/0058295).
Jiang et al. arguably discloses the limitation of “one of the possible classes comprises an unhandled class for the utterances that do not belong to any of the other possible classes” and “the unhandled class is associated with a specified similarity score” for posts that do not clearly belong to a class of a new thread or a same thread with earlier posts because a similarity score can be below a highest threshold r of 5 and above a lowest threshold h of 0.5.  If a similarity score is between these two thresholds, then a user may be queried.  (¶[0043] - ¶[0046] and ¶[0058]: Figures 2 to 3)  Similarly, Choi et al. teaches that at least one sentence of data may not be determined according to an identification classification system of a category using a neural network, and a computing device may update the classification of category of data based on the result of determining that the category of the at least one sentence of data may not be determined (“an unhandled class”).  (¶[0067])  Applicants’ claim language can be construed to only require “an unhandled class is associated with a specified similarity score” and not a limitation of “another score given by a negative soft maximum of the similarity scores of the other possible classes” due to the ‘or limitation’.  However, Jiang et al. and Choi et al. do not clearly disclose a class that is unhandled according to “a specified similarity score”.
Still, Debnath et al. teaches analyzing and handling of an utterance of interest that may classify a type of utterance as a partial utterance and as an incomplete sentence, and generate a recommendation for handling the utterance in a conversation between a user and a virtual agent.  (Abstract)  Specifically, a partial utterance resolution module may determine whether the class confidence score is above and/or equal to an upper limit.  A step may determine whether the class confidence score is less than or equal to 0.8 and above 0.65.  If the class confidence score is determined to be below or equal to the upper limit and above the lower limit, the partial utterance resolution module may respond with the detection advice code.  The partial utterance resolution module may determine whether the class confidence score is less than and/or equal a lower limit.  If the answer is no, then the class confidence score must be less than and/or equal a lower limit of 0.65.  If the class confidence score is determined to be below the lower limit, the partial utterance resolution module may respond with an advice code corresponding to asking the user to rephrase.  (¶[0101] - ¶[0103]: Figures 5 and 6)  Debnath et al., then, teaches “an unhandled class” that is “associated with a specified similarity score” between a lower limit of 0.65 and an upper limit of 0.8.  An objective is to provide a resolution strategy to improve actions taken by a dialogue management system and to improve a virtual agent’s ability to determine and deliver what it is the user desires.  (¶[0011])  It would have been obvious to one having ordinary skill in the art to provide an unhandled class that is associated with a specified similarity score as taught by Debnath et al. to classify a post as a new thread or a thread of a same conversation as earlier posts according to similarity in Jiang et al. for a purpose of improving actions taken by a dialogue management system to deliver what it is the user desires.

Claims 3, 7, 11, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143) as applied to claims 1, 9, and 17 above, and further in view of Mei et al. (U.S. Patent Publication 2020/0218780).
Concerning claims 3, 11, and 19, Choi et al. teaches that a computing device may perform sentence embedding for each category and determine a sentence vector using an average weight of word vectors.  (¶[0061]: Figure 1)  Sentence embedding may be performed for each category by calculating an average weight of the words of a modified sentence.  (¶[0085]: Figure 3)  Here, sentence embedding that is obtained by calculating an average weight of words of a sentence can be construed as “mean pooling”.  That is, taking an average is equivalent to taking a mean, and taking an average of all of the words in the sentence ‘pools’ a contribution by averaging a weight of all of the words.  Compare Specification, ¶[0055].  However, Choi et al. does not expressly refer to this average of weights of words as “mean pooling”.  
	Concerning claims 3, 11, and 19, Mei et al. teaches automated contextual dialog generation that includes situation quantification network 330 of a quantification convolutional neural network (CNN) 332 and mean pool 334 to generate the situation quantification 302 vector (“using a mean pooling technique”).  (¶[0066]: Figure 3)  A resulting vector for each cluster is passed to a mean pool 334 that pools the vectors into a pooled array according to an average value from each cluster to complete a convolutional layer; mean pool 334 can then return the pooled array to the quantification CNN 332 for a subsequent convolution to further extract semantic features; any number of layers can be used to extract the context.  Upon performing a final convolutional layer, the mean pool 334 produces a situation quantification 302 that includes the extracted context from the context array 301.  The situation quantification 302 includes a vector that represents the semantic features, including the context, of the context array 301.  (¶[0068] - ¶[0069]: Figure 3)  Dialog RNN 442 uses a layer of LSTM units to encode the values of the sentence vector 401 corresponding to each word of input sentence into hidden values, and that hidden values can be pooled with a mean pool 444 according to the average of the hidden values.  As a result, the hidden values are pooled and the mean pool 444 generates a pooled hidden state.  (¶[0072]: Figure 4)  Figures 2 and 3 illustrate how an input sentence and a conversation history are embedded by a sentence embedder into a context array, and how a context array is subject to mean pooling to generate a situation quantification.  An objective is to optimize a response dialog in a manner that does not increase costs of interactions or decrease quality of service.  (Abstract; ¶[0003])  It would have been obvious to one having ordinary skill in the art to perform mean pooling as taught by Mei et al. to average weights of words in sentence vectors of Choi et al. for a purpose of optimizing a response dialog to generate a situation quantification.  

Concerning claims 7 and 15, Mei et al. teaches that an input sentence 201 and a conversation history 202 can be provided to a context determination unit 220 to determine context of the input sentence 201 in view of the conversation history 202; to determine the context, the input sentence 201 is embedded into a feature vector to produce the sentence vector 203 (¶[0058] - ¶[0059]: Figure 2); the conversation history 224 can be transformed into an array representing the contextual sentences with a contextual sentence embedder 226; contextual sentence embedder 226 embeds context based on the clustering of the sentences (¶[0064]: Figure 2); a resulting vector for each cluster is passed to a mean pool 334 that pools the vectors into a pooled array according to an average value from each cluster (¶[0068]: Figure 3); dialog RNN 442 uses a layer of LSTM units to encode the values of the sentence vector 401 corresponding to each word of input sentence into hidden values; hidden values can be pooled with a mean pool 444 according to the average of the hidden values; as a result, the hidden values are pooled and the mean pool 444 generates a pooled hidden state.  (¶[0072]: Figure 4)  Mei et al., then, teaches that context array 204 produced by contextual sentence embedder 226 represents conversation history 202 (“pooling the input utterance embeddings comprises using a representation of the context from the class associated with the previous utterance to further contextualize the input utterance and generate the representative input utterance embedding”).

Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143) as applied to claims 1, 9, and 17 above, and further in view of Wang et al. (U.S. Patent Publication 2021/0375280).
Jiang et al. discloses neural networks with attention to capture important parts of posts.  (¶[0033] - ¶[0036])    However, Jiang et al. does not disclose “using a learnable attention layer and using the context token embedding representing the class associated with the previous utterance as one of a query, a key, or a value.”  However, Wang et al. teaches response selection with topic tracking that tracks how conversation topics change from one utterance to another.  (Abstract)  The topic vectors are then passed through a self-attention layer to learn topic relevance at the utterance level. (¶[0026])  An attention layer 530 is applied to the encoded representation 512 to enhance topic information.  The start-of-sequence token [CLS] representation 512b may then be used as query 522 to attend over the token representations as keys and values 525.  The attention layer 530 may be then applied using the query 522, keys and values 525 served from tokens in the encoded representation 512.  (¶[0053]: Figure 5)  Wang et al., then, teaches selecting a response by determining a topic with an attention layer that uses a token sequence as a query over token representations as keys and values.  An objective is to provide response selection in multi-party multi-turn conversations that involve more than two parties.  (¶[0003] - ¶[0004])  It would have been obvious to one having ordinary skill in the art to provide an attention layer to determine a topic associated with a query, a key, or a value as taught by Wang et al. to detangle interleaved conversations in Jiang et al. for a purpose of selecting responses in a multi-party multi-turn conversation. 

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143) as applied to claims 1 and 9 above, and further in view of Han et al. (U.S. Patent Publication 2020/0401844).
Jiang et al. discloses a Siamese network, but does not disclose that a neural network that is trained using a multi-target loss function.  However, Han et al. teaches an adversarial multi-binary neural network for multi-class classification.  (Abstract)  Specifically, multi-class classifier 140 and multiple binary classifiers 142 may be jointly trained using a training dataset.  Joint training may be performed to minimize the total loss L.  (¶[0033]: Figure 1)  Model 200 may also be implemented without the adversarial training.  In embodiments where no adversarial training is adopted, the multi-class classifier and the multiple binary classifiers may be optimized based only on minimizing the multi-binary classification loss.  (¶[0035]: Figure 1: Equation 10)  Han et al., then, teaches training a neural network with a multi-class (“multi-target”) loss function.  An objective is to improve artificial intelligence for multi-class classification that accounts for the adverse effects of shared features.  (¶[0005] - ¶[0006])  It would have been obvious to one having ordinary skill in the art to train a Siamese neural network of Jiang et al. using a multi-class loss function as taught by Han et al. for a purpose of improving multi-class classification by accounting for the adverse effects of shared features.

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143) as applied to claim 1 above, and further in view of Balduino et al. (U.S. Patent Publication 2019/0362021).
Jiang et al. discloses “the first threshold boundary” and “the second threshold boundary” according to a similarity ranking corresponding to a high threshold of similarity r that is determined to be clearly indicative of a link between posts and a low threshold of similarity h that is determined to be clearly indicative of no link.  (¶[0052] - ¶[0055]: Figures 3A to 3B)  Choi et al. teaches class targets represented by a centroid for each class.  However, Jiang et al. and Choi et al. do not disclose or teach that a first threshold boundary corresponds to “a first distance from the class target” and a second threshold boundary corresponds to “a second distance from the class target”.  Still, it is known in the art that a similarity is inversely related to a distance, so that a similarity to a threshold boundary is inversely related to a distance to a threshold boundary, i.e., a high similarity corresponds to a low distance and a low similarity corresponds to a high distance.  Specifically, Balduino et al. teaches discovery of salient topics during customer interactions by receiving textual content and a plurality of available topics, determining a distance between coadjacent snippets of the plurality of snippets in the textual content, and determining an update to the plurality of snippets by merging each of the pairs of coadjacent snippets having a respective distance less than a second threshold.  (Abstract)  Content is divided into snippets 401, snippets are converted into vectors 402, distances between the vectors are computed for coadjacent snippets 403 to identify transitions, i.e., boundaries between topics, in the content using a threshold for a change in distances between coadjacent snippets, and coadjacent snippets are merged if they have a distance less than a threshold 405.  A cosine distance between the vectors represents a similarity.  A cosine similarity of contiguous snippets is analyzed to find any sudden changes in similarity that confirm topic boundaries 404 for a cosine similarity greater than a threshold 405.  (¶[0060] - ¶[0063]: Figure 4)  Balduino et al., then, teaches that a threshold similarity corresponds to a threshold distance so that a similarity to a first threshold r and a similarity to a second threshold h are “a first distance” and “a second distance” in of Jiang et al.  An objective is to automatically identify and learn topics of interactions using optimal utterance segments.  (¶[0055]: Figure 3)  It would have been obvious to one having ordinary skill in the art to determine a similarity with first and second thresholds in Jiang et al. corresponding to “a first threshold boundary at a first distance” and “a second threshold boundary at a second distance” as taught by Balduino et al. for a purpose of automatically identifying and learning topics of interactions using optimal utterance segments.  




Response to Arguments
Applicants’ arguments filed 18 November 2025 have been considered but are moot in view of new grounds of rejection, necessitated by amendment.
Applicants amend independent claims 1, 9, and 17 to set forth new limitations of “wherein the class target represents a centroid embedding for an associated class of the multiple possible classes.”  Then Applicants present arguments directed against the prior rejection of the independent claims as being obvious under 35 U.S.C. §103 over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Wang et al. (U.S. Patent Publication 2023/0089308).  Generally, Applicants argue that Jiang et al. does not disclose the limitations of determining a representative embedding associated with each of multiple possible classes, where (i) determining the representative embedding includes selecting the representative embedding associated with each of multiple possible classes as a class target, and (ii) the class target represents a centroid embedding for an associated class of the multiple possible classes.  Applicants argue that Wang et al. does not disclose these limitations, but merely provides speaker labels that predict which speaker spoke each segment.  
Mainly, Applicants’ arguments are moot in light of new grounds of rejection for independent claims 1, 9, and 17 as being obvious under 35 U.S.C. §103 over Jiang et al. (U.S. Patent Publication 2019/0205743) in view of Choi et al. (U.S. Patent Publication 2021/0201143).  Applicants arguments are not completely persuasive that Wang et al. only labels speaker segments and does not determine classes of an utterance for multiple possible classes.  Here, Wang et al. states repeatedly that classes correspond to speakers so that labeling a segment with a speaker is assigning a class to the segment.  Nevertheless, it is agreed that a class centroid is not taught by Wang et al., and this necessitates new grounds of rejection with Choi et al.  The rejection of certain dependent claims continues to rely upon Debnath et al. (U.S. Patent Publication 2020/0058295), Mei et al. (U.S. Patent Publication 2020/0218780), Wang et al. (U.S. Patent Publication 2021/0375280), Han et al. (U.S. Patent Publication 2020/0401844), and Balduino et al. (U.S. Patent Publication 2019/0362021).  The rejection no longer relies upon Wang et al. (U.S. Patent Publication 2023/0089308).
Choi et al. teaches classifying a category of text data using a class centroid.  A class centroid is generated from sentence embeddings that average weights of embeddings of word vectors.  After these class centroids are constructed, a sentence embedding is performed on a new sentence to predict a category based on a similarity with a centroid for each category.  Choi et al., then, teaches the limitations of “determining a representative embedding associated with each of multiple possible classes”, “wherein determining the representative embedding includes selecting the representative embedding associated with each of the multiple possible classes as a class target, and wherein the class target represents a centroid embedding for an associated class of the multiple possible classes”.  That is, a class target is a class to which a new sentence is assigned based on a similarity.
Jiang et al. is directed to determining if a new message is similar to prior messages so as to be ‘classified’ into a same message thread or a new message thread.  Jiang et al. performs embeddings of messages to determine a similarity score between pairs of messages using two thresholds, a first threshold that determines if a message is clearly linked to a prior post and a second threshold that determines if a message is clearly not linked to a prior art.  See ¶[0052] of Jiang et al.  Here, Applicants’ claim limitations set forth “(i) a first threshold boundary that encompasses embeddings of utterances that specify that possible class” and “(ii) a second threshold boundary that encompasses embeddings of continuation utterances with context”.  Applicants’ Specification does not clearly describe which utterances specify a possible class and which utterances encompass continuation utterances.  However, it appears reasonable to construe (i) as a message that is not linked to a prior message but that represents a new thread and (ii) as a message that is linked to a prior message and so is a continuation of a prior thread.  
Additionally, Jiang et al. is maintained to disclose the limitations of “pooling the input utterance embeddings with a context token embedding . . . to generate a representative input utterance embedding” and “embeddings of continuation utterances with context”.  Features are extracted based on both textual and contextual data representing a post.  (¶[0036] and ¶[0061])  Jiang et al., at ¶[0049], states that an embedding may be generated to represent each word of a post, and that content features may include a concatenation of embeddings to identify both local features and global features.  Jiang et al.’s embeddings of messages, then, includes concatenation (‘pooling’) of textual and contextual information about a post.  This embedded context includes information on, e.g., a distance between two posts or authors of the two posts, to help decide if the two posts are sufficiently similar so as to be a continuation of a prior message thread or not sufficiently similar so as to represent a new thread with a new topic.  
One skilled in the art could understand that the teachings of Jiang et al. and Choi et al. could be combined in accordance with a standard set forth by KSR International Co. v. Teleflex Inc. (KSR), 550 U.S. 398, 82 USPQ2d 1385 (2007).  Jiang et al. provides a way to determine if two messages (“utterances”) are sufficiently similar so that they belong to the same “continuation utterances” or are sufficiently different so that they belong to a new class using threshold similarities of embeddings.  Choi et al. provides a way to classify sentences from a similarity of sentence embeddings and averages their word embeddings of centroids.  A combination would be consistent with Rationale (D) Applying a known technique to a known device (method, or product) ready for improvement to yield predictable results of KSR International.  Here, Jiang et al. represents a known device ready for improvement that determines if a new message is more similar to a prior message thread or is a new message thread, and this can be improved by determining a subject classification of a new message as taught by Choi et al.  It would be a predictable result to improve a determination of a relationship between two messages in Jiang et al. with a classification of a message topic as taught by Choi et al.  That is, a classification of a message topic provides contextual information that assists in determining if the two messages belong in the same thread.  
Applicants’ amendments necessitate new grounds of rejection.  This Office Action is NON-FINAL. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Erdenee et al. and Wanas et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on (571) 272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2658                                                                                                                                                                                                        February 5, 2026
Read full office action
Prosecution Timeline

Show 9 earlier events
Nov 18, 2025
Response after Non-Final Action
Dec 22, 2025
Request for Continued Examination
Jan 16, 2026
Response after Non-Final Action
Feb 09, 2026
Non-Final Rejection mailed — §103
Apr 07, 2026
Interview Requested
Apr 23, 2026
Applicant Interview (Telephonic)
Apr 23, 2026
Examiner Interview Summary
May 08, 2026
Response Filed
Precedent Cases

Applications granted by this same examiner with similar technology

18/272,516
Patent 12632656
TEXT GENERATION INCLUDING DE-DUPLICATION OF DECODED WORD INFORMATION TO SPLICE TARGET WORD INFORMATION INTO AN INFORMATION SEQUENCE
2y 10m to grant Granted May 19, 2026
17/770,177
Patent 12620404
DEEP SOURCE SEPARATION ARCHITECTURE
4y 0m to grant Granted May 05, 2026
18/365,535
Patent 12596880
DETERMINING CAUSALITY BETWEEN FACTORS FOR TARGET OBJECT BY ANALYZING TEXT
2y 8m to grant Granted Apr 07, 2026
17/882,447
Patent 12586592
METHODS AND APPARATUS FOR GENERATING AUDIO FINGERPRINTS FOR CALLS USING POWER SPECTRAL DENSITY VALUES
3y 7m to grant Granted Mar 24, 2026
18/336,831
Patent 12585680
CONTEXTUAL TITLES BASED ON TEMPORAL PROXIMITY AND SHARED TOPICS OF RELATED COMMUNICATION ITEMS WITH SENSITIVITY POLICY
2y 9m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
78%
Grant Probability
91%
With Interview (+13.3%)
2y 11m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 988 resolved cases by this examiner. Grant probability derived from career allowance rate.