Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/14/2025 and 2/2/2026 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Status of the Claims
Claims 1-4 and 7-22 are currently pending. Claims 21-22 are new, and claims 5-6 are canceled.
Response to Arguments
Applicant’s arguments with respect to the rejections made under 35 U.S.C. 101 have been fully considered but are not persuasive. Applicant argues that the claims are not abstract but are “rather a large language model (LLM) having a plurality of attention heads that simultaneously identify both a mood and an item of interest from live interaction content. The attention heads are configured for simultaneous dual feature extraction during the execution of the LLM, and cannot be performed by a human mind.” (Remarks 8-9)
Examiner respectfully disagrees and notes that the claim recites steps including receiving interaction content, identifying a mood and an item of interest from the content, annotating a transcript with identifiers corresponding to the detected information, and generating a response based on the annotated transcript. These operations correspond to analyzing and organizing information and generating a response based on the analysis, which falls within the categories of certain methods of organizing human activity and mental processes. The large language model merely amounts to “apply it,” as explained below. Therefore, merely performing an abstract idea using a machine learning model does not make the claim non-abstract. The claim recites identifying contextual attributes of conversation data and generating a response based on those attributes. Such analysis and decision-making processes correspond to mental processes that can be performed conceptually by a human. Implementing these steps using a machine learning model or attention-based architecture does not remove the claim from the abstract idea category.
Applicant further argues that the claim improves the operation of a natural language processing system by transforming raw communication content into an enriched, machine-readable structure. (Remarks 9-10)
Examiner respectfully disagrees and notes that the claim does not recite any specific technological improvement to the functioning of the computer or the machine learning model itself. The claim merely uses conventional computer components, and the language model is used as a tool to perform data analysis tasks (i.e., extracting sentiment information, annotating transcripts, and generating responses). The alleged improvement resides in the abstract idea itself (improved information analysis) rather than in a technological improvement to computer functionality. At step one, the inquiry "asks whether the focus of the claims is on the specific asserted improvement in computer capabilities (i.e., the self-referential table for a computer database) or, instead, on a process that qualifies as an 'abstract idea' for which computers are invoked merely as a tool." Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1335-36 (Fed. Cir. 2016).
Applicant further argues that the claim is directed to a practical, technological solution because it “includes multiple LLMs, temporal annotations, and an LLM with a feature-specific attention architecture which is used to enhance the performance of an interactive communication system.” (Remarks 10-11)
Examiner notes that the additional elements, as identified below in the Office action, merely invoke generic components as a tool to perform the abstract idea or “apply it”. The LLM is used to analyze conversational data and produce a response. "As many cases make clear, even if a process of collecting and analyzing information is limited to particular content or a particular source, that limitation does not make the collection and analysis other than abstract." SAP America, Inc. v. InvestPic, LLC, 890 F.3d 1016, 1022 (Fed. Cir. 2018) (citation and quotation marks omitted). "[T]he only thing the claims disclose about the use of machine learning is that machine learning is used in a new environment." Recentive Analytics, Inc. v. Fox Corp., Fed. Cir. No. 2023-2437 (Apr. 18, 2025), slip op. at 13. Accordingly, the rejection is maintained.
Applicant’s arguments with respect to the rejections made under 35 U.S.C. 103 have been fully considered but are moot in view of the new grounds of rejection.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 7 and 14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 7 and 14 recite “an AI agent performs an action based on the response”. The specification does not mention the step of performing an action based on the response, nor does it explain how such an action would be performed (see [0041], [0052], [0048], [0085]-[0090] as cited in the Remarks). Instead, the specification only describes the response as an output displayed to a user. See ([0091] the LLM framework 440 may generate an instruction, a question, a query, a product offer, or the like, which can be displayed on the user interface 422 of the service provider device 420. As another example, the LLM framework 440 may generate a product offer, verification question, or the like, which can be displayed on the user interface 412 of the source device 410). As such, the written description fails to reasonably convey to one skilled in the art that the Applicant had possession of the invention at the time the application was filed. For the purpose of examination, the action will be interpreted as the output of the response.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claim 15, and dependent claims 16-20, are directed towards a “computer-readable storage medium.” The computer-readable storage medium is described in exemplary terms in paragraphs [0210] and [0211]. The computer-readable storage medium description does not exclude transitory media. Therefore, the broadest reasonable interpretation of computer-readable storage media includes signals and carrier waves. Because the computer-readable medium covers transitory propagating signals per se in view of the specification, the claims are rejected under 35 U.S.C. 101 as covering non-statutory subject matter. See MPEP 2106.03. The omission of “non-transitory” appears to be a typographical error. For the purpose of compact prosecution, the examiner will interpret the limitation as a non-transitory computer-readable storage medium.
Claims 1-4, 7-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1
Claims 1-4, 7, and 21-22 are directed to an apparatus (i.e., a machine), claims 8-14 are directed to a method (i.e., a process), and claims 15-20 are directed to a computer-readable storage medium (i.e., an article of manufacture); therefore, claims 1-4 and 7-22 all fall within one of the four statutory categories of invention.
Step 2A, Prong One
Independent claims 1, 8, and 15 substantially recite: receive interaction content from a communication session between a user and a service provider; identify a mood and an item of interest from the interaction content; annotate a transcript of the interaction content with an identifier of the mood and an identifier of the item of interest at a point in time at which they occurred within the transcript; generate a response to the interaction content based on the mood and the item of interest; and output the response to at least one of the user and the service provider during the communication session.
The limitations stated above are processes/functions that, under the broadest reasonable interpretation, cover “certain methods of organizing human activity” (managing personal behavior or relationships or interactions between people, commercial or legal interactions, and following rules or instructions) because the claims recite collecting information, analyzing it, and outputting the result to a user. The claims also fall within the category of mental processes. Therefore, the claims recite an abstract idea.
Step 2A, Prong Two
The judicial exception is not integrated into a practical application. Claims 1, 8, and 15 as a whole amount to: (i) merely invoking generic components as a tool to perform the abstract idea or “apply it” (or an equivalent).
Independent claims 1, 8, and 15 recite the following additional elements: an apparatus, a memory, a processor coupled to the memory, a source device, a service provider device, execute a large language model (LLM), the LLM comprises a plurality of attention heads, simultaneously, outputting, and a computer-readable storage medium comprising instructions stored therein which, when executed by a processor, cause the processor to perform steps. These elements are recited at a high level of generality in the specification (see specification: [0044] an LLM may be a machine learning model. As another example, an LLM may be an artificial intelligence (AI) model such as a “generative” AI model. As another example, the LLM may be a multimodal large language model. As another example, the LLM may be a transformer neural network (“transformer”); [0047] the source device 110 may refer to a mobile device, smartphone, desktop computer, laptop, tablet, smartwearable device, and the like. The service provider device 130 may correspond to a contact center or other third-party device and may include audio, video, and text capabilities. As an example, the service provider device 130 may be a mobile device, a computer, a tablet, a Voice over Internet Protocol (VoIP) phone, and/or the like. As another example, the service provider device 130 may correspond to a server, software application, or the like, which provides a chatbot functionality that is able to generate chat communications and send the chat communications to the source device 110 via the software application 121. The source device 110 and the service provider device 130 may connect to the host platform 120 over a computer network such as the Internet, a private network, a combination thereof, and the like; [0048]-[0049], [0210]-[0222] computer program, memory). When viewed as a whole/ordered combination (as shown in Fig. 1), the claims amount to no more than mere instructions to apply the judicial exception using generic computer components or “apply it” (see MPEP 2106.05(f)).
Step 2B
As discussed above with respect to Step 2A, Prong Two, the additional elements amount to no more than: (i) “apply it” (or an equivalent), and do not integrate the abstract idea into a practical application at Step 2A or provide an inventive concept at Step 2B.
Therefore, the additional elements of: (i) an apparatus, a memory, a processor coupled to the memory, a source device, a service provider device, execute a large language model (LLM), the LLM comprises a plurality of attention heads, simultaneously, outputting, and a computer-readable storage medium comprising instructions stored therein which, when executed by a processor, cause the processor to perform steps, do not integrate the abstract idea into a practical application at Step 2A or provide an inventive concept at Step 2B. Thus, even when viewed as a whole/ordered combination (as shown in Fig. 1), nothing in the claims adds significantly more (i.e., an inventive concept) to the abstract idea. Thus, the claims are ineligible.
Dependent Claims Step 2A:
The limitations of the dependent claims, but for those addressed below, merely set forth further refinements of the abstract idea without changing the analysis already presented. Additionally, for the same reasons as above, the limitations fail to integrate the abstract idea into a practical application because they use the same general technological environment and instructions to implement the abstract idea (e.g., using computers to communicate data). Thus, the claims are ineligible.
Dependent Claims Step 2B:
The dependent claims merely use the same general technological environment and instructions to implement the abstract idea. Accordingly, the claims do not amount to significantly more than the exception itself. Therefore, the dependent claims are not eligible subject matter under § 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 7-8, 12, 14-15, 18, 20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Somech (US 2019/0005024 A1) in view of Balashov (US 2025/0039334 A1).
As per claim 1, Somech teaches:
An apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to: (see Fig. 1)
receive interaction content from a communication session between a source device and a service provider device; (see at least: FIG. 4 depicts a flow diagram of a method for determining a relevance of content of the communication session; [0144] At step 402, the content of the CS is received. [0033] the terms “communication session” and “CS” may be used interchangeably to broadly refer to any session where two or more computing devices are employed to exchange information and/or data between two or more users)
execute a large language model (LLM) on the interaction content, (see at least: [0020] actively monitor and analyze the content and context of the one or more CSs, via various natural language processing (NLP) and other machine-learning (ML) methodologies.)
the model being configured to identify a mood and an item of interest from the interaction content; (see at least: Fig. 2, #270 model learners; Fig. 4; [0020] [0026] the user's activity is monitored and user-activity patterns are inferred and/or learned. User-activity information, such as the inferred user-activity patterns, are employed to generate various user-interest data models. [0044] Content-style features may additionally encode one or more emotions of the speaker, e.g., anger, surprise, satisfaction, happiness, and other emotions. Content-style features may indicate the intonation, pitch, speed, and volume of the speaker, as well as changes in these features, such as speeding up, slowing down, or changing volume [0101])
Somech does not explicitly teach that the LLM comprises a plurality of attention heads operating simultaneously; however, this is taught by Balashov (see at least: [0017] The conversational AI may be, for example, a transformer-based large language model (LLM). [0080] Some LLMs may include a transformer neural network. The transformer neural network may include one or more layers that include a self-attention mechanism in addition to one or more feed-forward neural networks. The self-attention mechanism, sometimes referred to as multi-head attention, gives the model the ability to “focus” on different parts of the input sequence, providing a specific context for the interpretation of each word.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the LLM and attention heads feature for the same reasons it is useful in Balashov, namely, to give the model the ability to “focus” on different parts of the input sequence, providing a specific context for the interpretation of each word (par. 80). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering producing predictable results.
Somech further teaches an identifier of the mood and an identifier of the item of interest at a point in time at which they occurred (see at least: [0043]-[0044] Determined content features for a content line including topics, keywords, sentiment, emotion, speaker, and addressee; [0094] Contextual features may additionally indicate initiating and terminating time stamps of the CS, the temporal duration of the CS, and an indication of other past or current CSs that the associated users have or are currently participating in [timestamped communication sessions]).
Somech does not explicitly teach, but Balashov teaches, annotating a transcript of the interaction content at a point in time at which they occurred within the transcript using at least one additional LLM, and generating a response based on execution of the at least one additional LLM on the annotated transcript (see at least: [0101] Data structure generation 460 component may use a memory device to generate transcripts of collections of prompts and responses, sometimes referred to as conversations. For example, the data structure generation 460 component may receive instructions to generate a transcript of a conversation or chat dialogue reflecting a team's collaboration from a previous date. The generated transcript may be annotated with information including information about the client device from which each prompt was received, information about the participant who submitted the prompt, and other information relevant to receipt of the transcript by the conversational AI 330. The transcript can then be sent to the conversational AI 330 to provide context to response to prompts during a new conversation. [0100] The data structure may be populated by querying the memory device and adding the requested information to the structure ordered according to timestamp, including prompt/response text, formatting, associated profile information, and so on).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the transcript feature for the same reasons it is useful in Balashov, namely, to provide a basis for a new conversation for a second collaborative team (par. 102). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering producing predictable results.
Somech further teaches generate a response (see at least: Fig. 2, #260 CS summary; [0111] CS summary engine 260 is generally responsible for generating the summary of the CS and providing the summary to the user. [0143] the CS summary may be generated based on at least the identified portions of the content that are likely relevant to the user. [0022] real time notification [during the CS session] [0041]).
and output the response to at least one of the source device and the service provider device during the communication session. (see at least: Fig. 4; [0111] CS summary engine 260 is generally responsible for generating the summary of the CS and providing the summary to the user [0041])
As per claim 7, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
receive feedback about the response from at least one of the source device and the service provider device, and retrain the model based on a combination of the response and the feedback about the response (see at least: [0010] the user may provide feedback to the system regarding the accuracy and/or utility of the provided summary [response]. This feedback may be employed by the various embodiments to further train and/or update [retrain] the various models, such as the content-relevance model and/or the user-interest model. For instance, in an embodiment, the user may be provided a user interface (UI) that enables or facilitates the user to provide feedback on the summary [0139] [0041]).
wherein an AI agent performs an action based on the response (see at least: [0041] The content-style and response-generation models may be trained to emulate and/or simulate the user's stylistic choices of employing emojis, emoticons, and other symbolic content. [0030] conventional VAs and chat-bots may automatically generate responses to questions posed by users of a CS. That is, conventional VAs and chat-bots are attempts at enabling an agent to pass a rudimentary form of a “Turing Test”)
Somech does not explicitly teach an LLM; however, this is taught by Balashov (see at least: [0017] The conversational AI may be, for example, a transformer-based large language model (LLM). [0080] Some LLMs may include a transformer neural network. The transformer neural network may include one or more layers that include a self-attention mechanism in addition to one or more feed-forward neural networks. The self-attention mechanism, sometimes referred to as multi-head attention, gives the model the ability to “focus” on different parts of the input sequence, providing a specific context for the interpretation of each word.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the LLM and attention heads feature for the same reasons it is useful in Balashov, namely, to give the model the ability to “focus” on different parts of the input sequence, providing a specific context for the interpretation of each word (par. 80). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering producing predictable results.
As per claim 12, Somech in view of Balashov teaches claim 8 as above. Somech further teaches:
receive previous interaction content from at least one previous communication sessions between the source device and the service provider device, (see at least: [0057] the one or more user devices are monitored for user activity, including the generation of content within one or more CSs that the user is currently or has previously participated in [previous communication sessions]; [0087] CS monitor and analyzer 290 is generally responsible for monitoring and analyzing the content and context of the one or more CSs that the user is currently participating in, or has previously participated in. As such, CS monitor and analyzer 290 may receive various data associated with each CS that the user is or has participated in. The various received data may include at least the content and metadata associated with the CS.)
aggregate the previous interaction content with the interaction content to generate aggregated interaction content, (see at least: [0087] CS monitor and analyzer 290 is generally responsible for monitoring and analyzing the content and context of the one or more CSs that the user is currently participating in, or has previously participated in. As such, CS monitor and analyzer 290 may receive various data associated with each CS that the user is or has participated in. The various received data may include at least the content and metadata associated with the CS.)
As per claim 22, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
insert labels that identify the mood and the item of interest within a conversation at a temporal location of the conversation corresponding to the point in time (see at least: [0043]-[0044] [mood detection and item of interest]; [0094] Contextual features may additionally indicate initiating and terminating time stamps of the CS, the temporal duration of the CS, and an indication of other past or current CSs that the associated users have or are currently participating in [timestamped communication sessions])
Somech does not explicitly teach, but Balashov teaches, a transcript at a temporal location of the conversation corresponding to the point in time (see at least: [0101] Data structure generation 460 component may use a memory device to generate transcripts of collections of prompts and responses, sometimes referred to as conversations. For example, the data structure generation 460 component may receive instructions to generate a transcript of a conversation or chat dialogue reflecting a team's collaboration from a previous date. The generated transcript may be annotated with information including information about the client device from which each prompt was received, information about the participant who submitted the prompt, and other information relevant to receipt of the transcript by the conversational AI 330. The transcript can then be sent to the conversational AI 330 to provide context to response to prompts during a new conversation. [0100] The data structure may be populated by querying the memory device and adding the requested information to the structure ordered according to timestamp, including prompt/response text, formatting, associated profile information, and so on).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the transcript feature for the same reasons it is useful in Balashov, namely, to provide a basis for a new conversation for a second collaborative team (par. 102). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering producing predictable results.
Claims 8, 14-15, 18, and 20 recite similar limitations as claims 1, 12, and 7; therefore, they are rejected under the same rationales.
Claims 2-4, 9-11, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Somech (US 2019/0005024 A1) in view of Balashov (US 2025/0039334 A1), and further in view of Cao (see PTO-892, reference U).
As per claim 2, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
the mood (see at least: [0093] Content-style features may additionally encode one or more emotions of the speaker, e.g., anger, surprise, satisfaction, happiness, and other emotions)
Somech does not explicitly teach an attention head associated with the input, or masking content unrelated to the input based on the attention head. However, this is taught by Cao (see Fig. 1, page 5009: we bring insights into the relation between attentions and content selection via masking operations to further improve summarization performance. Masking Operation. We propose attention head masking in encoder-decoder attentions, which blocks attentions to unimportant tokens, to better concentrate multi-head attentions on salient input tokens. Importantly, it is activated during inference. Concretely, we add an ˜m inside the softmax operator of Eq. 1, with implementation displayed in Fig. 1. The size of ˜m is the same as the input length. If the i-th token is tagged as salient, the corresponding element in ˜m is set to 0 (attendable to the attention heads), and −∞ otherwise (hidden from these heads). The saliency labels can be predicted by an externally trained content selector)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the masking feature for the same reasons it is useful in Cao, namely, to better concentrate multi-head attentions on salient input tokens and further improve summarization performance (page 5009). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering producing predictable results.
As per claim 3, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
the item of interest ( see at least: [0101] a user-interest model quantifies the user's interest in one or more topics, which may be encoded in a content-substance feature of the content)
Somech does not explicitly teach attention head associated with the input and to mask content unrelated to the input based on the attention head. However, this is taught by Cao ( see Fig. 1, page 5009, we bring insights into the relation between attentions and content selection via masking operations to further improve summarization performance. Masking Operation. We propose attention head masking in encoder-decoder attentions, which blocks attentions to unimportant tokens, to better concentrate multi-head attentions on salient input tokens. Importantly, it is activated during inference. Concretely, we add an ˜m inside the softmax operator of Eq. 1, with implementation displayed in Fig. 1. The size of ˜m is the same as the input length. If the i-th token is tagged as salient, the corresponding element in ˜m is set to 0 (attendable to the attention heads), and −∞ otherwise (hidden from these heads). The saliency labels can be predicted by an externally trained content selector)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the masking feature for the same reasons it is useful in Cao, namely, to better concentrate multi-head attention on salient input tokens and to further improve summarization performance (page 5009). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering, producing predictable results.
As per claim 4, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
a tone of the interaction content (see at least: [0044] Content-style features may indicate the intonation, pitch, speed, and volume of the speaker, as well as changes in these features, such as speeding up, slowing down, or changing volume)
Somech does not explicitly teach an attention head associated with the input, or masking content included in the interaction content that is unrelated to the input based on the attention head. However, this is taught by Cao (see Fig. 1, page 5009: "we bring insights into the relation between attentions and content selection via masking operations to further improve summarization performance. Masking Operation. We propose attention head masking in encoder-decoder attentions, which blocks attentions to unimportant tokens, to better concentrate multi-head attentions on salient input tokens. Importantly, it is activated during inference. Concretely, we add an ˜m inside the softmax operator of Eq. 1, with implementation displayed in Fig. 1. The size of ˜m is the same as the input length. If the i-th token is tagged as salient, the corresponding element in ˜m is set to 0 (attendable to the attention heads), and −∞ otherwise (hidden from these heads). The saliency labels can be predicted by an externally trained content selector.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the masking feature for the same reasons it is useful in Cao, namely, to better concentrate multi-head attention on salient input tokens and to further improve summarization performance (page 5009). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering, producing predictable results.
Claims 9-11 and 16-17 recite limitations similar to those of claims 2-4; therefore, they are rejected under the same rationale.
Claims 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Somech (US 20190005024 A1) in view of Balashov (US 2025/0039334 A1) and further in view of Kataria (US 20190164170 A1).
As per claim 13, Somech in view of Balashov teaches claim 12 as above. Somech further teaches:
identify an aggregated mood based on execution of the model on the aggregated interaction content, generate the response based on the aggregated mood (see at least: [0041] Various responses may be generated to represent likely stylistic choices of the user, such as emotions, sentiments; [0043] A content-substance feature may also indicate a sentiment of the speaker; [0092] the content-substance features may indicate topics being conversed about, as well as the intentioned meaning and the sentiments of the conversation; [0029] notification; [0041])
Somech does not explicitly teach an LLM with the plurality of attention heads; however, this is taught by Balashov (see at least: [0017] The conversational AI may be, for example, a transformer-based large language model (LLM). [0080] Some LLMs may include a transformer neural network. The transformer neural network may include one or more layers that include a self-attention mechanism in addition to one or more feed-forward neural networks. The self-attention mechanism, sometimes referred to as multi-head attention, gives the model the ability to "focus" on different parts of the input sequence, providing a specific context for the interpretation of each word.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the LLM and attention-heads feature for the same reasons it is useful in Balashov, namely, it gives the model the ability to "focus" on different parts of the input sequence, providing a specific context for the interpretation of each word (par. 80). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering, producing predictable results.
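For illustration only, the multi-head "focus" described in Balashov at [0080] can be sketched as follows. This is a minimal toy sketch (learned projection matrices are omitted and each head simply attends over its own slice of the feature dimension); it is not code from the Balashov reference:

```python
import numpy as np

def multi_head_self_attention(x, num_heads):
    """Toy multi-head self-attention over token representations x: (n, d).

    Each head operates on a separate slice of the feature dimension,
    letting different heads attend to different aspects of the input.
    """
    n, d = x.shape
    assert d % num_heads == 0
    head_dim = d // num_heads
    outputs = []
    for h in range(num_heads):
        xh = x[:, h * head_dim:(h + 1) * head_dim]   # this head's slice
        scores = xh @ xh.T / np.sqrt(head_dim)       # pairwise token affinities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ xh)                 # per-head context vectors
    return np.concatenate(outputs, axis=-1)          # concatenated head outputs
```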
Somech does not explicitly teach identifying an aggregated mood over time with respect to the item of interest based on execution of the model on the aggregated interaction content, and generating the response based on the aggregated mood over time with respect to the item of interest. However, this is taught by Kataria (see at least: [0020] Rather than ignoring or discarding previous interaction information or attributes of an individual, the system as described herein uses that information to generate a sentiment profile for the individual [over time]; [0001] Sentiment analysis may determine the attitude of an individual with respect to another object or user. If the sentiment analysis tool identifies an interaction as negative, the individual may contact the other individual to determine what part of the interaction caused the negative interaction and attempt to correct the identified problem; [0022])
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the mood-over-time feature for the same reasons it is useful in Kataria, namely, it generates a more accurate sentiment intent, thereby resulting in fewer false positives or false negatives (par. 20). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering, producing predictable results.
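For illustration only, the principle in Kataria at [0020] of retaining, rather than discarding, previous interaction information can be sketched as a running aggregation of per-interaction sentiment scores. This is a minimal sketch under the assumption of scalar scores in [-1, 1]; the smoothing scheme is illustrative, not from the Kataria reference:

```python
def aggregate_mood(scores, alpha=0.3):
    """Aggregate per-interaction sentiment scores (e.g., -1 negative to
    +1 positive) into a mood-over-time estimate via an exponential moving
    average, so earlier interactions inform the current estimate rather
    than being discarded."""
    mood = None
    for s in scores:
        mood = s if mood is None else alpha * s + (1 - alpha) * mood
    return mood
```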
Claim 19 recites limitations similar to those of claim 13; therefore, it is rejected under the same rationale.
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Somech (US 20190005024 A1) in view of Balashov (US 2025/0039334 A1) and further in view of Lee (US 20240414017 A1).
As per claim 21, Somech in view of Balashov teaches claim 1 as above. Somech further teaches:
the identifier of the mood and the identifier of the item of interest, and generating a response (see at least: [0041] response; [0043-44] [mood detection and item of interest])
Somech does not explicitly teach converting the transcript into a vector representation after annotating the transcript and generating the response based on the vector representation. However, this is taught by Lee (see at least: [0018] the AI system may convert the query into a query embedding, which is an embedding of the query. The AI system may also convert audio of the videoconferencing meeting into a text transcript, and then convert the text transcript into a transcript embedding, which is an embedding of the transcript. Having generated the query embedding and the transcript embedding, the AI system can compare the query embedding to various parts of the transcript embedding (e.g., using approximate nearest neighbor (ANN) techniques) to identify the most relevant portion or portions of the conversation to the query; [0020] AI system can provide the query and the relevant portions of the conversation as input to the selected machine-learning model. In response, the machine-learning model can generate an output based on the query and the relevant portions of the conversation; [0072-74] embedding service 314 can generate an embedding of the transcript 312, known hereinafter as a transcript embedding 316. The transcript embedding 316 is a numerical representation (e.g., a vector representation) of the transcript 312. Since the transcript 312 is in text form, it may not be easily searchable, storable, or usable with certain models. Generating an embedding representation of the transcript 312 can help resolve those issues.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to combine the transcript-conversion feature for the same reasons it is useful in Lee, namely, to make the transcript searchable, storable, or usable with certain models (par. 72). Moreover, this is merely a combination of old elements in the art. In the combination, no element would serve a purpose other than it already did independently, and one skilled in the art would have recognized that the combination could have been implemented through routine engineering, producing predictable results.
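For illustration only, the comparison Lee describes at [0018] (matching a query embedding against parts of the transcript embedding to find the most relevant portion) can be sketched with brute-force cosine similarity; Lee mentions ANN techniques, but exact search suffices as a sketch. The embedding vectors are assumed given, and the function name is illustrative:

```python
import numpy as np

def most_relevant_portion(query_emb, portion_embs):
    """Return the index of the transcript portion whose embedding is most
    similar (by cosine similarity) to the query embedding.

    query_emb: (d,) query embedding; portion_embs: (n, d) embeddings of
    n transcript portions.
    """
    q = query_emb / np.linalg.norm(query_emb)
    p = portion_embs / np.linalg.norm(portion_embs, axis=1, keepdims=True)
    return int(np.argmax(p @ q))   # highest cosine similarity wins
```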
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MANAL A. ALSAMIRI whose telephone number is (571)272-5598. The examiner can normally be reached M-F: 9:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Shannon Campbell, can be reached at (571) 272-5587. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MANAL A. ALSAMIRI/Examiner, Art Unit 3628