Prosecution Insights
Last updated: April 19, 2026
Application No. 17/357,585

NEURAL NETWORK MEMORY FOR AUDIO

Status: Non-Final OA (§103)
Filed: Jun 24, 2021
Examiner: SHALU, ZELALEM W
Art Unit: 2145
Tech Center: 2100 — Computer Architecture & Software
Assignee: Amazon Technologies, Inc.
OA Round: 3 (Non-Final)
Grant Probability: 29% (At Risk)
OA Rounds: 3-4
To Grant: 3y 2m
With Interview: 48%

Examiner Intelligence

Grants only 29% of cases.
Career Allow Rate: 29% (31 granted / 108 resolved; -26.3% vs TC avg)
Interview Lift: +19.0% (strong), based on resolved cases with interview
Avg Prosecution: 3y 2m (typical timeline); 34 applications currently pending
Career History: 142 total applications across all art units

Statute-Specific Performance

§101: 14.3% (-25.7% vs TC avg)
§103: 63.4% (+23.4% vs TC avg)
§102: 8.1% (-31.9% vs TC avg)
§112: 10.8% (-29.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 108 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This action is in response to the amendment filed on 11/06/2025. Claims 1-20 are pending in the case.

Applicant Response

In Applicant's response dated 11/06/2025, Applicant amended Claims 1, 4, 7-8, 10-11, 15 and 18-19 and argued against all objections and rejections previously set forth in the Office Action dated 08/11/2025.

Continued Examination under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/06/2025 has been entered.

Examiner Comments

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Shinn (US 20190065625 A1) in view of Zhao (US 20150364128 A1; Pub. Date: 2015-12-17), and in further view of Rae (NPL: "Compressive Transformers for Long-Range Sequence Modelling," 13 Nov 2019).

Regarding independent Claim 1, Shinn teaches a computer-implemented method comprising:

receiving a sentence input request (see Shinn: Fig. 4, [0079], "At 401, a sentence input is received. In some embodiments, the sentence input is a sentenced processed using a natural language understanding module.");

receiving a query having text and previous contextual information (see Shinn: Fig. 11, [0080], "a determination is made whether the sentence input is a query. For example, the input sentence received at 401 is evaluated to determine whether the input is a query such as a question to the autonomous robotic system." See also Fig. 11, [0200], "process of the supporting knowledge retriever sub-function of the reasoner module. The sub-function first checks the input's performative in order to determine whether it is a query. If the input is a query, the sub-function determines which previous data (previously spoken data sentences) are related to the given input. This is accomplished via a series of filters that filter the whole set of previous data based on various criteria."; i.e., Shinn teaches receiving text and retrieving previous contextual information);

querying a short-term memory storing fine-grained information for recent text of the document (see Shinn: Fig. 19, [0242], "short-term memory 1905 is used for information that must be retrieved quickly. In various embodiments, short-term memory 1905 is used for information that should be retrieved more quickly such as information based on recent experiences. For example, short-term memory 1905 may be used to store information related to recent events") and receiving a first value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval."; i.e., recent action);

querying an episodic long-term memory storing, in a memory graph data structure, information discarded from the short-term memory (see Shinn: Fig. 1, [0036], "a long-term AI memory graph data structure includes data that was previously stored in the short-term artificial intelligence memory graph data structure. For example, the identified components of the natural language input are stored in both the short-term and long-term AI memory graph structures and may be removed from the short-term memory when it becomes less relevant." See also Figs. 5A-5E, illustrating an example of a memory graph data structure that stores information) and receiving a second value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval."; i.e., Shinn teaches transfer of data from short-term to long-term memory); and

querying a semantic long-term memory storing relevant facts per entity in the document in one or more relationship graphs (see Shinn: Fig. 19, [0033], "The memory graph data structure can also include semantic memory such as external ontology memory. An external ontology memory may utilize a remote database for semantic information. In response to a query, the appropriate episodic memory (e.g., short term and/or long term memory) and/or semantic memory is used to retrieve supporting knowledge relevant to the query") and receiving a third value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval.").

Shinn does not teach: generating audio output; providing the first, second, and third values to a neural network to generate the audio; generating the audio using the neural network; providing the audio according to the request; and compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory.

However, Zhao teaches a text-to-speech system receiving text input and generating audio output (see Zhao: Fig. 10, [0078], "Based on the received input, optimal phonetic properties are determined at operation 1012. The phonetic properties determined may be of the same type of phonetic properties that are received. Additional phonetic properties for the text may also be determined. Based on the determined optimal phonetic properties, a generation sequence is generated that is capable of being synthesized into audible speech. The determination of the optimal phonetic properties and the creation of the generation sequence may be performed by the hyper-structure recurrent neural networks combining module 112. At operation 1018, the generation sequence may be optimized. The optimization of the generation sequence may be based on a special set of rules and/or a golden set of data. The optimized generation sequence may then be synthesized into audible speech at operation 1020.");

providing the first, second, and third values to a neural network to generate the audio (see Zhao: Fig. 1, [0065], "The hyper-structure recurrent neural networks combing module 112 receives the inputs and outputs of the context awareness and semantic mining RNN modules 110, the linguistic prosody tagger RNN module 108, the LTS RNN modules 106, and the POS RNN module 104.");

generating the audio using the neural network (see Zhao: Fig. 1, [0046], "output of the global optimization module is a generation sequence that may be utilized an audio synthesizer to generate the synthesized speech corresponding the input text 102. Because the generation sequence is the combination of multiple phonetic properties and details regarding the input text 102, the synthesized audio will sound more natural to the user."); and

providing the audio according to the request (see Zhao: Fig. 10, [0078], "The optimized generation sequence may then be synthesized into audible speech at operation 1020.").

Because both Shinn and Zhao address the same or similar technical problem of maintaining and applying contextual and semantic information based on natural language input, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Shinn to include the neural-based text-to-speech system of Zhao in order to improve the contextual and semantic information of the generated speech or audio output. One would have been motivated to make such a combination in order to provide users with efficient, realistic, and precise voice output that sounds more natural and accurate.

Shinn and Zhao do not teach compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory.

However, Rae teaches compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory (see Rae: Fig. 1, Pg. 3, Section 3, "We build on the ideas of the TransformerXL (Dai et al., 2019) which maintains a memory of past activations at each layer to preserve a longer history of context. The TransformerXL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory." (Rae's Fig. 1 is reproduced in the record.) See also Section 3.1, describing "The oldest n_s activations in memory are evicted, but unlike the TransformerXL we do not discard them. Instead, we apply a compression operation, f_c : R^(n_s × d) → R^((n_s/c) × d), mapping the n_s oldest memories to n_s/c compressed memories which we then store in a secondary FIFO compressed memory. d denotes the hidden size of activations and c refers to the compression rate, a higher value indicates more coarse-grained compressed memories.").

Because Shinn, Zhao and Rae address the same or similar technical problem of maintaining and applying contextual and semantic information based on natural language input, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shinn's long-term memory architecture to include a compression function when transferring evicted short-term memory into long-term memory in order to reduce storage consumption and preserve relevant contextual information while discarding redundancy.
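For orientation only, the data flow recited in Claim 1 as characterized above can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the encoder, the dot-product attention read, the dimension D, and the stubbed synthesizer are placeholders, not the applicant's or the cited references' implementations.

    # Hypothetical sketch: query three memory tiers, then hand the three returned
    # values to a stand-in neural network that emits "audio" frames.
    import numpy as np

    D = 64  # assumed hidden size

    def encode(text):
        """Stand-in for a learned text encoder (pseudo-embedding keyed on the text)."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(D)

    def attend(query, memory):
        """Similarity-weighted read over a list of stored vectors."""
        if not memory:
            return np.zeros(D)
        keys = np.stack(memory)                     # (n, D)
        scores = keys @ query / np.sqrt(D)          # dot-product similarities
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                    # probability distribution
        return weights @ keys                       # weighted sum of stored vectors

    def synthesize(query, short_val, episodic_val, semantic_val):
        """Stub for the neural network that generates the audio output."""
        h = np.tanh(np.concatenate([query, short_val, episodic_val, semantic_val]))
        return h[:16]                               # pretend these are audio frames

    short_term = [encode("recent sentence of the document")]
    episodic_lt = [encode("older, compressed context")]
    semantic_lt = [encode("entity fact: Alice -> lives_in -> Seattle")]

    q = encode("query with previous contextual information")
    audio = synthesize(q, attend(q, short_term), attend(q, episodic_lt), attend(q, semantic_lt))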
Regarding Claim 2, Shinn, Zhao and Rae teach all the limitations of Claim 1. Shinn further teaches the computer-implemented method wherein the request includes at least one of: a location of the document, the document, an indication of a voice to use for the audio, a speed for the audio, a location of a user making the request, and a type of audio file to generate (see Shinn: Fig. 4, [0081], "for an autonomous robotic system that has interacted with two different users, Alice and Bob, at 405, the source node for Alice is selected in response to a query initiated by Alice and the source node for Bob is selected in response to a query initiated by Bob. In various embodiments, a source node is selected since an autonomous robotic system partitions stored knowledge data by user and/or agent. For example, the knowledge data associated with each user includes user information related to that user's conversation history with the autonomous robotic system. In some embodiments, agents include the autonomous robotic system itself and may include other robotic agents acting autonomously.").

Regarding Claim 3, Shinn, Zhao and Rae teach all the limitations of Claim 1. Shinn further teaches the computer-implemented method wherein the short-term memory utilizes a caching algorithm of one of: least recently used, least important, least frequently used, last in first out, and first in first out (see Shinn: Fig. 19, [0242], "short-term memory 1905 is used for information that must be retrieved quickly. In various embodiments, short-term memory 1905 is used for information that should be retrieved more quickly such as information based on recent experiences. For example, short-term memory 1905 may be used to store information related to recent events").

Regarding independent Claim 4, Shinn teaches a computer-implemented method comprising:

receiving a query having text and previous contextual information (see Shinn: Fig. 11, [0080], "a determination is made whether the sentence input is a query. For example, the input sentence received at 401 is evaluated to determine whether the input is a query such as a question to the autonomous robotic system." See also Fig. 11, [0200], "process of the supporting knowledge retriever sub-function of the reasoner module. The sub-function first checks the input's performative in order to determine whether it is a query. If the input is a query, the sub-function determines which previous data (previously spoken data sentences) are related to the given input. This is accomplished via a series of filters that filter the whole set of previous data based on various criteria."; i.e., Shinn teaches receiving text and retrieving previous contextual information);

a neural network memory comprising a short-term memory (see Shinn: Fig. 19, [0242], "short-term memory 1905"), an episodic long-term memory (see Shinn: Fig. 1, [0036], "a long-term AI memory graph data structure"), and a semantic long-term memory (see Shinn: Fig. 19, [0033], "The memory graph data structure can also include semantic memory such as external ontology memory"), wherein querying the neural network memory comprises querying one or more of:

querying a short-term memory storing fine-grained information for recent text of the document (see Shinn: Fig. 19, [0242], "short-term memory 1905 is used for information that must be retrieved quickly. In various embodiments, short-term memory 1905 is used for information that should be retrieved more quickly such as information based on recent experiences. For example, short-term memory 1905 may be used to store information related to recent events") and receiving a first value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval."; i.e., recent action);

querying an episodic long-term memory storing, in a memory graph data structure, information discarded from the short-term memory (see Shinn: Fig. 1, [0036], "a long-term AI memory graph data structure includes data that was previously stored in the short-term artificial intelligence memory graph data structure. For example, the identified components of the natural language input are stored in both the short-term and long-term AI memory graph structures and may be removed from the short-term memory when it becomes less relevant." See also Figs. 5A-5E, illustrating an example of a memory graph data structure that stores information) and receiving a second value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval."; i.e., Shinn teaches transfer of data from short-term to long-term memory); and

querying a semantic long-term memory storing relevant facts per entity in the document in one or more relationship graphs (see Shinn: Fig. 19, [0033], "The memory graph data structure can also include semantic memory such as external ontology memory. An external ontology memory may utilize a remote database for semantic information. In response to a query, the appropriate episodic memory (e.g., short term and/or long term memory) and/or semantic memory is used to retrieve supporting knowledge relevant to the query") and receiving a third value in response (see Shinn: Fig. 4, [0083], "relevant data nodes are identified. For example, data nodes of a knowledge store, such as a memory graph data structure, are identified and made candidates for retrieval.").

Shinn does not teach: generating audio output; providing the first, second, and third values to a neural network to generate the audio; generating the audio using the neural network; providing the audio according to the request; and compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory.

However, Zhao teaches a text-to-speech system receiving text input and generating audio output (see Zhao: Fig. 10, [0078], "Based on the received input, optimal phonetic properties are determined at operation 1012. The phonetic properties determined may be of the same type of phonetic properties that are received. Additional phonetic properties for the text may also be determined. Based on the determined optimal phonetic properties, a generation sequence is generated that is capable of being synthesized into audible speech. The determination of the optimal phonetic properties and the creation of the generation sequence may be performed by the hyper-structure recurrent neural networks combining module 112. At operation 1018, the generation sequence may be optimized. The optimization of the generation sequence may be based on a special set of rules and/or a golden set of data. The optimized generation sequence may then be synthesized into audible speech at operation 1020.");

providing the first, second, and third values to a neural network to generate the audio (see Zhao: Fig. 1, [0065], "The hyper-structure recurrent neural networks combing module 112 receives the inputs and outputs of the context awareness and semantic mining RNN modules 110, the linguistic prosody tagger RNN module 108, the LTS RNN modules 106, and the POS RNN module 104.");

generating the audio using the neural network (see Zhao: Fig. 1, [0046], "output of the global optimization module is a generation sequence that may be utilized an audio synthesizer to generate the synthesized speech corresponding the input text 102. Because the generation sequence is the combination of multiple phonetic properties and details regarding the input text 102, the synthesized audio will sound more natural to the user."); and

providing the audio according to the request (see Zhao: Fig. 10, [0078], "The optimized generation sequence may then be synthesized into audible speech at operation 1020.").

Because both Shinn and Zhao address the same or similar technical problem of maintaining and applying contextual and semantic information based on natural language input, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify the teaching of Shinn to include the neural-based text-to-speech system of Zhao in order to improve the contextual and semantic information of the generated speech or audio output. One would have been motivated to make such a combination in order to provide users with efficient, realistic, and precise voice output that sounds more natural and accurate.

Shinn and Zhao do not teach compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory.

However, Rae teaches compressing the information discarded from the short-term memory using a compression function before being stored in the episodic long-term memory (see Rae: Fig. 1, Pg. 3, Section 3, "We build on the ideas of the TransformerXL (Dai et al., 2019) which maintains a memory of past activations at each layer to preserve a longer history of context. The TransformerXL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory." (Rae's Fig. 1 is reproduced in the record.) See also Section 3.1, describing "The oldest n_s activations in memory are evicted, but unlike the TransformerXL we do not discard them. Instead, we apply a compression operation, f_c : R^(n_s × d) → R^((n_s/c) × d), mapping the n_s oldest memories to n_s/c compressed memories which we then store in a secondary FIFO compressed memory. d denotes the hidden size of activations and c refers to the compression rate, a higher value indicates more coarse-grained compressed memories.").

Because Shinn, Zhao and Rae address the same or similar technical problem of maintaining and applying contextual and semantic information based on natural language input, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the invention, to modify Shinn's long-term memory architecture to include a compression function when transferring evicted short-term memory into long-term memory in order to reduce storage consumption and preserve relevant contextual information while discarding redundancy.

Regarding Claim 5, Shinn, Zhao and Rae teach all the limitations of Claim 4. Zhao further teaches the computer-implemented method wherein maintaining the short-term memory comprises: applying a language model to the text to generate a vector (see Zhao: Fig. 2, [0047], "the input layer (vector) w(t) represents an input word at time t encoded using 1-of-N coding (also called "one-hot coding"), and the output layer y(t) produces a probability distribution over phonetic properties that are assignable to the input text. The hidden layer 204 s(t) maintains a representation of the text sequence history."); computing a similarity between the vector and stored vectors, and calculating a probability distribution for the similarities (see Zhao: Fig. 2, [0047], "the input layer (vector) w(t) represents an input word at time t encoded using 1-of-N coding (also called "one-hot coding"), and the output layer y(t) produces a probability distribution over phonetic properties that are assignable to the input text. The hidden layer 204 s(t) maintains a representation of the text sequence history."); and weighting the stored vectors by their respective probability distribution, summing the weighted vectors and combining with the vector to generate a value, evicting a stored vector according to a caching algorithm, and storing the vector according to the caching algorithm (see Zhao: Fig. 2, [0047], "Each layer represents a respective set of nodes, and the layers are connected with weights denoted by the matrices U, W, and V. For instance, in one embodiment, the hidden layer may contain 800 nodes. The input layer (vector) w(t) represents an input word at time t encoded using 1-of-N coding (also called "one-hot coding"), and the output layer y(t) produces a probability distribution over phonetic properties that are assignable to the input text."). See Claim 1 above for motivation to combine.

Regarding Claim 6, Shinn, Zhao and Rae teach all the limitations of Claim 5. Shinn further teaches the computer-implemented method wherein the caching algorithm is one of least recently used, least important, least frequently used, last in first out, and first in first out (see Shinn: Fig. 19, [0242], "short-term memory 1905 is used for information that should be retrieved more quickly such as information based on recent experiences. For example, short-term memory 1905 may be used to store information related to recent events and long-term memory 1903 may be used to store information related to events that occurred further in the past. In some embodiments, short-term memory 1905 includes facts or concepts pinned to recent memory.").
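The short-term memory maintenance recited in Claims 5-6 (a similarity score against stored vectors, a probability distribution over those similarities, a weighted read combined with the new vector, then eviction under a caching policy) might be sketched as follows. The capacity, the FIFO policy, and the class name are assumptions of this illustration, not the application's implementation.

    # Hypothetical sketch of a fixed-capacity short-term memory with a
    # similarity-weighted read and FIFO eviction (one of the policies listed).
    from collections import deque
    import numpy as np

    D, CAPACITY = 64, 8  # assumed hidden size and slot count

    class ShortTermMemory:
        def __init__(self):
            self.slots = deque()   # FIFO: the oldest slot is evicted first
            self.evicted = []      # evicted vectors, destined for episodic memory

        def update(self, vec):
            """Similarity-weighted read, combine with vec, then store vec (evicting FIFO)."""
            if self.slots:
                keys = np.stack(self.slots)
                scores = keys @ vec / np.sqrt(D)            # similarities
                p = np.exp(scores - scores.max())
                p /= p.sum()                                # probability distribution
                read = p @ keys                             # weighted sum of stored vectors
            else:
                read = np.zeros(D)
            value = read + vec                              # combine with the new vector
            if len(self.slots) >= CAPACITY:
                self.evicted.append(self.slots.popleft())   # evict per caching policy
            self.slots.append(vec)
            return value

    stm = ShortTermMemory()
    for sentence_vec in np.random.default_rng(0).standard_normal((10, D)):
        stm.update(sentence_vec)
    # stm.evicted now holds the vectors to be compressed into episodic long-term memory.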
Regarding Claim 7, Shinn, Zhao and Rae teach all the limitations of Claim 4. Rae further teaches the computer-implemented method wherein maintaining the episodic long-term memory comprises receiving text discarded from the short-term memory, applying a compression function to the received text to generate a compressed representation of the received text, and storing the compressed representation of the received text (see Rae: Fig. 1, Pg. 3, Section 3, "We build on the ideas of the TransformerXL (Dai et al., 2019) which maintains a memory of past activations at each layer to preserve a longer history of context. The TransformerXL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory."). See the motivation to combine Shinn and Rae in Claim 1.

Regarding Claim 8, Shinn, Zhao and Rae teach all the limitations of Claim 4. Rae further teaches the computer-implemented method wherein training the episodic long-term memory comprises reading all text of the document, applying the compression function to sequences of sentences of the document to generate a compressed representation for each sequence, and storing the compressed representations (see Rae: Fig. 1, Pg. 3, Section 3, "We build on the ideas of the TransformerXL (Dai et al., 2019) which maintains a memory of past activations at each layer to preserve a longer history of context. The TransformerXL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory."). See the motivation to combine Shinn and Rae in Claim 1.

Regarding Claim 9, Shinn, Zhao and Rae teach all the limitations of Claim 4. Rae further teaches the computer-implemented method wherein the episodic long-term memory uses similarity-based compression (see Rae: Fig. 1, Pg. 3, Section 3, "We build on the ideas of the TransformerXL (Dai et al., 2019) which maintains a memory of past activations at each layer to preserve a longer history of context. The TransformerXL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory."). See the motivation to combine Shinn and Rae in Claim 1.
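The compression step attributed to Rae for Claims 7-9 maps the n_s oldest evicted vectors to roughly n_s/c compressed vectors. Mean pooling is only one simple choice of compression function (Rae describes several, including learned ones); the rate c = 3 and the zero-padding of a ragged final group are assumptions of this sketch.

    # Hypothetical sketch of f_c : R^(n_s x d) -> R^((n_s/c) x d) via mean pooling.
    import numpy as np

    def compress(evicted, c=3):
        """Pool each group of c evicted vectors into one compressed vector."""
        n_s, d = evicted.shape
        pad = (-n_s) % c                       # zero-pad so n_s divides evenly by c
        if pad:
            evicted = np.vstack([evicted, np.zeros((pad, d))])
        return evicted.reshape(-1, c, d).mean(axis=1)

    compressed_memory = []                     # secondary FIFO compressed memory
    old_activations = np.random.default_rng(1).standard_normal((6, 64))
    compressed_memory.extend(compress(old_activations, c=3))   # 6 vectors -> 2 vectors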
Regarding Claim 10, Shinn, Zhao and Rae teach all the limitations of Claim 4. Shinn further teaches the computer-implemented method wherein all text in the same paragraph discarded from the short-term memory is stored together in the episodic long-term memory (see Shinn: Fig. 19, [0257], "long-term memory graph data structure is updated. In some embodiments, components identified at 2003 are stored in long-term memory and/or components removed from short-term memory are transferred to long-term memory. In various embodiments, the components may be stored using a node structure similar to the memory graph examples of FIGS. 5A-E and/or FIGS. 15-16. In some embodiments, data stored in long-term memory corresponds to older data that may no longer exist in short-term memory. In some embodiments, long-term memory utilizes a remote database for storage and connectivity of different nodes of the saved components.").

Regarding Claim 11, Shinn, Zhao and Rae teach all the limitations of Claim 4. Shinn further teaches the computer-implemented method wherein maintaining the semantic long-term memory comprises resolving pronouns for text discarded from the short-term memory, extracting relationship information for the resolved pronouns, and updating an entity relationship graph based on the extracted relationship information (see Shinn: Fig. 19, [0262], "semantic memory is searched based on the identified components. Semantic memory may be used to further inform the response to the query. For example, a semantic search may return a result that states a "car" is a synonym for an "automobile." In some embodiments, the semantic search results are used to modify the episodic memory searches. In some embodiments, an additional episodic memory search is performed after the results of a semantic memory search (not shown). In some embodiments, the semantic memory search at 2107 is performed in advance of the episodic memory search at 2105 (not shown). In some embodiments, no semantic memory search is necessary and the semantic search at 2107 is optional.").

Regarding Claim 12, Shinn, Zhao and Rae teach all the limitations of Claim 11. Shinn further teaches the computer-implemented method wherein the entity relationship graph stores information in symbolic form (see Shinn: Fig. 20, [0249], "diagram illustrating an embodiment of a process for responding to a natural language input using an adaptive, interactive, and cognitive reasoner that utilizes an advanced memory graph structure.").
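The semantic long-term memory update described for Claims 11-12 (resolve pronouns in discarded text, extract a relation, update a symbolic entity-relationship graph) might look like the toy sketch below. The string-replacement "resolution" and subject-verb-object split are crude stand-ins for illustration, not the method of the application or of Shinn.

    # Hypothetical sketch: update a symbolic entity-relationship graph from
    # discarded text after a toy pronoun-resolution step.
    from collections import defaultdict

    graph = defaultdict(set)   # entity -> {(relation, object), ...} in symbolic form

    def update_graph(sentence, antecedent):
        resolved = sentence.replace("She", antecedent).replace("He", antecedent)
        subject, relation, *rest = resolved.split()
        graph[subject].add((relation, " ".join(rest)))   # store a symbolic triple

    update_graph("Alice wrote the report", antecedent="Alice")
    update_graph("She presented it on Monday", antecedent="Alice")
    # graph["Alice"] == {("wrote", "the report"), ("presented", "it on Monday")}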
Regarding Claim 13, Shinn, Zhao and Rae teach all the limitations of Claim 11. Zhao further teaches the computer-implemented method wherein the received values are used to determine prosody (see Zhao: Fig. 1, [0038], "The linguistic prosody tagger (LPT) RNN module 108 determines linguistic prosody properties for letters, words, or groups of words from the input 102."). See Claim 1 above for motivation to combine.

Regarding Claim 14, Shinn, Zhao and Rae teach all the limitations of Claim 4. Shinn further teaches the computer-implemented method wherein the audio is generated in response to a request including at least one of a location of the document, the document, an indication of a voice to use for the audio, a speed for the audio, a location of a user making the request, and a type of audio file to generate (see Shinn: Fig. 4, [0081], "for an autonomous robotic system that has interacted with two different users, Alice and Bob, at 405, the source node for Alice is selected in response to a query initiated by Alice and the source node for Bob is selected in response to a query initiated by Bob. In various embodiments, a source node is selected since an autonomous robotic system partitions stored knowledge data by user and/or agent. For example, the knowledge data associated with each user includes user information related to that user's conversation history with the autonomous robotic system. In some embodiments, agents include the autonomous robotic system itself and may include other robotic agents acting autonomously.").

Regarding independent Claim 15, Claim 15 is directed to a system claim and has the same claim limitations as Claim 1 and Claim 4, and is rejected under the same rationale.

Regarding Claim 16, Claim 16 is directed to a system claim and has the same claim limitations as Claim 1 and Claim 4, and is rejected under the same rationale.

Regarding Claim 17, Claim 17 is directed to a system claim and has the same claim limitations as Claim 6, and is rejected under the same rationale.

Regarding Claim 18, Claim 18 is directed to a system claim and has the same claim limitations as Claim 8, and is rejected under the same rationale.

Regarding Claim 19, Claim 19 is directed to a system claim and has the same or similar claim limitations as Claim 14, and is rejected under the same rationale.

Regarding Claim 20, Claim 20 is directed to a system claim and has the same or similar claim limitations as Claim 14, and is rejected under the same rationale.

Response to Arguments

Applicant's arguments with respect to the claim amendments have been considered but are moot in view of the new combination of references used in the current rejection. The new combination of references was necessitated by Applicant's claim amendments. Therefore, the claims are rejected under the new combination of references as indicated above.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 20070106513 A1 (Boillot, Marc A.), "Method For Facilitating Text To Speech Synthesis Using A Differential Vocoder": The invention relates in general to the field of text-to-speech synthesis, and more particularly, to improving the segmentation quality of speech tokens when used in conjunction with a vocoder for data compression.

US 20160343366 A1 (Fructuoso, Javier Gonzalvo), "SPEECH SYNTHESIS MODEL SELECTION": Text-to-speech systems can be used to artificially generate an audible representation of a text. Text-to-speech systems typically attempt to approximate various characteristics of human speech, such as the sounds produced, rhythm of speech, and intonation.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZELALEM W SHALU, whose telephone number is (571) 272-3003. The examiner can normally be reached M-F, 8:00 am - 5:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Zelalem Shalu/
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145

Prosecution Timeline

Jun 24, 2021
Application Filed
Mar 08, 2025
Non-Final Rejection — §103
May 15, 2025
Interview Requested
Jun 05, 2025
Applicant Interview (Telephonic)
Jun 05, 2025
Examiner Interview Summary
Jun 09, 2025
Response Filed
Aug 06, 2025
Final Rejection — §103
Aug 29, 2025
Interview Requested
Oct 03, 2025
Examiner Interview Summary
Oct 03, 2025
Applicant Interview (Telephonic)
Nov 06, 2025
Request for Continued Examination
Nov 14, 2025
Response after Non-Final Action
Feb 18, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12477016
AUTOMATION OF VISUAL INDICATORS FOR DISTINGUISHING ACTIVE SPEAKERS OF USERS DISPLAYED AS THREE-DIMENSIONAL REPRESENTATIONS
Granted Nov 18, 2025 (2y 5m to grant)
Patent 12468969
METHODS FOR CORRELATED HISTOGRAM CLUSTERING FOR MACHINE LEARNING
Granted Nov 11, 2025 (2y 5m to grant)
Patent 12419611
PATIENT MONITOR, PHYSIOLOGICAL INFORMATION MEASUREMENT SYSTEM, PROGRAM TO BE USED IN PATIENT MONITOR, AND NON-TRANSITORY COMPUTER READABLE MEDIUM IN WHICH PROGRAM TO BE USED IN PATIENT MONITOR IS STORED
Granted Sep 23, 2025 (2y 5m to grant)
Patent 12153783
User Interfaces and Methods for Generating a New Artifact Based on Existing Artifacts
Granted Nov 26, 2024 (2y 5m to grant)
Patent 12120422
SYSTEMS AND METHODS FOR CAPTURING AND DISPLAYING MEDIA DURING AN EVENT
Granted Oct 15, 2024 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 29%
With Interview: 48% (+19.0%)
Median Time to Grant: 3y 2m
PTA Risk: High
Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
