Last updated: May 29, 2026

Application No. 18/357,611

SYSTEMS AND METHODS FOR GENERATING A CONTINUOUS MUSIC SOUNDSCAPE USING A TEXT-BASED SOUND ENGINE

Non-Final OA §102 §103

Filed: Jul 24, 2023
Priority: Nov 05, 2018 (provisional 62/755,725, plus 3 more)
Examiner: ALBERTALLI, BRIAN LOUIS
Art Unit: 2119
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Endel Sound GmbH
OA Round: 1 (Non-Final)

Interview Optional: The examiner has a relatively high allowance rate (82%) and a +16.6% interview lift. A written response may suffice.

Based on 857 resolved cases, 2023–2026

Examiner Intelligence

ALBERTALLI, BRIAN LOUIS

Career Allowance Rate: 82%, above average (701 granted / 857 resolved; +26.8% vs TC avg)

Interview Lift: +16.6%, strong (resolved cases with vs. without an interview)

Avg Prosecution: 2y 9m (typical timeline); 18 currently pending

Total Applications: 874 (career history, across all art units)

Statute-Specific Performance
§101: 9.6% (-30.4% vs TC avg)
§103: 65.0% (+25.0% vs TC avg)
§102: 13.9% (-26.1% vs TC avg)
§112: 7.0% (-33.0% vs TC avg)

Tech Center averages are estimates. Based on career data from 857 resolved cases.

Office Action

§102 §103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
This application, filed 24 July 2023, is a continuation-in-part (CIP) of application 17/815,126, filed on 26 July 2022, which is a CIP of application 17/665,353, filed on 4 February 2022, which is a continuation of application 16/674,844, filed 5 November 2019, which claims priority to provisional application 62/755,725, filed 5 November 2018.
Claims 1-20 recite subject matter, directed to analyzing text frames using a machine learning network to determine sound sections for presentation to a user, that was not described in any of the parent applications noted above. Accordingly, the effective filing date of claims 1-20 is the filing date of the application, i.e., 24 July 2023. See MPEP 211.05.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-5, 7-8, and 11-15 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cameron et al. (U.S. Patent Application Pub. No. 2018/0032305, hereinafter “Cameron”).
In regard to claim 1, Cameron discloses a method for creating a personalized sound environment for a user (see Fig. 4), the method comprising:
obtaining text data comprising a plurality of words (text data 26 is received, paragraph [0196]);
generating a plurality of text frames based on the text data, wherein each respective text frame of the plurality of text frames includes a subset of the plurality of words (the text is divided into a plurality of text segments 30, paragraphs [0197-0198]);
analyzing, using a machine learning network, each respective text frame to generate one or more features corresponding to the respective text frame and the subset of the plurality of words (the text segments are analyzed 32 to determine emotional data profiles corresponding to the text segments, paragraph [0198]; the analysis includes sentiment analysis performed using machine learning such as the Python NLP toolkit, paragraph [0242]);
determining two or more sound sections for presentation to a user, each sound section corresponding to a particular text frame of the plurality of text frames and generated based at least in part on the one or more features of the particular text frame (audio regions corresponding to the text segments are determined 34 and music selections are made 36 based on the determined emotional data profiles, paragraphs [0199-0200]);
generating a personalized sound environment for presentation to the user, wherein the personalized sound environment includes at least the two or more sound sections (a soundtrack is generated 38 including the determined audio selections, paragraph [0201]); and
presenting the personalized sound environment to the user on a user computing device (the soundtrack is presented to the user on a device while the user is reading the text, paragraph [0184]).

In regard to claim 2, Cameron discloses the personalized sound environment is presented to the user based on:
presenting at least a portion of the text data on a display of the user computing device (a device displays the text data, paragraphs [0367-0368]);
determining an estimated current reading position of the user, indicative of a location within the text data (the system tracks the user’s reading position in the text data, paragraphs [0369-0370]); and
synchronizing playback of the personalized sound environment with presentation of the text data based on the estimated current reading position of the user (the soundtrack audio is played according to the user’s reading position, paragraph [0370]).

In regard to claim 3, Cameron discloses synchronizing playback comprises:
determining a corresponding text frame of the plurality of text frames that includes the estimated current reading position of the user (the user’s reading position is determined, and the audio region associated with the corresponding text segment is determined, paragraphs [0369-0370]); and
presenting a respective sound section of the personalized sound environment, wherein the respective sound section is a sound section generated for the corresponding text frame (the audio region associated with the text segment is output, paragraph [0370]).

In regard to claim 4, Cameron discloses analyzing the plurality of words of the text data to generate one or more full text baselines, each full text baseline indicative of one or more of a complexity of the text data, semantic analysis information of the text data, or a theme of the text data (semantic engine 32 determines baseline statistical values relating to each emotional category, paragraph [0223]; and/or thematic profiling, paragraph [0346]).

In regard to claim 5, Cameron discloses analyzing each respective text frame comprises:
determining a frame-specific deviation information indicative of a deviation between the full text baseline and the one or more features corresponding to the respective text frame, wherein the full text baseline and the one or more features are calculated using a same text analysis metric (baseline absolute emotional data profiles are calculated, then relative emotional data profiles are calculated based on the absolute value of the difference between the absolute value and the mean value divided by the standard deviation, paragraph [0226]).

In regard to claim 7, Cameron discloses the full text baseline comprises the theme of the text data, based on identifying the text data as a work of fiction (genre or style based on text determined to be fiction, paragraph [0308]). 

In regard to claim 8, Cameron discloses the machine learning network comprises a semantic analysis model configured to determine the one or more features of the respective text frame as a mood or a theme associated with the respective text frame; or the machine learning network comprises a text classification neural network configured to determine the one or more features of the respective text frame as a text type classification associated with the respective text frame (using machine learning sentiment classifiers such as the Python NLP toolkit, the text segments are classified as having positive or negative sentiment, paragraph [0242]). 

In regard to claim 11, Cameron discloses the plurality of text frames are non-overlapping, and wherein each text frame includes a unique subset of the plurality of words (individual non-overlapping sentences or other segments, paragraph [0198]).

In regard to claim 12, Cameron discloses generating the plurality of text frames based on the text data comprises:
parsing the text data and segmenting the parsed text data into the plurality of text frames based on identifying a text frame start trigger or a text frame end trigger in the parsed text data (a natural language processing engine parses the raw text to identify segments according to a start text position and a stop text position, paragraphs [0197-0199]).

In regard to claim 13, Cameron discloses the text frame start trigger or the text frame end trigger comprises one or more of:
a paragraph break, a section header, or a chapter header included in the parsed text data (paragraphs, chapter boundaries, etc., paragraph [0198]).

In regard to claim 14, Cameron discloses segmenting the parsed text data into the plurality of frames is based on a pre-determined text frame length (predetermined numbers of words or sentences, paragraph [0198]).

In regard to claim 15, Cameron discloses the text data corresponds to one of: 
an e-book, an article, or a scientific publication (e-books, articles, professional documents, etc., paragraph [0361]).


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cameron, in view of Shenkan (U.S. Patent Application Pub. No. 2024/0282209, hereinafter “Shenkan”).
In regard to claim 6, while Cameron discloses generating a full text baseline based on identifying the text data as a work of non-fiction (paragraph [0361]), Cameron does not specifically disclose the full text baseline comprises the complexity of the text data.
Shenkan discloses a method for generating content associated with text data using attributes of the textual data, wherein the attributes comprise the complexity of the text data based on identifying the text data as a work of non-fiction (text data is categorized as non-fiction and a complexity of the text is determined, paragraph [0103]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate the full text baseline based on the complexity of the text data, because it would allow associated content to be tailored to the user’s understanding and abilities, as suggested by Shenkan (paragraphs [0103] and [0107]).


Claim(s) 9-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cameron, in view of Hammersley et al. (U.S. Patent Application Pub. No. 2019/0189019, hereinafter “Hammersley”).
In regard to claim 9, Cameron does not expressly disclose receiving output from a plurality of sensors, the sensor output detecting a state of the user and an environment in which the user is active.
Hammersley discloses a method for creating a personalized sound environment for a user, comprising receiving output from a plurality of sensors, the sensor output detecting a state of the user and an environment in which the user is active (sensors capturing environmental information, paragraphs [0102-0103]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to receive output from a plurality of sensors, because it would allow environmental effects to be synchronized to the user’s reading, as suggested by Hammersley (paragraphs [0084-0085]).

In regard to claim 10, Cameron does not disclose the two or more sound sections are selected from a plurality of sound sections based on the corresponding features of the particular text frame and further based on the sensor output.
Hammersley discloses two or more sound sections are selected from a plurality of sound sections based on the corresponding features of the particular text frame and further based on the sensor output (auditory special effects are selected based on the user’s reading position and environmental information, paragraphs [0071-0072] and [0102-0103]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to select two or more sound sections based on the corresponding features of the particular text frame and further based on the sensor output, because it would allow environmental effects to be synchronized to the user’s reading, as suggested by Hammersley (paragraphs [0084-0085]).


Claim(s) 16-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cameron (hereinafter “Cameron 1”), in view of Cameron et al. (U.S. Patent Application Pub. No. 2019/0005959, hereinafter “Cameron 2”).
In regard to claim 16, Cameron 1 does not disclose the text data comprises a transcript generated based on spoken word audio data.
Cameron 2 discloses the text data comprises a transcript generated based on spoken word audio data (audiobook, see Abstract).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize text data comprising a transcript generated based on spoken word audio data, because it would provide users with a choice to additionally synchronize audio with an audiobook version of a text, as taught by Cameron 2 (paragraph [0005]).

In regard to claim 17, Cameron 1 does not disclose the spoken word audio data is an audiobook.
Cameron 2 discloses the spoken word audio data is an audiobook (see Abstract).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize an audiobook, because it would provide users with a choice to additionally synchronize audio with an audiobook version of a text, as taught by Cameron 2 (paragraph [0005]).

In regard to claim 18, Cameron 1 does not disclose the spoken word audio data is captured by a microphone of the user computing device, and wherein the text data comprises a real-time transcript generated using a speech recognition engine.
Cameron 2 discloses the spoken word audio data is captured by a microphone of the user computing device, and wherein the text data comprises a real-time transcript generated using a speech recognition engine (real-time speech-to-text mapping engine, paragraph [0140]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize a real-time transcript generated using a speech recognition engine as text data, because it would synchronize the audiobook text to the sound sections, as taught by Cameron 2 (paragraph [0140]).

In regard to claim 19, Cameron 1 discloses the personalized sound environment is generated without using one or more full text baselines calculated for the input text data (absolute sentence emotional data profiles do not use baseline values, paragraph [0220]).

In regard to claim 20, Cameron 1 does not disclose the personalized sound environment is output in real-time using a speaker of the user computing device, and is synchronized with the spoken word audio data captured by the microphone of the user computing device.
Cameron 2 discloses the personalized sound environment is output in real-time using a speaker of the user computing device, and is synchronized with the spoken word audio data captured by the microphone of the user computing device (real-time playback based on text-position determined using speech-to-text, paragraphs [0197] and [0216]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to output the personalized sound environment in real-time, because it would provide a soundtrack-enhanced audiobook experience, as taught by Cameron 2 (paragraph [0197]).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Alluri et al., Millman et al., Freeman, Henshall et al., Hirai, Roblek, and Weinstein disclose additional methods of synchronizing audio with text.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRIAN LOUIS ALBERTALLI whose telephone number is (571)272-7616. The examiner can normally be reached M-F 8AM-3PM, 4PM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





BLA 5/6/26
/BRIAN L ALBERTALLI/               Primary Examiner, Art Unit 2656


Prosecution Timeline

Jul 24, 2023: Application Filed
May 11, 2026: Non-Final Rejection mailed (§102, §103; current)

Precedent Cases

Applications granted by this same examiner with similar technology (based on the 5 most recent grants):

18/201,059, Patent 12629093: Systems and Methods for Detecting Impairment Based Upon Voice Data (3y 0m to grant; granted May 19, 2026)
18/587,094, Patent 12633304: DYSARTHRIA DETECTION METHOD, DYSARTHRIA DETECTION DEVICE, AND RECORDING MEDIUM (2y 2m to grant; granted May 19, 2026)
18/630,727, Patent 12632652: MODIFYING DATA USING LARGE LANGUAGE MODELS (2y 1m to grant; granted May 19, 2026)
18/075,155, Patent 12620395: GENERATING A GROUP AUTOMATED ASSISTANT SESSION TO PROVIDE CONTENT TO A PLURALITY OF USERS VIA HEADPHONES (3y 5m to grant; granted May 05, 2026)
18/466,711, Patent 12620406: System and Method for Speech Enhancement in Multichannel Audio Processing Systems (2y 7m to grant; granted May 05, 2026)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview (+16.6%): 98%
Median Time to Grant: 2y 9m (~0m remaining)
PTA Risk: Low

Based on 857 resolved cases by this examiner. Grant probability derived from career allowance rate.