DETAILED ACTION
1. Claims 1-20 are presented for examination.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
5. Claims 1-5, 7-12, 14-18 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bromand (US 2020/0357390).
Regarding claim 1, Bromand teaches:
A computer program product for providing audio pronunciations of name text presented in computer rendered content, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that is executable to perform operations, the operations comprising:
providing user name pronunciation information in a repository for users, wherein user name pronunciation information for a user indicates a language, a pronunciation attribute to pronounce name text of the user, and an audio file providing an audio pronunciation of the name text in the language according to the pronunciation attribute ([0029] An “alias” is data representing a pronunciation of a term or phrase. An alias can take a variety of forms including representation using ASCII characters (even if the alias itself is represented in another format such as UNICODE), such as representing the pronunciation of “AC/DC” as “A C D C”. Generally, the alias pronunciation system 106 functions to receive and decipher utterances from a user. Server 110 can also include an alias pronunciation system 108. A machine-learning model 107 can be stored as part of or in conjunction with the alias pronunciation system 106, 108. The alias pronunciation system 106 can take the form of alias pronunciation application instructions stored on a computer readable medium and executable by one or more processors to provide alias pronunciation functionality [0030] the machine-learning model 107 is configured to receive, as input, a vector representing the characters of text for which pronunciation is desired (e.g., a vector of an artist name) and the output is a vector representing a pronunciation of the name in a desired format. The pronunciation can be used as an alias of the name [0031] there may be different machine-learning models 107 for use in different parts of the world to address regional differences in pronunciation. For instance, there may be a machine-learning model 107 trained based on training data specific to British English, American English, or other region dialects. Where a region- or dialect-specific pronunciation is requested, a corresponding machine-learning model is selected for use);
receiving a name pronunciation request indicating an audience language and an audience pronunciation attribute in which name text of a user is to be pronounced ([0040] the utterance 130 includes a request for music by an artist named “bl!nk”, which includes a character (“!”) that is a non-letter character. The user device 102 receives the utterance 130, processes the utterance, and determines the result 132 based on the utterance 130 (e.g., a pronunciation of the utterance). The processing of the utterance 130 can benefit from output from the machine-learning model 107. Then, the media streaming application 104 plays a media content item associated with the artist. Although the English language is used herein, the techniques described herein can be applied to variety of different alphabets and languages. Techniques can be particularly applicable to situations where a name (e.g., of an artist or song) includes one or more characters that are not contained within the alphabet of the language associated with a voice system);
determining, from the repository, an audio file associated with a language and pronunciation attribute for the user matching the audience language and the audience pronunciation attribute, respectively ([0067] the STT engine 312 provides a probabilistic transcription based on the available information, and the probability improves as the STT engine 312 learns from experience what words co-occur and at what frequencies. The STT engine 312 can also learn stylizations of specific users. That is, the STT engine 312 learns how to correctly map phonemes depending on the person that has uttered them, thereby taking into account users' individual accents, dialects, rhythm, pace, and other speech characteristics [0101] where the utterance 130 is, phonetically (e.g., as output produced by a speech-to-text engine), “play close your eyes by blink”, the first portion can be “close your eyes” and the second portion is “blink”. The portions can be identified by, for example, the retrieval engine 314. Next, a search is conducted to find a match to the first portion in a database. For instance, the retrieval engine 314 can search for the first portion in the name entity storage 306 (e.g., the database 330 thereof) for name entities matching the first portion. A resulting match can have associated data. For instance, the resulting match can be an entry in the name entity storage 306 having the title “close your eyes” and be associated with an artist having the name entity “bl!nk”. The name entity “bl!nk” can be the associated data because it is data associated with the resulting match found in the database); and
returning the determined audio file to output audio in the audio file pronouncing the name text of the user ([0069] if the STT engine 312 provides the transcribed text “Play ‘Close Your Eyes’ by Bl!nk” the retrieval engine 314 parses the text, identifies the name entity “Bl!nk,” and then looks up the name entity in the name entity storage 306. In some examples, the name entity storage 306 includes a database 330 having entries that map each media item stored (e.g., using the media item identifier) in the media repository 302 to one or more name entities and/or one or more aliases associated with the media item identifier (ID). The retrieval engine 314 then passes the returned media item ID associated with the name entity to the media repository 302, where the media item associated with the media item ID is identified and then played back, such as via the playback interface 342 (e.g., a speaker, a display screen, etc.) of the user device 102).
Regarding claim 2, Bromand teaches:
The computer program product of claim 1, wherein the name pronunciation request includes the name text in computer rendered content to pronounce, and wherein the determined audio file is further associated with the name text in the name pronunciation request ([0040] the utterance 130 includes a request for music by an artist named “bl!nk”, which includes a character (“!”) that is a non-letter character. The user device 102 receives the utterance 130, processes the utterance, and determines the result 132 based on the utterance 130 (e.g., a pronunciation of the utterance). The processing of the utterance 130 can benefit from output from the machine-learning model 107. Then, the media streaming application 104 plays a media content item associated with the artist. Although the English language is used herein, the techniques described herein can be applied to variety of different alphabets and languages. Techniques can be particularly applicable to situations where a name (e.g., of an artist or song) includes one or more characters that are not contained within the alphabet of the language associated with a voice system).
Regarding claim 3, Bromand teaches:
The computer program product of claim 1, wherein the repository provides a plurality of audio files providing pronunciations of name text in different languages and/or pronunciation attributes for users in the repository ([0105] text-based words, such as name entities, and their associated standard pronunciations are obtained. As used herein, “text-based word” refers to a textual representation of words, such as name entities. It should be understood that the term “text” is used for convenience and may refer to, for example, alpha characters, numeric characters, alphanumeric characters, AMERICAN STANDARD CODE FOR INFORMATION INTERCHANGE (ASCII) characters, symbols, or foreign language UNICODE (e.g. UTF-8)).
Regarding claim 4, Bromand teaches:
The computer program product of claim 1, wherein the pronunciation attribute indicates at least one of a dialect of the language and an accent in which the name text is pronounced in the language ([0067] the STT engine 312 provides a probabilistic transcription based on the available information, and the probability improves as the STT engine 312 learns from experience what words co-occur and at what frequencies. The STT engine 312 can also learn stylizations of specific users. That is, the STT engine 312 learns how to correctly map phonemes depending on the person that has uttered them, thereby taking into account users' individual accents, dialects, rhythm, pace, and other speech characteristics).
Regarding claim 5, Bromand teaches:
The computer program product of claim 1, wherein the operations further comprise: receiving an update request for a specified user in the repository indicating a language and pronunciation attribute detected for name text for the specified user in computer rendered content and an audio file having a pronunciation of the name text for the specified user; and adding, to the repository, information for the specified user indicating the language, the pronunciation attribute, and the audio file in the update request ([0085] When the system receives a confirmation of a pronunciation or a new pronunciation (e.g., from the machine-learning model 107), the database 330 in the name entity storage 306 is then updated to include the collected and/or generated aliases associated with the content item ID. Subsequently, when the alias pronunciation system 106, 108 is in playback mode a transcribed uttered playback request is compared by the retrieval engine 314 to the name entity and any associated aliases of that name entity when identifying in the database 330 a content item ID corresponding to the playback request. For example, a subsequent transcribed uttered request to “play too late” is correctly mapped to the name entity 2L8 and its corresponding content ID in the database 330 using the collected or generated alias “too late” [0086] once a name entity has been classified the media item associated with that name entity is tagged with the one or more classifications. For example, the database 330 is updated with the classification tag or tags, which is then used to update the classification column 808).
Regarding claim 7, Bromand teaches:
The computer program product of claim 1, wherein the operations further comprise: deploying, at a client computer, a name context detector, a name audio generator, and a name pronunciation updater, wherein the name context detector executes at the client computer to process computer rendered content to determine a language and pronunciation attribute for a name of a user in the repository, wherein the name audio generator processes the computer rendered content to determine an audio file providing a pronunciation of the name text of the user in the repository, wherein the name pronunciation updater generates an update request including the language and the pronunciation attribute determined by the name context detector and the audio file determined by the name audio generator to add to the repository for the user in the repository ([0067] the STT engine 312 translates the speech signal into sound units called phonemes, and then maps the phonemes to words using a stored lexicon of words. In some examples, the context of the words is also used to infer the correct transcription. For example, if the phonemes translated from “Close Your Eyes” are imprecise or unclear due to poor transmission or an accent of the user, the transcription of “Close Your Eyes” by the STT engine 312 may be informed by “by Bl!nk” since “Close Your Eyes” and “Bl!nk” often co-occur in a playback utterance. In this manner, the STT engine 312 provides a probabilistic transcription based on the available information, and the probability improves as the STT engine 312 learns from experience what words co-occur and at what frequencies. The STT engine 312 can also learn stylizations of specific users. 
That is, the STT engine 312 learns how to correctly map phonemes depending on the person that has uttered them, thereby taking into account users' individual accents, dialects, rhythm, pace, and other speech characteristics [0085] When the system receives a confirmation of a pronunciation or a new pronunciation (e.g., from the machine-learning model 107), the database 330 in the name entity storage 306 is then updated to include the collected and/or generated aliases associated with the content item ID. Subsequently, when the alias pronunciation system 106, 108 is in playback mode a transcribed uttered playback request is compared by the retrieval engine 314 to the name entity and any associated aliases of that name entity when identifying in the database 330 a content item ID corresponding to the playback request. For example, a subsequent transcribed uttered request to “play too late” is correctly mapped to the name entity 2L8 and its corresponding content ID in the database 330 using the collected or generated alias “too late” [0086] once a name entity has been classified the media item associated with that name entity is tagged with the one or more classifications. For example, the database 330 is updated with the classification tag or tags, which is then used to update the classification column 808).
Regarding claim 8, Bromand teaches:
The computer program product of claim 7, wherein the name audio generator performs: determining whether user account information for the user in the repository indicates the audio file providing a pronunciation of the name text of the user identified in the repository according to the language and the pronunciation attribute determined by the name context detector, wherein the audio file included in the update request includes the audio file determined from the user account information ([0078] in response to receiving, for each of two aliases, at least a predefined minimum number of responses to crowd-sourcing pronunciations for “2L8”, the alias collection engine 310 populates the alias column 806 of the database 330 with “too late” and “too-el-ate” corresponding to two crowd-sourced pronunciation aliases. It should be appreciated that each alias can be represented in the alias column 806 in more than one way (e.g., as alternative but equivalent or at least substantially equivalent spellings to ensure that minor variations in speech transcription by the STT engine 312 are nevertheless mapped to the appropriate alias) [0079] It should be appreciated that aliases can be collected without crowd-sourcing. For example, specific user accounts can be targeted to provide aliases, or the artists themselves can be targeted to provide aliases).
Regarding claim 9, Bromand teaches:
The computer program product of claim 7, wherein the name audio generator performs: generating a prompt at the client computer for a pronunciation of the name text of the user in the repository; receiving audio of the pronunciation of the name text of the user in the repository; and generating the audio file including the received audio to include in the update request ([0085] When the system receives a confirmation of a pronunciation or a new pronunciation (e.g., from the machine-learning model 107), the database 330 in the name entity storage 306 is then updated to include the collected and/or generated aliases associated with the content item ID. Subsequently, when the alias pronunciation system 106, 108 is in playback mode a transcribed uttered playback request is compared by the retrieval engine 314 to the name entity and any associated aliases of that name entity when identifying in the database 330 a content item ID corresponding to the playback request. For example, a subsequent transcribed uttered request to “play too late” is correctly mapped to the name entity 2L8 and its corresponding content ID in the database 330 using the collected or generated alias “too late”).
Claim Rejections - 35 USC § 103
6. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
7. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
8. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
9. Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bromand (US 2020/0357390) in view of Henderson et al. (US 2022/0012420), hereinafter Henderson.
Regarding claim 6, Bromand teaches the limitations of claim 1 as set forth in the rejection above.
Bromand does not appear to explicitly disclose the following limitations; however, Henderson discloses: “wherein the operations further comprise: receiving an update request for a specified user in the repository indicating a language and pronunciation attribute detected for name text of the specified user in computer rendered content and an audio file having a pronunciation of the name text for the specified user;
determining information in the repository for the specified user indicating the language and the pronunciation attribute in the update request; and
incrementing a count in the determined information, wherein counts for audio files for different languages and/or pronunciation attributes are used to select an audio file to pronounce the name text for the specified user ([0051] if computer server 155 determines that a name originates in an Arab country, and that person is located in Dearborn, Mich., computer server 155 may apply census data that indicates a particularly high density of people from Somalia or of Somalian descent, to determine that the name should be pronounced in a way that is consistent with a Somali pronunciation rather than another Arabic country pronunciation. Other supplementary data may include a zip code, phone area code, name data, or other information that provides a suggestion of a locality for a particular person. The origin data extracted may or may not be a single geographical or linguistic region or ethnicity or may or may not be a list of probabilities for different regions. The linguistic origin may or may not indicate influences from multiple geographic regions across time, for example the pronunciation of a third-generation immigrant Indian name in New Orleans may differ from the pronunciation of a third-generation immigrant Indian name in New York [0052] a ranking algorithm may incorporate user preference, target origin estimation, prior target pronunciation matches, and other information derived from input supplementary information in order to rank the entries from the pronunciation database. These pronunciation entries may be ranked based on which is the most likely, most relevant or best pronunciation recording, for example [0053] the final output of this system is a list of pronunciations, particularly their audio recordings and/or phonetic representations, ranked based on the ranking algorithm [0054] a “voting model” interprets customer preferences and behavior based on prior results in order to provide additional information to adjust the behavior of the ranking algorithm. 
This may include excluding certain results that customers did not prefer or downvoted. This may include increasing the ranking for certain results that were highly preferred for a given target name, or highly preferred for a given target name within e.g., a certain demographic. Using this “voting” model, the ranking algorithm may be further iteratively refined to improve e.g., the accuracy and quality of name recommendations)”.
Hence, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Bromand and Henderson; the suggestion/motivation for doing so would have been to provide a method for improving the accuracy and quality of name/word pronunciations (Henderson, [0052]-[0054]).
Claims 10-15 are essentially the same as claims 1, 2, 5-7, and 9, except that they recite the claimed invention as a system, and are rejected for the same reasons as applied hereinabove.
Claims 16-20 are essentially the same as claims 1, 2, and 5-7, except that they recite the claimed invention as a method, and are rejected for the same reasons as applied hereinabove.
Conclusion
10. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and is listed on the attached PTO-892 form.
Examiner’s Note: The examiner has cited particular figures and paragraphs in the references as applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claims, other passages and figures may apply as well. The applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passages as taught by the prior art or disclosed by the examiner.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUAWEN A PENG, whose telephone number is (571) 270-5215. The examiner can normally be reached Monday through Friday, 9 am to 5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached at 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUAWEN A PENG/Primary Examiner, Art Unit 2169