Prosecution Insights
Last updated: April 18, 2026
Application No. 18/201,614

AUDIO-ENABLED MESSAGING OF AN IMAGE

Final Rejection §103
Filed: May 24, 2023
Examiner: HOPE, DARRIN
Art Unit: 2178
Tech Center: 2100 — Computer Architecture & Software
Assignee: Tencent Technology (Shenzhen) Company Limited
OA Round: 4 (Final)
Grant Probability: 60% (Moderate)
OA Rounds: 5-6
To Grant: 4y 2m
With Interview: 79%

Examiner Intelligence

Career Allow Rate: 60% (270 granted / 449 resolved; +5.1% vs TC avg)
Interview Lift: +19.3% (resolved cases with interview vs. without)
Avg Prosecution: 4y 2m typical timeline; 34 currently pending
Total Applications: 483 across all art units

Statute-Specific Performance

§101: 7.8% (-32.2% vs TC avg)
§103: 54.5% (+14.5% vs TC avg)
§102: 24.7% (-15.3% vs TC avg)
§112: 4.3% (-35.7% vs TC avg)
Comparisons are against the Tech Center average estimate • Based on career data from 449 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office Action is responsive to the communications filed on 11 March 2026. Claims 1-20 are pending.

Priority

Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. 2021113621128, filed on 18 August 2022.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-11 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Poosala et al. (Hereinafter, Poosala, US 2016/0277903 A1) in view of Levy (US 2015/0255057 A1), and further in view of McMahon et al. (Hereinafter, McMahon, US 2014/0078331 A1). 
Per claim 1, Poosala discloses a method for audio-enabled messaging of an image (Abstract; paragraph [0003]), comprising: displaying a messaging interface(e.g., block 1602 as shown in Fig. 16; paragraph [0025], “… The visual representation may be displayed in a messaging application user interface so that a user of the messaging application may send the audio file by selecting the visual representation… “; paragraph [0123]; paragraph [0151], “In the illustrated embodiment shown in FIG. 16, the logic flow 1600 may be operative at block 1602 to display a message user interface (UI) with a visual representation of an audio file…” ); displaying an image selection interface in response to a first user operation via the messaging interface(e.g., section(s) 1040, 1140, 1340 as shown in Figs. 10, 11, and 13; paragraph [0104], “ The message flow 800 continues when the GUI generator 332 receives a control directive selecting an audio sticker in message 804. For example, the GUI generator 332 may receive a tap touch gesture, a touch-and-hold touch gesture, a touch-and-drag touch gesture, a voice command, or any other control directive selecting a visual representation of an audio sticker. “; paragraph [0125], “Section 1040 may display visual representations of audio stickers, such as representation 1042 .... 
“), the image selection interface being configured to display at least one image for selection by a user (paragraph [0125], “…The sender may perform one or more touch gestures using his finger 1002 to select a representation and send it and its corresponding audio sticker to the recipient… “); but does not expressly disclose: in response to a transmission operation for a selected image of the at least one image to be transmitted to another user, obtaining feature information of the selected image; transmitting the feature information to a server that is configured to store a plurality of pieces of sound information and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information; receiving, from the server, associated audio information selected by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information; and displaying, in the messaging interface, an audio-enabled message including the selected image and the associated audio information that is transmitted to the other user. Levy discloses: in response to a transmission operation for a selected image of the at least one image to be transmitted to another user (paragraph [0065], “FIG. 5 illustrates a flow chart 500 detailing the process of sending an audio/visual message using a customized dictionary.”), obtaining feature information of the selected image (e.g., step 502 as shown in Fig. 5; Abstract; paragraph [0066], “In Step 502 the message is composed on the sender terminal. The author indicates that a customized audio dictionary is used. 
In the Manual Selection embodiment, the author selects the desired sequences/sets of encoded symbols from the message and relates the sets to the customized dictionary terms, each term with the corresponding audio clip/effect.”; Examiner’s Note: Levy discloses composing a message including at least one encoded symbol selected from the group comprising: an alpha-numeric character, an icon, a smiley, an emoticon, a graphic and an emoji as described in paragraphs [0024], [0040]; Examiner’s Note: Levy sends the dictionary information (i.e., a dictionary indicator) used to map symbols to audio to a server.); receiving, from the server, associated audio information selected by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information (Abstract, “… receiving the message at the receiver terminal; and replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal.“; paragraph [0017]; paragraph [0027]; paragraphs [0047-0048]; paragraph [0055]; paragraph [0061]; paragraph [0073], “In step 512 the receiver terminal receives the message and the indicator, then, in step 514 the terminal checks to see if the indicated dictionary is installed on the terminal ...”; Examiner’s Note: Levy discloses sending a “dictionary indicator” that maps or matches the feature information (dictionary indicator) of the selected image with corresponding candidate sound information (audio dictionary) in the plurality of pieces of sound information); and displaying, in the messaging interface, an audio-enabled message including the selected image and the associated audio information that is transmitted to the other user (e.g., step 
518 as shown in Fig. 5; Abstract, “ … replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal. “; paragraph [0010]; paragraph [0017]; paragraph [0027]; paragraph [0030] ). It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method, system and communication software of Levy in the audio messaging device of Poosala for the purpose of improving message interpretation as suggested by Levy (paragraph [0002]). McMahon discloses transmitting the feature information to a server that is configured to store a plurality of pieces of sound information (paragraph [0022], “In embodiments, where the captured sound includes humming or singing, the method includes the steps of generating fingerprint generated fingerprints are transmitted over a network to a server, which matches the generated fingerprints with a plurality of pre-stored fingerprints/sounds and retrieves one or more matched sounds from the network ….”) and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information (paragraph [0022], “ Finally, the retrieved sounds are transmitted back to the mobile device. As a next step, a user of the mobile device selects one of the retrieved sounds and finally, the selected sound is attached to the captured video or still image by the associating module 106. “; paragraph [0025], “ … Accordingly, one or more matched sounds and various versions may be retrieved and can be displayed to the user. 
Finally, the user can choose one of the versions that can be attached to the captured image or, anticipating the system's ability to identify music, the user could hum a few bars of the Paul Simon song, "At the Zoo," which could be retrieved and added to the associated sound track. “; paragraphs [0028-0029]; Examiner’s Note: McMahon discloses matching sound information (i.e., a fingerprint of a sound) with one or more pre-stored sounds stored at a server.). It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method and system of McMahon in the audio messaging device of Poosala and Levy for satisfying the need for identifying and interacting jointly with visual and audio data as suggested by McMahon (paragraph [0004]). Per claim 2, Poosala, Levy, and McMahon disclose the method according to claim 1, wherein the image includes an emoji (Poosala, paragraph [0078], “The messaging system 500 may comprise one or more sticker database servers 532. The one or more sticker database servers 532 may include one or more visual stickers and audio stickers for use by message application components 330. Visual stickers may include graphical images, such as emojis, emoticons, or other images, that are not also associated with an audio file. The audio stickers may include an audio file and a representation of the audio file, for example, an icon or image.”). Per claim 3, Poosala, Levy, and McMahon disclose the method according to claim 1, wherein the displaying the audio-enabled message comprises: determining a messaging mode of the selected image (Poosala, e.g., user 1302 drags representation 1346 to section 1320 as shown in Fig. 
13; paragraph [0139], “Section 1330 may also accept a representation 1346 of an audio sticker that may be dragged or otherwise selected and placed into section 1330 from section 1340… “); and based on the messaging mode being an audio-enabled messaging mode, sending the audio-enabled message including the selected image to another user (Poosala, paragraph [0127]; paragraph [0139], “… The operator 1302 may then perform a control directive on the send button 1332 to send the audio sticker to Carol. Alternatively, the representation 1346 may be dragged or otherwise selected directly into section 1320 for transmission to the recipient Carol. The embodiments are not limited to these examples. “), and displaying, in the messaging interface, the audio-enabled message that includes the selected image based on the messaging mode being the audio-enabled messaging mode (Poosala, paragraph [0129]). Per claim 4, Poosala, Levy, and McMahon disclose the method according to claim 3, further comprising: displaying a messaging mode switch control element (Poosala, e.g., “play” icon 1134, a “next” icon 1136 and a “rewind” icon 1138 as shown in Fig. 11 ) for the selected image based on a user selection of the image(Poosala, paragraph [0129], “In the illustrated example, the operator of the mobile device 1110 has received an audio sticker message. When a message is received by the messaging application, the UI 1100 may display visual indications that a message is received in section 1230. For example, the UI 1100 may display playback UI elements, such as a “play” icon 1134, a “next” icon 1136 and a “rewind” icon 1138…”); and setting the messaging mode to one of the audio-enabled messaging mode and audio-disabled messaging mode based on a second user operation performed on the messaging mode switch control element(Poosala, paragraph [0129], “… The play icon 1134, when selected, for example, with a touch gesture from operator 1102, may output the audio portion of the received message. 
While the audio message is being output, the play icon 1134 may change to a “pause” icon (not shown)... “; Examiner’s Note: A user can play an audio message and pause an audio message during playback ). Per claim 5, Poosala, Levy, and McMahon disclose the method according to claim 1, further comprising: initiating playback of the associated audio information of the selected image in response to a playback operation being performed on the audio-enabled message (Poosala, paragraphs [0117-0119]). Per claim 6, Poosala, Levy, and McMahon disclose the method according to claim 1, further comprising: in response to an audio changing operation being performed on the selected image(Poosala, paragraph [0129], “…The next icon 1136, when selected, may play a next, later-received, audio message in a sequence, when more than one audio message has been received …. “), obtaining replacement audio information by re-selecting candidate audio information from the plurality of candidate audio information by transmitting an audio changing instruction to the server and receiving replacement audio information from the server, the server being configured to generate the replacement audio information based on other candidate sound information in the plurality of pieces of sound information (McMahon, e.g., step 208 as shown in Fig. 2; paragraph [0022]; Examiner’s Note: The user is transmitted candidate sounds.), replacing the associated audio information of the selected image with the replacement audio information (Poosala, paragraph [0129]; Examiner’s Note: Poosala discloses replacing the associated audio information of the selected image with the next, later-received, audio message in a sequence. ). 
It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method and system of McMahon in the audio messaging device of Poosala and Levy for satisfying the need for identifying and interacting jointly with visual and audio data as suggested by McMahon (paragraph [0004]). Per claim 7, Poosala, Levy, and McMahon disclose the method according to claim 1, further comprising: in response to an audio changing operation being performed on the selected image, displaying at least one of the plurality of candidate audio information (Poosala, paragraph [0129], “…The next icon 1136, when selected, may play a next, later-received, audio message in a sequence, when more than one audio message has been received …. “); receiving, from the server, a plurality of candidate sound information in the plurality of pieces of sound information (McMahon, paragraph [0022], “In embodiments, where the captured sound includes humming or singing, the method includes the steps of generating fingerprint generated fingerprints are transmitted over a network to a server, which matches the generated fingerprints with a plurality of pre-stored fingerprints/sounds and retrieves one or more matched sounds from the network ….”); displaying at least one of the plurality of candidate [[audio]] sound information (McMahon, paragraph [0025], “… Accordingly, one or more matched sounds and various versions may be retrieved and can be displayed to the user. Finally, the user can choose one of the versions that can be attached to the captured image or, anticipating the system's ability to identify music, the user could hum a few bars of the Paul Simon song, "At the Zoo," which could be retrieved and added to the associated sound track. 
“;); generating replacement audio information for the selected image according to target audio information that is selected from the at least one of the plurality of candidate audio information by the user(Poosala, paragraph [0129], “…The next icon 1136, when selected, may play a next, later-received, audio message in a sequence, when more than one audio message has been received …. “; Examiner’s Note: Poosala discloses selecting the next, later-received, audio message in a sequence.); and replacing the associated audio information of the selected image with the replacement audio information (Poosala, paragraph [0129]; Examiner’s Note: Poosala discloses replacing the associated audio information of the selected image with the next, later-received, audio message in a sequence. ). Per claim 8, Poosala, Levy, and McMahon disclose the method according to claim 1, wherein the selected image is included in a video, and the audio-enabled message includes the video (Poosala, paragraph [0078]). Per claim 9, Poosala discloses a method for obtaining audio information for an audio-enabled message, the method comprising: obtaining feature information of an image to be included in the audio-enabled message(paragraph [0091]; Examiner’s Note: Poosala discloses obtaining an identifier of the audio file and a visual representation of the audio file.); but does not expressly disclose: transmitting the feature information to a server that is configured to store a plurality of pieces of sound information and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information; receive, from the server, the audio information associated with the image, the audio information being selected by matching the feature information; generating associated audio information of the image to be included in the audio-enabled message with the image based on the received audio information. 
Levy discloses: receive, from the server, the audio information associated with the image, the audio information being selected by matching the feature information (Abstract, “… receiving the message at the receiver terminal; and replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal.“; paragraph [0017]; paragraph [0027]; paragraphs [0047-0048]; paragraph [0055]; paragraph [0061]; paragraph [0073], “In step 512 the receiver terminal receives the message and the indicator, then, in step 514 the terminal checks to see if the indicated dictionary is installed on the terminal ...”; Examiner’s Note: Levy discloses sending a “dictionary indicator” that maps or matches the feature information (dictionary indicator) of the selected image with corresponding candidate sound information (audio dictionary) in the plurality of pieces of sound information); and generating associated audio information of the image to be included in the audio-enabled message with the image based on the received audio information (Abstract; paragraph [0010]; paragraph [0012]; paragraph [0025], “According to further features composing further includes the step of generating a customized audio dictionary.”; paragraph [0068]; Examiner’s Note: Levy discloses mapping audio to text and an image (emoticon, emoji, icon, etc.) to create a predefined audio dictionary.). It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method, system and communication software of Levy in the audio messaging device of Poosala for the purpose of improving message interpretation as suggested by Levy (paragraph [0002]). 
McMahon discloses transmitting the feature information to a server that is configured to store a plurality of pieces of sound information (paragraph [0022], “In embodiments, where the captured sound includes humming or singing, the method includes the steps of generating fingerprint generated fingerprints are transmitted over a network to a server, which matches the generated fingerprints with a plurality of pre-stored fingerprints/sounds and retrieves one or more matched sounds from the network ….”) and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information (paragraph [0022], “ Finally, the retrieved sounds are transmitted back to the mobile device. As a next step, a user of the mobile device selects one of the retrieved sounds and finally, the selected sound is attached to the captured video or still image by the associating module 106. “; paragraph [0025], “ … Accordingly, one or more matched sounds and various versions may be retrieved and can be displayed to the user. Finally, the user can choose one of the versions that can be attached to the captured image or, anticipating the system's ability to identify music, the user could hum a few bars of the Paul Simon song, "At the Zoo," which could be retrieved and added to the associated sound track. “; paragraphs [0028-0029]; Examiner’s Note: McMahon discloses matching sound information (i.e., a fingerprint of a sound) with one or more pre-stored sounds stored at a server.). It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method and system of McMahon in the audio messaging device of Poosala and Levy for satisfying the need for identifying and interacting jointly with visual and audio data as suggested by McMahon (paragraph [0004]). 
Per claim 10, Poosala, Levy, and McMahon disclose the method according to claim 9, wherein the image includes an emoji (Poosala, paragraph [0078], “The messaging system 500 may comprise one or more sticker database servers 532. The one or more sticker database servers 532 may include one or more visual stickers and audio stickers for use by message application components 330. Visual stickers may include graphical images, such as emojis, emoticons, or other images, that are not also associated with an audio file. The audio stickers may include an audio file and a representation of the audio file, for example, an icon or image.”). Per claim 11, Poosala, Levy, and McMahon disclose the method according to claim 9, wherein: the received audio information comprises at least one candidate sound information matching the feature information (Levy, Abstract, “… receiving the message at the receiver terminal; and replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal.“; paragraph [0017]; paragraph [0027]; paragraphs [0047-0048]; paragraph [0055]; paragraph [0061]; paragraph [0073], “In step 512 the receiver terminal receives the message and the indicator, then, in step 514 the terminal checks to see if the indicated dictionary is installed on the terminal ...”; Examiner’s Note: Levy discloses sending a “dictionary indicator” that maps or matches the feature information (dictionary indicator) of the selected image with corresponding candidate sound information (audio dictionary) in the plurality of pieces of sound information); and the method further comprises: selecting, from the at least one candidate sound information, the associated audio information (Levy, “According to further features the composing further includes assigning at least one set of encoded symbols, in the message, to at least one audio dictionary selected from the group comprising: a default dictionary and one or more customized dictionaries.”). Per claim 17, Poosala, Levy, and McMahon disclose the method according to claim 9, further comprising: storing [[the]] a plurality of pieces of previously sent audio information, wherein the generating the associated audio information includes obtaining the audio information from the plurality of pieces of previously sent audio information that is determined to be associated with the image according to the feature information (Poosala, e.g., audio sticker database 600 as shown in Fig. 6; paragraph [0091]). Per claim 18, Poosala discloses an information processing apparatus (e.g., audio messaging system 100 as shown in Fig. 2; paragraph [0041]), comprising: processing circuitry (paragraph [0048]) configured to: display a messaging interface (e.g., block 1602 as shown in Fig. 16; paragraph [0025], “… The visual representation may be displayed in a messaging application user interface so that a user of the messaging application may send the audio file by selecting the visual representation… “; paragraph [0123]; paragraph [0151], “In the illustrated embodiment shown in FIG. 16, the logic flow 1600 may be operative at block 1602 to display a message user interface (UI) with a visual representation of an audio file…”); display an image selection interface in response to a first user operation via the messaging interface (e.g., section(s) 1040, 1140, 1340 as shown in Figs. 10, 11, and 13; paragraph [0104], “The message flow 800 continues when the GUI generator 332 receives a control directive selecting an audio sticker in message 804. For example, the GUI generator 332 may receive a tap touch gesture, a touch-and-hold touch gesture, a touch-and-drag touch gesture, a voice command, or any other control directive selecting a visual representation of an audio sticker. 
“; paragraph [0125], “Section 1040 may display visual representations of audio stickers, such as representation 1042 .... “), the image selection interface being configured to display at least one image for selection by a user (paragraph [0125], “…The sender may perform one or more touch gestures using his finger 1002 to select a representation and send it and its corresponding audio sticker to the recipient… “); but does not expressly disclose: in response to a transmission operation for a selected image of the at least one image to be transmitted to another user, obtaining feature information of the selected image; transmitting the feature information to a server that is configured to store a plurality of pieces of sound information and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information; receive, from the server, associated audio information selected by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information; and display, in the messaging interface, an audio-enabled message including the selected image and the associated audio information that is transmitted to the other user. Levy discloses: in response to a transmission operation for a selected image of the at least one image to be transmitted to another user (paragraph [0065], “FIG. 5 illustrates a flow chart 500 detailing the process of sending an audio/visual message using a customized dictionary.”), obtaining feature information of the selected image (e.g., step 502 as shown in Fig. 5; Abstract; paragraph [0066], “In Step 502 the message is composed on the sender terminal. The author indicates that a customized audio dictionary is used. 
In the Manual Selection embodiment, the author selects the desired sequences/sets of encoded symbols from the message and relates the sets to the customized dictionary terms, each term with the corresponding audio clip/effect.”; Examiner’s Note: Levy discloses composing a message including at least one encoded symbol selected from the group comprising: an alpha-numeric character, an icon, a smiley, an emoticon, a graphic and an emoji as described in paragraphs [0024], [0040]; Examiner’s Note: Levy sends the dictionary information (i.e., a dictionary indicator) used to map symbols to audio to a server.); receive, from the server, associated audio information selected by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information (Abstract, “… receiving the message at the receiver terminal; and replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal.“; paragraph [0017]; paragraph [0027]; paragraphs [0047-0048]; paragraph [0055]; paragraph [0061]; paragraph [0073], “In step 512 the receiver terminal receives the message and the indicator, then, in step 514 the terminal checks to see if the indicated dictionary is installed on the terminal ...”; Examiner’s Note: Levy discloses sending a “dictionary indicator” that maps or matches the feature information (dictionary indicator) of the selected image with corresponding candidate sound information (audio dictionary) in the plurality of pieces of sound information); and display, in the messaging interface, an audio-enabled message including the selected image and the associated audio information that is transmitted to the other user (e.g., step 518 
as shown in Fig. 5; Abstract, “… replaying the message on the receiver terminal, such that the replaying includes mapping the at least one set of encoded symbols in the message to a corresponding sound effect in the audio dictionary so that when each of the at least one set of encoded symbols is displayed on the receiver terminal, a respective sound effect is sounded on the receiver terminal.”; paragraph [0010]; paragraph [0017]; paragraph [0027]; paragraph [0030]).

It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method, system, and communication software of Levy in the audio messaging device of Poosala for the purpose of improving message interpretation, as suggested by Levy (paragraph [0002]).

McMahon discloses transmitting the feature information to a server that is configured to store a plurality of pieces of sound information (paragraph [0022], “In embodiments, where the captured sound includes humming or singing, the method includes the steps of generating fingerprints. The generated fingerprints are transmitted over a network to a server, which matches the generated fingerprints with a plurality of pre-stored fingerprints/sounds and retrieves one or more matched sounds from the network ….”) and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information (paragraph [0022], “Finally, the retrieved sounds are transmitted back to the mobile device. As a next step, a user of the mobile device selects one of the retrieved sounds and finally, the selected sound is attached to the captured video or still image by the associating module 106.”; paragraph [0025], “… Accordingly, one or more matched sounds and various versions may be retrieved and can be displayed to the user. Finally, the user can choose one of the versions that can be attached to the captured image or, anticipating the system's ability to identify music, the user could hum a few bars of the Paul Simon song, "At the Zoo," which could be retrieved and added to the associated sound track.”; paragraphs [0028-0029]; Examiner’s Note: McMahon discloses matching sound information (i.e., a fingerprint of a sound) with one or more pre-stored sounds stored at a server.).

It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the method and system of McMahon in the audio messaging device of Poosala and Levy to satisfy the need for identifying and interacting jointly with visual and audio data, as suggested by McMahon (paragraph [0004]).

Per claim 19, Poosala, Levy, and McMahon disclose a non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to implement the method according to claim 1 (Poosala, paragraph [0168]; claim 19 is rejected under the same rationale given for claim 1).

Per claim 20, Poosala, Levy, and McMahon disclose a non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to implement the method according to claim 9 (Poosala, paragraph [0168]; claim 20 is rejected under the same rationale given for claim 9).

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Poosala et al. (Hereinafter, Poosala, US 2016/0277903 A1) in view of Levy (US 2015/0255057 A1) and McMahon et al. (Hereinafter, McMahon, US 2014/0078331 A1), and further in view of Huang et al. (Hereinafter, Huang, US 2020/0257922 A1).
Per claim 12, Poosala, Levy, and McMahon disclose the method according to claim 9, but do not expressly disclose wherein the obtaining the feature information comprises: performing text extraction on text information in the image to obtain text feature information of the image, the feature information including the text feature information. Huang discloses wherein the obtaining the feature information comprises: performing text extraction on text information in the image to obtain text feature information of the image, the feature information including the text feature information (e.g., step S220 as shown in Fig. 2a; Abstract; paragraph [0051], “S220: extracting features of a plurality of objects in the image, and extracting a feature of the text.”).

It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the image-based data processing of Huang in the audio messaging device of Poosala, Levy, and McMahon for the purpose of accurately learning an association relationship between text and each object in an image and improving processing accuracy, as suggested by Huang (Abstract).

Per claim 13, Poosala, Levy, and McMahon disclose the method according to claim 9, but do not expressly disclose wherein the obtaining the feature information comprises: performing feature extraction on at least one of the image, an associated message of the image, or an associated messaging scenario of the image to obtain scenario feature information of the image, the feature information including the scenario feature information. Huang discloses wherein the obtaining the feature information comprises: performing feature extraction on at least one of the image, an associated message of the image, or an associated messaging scenario of the image to obtain scenario feature information of the image, the feature information including the scenario feature information (e.g., step S220 as shown in Fig. 2a; Abstract; paragraph [0009]; paragraph [0022]; paragraph [0051], “S220: extracting features of a plurality of objects in the image, and extracting a feature of the text.”).

It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the image-based data processing of Huang in the audio messaging device of Poosala, Levy, and McMahon for the purpose of accurately learning an association relationship between text and each object in an image and improving processing accuracy, as suggested by Huang (Abstract).

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Poosala et al. (Hereinafter, Poosala, US 2016/0277903 A1) in view of Levy (US 2015/0255057 A1) and McMahon et al. (Hereinafter, McMahon, US 2014/0078331 A1), and further in view of Breedvelt-Schouten et al. (Hereinafter, Breedvelt-Schouten, US 2020/0285668 A1).

Per claim 14, Poosala, Levy, and McMahon disclose the method according to claim 9, but do not expressly disclose wherein the obtaining the feature information comprises: performing feature extraction on at least one of the image or the associated message of the image to obtain emotion feature information of the image, the feature information including the emotion feature information. Breedvelt-Schouten discloses wherein the obtaining the feature information comprises: performing feature extraction on at least one of the image or the associated message of the image to obtain emotion feature information of the image, the feature information including the emotion feature information (Abstract; paragraph [0001], “The claimed subject matter relates generally to stored images and, more specifically, to the inclusion of metadata corresponding to an emotional state in a stored image or images.”; paragraphs [0005-0007]; paragraph [0033]; paragraph [0041]).
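For readers outside the art, the emotion-metadata mechanism Breedvelt-Schouten describes (attaching an emotional-state annotation to a stored image) can be sketched as follows. This is purely illustrative: the class, function, and classifier names are hypothetical and do not come from the reference.

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """A stored image plus emotion feature metadata (hypothetical model)."""
    pixels: bytes
    emotion_tags: dict = field(default_factory=dict)

def tag_emotion(record, classify):
    # `classify` stands in for any image-emotion classifier; we only
    # assume it returns a {label: score} mapping for the image pixels.
    record.emotion_tags = classify(record.pixels)
    return record

# Toy classifier for illustration only.
record = tag_emotion(ImageRecord(pixels=b"..."),
                     classify=lambda _: {"joy": 0.9, "surprise": 0.1})
```

The point of the sketch is only that the emotion features travel with the image as metadata, which is what makes them usable later as "feature information" for matching.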
It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the emotional experience metadata of Breedvelt-Schouten in the audio messaging device of Poosala, Levy, and McMahon for the purpose of enhancing the future utility of a stored picture or video, as suggested by Breedvelt-Schouten (paragraph [0001]).

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Poosala et al. (Hereinafter, Poosala, US 2016/0277903 A1) in view of Levy (US 2015/0255057 A1) and McMahon et al. (Hereinafter, McMahon, US 2014/0078331 A1), and further in view of Boyle et al. (Hereinafter, Boyle, US 2008/0114601 A1).

Per claim 15, Poosala, Levy, and McMahon disclose the method according to claim 9, but do not expressly disclose wherein the generating the associated audio information comprises: obtaining text information included in the image; extracting an audio clip corresponding to the text information from the audio information; and generating the associated audio information of the image based on the audio clip. Boyle discloses wherein the generating the associated audio information comprises: obtaining text information included in the image (Abstract, “… interpreting an image and producing a word description of the image including at least one image keyword”; paragraph [0008]; paragraphs [0042-0043]); extracting an audio clip corresponding to the text information from the audio information (paragraph [0008]; paragraph [0015]; paragraph [0022]); and generating the associated audio information of the image based on the audio clip (paragraph [0008], “… selecting the audio clip transcription having a shortest similarity distance to the at least one image keyword as a location to insert the word description of the image.”).
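Boyle's "shortest similarity distance" selection between image keywords and audio-clip transcriptions can be sketched with a toy distance. The metric and all names here are illustrative assumptions, not Boyle's actual algorithm:

```python
def similarity_distance(keywords, transcription):
    # Toy distance: 0 when every keyword appears in the transcription,
    # growing by 1 for each keyword that is missing.
    words = set(transcription.lower().split())
    return sum(1 for k in keywords if k.lower() not in words)

def select_clip(keywords, clips):
    # `clips` maps clip id -> transcription text; pick the clip whose
    # transcription is closest to the image keywords, in the spirit of
    # Boyle's shortest-similarity-distance selection.
    return min(clips, key=lambda cid: similarity_distance(keywords, clips[cid]))

clips = {"zoo": "monkeys at the zoo", "rain": "sound of falling rain"}
best = select_clip(["zoo", "monkeys"], clips)  # -> "zoo"
```

Any real implementation would use a proper text-similarity measure over transcriptions; the sketch only shows the shape of the keyword-to-clip matching step.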
It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the systems and methods of Boyle in the audio messaging device of Poosala, Levy, and McMahon for the purpose of improving the dissemination of information, as suggested by Boyle (paragraph [0005]).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Poosala et al. (Hereinafter, Poosala, US 2016/0277903 A1) in view of Levy (US 2015/0255057 A1), McMahon et al. (Hereinafter, McMahon, US 2014/0078331 A1), and Boyle et al. (Hereinafter, Boyle, US 2008/0114601 A1), and further in view of Melenboim (US 2017/0118501 A1).

Per claim 16, Poosala, Levy, McMahon, and Boyle disclose the method according to claim 15, but do not expressly disclose wherein the generating the associated audio information includes adjusting a playback duration of the audio clip based on a playback duration of a video that includes the image to obtain the associated audio information of the image, and a playback duration of the associated audio information of the first image is equal to the playback duration of the video that includes the image. Melenboim discloses wherein the generating the associated audio information includes adjusting a playback duration of the audio clip based on a playback duration of a video that includes the image to obtain the associated audio information of the image (e.g., step S220; paragraph [0012]; paragraph [0017]), and a playback duration of the associated audio information of the first image is equal to the playback duration of the video that includes the image (Abstract; paragraph [0013]; paragraph [0020]).
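The duration-adjustment step attributed to Melenboim (making the audio clip's playback duration equal the video's) can be sketched by looping or trimming a sample buffer. The function name, looping strategy, and silence fallback are illustrative assumptions, not the reference's method:

```python
def fit_audio_to_video(samples, audio_rate, video_seconds):
    # Loop or trim the audio so its playback duration equals the
    # video's duration (target length in samples = rate * seconds).
    target = int(audio_rate * video_seconds)
    if not samples:
        return [0] * target          # silence fallback (assumption)
    looped = samples * (target // len(samples) + 1)
    return looped[:target]

# 4 samples at 4 samples/s = 1 s of audio, stretched to a 2.5 s video.
clip = fit_audio_to_video([1, 2, 3, 4], audio_rate=4, video_seconds=2.5)
```

Looping-then-trimming is only one way to equalize durations; time-stretching or cross-fading would be alternatives, and the reference is cited only for the equal-duration result, not for any particular technique.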
It would have been obvious for a person of ordinary skill in the art before the effective filing date of the claimed invention to use the system and methods of Melenboim in the audio messaging device of Poosala, Levy, McMahon, and Boyle for providing a unitary video clip format that can be displayed on mobile browsers, as suggested by Melenboim (paragraph [0005]).

Response to Arguments

Applicant's arguments filed 11 March 2026 have been fully considered, but they are not persuasive.

Substance of Interview

Examiner acknowledges Applicant’s representative’s remarks regarding the interview on January 14, 2026. Applicant argues that Poosala and Levy fail to disclose "transmitting the feature information to a server that is configured to store a plurality of pieces of sound information and to select associated audio information by matching the feature information of the selected image with candidate sound information in the plurality of pieces of sound information," as discussed during the interview. Examiner disagrees: Poosala and Levy were not relied upon to disclose this limitation; Examiner applied McMahon in this new rejection to address it. Accordingly, it is respectfully submitted that Claim 1, and all associated dependent claims, are not patentable over Poosala and Levy in view of McMahon. Furthermore, independent Claims 9 and 18, although differing in scope and/or statutory class, are not patentable over Poosala and Levy in view of McMahon at least for reasons analogous to those stated above for Claim 1.
Accordingly, it is respectfully submitted that Claims 9 and 18, and all associated dependent claims, are not patentable over Poosala and Levy in view of McMahon. Regarding the rejections of Claims 12-16 under 35 U.S.C. § 103, it is respectfully submitted that Claims 12-16 are not patentable over Poosala and Levy in view of McMahon at least for the reasons stated above for Claims 1 and 9, from which Claims 12-16 depend. Further, Huang, Breedvelt-Schouten, Boyle, and Melenboim are cited for the remaining teachings of those claims. Accordingly, it is respectfully submitted that Claims 12-16 are not patentable over Poosala, Levy, and McMahon in further view of Huang, Breedvelt-Schouten, Boyle, and Melenboim. In summary, Examiner maintains the rejection of claims 1-20.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARRIN HOPE, whose telephone number is (571) 270-5079. The examiner can normally be reached Mon-Thu 6:45-4:15, Fri 6:45-3:15, alternate Fridays off.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen S. Hong, can be reached at (571) 272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

DARRIN HOPE
Examiner, Art Unit 2178

/STEPHEN S HONG/
Supervisory Patent Examiner, Art Unit 2178

Prosecution Timeline

May 24, 2023: Application Filed
Dec 12, 2024: Non-Final Rejection (§103)
Jan 23, 2025: Interview Requested
Jan 30, 2025: Examiner Interview Summary
Jan 30, 2025: Applicant Interview (Telephonic)
Mar 18, 2025: Response Filed
Jun 28, 2025: Final Rejection (§103)
Aug 16, 2025: Interview Requested
Aug 22, 2025: Applicant Interview (Telephonic)
Aug 22, 2025: Examiner Interview Summary
Sep 02, 2025: Response after Non-Final Action
Oct 01, 2025: Request for Continued Examination
Oct 09, 2025: Response after Non-Final Action
Dec 09, 2025: Non-Final Rejection (§103)
Dec 29, 2025: Interview Requested
Jan 14, 2026: Applicant Interview (Telephonic)
Mar 11, 2026: Response Filed
Apr 03, 2026: Final Rejection (§103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12582498
PROCESSING OF VIDEO STREAMS RELATED TO SURGICAL OPERATIONS
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12578757
CONTINUITY OF APPLICATIONS ACROSS DEVICES
Granted Mar 17, 2026 (2y 5m to grant)
Patent 12547431
DATA STORAGE AND RETRIEVAL SYSTEM FOR SUBDIVIDING UNSTRUCTURED PLATFORM-AGNOSTIC USER INPUT INTO PLATFORM-SPECIFIC DATA OBJECTS AND DATA ENTITIES
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12547300
USER INTERFACES RELATED TO TIME
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12541563
INSTRUMENTATION OF SOFT NAVIGATION ELEMENTS OF WEB PAGE APPLICATIONS
Granted Feb 03, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 60%
With Interview: 79% (+19.3%)
Median Time to Grant: 4y 2m
PTA Risk: High
Based on 449 resolved cases by this examiner. Grant probability derived from career allow rate.
