Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. See 35 U.S.C. § 100 (note).
Art Rejections
Obviousness
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1–4 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of US Patent 11,798,542 (filed 26 March 2021) (“Ryabov”); US Patent Application Publication 2024/0185481 (filed 4 December 2023) (“Lindholm”); and US Patent Application Publication 2019/0188227 (published 20 June 2019) (“Fang”).
Claims 5–8 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ryabov; Lindholm; Fang; US Patent 10,062,367 (patented 28 August 2018) (“Evans”); and US Patent Application Publication 2005/0109195 (published 26 May 2005) (“Haruyama”).
Claims 9 and 15 are rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ryabov; Lindholm; Fang and Evans.
Claim 10 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ryabov; Lindholm; Fang; Evans; Haruyama; and US Patent Application Publication 2024/0404496 (filed 15 September 2023) (“O’Neil”).
Claim 13 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ryabov; Lindholm; Fang; Evans; Haruyama; O’Neil; and US Patent 5,555,368 (patented 10 September 1996) (“Orton”).
Claim 14 is rejected under 35 U.S.C. § 103 as being unpatentable over the combination of Ryabov; Lindholm; Fang; Evans; Haruyama; and US Patent Application Publication 2024/0021174 (filed 7 April 2023) (“Lee”).
Claim 1 is drawn to “a display method.” The following table illustrates the correspondence between the Ryabov reference and the claimed method.
Claim 1
The Ryabov Reference
“1. A display method for a multimedia device, comprising:
The Ryabov reference describes a method for displaying multimedia content on a client device 102 through voice controls integrated into applications. Ryabov at col. 7 ll. 5–18, col. 8 ll. 25–42, FIGs.1, 7.
“receiving an operation instruction for the multimedia device,
Ryabov’s method includes receiving a spoken voice command, or operation instruction, at client device 102. Id. at col. 11 ll. 23–64, col. 17 l. 42 to col. 18 l. 59, FIG.3. The command relates to a desired operation of a third-party application 104 hosted on device 102. Id. For example, the command may be a spoken command to play back a particular song. Id.
“determining whether the received operation instruction matches a preset behavior library,
A voice support server 108 will receive the voice command on behalf of a third-party application 104 that does not natively support voice commands. Id. at col. 7 l. 19 to col. 8 l. 24. Voice support server 108 will derive the intent of the voice command based on pattern matching. Id. at col. 11 ll. 23–64, FIG.3. For example, a command to play a song might follow the spoken pattern, “play $song,” where $song refers to a particular song title. Id.
After mapping a command to an utterance vector, a set of domains 114, each corresponding to a third-party application, compares the utterance vector to a set of known intent vectors specified for the corresponding third-party application 104. Id. at col. 9 ll. 19–40, col. 17 ll. 18–41, col. 18 l. 47 to col. 19 l. 36, FIG.3. The intent vectors of each domain collectively represent a preset behavior library, as claimed. See id. (An illustrative sketch of this matching follows Table 1 below.)
“invoking preset response information associated with an audio file if the received operation instruction matches the preset behavior library, and
If the utterance vector matches a known intent vector, voice support server 108 causes third-party application 104 to perform a preset response according to instructions, or preset response information, delivered from voice support server 108 to application 104. Id. at col. 19 ll. 28–36, FIG.3. For example, third-party application 104 will perform a preset sequence of requesting a song from server 106, loading the song on device 102 and playing the song through client device 102. Id. at col. 7 ll. 47–56, col. 11 ll. 23–51, FIGs.1, 3.
“resuming receiving the operation instruction or displaying an alarm prompt if the received operation instruction does not match the preset behavior library; and
If the utterance vector does not match a known intent vector pattern, voice support server 108 will alert the user with an alarm prompt. Id. at col. 21 ll. 14–29.
“loading the audio file based on the preset response information,
Assuming a requested song is to be played back, the audio file will be loaded according to the instructions, or preset response information, delivered to third-party application 104 by voice support server 108. Id. at col. 7 ll. 47–56, col. 11 ll. 23–51, col. 19 ll. 28–36, FIGs.1, 3. For example, third-party application 104 may request and retrieve a song from database 106 so that client device 102 may load and play the song. Id. at col. 7 ll. 47–56, col. 11 ll. 23–51, FIG.1.
“invoking preloaded special effects in the preset response information if the audio file has been loaded, and
The Ryabov reference does not describe invoking preloaded special effects along with an audio file.
“displaying a standby screen if the audio file failed to be loaded.”
The Ryabov reference also does not describe displaying a standby screen if an audio file fails to load.
Table 1
The table above shows that the Ryabov reference describes a method that corresponds closely to the claimed method. Ryabov does not anticipate the claim because it does not invoke preloaded special effects in preset response information after loading an audio file and does not display a standby screen if the audio file fails to be loaded.
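For purposes of illustration only, and not as a characterization of Ryabov’s actual code, the following Python sketch models the kind of command-to-library matching summarized in Table 1. The library contents, the “$slot” pattern syntax and all function names are hypothetical assumptions.

```python
import re

# Hypothetical per-application "domains," each listing supported command
# patterns (intents); "$word" marks a slot such as a song title.
BEHAVIOR_LIBRARY = {
    "music_app": ["play $song", "pause"],
    "podcast_app": ["play $episode", "rewind $seconds seconds"],
}

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a pattern like 'play $song' into a regex capturing the slot."""
    parts = [rf"(?P<{tok[1:]}>.+)" if tok.startswith("$") else re.escape(tok)
             for tok in pattern.split()]
    return re.compile(r"^" + r"\s+".join(parts) + r"$", re.IGNORECASE)

def match_instruction(utterance: str):
    """Return (domain, pattern, slots) for the first matching intent, else None."""
    for domain, patterns in BEHAVIOR_LIBRARY.items():
        for pattern in patterns:
            m = pattern_to_regex(pattern).match(utterance)
            if m:
                return domain, pattern, m.groupdict()
    return None  # no match: re-receive the instruction or show an alarm prompt

print(match_instruction("play Twinkle Twinkle Little Star"))
# -> ('music_app', 'play $song', {'song': 'Twinkle Twinkle Little Star'})
```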
Special Effects
The differences between the claimed invention and the Ryabov reference are such that the invention as a whole would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention.
Ryabov’s method is performed by a system that plays requested songs at a client device 102. Ryabov at col. 7 ll. 5–18, col. 11 l. 23 to col. 13 l. 13, FIGs.1, 3. The Ryabov reference focuses primarily on providing a voice interface to third-party applications that otherwise do not natively support voice commands. Ryabov at Abs. Ryabov does, however, address the playback of audio at client device 102.
The Lindholm reference is related to Ryabov because it describes methods and systems for playing audio at a client device, and further describes the playing of audio along with synchronized lyrics and special effects applied to the lyrics. Lindholm at Abs., ¶¶ 2, 4, 187–194, FIGs.6A, 6B. According to Lindholm, the presentation of timed lyrics with synchronized visual effects increases the efficiency of using a music reproduction device since a user can listen to music and follow along with the lyrics at the same time. Id. In an example, a song is added to a playlist and the song is played while simultaneously displaying corresponding lyrics that are highlighted with visual effects synchronized to the timing of the lyrics in the song. Id. at ¶¶ 191, 192, FIGs.6A, 6B.
Read in light of Ryabov, Lindholm’s teachings would have reasonably suggested modifying Ryabov’s method and system to also display time-synchronized lyrics and effects along with the audio of a song. For example, one of ordinary skill would have modified Ryabov’s third-party application 104 to not only load and play back a requested song, but also to load and play back lyrics and visual effects applied to the lyrics.
Standby Screen
Ryabov’s method is performed by a system that includes a third-party application 104 and a third-party application server 106. When commanded by a user, third-party application 104 will attempt to obtain a song from server 106 and provide it to the user. The Ryabov reference treats application 104 and server 106 as known entities and does not address their detailed operation. Accordingly, Ryabov does not describe the claimed display of a standby screen in the event the requested song fails to load.
The Fang reference relates to Ryabov because both are drawn to problems involved in searching for audio content. Ryabov at col. 7 ll. 19–34, col. 11 ll. 14–34; Fang at Abs., ¶ 75. In particular, Fang teaches and suggests that there may be a delay in executing the search and that the search process should display information concerning the search. See Fang at ¶ 75, FIGs.3, 4 (describing and depicting search screens). The information acts as feedback to the user, increasing the user’s perception of how the system is processing the user’s search. Id. This suggests displaying, during long searches that a user may deem a failure to timely load a song, a standby screen with information pertaining to the search in order to provide the user with feedback that enhances the user’s awareness of and interaction with the system. For the foregoing reasons, the combination of the Ryabov, the Lindholm and the Fang references makes obvious all limitations of the claim.
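For illustration only, the following sketch shows one way the suggested combination could present standby feedback during a slow load. The timeout value and all function names are hypothetical assumptions, not details taken from Ryabov, Lindholm or Fang.

```python
import concurrent.futures
import time

def fetch_song(title: str) -> bytes:
    """Stand-in for a slow request to a third-party application server."""
    time.sleep(3)
    return b"audio bytes for " + title.encode()

def load_with_standby(title: str, timeout_s: float = 1.0) -> bytes:
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_song, title)
        while True:
            try:
                return future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # Standby screen: feedback about the in-progress search.
                print(f'Still searching for "{title}"... please wait.')

print(len(load_with_standby("Twinkle Twinkle Little Star")), "bytes loaded")
```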
Claim 2 depends on claim 1, and further requires the following:
“wherein the preset behavior library comprises a plurality of preset operation instructions individually loaded with preset operation categories and preset operation objects; and
“the operation instruction for the multimedia device is loaded with a target operation category and a target operation object.”
The Ryabov reference describes a plurality of domains 114 that form a library having a plurality of preset operation instructions (e.g., one or more supported commands for each corresponding third-party application) that each have a preset operation category and a preset operation object. Ryabov at col. 9 ll. 19–40, col. 17 ll. 18–41, col. 18 l. 47 to col. 19 l. 36, FIG.3. For example, for an audio-playing third-party application, a corresponding domain includes a preset operation category, like “play,” and a preset operation object, like “$song.” Id. Thus, when a user commands playback of a particular song, voice support server 108 loads an instruction with the target operation category of “play” and a target operation object of an identifier for a song. Id. For the foregoing reasons, the combination of the Ryabov, the Lindholm and the Fang references makes obvious all limitations of the claim.
Claim 3 depends on claim 2, and further requires the following:
“wherein the matching comprises:
“receiving a target operation instruction for the audio file, and
“acquiring a target operation category and a target operation object;
“determining whether the target operation object matches one of the preset operation objects after determined that the target operation category matches the corresponding preset operation category;
“invoking the preset response information associated with the audio file if the target operation object matches a preset operation object; and
“requesting to re-receive the operation instruction or display an alarm prompt if the target operation object matches none of the preset operation objects.”
Ryabov describes voice support server 108 as receiving an audio signal 308 and transcribing it in block 310 to obtain a text version of the audio. Ryabov at col. 18 l. 38 to col. 19 l. 2, FIG. 3. The transcription is vectorized 312 into an utterance vector comprising a target operation (e.g., play) and a target operation object (e.g., $song). Id. Ryabov determines if the utterance vector matches a known intent vector. Id. at col. 19 ll. 3–27, FIG.3. This process involves matching the utterance vector’s target operation to a known operation and matching a target operation object to a known object. Id. For example, Ryabov describes an expansion step that includes contacting third-party server 106 to determine if an object is maintained by the server. Id. at col. 12 ll. 3–40, FIGs.3, 4. If the utterance vector matches an intent vector, server 108 will generate instructions for third-party application 104 to carry out the voice command. Id. at col. 19 ll. 28–36, FIG.3. Otherwise, server 108 will cause third-party application 104 to display an error. Id. at col. 21 ll. 14–29.
Ryabov describes converting an utterance vector into a more general semantic vector for comparison with a corpus of intent vectors. Ryabov at col. 13 l. 14 to col. 14 l. 6, col. 20 l. 36 to col. 21 l. 57, FIG.5. For example, an utterance vector may be mapped to a first utterance vector and then mapped to a more specific, second utterance vector. Id. A first set of intent vectors is identified using the first utterance vector, and then a closest intent vector is identified from the first set of vectors using the second utterance vector. Id. Thus, a more specific comparison of an utterance vector occurs only after a more general comparison with a less precise utterance vector. Id. This reasonably suggests performing the first, general comparison with a non-specific utterance vector that expresses only the command, without also expressing the object (e.g., song title), in order to reduce the number of vector comparisons that need to be made. For the foregoing reasons, the combination of the Ryabov, the Lindholm and the Fang references makes obvious all limitations of the claim.
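For illustration only, the following sketch models the coarse-to-fine comparison just described: a command-only vector first narrows the candidate intents, and a command-plus-object vector then selects the closest one. The vectors, intent names and similarity measure are hypothetical assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# (coarse command-only vector, fine command-plus-object vector, intent name)
INTENTS = [
    ((1.0, 0.0), (1.0, 0.0, 0.2, 0.9), "play_song"),
    ((1.0, 0.1), (1.0, 0.1, 0.9, 0.1), "play_podcast"),
    ((0.0, 1.0), (0.0, 1.0, 0.5, 0.5), "pause"),
]

def match(coarse_utt, fine_utt, keep=2):
    # Stage 1: cheap comparison using only the command portion.
    survivors = sorted(INTENTS, key=lambda i: cosine(coarse_utt, i[0]),
                       reverse=True)[:keep]
    # Stage 2: precise comparison over the surviving candidates only.
    return max(survivors, key=lambda i: cosine(fine_utt, i[1]))[2]

print(match((0.9, 0.1), (0.9, 0.1, 0.1, 0.8)))  # -> play_song
```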
Claim 4 depends on claim 2, and further requires the following:
“wherein the matching comprises:
“requesting to re-receive the operation instruction or displaying an alarm prompt if the target operation category matches none of the preset operation categories.”
Ryabov describes that if the utterance vector matches an intent vector, server 108 will generate instructions for third-party application 104 to carry out the voice command. Ryabov at col. 19 ll. 28–36, FIG.3. Otherwise, server 108 will cause third-party application 104 to display an error. Id. at col. 21 ll. 14–29. For the foregoing reasons, the combination of the Ryabov, the Lindholm and the Fang references makes obvious all limitations of the claim.
Claim 5 depends on claim 1, and further requires the following:
“wherein the preset response information is generated by:
“extracting lyrics data and melody data of the audio file from metadata,
“wherein the lyrics data comprises characters, the number of the characters and character time characteristics, and the melody data comprises a melody spectrum and a melody type;
“binding the lyrics data and the melody data to a lyrics special effects file that matches the lyrics data and the melody data, to form first special effects;
“binding the melody data to background special effects according to the melody type, to form second special effects; and
“binding the first special effects and the second special effects to generate the preset response information.”
The obviousness rejection of claim 1, incorporated herein, shows the obviousness of modifying Ryabov’s method and system to display synchronized lyrics and special effects according to the teachings of Lindholm. Lindholm teaches and suggests extracting lyrics and melody data from audio file metadata. For example, Lindholm’s method accesses metadata about a song, including lyric characters, the number of characters on a line, the length of time each character is sung, melody spectrum (e.g., volume, timbre, pitch) and melody type (e.g., valence information, lead singer or background singer). Lindholm at ¶¶ 31, 338, 340, 344, FIG.8Q. Lindholm further teaches creating first special effects by binding lyrics with melody data. For example, a first effect includes displaying lyrics from a lead singer in one color while displaying lyrics from a backup singer in another color, displaying longer words with larger fonts and producing background effects, like explosions, for longer words. Lindholm further describes secondary background effects, like stars that change color or shape corresponding to the audio characteristics of a song.
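For illustration only, the following sketch groups the metadata fields enumerated above the way claim 5 groups them. The class and field names are hypothetical assumptions, not Lindholm’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class LyricsData:
    characters: str                  # the lyric text itself
    num_characters: int              # number of characters on the line
    char_times: list = field(default_factory=list)  # (start_s, end_s) per unit sung

@dataclass
class MelodyData:
    spectrum: dict                   # e.g., volume, timbre, pitch
    melody_type: str                 # e.g., 'lead_singer' or 'background_singer'

line = LyricsData("twinkle twinkle", 15,
                  [(0.0, 0.8), (0.9, 1.7)])   # word-level times for brevity
melody = MelodyData({"volume": 0.7, "pitch": "C4"}, "lead_singer")
print(line.num_characters, melody.melody_type)
```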
Lindholm does not describe the use of a file in binding first special effects. The Evans reference describes a method for applying vocal effects to a song and creating a vocal effects timeline (“VET”) that binds the effects to the audio of a song in a synchronized way. Evans at col. 8 l. 4 to col. 10 l. 30, FIGs.3–5. Evans further teaches saving the VET in a file for use in the future as a karaoke file. Id. at col. 5 ll. 18–29.
While Evans describes using the file to create a synchronized arrangement of audio effects, one of ordinary skill would have reasonably recognized that the file would be useful for organizing and synchronizing any type of effect, including visual effects, with an underlying audio file. For example, Evans also suggests displaying synchronized lyrics. Id. at col. 8 l. 4 to col. 10 l. 30, FIGs.3–5.
Based on these teachings, one of ordinary skill would have programmed Ryabov’s third-party application 104 to create a file in memory that stores lyrics and lyric effects while playing a song so that the lyrics and effects would be reproduced in sync with the audio of the song. One of ordinary skill would have further recognized that binding effects to a file would allow the effects to be reproduced in the future with a reduced amount of computation since the effects would not have to be recreated.
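For illustration only, the following sketch shows effects bound once into a reusable timeline file, so later playbacks avoid recomputation. The JSON layout, file name and field names are hypothetical assumptions.

```python
import json

def build_effects_timeline(lyrics):
    """lyrics: list of (text, start_s, end_s) tuples -> bound effect entries."""
    return [{"text": text, "start": start, "end": end,
             "effect": "gradient_sweep"}          # swept as the word is sung
            for text, start, end in lyrics]

lyrics = [("twinkle", 0.0, 0.8), ("twinkle", 0.9, 1.7),
          ("little", 1.8, 2.2), ("star", 2.3, 3.4)]

# Bind once and save, so later playbacks reuse the file without recomputation.
with open("song_effects.json", "w") as f:
    json.dump(build_effects_timeline(lyrics), f, indent=2)

with open("song_effects.json") as f:
    print(len(json.load(f)), "effect entries reloaded from file")
```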
Lindholm also does not describe determining secondary background effects based on melody type. The Haruyama reference, however, teaches and suggests selecting and displaying a background image based on a song’s musical genre. Haruyama at ¶ 94. This would have reasonably suggested modifying Ryabov’s method and system to similarly display a background image based on a song’s musical genre, or melody type. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the Haruyama references makes obvious all limitations of the claim.
Claim 6 depends on claim 5, and further requires the following:
“wherein the first special effects are formed by:
“acquiring of duration of a single character of the audio file,
“performing interval slicing on the duration,
“matching up lyric special effects with the lyrics data according to the slicing result, and
“binding the lyric special effects with the lyrics data to obtain the first special effects.”
Lindholm teaches and suggests determining the duration of each lyrics character, slicing its duration into intervals and syncing (i.e., matching up) a lyric special effect with the intervals. Lindholm at ¶¶ 189–194, FIGs.6A, 6B. For example, Lindholm displays characters corresponding to syllables of sung lyrics. Id. The characters are displayed with a gradient color that dynamically moves as the corresponding syllable is sung. Id. Each tick of the gradient represents a sliced interval over the entire duration of the character. Id. Thus, Lindholm matches a gradient effect to characters of a set of lyrics so the timing of the effect is aligned with the start of each syllable in the lyrics and extends in duration for a number of sliced intervals corresponding to the duration of the lyrics in the audio.
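For illustration only, the following sketch models the interval slicing just described: each character’s sung duration is sliced into ticks, and each tick advances a gradient one step. The tick size, function names and data structures are hypothetical assumptions.

```python
def slice_duration(start_s: float, end_s: float, tick_s: float = 0.1):
    """Split a character's sung duration into equal ticks."""
    ticks, t = [], start_s
    while t < end_s:
        ticks.append((t, min(t + tick_s, end_s)))
        t += tick_s
    return ticks

def gradient_keyframes(char: str, start_s: float, end_s: float):
    """One keyframe per tick; 'progress' drives how far the gradient has swept."""
    ticks = slice_duration(start_s, end_s)
    return [{"char": char, "time": a, "progress": (i + 1) / len(ticks)}
            for i, (a, _) in enumerate(ticks)]

for kf in gradient_keyframes("twin", 0.0, 0.4):
    print(kf)   # gradient sweeps 25%, 50%, 75%, 100% across 'twin'
```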
As expressed in the rejection of claim 5, incorporated herein, the Evans reference teaches and suggests binding lyric special effects into a file for retrieval and usage at a later time. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the Haruyama references makes obvious all limitations of the claim.
Claim 7 depends on claim 6, and further requires the following:
“wherein the matching up lyric special effects with the lyrics data comprises:
“anchoring the audio file based on the slicing result and rhythm recognition on the melody spectrum,
“wherein the audio file is provided with a plurality of anchor tags at its corresponding melody position and/or character position; and
“matching up the lyric special effects with the lyrics data based on a positional relationship of the anchor tags.”
Lindholm teaches synchronizing, or matching, lyric characters with corresponding audio in a song. Lindholm at ¶ 193. In particular, Lindholm teaches matching each character of a set of lyrics to a corresponding portion of audio so the character is displayed at the same time as a corresponding portion of audio. Lindholm at ¶¶ 189–194, FIGs.6A, 6B. Lindholm further teaches graphically altering the characters over a set of time slices corresponding to the character—for example, a color gradient is applied and then gradually swept across the character as the corresponding audio portion is reproduced. Id. Lindholm teaches that the display and gradient are based on the duration of a corresponding syllable in the underlying audio. Id. This knowledge fundamentally requires tagging start and end points in the audio that correspond to characters expressing the syllables of the sung audio. See id. For example, if a song has the lyrics “twinkle, twinkle little star,” Lindholm requires analyzing the audio to tag the start and end points of the syllables in the word “twinkle” (i.e., twi-n-kle) to determine the duration of the word’s syllables. See id. Then a gradient effect will be matched to the characters of each syllable at the same time the syllables are being sung in the audio and the gradient will progress over the characters for the same duration that the syllables are sung. See id. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the Haruyama references makes obvious all limitations of the claim.
Claim 8 depends on claim 7, and further requires the following:
“wherein the matching up the lyric special effects with the lyrics data based on a positional relationship of the anchor tags comprises:
“matching up special effects with an audio file interval between two adjacent anchor tags according to density of the plurality of anchor tags and the slicing result,
“wherein the audio file matches a plurality of special effects on its timeline.”
Lindholm teaches and suggests matching up lyric special effects, such as size, gradient and background effects (e.g., an explosion) to correspond to start and end points, or anchor tags, in corresponding audio. Lindholm at ¶¶ 189–194, FIGs.6A, 6B. This process produces a plurality of special effects on a timeline since it applies effects for all syllables of a song’s lyrics. One effect is a gradient effect that is applied to characters forming a syllable. Id. The gradient is aligned with the start and end of audio corresponding to the syllable so the gradient is first applied when the syllable is first sung and then progresses over the syllable’s characters as the syllable continues to be sung. Id. The gradient effect effectively has a duration corresponding to the duration of the syllable in the audio. Id. Further, Lindholm applies additional effects, like fireworks, based on the duration of a syllable exceeding a threshold. Id. at ¶ 206, FIG.6I. Checking a syllable’s duration against a threshold produces a density estimate (i.e., density of no more than one syllable per unit time). See id. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the Haruyama references makes obvious all limitations of the claim.
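For illustration only, the following sketch matches an effect to each interval between adjacent anchor tags and adds an extra effect, like a firework, when an interval’s duration exceeds a threshold (i.e., when the anchor density is low). The anchor values and threshold are hypothetical assumptions.

```python
ANCHORS = [0.0, 0.4, 0.8, 1.2, 3.0, 3.2]   # syllable boundaries in seconds
LONG_SYLLABLE_S = 1.0                       # assumed duration threshold

def effects_for_intervals(anchors, threshold_s):
    """Pair each inter-anchor interval with one or more effects."""
    effects = []
    for start, end in zip(anchors, anchors[1:]):
        effects.append((start, end, "gradient"))
        if end - start > threshold_s:       # sparse anchors = long syllable
            effects.append((start, end, "firework"))
    return effects

for e in effects_for_intervals(ANCHORS, LONG_SYLLABLE_S):
    print(e)   # the 1.2-3.0 s interval also gets a 'firework'
```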
Claim 9 depends on claim 1, and further requires the following:
“wherein the invoking preloaded special effects in the preset response information comprises:
“adding the special effects in the preset response information to a special effects sequence, and adding the audio file to an audio sequence;
“determining whether the special effects sequence has been aligned with the audio sequence;
“regenerating the preset response information if determined that the special effects sequence has not aligned with the audio sequence; and
“simultaneously invoking the preset response information and the audio file of the same sequence according to the correspondence between the special effects sequence and the audio sequence, and
“adding the preset response information and the audio file of the same sequence to a playing queue.”
The Evans reference discussed in the rejection of claim 5, incorporated herein, teaches and suggests modifying Ryabov’s system to include an effect editing function. The effect editing function includes loading an effect file and an audio file to add the effects to an effects sequence and to add the audio to an audio sequence. Evans at col. 5 ll. 18–29, col. 8 l. 4 to col. 10 l. 30, FIGs.3–5. A user interface displays the effect and audio sequences on a display. Id. A user uses the interface to edit the effects as desired. Id.
While Evans is drawn to adding vocal effects, one of ordinary skill would have reasonably recognized from Lindholm that a user would also want to add visual effects to a song. For example, Evans also suggests displaying synchronized lyrics. Id. at col. 8 l. 4 to col. 10 l. 30, FIGs.3–5. Accordingly, it would have been obvious to modify Ryabov’s system to include an effect editing function like the one taught by Evans and to modify Evans’s effect editing function to further support customization of visual effects.
In the context of Lindholm’s synchronized effects, one of ordinary skill would understand that a manual user review would naturally include determining if the effect sequence is properly aligned with the corresponding audio sequence. For example, a user would determine if gradient effects for a particular syllable are set to be displayed in sync with the corresponding audio portions. If the effects are not in sync, a user would adjust the start and end times of the effect to better align the effects with the audio. Then a user would load the audio file and the visual effects file into a playlist for simultaneous playback. See Lindholm at ¶ 191. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang and the Evans references makes obvious all limitations of the claim.
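For illustration only, the following sketch models the alignment check and queueing just described: effect start times are checked against audio start times, regenerated if misaligned and then queued as same-sequence pairs. The tolerance and all names are hypothetical assumptions.

```python
TOLERANCE_S = 0.05   # assumed alignment tolerance

def is_aligned(effect_starts, audio_starts, tol=TOLERANCE_S):
    """True when every effect starts within tolerance of its audio entry."""
    return (len(effect_starts) == len(audio_starts) and
            all(abs(e - a) <= tol for e, a in zip(effect_starts, audio_starts)))

def enqueue(effect_starts, audio_starts, regenerate):
    """Regenerate misaligned preset response information, then queue pairs."""
    while not is_aligned(effect_starts, audio_starts):
        effect_starts = regenerate(audio_starts)
    return list(zip(effect_starts, audio_starts))   # same-sequence pairs

queue = enqueue([0.0, 0.7, 1.0], [0.0, 0.5, 1.0],
                regenerate=lambda audio: list(audio))  # realign to the audio
print(queue)
```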
Claim 10 depends on claim 9, and further requires the following:
“wherein displaying comprises:
“after the audio file is loaded according to one or more pieces of preset response information, saving the one or more pieces of preset response information to the playing queue,
“determining whether to employ fast special effects,
“determining a queue length based on special effects relationship between the audio file and a next audio file in the playing queue, and
“determining connection special effects; and
“binding, the target lyrics to be changed to corresponding special effects for display and playing the audio file, based on the order of the playing queue.”
Lindholm teaches loading an audio file into a playlist. See Lindholm at ¶ 191. Combined with the teachings of Evans on effects files, this operation suggests also loading an effects file into a playlist for simultaneous reproduction with the audio file. See Evans at col. 5 ll. 18–46.
Lindholm similarly teaches and suggests applying fast special effects, like a firework exploding, if the duration of a single character exceeds a threshold. Lindholm at ¶¶ 206, 250, FIG.6I.
Lindholm teaches and suggests binding special effects to target lyrics for display. For example, Lindholm binds gradient effects to target lyrics. Id. at ¶¶ 187–194, FIGs.6A, 6B.
Lindholm teaches and suggests playing an audio file in the order of a playlist (i.e., playing queue), along with its bound special effects. Id. at ¶¶ 191, 285.
The Ryabov, Lindholm, Fang and Evans references do not describe, teach or suggest determining a queue length and connection special effects. However, the O’Neil reference teaches the idea of adding transition, or connection, effects between audio valences at any level of granularity. O’Neil at ¶¶ 31, 37, FIG.3. This would have reasonably suggested adding connection effects between songs, or parts of songs, in a playlist, which is just another type of valence (i.e., a valence of a song). For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the O’Neil references makes obvious all limitations of the claim.
Claim 13 depends on claim 9, and further requires the following:
“wherein preset response information in non-first order in the playing queue is processed with asynchronous threads; and
“the connection special effects are inserted into an audio file that has not been completely played for transition after the operation instruction is received,
“wherein the connection special effects start at a breakpoint of a previous song and end at a first anchor tag of a next song.”
The Ryabov, Lindholm, Fang and Evans references do not describe, teach or suggest the claimed connection special effects or their processing with asynchronous threads. However, the O’Neil reference teaches the idea of adding transition, or connection, effects between audio valences at any level of granularity. O’Neil at ¶¶ 31, 37, FIG.3. This would have reasonably suggested adding similar connection effects between songs, or parts of songs, in a playlist, which is just another type of valence (i.e., a valence of a song).
Further, the Orton reference teaches and suggests the use of plural asynchronous threads to manage the display of multiple visual effects on a screen at one time. Orton at col. 4 l. 53 to col. 5 l. 30. This teaching would have reasonably suggested using asynchronous threads in Ryabov’s method and system to coordinate the display of multiple lyrical effects, background animations and videos. And in connection with O’Neil, it would have suggested using asynchronous threads for displaying transition effects between songs or parts of songs while additional effects are being displayed by other asynchronous threads. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans, the O’Neil and the Orton references makes obvious all limitations of the claim.
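For illustration only, the following sketch uses asynchronous threads so a transition effect can run while other threads continue rendering lyric and background effects. The effect names and frame counts are hypothetical assumptions.

```python
import threading
import time

def run_effect(name: str, frames: int, frame_s: float = 0.05):
    """Render one effect frame-by-frame (printing stands in for drawing)."""
    for i in range(frames):
        print(f"{name}: frame {i + 1}/{frames}")
        time.sleep(frame_s)

threads = [
    threading.Thread(target=run_effect, args=("lyric gradient", 4)),
    threading.Thread(target=run_effect, args=("background stars", 4)),
    threading.Thread(target=run_effect, args=("song transition", 4)),
]
for t in threads:
    t.start()           # all three effects progress concurrently
for t in threads:
    t.join()            # wait for every effect to finish
```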
Claim 14 depends on claim 9, and further requires the following:
“wherein the binding the target lyrics to be changed to corresponding special effects for display and playing the audio file comprises:
“receiving an anchor tag of the audio file;
“loading the audio file and performing signal feedback and recording a first receiving time and a first sending time of the feedback after the anchor tag of the audio file is received,
“reading a next anchor tag of the audio file along the timeline and recording a second receiving time and a second feedback time after the audio file is loaded; and
“matching up endpoints of the second receiving time and the second feedback time with endpoints of a duration of the special effects, and determining timing of playing of next special effects data.”
The Lee reference teaches and suggests a main controller that outputs audio and video. Lee at ¶¶ 62–94, FIGs.2, 3. Each time the controller outputs audio and video, it also synchronously outputs a signal to a feedback circuit 280. Id. Feedback circuit 280 receives the signals and logically compares (XOR) them to highlight any timing differences between them. Id. Main controller 211 then calculates delays AVD1 and AVD2 between the audio and video based on the times the first and second signals were raised (T1, T3) and the times the first and second feedbacks were received (T2, T4). Id. Main controller 211 then adjusts the timing of the audio output, the video output or both to reduce AVD1 and AVD2 below a threshold duration. Id.
The teachings of Lee would have reasonably suggested modifying Ryabov’s method and system to also maintain audio/video sync with a similar feedback mechanism. Applied to Ryabov, a song would be output for reproduction at least at every anchor point. This would cause a series of signals to be output (T1, T3) followed by the reception of a series of feedback signals (T2, T4). Ryabov’s method and system would then calculate delays AVD1, AVD2 and adjust the timing of special effects or audio in order to reduce the delays below a threshold duration. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang, the Evans and the Lee references makes obvious all limitations of the claim.
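For illustration only, the following sketch works through the kind of delay calculation suggested by Lee, using invented timing values: signal-raise times (T1, T3) and feedback-receipt times (T2, T4) yield an offset that is compared against a threshold. The values, threshold and function names are hypothetical assumptions, not Lee’s actual computation.

```python
THRESHOLD_S = 0.020   # assumed maximum tolerable audio/video offset

def av_delay(t1, t2, t3, t4):
    """Offset between the streams from their feedback latencies."""
    audio_latency = t2 - t1          # audio raised at T1, feedback at T2
    video_latency = t4 - t3          # video raised at T3, feedback at T4
    return video_latency - audio_latency

t1, t2 = 10.000, 10.012              # invented audio timestamps (seconds)
t3, t4 = 10.000, 10.047              # invented video timestamps (seconds)

delay = av_delay(t1, t2, t3, t4)     # 0.035 s: video lags audio
if abs(delay) > THRESHOLD_S:
    print(f"Adjust timing by {delay * 1000:.0f} ms to resync")
```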
Claim 15 depends on claim 1, and further requires the following:
“A display system, configured to perform the display method according to claim 1, and
“the display system comprising:
“a multimedia device configured to receive user operation and display an audio file and a video;
“an interactive module configured to be linked to the user operation or a signal;
“a storage module configured to store a preset behavior library, a music library, and a special effects library;
“a matching module configured to perform special effects matching and playing matching; and
“a control module configured to operate display elements of the multimedia device during playing to dynamically display images.”
The Ryabov reference describes a system 100 that corresponds to the claimed display system. Ryabov at col. 6 ll. 42–49, FIG.1. Ryabov’s system includes a client device 102 corresponding to the claimed multimedia device. Id. at col. 7 ll. 5–18, FIG.1.
Client device 102 includes a user interface, speakers and a screen to display audiovisual information. Id.
System 100 further includes a voice support server 108 that corresponds to the claimed interactive module since it receives a user’s voice commands. Id. at col. 8 l. 53 to col. 9 l. 8, FIG.1.
The claimed storage module corresponds to Ryabov’s voice support server 108 that stores domains 114 corresponding to the claimed preset behavior library and Ryabov’s third-party application server 106 that stores music. Id. at col. 7 ll. 47–56, col. 9 ll. 9–51, FIG.1. While Ryabov does not describe storing a special effects library, the rejection of claim 5, incorporated herein, shows that it would have been obvious to create and store special effect files. One of ordinary skill would have reasonably chosen to store those special effect files as a library on third-party application server 106 since it already stores a library of corresponding audio files.
Ryabov’s third-party application 104 corresponds to the claimed matching and control modules since it is responsible for Ryabov’s display and playing operations. Id. at col. 11 ll. 52–64, col. 14 ll. 33–53, FIG.7 (depicting one example of a GUI). And as shown in the obviousness rejection of claim 1, incorporated herein, it would have been obvious to match and play special effects along with an audio file. Accordingly, one of ordinary skill would have reasonably chosen to modify third-party application 104 to match special effects to audio and control the display of client device 102 to display the special effects as images on the display. For the foregoing reasons, the combination of the Ryabov, the Lindholm, the Fang and the Evans references makes obvious all limitations of the claim.
Summary
Claims 1–10 and 13–15 are rejected under 35 U.S.C. § 103 as being unpatentable over the cited prior art. In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
Issues Under 35 U.S.C. § 112
Indefiniteness
The following is a quotation of 35 U.S.C. § 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 10–12 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.
Claims 10–12 recite: “the target lyrics to be changed.” There is insufficient antecedent basis for this term. Neither claim 10 nor its base claims define “target lyrics to be changed.”
Claim 10 also recites: “fast special effects.” The term “fast” is a relative term that is not described in the Specification in a way that allows one of ordinary skill to objectively categorize a special effect as fast, as opposed to a normal or slow special effect.
Claims 11 and 12 include further logical, grammatical and antecedent problems that would prevent one of ordinary skill in the art from reasonably understanding the scope of the claims. Claim 11 recites:
“retrieving duration of a single character in the audio file and comparing the duration with a preset threshold; and
“secondary anchoring endpoints of which the number of characters exceeds the preset threshold within the preset display time, and
“readding the fast special effects.”
The limitation “secondary anchoring endpoints of which the number of characters exceeds the preset threshold within the preset display time” is unclear. The preceding comparing limitation indicates that a single character’s duration is compared to a preset threshold. The secondary anchoring limitation, however, requires anchoring endpoints “of which the number of characters exceeds the preset threshold within the preset display time.” This is illogical because the preceding limitation compares a character’s duration, not a number of characters, to the preset threshold.
Claim 11 lacks antecedent basis. In particular, the term “the preset display time” refers to a preset display time that is not recited in claim 11 or any of its base claims.
Claim 11 further recites: “readding the fast special effects.” It is unclear how fast special effects can be readded when they were never removed. This renders the scope of the limitation “readding the fast special effects” unclear.
Claims 11 and 12 depend on claim 10, and claim 12 further depends on claim 11. Thus, claims 11 and 12 are indefinite for the same reasons as claim 10, and claim 12 is indefinite for the same reasons as claim 11. For these reasons, claims 10–12 are rejected under 35 U.S.C. § 112(b) as indefinite.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WALTER F BRINEY III, whose telephone number is (571) 272-7513. The examiner can normally be reached M-F 8 am-4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Carolyn Edwards, can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Walter F Briney III/
/CAROLYN R EDWARDS/Supervisory Patent Examiner, Art Unit 2692
Walter F Briney III
Primary Examiner
Art Unit 2692
12/12/2025