Last updated: April 19, 2026

Application No. 18/843,231

AUDIO PLAYING METHOD AND APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM

Final Rejection §103

Filed

Aug 30, 2024

Examiner

LIN, JASON K

Art Unit

2425

Tech Center

2400 — Computer Networks

Assignee

BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.

OA Round

2 (Final)

This examiner grants 49% of resolved cases, with a +34.8% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 454 resolved cases, 2023–2026

Examiner Intelligence

LIN, JASON K

Grants 49% of resolved cases

Career Allow Rate

221 granted / 454 resolved

-9.3% vs TC avg

Strong +35% interview lift

Interview Lift: +34.8% (resolved cases without vs. with interview)

Typical timeline

3y 7m

Avg Prosecution

28 currently pending

Career history

482

Total Applications

across all art units

Statute-Specific Performance

§101: 5.2% (-34.8% vs TC avg)

§103: 61.2% (+21.2% vs TC avg)

§102: 16.0% (-24.0% vs TC avg)

§112: 9.3% (-30.7% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 454 resolved cases

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This office action is responsive to application No. 18/843,231 filed on 12/03/2025.  Claim(s) 2, 8, 12, and 18 are canceled. Claim(s) 1, 3-7, 9-11, 13-17, and 19-21 is/are pending and have been examined.
Claim Objections
Claim(s) 3 is/are objected to because of the following informalities:
	Claim 3 recites:
“wherein the switching switching from playing the current audio to playing the target audio at the second time during the process of playing the video based on the target audio”
	
There is an additional recitation of “switching”. Please amend to:	--wherein the --

Appropriate correction is required.



Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 3-7, 9-11, 13-17, and 19-21 have been considered but are moot in view of the new ground(s) of rejection.
Although new ground(s) of rejection have been made, some of Applicant’s arguments need to be addressed.
Applicants assert on P.11 that “Instead, Swaminathan appears to suggest switching to playing the different audio sub-stream whenever the user makes the request to switch, without any regard to I-frames of the video. As such, Swaminathan does not disclose switching from playing the current audio to playing the target audio at the second time during the process of playing the video based on the target audio data.”
In response, the Examiner respectfully disagrees. Swaminathan teaches in Paragraph 0041 that, as each video segment of sub-stream 120 has a corresponding audio segment of sub-stream 124 for playback, client device 102a refrains from playing back media segment 120a until the audio content of media segment 124a is also received and ready for playback. Paragraph 0053 teaches that where received media segments correspond to at least partially concurrent portions of the multimedia content, a client device may play back the received media segments at least partially concurrently, i.e. the content of those segments may be played back at least partially concurrently. For example, audio and video segments that correspond to substantially the same time period in the multimedia content may be played back together. Figs.2, Paragraph 0060 teaches a pattern of requesting unsent media segments, where unsent media segments are requested and sent in each repetition. Thus, switching to the target audio would result in a request for an unsent media segment of the target audio sub-stream, which would be subsequent to the segment of the current audio sub-stream.
The new reference, Hoffmann, teaches in combination that each segment of content, audio or video content, starts with an I-frame to enable seamless adaptation.
Thus, the combination of Swaminathan and Hoffmann teaches the claimed limitation(s).
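The segment-level reasoning above can be illustrated with a minimal sketch. It is not code from Swaminathan or Hoffmann. It assumes fixed-duration segments (Swaminathan describes segments of a "substantially fixed time period") and that segments are requested one at a time with no look-ahead buffer. The function names, the 4-second duration, and the returned dictionary are hypothetical.

```python
import math

def next_unsent_segment(playback_time: float, segment_duration: float) -> int:
    """Index of the first segment that starts after the one now playing.

    ASSUMPTION: every segment has the same duration and nothing beyond the
    current segment has been requested yet.
    """
    current_index = math.floor(playback_time / segment_duration)
    return current_index + 1

def switch_request(playback_time: float, segment_duration: float,
                   target_substream: str) -> dict:
    """Describe the request a client could issue after a switch instruction."""
    index = next_unsent_segment(playback_time, segment_duration)
    return {
        "substream": target_substream,            # e.g. "126" (Language B)
        "segment": index,                         # first segment not yet sent
        "start_time": index * segment_duration,   # when the new audio begins
    }

# A switch instruction received 7.3 s into playback with 4 s segments.
request = switch_request(7.3, 4.0, "126")
```

Under these assumptions the switch takes effect at the next segment boundary rather than at the instant of the instruction, which is the behavior the Examiner attributes to the combination.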

Please also see Office Action below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1, 3-7, 9-11, 13-17, and 19-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Swaminathan (US 2016/0182600) in view of Hoffmann et al. (US 2021/0258632).
Consider claims 1, 9, and 10, Swaminathan teaches an audio playing method, electronic device, and a storage medium comprising computer-executable instructions, wherein the computer-executable instructions, when executed by a computer processor, cause the computer processor to/comprising: at least one processor; and a storage device with at least one program stored thereon, the at least one program, when executed by the at least one processor, (Fig.1A-B, processor(s) 614, memory 612 - Fig.6, Paragraph 0086-0088) causes the at least one processor to:
receive an audio data switching instruction at a first time during a process of playing a video, wherein the audio data switching instruction comprises an instruction to switch from playing current audio associated with the video to playing target audio associated with the video (Paragraph 0033 teaches at least two sub-streams such as two of sub-streams 120, 122, 124, 126, and 128 are included in streaming multimedia content 116 to a client device. For example, the stream may include at least one video sub-stream, e.g. corresponding to sub-stream 120, and at least one audio sub-stream, e.g. corresponding to sub-stream 124. Paragraph 0034 teaches default sub-streams may be changed before or during the stream. Sub-streams in the stream may change during the stream, for example, as selected by the client device and/or the server. Client device may select a lower bitrate for the stream, such that the server switches from sending video segments of sub-stream 120 to sending video segments of sub-stream 122, e.g. client device may request segments of a lower bitrate sub-stream(s). A client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0070 teaches a client device, e.g. a user of the client device, could selectively switch to an audio sub-stream corresponding to a different language during a stream);
request, upon receiving the audio data switching instruction, target audio data associated with the target audio from a server during the process of playing current the video (Paragraph 0032 teaches sub-stream 124 is an audio sub-stream of multimedia content 116 corresponding to Language A, e.g. English. Sub-stream 126 is an audio sub-stream of multimedia content 116 corresponding to Language B, e.g. Spanish. Paragraph 0034 teaches default sub-streams may be changed before or during the stream. Sub-streams in the stream may change during the stream, for example, as selected by the client device and/or the server. Client device may select a lower bitrate for the stream, such that the server switches from sending video segments of sub-stream 120 to sending video segments of sub-stream 122, e.g. client device may request segments of a lower bitrate sub-stream(s). A client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0037 teaches the client device may use a request comprising a URL provided by the manifest to request a corresponding resource. Paragraph 0041 teaches client device 102a is to play back both sub-streams 120 and 124 substantially concurrently, for example, where sub-stream 124 is an audio sub-stream that accompanies sub-stream 120, which is a video sub-stream. Client device 102a may request media segment 120a and receive a response that includes media segment 120a. Client device 102a may subsequently request media segment 124a and receive a response that includes media segment 124a. Paragraph 0042 teaches a server can receive requests to stream media segments of multimedia content to a client device. Based on each request, the server can send to the client device a plurality of media segments of the multimedia content. Paragraph 0070 teaches the sub-streams included in a stream of multimedia content can vary throughout the stream. 
The client device could select at least one new sub-stream, which could optionally replace another sub-stream in the stream. A client device, e.g. a user of the client device, could selectively switch to an audio sub-stream corresponding to a different language during a stream);
determine a video image information of the video corresponding to the first time at which the audio data switching instruction was received; determine a content of the video that occurs after the video image information and that has not yet been played, wherein the content is to be played at a second time during the process of playing the video (Fig.1B, Paragraph 0029 teaches a plurality of sub-streams divided into a sequence of media segments, which can be played back in order by a video player on a client device. Each media segment may correspond to a substantially fixed time period of multimedia content 116. Paragraph 0034 teaches default sub-streams may be changed before or during the stream. Sub-streams in the stream may change during the stream, for example, as selected by the client device and/or the server. Client device may select a lower bitrate for the stream, such that the server switches from sending video segments of sub-stream 120 to sending video segments of sub-stream 122, e.g. client device may request segments of a lower bitrate sub-stream(s). A client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0041 teaches as each video segment of sub-stream 120 has a corresponding audio segment of sub-stream 124 for playback, client device 102a refrains from playing back media segment 120a until the audio content of media segment 124a is also received and ready for playback. Paragraph 0052 teaches media segments that are sent can correspond at least partially to concurrent portions of the multimedia content. In particular, these media segments may at least partially temporally overlap in the multimedia content, may completely overlap in the multimedia content, or may correspond to a substantially same time period in the multimedia content. Examples of concurrent portions of multimedia content are media segments in FIG. 1B that share the same letter in their reference signs. 
Paragraph 0053 teaches where received media segments correspond to at least partially concurrent portions of the multimedia content, a client device may play back the received media segments at least partially concurrently, i.e. the content of those segments may be played back at least partially concurrently. For example, audio and video segments that correspond to substantially the same time period in the multimedia content may be played back together. Paragraph 0061 teaches the requests could instead be for media segments in sub-stream 124 where the media segments of sub-stream 120, or of another sub-stream, are sent in response to the requests. As client(s) play back the content, when requesting to switch audio data, system would determine the playback time of the content, in order to request subsequent sub-stream(s) corresponding to the desired audio data, in order to receive subsequent audio sub-stream(s) along with the corresponding video sub-stream); and
switch from playing the current audio to playing the target audio at the second time during the process of playing the video based on the target audio data (Paragraph 0034 teaches default sub-streams may be changed before or during the stream. Sub-streams in the stream may change during the stream, for example, as selected by the client device and/or the server. Client device may select a lower bitrate for the stream, such that the server switches from sending video segments of sub-stream 120 to sending video segments of sub-stream 122, e.g. client device may request segments of a lower bitrate sub-stream(s). A client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0053 teaches where received media segments correspond to at least partially concurrent portions of the multimedia content, a client device may play back the received media segments at least partially concurrently, i.e. the content of those segments may be played back at least partially concurrently. For example, audio and video segments that correspond to substantially the same time period in the multimedia content may be played back together. Paragraph 0070 teaches the sub-streams included in a stream of multimedia content can vary throughout the stream. The client device could select at least one new sub-stream, which could optionally replace another sub-stream in the stream. A client device, e.g. a user of the client device, could selectively switch to an audio sub-stream corresponding to a different language during a stream. Once requested subsequent sub-stream(s) are received, the target audio and video sub-streams are played back together). 
Swaminathan does not explicitly teach wherein video image information is video image frame;
wherein content is an I-frame.
In an analogous art, Hoffmann teaches wherein video image information is video image frame (Paragraph 0084 teaches AV bitstream may include a sequence of video I-frames and video P-frames);
wherein content is an I-frame (Paragraph 0085 teaches each segment of content, audio content or video content, of the AV bitstream starts with an I-frame, to enable seamless adaptation, e.g., splicing or switching, at segment boundaries, e.g., at the start of a segment of the video content of a video elementary stream, indicated by a video I-frame, time-aligned with the start of a segment, indicated by an audio I-frame, of corresponding audio content of each of at least one audio elementary stream).
Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Swaminathan to include wherein video image information is video image frame; wherein content is an I-frame, as taught by Hoffmann, for the advantage of enabling seamless adaptation at segment boundaries (Hoffmann – Paragraph 0085), providing a high level of temporal accuracy, and allowing better and finer synchronization of audio/video content.
The combination of Swaminathan and Hoffmann teaches: determine a video image frame of the video corresponding to the first time at which the audio data switching instruction was received; determine an I-frame of the video that occurs after the video image frame and that has not yet been played, wherein the I-frame is to be played at a second time during the process of playing the video. Swaminathan teaches that, during playback of content, at a particular time during playback of a segment of the content, a subsequent segment of audio/video is requested for playback. The subsequent segment of the video is played back at a second time that occurs after the point at which the switching instruction is received, and that subsequent segment has not yet been played. Hoffmann teaches that each segment of content, such as video content, starts with an I-frame. Thus, in the combination, segments are made up of frames; the request is received at a certain frame within a segment of content, resulting in a request for subsequent segments that come after that particular frame/segment; and the subsequent segment, which starts with an I-frame, is determined and played back at the second time.
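The frame-level limitation discussed above (find the frame showing at the first time, then the next unplayed I-frame, whose presentation time is the second time) can be sketched as follows. This is only an illustration of the claim language, not the applicant's implementation and not code from either reference. The `Frame` type, the function name, and the sample timestamps are hypothetical, and frames are assumed to be sorted by presentation time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    pts: float        # presentation time in seconds (hypothetical field)
    is_iframe: bool   # True for an I-frame, False for P-frames and others

def next_unplayed_iframe(frames: List[Frame], first_time: float) -> Optional[Frame]:
    """Return the first I-frame after the frame displayed at first_time.

    ASSUMPTION: `frames` is sorted by pts. The frame displayed at first_time
    is taken to be the last frame whose pts is <= first_time.
    """
    pts_values = [f.pts for f in frames]
    current_index = bisect_right(pts_values, first_time) - 1
    for frame in frames[current_index + 1:]:
        if frame.is_iframe:
            return frame
    return None  # no later I-frame is available

# Two segments, each starting with an I-frame (per Hoffmann, Paragraph 0085).
frames = [Frame(0.0, True), Frame(0.5, False), Frame(1.0, False),
          Frame(1.5, False), Frame(2.0, True), Frame(2.5, False)]

# Switch instruction received at 0.7 s, while the frame at 0.5 s is on screen.
target = next_unplayed_iframe(frames, 0.7)
second_time = target.pts if target else None
```

In this sketch the second time falls on the start of the next segment, which is how the rejection maps a segment boundary onto the claimed I-frame.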

Consider claims 3, 13, and 19, Swaminathan and Hoffmann teach wherein the switching switching from playing the current audio to playing the target audio at the second time during the process of playing the video based on the target audio (Swaminathan - Paragraph 0034 teaches default sub-streams may be changed before or during the stream. Sub-streams in the stream may change during the stream, for example, as selected by the client device and/or the server. Client device may select a lower bitrate for the stream, such that the server switches from sending video segments of sub-stream 120 to sending video segments of sub-stream 122, e.g. client device may request segments of a lower bitrate sub-stream(s). A client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0053 teaches where received media segments correspond to at least partially concurrent portions of the multimedia content, a client device may play back the received media segments at least partially concurrently, i.e. the content of those segments may be played back at least partially concurrently. For example, audio and video segments that correspond to substantially the same time period in the multimedia content may be played back together. Paragraph 0070 teaches the sub-streams included in a stream of multimedia content can vary throughout the stream. The client device could select at least one new sub-stream, which could optionally replace another sub-stream in the stream. A client device, e.g. a user of the client device, could selectively switch to an audio sub-stream corresponding to a different language during a stream. Hoffmann – Paragraph 0085) comprises:
clearing cached data of the current audio after the second time, and reading and playing cached data of the target audio data from the second time (Swaminathan - Paragraph 0027). 
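The cache handling recited in claims 3, 13, and 19 can be pictured with a short sketch. It is an assumption-laden illustration rather than anything disclosed in Swaminathan Paragraph 0027, which only describes a cache for received content. The function name, the `(start_time, payload)` tuple layout, and the sample labels are hypothetical.

```python
from typing import List, Tuple

AudioChunk = Tuple[float, str]  # (start time in seconds, payload label)

def switch_audio_cache(current_cache: List[AudioChunk],
                       target_cache: List[AudioChunk],
                       second_time: float) -> List[AudioChunk]:
    """Build the playback queue around the switch point.

    Current-audio chunks starting at or after second_time are cleared, and
    target-audio chunks are read from second_time onward.
    ASSUMPTION: chunk boundaries line up with second_time.
    """
    kept = [chunk for chunk in current_cache if chunk[0] < second_time]
    upcoming = [chunk for chunk in target_cache if chunk[0] >= second_time]
    return kept + upcoming

current_cache = [(0.0, "en0"), (1.0, "en1"), (2.0, "en2"), (3.0, "en3")]
target_cache = [(1.0, "es1"), (2.0, "es2"), (3.0, "es3")]
queue = switch_audio_cache(current_cache, target_cache, 2.0)
```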

Consider claims 4, 14, and 20, Swaminathan and Hoffmann teach wherein the requesting, upon receiving the audio data switching instruction, the target audio data from the server comprises: establishing a data obtaining link for the target audio data with the server, and obtaining and caching the target audio data based on the data obtaining link (Swaminathan - Paragraph 0032 teaches sub-stream 124 is an audio sub-stream of multimedia content 116 corresponding to Language A, e.g. English. Sub-stream 126 is an audio sub-stream of multimedia content 116 corresponding to Language B, e.g. Spanish. Paragraph 0034 teaches a client device may similarly select between sub-streams 124 and 126 to select the language to be played with the video content. Paragraph 0037 teaches the client device may use a request comprising a URL provided by the manifest to request a corresponding resource. Paragraph 0041 teaches a client device 102a is to play back both sub-streams 120 and 124 substantially concurrently, for example, where sub-stream 124 is an audio sub-stream that accompanies sub-stream 120, which is a video sub-stream. Client device 102a may request media segment 120a and receive a response that includes media segment 120a. Client device 102a may subsequently request media segment 124a and receive a response that includes media segment 124a. Paragraph 0042 teaches a server can receive requests to stream media segments of multimedia content to a client device. Based on each request, the server can send to the client device a plurality of media segments of the multimedia content. Paragraph 0070 teaches the sub-streams included in a stream of multimedia content can vary throughout the stream. The client device could select at least one new sub-stream, which could optionally replace another sub-stream in the stream. A client device, e.g. a user of the client device, could selectively switch to an audio sub-stream corresponding to a different language during a stream. 
Paragraph 0027 teaches cache 112 can be associated with client device 102a for storing content received from server 106. Each media segment may correspond to an HTTP resource, for example, in implementations where an HTTP protocol is employed). 

Consider claims 5, 15, and 21, Swaminathan and Hoffmann teach upon receiving the audio data switching instruction (Swaminathan – Paragraph 0070), further comprising:
disconnecting a data obtaining link for obtaining initial audio data associated with the current audio from the server (Swaminathan - Paragraph 0034, 0047; Paragraph 0037, 0070). 

Consider claims 6 and 16, Swaminathan and Hoffmann teach before the playing the current audio during the process of playing the video, further comprising: requesting, according to an audio and video data obtaining instruction of a user, a data obtaining link for video data associated with the audio and video data obtaining instruction and a data obtaining link for at least one piece of audio data corresponding to the video data from the server; and determining, according to an initial audio data determination instruction of the user, initial audio data associated with the current audio from the at least one piece of audio data, and obtaining the video data and the initial audio data as the current audio and video data through data obtaining links respectively corresponding to the video data and the initial audio data (Swaminathan – Fig.1B, Paragraph 0032, 0034, 0037, 0041-0042). 

Consider claims 7 and 17, Swaminathan and Hoffmann teach wherein initial audio data associated with the current audio and the target audio data are dubbing audio data of different languages matching a same video data (Swaminathan – Fig.1B, Paragraph 0029, 0032, 0034, 0041, 0070).

Consider claim 11, Swaminathan and Hoffmann teach a computer program product, comprising a computer program carried on a non-transitory computer-readable storage medium, wherein the computer program comprises program codes for performing the audio playing method according to claim 1 (Swaminathan – Paragraph 0087).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
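The reply periods described above reduce to calendar-month arithmetic, sketched below. This is only an illustration and not legal advice. It assumes the mailing date is Mar 11, 2026, the date this page's prosecution timeline shows for the final rejection. It does not model weekends, federal holidays, extension fees, or the advisory-action exception. The helper `add_months` is hypothetical.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after d, clamped to month end."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# ASSUMPTION: mailing date taken from the page's timeline entry.
mailing_date = date(2026, 3, 11)

two_month_mark = add_months(mailing_date, 2)        # first-reply window
shortened_period_end = add_months(mailing_date, 3)  # shortened statutory period
statutory_limit = add_months(mailing_date, 6)       # outer six-month limit
```

Any date computed this way should be checked against the mailing date printed on the action itself.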
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON K LIN whose telephone number is (571)270-1446.  The examiner can normally be reached on Monday-Friday 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Pendleton can be reached on 571-272-7527.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JASON K LIN/Primary Examiner, Art Unit 2425


Prosecution Timeline

Aug 30, 2024

Application Filed

Sep 09, 2025

Non-Final Rejection — §103

Dec 03, 2025

Response Filed

Mar 11, 2026

Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/733,331

Patent 12604047

JUST IN TIME CONTENT CONDITIONING

2y 5m to grant Granted Apr 14, 2026

18/733,118

Patent 12593082

JUST IN TIME CONTENT CONDITIONING

2y 5m to grant Granted Mar 31, 2026

18/538,255

Patent 12556760

CREDITING EXPOSURE TO MEDIA IDENTIFIED USING SOURCE FILTERING

2y 5m to grant Granted Feb 17, 2026

17/553,353

Patent 12548455

GROUND-BASED CONTENT CURATION PLATFORM DISTRIBUTING GEOGRAPHICALLY-RELEVANT CONTENT TO AIRCRAFT INFLIGHT ENTERTAINMENT SYSTEMS

2y 5m to grant Granted Feb 10, 2026

18/073,358

Patent 12537993

SMART HOME AUTOMATION USING MULTI-MODAL CONTEXTUAL INFORMATION

2y 5m to grant Granted Jan 27, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

49%

Grant Probability

84%

With Interview (+34.8%)

3y 7m

Median Time to Grant

Moderate

PTA Risk

Based on 454 resolved cases by this examiner. Grant probability derived from career allow rate.
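The projection figures can be reproduced from the counts shown on this page with a few lines of arithmetic. The sketch assumes the "With Interview" figure is the rounded allow rate plus the interview lift in percentage points. That reading matches the displayed 84%, but the page does not state its formula.

```python
granted, resolved = 221, 454

# Career allow rate: 221 / 454 is about 48.7%, displayed as 49%.
allow_rate_pct = round(100 * granted / resolved)

# ASSUMPTION: the lift is added to the rounded rate in percentage points.
interview_lift_pts = 34.8
with_interview_pct = round(allow_rate_pct + interview_lift_pts)
```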