Last updated: April 19, 2026

Application No. 18/955,664

AUDIO PROCESSING METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM

Non-Final OA §103

Filed

Nov 21, 2024

Examiner

ZHAO, DAQUAN

Art Unit

2484

Tech Center

2400 — Computer Networks

Assignee

BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.

OA Round

2 (Non-Final)

Interview Optional

+14.8% interview lift. This examiner has a relatively high allow rate; a written response may suffice.

Based on 1029 resolved cases, 2023–2026

Examiner Intelligence

ZHAO, DAQUAN

Grants 77% — above average

Career Allow Rate

791 granted / 1029 resolved

+18.9% vs TC avg

+14.8% (moderate)

Interview Lift

resolved cases with interview

Typical timeline

2y 9m

Avg Prosecution

24 currently pending

Career history

1053

Total Applications

across all art units

Statute-Specific Performance

§101: 11.0% (-29.0% vs TC avg)
§103: 44.9% (+4.9% vs TC avg)
§102: 20.3% (-19.7% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)

Based on career data from 1029 resolved cases

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 11-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan et al (US 2024/0169961) in view of Landy (US 2010/0040349).
For claim 1, Yuan et al teach a method of audio processing, comprising: 
obtaining a first media content input by a user, the first media content comprising a first video content associated with a first audio content (e.g. paragraph 40: “in this special effect video shooting scenario, not only a video image may be processed, but also audio information may be processed, that is, audio special effects are superimposed on the basis of video image processing.”); and 
providing a second media content based on a selection of an audio style by the user (e.g. paragraph 5: “…a target arrangement template is determined based on an input audio entered by a user and a trigger operation to an arrangement template by the user, whereby the effect of fusing the input audio to a target audio of the target arrangement template is obtained…”), the second media content comprising a second video associated with a second audio content generated based on the first audio content (e.g. paragraph 5: “…a target arrangement template is determined based on an input audio entered by a user and a trigger operation to an arrangement template by the user, whereby the effect of fusing the input audio to a target audio of the target arrangement template is obtained…”), the second audio content having a same timbre as the first audio content (e.g. figure 1, paragraphs 52-53: the melody attribute may be used to characterize the basic music elements contained in the melody, and optionally, the melody attribute includes a tempo, a rhythm, a meter, a strength, a tone, and the like.), and the second audio content having at least one audio attribute corresponding to the target style (e.g. paragraph 53: “the to-be-selected arrangement style may include, but is not limited to, a popular arrangement style, a jazz arrangement style, a dance arrangement style, a hip-hop arrangement style, a reggae arrangement style”).
Yuan et al do not further disclose a first visual content of the second video content being determined by adjusting a play speed of a second visual content of the first content, wherein the adjusted play speed is determined based on the second audio content. 
Landy teaches a first visual content of the second video content being determined by adjusting a play speed of a second visual content of the first content, wherein the adjusted play speed is determined based on the second audio content (e.g. abstract or paragraph 17: “The user operates the dual-control interface to select the audio resource to be played at any point in time while adjusting the speed of the video to aesthetically match it. For example, the video speed can be adjusted to run slower if a song with a slow beat is selected for playing, and adjusted to run faster if a song with a fast beat is selected for playing.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Landy into the teaching of Yuan et al to allow the user to independently synchronize the video so that it aesthetically matches any selected audio in real time (e.g. paragraph 17, Landy), thereby improving the quality of the content.
Claims 11 and 20 are rejected for the same reasons as discussed in claim 1 above, wherein paragraph 8 of Yuan et al also teaches one or more processors and a storage apparatus. The storage apparatus is configured to store one or more programs. The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for determining the audio of any one of the embodiments of the present disclosure.
For claims 2 and 12, Yuan et al teach the at least one audio attribute comprises at least one of tone, cadence (e.g. figure 1, paragraphs 52-53: the melody attribute may be used to characterize the basic music elements contained in the melody, and optionally, the melody attribute includes a tempo, a rhythm, a meter, a strength, a tone, and the like.).
For claims 3 and 13, Yuan et al teach displaying a selection panel, wherein the selection panel provides a set of candidate audio effects; and receiving a selection of a target audio effect in the set of candidate audio effects by the user, the target audio effect corresponding to the target style (e.g. figure 8, paragraph 53: “the to-be-selected arrangement style may include, but is not limited to, a popular arrangement style, a jazz arrangement style, a dance arrangement style, a hip-hop arrangement style, a reggae arrangement style”).
For claims 6 and 16, Yuan et al teach obtaining the first media content input by the user comprises at least one of: obtaining the first media content recorded by the user, or obtaining the first media content uploaded by the user (e.g. figure 2). 
For claims 4 and 14, Yuan et al do not further disclose generating the second audio content based on the first audio content; adjusting the play speed of the second visual content of the first video content based on the second audio content; and generating, based on the second audio content and the adjusted second visual content, the second video content as the second media content. Landy teaches generating the second audio content based on the first audio content; adjusting the play speed of the second visual content of the first video content based on the second audio content; and generating, based on the second audio content and the adjusted second visual content, the second video content as the second media content (e.g. abstract or paragraph 17: “The user operates the dual-control interface to select the audio resource to be played at any point in time while adjusting the speed of the video to aesthetically match it. For example, the video speed can be adjusted to run slower if a song with a slow beat is selected for playing, and adjusted to run faster if a song with a fast beat is selected for playing.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Landy into the teaching of Yuan et al to allow the user to independently synchronize the video so that it aesthetically matches any selected audio in real time (e.g. paragraph 17, Landy), thereby improving the quality of the content.
For claims 5 and 15, Yuan et al do not further disclose determining an audio portion in the second audio content corresponding to target an audio content and a video portion in the second visual content corresponding to the audio target content; and adjusting a play speed of the video portion, so that the video portion is synchronous with the audio portion. Landy teaches determining an audio portion in the second audio content corresponding to target an audio content and a video portion in the second visual content corresponding to the audio target content; and adjusting a play speed of the video portion, so that the video portion is synchronous with the audio portion (e.g. abstract or paragraph 17: “The user operates the dual-control interface to select the audio resource to be played at any point in time while adjusting the speed of the video to aesthetically match it. For example, the video speed can be adjusted to run slower if a song with a slow beat is selected for playing, and adjusted to run faster if a song with a fast beat is selected for playing.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Landy into the teaching of Yuan et al to allow the user to independently synchronize the video so that it aesthetically matches any selected audio in real time (e.g. paragraph 17, Landy), thereby improving the quality of the content.


Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan et al and Landy, as applied to claims 1-3, 6, 11-13, 16 and 20, and further in view of Khalilia et al (US 11,404,087).
For claims 7 and 17, Yuan et al and Landy do not further disclose extracting the first audio content from the first media content; and processing the first audio content by using a target model to generate the second audio content, wherein the target model is trained based on sample data corresponding to the target style. Khalilia et al teach extracting the first audio content from the first media content; and processing the first audio content by using a target model to generate the second audio content, wherein the target model is trained based on sample data corresponding to the target style (e.g. column 3, lines 10-28: “…By contrast, the style-extraction encoder may be trained to extract representation of speaking style from audio data independently of verbal content of the audio data. Additionally, the decoder may be trained to combine and decode extracted verbal content data and extracted speaking style data into an output representation of speech.”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Khalilia et al into the teaching of Yuan et al and Landy to output audio that has a speaking style, allowing the audio to appear more natural and realistic to the viewer (e.g. column 3, lines 5-10, Khalilia et al).
Allowable Subject Matter
Claims 8-10 and 18-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAQUAN ZHAO whose telephone number is (571)270-1119. The examiner can normally be reached M-Thur: 7:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Tran, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Email: daquan.zhao1@uspto.gov
Phone: (571)270-1119





/DAQUAN ZHAO/Primary Examiner, Art Unit 2484


Prosecution Timeline

Nov 21, 2024

Application Filed

Oct 31, 2025

Non-Final Rejection — §103

Feb 04, 2026

Response Filed

Mar 19, 2026

Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/558,629

Patent 12597257

MONITORING SYSTEM AND METHOD FOR RECOGNIZING THE ACTIVITY OF DETERMINED PERSONS

2y 5m to grant Granted Apr 07, 2026

18/624,514

Patent 12593108

SYSTEMS AND METHODS FOR AUTOMATED SPEECH-TO-TEXT CAPTIONING

2y 5m to grant Granted Mar 31, 2026

18/241,849

Patent 12587609

ELECTRONIC DEVICE AND CONTROL METHOD FOR CONTROLLING SPEED OF WORKOUT VIDEO

2y 5m to grant Granted Mar 24, 2026

18/570,906

Patent 12587721

VIDEO PROCESSING METHOD, APPARATUS AND SYSTEM

2y 5m to grant Granted Mar 24, 2026

18/573,097

Patent 12586610

METHOD, APPARATUS, DEVICE, STORAGE MEDIUM AND PROGRAM PRODUCT FOR VIDEO GENERATION

2y 5m to grant Granted Mar 24, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

2-3

Expected OA Rounds

77%

Grant Probability

92%

With Interview (+14.8%)

2y 9m

Median Time to Grant

Moderate

PTA Risk

Based on 1029 resolved cases by this examiner. Grant probability derived from career allow rate.
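The projection figures above can be reproduced from the examiner counts reported earlier on this page. The following is a minimal sketch only: it assumes the grant probability is simply granted / resolved, and that the "With Interview" figure adds the interview lift as percentage points. The page does not state its exact formula, and the function names are hypothetical.

```python
def allow_rate_pct(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview_pct(base_pct: float, lift_points: float) -> float:
    """Assumed model: add the interview lift as percentage points, capped at 100."""
    return min(100.0, base_pct + lift_points)


# Figures taken from this page: 791 granted / 1029 resolved, +14.8% interview lift.
base = allow_rate_pct(791, 1029)
boosted = with_interview_pct(base, 14.8)

print(f"Grant probability: {base:.1f}%")
print(f"With interview:    {boosted:.1f}%")
```

Under these assumptions the two values round to the 77% and 92% shown above.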
