Prosecution Insights
Last updated: April 19, 2026
Application No. 18/551,916

AUDIO PROCESSING METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM

Non-Final OA §102 §103 §112
Filed: Sep 22, 2023
Examiner: SHIH, HAOSHIAN
Art Unit: 2179
Tech Center: 2100 (Computer Architecture & Software)
Assignee: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round: 1 (Non-Final)
Grant Probability: 69% (Favorable)
OA Rounds: 1-2
To Grant: 3y 5m
With Interview: 90%

Examiner Intelligence

Grants 69% (above average)
Career Allow Rate: 69% (375 granted / 545 resolved), +13.8% vs TC avg
Interview Lift: +21.0% (strong; resolved cases with interview)
Typical timeline: 3y 5m avg prosecution, 20 currently pending
Career history: 565 total applications across all art units
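The headline figures in this block can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the 565 total is the sum of resolved and pending cases, and that the "With Interview" figure is the rounded allow rate plus the lift in percentage points. The page does not state either formula, and the variable names are hypothetical.

```python
# Figures copied from the dashboard; the formulas are assumptions, not the site's documented method.
granted = 375
resolved = 545
pending = 20
interview_lift_pts = 21.0  # assumed to be percentage points

# Career allow rate as a percentage of resolved cases.
allow_rate_pct = 100 * granted / resolved

# Assumed relationship: total applications = resolved + currently pending.
total_applications = resolved + pending

# Assumed relationship: lift is added to the rounded allow rate.
with_interview_pct = round(allow_rate_pct) + interview_lift_pts

print(f"Career allow rate: {allow_rate_pct:.1f}%")
print(f"Total applications: {total_applications}")
print(f"With interview (assumed additive): {with_interview_pct:.0f}%")
```

Under these assumptions the computed values agree with the displayed 69%, 565, and 90%.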

Statute-Specific Performance

§101: 5.5% (-34.5% vs TC avg)
§103: 53.1% (+13.1% vs TC avg)
§102: 17.7% (-22.3% vs TC avg)
§112: 15.5% (-24.5% vs TC avg)
Tech Center average is an estimate. Based on career data from 545 resolved cases.
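As a rough consistency check, the Tech Center baseline implied by each row can be recovered by subtracting the stated difference from the examiner's figure. This sketch assumes the "vs TC avg" values are percentage-point differences; the page does not define them, and the dictionary layout is hypothetical.

```python
# (examiner figure in %, stated difference vs TC avg), copied from the rows above.
rows = {
    "§101": (5.5, -34.5),
    "§103": (53.1, 13.1),
    "§102": (17.7, -22.3),
    "§112": (15.5, -24.5),
}

# Implied baseline = examiner figure minus the stated difference (assumed percentage points).
implied_tc_avg = {statute: round(value - delta, 1) for statute, (value, delta) in rows.items()}

for statute, baseline in implied_tc_avg.items():
    print(f"{statute}: implied TC average {baseline}%")
```

If the assumption holds, every row points back to the same baseline, which would suggest a single estimated average is used across all four statutes.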

Office Action

§102 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claims 1-17, 19 and 22-23 are pending in this application and have been examined in response to application filed on 09/22/2023. Claims 18, 20 and 21 are canceled.

CONTINUING DATA: This application is a 371 of PCT/CN2023/092363 05/05/2022
FOREIGN APPLICATIONS: CHINA 2022104954600 05/07/2022

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-3, 17, 19 and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claims 1-3, 17, 19 and 22 recite the limitation “to-be-processed audio” appears to be indefinite because the claim and the spec essentially put no limits on what may be “to-be-processed audio”, that the scope of the claim is unclear (see par. [0037]).

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-2, 17 and 19 are rejected under 35 U.S.C. 102(a)(1) as being unpatentable by Fitzgerald (US 2021/0193164 A1).

As to INDEPENDENT claim 1, Fitzgerald discloses an audio processing method, comprising: acquiring, in response to an audio acquisition instruction, to-be-processed audio (fig.4, [0050]; source audio is loaded); performing, in response to an audio extraction instruction for the to-be-processed audio, audio extraction on the to-be-processed audio, to obtain target audio, wherein the target audio is a vocal and/or an accompaniment extracted from the to-be-processed audio ([0013], [0014]; source audio is decomposed into at least a vocal track and an accompaniment track); presenting the target audio ([0050]; the separated track is presented and can be played).

As to claim 2, Fitzgerald discloses wherein the acquiring, in response to the audio acquisition instruction, the to-be-processed audio comprises: acquiring, in response to a touch-control operation ([0013], [0044], [0045]; a touch interface is utilized for loading the source audio) on a first control on a first interface, the to-be-processed audio, wherein the first control is configured to trigger loading of audio (fig.5; [0056]; a touch based user interface for interacting with the source audio is disclosed).
As to INDEPENDENT claim 17 is rejected under the same rationale addressed in the rejection of claim 1 above.

As to INDEPENDENT claim 19 is rejected under the same rationale addressed in the rejection of claim 1 above.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3-8, 11, 14, 16 and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald in view of XTRAX (NPL: “XTRAX STEMS 2”, version 2.0 05/18/2018).

As to claim 3, Fitzgerald discloses performing, in response to a touch-control operation ([0044], [0045]; a touch base UI is disclosed) on …, the audio extraction on the to-be-processed audio, to obtain the target audio ([0013]; the decomposed audio track is obtained). Fitzgerald does not expressly disclose … a second control on a second interface…; wherein the second control is configured to trigger the audio extraction.
In the same field of endeavor, XTRAX discloses … a second control on a second interface…; wherein the second control is configured to trigger the audio extraction (pg.4; different audio decomposing options are selectable). It would have been obvious to one of ordinary skill in the art, having the teaching of Fitzgerald and XTRAX before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by Fitzgerald to include a content editing GUI taught by XTRAX with the motivation being to interacting with an audio decomposition app graphically.

As to claim 4, Fitzgerald discloses trigger playing of the target audio ([0050]; the target audio is played back). Fitzgerald does not expressly disclose displaying, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, wherein the third control is configured to trigger playing of the target audio. In the same field of endeavor XTRAX discloses displaying, on a third interface, an audio graphic corresponding to the target audio and/or a third control associated with the target audio, wherein the third control is configured to trigger playing of the target audio (pg.8; user can select one of the decomposed tracks to playback). It would have been obvious to one of ordinary skill in the art, having the teaching of Fitzgerald and XTRAX before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by Fitzgerald to include a content editing GUI taught by XTRAX with the motivation being to interacting with an audio decomposition app graphically.

As to claim 5, Fitzgerald discloses wherein [a] fourth control is configured to trigger an export of data associated with the target audio to a target location, and the target location comprises an album or a file system ([0060]; the decomposed track(s) can be exported).
Fitzgerald does not expressly disclose displaying, on a third interface, a fourth control associated with the target audio. In the same field of endeavor XTRAX discloses displaying, on a third interface, a fourth control associated with the target audio (pg.8; decomposed track(s) can be exported/saved to a folder). It would have been obvious to one of ordinary skill in the art, having the teaching of Fitzgerald and XTRAX before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by Fitzgerald to include a content editing GUI taught by XTRAX with the motivation being to interacting with an audio decomposition app graphically.

As to claim 6, Fitzgerald discloses wherein the fifth control is configured to trigger audio editing of the target audio ([0060]; the decomposed tracks are subject to user editing). Fitzgerald does not expressly disclose displaying, on a third interface, a fifth control associated with the target audio. In the same field of endeavor XTRAX discloses displaying, on a third interface, a fifth control associated with the target audio. (pg.7; the decomposed tracks can be edited using sliders). It would have been obvious to one of ordinary skill in the art, having the teaching of Fitzgerald and XTRAX before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by Fitzgerald to include a content editing GUI taught by XTRAX with the motivation being to interacting with an audio decomposition app graphically.
As to claim 7, the prior art as combined discloses presenting, in response to an audio processing instruction, one or more audio processing function controls, wherein the one or more audio processing function controls are configured to trigger execution of corresponding audio processing functions (Fitzgerald, [0015]; a decomposition option is selected); performing, in response to a touch-control operation (Fitzgerald, [0045]; “touch screen”) on one audio processing function control in the one or more audio processing function controls, audio processing corresponding to the one audio processing function control, on the target audio, to obtain the processed target audio (Fitzgerald, [0050]; XTRAX, pg.8; the decomposed track is presented).

As to claim 8, the prior art as combined discloses presenting, in response to a touch-control operation (Fitzgerald, [0045]; “touch screen”) on a sixth control on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls (Fitzgerald, [0014]; a source separation control is selected), wherein the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface (XTRAX, pg.6; a post-processing tool is available to use).

As to claim 11, the prior art as combined discloses displaying the processed target audio on a sixth interface, wherein the sixth interface comprises an eighth control, and the eighth control is configured to trigger playing of the processed target audio (XTRAX, pg.8; user can select one of the decomposed tracks to playback).
As to claim 14, the prior art as combined discloses exporting, in response to an export instruction on a sixth interface, data associated with the processed target audio, to a target location, wherein the target location comprises an album or a file system (XTRAX, pg.8; an export command is provided for exporting decomposed tracks to a folder).

As to claim 16, the prior art as combined discloses wherein the data associated with the processed target audio comprises at least one of: the processed target audio, the vocal, the accompaniment, a static cover of the processed target audio, and a dynamic cover of the processed target audio (XTRAX, pg.8; decomposed tracks are saved).

As to claim 22 is rejected under the same rationale addressed in the rejection of claim 3 above.

As to claim 23 is rejected under the same rationale addressed in the rejection of claim 4 above.

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald-XTRAX and in view of Kirkham et al. (US 8,464,180 B1).

As to claim 9, the prior art as combined discloses presentation of the one or more audio processing function controls on a fifth interface (Fitzgerald, fig.5, “520”, different processing widgets are selectable). The prior art as combined does not expressly disclose in response to a sliding operation on a fourth interface, the one or more audio processing function controls or a seventh control associated with the one or more audio processing function controls, wherein the seventh control is configured to trigger presentation of the one or more audio processing function controls on a fifth interface. In the same field of endeavor, Kirkham discloses in response to a sliding operation on an interface, displaying icons on another interface (col.4, l.54-64; basic swiping gesture is used to navigate between interfaces).
It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Lewis before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include swipe navigation taught by Kirkham with the motivation being to provide a content navigation using touch gestures.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald-XTRAX in view of Lewis et al. (US 2019/0051272 A1).

As to claim 10, the prior art as combined discloses wherein the audio processing function controls comprise: an audio optimization control configured to trigger editing of audio to optimize the audio (XTRAX, pg.6; audio is optimized based on the user preference); an accompaniment extraction control configured to trigger extraction of a vocal and/or an accompaniment from audio (Fitzgerald, [0016]; a decomposition option is selected). The prior art as combined does not expressly disclose a style synthesis control configured to trigger extraction of vocal from audio and mixing and editing of the extracted vocal with a preset accompaniment; an audio mashup control configured to trigger extraction of vocal from first audio, extraction of an accompaniment from second audio, and mixing and editing of the extracted vocal with the extracted accompaniment. In the same field of endeavor, Lewis discloses mixing and editing of the extracted vocal with a preset accompaniment ([0029], [0030]; the voice track is remixed with a selected rhythm template that is associated with a particular preset genre).
In the same field of endeavor, Lewis discloses a style synthesis control configured to trigger extraction of vocal from audio and mixing and editing of the extracted vocal with a preset accompaniment ([0029], [0030]; the voice track is remixed with a selected rhythm template that is associated with a particular preset genre); an audio mashup control configured to trigger extraction of vocal from first audio, extraction of an accompaniment from second audio, and mixing and editing of the extracted vocal with the extracted accompaniment ([0004], [0030]; selected audio tracks are remixed and edited). It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Lewis before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include characteristic based audio remixing by Lewis with the motivation being to aid user in creating music contents by offering preset templates.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald-XTRAX and in view of Lewis.

As to claim 15, the prior art as combined does not expressly disclose sharing, in response to a sharing instruction on a sixth interface, data associated with the processed target audio, to a target application. In the same field of endeavor, Lewis discloses sharing, in response to a sharing instruction on a sixth interface, data associated with the processed target audio, to a target application (fig.7; [0003]; audio can be published and shared with other users). It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Lewis before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include content sharing taught by Lewis with the motivation being to provide a mechanism for sharing user contents.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald-XTRAX in view of Nichols et al. (US 2007/0233740 A1) and Koch (US 2011/0170008 A1).

As to claim 12, the prior art as combined discloses wherein the sixth interface further comprises a ninth control, and the method further comprises: displaying, in response to a touch-control (Fitzgerald, [0044], [0045]; a touch base UI is disclosed). The prior art as combined does not expressly disclose operation on the ninth control on the sixth interface, a first window, wherein the first window comprises a cover import control, one or more preset static cover controls, and one or more preset animation effect controls; acquiring, in response to a control selection operation on the first window, a target cover; wherein the target cover is a static cover or a dynamic cover. In the same field of endeavor, Nichols discloses operation on the ninth control on the sixth interface, a first window, wherein the first window comprises a cover import control, one or more preset static cover controls, and one or more preset animation effect controls; acquiring, in response to a control selection operation on the first window, a target cover; wherein the target cover is a static cover or a dynamic cover ([0029], fig.7; a cover image is selectable, wherein preset image categories such as Title and subject are editable). It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Nichols before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include a cover art assignment interface taught by Nichols with the motivation being to provide customizable cover arts. The prior art as combined does not expressly disclose one or more preset animation effect controls. In the same field of endeavor, Koch discloses one or more preset animation effect controls ([0043]; preset animation effect tools are presented).
It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Koch before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include an image animation interface taught by Koch with the motivation being to provide enhance cover arts by adding animation effects.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable by Fitzgerald-XTRAX-Nichols-Koch and in view of Ishikawa (US 2020/0134921 A1).

As to claim 13, the prior art as combined discloses in response to the control selection operation on the first window, the target cover comprises: acquiring, in response to the control selection operation on the first window, a static cover (Nichols, [0029], fig.7; a static cover is selected) and an animation effect (Koch, [0043]; animation effects are applicable); generating, according to … the static cover and the animation effect, a dynamic cover (Koch, [0043]; an animated cover image is generated). The prior art as combined does not expressly disclose generating, according to an audio characteristic of the target audio that changes with the audio characteristic of the processed target audio; wherein the audio characteristic comprises audio tempo and/or volume. In the same field of endeavor, Ishikawa discloses generating, according to an audio characteristic of the target audio that changes with the audio characteristic of the processed target audio; wherein the audio characteristic comprises audio tempo and/or volume (fig.5, fig.6; [0094], [0095]; audio tempo information is converted and displayed visually).
It would have been obvious to one of ordinary skill in the art, having the teaching of prior art as combined and Ishikawa before him prior to the effective filling date, to modify the audio interactive decomposition editor taught by prior art as combined to include visualizing musical effects taught by Ishikawa with the motivation being to provide enhance audio experience by adding animation effects.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAOSHIAN SHIH whose telephone number is (571)270-1257. The examiner can normally be reached M-F 8:00-5:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, FRED EHICHIOYA can be reached at (571) 272-4034. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HAOSHIAN SHIH/Primary Examiner, Art Unit 2179
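The grounds of rejection in the action above can be summarized as a small lookup table, which makes it easier to see which references a response must address for each claim. The sketch below is one possible encoding, not part of the Office action: the claim numbers, statutes, and reference names are transcribed from the text, while the data layout and the helper name `grounds_by_claim` are hypothetical.

```python
from collections import defaultdict

# Pending claims per the action: 1-17, 19 and 22-23.
PENDING = set(range(1, 18)) | {19, 22, 23}

# (statute, rejected claims, references relied upon), transcribed from the action.
REJECTIONS = [
    ("112(b)", {1, 2, 3, 17, 19, 22}, ()),
    ("102(a)(1)", {1, 2, 17, 19}, ("Fitzgerald",)),
    ("103", {3, 4, 5, 6, 7, 8, 11, 14, 16, 22, 23}, ("Fitzgerald", "XTRAX")),
    ("103", {9}, ("Fitzgerald", "XTRAX", "Kirkham")),
    ("103", {10}, ("Fitzgerald", "XTRAX", "Lewis")),
    ("103", {15}, ("Fitzgerald", "XTRAX", "Lewis")),
    ("103", {12}, ("Fitzgerald", "XTRAX", "Nichols", "Koch")),
    ("103", {13}, ("Fitzgerald", "XTRAX", "Nichols", "Koch", "Ishikawa")),
]


def grounds_by_claim(rejections):
    """Invert the rejection list into a claim -> [(statute, references)] map."""
    by_claim = defaultdict(list)
    for statute, claims, refs in rejections:
        for claim in claims:
            by_claim[claim].append((statute, refs))
    return dict(by_claim)


by_claim = grounds_by_claim(REJECTIONS)

# Claims that carry at least one prior-art rejection (anything other than 112(b)).
art_rejected = {
    claim for claim, grounds in by_claim.items()
    if any(statute != "112(b)" for statute, _ in grounds)
}
uncovered = PENDING - art_rejected

print("Pending claims without an art rejection:", sorted(uncovered))
print("Grounds for claim 3:", by_claim[3])
```

On this transcription, every pending claim carries a prior-art rejection, and claims 1-3, 17, 19 and 22 additionally carry the indefiniteness rejection.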

Prosecution Timeline

Sep 22, 2023
Application Filed
Nov 20, 2025
Non-Final Rejection — §102, §103, §112 (current)
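For context, the time elapsed between the two timeline events can be computed directly from the dates shown. The sketch below uses only the two dates listed and the examiner's 3y 5m average quoted elsewhere on the page; comparing the two is an illustrative assumption, not a prediction made by the source.

```python
from datetime import date

# Dates taken from the timeline above.
filed = date(2023, 9, 22)
non_final = date(2025, 11, 20)

# Elapsed calendar days between filing and the non-final rejection.
elapsed_days = (non_final - filed).days

# Whole months elapsed, subtracting one if the day-of-month has not yet been reached.
elapsed_months = (non_final.year - filed.year) * 12 + (non_final.month - filed.month)
if non_final.day < filed.day:
    elapsed_months -= 1
years, months = divmod(elapsed_months, 12)

# The examiner's average prosecution time shown on the page: 3y 5m.
typical_months = 3 * 12 + 5

print(f"Elapsed: {elapsed_days} days ({years}y {months}m)")
print(f"Months elapsed vs. 3y 5m average: {elapsed_months} of {typical_months}")
```

Run as written, this prints an elapsed time of 790 days (2y 1m), or 25 of the 41 months in the examiner's average.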

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597186
SYNTHESIZING SHADOWS IN DIGITAL IMAGES UTILIZING DIFFUSION MODELS
2y 5m to grant Granted Apr 07, 2026
Patent 12591329
REDUCED-SIZE INTERFACES FOR MANAGING ALERTS
2y 5m to grant Granted Mar 31, 2026
Patent 12578832
DISTANCE-BASED USER INTERFACES
2y 5m to grant Granted Mar 17, 2026
Patent 12572325
METHOD AND DEVICE FOR PLAYING SOUND EFFECTS OF MUSIC
2y 5m to grant Granted Mar 10, 2026
Patent 12561039
GENERATIVE MODEL WITH WHITEBOARD
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 69%
With Interview: 90% (+21.0%)
Median Time to Grant: 3y 5m
PTA Risk: Low
Based on 545 resolved cases by this examiner. Grant probability derived from career allow rate.
