Prosecution Insights
Last updated: April 19, 2026
Application No. 18/843,685

METHOD AND DEVICE OF STREAM MERGING FOR SPEECH CO-HOSTING

Non-Final OA (§102, §112)
Filed: Sep 03, 2024
Examiner: HACKENBERG, RACHEL J
Art Unit: 2454
Tech Center: 2400 — Computer Networks
Assignee: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
OA Rounds: 1-2
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 79%, above average (236 granted / 300 resolved; +20.7% vs TC avg)
Interview Lift: strong, +26.4% across resolved cases with interview
Typical Timeline: 2y 10m avg prosecution; 35 currently pending
Career History: 335 total applications across all art units
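The headline figures above follow from the raw counts. A minimal sketch, assuming the +26.4% interview lift is applied as a relative (multiplicative) boost to the career allow rate; the dashboard does not state its exact formula, so that reading is an assumption:

```python
# Recompute the dashboard's headline examiner numbers from raw counts.
# ASSUMPTION: the interview lift is a relative boost to the allow rate;
# the page does not document the formula it actually uses.

granted, resolved = 236, 300
allow_rate = granted / resolved          # career allow rate
interview_lift = 0.264                   # reported +26.4% lift

career_pct = round(allow_rate * 100)
with_interview_pct = round(allow_rate * (1 + interview_lift) * 100)

print(career_pct)          # 79  (% career allow rate)
print(with_interview_pct)  # 99  (% grant probability with interview)
```

Under this reading, 236/300 = 78.7% rounds to the displayed 79%, and 78.7% x 1.264 = 99.4% rounds to the displayed 99%, so the numbers are internally consistent.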

Statute-Specific Performance

§101: 4.9% (-35.1% vs TC avg)
§103: 53.2% (+13.2% vs TC avg)
§102: 14.2% (-25.8% vs TC avg)
§112: 17.8% (-22.2% vs TC avg)
Tech Center averages are estimates • Based on career data from 300 resolved cases
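One consistency check on these rows: subtracting each reported delta from the examiner's statute-specific rate should recover the Tech Center average, and every row implies the same 40.0% baseline. A small sketch (the page does not define the underlying metric, e.g. allowance after a rejection under that statute, so the interpretation of the percentages is left open):

```python
# Each row reports (examiner rate, delta vs Tech Center average) in
# percentage points; recover the implied TC average per statute.
rows = {
    "101": (4.9, -35.1),
    "103": (53.2, +13.2),
    "102": (14.2, -25.8),
    "112": (17.8, -22.2),
}

implied_tc_avg = {s: round(rate - delta, 1) for s, (rate, delta) in rows.items()}
print(implied_tc_avg)  # every statute implies the same 40.0% baseline
```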

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statement (IDS) was submitted on 09/22/2024. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

Claims 4 and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 4 recites the limitation "the device" in line 1. This renders the claim unclear, as there is insufficient antecedent basis for this limitation in the claim: Claim 4 depends on Claim 1, and the previous limitations do not recite "a device". The same rejection applies to Claim 17.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-7, 9-10, and 13-23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by US 2018/0288467 A1 (Holmberg).

Regarding Claim 1: Holmberg teaches a method of stream merging, comprising:

obtaining a first speech stream comprising speech information of a first user (i.e., second performer vocals) associated with a live streaming interaction event (i.e., livestream vocal duet); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals. [0029] The broadcast mix is presented as a vocal duet. Audio of the live stream includes both conversational-type audio portions captured in correspondence with interactive conversation between the first and second performers. [0045] Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences.)

obtaining a second speech stream (i.e., second performer vocals) and a first image, the second speech stream comprising speech information of a second user associated with the live streaming interaction event, the first image comprising image information of the second user (i.e., video of the first and second performers); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)

merging the first speech stream, the second speech stream, and the first image (i.e., video of first and second performers) to obtain first merged streaming data; ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)

obtaining a second image indicating image information of the first user (i.e., video feature of the first performer and/or second performer); ([0026] The method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. The method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

and encoding the second image and the first merged streaming data to obtain second merged streaming data.
([0026] In some embodiments, the method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. In some embodiments, the method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

The first merged streaming data is the broadcast mix, and it is encoded with a video feature of either or both of the first and second performer video to provide the second merged streaming data.

Regarding Claim 9: Holmberg teaches an electronic device comprising a processor and a memory; the memory storing computer execution instructions; the processor executing the computer execution instructions stored in the memory to cause the processor to perform the acts ([0004] Computationally, these computing (mobile) devices offer speed and storage capabilities comparable to engineering workstation or workgroup computers from less than ten years ago, and typically include powerful media processors, rendering them suitable for real-time sound synthesis and other musical applications. [0041] FIG. 5 is a functional block diagram of hardware and software components executable at an illustrative mobile phone-type portable computing device to facilitate processing and communication of a captured audiovisual performance for use in a multi-vocalist livestreaming configuration of network-connected devices in accordance with some embodiments of the present invention(s).) comprising:

obtaining a first speech stream comprising speech information of a first user (i.e., second performer vocals) associated with a live streaming interaction event (i.e., livestream vocal duet); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals. [0029] The broadcast mix is presented as a vocal duet. Audio of the live stream includes both conversational-type audio portions captured in correspondence with interactive conversation between the first and second performers. [0045] Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences.)

obtaining a second speech stream (i.e., second performer vocals) and a first image, the second speech stream comprising speech information of a second user associated with the live streaming interaction event, the first image comprising image information of the second user (i.e., video of first and second performers); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)

merging the first speech stream, the second speech stream, and the first image (i.e., video of first and second performers) to obtain first merged streaming data; ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)

obtaining a second image indicating image information of the first user (i.e., video feature of the first performer and/or second performer); ([0026] The method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. The method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

and encoding the second image and the first merged streaming data to obtain second merged streaming data. ([0026] In some embodiments, the method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. In some embodiments, the method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

The first merged streaming data is the broadcast mix, and it is encoded with a video feature of either or both of the first and second performer video to provide the second merged streaming data.

Regarding Claim 10: Holmberg teaches a non-transitory computer-readable storage medium storing program code for computer execution, the program code comprising instructions for performing the acts ([0086] Embodiments in accordance with the present invention may take the form of, and/or be provided as, a computer program product encoded in a machine-readable medium as instruction sequences and other functional constructs of software.) comprising:

obtaining a first speech stream comprising speech information of a first user (i.e., second performer vocals) associated with a live streaming interaction event (i.e., livestream vocal duet); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals. [0029] The broadcast mix is presented as a vocal duet. Audio of the live stream includes both conversational-type audio portions captured in correspondence with interactive conversation between the first and second performers. [0045] Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences.)

obtaining a second speech stream (i.e., second performer vocals) and a first image, the second speech stream comprising speech information of a second user associated with the live streaming interaction event, the first image comprising image information of the second user (i.e., video of first and second performers); ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)

merging the first speech stream, the second speech stream, and the first image (i.e., video of first and second performers) to obtain first merged streaming data; ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals, the method further includes capturing, at the host device, video that is performance synchronized with the captured second performer vocals, and the broadcast mix is an audiovisual mix of captured audio and video of at least the first and second performers.)
obtaining a second image indicating image information of the first user (i.e., video feature of the first performer and/or second performer); ([0026] The method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. The method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

and encoding the second image and the first merged streaming data to obtain second merged streaming data. ([0026] In some embodiments, the method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. In some embodiments, the method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.)

The first merged streaming data is the broadcast mix, and it is encoded with a video feature of either or both of the first and second performer video to provide the second merged streaming data.

Regarding Claims 2, 15, 22: Holmberg teaches the inventions of claims 1, 9, and 10 as described. Holmberg teaches wherein the obtaining a first speech stream comprises: obtaining the first speech stream from a forward server, the forward server (i.e., content server) configured to forward speech information of the first user. ([0052] FIG. 1, iPhone™ handhelds available from Apple Inc. (or more generally, handhelds 101A, 101B operating as guest and host devices, respectively) execute software that operates in coordination with a content server 110 to provide vocal capture. [0053] A current guest user of current guest device 101A contributes to the group audiovisual performance mix 111 that is supplied (eventually via content server 110) by current host device 101B as live stream 122. [0058] User vocals 103A and 103B are captured at respective handhelds 101A, 101B, and may be optionally pitch-corrected continuously and in real-time and audibly rendered mixed with the locally-appropriate backing track (e.g., backing track 107A at current guest device 101A and guest mix 106 at current host device 101B).) The content server obtains/receives and forwards/provides audio and image data between user devices.

Regarding Claims 3, 16, 23: Holmberg teaches the inventions of claims 1, 9, and 10 as described. Holmberg teaches wherein a device for stream merging is comprised in a live streamer terminal. ([0041] FIG. 5 is a functional block diagram of hardware and software components executable at an illustrative mobile phone-type portable computing device to facilitate processing and communication of a captured audiovisual performance for use in a multi-vocalist livestreaming configuration of network-connected devices.) The components shown in Figs. 4 and 5 capture livestreams (from both the camera and the mic) and then send them to an encoder. The encoder is the device within the mobile device (live streamer terminal).

Regarding Claims 4, 17: Holmberg teaches the inventions of claims 1 and 9 as described. Holmberg teaches wherein the device for stream merging is comprised in a merge server. ([0054] Content that is mixed to form group audiovisual performance mix 111 is captured, in the illustrated configuration, in the context of karaoke-style performance capture wherein lyrics 102, optional pitch cues 105 and, typically, a backing track 107 are supplied from content server 110 to either or both of current guest device 101A and current host device 101B. Claim 47: receiving at the second device, a media encoding of a mixed audio performance … mixing the captured second performer vocal audio with the received mixed audio performance to provide a broadcast mix that includes the captured vocal audio of the first and second performers and the backing audio track without apparent temporal lag therebetween; and supplying the broadcast mix to a service platform configured to livestream the broadcast mix to plural recipient devices constituting an audience.) The content server (the second device as claimed in Claim 47) is a merge server and comprises components for stream merging.

Regarding Claims 5, 18: Holmberg teaches the inventions of claims 4 and 17 as described. Holmberg teaches wherein obtaining a second speech stream and a first image comprises: obtaining the second speech stream and the first image from the forward server, the forward server (i.e., content server) further configured to forward image information and speech information of the second user. ([0052] FIG. 1, iPhone™ handhelds available from Apple Inc. (or more generally, handhelds 101A, 101B operating as guest and host devices, respectively) execute software that operates in coordination with a content server 110 to provide vocal capture. [0053] A current guest user of current guest device 101A contributes to the group audiovisual performance mix 111 that is supplied (eventually via content server 110) by current host device 101B as live stream 122. [0058] User vocals 103A and 103B are captured at respective handhelds 101A, 101B, and may be optionally pitch-corrected continuously and in real-time and audibly rendered mixed with the locally-appropriate backing track (e.g., backing track 107A at current guest device 101A and guest mix 106 at current host device 101B).) The content server obtains/receives and forwards/provides audio and image data between user devices.

Regarding Claims 6, 19: Holmberg teaches the inventions of claims 1 and 9 as described. Holmberg teaches wherein the second image comprises a target image and a visual effect associated with the first speech stream, the target image configured to indicate the first user. ([0026] In some embodiments, the method further includes dynamically varying in the broadcast mix at least visual prominence of one or the other of the first and second performers based on evaluation of a computationally audio defined feature of either or both of the first and second performer vocals. In some embodiments, the method further includes applying one or more video effects to the broadcast mix based, at least in part, on a computationally defined audio or video feature of either or both of the first and second performer audio or video.) The first merged streaming data is the broadcast mix, and it is encoded with a video feature of either or both of the first and second performer video to provide the second merged streaming data.

Regarding Claims 7, 20: Holmberg teaches the inventions of claims 1 and 9 as described. Holmberg teaches wherein after obtaining the second merged streaming data, the method further comprises: sending the second merged streaming data to a streaming media server (i.e., service platform).
(Claim 47: receiving at the second device, a media encoding of a mixed audio performance … mixing the captured second performer vocal audio with the received mixed audio performance to provide a broadcast mix that includes the captured vocal audio of the first and second performers and the backing audio track without apparent temporal lag therebetween; and supplying the broadcast mix to a service platform configured to livestream the broadcast mix to plural recipient devices constituting an audience.)

Regarding Claims 13, 14, 21: Holmberg teaches the inventions of claims 1, 9, and 10 as described. Holmberg teaches wherein: both the first user and the second user are live streamer users; or the first user is an audience user and the second user is a live streamer user. ([0025] The received media encoding includes video that is performance synchronized with the captured first performer vocals. [0029] The broadcast mix is presented as a vocal duet. Audio of the live stream includes both conversational-type audio portions captured in correspondence with interactive conversation between the first and second performers. [0045] Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences.)

Conclusion & Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RACHEL J HACKENBERG, whose telephone number is (571) 272-5417. The examiner can normally be reached 9am-5pm M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Glenton B Burgess, can be reached at (571) 272-3949. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RACHEL J HACKENBERG/
Primary Examiner, Art Unit 2454
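The claim elements that the §102 rejection maps onto Holmberg form a simple two-stage pipeline: merge both speech streams with the second user's image, then encode the first user's image into the result. A minimal sketch of that structure, with all function and field names hypothetical (the claim recites steps, not an implementation, and none of these identifiers appear in the application or in Holmberg):

```python
# Hypothetical sketch of claim 1's two-stage merge; names are
# illustrative only, not taken from the application or prior art.

def merge_streams(first_speech, second_speech, first_image):
    # Merge both speech streams with the second user's image to
    # obtain the "first merged streaming data".
    return {"audio": [first_speech, second_speech], "video": [first_image]}

def encode_with_second_image(second_image, first_merged):
    # Encode the first user's image together with the first merged
    # streaming data, yielding the "second merged streaming data".
    return {"audio": first_merged["audio"],
            "video": first_merged["video"] + [second_image]}

# Walking the claimed steps end to end (host/guest labels are arbitrary):
first_merged = merge_streams("host vocals", "guest vocals", "guest video")
second_merged = encode_with_second_image("host video", first_merged)
print(second_merged["video"])  # ['guest video', 'host video']
```

Laid out this way, the §112(b) issue in claims 4 and 17 is also easy to see: "the device" has no antecedent, because claim 1 recites only the steps above, never "a device" performing them.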

Prosecution Timeline

Sep 03, 2024
Application Filed
Jan 24, 2026
Non-Final Rejection — §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12587464
FAULT INJECTION CONFIGURATION EQUIVALENCY TESTING
2y 5m to grant • Granted Mar 24, 2026
Patent 12580819
DETERMINING SERVICE GROUP CAPACITY BASED ON AN AGGREGATE RISK METRIC
2y 5m to grant • Granted Mar 17, 2026
Patent 12500823
SYSTEM AND METHOD FOR ENTERPRISE-WIDE DATA UTILIZATION TRACKING AND RISK REPORTING
2y 5m to grant • Granted Dec 16, 2025
Patent 12495001
CAPACITY AWARE LOAD PACKING FOR LAYER-4 LOAD BALANCER
2y 5m to grant • Granted Dec 09, 2025
Patent 12470508
RESTRICTING MESSAGE NOTIFICATIONS AND CONVERSATIONS BASED ON DEVICE TYPE, MESSAGE CATEGORY, AND TIME PERIOD
2y 5m to grant • Granted Nov 11, 2025
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
79%
Grant Probability
99%
With Interview (+26.4%)
2y 10m
Median Time to Grant
Low
PTA Risk
Based on 300 resolved cases by this examiner. Grant probability derived from career allow rate.
