Prosecution Insights
Last updated: April 19, 2026
Application No. 18/303,021

SYNCHRONIZING AUDIO AND VIDEO USING PAUSE GAP ANALYSIS

Final Rejection §103
Filed: Apr 19, 2023
Examiner: ALAM, MUSHFIKH I
Art Unit: 2426
Tech Center: 2400 — Computer Networks
Assignee: International Business Machines Corporation
OA Round: 2 (Final)
Grant Probability: 58% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 9m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 58% (295 granted / 509 resolved), at TC average
Interview Lift: +38.5% for resolved cases with interview
Avg Prosecution: 3y 9m typical timeline; 32 applications currently pending
Total Applications: 541 across all art units

Statute-Specific Performance

§101: 3.0% (-37.0% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§103: 68.4% (+28.4% vs TC avg)
§112: 4.5% (-35.5% vs TC avg)
Deltas shown vs Tech Center average estimate • Based on career data from 509 resolved cases

Office Action

§103
DETAILED ACTION

Claims 1-18 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 5-9, 11-15, 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kang (US 2015/0195608) in view of Bauchot et al. (US 2009/0060458).

Claim 1: Kang teaches a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: splitting a video into an audio stream and a video stream (p. 0062); identifying time points at which there is no sound in the audio stream and deriving pause gaps in the audio stream (i.e. audio silence detector) (p. 0091, 0164); applying a binary classifier to predict sound presence or absence in frames of the video stream and deriving pause gaps in the video stream (i.e. detecting scene changes) (p. 0192).
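The claim-1 pipeline mapped above (silence detection in the audio stream, per-frame sound prediction in the video stream, and comparison of the resulting pause gaps) can be sketched in a few lines. This is a minimal editorial illustration only; the function names and the binary sound-flag representation are assumptions, not taken from the application or the cited references:

```python
from statistics import median

def pause_gaps(sound_flags, min_len=3):
    """Return (start, end) frame-index pairs for runs of silence
    (flag == 0) lasting at least min_len frames."""
    gaps, start = [], None
    for i, flag in enumerate(sound_flags):
        if flag == 0 and start is None:
            start = i                      # silence begins
        elif flag != 0 and start is not None:
            if i - start >= min_len:
                gaps.append((start, i))    # silence ends; record the gap
            start = None
    if start is not None and len(sound_flags) - start >= min_len:
        gaps.append((start, len(sound_flags)))
    return gaps

def estimate_offset(audio_gaps, video_gaps):
    """Median difference between start frames of corresponding gaps;
    zip truncates to the shorter list. Nonzero means desynchronized."""
    diffs = [v[0] - a[0] for a, v in zip(audio_gaps, video_gaps)]
    return int(median(diffs)) if diffs else 0
```

Here `sound_flags` stands in for either the thresholded audio stream or the binary classifier's per-frame output on the video stream; a nonzero offset between matched gaps corresponds to the desynchronization the claim identifies.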
Kang is not entirely clear in teaching a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: identifying desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream; and aligning the pause gaps in the video stream with the pause gaps in the audio stream, based on metadata of the pause gaps in the video stream.

Bauchot teaches a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: identifying desynchronization (i.e. synchronization mark) between the pause gaps (i.e. silence detection) in the video stream and the pause gaps in the audio stream (p. 0031-0033); and aligning the pause gaps (i.e. increasing or decreasing silence gaps) in the video stream with the pause gaps in the audio stream, based on metadata of the pause gaps in the video stream (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 2: Kang teaches the computer-implemented method of claim 1, further comprising: splitting a training video (i.e. audio or video) into a training audio stream and a training video stream (p. 0062); identifying time points at which there is no sound in the training audio stream and deriving pause gaps in the training audio stream (i.e. audio silence detector) (p. 0091, 0164). Kang is not entirely clear in teaching the computer-implemented method of claim 1, further comprising: converting the training audio stream into a binary stream with sound flags identifying the time points; and using the sound flags and frames in the training video stream to train the binary classifier.
Bauchot teaches the computer-implemented method of claim 1, further comprising: converting the training audio stream into a binary stream with sound flags identifying the time points (i.e. silence detection) (p. 0031-0033); and using the sound flags (i.e. scene changes) and frames in the training video stream to train the binary classifier (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 3: Kang is silent regarding the computer-implemented method of claim 2, wherein the training video is a normal video which has no desynchronization of the training audio stream and the training video stream. Bauchot teaches the computer-implemented method of claim 2, wherein the training video is a normal video which has no desynchronization of the training audio stream and the training video stream (i.e. buffered data flow) (p. 0031-0033). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 5: Kang teaches the computer-implemented method of claim 1, further comprising: feeding the video stream into the binary classifier (i.e. scene change detection) to obtain binary values which indicate the sound presence or absence in the frames of the video stream (i.e. using black frames) (p. 0192).

Claim 6: Kang is silent regarding the computer-implemented method of claim 1, wherein, by aligning the pause gaps in the video stream with the pause gaps in the audio stream, the video stream and the audio stream are synchronized.
Bauchot teaches the computer-implemented method of claim 1, wherein, by aligning the pause gaps in the video stream with the pause gaps in the audio stream, the video stream and the audio stream are synchronized (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claims 7, 8, 9, 11, and 12 each recite "A computer program product for synchronizing audio and video using pause gap analysis, the computer program product comprising a computer readable storage medium having program instructions stored therewith" for performing the steps of claims 1, 2, 3, 5, and 6, respectively. Kang inherently discloses the recited computer program product for performing the steps of each of those claims.

Claims 13, 14, 15, 17, and 18 are analyzed and interpreted as the apparatus of claims 1, 2, 3, 5, and 6, respectively.

Claim(s) 4, 10, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kang (US 2015/0195608) in view of Bauchot et al. (US 2009/0060458), and Kumar et al. (US 2024/0098346).
Claim 4: Kang is silent regarding the computer-implemented method of claim 2, wherein training the binary classifier is through supervised machine learning. Kumar teaches the computer-implemented method of claim 2, wherein training the binary classifier is through supervised machine learning (p. 0036). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided machine learning for scene detection as taught by Kumar to the system of Kang to allow boundary locations to be identified (p. 0036).

Claim 10 recites "A computer program product for synchronizing audio and video using pause gap analysis, the computer program product comprising a computer readable storage medium having program instructions stored therewith" for performing the steps of claim 4. Kang inherently discloses the recited computer program product for performing the steps of claim 4.

Claim 16 is analyzed and interpreted as the apparatus of claim 4.

Response to Arguments

Applicant's arguments filed 12/9/2025 have been fully considered but they are not persuasive.

Regarding claim 1, Applicant submits that the cited combination of references fails to teach or suggest each limitation of the invention as set forth in the claim.
Specifically, whereas the Office Action provides that the limitation "applying a binary classifier to predict sound presence or absence in frames of the video stream and deriving pause gaps in the video stream" is taught or suggested by Kang, paragraph [0192]:

"As still another case, the advertising program may be detected using detection information on black frame and scene change from the video data and detection information on audio silence from the audio data for the advertisement detection period set based on the electronic program guide information of data for data broadcasting within the broadcast signal including the broadcasting program"

Applicant submits that the cited paragraph provides for detecting the presence of an advertising program using either black frame and scene change from the video data or detection information on audio silence from the audio data, and not the claimed prediction of the presence or absence of sound in the frames of the video stream. Applicant submits that the cited portions of the reference fail to teach or suggest predicting sound presence or absence in frames of the video stream, focusing instead upon the presence of black frames and scene changes in the video stream to detect advertising and not to predict the presence or absence of sound in the advertising program.

In response, the Examiner respectfully disagrees. Kang clearly discloses a system that detects black frames and/or silence within the program, which, according to one of ordinary skill in the art, is used to "predict" the presence of an advertisement. In functional language there is no difference between "detect" and "predict" absent language that specifies how a prediction is performed. The claims are silent regarding how a prediction is performed. Therefore, because a detection is never going to be 100% accurate all the time, Kang's "detecting" of black frames or silence in the program is interpreted as also being a prediction.
Applicant further argues that Bauchot, paragraph [0033], provides:

"In an embodiment, the data flows buffer (200) buffers a first incoming data flow. As soon as the synchronization marks receiver (200) receives a synchronization mark involving the first data flow, the audio silence detector (200) starts analyzing and detecting audio silence periods. Meanwhile, the data flows buffer (200) listens for the pending necessary second data flow, as determined by the synchronization mark. Buffered data is modified in the data flows modification unit (200). Audio silence periods durations are increased or decreased, according to the interaction with the network controller (208). When both the second data of the second data flow to be synchronized with the first data of the first data flow and the first data of the first data flow are received, buffered, and synchronized, the data quit the buffer running positions for playing back in the media player (160)."

As to the limitation "identifying desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream", the Office Action points only to the use of a genericized synchronization mark as teaching the limitation. Applicant submits that the cited portion of the reference lacks any specificity regarding the details of the limitation of desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream. The reference speaks of analyzing and detecting audio silence periods in the first data flow but fails to mention the claimed pause gaps in the video stream.

In response, the Examiner respectfully disagrees. Reading the claims in the broadest sense, the claims require identifying a desync between the gaps in the audio and video, and then aligning the gaps. Bauchot discloses audio and video gaps (fig. 7, 702, "v1").
When an audio gap is increased, as taught by Bauchot, a desync will occur and the system of Bauchot will synchronize or align the streams by adding frames to the video gap (710, 712). Therefore, Bauchot discloses the claimed features of "…identifying a desync in between the gaps in the audio and video, and then aligning the gaps…"

Conclusion

Claims 1-18 are rejected. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Inquiries

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUSHFIKH I ALAM whose telephone number is (571) 270-1710. The examiner can normally be reached 1:00PM-9:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Nasser Goodarzi, can be reached at 571-272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MUSHFIKH I. ALAM
Primary Examiner
Art Unit 2426

/MUSHFIKH I ALAM/
Primary Examiner, Art Unit 2426
3/25/2026
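The silence-extension mechanism attributed to Bauchot in the response above (increasing or decreasing silence-period durations, and adding frames to a video gap, so the flows line up) can also be sketched briefly. A hedged illustration under assumed names and a binary-flag frame representation, not code from the reference:

```python
def align_by_extending_gap(video_flags, video_gap, audio_gap):
    """Pad a video pause gap (a (start, end) index pair over binary
    sound flags) with silent frames until its length matches the
    corresponding audio gap. No-op if it is already long enough."""
    extra = (audio_gap[1] - audio_gap[0]) - (video_gap[1] - video_gap[0])
    out = list(video_flags)
    if extra > 0:
        # grow the gap in place by inserting silent frames at its end
        out[video_gap[1]:video_gap[1]] = [0] * extra
    return out
```

After padding, the video gap and audio gap have equal length, which is the aligned state the examiner's characterization of Bauchot describes; shrinking a gap would be the symmetric deletion.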

Prosecution Timeline

Apr 19, 2023
Application Filed
Sep 09, 2025
Non-Final Rejection — §103
Nov 20, 2025
Interview Requested
Dec 09, 2025
Response Filed
Dec 09, 2025
Applicant Interview (Telephonic)
Dec 09, 2025
Examiner Interview Summary
Mar 25, 2026
Final Rejection — §103
Apr 15, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12587707
SESSION TYPE CLASSIFICATION FOR MODELING
2y 5m to grant; granted Mar 24, 2026
Patent 12581157
SYSTEMS AND METHODS FOR MEDIA CONTENT HAND-OFF BASED ON TYPE OF BUFFERED DATA
2y 5m to grant; granted Mar 17, 2026
Patent 12578752
DISPLAY DEVICE AND METHOD FOR OPERATING THE SAME
2y 5m to grant; granted Mar 17, 2026
Patent 12563241
INTERACTIVE METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
2y 5m to grant; granted Feb 24, 2026
Patent 12556751
SYSTEMS AND METHODS FOR IMPROVING LIVE STREAMING
2y 5m to grant; granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 58%
With Interview: 96% (+38.5%)
Median Time to Grant: 3y 9m
PTA Risk: Moderate
Based on 509 resolved cases by this examiner. Grant probability derived from career allow rate.
