Prosecution Insights
Last updated: April 19, 2026
Application No. 18/303,021

SYNCHRONIZING AUDIO AND VIDEO USING PAUSE GAP ANALYSIS

Final Rejection §103
Filed: Apr 19, 2023
Examiner: ALAM, MUSHFIKH I
Art Unit: 2426
Tech Center: 2400 — Computer Networks
Assignee: International Business Machines Corporation
OA Round: 2 (Final)
Grant Probability: 58% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 9m
With Interview: 96%

Examiner Intelligence

Career Allow Rate: 58% (295 granted / 509 resolved), at TC average
Interview Lift: +38.5% for resolved cases with interview
Avg Prosecution: 3y 9m typical timeline; 32 applications currently pending
Total Applications: 541 across all art units

Statute-Specific Performance

§101: 3.0% (-37.0% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§103: 68.4% (+28.4% vs TC avg)
§112: 4.5% (-35.5% vs TC avg)
Deltas shown vs Tech Center average estimate • Based on career data from 509 resolved cases

Office Action

§103
DETAILED ACTION

Claims 1-18 are pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 5-9, 11-15, 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kang (US 2015/0195608) in view of Bauchot et al. (US 2009/0060458).

Claim 1: Kang teaches a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: splitting a video into an audio stream and a video stream (p. 0062); identifying time points at which there is no sound in the audio stream and deriving pause gaps in the audio stream (i.e. audio silence detector) (p. 0091, 0164); applying a binary classifier to predict sound presence or absence in frames of the video stream and deriving pause gaps in the video stream (i.e. detecting scene changes) (p. 0192).
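The claim-1 pipeline mapped above (silence detection in the audio stream, per-frame sound prediction in the video stream, and comparison of the resulting pause gaps) can be sketched in a few lines. This is a minimal editorial illustration only; the function names and the binary sound-flag representation are assumptions, not taken from the application or the cited references:

```python
from statistics import median

def pause_gaps(sound_flags, min_len=3):
    """Return (start, end) frame-index pairs for runs of silence
    (flag == 0) lasting at least min_len frames."""
    gaps, start = [], None
    for i, flag in enumerate(sound_flags):
        if flag == 0 and start is None:
            start = i                      # silence begins
        elif flag != 0 and start is not None:
            if i - start >= min_len:
                gaps.append((start, i))    # silence ends; record the gap
            start = None
    if start is not None and len(sound_flags) - start >= min_len:
        gaps.append((start, len(sound_flags)))
    return gaps

def estimate_offset(audio_gaps, video_gaps):
    """Median difference between start frames of corresponding gaps;
    zip truncates to the shorter list. Nonzero means desynchronized."""
    diffs = [v[0] - a[0] for a, v in zip(audio_gaps, video_gaps)]
    return int(median(diffs)) if diffs else 0
```

Here `sound_flags` stands in for either the thresholded audio stream or the binary classifier's per-frame output on the video stream; a nonzero offset between matched gaps corresponds to the desynchronization the claim identifies.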
Kang is not entirely clear in teaching a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: identifying desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream; and aligning the pause gaps in the video stream with the pause gaps in the audio stream, based on metadata of the pause gaps in the video stream.

Bauchot teaches a computer-implemented method for synchronizing audio and video using pause gap analysis, the method comprising: identifying desynchronization (i.e. synchronization mark) between the pause gaps (i.e. silence detection) in the video stream and the pause gaps in the audio stream (p. 0031-0033); and aligning the pause gaps (i.e. increasing or decreasing silence gaps) in the video stream with the pause gaps in the audio stream, based on metadata of the pause gaps in the video stream (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 2: Kang teaches the computer-implemented method of claim 1, further comprising: splitting a training video (i.e. audio or video) into a training audio stream and a training video stream (p. 0062); identifying time points at which there is no sound in the training audio stream and deriving pause gaps in the training audio stream (i.e. audio silence detector) (p. 0091, 0164). Kang is not entirely clear in teaching the computer-implemented method of claim 1, further comprising: converting the training audio stream into a binary stream with sound flags identifying the time points; and using the sound flags and frames in the training video stream to train the binary classifier.
Bauchot teaches the computer-implemented method of claim 1, further comprising: converting the training audio stream into a binary stream with sound flags identifying the time points (i.e. silence detection) (p. 0031-0033); and using the sound flags (i.e. scene changes) and frames in the training video stream to train the binary classifier (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 3: Kang is silent regarding the computer-implemented method of claim 2, wherein the training video is a normal video which has no desynchronization of the training audio stream and the training video stream. Bauchot teaches the computer-implemented method of claim 2, wherein the training video is a normal video which has no desynchronization of the training audio stream and the training video stream (i.e. buffered data flow) (p. 0031-0033). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claim 5: Kang teaches the computer-implemented method of claim 1, further comprising: feeding the video stream into the binary classifier (i.e. scene change detection) to obtain binary values which indicate the sound presence or absence in the frames of the video stream (i.e. using black frames) (p. 0192).

Claim 6: Kang is silent regarding the computer-implemented method of claim 1, wherein, by aligning the pause gaps in the video stream with the pause gaps in the audio stream, the video stream and the audio stream are synchronized.
Bauchot teaches the computer-implemented method of claim 1, wherein, by aligning the pause gaps in the video stream with the pause gaps in the audio stream, the video stream and the audio stream are synchronized (i.e. syncing data flows by extending silences and matching silences) (p. 0031-0033, 0042). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided synchronization of streams as taught by Bauchot to the system of Kang to sync data flows with different silence gaps (p. 0033).

Claims 7, 8, 9, 11, and 12 each recite "A computer program product for synchronizing audio and video using pause gap analysis, the computer program product comprising a computer readable storage medium having program instructions stored therewith" for performing the steps of claims 1, 2, 3, 5, and 6, respectively. Kang inherently discloses the recited computer program product for performing the steps of each of those claims.

Claims 13, 14, 15, 17, and 18 are analyzed and interpreted as the apparatus of claims 1, 2, 3, 5, and 6, respectively.

Claim(s) 4, 10, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kang (US 2015/0195608) in view of Bauchot et al. (US 2009/0060458), and Kumar et al. (US 2024/0098346).
Claim 4: Kang is silent regarding the computer-implemented method of claim 2, wherein training the binary classifier is through supervised machine learning. Kumar teaches the computer-implemented method of claim 2, wherein training the binary classifier is through supervised machine learning (p. 0036). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have provided machine learning for scene detection as taught by Kumar to the system of Kang to allow boundary locations to be identified (p. 0036).

Claim 10 recites "A computer program product for synchronizing audio and video using pause gap analysis, the computer program product comprising a computer readable storage medium having program instructions stored therewith" for performing the steps of claim 4. Kang inherently discloses the recited computer program product for performing the steps of claim 4.

Claim 16 is analyzed and interpreted as the apparatus of claim 4.

Response to Arguments

Applicant's arguments filed 12/9/2025 have been fully considered but they are not persuasive.

Regarding claim 1, Applicant submits that the cited combination of references fails to teach or suggest each limitation of the invention as set forth in the claim.
Specifically, whereas the Office Action provides that the limitation "applying a binary classifier to predict sound presence or absence in frames of the video stream and deriving pause gaps in the video stream" is taught or suggested by Kang, paragraph [0192]:

"As still another case, the advertising program may be detected using detection information on black frame and scene change from the video data and detection information on audio silence from the audio data for the advertisement detection period set based on the electronic program guide information of data for data broadcasting within the broadcast signal including the broadcasting program"

Applicant submits that the cited paragraph provides for detecting the presence of an advertising program using either black frame and scene change from the video data or detection information on audio silence from the audio data, and not the claimed prediction of the presence or absence of sound in the frames of the video stream. Applicant submits that the cited portions of the reference fail to teach or suggest predicting sound presence or absence in frames of the video stream, focusing instead upon the presence of black frames and scene changes in the video stream to detect advertising and not to predict the presence or absence of sound in the advertising program.

In response, the Examiner respectfully disagrees. Kang clearly discloses a system that detects black frames and/or silence within the program, which, according to one of ordinary skill in the art, is used to "predict" the presence of an advertisement. In functional language there is no difference between "detect" and "predict" absent language that specifies how a prediction is performed. The claims are silent regarding how a prediction is performed. Therefore, because a detection is never going to be 100% accurate all the time, Kang's "detecting" of black frames or silence in the program is interpreted as also being a prediction.
Applicant further argues that Bauchot, paragraph [0033], provides:

"In an embodiment, the data flows buffer (200) buffers a first incoming data flow. As soon as the synchronization marks receiver (200) receives a synchronization mark involving the first data flow, the audio silence detector (200) starts analyzing and detecting audio silence periods. Meanwhile, the data flows buffer (200) listens for the pending necessary second data flow, as determined by the synchronization mark. Buffered data is modified in the data flows modification unit (200). Audio silence periods durations are increased or decreased, according to the interaction with the network controller (208). When both the second data of the second data flow to be synchronized with the first data of the first data flow and the first data of the first data flow are received, buffered, and synchronized, the data quit the buffer running positions for playing back in the media player (160)."

As to the limitation "identifying desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream", the Office Action points only to the use of a genericized synchronization mark as teaching the limitation. Applicant submits that the cited portion of the reference lacks any specificity regarding the details of the limitation of desynchronization between the pause gaps in the video stream and the pause gaps in the audio stream. The reference speaks of analyzing and detecting audio silence periods in the first data flow but fails to mention the claimed pause gaps in the video stream.

In response, the Examiner respectfully disagrees. Reading the claims in the broadest sense, the claims require identifying a desync between the gaps in the audio and video, and then aligning the gaps. Bauchot discloses audio and video gaps (fig. 7, 702, "v1").
When an audio gap is increased, as taught by Bauchot, a desync will occur and the system of Bauchot will synchronize or align the streams by adding frames to the video gap (710, 712). Therefore, Bauchot discloses the claimed features of "…identifying a desync in between the gaps in the audio and video, and then aligning the gaps…"

Conclusion

Claims 1-18 are rejected. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Inquiries

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUSHFIKH I ALAM whose telephone number is (571) 270-1710. The examiner can normally be reached 1:00PM-9:00PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Nasser Goodarzi, can be reached at 571-272-4195. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

MUSHFIKH I. ALAM
Primary Examiner
Art Unit 2426

/MUSHFIKH I ALAM/
Primary Examiner, Art Unit 2426
3/25/2026
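The silence-extension mechanism attributed to Bauchot in the response above (increasing or decreasing silence-period durations, and adding frames to a video gap, so the flows line up) can also be sketched briefly. A hedged illustration under assumed names and a binary-flag frame representation, not code from the reference:

```python
def align_by_extending_gap(video_flags, video_gap, audio_gap):
    """Pad a video pause gap (a (start, end) index pair over binary
    sound flags) with silent frames until its length matches the
    corresponding audio gap. No-op if it is already long enough."""
    extra = (audio_gap[1] - audio_gap[0]) - (video_gap[1] - video_gap[0])
    out = list(video_flags)
    if extra > 0:
        # grow the gap in place by inserting silent frames at its end
        out[video_gap[1]:video_gap[1]] = [0] * extra
    return out
```

After padding, the video gap and audio gap have equal length, which is the aligned state the examiner's characterization of Bauchot describes; shrinking a gap would be the symmetric deletion.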

Prosecution Timeline

Apr 19, 2023
Application Filed
Sep 09, 2025
Non-Final Rejection — §103
Nov 20, 2025
Interview Requested
Dec 09, 2025
Response Filed
Dec 09, 2025
Applicant Interview (Telephonic)
Dec 09, 2025
Examiner Interview Summary
Mar 25, 2026
Final Rejection — §103
Apr 15, 2026
Interview Requested

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12587707
SESSION TYPE CLASSIFICATION FOR MODELING
2y 5m to grant; granted Mar 24, 2026
Patent 12581157
SYSTEMS AND METHODS FOR MEDIA CONTENT HAND-OFF BASED ON TYPE OF BUFFERED DATA
2y 5m to grant; granted Mar 17, 2026
Patent 12578752
DISPLAY DEVICE AND METHOD FOR OPERATING THE SAME
2y 5m to grant; granted Mar 17, 2026
Patent 12563241
INTERACTIVE METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM
2y 5m to grant; granted Feb 24, 2026
Patent 12556751
SYSTEMS AND METHODS FOR IMPROVING LIVE STREAMING
2y 5m to grant; granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 58%
With Interview: 96% (+38.5%)
Median Time to Grant: 3y 9m
PTA Risk: Moderate
Based on 509 resolved cases by this examiner. Grant probability derived from career allow rate.
