Prosecution Insights
Last updated: April 19, 2026
Application No. 18/776,765

AUTOMATIC DETECTION OF ALIGNMENT BETWEEN TWO AUDIO SIGNALS

Non-Final OA §103
Filed: Jul 18, 2024
Examiner: YU, NORMAN
Art Unit: 2693
Tech Center: 2600 — Communications
Assignee: ETH ZÜRICH
OA Round: 1 (Non-Final)
Grant Probability: 88% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 88% — above average (525 granted / 598 resolved; +25.8% vs TC avg)
Interview Lift: +13.5% in resolved cases with interview (moderate lift)
Avg Prosecution: 2y 1m — fast prosecutor (35 currently pending)
Total Applications: 633 across all art units (career history)

Statute-Specific Performance

§101: 2.2% (-37.8% vs TC avg)
§103: 51.8% (+11.8% vs TC avg)
§102: 17.2% (-22.8% vs TC avg)
§112: 16.8% (-23.2% vs TC avg)
Based on career data from 598 resolved cases; Tech Center averages are estimates.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-10, 12-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chebiyyam (US 2017/0180906, hereinafter "Chebi") in view of Cengarle (US 2021/0132895, hereinafter "Ceng").
Regarding claim 1, Chebi teaches a method comprising: analyzing a first sample of a first audio signal to determine a first representation in a space (Chebi ¶0135, “the samples 626-632 may correspond to a first time (t),” corresponding to the first audio signal); analyzing a plurality of second samples for a second audio signal to determine a plurality of second representations in the space (Chebi ¶0136, “The samples 654-660 may correspond to the second time (t−1),” corresponding to the second audio signal); comparing distances in the space between the first representation and the plurality of second representations in the space (Chebi ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”) to select a second representation (Chebi ¶0138, “The signal comparator 506 may identify a selected comparison value 736 of the comparison values 534 that has a higher (or lower) value than other values of the comparison values 534. For example, the signal comparator 506 may select the second comparison value 716 as the selected comparison value 736 in response to determining that the second comparison value 716 is greater than or equal to the first comparison value 714”); determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation (Chebi ¶0139, “The signal comparator 506 may identify the tentative mismatch value 536 of the mismatch values 760 that corresponds to the selected comparison value 736”); and outputting the offset (Chebi figure 5, the signal comparator 506 outputs the determined tentative mismatch 536). Chebi, however, does not explicitly teach determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation. Ceng teaches determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation (Ceng ¶0033-0035, “The audio processing device 102 performs pre-alignment on the audio signals x.sub.i(t) and x.sub.j(t). The pre-alignment includes synchronizing the audio signals x.sub.i(t) and x.sub.j(t) based on waveform cross-correlation”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ceng to improve the known method of Chebi to achieve the predictable result of more accurate synchronization (Ceng ¶0005).
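Ceng's cited pre-alignment step synchronizes two waveforms by locating the peak of their cross-correlation. A minimal sketch of that general technique follows; the function and variable names are hypothetical and this is not code from either cited reference:

```python
import numpy as np

def estimate_offset(ref, target):
    """Estimate how many samples `target` lags `ref` by finding the
    peak of their full cross-correlation (illustrative sketch of
    waveform cross-correlation pre-alignment)."""
    corr = np.correlate(target, ref, mode="full")
    # Index len(ref)-1 corresponds to zero lag; subtract it to get the lag.
    return int(np.argmax(corr)) - (len(ref) - 1)

# Usage: delay a noise signal by 5 samples and recover the offset.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1000)
target = np.concatenate([np.zeros(5), ref])[:1000]
offset = estimate_offset(ref, target)  # 5
```

A positive return value means the target signal is delayed relative to the reference, which maps onto the claimed step of determining and outputting an offset between corresponding samples.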
Regarding claims 2 and 19, Chebi in view of Ceng teaches wherein analyzing the first sample and analyzing the plurality of second samples comprises: analyzing the first sample using a first branch of a model; and analyzing the plurality of second samples using a second branch of the model (Chebi ¶0091, “The samples 300 may include first samples 320 corresponding to the first audio signal 130, second samples 350 corresponding to the second audio signal 132,” wherein the first and second audio signals originate from the first microphone 146 and the second microphone 148, which can be considered different branches of a model).

Regarding claim 3, Chebi in view of Ceng teaches wherein the first branch and the second branch include the same logic to generate first representations and second representations, respectively, in the space (Chebi ¶0055 and ¶0060; the audio signals are compared, and in order to be comparable the analysis of the signals must necessarily follow the “same logic” under BRI).

Regarding claim 4, Chebi in view of Ceng teaches wherein: the first branch comprises first parameters that are trained to generate first representations, and the second branch comprises second parameters that are trained to generate second representations (Chebi figure 6 and ¶0047, “frame-by-frame basis, e.g., based on each 20 milliseconds (ms) speech/audio frame”).

Regarding claim 5, Chebi in view of Ceng teaches selecting a time period based on the first sample; and selecting the plurality of second samples based on the time period (Chebi figure 6 and ¶0047, “frame-by-frame basis, e.g., based on each 20 milliseconds (ms) speech/audio frame,” and figure 7, ¶0135-0137; the first and second samples from the first and second (resampled) signals are related with respect to the point in time to which the frames correspond).
Regarding claim 6, Chebi in view of Ceng teaches wherein comparing the first representation and the plurality of second representations comprises: comparing a distance in the space between the first representation and a second representation in the plurality of second representations (Chebi figure 6 and ¶0047, “frame-by-frame basis, e.g., based on each 20 milliseconds (ms) speech/audio frame”); and selecting the second representation from the plurality of second representations based on the respective distance (Chebi ¶0138; in the selection of the second samples of the second signal that are most similar to the first samples of the first signal, differences between values of the samples are compared in order to find the samples with the lowest difference as the samples to be selected).

Regarding claim 7, Chebi in view of Ceng teaches wherein selecting the second sample comprises: selecting the second representation from the plurality of second representations based on the second representation having a minimum distance to the first representation in the space (Chebi ¶0138; the selection of the second samples of the second signal that are most similar to the first samples of the first signal compares differences between values of the samples in order to find the samples with the lowest difference as the samples to be selected; ¶0134, “minimum mismatch value”; ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”).

Regarding claim 8, Chebi in view of Ceng teaches wherein: the first sample is from a first sequence of first samples in the first audio signal, and the second sample is from a second sequence of second samples in the second audio signal (Chebi figure 7).
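The minimum-distance selection recited in claims 6-7 (pick the second representation closest to the first in the shared space) can be sketched as follows. This is an illustrative NumPy fragment with hypothetical names, not code from Chebi or Ceng:

```python
import numpy as np

def select_closest(first_rep, second_reps):
    """Return the index of, and distance to, the second representation
    nearest the first representation under Euclidean distance."""
    dists = np.linalg.norm(second_reps - first_rep, axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

# Usage: the representation at index 1 lies nearest the query point.
second_reps = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
idx, dist = select_closest(np.array([0.9, 1.1]), second_reps)  # idx == 1
```

The selected index then identifies the second sample whose position, relative to the first sample, gives the claimed offset.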
Regarding claim 9, Chebi in view of Ceng teaches determining an offset for the first samples to a respective second sample (Chebi ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”); and determining an offset for the first sequence based on the offset for the first samples (Chebi ¶0103, “According to one implementation, the signal comparator 506 may retrieve comparison values for previous frames of the resampled signals 530, 532 and may modify the comparison values 534 based on a long-term smoothing operation using the comparison values for previous frames”).

Regarding claim 10, Chebi in view of Ceng teaches determining a training dataset including pairs of first training audio samples and second training audio samples (Chebi ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”); analyzing the pairs using a model to output first training representations and second training representations; and adjusting parameters of the model based on labels associated with the pairs and a distance between respective first training representations and second training representations (Chebi ¶0103, “According to one implementation, the signal comparator 506 may retrieve comparison values for previous frames of the resampled signals 530, 532 and may modify the comparison values 534 based on a long-term smoothing operation using the comparison values for previous frames”; under BRI, the parameters are constantly being trained as the previous frames are updated).
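The label-driven training recited in claim 10 (and elaborated in claim 11: pull in-sync pairs together, push out-of-sync pairs apart) has the shape of a standard margin-based contrastive objective. The sketch below is an assumption about that general technique, with hypothetical names; it is not drawn from any of the cited references:

```python
import numpy as np

def contrastive_loss(rep_a, rep_b, label, margin=1.0):
    """Margin-based contrastive loss over one training pair:
    label 1 (in-sync) penalizes any distance between the pair;
    label 0 (out-of-sync) penalizes pairs closer than `margin`."""
    d = float(np.linalg.norm(rep_a - rep_b))
    return label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2

# An in-sync identical pair costs 0; an out-of-sync identical pair
# costs margin**2, so gradient steps drive such pairs apart.
```

Minimizing this loss over labeled pairs adjusts the model parameters exactly in the direction the claim describes: in-sync pairs end up closer in the space, out-of-sync pairs farther apart.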
Regarding claim 12, Chebi in view of Ceng teaches wherein determining the offset comprises: determining a first position identifier for the first sample in the first audio signal; determining a second position identifier for the second sample in the second audio signal; and determining the offset based on the first position identifier and the second position identifier (Chebi figure 6 and ¶0047, “frame-by-frame basis, e.g., based on each 20 milliseconds (ms) speech/audio frame,” and figure 7, ¶0135-0137; the first and second samples from the first and second (resampled) signals are related with respect to the point in time to which the frames correspond).

Regarding claim 13, Chebi in view of Ceng teaches wherein determining the offset comprises: determining the offset for a first time for the first sample in the first audio signal and a second time for the second sample in the second audio signal as the offset (Chebi figure 6 and ¶0047, “frame-by-frame basis, e.g., based on each 20 milliseconds (ms) speech/audio frame,” and figure 7, ¶0135-0137; the first and second samples from the first and second (resampled) signals are related with respect to the point in time to which the frames correspond; ¶0064, “the second audio signal 132 may be adjusted to temporally align with the first audio signal 130. However, as described below, in other implementations, the first audio signal 130 may be the target channel and the second audio signal 132 may be the reference channel”).
Regarding claim 14, Chebi in view of Ceng teaches wherein: the comparing of the first representation and the plurality of second representations in the space is based on a distance in the space between the first representation and the plurality of second representations, and the offset is determined based on a difference in time between the first sample and the second sample in the first audio signal and the second audio signal (Chebi ¶0138; in the selection of the second samples of the second signal that are most similar to the first samples of the first signal, differences between values of the samples are compared in order to find the samples with the lowest difference as the samples to be selected).

Regarding claim 15, Chebi in view of Ceng teaches analyzing the offset to adjust a synchronization of the first audio signal and the second audio signal (Chebi ¶0064, “the second audio signal 132 may be adjusted to temporally align with the first audio signal 130. However, as described below, in other implementations, the first audio signal 130 may be the target channel and the second audio signal 132 may be the reference channel”).

Regarding claim 16, Chebi in view of Ceng teaches analyzing the offset to identify a synchronization issue of the first audio signal and the second audio signal (Chebi ¶0064, “the second audio signal 132 may be adjusted to temporally align with the first audio signal 130. However, as described below, in other implementations, the first audio signal 130 may be the target channel and the second audio signal 132 may be the reference channel”).
Regarding claim 18, Chebi teaches a non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a computing device, cause the computing device to be operable for: analyzing a first sample of a first audio signal to determine a first representation in a space (Chebi ¶0135, “the samples 626-632 may correspond to a first time (t),” corresponding to the first audio signal); analyzing a plurality of second samples for a second audio signal to determine a plurality of second representations in the space (Chebi ¶0136, “The samples 654-660 may correspond to the second time (t−1),” corresponding to the second audio signal); comparing the first representation and the plurality of second representations in the space (Chebi ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”) to select a second representation (Chebi ¶0138, “The signal comparator 506 may identify a selected comparison value 736 of the comparison values 534 that has a higher (or lower) value than other values of the comparison values 534. For example, the signal comparator 506 may select the second comparison value 716 as the selected comparison value 736 in response to determining that the second comparison value 716 is greater than or equal to the first comparison value 714”); determining an offset between the first sample and a second sample that is associated with the second representation (Chebi ¶0139, “The signal comparator 506 may identify the tentative mismatch value 536 of the mismatch values 760 that corresponds to the selected comparison value 736”); and outputting the offset (Chebi figure 5, the signal comparator 506 outputs the determined tentative mismatch 536). Chebi, however, does not explicitly teach determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation. Ceng teaches determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation (Ceng ¶0033-0035, “The audio processing device 102 performs pre-alignment on the audio signals x.sub.i(t) and x.sub.j(t). The pre-alignment includes synchronizing the audio signals x.sub.i(t) and x.sub.j(t) based on waveform cross-correlation”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ceng to improve the known method of Chebi to achieve the predictable result of more accurate synchronization (Ceng ¶0005).
Regarding claim 20, Chebi teaches an apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions for controlling the one or more computer processors to be operable for: analyzing a first sample of a first audio signal to determine a first representation in a space (Chebi ¶0135, “the samples 626-632 may correspond to a first time (t),” corresponding to the first audio signal); analyzing a plurality of second samples for a second audio signal to determine a plurality of second representations in the space (Chebi ¶0136, “The samples 654-660 may correspond to the second time (t−1),” corresponding to the second audio signal); comparing the first representation and the plurality of second representations in the space (Chebi ¶0136, “The signal comparator 506 may determine a first comparison value 714 (e.g., a difference value or a cross-correlation value) corresponding to the first mismatch value 764 based on the samples 626-632 and the samples 654-660”) to select a second representation (Chebi ¶0138, “The signal comparator 506 may identify a selected comparison value 736 of the comparison values 534 that has a higher (or lower) value than other values of the comparison values 534. For example, the signal comparator 506 may select the second comparison value 716 as the selected comparison value 736 in response to determining that the second comparison value 716 is greater than or equal to the first comparison value 714”); determining an offset between the first sample and a second sample that is associated with the second representation (Chebi ¶0139, “The signal comparator 506 may identify the tentative mismatch value 536 of the mismatch values 760 that corresponds to the selected comparison value 736”); and outputting the offset (Chebi figure 5, the signal comparator 506 outputs the determined tentative mismatch 536). Chebi, however, does not explicitly teach determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation. Ceng teaches determining an offset between the first sample in the first audio signal and a second sample in the second audio signal that is associated with the second representation (Ceng ¶0033-0035, “The audio processing device 102 performs pre-alignment on the audio signals x.sub.i(t) and x.sub.j(t). The pre-alignment includes synchronizing the audio signals x.sub.i(t) and x.sub.j(t) based on waveform cross-correlation”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ceng to improve the known method of Chebi to achieve the predictable result of more accurate synchronization (Ceng ¶0005).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Chebi (US 2017/0180906) in view of Ceng (US 2021/0132895), and further in view of Gupta (US 12505863).
Regarding claim 11, Chebi in view of Ceng teaches wherein adjusting the parameters comprises: adjusting a parameter to cause the model to output a first training representation or a second training representation that is closer in distance in the space when the pair has a label indicating the first training sample and the second training sample are in-sync (Chebi ¶0119, “mismatch value 162 has a value (e.g., 0) indicating no time shift”); and adjusting the parameter to cause the model to output the first training representation or the second training representation that is farther in distance in the space when the pair has a label indicating the first training sample and the second training sample are out-of-sync (Chebi ¶0119, “The gain parameter generator 514 may select samples of the target signal (e.g., the second audio signal 132) based on the non-causal mismatch value 162. To illustrate, the gain parameter generator 514 may select the samples 358-364 in response to determining that the non-causal mismatch value 162 has a first value (e.g., +X ms or +Y samples, where X and Y include positive real numbers)”). Chebi in view of Ceng, however, does not explicitly teach the use of a label for this indication. Gupta teaches the use of a label for indicating synchronization (Gupta Col. 6, lines 25-48, “The machine learning model may be trained on labelled data of dubbed clips having a score indicating how perceptually correlated the dubbed audio is with the lip-movements of the speaker in the corresponding video, as well as distances between lip poses of the speaker and expect visemes based on the dubbed audio. In some implementations, the data is labelled using a range of scores, e.g., a score of 1-5, where a higher score indicates a higher perception of correlation. In some embodiments, the data is labelled as synchronized or not synchronized”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Gupta to improve the known method of Chebi in view of Ceng to achieve the predictable result of efficient signal processing by using labeled signals.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Chebi (US 2017/0180906) in view of Ceng (US 2021/0132895), and further in view of Zhu (CN 110675886).

Regarding claim 17, Chebi in view of Ceng does not explicitly teach wherein: the first audio signal is a reference audio signal in an original language of a video, and the second audio signal is a dubbed audio signal in a translation of the original language for the video. Zhu teaches wherein: the first audio signal is a reference audio signal in an original language of a video, and the second audio signal is a dubbed audio signal in a translation of the original language for the video (Zhu ¶0013, “obtain the standard audio features of the original audio signal corresponding to the audio signal to be processed,” ¶0154, “the specific type of audio signal to be processed…the audio signal of a video or advertisement dubbing recorded,” ¶0173, “When performing dubbing enhancement, the corresponding original audio file is called according to the identifier of the video,” and ¶0005, “dubbed voice can be compared with the standard voice of the dubbing actor”). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Zhu to improve the known method of Chebi in view of Ceng to achieve the predictable result of enhancing audio processing (Zhu ¶0131).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NORMAN YU, whose telephone number is (571) 270-7436.
The examiner can normally be reached Mon - Fri, 11am-7pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Any response to this action should be mailed to: Commissioner of Patents and Trademarks, P.O. Box 1450, Alexandria, VA 22313-1450, or faxed to (571) 273-8300. For formal communications intended for entry and for informal or draft communications, please label “PROPOSED” or “DRAFT”. Hand-delivered responses should be brought to: Customer Service Window, Randolph Building, 401 Dulany Street, Arlington, VA 22314.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NORMAN YU/
Primary Examiner, Art Unit 2693

Prosecution Timeline

Jul 18, 2024
Application Filed
Feb 25, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604123: APPARATUS AND VEHICULAR APPARATUS INCLUDING THE SAME (granted Apr 14, 2026; 2y 5m to grant)
Patent 12598409: IN-EAR WEARABLE DEVICE (granted Apr 07, 2026; 2y 5m to grant)
Patent 12594882: AUTOMOTIVE SOUND AMPLIFICATION (granted Apr 07, 2026; 2y 5m to grant)
Patent 12593165: ACOUSTIC INPUT-OUTPUT DEVICES (granted Mar 31, 2026; 2y 5m to grant)
Patent 12581238: BINDING BAND ASSEMBLY FOR HEADSET AND HEADSET (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
88%
Grant Probability
99%
With Interview (+13.5%)
2y 1m
Median Time to Grant
Low
PTA Risk
Based on 598 resolved cases by this examiner. Grant probability derived from career allow rate.
