Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
The following is a quotation of pre-AIA 35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA 35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.
Claims 5-6 are rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which they depend, or for failing to include all the limitations of the claim upon which they depend. Claim 5 depends on itself, which is improper. Claim 6 is also rejected because it depends from claim 5. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit http://www.uspto.gov/forms/. The filing date of the application will determine which form should be used. A web-based eTerminal Disclaimer may be filled out completely online using web screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-20 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1, 8, and 15 of Patent 11847999. Although the conflicting claims are not identical, they are not patentably distinct from each other because claims 1 and 15 of the instant application are broader variations of claims 1, 8, and 15 of Patent 11847999. Dependent claims 2-14 and 16-20 are also rejected because they are obvious variants of the patented claims.
Claims of the patent read on the corresponding dependent claims of the instant application:
Patented claims 2, 9, and 16 read on instant claims 2 and 9
Patented claims 4, 11, and 18 read on instant claims 3 and 10
Patented claims 5, 12, and 19 read on instant claims 4 and 11
Patented claims 6, 13, and 20 read on instant claims 5 and 12
Patented claims 7 and 14 read on instant claims 6 and 13
Claims 1, 8, and 15 of Patent 11847999 in view of Fazeli (US 10803881) read on claims 7, 14, and 20 of the instant application. Claims 1, 8, and 15 of Patent 11847999 do not explicitly teach “training an automatic echo cancellation (“AEC”) system based on the echo recording representation.” Fazeli teaches training an automatic echo cancellation (“AEC”) system based on the echo recording representation (Fazeli figure 1B, AEC 110; figure 2, echo estimator 230; microphone 14). Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Fazeli to improve the patented claims to achieve the predictable result of an optimal signal with a reduced amount of unwanted noise.
Patent 11847999 (reference claims):
1. A computer-implemented method for echo recording generation, comprising: receiving, by an autoencoder, an input comprising an audio signal representation and a target echo embedding, wherein: the audio signal representation represents an audio signal; the target echo embedding comprises information about a target room; and the autoencoder comprises an encoder and a decoder; generating, by the encoder, a content embedding based on the audio signal representation and an estimated echo embedding based on the audio signal representation; generating, by the decoder, an echo recording representation, wherein the echo recording representation: is based on the content embedding and the target echo embedding; and comprises a representation of an estimated audio signal that estimates the audio signal being played in the target room, including an estimated echo from playing in the target room; and outputting, by the autoencoder, the echo recording representation and the estimated echo embedding.
8. A non-transitory computer readable medium that stores executable program instructions that when executed by one or more computing devices configure the one or more computing devices to perform operations comprising: receiving, by an autoencoder, an input comprising an audio signal representation and a target echo embedding, wherein: the audio signal representation represents an audio signal; the target echo embedding comprises information about a target room; and the autoencoder comprises an encoder and a decoder; generating, by the encoder, a content embedding based on the audio signal representation and an estimated echo embedding based on the audio signal representation; generating, by the decoder, an echo recording representation, wherein the echo recording representation: is based on the content embedding and the target echo embedding; and comprises a representation of an estimated audio signal that estimates the audio signal being played in the target room, including an estimated echo from playing in the target room; and outputting, by the autoencoder, the echo recording representation and the estimated echo embedding.
15. An echo recording generation system comprising one or more processors configured to perform the operations of: receiving, by an autoencoder, an input comprising an audio signal representation and a target echo embedding, wherein: the audio signal representation represents an audio signal; the target echo embedding comprises information about a target room; and the autoencoder comprises an encoder and a decoder; generating, by the encoder, a content embedding based on the audio signal representation and an estimated echo embedding based on the audio signal representation; generating, by the decoder, an echo recording representation, wherein the echo recording representation: is based on the content embedding and the target echo embedding; and comprises a representation of an estimated audio signal that estimates the audio signal being played in the target room, including an estimated echo from playing in the target room; and outputting, by the autoencoder, the echo recording representation and the estimated echo embedding.
Instant Application 18507849 (examined claims):
1. A method comprising: receiving, by an autoencoder, an audio signal representation of an audio signal and a target echo embedding comprising information about a target room; generating, based on the audio signal representation and by a trained encoder of the autoencoder, a content embedding and an estimated echo embedding; generating, by a trained decoder of the autoencoder, an echo recording representation based on the content embedding and the target echo embedding; and outputting the echo recording representation.
8. A system comprising: a non-transitory computer-readable medium; and one or more processors configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: receive, by an autoencoder, an audio signal representation of an audio signal and a target echo embedding comprising information about a target room; generate, based on the audio signal representation and by a trained encoder of the autoencoder, a content embedding and an estimated echo embedding; generate, by a trained decoder of the autoencoder, an echo recording representation based on the content embedding and the target echo embedding; and output the echo recording representation.
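For technical orientation only (no such code appears in the record), the data flow recited in instant claim 1 can be sketched as follows. The dimensions and the linear stand-in weights are entirely hypothetical; the claim recites a trained encoder and decoder but specifies no implementation:

```python
import numpy as np

# Hypothetical dimensions; the claims specify none.
AUDIO_DIM, CONTENT_DIM, ECHO_DIM = 64, 32, 16

rng = np.random.default_rng(0)

# Random linear stand-ins for the trained encoder and decoder weights.
W_content = rng.standard_normal((CONTENT_DIM, AUDIO_DIM))
W_echo = rng.standard_normal((ECHO_DIM, AUDIO_DIM))
W_decode = rng.standard_normal((AUDIO_DIM, CONTENT_DIM + ECHO_DIM))

def encoder(audio_repr):
    """From the audio signal representation, produce a content embedding
    and an estimated echo embedding (the claimed 'generating' step)."""
    return W_content @ audio_repr, W_echo @ audio_repr

def decoder(content, target_echo):
    """From the content embedding and the target echo embedding,
    produce the echo recording representation."""
    return W_decode @ np.concatenate([content, target_echo])

# Inputs: an audio signal representation and a target echo embedding
# carrying information about the target room.
audio_repr = rng.standard_normal(AUDIO_DIM)
target_echo = rng.standard_normal(ECHO_DIM)

content, estimated_echo = encoder(audio_repr)
echo_recording = decoder(content, target_echo)
print(echo_recording.shape)  # prints (64,)
```

The sketch only shows which embedding feeds which stage; it does not represent any party's actual model.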
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Soulodre (US 2008/0069366) in view of Ohta (US 2009/0220100).
Regarding claim 1, Soulodre teaches a method comprising: receiving, by an autoencoder (Soulodre circuit of figure 3), an audio signal representation of an audio signal and a target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components) comprising information about a target room (Soulodre figure 1, reverberant room; reverberations are considered echoes); generating, based on the audio signal representation and by a trained encoder of the autoencoder (Soulodre figure 3 and ¶0049, analysis window 21 and time to frequency domain processor 22), a content embedding and an estimated echo embedding (Soulodre figure 3 and ¶0054, decompose processor 33…produce an estimate of the original dry signal and estimates of one or more components of the reverberant signal); generating, by a trained decoder (Soulodre figure 3, processor 30 and root hanning window 31) of the autoencoder, an echo recording representation based on the content embedding (Soulodre ¶0055, Dry signal modifier 36…produce a modified estimate of the original dry signal) and the target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components); and outputting the echo recording representation (Soulodre figure 3, output 32). Soulodre, however, does not clearly teach the target echo embedding.
Ohta teaches the target echo embedding (Ohta ¶0108, “reverberation time desired by the user, which is preliminarily set via the operating unit 128”), and generating an echo recording representation based on the content embedding and the target echo embedding (Ohta ¶0108).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ohta to improve the known method of Soulodre to achieve the predictable result of an enhanced audio experience tailored to the user’s selected preference.
Regarding claims 2, 9, and 16, Soulodre in view of Ohta teaches wherein the target echo embedding encodes information about a geometry of the target room and one or more echo paths (Ohta ¶0107, “when the reverberation characteristic in the listening room 10 is analyzed…shape of the listening room”).
Regarding claims 3, 10 and 17, Soulodre in view of Ohta teaches wherein the target echo embedding is generated by inputting into the autoencoder a second audio signal representation that represents a second audio signal that was recorded in the target room (Soulodre figure 3, and ¶0109, “second input signal s2(t) 40”).
Regarding claim 8, Soulodre teaches a system comprising: a non-transitory computer-readable medium; and one or more processors configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: receive, by an autoencoder (Soulodre circuit of figure 3), an audio signal representation of an audio signal and a target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components) comprising information about a target room (Soulodre figure 1, reverberant room; reverberations are considered echoes); generate, based on the audio signal representation and by a trained encoder of the autoencoder (Soulodre figure 3 and ¶0049, analysis window 21 and time to frequency domain processor 22), a content embedding and an estimated echo embedding (Soulodre figure 3 and ¶0054, decompose processor 33…produce an estimate of the original dry signal and estimates of one or more components of the reverberant signal); generate, by a trained decoder of the autoencoder (Soulodre figure 3, processor 30 and root hanning window 31), an echo recording representation based on the content embedding (Soulodre ¶0055, Dry signal modifier 36…produce a modified estimate of the original dry signal) and the target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components); and output the echo recording representation (Soulodre figure 3, output 32). Soulodre, however, does not clearly teach the target echo embedding.
Ohta teaches the target echo embedding (Ohta ¶0108, “reverberation time desired by the user, which is preliminarily set via the operating unit 128”), and generating an echo recording representation based on the content embedding and the target echo embedding (Ohta ¶0108).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ohta to improve the known system of Soulodre to achieve the predictable result of an enhanced audio experience tailored to the user’s selected preference.
Regarding claim 15, Soulodre teaches a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to: receive, by an autoencoder (Soulodre circuit of figure 3), an audio signal representation of an audio signal and a target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components) comprising information about a target room (Soulodre figure 1, reverberant room; reverberations are considered echoes); generate, based on the audio signal representation and by a trained encoder of the autoencoder (Soulodre figure 3 and ¶0049, analysis window 21 and time to frequency domain processor 22), a content embedding and an estimated echo embedding (Soulodre figure 3 and ¶0054, decompose processor 33…produce an estimate of the original dry signal and estimates of one or more components of the reverberant signal); generate, by a trained decoder of the autoencoder (Soulodre figure 3, processor 30 and root hanning window 31), an echo recording representation based on the content embedding (Soulodre ¶0055, Dry signal modifier 36…produce a modified estimate of the original dry signal) and the target echo embedding (Soulodre ¶0055, Reverberant signal modifier 37 is operable to independently adjust frequency components of one or more of the estimates of the reverberant signal to produce modified estimates of the reverberant signal components); and output the echo recording representation (Soulodre figure 3, output 32). Soulodre, however, does not clearly teach the target echo embedding.
Ohta teaches the target echo embedding (Ohta ¶0108, “reverberation time desired by the user, which is preliminarily set via the operating unit 128”), and generating an echo recording representation based on the content embedding and the target echo embedding (Ohta ¶0108).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Ohta to improve the known medium of Soulodre to achieve the predictable result of an enhanced audio experience tailored to the user’s selected preference.
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Soulodre (US 2008/0069366) in view of Ohta (US 2009/0220100), and further in view of Esparza (US 2016/0093278).
Regarding claims 4, 11 and 18, Soulodre in view of Ohta does not explicitly teach wherein the autoencoder comprises one or more weights that are based on training the autoencoder in a Siamese reconstruction network.
Esparza teaches wherein the autoencoder comprises one or more weights that are based on training the autoencoder in a Siamese reconstruction network (Esparza ¶0094, “Siamese neural network that has identical structure and mirrored weights is given two arbitrary inputs”).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Esparza to improve the known method of Soulodre in view of Ohta to achieve the predictable result of a more consistent and stable system.
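As an illustration only of the Siamese arrangement Esparza describes in ¶0094 (identical structure and mirrored weights applied to two arbitrary inputs), the following minimal sketch uses hypothetical dimensions and random stand-in weights; it is not drawn from Esparza's disclosure:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical shared weights: both branches of the Siamese network use
# the identical structure and the same (mirrored) weights.
W1 = rng.standard_normal((32, 64))
W2 = rng.standard_normal((8, 32))

def branch(x):
    """One branch of the Siamese network; both inputs pass through
    the same two stand-in layers."""
    return np.tanh(W2 @ np.tanh(W1 @ x))

# Two arbitrary inputs are embedded by the shared branch and compared.
a = rng.standard_normal(64)
b = rng.standard_normal(64)
distance = np.linalg.norm(branch(a) - branch(b))

# Because the weights are shared, identical inputs map to identical
# embeddings and their distance is exactly zero.
print(np.linalg.norm(branch(a) - branch(a)))  # prints 0.0
```

Weight sharing is what makes the two branches' embeddings directly comparable, which is the consistency property relied on in the rationale above.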
Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Soulodre (US 2008/0069366) in view of Ohta (US 2009/0220100), and further in view of Fazeli (US 10803881).
Regarding claims 7, 14 and 20, Soulodre in view of Ohta does not explicitly teach training an automatic echo cancellation (“AEC”) system based on the echo recording representation.
Fazeli teaches training an automatic echo cancellation (“AEC”) system based on the echo recording representation (Fazeli figure 1B, AEC 110; figure 2, echo estimator 230; microphone 14).
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to use the known technique of Fazeli to improve the method of Soulodre in view of Ohta to achieve the predictable result of an optimal signal with a reduced amount of unwanted noise.
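For orientation, "training an AEC system based on an echo recording" can be illustrated with a classical NLMS adaptive filter fitted to a synthetic echo recording; the simulated room response and every parameter below are hypothetical and are not drawn from Fazeli:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training pair: a far-end (played) signal and an echo
# recording of it, simulated here with an arbitrary 8-tap room response.
far_end = rng.standard_normal(4000)
room = rng.standard_normal(8) * 0.3
echo_recording = np.convolve(far_end, room)[: len(far_end)]

# NLMS adaptive filter standing in for "training an AEC system" on the
# echo recording representation.
taps, mu, eps = 8, 0.5, 1e-6
w = np.zeros(taps)
for n in range(taps, len(far_end)):
    x = far_end[n - taps + 1 : n + 1][::-1]  # most recent sample first
    err = echo_recording[n] - w @ x          # residual (uncancelled) echo
    w += mu * err * x / (x @ x + eps)        # NLMS weight update

# After adaptation, the filter taps approximate the simulated room
# response, so subtracting w @ x cancels the echo.
print(np.max(np.abs(w - room)) < 0.05)
```

In a deployed AEC the adapted filter's output is subtracted from the microphone signal; generated echo recordings supply training material without recording real rooms.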
Allowable Subject Matter
Claims 5, 12, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if (1) a terminal disclaimer is filed to overcome the double patenting rejection(s) set forth in this Office action, (2) the claims are rewritten to overcome the 112(d) rejection, and (3) the claims are rewritten in independent form including all of the limitations of the base claim and any intervening claims, because the closest prior art, either alone or in combination, fails to anticipate or render obvious the claimed limitation of “wherein the Siamese reconstruction network comprises two copies of the autoencoder in series, wherein an output of a first copy of the autoencoder comprises an input to a second copy of the autoencoder” in combination with all other limitations in the claim(s) as defined by the applicant.
Claims 6 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if (1) a terminal disclaimer is filed to overcome the double patenting rejection(s) set forth in this Office action, (2) the claims are rewritten to overcome the 112(d) rejection, and (3) the claims are rewritten in independent form including all of the limitations of the base claim and any intervening claims, because the closest prior art, either alone or in combination, fails to anticipate or render obvious the claimed limitation of “wherein the Siamese reconstruction network is trained to minimize reconstruction loss between an input audio signal representation and input echo embedding of the Siamese reconstruction network and an output audio signal representation and output echo embedding of the Siamese reconstruction network” in combination with all other limitations in the claim(s) as defined by the applicant.
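The limitations quoted above (two autoencoder copies in series, trained to minimize reconstruction loss between the network's input pair and output pair) can be sketched as follows; the identity stand-in weights are hypothetical and chosen only so the sketch is self-contained and not a representation of the applicant's trained model:

```python
import numpy as np

rng = np.random.default_rng(2)
AUDIO_DIM, ECHO_DIM = 64, 16

# Hypothetical stand-in for one trained autoencoder copy: it maps an
# audio representation plus an echo embedding to an echo recording
# representation and an estimated echo embedding. Identity weights are
# used so the sketch runs without any training.
W = np.eye(AUDIO_DIM + ECHO_DIM)

def autoencoder(audio_repr, echo_embedding):
    out = W @ np.concatenate([audio_repr, echo_embedding])
    return out[:AUDIO_DIM], out[AUDIO_DIM:]  # (recording repr, est. echo)

audio_in = rng.standard_normal(AUDIO_DIM)
echo_in = rng.standard_normal(ECHO_DIM)

# Two copies in series: the first copy's output is the second copy's input.
mid_repr, mid_echo = autoencoder(audio_in, echo_in)
audio_out, echo_out = autoencoder(mid_repr, mid_echo)

# Reconstruction loss between the network's input and output pairs --
# the quantity the claimed training minimizes.
loss = np.mean((audio_out - audio_in) ** 2) + np.mean((echo_out - echo_in) ** 2)
print(round(loss, 6))  # prints 0.0 with the identity stand-in
```

The series arrangement forces the first copy's echo recording representation to preserve enough content for the second copy to reconstruct the original signal and embedding.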
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NORMAN YU, whose telephone number is (571) 270-7436. The examiner can normally be reached Monday through Friday, 11am-7pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any response to this action should be mailed to:
Commissioner of Patents and Trademarks
P.O. Box 1450
Alexandria, VA 22313-1450
Or faxed to:
(571) 273-8300 for formal communications intended for entry. For informal or draft communications, please label them “PROPOSED” or “DRAFT”.
Hand-delivered responses should be brought to:
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NORMAN YU/Primary Examiner, Art Unit 2693