Prosecution Insights
Last updated: April 19, 2026
Application No. 18/105,077

ACOUSTIC MACHINE LEARNING WITH TRANSPARENT AND INTERPRETABLE ADAPTATION OF ACOUSTIC DATA BETWEEN ENVIRONMENTS

Non-Final OA: §103, §112
Filed: Feb 02, 2023
Examiner: DASGUPTA, SHOURJO
Art Unit: 2144
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant Probability: 65% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 65% (above average; 293 granted / 449 resolved; +10.3% vs TC avg)
Interview Lift: +38.1% on resolved cases with interview
Avg Prosecution: 3y 1m (typical timeline; 32 applications currently pending)
Total Applications: 481 across all art units (career history)

Statute-Specific Performance

§101: 11.8% (-28.2% vs TC avg)
§103: 56.8% (+16.8% vs TC avg)
§102: 12.2% (-27.8% vs TC avg)
§112: 15.6% (-24.4% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 449 resolved cases
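As a quick consistency check on the panel above, each statute's rate minus its quoted delta should recover the Tech Center baseline. A short Python sketch (figures copied from the panel) shows that every statute implies the same 40.0% baseline estimate:

```python
# Examiner's per-statute rates and quoted deltas vs the Tech Center average,
# copied from the statute-specific performance panel (percentages).
rates  = {"101": 11.8, "103": 56.8, "102": 12.2, "112": 15.6}
deltas = {"101": -28.2, "103": 16.8, "102": -27.8, "112": -24.4}

# Implied TC baseline per statute: rate minus delta.
implied = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(implied)  # each statute implies the same 40.0% baseline
```

The uniform implied baseline is consistent with the caption's note that the black line is a single Tech Center average estimate.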

Office Action

§103 §112
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

2. The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

3. Claims 1-20 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

Regarding the independent claims 1, 14, and 18, the aforementioned independent claims recite, in part, limitations akin to "training, by the processor set, a neural network model on the acoustic data for the target environment to extract features of the target environment" and then "using, by the processor set, the neural network model to transfer the features of the target environment to the acoustic content data."

Respectfully, the Examiner believes the second limitation, e.g. "using...", renders the claim vague and indefinite: in the "training..." limitation, the neural network model is clearly trained to perform an extraction task. Then, with the "using..." limitation, the same neural network model is said to do something else, e.g. something it is not trained to do. There is no mention, either in the claims or in the specification, that the neural network model is trained to "transfer features to" a target/destination construct such as the recited "acoustic content data." As a first matter, it does not make logical sense that a neural network trained to do one task is then recited to do another separate task, even if the two tasks are related as being part of a workflow or pipeline.
Putting it another way, when one says "the neural network or machine learning model is used for this," it most often means the "used for this" is what the network or model is trained to do. However, the claim on its face, with the recited training details, presents something of a contradiction as to what the model is trained for and what the model is used for.

The Examiner acknowledges that Applicants' specification provides this, at [0054] of the published specification: "Acoustic model transfer code 200 transfers the environmental features to the source object of the content audio so that the resulting audio can represent the change of the content audio in a different environment, and with minimized style loss (516). This may be an example of step 340 of using the neural network model to transfer the features of the target environment to the acoustic content data, as in FIG. 3."

In the Examiner's opinion, per the first underlined portion noted just above, it is very clear and logically consistent with the state of the art that the recited/taught feature transfer is performed by something else that is not the recited trained neural network model. The specification itself says that it is a different component in the taught framework, which the Examiner acknowledges may in fact encompass and include the trained neural network. But the neural network model itself is not doing a transfer, even if Applicants' specification appears to restate its teaching (badly/inaccurately, in the Examiner's opinion) as such with its second underlined portion noted just above. Rather, as the Examiner understands it, at best, the transfer can happen because the things to be transferred (e.g., the features) have been extracted by the trained neural network. That is to say, the trained neural network's functionality/performance is a necessary precedent to the actual transferring.
However, as the Examiner has tried to note, Applicants' claim limitations on their face appear to state otherwise, in a manner that does not logically comport. The dependent claims include the limitations of the independent claims, including the ones addressed just above, and do not otherwise cure their deficiencies. Hence, they too are rejected under the same rationale.

Claim Rejections - 35 USC § 103

4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

5. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office Action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

7. Claims 1-2, 4-6, 8-11, 14, and 18 are rejected under 35 U.S.C.
103 as being unpatentable over Non-Patent Literature "Acoustic Environment Transfer for Distributed Systems" ("Inventors", made of record via Applicants' IDS) in view of U.S. Patent No. 11853401 ("Nookula").

Regarding claim 1, Inventors teach:

A method, comprising: receiving, by a processor set, acoustic content data (page 3, Figure 1: "Content audio", where the Figure 1 framework as discussed in the first full paragraph on page 4 is understood by the Examiner to be a computer-implemented framework that is conducive to machine learning (e.g., as established by Inventors in their Introduction section on page 1, where the problem domain is discussed) and therefore would involve the use of basic computer elements to do the processing steps taught by the reference, e.g. using processor and memory implementations known in the state of the art and widely applied to the same problem domain);

receiving, by the processor set, acoustic data for a target environment (page 3, Figure 1: "Environment audio");

training, by the processor set, a neural network model on the acoustic data for the target environment to extract features of the target environment (a style transfer model is trained (last paragraph of section 3's introduction, near the bottom of page 4), the feature transfer involved therein specifically encompasses the extraction of environment features and content features as discussed in the same section 3 introduction on page 4 (e.g., the page's 1st full paragraph), and such extraction is performed via a convolutional neural network as discussed, which would intuitively be subject to training before any meaningful or effective use);

using, by the processor set, the neural network model to transfer the features of the target environment to the acoustic content data (section 3's introduction beginning at the bottom of page 3, discussing the transfer of style from an acoustic environment to essentially one of many possible different target environments);
constructing, by the processor set, the acoustic content data with the transferred features of the target environment (section 3's introduction, in its 6th paragraph as found on page 4, discusses the generation of a new waveform with the Griffin-Lim algorithm, which the Examiner believes reads on the same sort of construction step recited here and as taught in [0047] of Applicants' published specification).

The Inventors do not teach the further limitation for outputting, by the processor set, via a user interface (UI), information on and configurable options for the training of the neural network model on the acoustic data for the target environment. However, the notion of a GUI/UI framework to assist model design, development, tuning, and/or deployment is known, and it would be an obvious tool, given the existing state of the art relating to machine learning assistance, to apply such a framework to facilitate the assembly of the steps that Inventors' reference teaches together in coordination.

With that reasoning, the Examiner relies upon NOOKULA to teach what Inventors' reference lacks, see, e.g., Nookula's column 1 lines 7-35 (discussing the challenges in model development, generally), column 2 lines 10-61 (discussing the advantages to be realized by a GUI implementation that makes model building, training, development, etc. more accessible and convenient to users), and more specifically column 6 lines 8-32 and column 8 lines 33-38 (discussing more concretely how typical GUI elements can be used to facilitate typical model building aspects, such as those that define how the model is built, trained, tuned, and so forth – in other words, the exposure of a model's configurable options to a user/designer/developer via a GUI such as Nookula contemplates).

Both references generally relate to the building of machine learning models, and training aspects thereof. Hence, they are generally related to a common field, and are therefore analogous.
It would have been obvious to one of ordinary skill in the art to package the steps and techniques taught by Inventors in their reference into a common framework such as Nookula's, with a reasonable expectation of success, such as to realize the advantages Nookula discusses in model design, training, deployment, and so forth that are made possible by way of its GUI, and by extension apply them to the management of the modelling process taught by the Inventors.

Regarding claim 2, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references further teach the additional limitation for outputting, via the UI, information on and configurable options for the using of the neural network model to transfer the features of the target environment to the acoustic content data (Nookula's column 9 line 59 – column 10 line 7, and column 14 line 51 – column 15 line 11, for example, discussing a user's capability to specify and define the "input data" for a model, e.g., by way of the reference's GUI-driven approach, which the Examiner equates with a designer/developer practicing the Inventors' taught framework having the ability to provide configuration option specification and feedback relating to its model design, use, training, calibration, etc.). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 4, Inventors in view of Nookula teach the method of claim 1, as discussed above.
The aforementioned references further teach the additional limitation for receiving, via the UI, user inputs to configure the configurable options for the training of the neural network model on the acoustic data for the target environment (Nookula's column 9 line 59 – column 10 line 7, and column 14 line 51 – column 15 line 11, for example, discussing a user's capability to specify and define the "input data" for a model, e.g., by way of the reference's GUI-driven approach, which the Examiner equates with a designer/developer practicing the Inventors' taught framework having the ability to provide configuration option specification and feedback relating to its model design, use, training, calibration, etc.). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 5, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references further teach the additional limitation wherein using the neural network model to transfer the features of the target environment to the acoustic content data comprises using a plurality of neural network layers (Inventors' CNN appears to permit multiple layers, see e.g., section 3's introduction in its 5th paragraph, where a particular layer operative in the CNN is designated with an index number), the method further comprising: generating one or more per-layer surrogates corresponding to one or more of the neural network layers (staying with section 3's introduction in its 5th paragraph of the Inventors' reference, where it is discussed that Gram matrices are generated for captured features, which intuitively correspond to layers of the CNN, and where the Examiner understands a Gram matrix as discussed here to constitute a "surrogate" as recited, based on Applicants' own specification, see e.g., [0053]-[0058] of the published specification). The motivation for combining the references is as discussed above in relation to claim 1.
Regarding claim 6, Inventors in view of Nookula teach the method of claim 5, as discussed above. The aforementioned references further teach the additional limitation wherein generating the one or more per-layer surrogates comprises generating an average Gram matrix per layer for the one or more of the neural network layers (section 3's introduction in its 5th paragraph of the Inventors' reference, where it is discussed that Gram matrices are generated for captured features, which intuitively correspond to layers of the CNN, and where the Examiner understands a Gram matrix as discussed here to constitute a "surrogate" as recited, based on Applicants' own specification, see e.g., [0053]-[0058] of the published specification). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 8, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references further teach the additional limitation wherein training the neural network model on the acoustic data for the target environment comprises training a convolutional neural network (CNN) using training data and a filter specific to the target environment (trained style transfer network as discussed in the last paragraph of Inventors' section 3 introduction, on page 4, where the style transfer network is understood to be a CNN as discussed therein, and further per section 5.1 of Inventors' reference, see the declaration of the system as configured "with variable convolutional filter sizes in order to better capture the sound signatures from various environments", which suggests to the Examiner that the filter fits the target environment). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 9, Inventors in view of Nookula teach the method of claim 1, as discussed above.
The aforementioned references further teach the additional limitation for enabling user inputs to select options from the information on and configurable options for the training of the neural network model on the acoustic data for the target environment (Nookula's column 9 line 59 – column 10 line 7, and column 14 line 51 – column 15 line 11, for example, discussing a user's capability to specify and define the "input data" for a model, e.g., by way of the reference's GUI-driven approach, which the Examiner equates with a designer/developer practicing the Inventors' taught framework having the ability to provide configuration option specification and feedback relating to its model design, use, training, calibration, etc.). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 10, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references further teach the additional limitation wherein constructing the acoustic content data with the transferred features of the target environment comprises constructing the acoustic content data in accordance with:

$$x_s = \arg\min_x L_s\big(x, \tilde{G}^{(1)}, \ldots, \tilde{G}^{(L)}\big) = \sum_{l=1}^{L} \big\lVert \tilde{G}^{(l)} - G^{(l)}(x) \big\rVert_F^2$$

$$\tilde{G}^{(l)} = \frac{1}{N} \sum_{i=1}^{N} G^{(i)}(x_i) \quad \text{or} \quad \tilde{G}^{(l)} = \frac{1}{N} \sum_{i=1}^{N} G_k^{(i)}(x_i)$$

where $x$ and $x_s$ are embedding features of generated data and the target environment, respectively, $G_k^{(i)}(x_i)$ is a low-rank approximation of $G^{(i)}(x_i)$, and $k$ controls the number of largest eigenvalues used for approximation (Inventors' section 3's introduction, in its 5th paragraph, found on page 4). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 11, Inventors in view of Nookula teach the method of claim 10, as discussed above.
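The averaged-Gram construction recited in claim 10 above can be sketched numerically. This is an illustrative NumPy sketch only; the feature shapes, the eigenvalue-truncation details, and the helper names are assumptions made for illustration, not taken from the Inventors' reference or the claims:

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels x samples) feature map, as in style transfer."""
    return features @ features.T

def averaged_gram(feature_maps, k=None):
    """Average the per-sample Gram matrices; optionally replace each with a
    rank-k approximation (k largest eigenvalues), per the formula's 'or' branch."""
    grams = []
    for f in feature_maps:
        g = gram(f)
        if k is not None:
            # Symmetric PSD matrix: eigendecompose and keep the k largest eigenvalues.
            w, v = np.linalg.eigh(g)
            idx = np.argsort(w)[::-1][:k]
            g = (v[:, idx] * w[idx]) @ v[:, idx].T
        grams.append(g)
    return sum(grams) / len(grams)

def style_loss(x_features, gram_targets):
    """Sum of squared Frobenius distances between generated-feature Grams
    and the averaged target Grams, one term per layer."""
    return sum(np.linalg.norm(gram(f) - g_t, "fro") ** 2
               for f, g_t in zip(x_features, gram_targets))

rng = np.random.default_rng(0)
maps = [rng.standard_normal((4, 16)) for _ in range(3)]  # N=3 illustrative samples
g_avg = averaged_gram(maps)        # full-rank average
g_lo  = averaged_gram(maps, k=2)   # rank-k approximation branch
```

With N=1, as claim 11 recites, the average reduces to the single sample's Gram matrix.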
The aforementioned references further teach the additional limitation further comprising setting N=1 (Inventors' section 3's introduction, in its 5th paragraph, found on page 4) and performing a denoising task to separate semantic content and implicit style (Inventors' section 5.4, in its 5th paragraph as found on page 7, discussing "to ensure the semantic content well-preserved, whilst the target style is transferred in the newly generated data."). The motivation for combining the references is as discussed above in relation to claim 1.

Regarding claim 14, the claim includes the same or similar limitations as discussed above in relation to claim 1, and is therefore rejected under the same rationale. The Examiner notes that the present claim recites "A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to" practice the limitations of claim 1 as discussed above. The additional features bolded here are likewise taught by the references, see e.g., Inventors' reference, where the Figure 1 framework as discussed in the first full paragraph on page 4 is understood by the Examiner to be a computer-implemented framework that is conducive to machine learning (e.g., as established by Inventors in their Introduction section on page 1, where the problem domain is discussed) and therefore would involve the use of basic computer elements to do the processing steps taught by the reference, e.g. using processor and memory implementations known in the state of the art and widely applied to the same problem domain.

Regarding claim 18, the claim includes the same or similar limitations as discussed above in relation to claim 1, and is therefore rejected under the same rationale.
The Examiner notes that the present claim recites "A system comprising: a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to" practice the limitations of claim 1 as discussed above. The additional features bolded here are likewise taught by the references, see e.g., Inventors' reference, where the Figure 1 framework as discussed in the first full paragraph on page 4 is understood by the Examiner to be a computer-implemented framework that is conducive to machine learning (e.g., as established by Inventors in their Introduction section on page 1, where the problem domain is discussed) and therefore would involve the use of basic computer elements to do the processing steps taught by the reference, e.g. using processor and memory implementations known in the state of the art and widely applied to the same problem domain.

8. Claims 3, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Inventors in view of Nookula and further in view of U.S. Patent No. 10068557 ("Engel").

Regarding claim 3, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references teach a UI, e.g. as discussed above in relation to claim 1, but not specifically a further limitation for outputting, via the UI, information on and configurable options for the constructing of the acoustic content data with the transferred features of the target environment. Rather, the Examiner relies upon ENGEL to teach what Inventors etc. otherwise lack, see e.g., Engel's FIG.
14 as discussed per column 21 line 55 – column 22 line 8, teaching a UI that allows a user to mix embeddings to generate new audio waveforms, specifically by enabling the user to adjust aspects of the sounds ("interpolate"), in accordance with a framework that generally performs some of the same sound modelling and mixing aspects as Inventors, as shown in Engel's FIG. 12 for example.

The references generally relate to the building of machine learning models, and training aspects thereof. See, e.g., Inventors' discussion of training aspects as discussed per claim 1, but also Engel's FIG. 12. In particular, both of those references relate these machine learning aspects to the problem domain of sound/acoustic data. Hence, they are generally related to a common field, and therefore are analogous. It would have been obvious to one of ordinary skill in the art to extend the user's manipulation of model aspects/parameters, as Inventors modified in view of Nookula permits, to include waveform presentation and control/adjustment, as Engel teaches, with a reasonable expectation of success, to provide users with a concrete way to control and manipulate the results of its modelling.

Regarding claim 15, the claim includes the same or similar limitations as claim 3 discussed above, and is therefore rejected under the same rationale.

Regarding claim 19, the claim includes the same or similar limitations as claim 3 discussed above, and is therefore rejected under the same rationale.

9. Claims 7, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Inventors in view of Nookula and further in view of Non-Patent Literature "Sound Texture Synthesis Using Convolutional Neural Networks" ("Caracalla").

Regarding claim 7, Inventors in view of Nookula teach the method of claim 6, as discussed above. The aforementioned references teach a UI, e.g.
as discussed above in relation to claim 1, but not specifically a further limitation wherein outputting the information on and configurable options for the training of the neural network model on the acoustic data for the target environment comprises outputting the one or more per-layer surrogates. Rather, the Examiner relies upon CARACALLA to teach what Inventors etc. otherwise lack, see e.g., Caracalla's 2.1.3 discussing the determination of per-layer Gram matrices, and their selection as parameters explicitly, as would be understood to define modelling and the training therefor.

The references generally relate to the building of machine learning models, and training aspects thereof. See, e.g., Inventors' discussion of training aspects as discussed per claim 1, but also Caracalla's sections 2.1.1-2.1.2. In particular, both of those references relate these machine learning aspects to the problem domain of sound/acoustic data. Hence, they are generally related to a common field, and therefore are analogous. It would have been obvious to one of ordinary skill in the art to extend the user's presentation and/or manipulation of model aspects/parameters, as Inventors modified in view of Nookula permits, to include additional parameters that are effective in defining the model's design and training, as Caracalla teaches with Gram matrices in its section 2.1.3, with a reasonable expectation of success, to provide users with more perspective and insight into the modelling.

Regarding claim 16, the claim includes the same or similar limitations as claim 7 discussed above, and is therefore rejected under the same rationale.

Regarding claim 20, the claim includes the same or similar limitations as claim 7 discussed above, and is therefore rejected under the same rationale.

10. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Inventors in view of Nookula and further in view of U.S. Patent No. 10163429 ("Silverstein").
Regarding claim 12, Inventors in view of Nookula teach the method of claim 1, as discussed above. The aforementioned references teach a UI, e.g. as discussed above in relation to claim 1, but not specifically a further limitation wherein outputting the information on and configurable options for the training of the neural network model on the acoustic data for the target environment comprises enabling user-configurable options for a plurality of predefined generators. Rather, the Examiner relies upon SILVERSTEIN to teach what Inventors etc. otherwise lack, see e.g., Silverstein's column 2 lines 22-44 and column 3 lines 42-59, discussing prior art instances that permit a user to visually modify controls in a UI that have the effect of calibrating a generator element in a sound mixing/design workflow.

The references generally relate to the building of machine learning models, and training aspects thereof. See, e.g., Inventors' discussion of training aspects as discussed per claim 1, but also Engel's FIG. 12. In particular, both of those references relate these machine learning aspects to the problem domain of sound/acoustic data. Hence, they are generally related to a common field, and therefore are analogous. It would have been obvious to one of ordinary skill in the art to extend the user's manipulation of model aspects/parameters, as Inventors modified in view of Nookula permits, to include generator presentation and control/adjustment, as Silverstein teaches, with a reasonable expectation of success, to provide users with a concrete way to control and manipulate the results of its modelling.

Allowable Subject Matter

11. Claims 13 and 17 are objected to as being dependent upon a base claim rejected with an art rejection, but would be allowable: (a) if rewritten in independent form including all of the limitations of the base claim and any intervening claims, and (b) if Applicants are able to overcome the rejection under 35 U.S.C. 112(b).

Conclusion

12.
The prior art made of record and not relied upon is considered pertinent to Applicants' disclosure:

US 10262641 B2 (Silverstein)
CN 110047512 B
CN 110099332 A
WO 2021114808 A1

13. Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOURJO DASGUPTA whose telephone number is (571) 272-7207. The examiner can normally be reached M-F 8am-5pm CST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Tamara Kyle, can be reached at (571) 272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHOURJO DASGUPTA/ Primary Examiner, Art Unit 2144

Prosecution Timeline

Feb 02, 2023
Application Filed
Feb 21, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591802: GENERATING ESTIMATES BY COMBINING UNSUPERVISED AND SUPERVISED MACHINE LEARNING (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586371: SENSOR DATA PROCESSING (granted Mar 24, 2026; 2y 5m to grant)
Patent 12578979: VISUALIZATION OF APPLICATION CAPABILITIES (granted Mar 17, 2026; 2y 5m to grant)
Patent 12572782: SCALABLE AND COMPRESSIVE NEURAL NETWORK DATA STORAGE SYSTEM (granted Mar 10, 2026; 2y 5m to grant)
Patent 12549397: MULTI-USER CAMERA SWITCH ICON DURING VIDEO CALL (granted Feb 10, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview (+38.1%): 99%
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 449 resolved cases by this examiner. Grant probability derived from career allow rate.
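The projection figures above reduce to simple arithmetic on the examiner's record. A sketch, assuming (as the caption states) that the grant probability is the career allow rate, and further assuming that the with-interview figure adds the quoted lift and caps at 99%, which is a guess about this tool's method rather than a documented formula:

```python
granted, resolved = 293, 449                  # examiner's career record
allow_rate = round(100 * granted / resolved)  # career allow rate, percent

# Assumption: the with-interview projection is additive and capped at 99%.
interview_lift = 38.1                         # quoted lift, percentage points
with_interview = min(99, round(allow_rate + interview_lift))
print(allow_rate, with_interview)  # 65 99
```

Both printed values match the dashboard's headline figures, which supports the additive-and-capped reading of the interview adjustment.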
