DETAILED ACTION
Response to Arguments
Applicant's arguments filed 11/24/2025 have been fully considered but they are not persuasive. Applicant states on pg. 9, “while is true that Gabryjelski does describe the use of metadata, nowhere does Gabryjelski teaches or suggest the training of a machine learning model to enable spatial metadata”. However, Atkins already teaches, in Fig. 5 and [0015], that the DNN outputs DOA and diffuseness information, which in the audio art can be part of metadata. Metadata is not anything specific; it is simply a format, i.e., structured information about something. Thus, Atkins already teaches applicant's own disclosed metadata in the form of sound direction/DOA; Atkins simply does not describe it using the term “metadata.” Gabryjelski is therefore used to show that metadata is merely a format that is well known in the audio art.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-11, 13-17 and 21-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Applicant's amendment to independent claims 1, 15 and 17, reciting “wherein the machine learning model receives input about the target device in parametric spatial audio capturing”, is unclear, and the examiner cannot determine the metes and bounds of the claim language. Applicant states the amendment is supported by Fig. 2, lines 13-21 of page 9, and lines 1-21 of page 12. Fig. 2 has no mention of the target device. Lines 13-21 of page 9 describe the target device capturing microphone signals that are passed to the machine learning model, or the machine learning model being trained and then provided to the target device, by stating that “Fig.2 enables a machine learning model to be trained for used in processing mic signals obtained by a target device” or “the trained machine learning model can be then be provided to the target device”. In lines 1-21 of page 12, the closest passage is lines 12-15: “The second capture data can therefore represent the spatial metadata that could be captured by an ideal, … microphone array for a given spatial sound distribution, or by any other suitable reference capture method. This can be referred to as reference spatial metadata.” For the examiner this is the closest reading to the amendment, as the “parametric spatial audio capturing” appears to be what applicant refers to as “any other suitable reference capture method”. However, applicant never explains what parametric spatial audio capturing actually is. The examiner's best guess from knowledge of the art is that “parametric spatial audio capturing” means analyzing microphone-array signals to extract spatial parameters such as DOA, diffuseness, etc. The main issue is the phrase “the machine learning model receives input about the target device”: the specification does not describe any information about the target device being sent to the machine learning model.
Examiner believes applicant meant to use the word “from”, which would make the sentence read “wherein the machine learning model receives input from the target device from parametric spatial audio capturing”, meaning that the machine learning model receives the microphone signal from the target device, where the target device uses parametric spatial audio capturing. However, even this reading is not supported by the specification, as pg. 12, lines 11-15, states that the second capture data represents the spatial metadata captured by the microphone array (which is on the target device) for a given spatial sound distribution, or by any other suitable reference capture method.
In conclusion, it is unclear to the examiner what applicant is trying to claim with “wherein the machine learning model receives input about the target device in parametric spatial audio capturing”. The examiner's best guess is that the target device captures audio signals from the microphone array, which represent spatial metadata; the audio signals are sent to the machine learning model; and the model outputs spatial metadata, e.g., sound directions/angles.
To further prosecution, the examiner will reject the claims based on the interpretation above as best as possible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-4, 8-11 and 14-17 are rejected under 35 U.S.C. 103 as being unpatentable over Atkins US PG-Pub 2019/0104357 in view of Gabryjelski US PG-Pub 2020/0058289.
Regarding claims 1, 15 and 17, Atkins teaches at least one processor; and at least one non-transitory memory that, when executed with the at least one processor ([0060]: processor), cause the apparatus to: obtain first capture data for a machine learning model where the first capture data is related to a plurality of spatial sound distributions and where the first capture data relates to a target device configured to obtain at least two microphone signals (Fig. 1 & Fig. 3 & [0015], [0022], [0046]: multiple microphones in an array on a device such as a phone capture audio signals and gather sub-band directional features like DOA that will be used to train a DNN for multiple sources); obtain second capture data for the machine learning model where the second capture data is obtained using the same plurality of spatial sound distributions and where the second capture data comprises information indicative of spatial properties of the plurality of spatial sound distributions and the second capture data is obtained using a reference capture method; and train the machine learning model to estimate the second capture data based on the first capture data wherein the machine learning model enables spatial metadata to be provided, wherein the machine learning model receives input about the target device in parametric spatial audio capturing (Fig. 1 & Fig. 5 & [0015]-[0016], [0049]: the multiple microphones of a phone/tablet/computer in an array capture audio signals for training a DNN for each sub-band based on the estimated sub-band directional features and target direction features like DOA and diffuseness; audio signals captured from the microphones of the phone/tablet/computer (Fig. 5-501) are then passed to the already-trained DNN (Fig. 5-503), which outputs more accurate sub-band directional features like DOA and diffuseness of multiple sources (Fig. 5-506); the rendering unit (Fig. 5-505) then mixes the audio signals with the directional features DOA and diffuseness and provides the result to the loudspeaker (Fig. 5-509)).
Atkins failed to explicitly teach metadata.
However, Gabryjelski teaches metadata ([0043]: metadata).
Atkins and Gabryjelski are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because metadata is an alternate, equivalent way to package Atkins' DOA and diffuseness data so that it can be passed on in a specific format.
Regarding claim 2, Atkins discloses wherein the instructions, when executed with the at least one processor, cause the apparatus to train the machine learning model for use in processing microphone signals obtained with the target device (Fig. 1 & Fig. 3: microphones 1a and 1b that will be used to pick up the sound, such as the impulse response).
Regarding claim 3, Atkins discloses wherein the machine learning model comprises a neural network ([0022]: deep neural network).
Regarding claims 4 and 16, Atkins discloses wherein the spatial sound distributions comprise a sound scene comprising a plurality of sound positions and corresponding audio signals for the plurality of sound positions (Fig. 1 & Fig. 5 & [0015], [0031]: the multiple microphones in an array capture audio signals and gather estimated sub-band directional features like DOA of multiple sources, which constitutes an acoustic scene).
Regarding claim 8, Atkins discloses obtain the information indicative of spatial properties of the plurality of spatial sound distributions in a plurality of frequency bands (Fig. 1 & Fig. 3 & [0015], [0022], [0046]: gather one or more sub-band directional features like DOA that will be used to train a DNN for multiple sources).
Regarding claim 9, Atkins discloses obtain information relating to a microphone array of the target device; and use the information relating to the microphone array to process a plurality of spatial sound distributions to obtain first capture data (Fig. 1 & [0015]-[0017]: gather one or more sub-band directional features like DOA that will be used to train a DNN for multiple sources; the microphones can be positioned on different surfaces, and the system requires knowledge of the geometrical relationship/arrangement of the microphones to perform its computations).
Regarding claim 10, Atkins discloses process the first capture data into a format that is suitable for use as an input to the machine learning model (Fig. 5: the captured sound is put into a sub-band feature format so the DNN model can use it to create new sub-band features).
Regarding claim 14, Atkins discloses wherein the target device comprises a mobile telephone (Fig. 2).
Regarding claim 11, Atkins teaches wherein the instructions, when executed with the at least one processor, cause the apparatus to use the one or more spatial sound distributions and use simulated data to determine reference spatial data for the one or more sound scenes (Fig. 1 & Fig. 5 & [0015], [0022]: using real or simulated audio data to create estimated sub-band directional features like DOA of multiple sources).
Atkins failed to explicitly teach reference microphone array and metadata.
However, Gabryjelski teaches reference microphone array and metadata ([0043]-[0044]: using as reference a virtual microphone array and metadata).
Atkins and Gabryjelski are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because it is the inventor's choice to select how to create spatial data, and metadata is an alternate, equivalent way to package Atkins' DOA and diffuseness data so that it can be passed on in a specific format.
Claims 5 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Atkins US PG-Pub 2019/0104357 in view of Gabryjelski US PG-Pub 2020/0058289 in combination with Zhang US PG-Pub 2019/0200156.
Regarding claims 5 and 21, Atkins teaches being able to measure real or simulated data ([0022]).
Atkins failed to teach wherein the spatial sound distributions used to obtain the first capture data and the second capture data comprise virtual sound distributions.
However, Zhang teaches wherein the spatial sound distributions used to obtain data comprise virtual sound distributions (Fig. 10: gathering simulated audio signals from a virtual space, representing the sound sources in the virtual space).
Atkins and Zhang are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because using simulated/virtual data can be used to render virtual sound sources.
Claims 6 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Atkins US PG-Pub 2019/0104357 in view of Gabryjelski US PG-Pub 2020/0058289 in combination with Abdullah “Real-Time Convolutional Neural Network-Based Speech Source Localization on Smartphone”.
Regarding claims 6 and 22, Atkins teaches the spatial sound distributions ([0015]: gathering sub-band directional features like DOA that will be used to train a DNN for multiple sources).
Atkins failed to explicitly teach wherein the spatial sound distributions are produced with two or more loudspeakers.
However, Abdullah teaches wherein the spatial sound distributions are produced with two or more loudspeakers (Fig. 3 & pg. 169973: Section 2, gathering real data for training the DNN by using five loudspeakers).
Atkins and Abdullah are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because training a model with real data makes it more robust.
Claims 7 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Atkins US PG-Pub 2019/0104357 in view of Gabryjelski US PG-Pub 2020/0058289 in combination with Laitinen US PG-Pub 2022/0351735.
Regarding claims 7 and 23, Atkins teaches the spatial sound distributions ([0015]: gathering sub-band directional features like DOA that will be used to train a DNN for multiple sources).
Atkins failed to explicitly teach a parametric representation of a sound scene.
However, Laitinen teaches a parametric representation of a sound scene ([0021]-[0023]: using parametric spatial analysis to produce parametric and spatial data for the spatial energy of a sound field).
Atkins and Laitinen are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because using a parametric representation is an alternate, equivalent way to represent a sound field/scene.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Atkins US PG-Pub 2019/0104357 in combination with Gabryjelski US PG-Pub 2020/0058289 in view of Tammi US PG-Pub 2019/0394606.
Regarding claim 13, the combination teaches spatial data comprising, for one or more frequency sub-bands, sound direction (Fig. 1 & Fig. 5 & [0015], [0022]: using real or simulated audio data to create estimated sub-band directional features like DOA of multiple sources).
The combination failed to teach that the metadata comprises information indicative of a sound direction and sound directionality.
However, Tammi teaches metadata comprising information indicative of a sound direction and sound directionality ([0076]: spatial metadata such as directions and directionality of sound).
The combination and Tammi are analogous art because they are both in the same field of endeavor, namely audio processing. Therefore, the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains, because metadata is an alternate, equivalent way to package Atkins' DOA and diffuseness data so that it can be passed on in a specific format.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM A JEREZ LORA whose telephone number is (571)270-5519. The examiner can normally be reached M-F 7am-9am and 11am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/WILLIAM A JEREZ LORA/Primary Examiner, Art Unit 2695