DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to a preliminary amendment filed on May 23, 2024, in which claims 1-15 were canceled and claims 16-35 were newly added.
Accordingly, claims 16-35 are currently pending and examined in this Office Action.
In the response to this Office Action, the Examiner respectfully requests that support be shown for language added to any original claims upon amendment, as well as for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or the drawing figure(s). This will assist the Examiner in prosecuting this application.
Specification
The application specification fails to disclose "in response to the first weight being at least larger than the second weight, merging the first and second spatial metadata by determining to output at least one of the parameters" as recited in claims 18 and 30, and "in response to the second weight being at least larger than the first weight, merging the first and second spatial metadata by determining to output at least one of the multiple second metadata parameters" as recited in claims 21 and 31. The application specification discloses outputting "solely from signal 1" (631, 731) or "solely from signal 2" (633, 733) when first weight W1 > second weight cW2, or second weight W2 > first weight cW1 (figs. 6-7), respectively (USPGPub 20240321282 A1, para 142, 144, 167, 169); there is no disclosure of a "merge" in which "at least one of …" the parameters is merged under those conditions. Instead, merging is performed only under the condition that neither the first weight nor the second weight is greater than the other (635, 735 in figs. 6-7; para 145, 170).
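For reference, the selection/merge logic of the specification as characterized above (paras 142-145, 167-170: output solely from signal 1 when W1 > cW2, solely from signal 2 when W2 > cW1, and merge only otherwise) may be sketched as follows. This is an illustrative sketch only; the function name, the scalar metadata values, and the weighted-average merge are hypothetical and are not taken from the application.

```python
# Illustrative sketch (hypothetical, not from the application) of the
# disclosed selection/merge logic: output metadata solely from signal 1
# when W1 > c*W2 (631/731), solely from signal 2 when W2 > c*W1 (633/733),
# and merge only when neither weight dominates (635/735).
def select_or_merge(meta1, meta2, w1, w2, c=1.0):
    """Return spatial metadata chosen by comparing the two signal weights.

    meta1/meta2: per-signal metadata parameter values (scalars here for
    simplicity); w1/w2: signal weights; c: a scaling constant.
    """
    if w1 > c * w2:
        return meta1                  # solely from signal 1
    if w2 > c * w1:
        return meta2                  # solely from signal 2
    # neither weight dominates: merge, e.g. as a weighted average
    return (w1 * meta1 + w2 * meta2) / (w1 + w2)
```

As sketched, the "merge" branch is reached only when neither weight exceeds the other, consistent with the Examiner's characterization of paras 145 and 170.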
Appropriate correction is required.
Drawings
The drawings are objected to under 37 CFR 1.83(a). The drawings must show every feature of the invention specified in the claims. Therefore, the features of “in response to the first weight being at least larger than the second weight, merging the first and second spatial metadata by determining to output at least one of the parameters” as recited in claims 18, 30 and the features “in response to the second weight being at least larger than the first weight, merging the first and second spatial metadata by determining to output at least one of the multiple second metadata parameters” as recited in claims 21, 31 must be shown or the feature(s) canceled from the claim(s). No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Appropriate correction is required.
Claim Objections
Claims 16-35 are objected to because of the following informalities:
Claim 16 recites "based on comparison", which should be -- based on a comparison --. Claims 17-27 are objected to due to their dependency on claim 16.
Claim 28 is objected to for at least reasons similar to those described for claim 16 above, since claim 28 recites a similar deficiency. Claims 29-35 are objected to due to their dependency on claim 28.
Appropriate correction is required.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).
Claims 16-35 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claims 1-2, 4-5, 7-8, and 11-12 of U.S. Patent No. 12,014,743 B2 in view of Kim (US 20090210238 A1) and Ishikawa (US 20110029113 A1). The conflicting claims 1-2, 4-5, 7-8, and 11-12 of U.S. Patent No. 12,014,743 B2 do not explicitly teach the features of claims 27 and 35, or the features concerning "merge" as recited in claims 18, 21, 30, 31, etc., of the instant application. However, the combination of Kim and Ishikawa teaches the features concerning "merge", as discussed in the prior art rejection set forth below, for the benefit of minimizing the complexity of audio signal processing (Kim, para 46, 50, and the benefits from Ishikawa as set forth in the prior art rejection below). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the features concerning "merge", etc., as taught by the combination of Kim and Ishikawa, to "generating a metadata based on the selection" in the apparatus and the method taught by the conflicting claims 1-2, 4-5, 7-8, and 11-12 of U.S. Patent No. 12,014,743 B2, for the benefits discussed above. The following is a comparison between claims 16-35 of the instant application and conflicting claims 1-2, 4-5, 7-8, and 11-12 of U.S. Patent No. 12,014,743 B2 for reference:
Claims 16-35 in the current application
Conflicting claims 1-2, 4-5, 7-8, 11-12 of U.S. Patent No. 12,014,743 B2
16. An apparatus comprising: at least one processor; and at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus to perform at least the following: determining, for at least a first audio stream, multiple first metadata parameters comprising at least one first energy ratio parameter and at least one first audio signal energy parameter; determining, for at least a second audio stream, multiple second metadata parameters comprising at least one second energy ratio parameter and at least one second audio signal energy parameter; determining a first weight based at least on the determined multiple first metadata parameters; determining a second weight based at least on the determined multiple second metadata parameters; and determining spatial metadata to output based on comparison of the first and second weights.
17. The apparatus as claimed in claim 16, wherein: determining the multiple first metadata parameters comprises analysing the first audio stream to determine at least one of the multiple first metadata parameters; determining the multiple second metadata parameters comprises analysing the second audio stream to determine at least one of the multiple second metadata parameters.
18. The apparatus as claimed in claim 16, wherein the determining the spatial metadata to output further comprises: in response to the first weight being at least larger than the second weight, merging the first and second spatial metadata by determining to output at least one of the multiple first metadata parameters.
21. The apparatus as claimed in claim 16, wherein the determining the spatial metadata to output further comprises: in response to the second weight being at least larger than the first weight, merging the first and second spatial metadata by determining to output at least one of the multiple second metadata parameters.
19. The apparatus as claimed in claim 18, wherein the at least one of the multiple first metadata parameters determined to output comprises directional metadata for the first audio stream.
20. The apparatus as claimed in claim 18, wherein the at least one of the multiple first metadata parameters determined to output comprises spatial metadata for the first audio stream.
23. The apparatus as claimed in claim 21, wherein the at least one of the multiple second metadata parameters determined to output comprises spatial metadata for the second audio stream.
26. The apparatus as claimed in claim 16, wherein: the determining the first weight comprises determining the first weight based at least on multiplication of the determined multiple first metadata parameters; and the determining the second weight comprises determining the second weight based at least on multiplication of the determined multiple second metadata parameters.
22. The apparatus as claimed in claim 21, wherein the at least one of the multiple second metadata parameters determined to output comprises directional metadata for the second audio stream.
24. The apparatus as claimed in claim 16, wherein the first audio stream has an audio signal format that is at least one of: an object based audio signal; or a spatial audio signal.
25. The apparatus as claimed in claim 16, wherein the second audio stream has a second audio signal format that is at least one of: an object based audio signal; or a spatial audio signal.
27. The apparatus as claimed in claim 16, wherein determining the spatial metadata to output comprises merging spatial metadata from the first audio stream and spatial metadata from the second audio stream based on comparison of the first and second weights.
28. A method comprising: determining, for at least a first audio stream, multiple first metadata parameters comprising at least one first energy ratio parameter and at least one first audio signal energy parameter; determining, for at least a second audio stream, multiple second metadata parameters comprising at least one second energy ratio parameter and at least one second audio signal energy parameter; determining a first weight based at least on the determined multiple first metadata parameters; determining a second weight based at least on the determined multiple second metadata parameters; and determining spatial metadata to output based on comparison of the first and second weights.
29. The method as claimed in claim 28, wherein: determining the multiple first metadata parameters comprises analysing the first audio stream to determine at least one of the multiple first metadata parameters; determining the multiple second metadata parameters comprises analysing the second audio stream to determine at least one of the multiple second metadata parameters.
30. The method as claimed in claim 28, wherein the determining the spatial metadata to output further comprises: in response to the first weight being at least larger than the second weight, merging the first and second spatial metadata by determining to output at least one of the multiple first metadata parameters.
31. The method as claimed in claim 28, wherein the determining the spatial metadata to output further comprises: in response to the second weight being at least larger than the first weight, merging the first and second spatial metadata by determining to output at least one of the multiple second metadata parameters.
32. The method as claimed in claim 28, wherein the first audio stream has an audio signal format that is at least one of: an object based audio signal; or a spatial audio signal.
33. The method as claimed in claim 28, wherein the second audio stream has a second audio signal format that is at least one of: an object based audio signal; or a spatial audio signal.
34. The method as claimed in claim 28, wherein: the determining the first weight comprises determining the first weight based at least on multiplication of the determined multiple first metadata parameters; and the determining the second weight comprises determining the second weight based at least on multiplication of the determined multiple second metadata parameters.
35. The method as claimed in claim 28, wherein determining the spatial metadata to output comprises merging spatial metadata from the first audio stream and spatial metadata from the second audio stream based on comparison of the first and second weights.
1. An apparatus comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the apparatus at least to perform: determining, for at least one first audio signal of an audio signal format, multiple metadata parameters comprising at least one spatial audio parameter and at least one first audio signal energy parameter; determining, for at least one further audio signal of a further audio signal format, multiple further metadata parameters comprising at least one further spatial audio parameter and at least one further audio signal energy parameter; determining a value based on multiplication of the determined multiple metadata parameters; determining a further value based on multiplication of the determined multiple further metadata parameters; comparing the value and the further value to select at least one of: the at least one spatial audio parameter from the determined multiple metadata parameters, or the at least one further spatial audio parameter from the multiple further metadata parameters; and generating a metadata based on the selection, the generated metadata comprising at least one of: the at least one spatial audio parameter; or the at least one further spatial audio parameter, wherein the generated metadata is configured to be associated with a combined audio signal formed based on the at least one first audio signal and the at least one further audio signal.
7. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: determining the at least one first audio signal energy parameter as one of the metadata parameters that are based on the at least one first audio signal; and determining the at least one spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the value based on multiplication of the determined multiple metadata parameters comprises generating a first signal weight as the value based on multiplying the at least one first audio signal energy parameter by the at least one spatial audio parameter; determining multiple further metadata parameters comprises: determining the at least one further audio signal energy parameter based as one of the further metadata parameters that are based on the at least one further audio signal; and determining the at least one further spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the further value based on multiplication of the determined multiple further metadata parameters comprises generating a further signal weight as the further value based multiplying the at least one further audio signal energy parameter by the at least one further spatial audio parameter; comparing the value and the further value further comprises comparing the first signal weight and the further signal weight; and generating the metadata further comprises generating the metadata based on the comparing the first signal weight and the further signal weight.
2. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: (1) analysing the at least one first audio signal to determine at least one of the multiple metadata parameters; or (2) decoding the at least one first audio signal to determine at least one of the multiple metadata parameters; and determining the multiple further metadata parameters comprises: (1) analysing the at least one further audio signal to determine at least one of the multiple further metadata parameters; or (2) decoding the at least one further audio signal to determine the at least one of the multiple further metadata parameters.
8. The apparatus as claimed in claim 7, wherein the at least one memory include further instructions that, when executed by the at least one processor, cause the apparatus at least to generate the metadata based on the comparing of the first signal weight and the further signal weight to further cause the apparatus to: use at least one of the multiple metadata parameters based on the at least one first audio signal as the generated metadata when the comparing indicates the first signal weight is greater than the further signal weight by a determined threshold; use at least one of the multiple further metadata parameters based on the at least one further audio signal as the generated metadata when the comparing indicates the further signal weight is greater than the first signal weight by a further determined threshold; and generate a weighted average of the at least one further metadata parameter and the at least one metadata parameter when the comparing indicates otherwise.
4. The apparatus as claimed in claim 3, wherein the extracted metadata block comprises at least one direction parameter; at least one energy ratio parameter; or at least one coherence parameter associated with at least one of the at least one first audio signal or the at least one further audio signal.
7. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: determining the at least one first audio signal energy parameter as one of the metadata parameters that are based on the at least one first audio signal; and determining the at least one spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the value based on multiplication of the determined multiple metadata parameters comprises generating a first signal weight as the value based on multiplying the at least one first audio signal energy parameter by the at least one spatial audio parameter; determining multiple further metadata parameters comprises: determining the at least one further audio signal energy parameter based as one of the further metadata parameters that are based on the at least one further audio signal; and determining the at least one further spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the further value based on multiplication of the determined multiple further metadata parameters comprises generating a further signal weight as the further value based multiplying the at least one further audio signal energy parameter by the at least one further spatial audio parameter; comparing the value and the further value further comprises comparing the first signal weight and the further signal weight; and generating the metadata further comprises generating the metadata based on the comparing the first signal weight and the further signal weight.
5. The apparatus as claimed in claim 3, wherein the apparatus is configured, based upon the adding of the secondary metadata block, to cause the apparatus to add at least one of: at least one direction parameter; at least one energy ratio parameter; or at least one coherence parameter associated with at least one of the at least one first audio signal or the at least one further audio signal.
11. The apparatus as claimed in claim 1, wherein the at least one first audio signal of the audio signal format is at least one of: 2−N channels of a spatial microphone array; 2−N channels of multi-channel audio signal; a first order ambisonics signal; a higher order ambisonics signal; or a spatial audio signal.
12. The apparatus as claimed in claim 1, wherein the at least one further audio signal of the further audio signal format is at least one of: 2−N channels of a spatial microphone array; 2−N channels of multi-channel audio signal; a first order ambisonics signal; a higher order ambisonics signal; or a spatial audio signal.
1. An apparatus comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the apparatus at least to perform: determining, for at least one first audio signal of an audio signal format, multiple metadata parameters comprising at least one spatial audio parameter and at least one first audio signal energy parameter; determining, for at least one further audio signal of a further audio signal format, multiple further metadata parameters comprising at least one further spatial audio parameter and at least one further audio signal energy parameter; determining a value based on multiplication of the determined multiple metadata parameters; determining a further value based on multiplication of the determined multiple further metadata parameters; comparing the value and the further value to select at least one of: the at least one spatial audio parameter from the determined multiple metadata parameters, or the at least one further spatial audio parameter from the multiple further metadata parameters; and generating a metadata based on the selection, the generated metadata comprising at least one of: the at least one spatial audio parameter; or the at least one further spatial audio parameter, wherein the generated metadata is configured to be associated with a combined audio signal formed based on the at least one first audio signal and the at least one further audio signal.
7. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: determining the at least one first audio signal energy parameter as one of the metadata parameters that are based on the at least one first audio signal; and determining the at least one spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the value based on multiplication of the determined multiple metadata parameters comprises generating a first signal weight as the value based on multiplying the at least one first audio signal energy parameter by the at least one spatial audio parameter; determining multiple further metadata parameters comprises: determining the at least one further audio signal energy parameter based as one of the further metadata parameters that are based on the at least one further audio signal; and determining the at least one further spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the further value based on multiplication of the determined multiple further metadata parameters comprises generating a further signal weight as the further value based multiplying the at least one further audio signal energy parameter by the at least one further spatial audio parameter; comparing the value and the further value further comprises comparing the first signal weight and the further signal weight; and generating the metadata further comprises generating the metadata based on the comparing the first signal weight and the further signal weight.
2. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: (1) analysing the at least one first audio signal to determine at least one of the multiple metadata parameters; or (2) decoding the at least one first audio signal to determine at least one of the multiple metadata parameters; and determining the multiple further metadata parameters comprises: (1) analysing the at least one further audio signal to determine at least one of the multiple further metadata parameters; or (2) decoding the at least one further audio signal to determine the at least one of the multiple further metadata parameters.
8. The apparatus as claimed in claim 7, wherein the at least one memory include further instructions that, when executed by the at least one processor, cause the apparatus at least to generate the metadata based on the comparing of the first signal weight and the further signal weight to further cause the apparatus to: use at least one of the multiple metadata parameters based on the at least one first audio signal as the generated metadata when the comparing indicates the first signal weight is greater than the further signal weight by a determined threshold; use at least one of the multiple further metadata parameters based on the at least one further audio signal as the generated metadata when the comparing indicates the further signal weight is greater than the first signal weight by a further determined threshold; and generate a weighted average of the at least one further metadata parameter and the at least one metadata parameter when the comparing indicates otherwise.
11. The apparatus as claimed in claim 1, wherein the at least one first audio signal of the audio signal format is at least one of: 2−N channels of a spatial microphone array; 2−N channels of multi-channel audio signal; a first order ambisonics signal; a higher order ambisonics signal; or a spatial audio signal.
12. The apparatus as claimed in claim 1, wherein the at least one further audio signal of the further audio signal format is at least one of: 2−N channels of a spatial microphone array; 2−N channels of multi-channel audio signal; a first order ambisonics signal; a higher order ambisonics signal; or a spatial audio signal.
7. The apparatus as claimed in claim 1, wherein: determining the multiple metadata parameters comprises: determining the at least one first audio signal energy parameter as one of the metadata parameters that are based on the at least one first audio signal; and determining the at least one spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the value based on multiplication of the determined multiple metadata parameters comprises generating a first signal weight as the value based on multiplying the at least one first audio signal energy parameter by the at least one spatial audio parameter; determining multiple further metadata parameters comprises: determining the at least one further audio signal energy parameter based as one of the further metadata parameters that are based on the at least one further audio signal; and determining the at least one further spatial audio parameter as another of the metadata parameters that are based on the at least one first audio signal; determining the further value based on multiplication of the determined multiple further metadata parameters comprises generating a further signal weight as the further value based multiplying the at least one further audio signal energy parameter by the at least one further spatial audio parameter; comparing the value and the further value further comprises comparing the first signal weight and the further signal weight; and generating the metadata further comprises generating the metadata based on the comparing the first signal weight and the further signal weight.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(B) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
Claims 18-23, 30-31 are rejected under 35 U.S.C. 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which applicant regards as the invention.
Claim 18 recites "merging the first and second spatial metadata by …", wherein "the first and second spatial metadata" lacks sufficient antecedent basis in claim 18. It is unclear what "the first and second spatial metadata" refer back to and what they are, which renders the claim indefinite. Claims 19-20 are rejected due to their dependency on claim 18.
Claim 21 is rejected for at least reasons similar to those described for claim 18 above, since claim 21 recites a similar deficient feature. Claims 22-23 are rejected due to their dependency on claim 21.
Claims 30-31 are rejected for at least reasons similar to those described for claim 18 above, since claims 30-31 recite a similar deficient feature.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 16-35 are rejected under 35 U.S.C. 103 as being unpatentable over Kim et al. (US 20090210238 A1, hereinafter Kim) in view of Ishikawa et al. (US 20110029113 A1, hereinafter Ishikawa).
Claim 16: Kim teaches an apparatus (title and abstract, ln 1-8, multipoint control unit MCU, para 113, or a downmix apparatus in fig. 12 and details in fig. 19) comprising:
at least one processor (effect processor 208 in fig. 10, pre-processor 207 in fig. 10, and post processor 185 in fig. 8 and in a plurality of computer systems, para 216); and
at least one memory storing instructions (functional programs, code stored in computer-readable recording medium, para 216) that, when executed by the at least one processor, cause the apparatus (the computer systems executed the program and code in a decentralized manner, para 216) to perform at least the following:
determining, for at least a first audio stream (downmix representing audio objects 1, 2, …, n in fig. 12, detailed in fig. 19, downmix A representing audio objects 1, 2, …n in fig. 19), multiple first metadata parameters (side information A in fig. 19) comprising at least one first energy ratio parameter (a ratio of each of the energy levels to a total energy level of the object signals, para 113, or to the highest energy level of the object signal for object 1, 2., …, n, para 142) and at least one first audio signal energy parameter (each of the energy level of the audio objects, para 113, or a highest energy level in a predetermined parameter band for objects 1, 2, …, n in fig. 19, para 142);
determining, for at least a second audio stream (downmix representing audio objects A, B, …, C in fig. 12, detailed in fig. 19, downmix B representing audio objects 1’, 2’, …, n’ in fig. 19), multiple second metadata parameters (side information B in fig. 19) comprising at least one second energy ratio parameter (a ratio of each of the energy levels to a total energy level of the object signals, para 113, or to the highest energy level of the object signal for objects 1’, 2’, …, n’, para 142) and at least one second audio signal energy parameter (each of the energy levels of the audio objects, para 113, or a highest energy level in a predetermined parameter band for objects 1’, 2’, …, n’ in fig. 19, para 142);
determining a first weight based at least on the determined multiple first metadata parameters (the total energy of the audio objects 1, 2, …, n for the ratio above, and thus, the weight is 1/total energy of the audio objects 1, 2, …, n inherently, i.e., averaged to the total energy, para 113, or averaged in different frames for bitstream BS1, para 214);
determining a second weight based at least on the determined multiple second metadata parameters (the total energy of the audio objects 1’, 2’, …, n’ for the ratio above, and thus, the weight is 1/total energy of the audio objects 1’, 2’, …, n’ inherently, i.e., averaged to the total energy, para 113, or averaged in different frames for bitstream BS2, para 213); and
determining spatial metadata to output (side info C is determined and outputted from BOX 3 in fig. 19) based on the first and second weights (based on side info A and side info B, calculated based at least on the total energies of the audio objects 1, 2, …, n and the audio objects 1’, 2’, …, n’ or at least highest energy levels of the audio objects 1, 2, …, n and audio objects 1’, 2’, …, n’ in fig. 19, para 140-142 or individual signal energy for calculating the ratio above).
However, Kim does not explicitly teach that the determination of the spatial metadata to output is based on a comparison of the first and second weights.
Ishikawa teaches an analogous field of endeavor by disclosing an apparatus (title and abstract, ln 1-14, and a combination device MCU in fig. 5 with details in figs. 11, 17, 23, 27), wherein the apparatus comprises
at least one processor (a CPU or FPGA, etc., para 326-327 or a computer, para 71); and
at least one memory storing instructions (memory including CD-ROM, storing program, para 71) that, when executed by the at least one processor, cause the apparatus (executed by the computer, para 71-72) to perform at least the following:
determining, for at least a first audio stream (DmxB 115 in fig. 4), multiple first metadata parameters (ParasB 113 in fig. 4) comprising at least one first energy ratio parameter and at least one first audio signal energy parameter (including a ratio of the powers of corresponding parameter tiles of the plurality of frequency signals 111, para 135, and absolute energy parameters NRG, para 136, IOC, para 137, and downmix gains DMG, para 138);
determining, for at least a second audio stream (DmxC 115 in fig. 4), multiple second metadata parameters (ParasC 113 in fig. 4) comprising at least one second energy ratio parameter and at least one second audio signal energy parameter (including a ratio of the powers of corresponding parameter tiles of the plurality of frequency signals 111, para 135, and absolute energy parameters NRG, para 136, IOC, para 137, and downmix gains DMG, para 138);
determining a first weight based at least on the determined multiple first metadata parameters (a highest energy level among the plurality of frequency signals 111, para 136, or an energy of a frequency signal among frequency signals, used for calculating the ratio discussed above, e.g., for ParasB in fig. 4, and used for calculating the energy ratio above or gain value NRG parameter, para 156, for ParasB in fig. 4);
determining a second weight based at least on the determined multiple second metadata parameters (a highest energy level among the plurality of frequency signals 111, para 136 or an energy of a frequency signal among frequency signals, used for calculating the ratio discussed above, e.g., for ParasC in fig. 4, and used for calculating the energy ratio above or gain value NRG parameter, para 156, for ParasC in fig. 4); and
determining spatial metadata to output (ParasBCD outputted from parametric encoder 405 in fig. 4) based on comparison of the first and second weights (via detection unit 501 in figs. 8, 17, 23, 27, combining the parameter sub-streams 113 that have a signal energy, as the weight for each calculated ratio to the highest signal energy above, greater than a threshold, and generating the combined parameter sub-stream 122 in figs. 8, 17, 23, 27, para 162-164) for the benefit of reducing computational complexity (by processing a single stream after combining the multiple audio signals, para 35-36) while maintaining acceptable high-quality processing (at a low bit rate, para 129, 184).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the comparison of the first and second weights for determining the spatial metadata to output, as taught by Ishikawa, to the determining of the spatial metadata to output based on the first and second weights in the apparatus, as taught by Kim, for the benefits discussed above.
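For clarity of the record, the comparison-based selection and merging recited in claims 18, 21, and 27 (and addressed in the Specification objection above) may be sketched as follows; the function and parameter names are the examiner's illustrative shorthand only and do not appear in either reference or in the application:

```python
def merge_spatial_metadata(first_params, second_params, first_weight, second_weight):
    """Illustrative sketch of the claimed weight-comparison logic.

    first_params / second_params: metadata parameter sets of the two streams.
    first_weight / second_weight: weights derived from those parameter sets.
    """
    if first_weight > second_weight:
        # claims 18, 30: output at least one of the first metadata parameters
        return first_params
    elif second_weight > first_weight:
        # claims 21, 31: output at least one of the second metadata parameters
        return second_params
    else:
        # otherwise, merge both parameter sets (cf. spec. para 145, 170;
        # elements 635, 735 in figs. 6-7)
        return {**first_params, **second_params}
```

This sketch merely restates the conditional structure of the claim language for discussion purposes; it is not a characterization of any disclosed embodiment.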
Claim 28 recites a method essentially the same as the functions implemented by the apparatus recited in claim 16 above and is thus rejected according to the rationale applied to claim 16 above.
Claim 17: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein:
determining the multiple first metadata parameters comprises analysing the first audio stream to determine at least one of the multiple first metadata parameters (Kim, through an analyzer 102 in the object encoder 100, para 122 and extracting parameters by a parameter extraction unit 102B of the analyzer 102, para 126 and the analyzer 102 analyzes the resulting frequency signal 111 using two different methods, and extracting object parameters from the frequency signals 111, para 128 and for objects 1, 2, …, n through BOX 1 in fig. 19, and Ishikawa, through parameter extraction unit 102B in fig. 1 and for ParasB in fig. 4);
determining the multiple second metadata parameters comprises analysing the second audio stream to determine at least one of the multiple second metadata parameters (Kim, through an analyzer 102 in the object encoder 100, para 122 and extracting parameters by a parameter extraction unit 102B of the analyzer 102, para 126 and the analyzer 102 analyzes the resulting frequency signal 111 using two different methods, and extracting object parameters from the frequency signals 111, para 128 and for objects 1’, 2’, …, n’ through BOX 2 in fig. 19, and Ishikawa, through parameter extraction unit 102B in fig. 1 and for ParasC in fig. 4).
Claim 18: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein the determining the spatial metadata to output further comprises:
in response to the first weight being at least larger than the second weight (Kim and Ishikawa, the signal energy, and NRG, as the weight, e.g., for the ratio calculation, as discussed above; Ishikawa, the parameters having the NRG parameter or signal energy that is equal to or greater than a predetermined threshold value are combined through element 404 in fig. 4, para 155-156, and the parameters having the NRG parameter or signal energy that is equal to or greater than the predetermined threshold value can be mapped to either the first weight or the second weight, and vice versa), merging the first and second spatial metadata by determining to output at least one of the multiple first metadata parameters (Kim, by combining through the box 265 in fig. 19, and Ishikawa, through the adder 404 in fig. 4, para 23).
Claim 19: the combination of Kim and Ishikawa further teaches, according to claim 18 above, wherein the at least one of the multiple first metadata parameters determined to output comprises directional metadata for the first audio stream (Kim, the side information including energy difference information and phase difference information, para 30, inherently representing the direction of an audio object source by distance (energy difference) and by angle (phase difference), and also including direction information of the object signal, para 81).
Claim 20: the combination of Kim and Ishikawa further teaches, according to claim 18 above, wherein the at least one of the multiple first metadata parameters determined to output comprises spatial metadata for the first audio stream (Kim, including spatial parameter outputted from parameter converter 145, para 49 and Ishikawa, spatial relevant parameters are synthesized at acoustic scene reconstruction, para 16).
Claim 21: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein the determining the spatial metadata to output further comprises:
in response to the second weight being at least larger than the first weight, merging the first and second spatial metadata by determining to output at least one of the multiple second metadata parameters (per the discussion in claim 18 above, wherein, by comparing the NRG or signal energy with the predetermined value through the detection unit 501 in fig. 8, parameters having an NRG or signal energy equal to or greater than the predetermined threshold are combined or merged, as discussed in claim 18 above).
Claim 22: the combination of Kim and Ishikawa further teaches, according to claim 21 above, wherein the at least one of the multiple second metadata parameters determined to output comprises directional metadata for the second audio stream (Kim, the side information including energy difference information and phase difference information, para 30, inherently representing the direction of an audio object source by distance (energy difference) and by angle (phase difference), and also including direction information of the object signal, para 81, and the discussion in claim 19 above).
Claim 23: the combination of Kim and Ishikawa further teaches, according to claim 21 above, wherein the at least one of the multiple second metadata parameters determined to output comprises spatial metadata for the second audio stream (Kim, including spatial parameter outputted from parameter converter 145, para 49 and Ishikawa, spatial relevant parameters are synthesized at acoustic scene reconstruction, para 16 and discussion in claim 20 above).
Claim 24: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein the first audio stream has an audio signal format that is at least one of: an object based audio signal; or a spatial audio signal (Markush limitation, MPEP 2117, Kim, audio object in fig. 1 and Ishikawa, audio object parameters from the frequency signals 111, para 128, and including spatial sound images as the input, para 15-16).
Claim 25: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein the second audio stream has a second audio signal format that is at least one of: an object based audio signal; or a spatial audio signal (Markush limitation, MPEP 2117, Kim, audio object in fig. 1 and Ishikawa, audio object parameters from the frequency signals 111, para 128, and including spatial sound images as the input, para 15-16 and the discussion in claim 24 above).
Claim 26: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein: the determining the first weight comprises determining the first weight based at least on multiplication of the determined multiple first metadata parameters; and the determining the second weight comprises determining the second weight based at least on multiplication of the determined multiple second metadata parameters (Kim, the ratio determined upon either total energy or highest level of signal energy, and individual signal energy with multiplication relationship to the parameters related to the highest level of signal energy or total energy of the signals, para 113, and Ishikawa, downmix gain, NRG, or ratio of the powers of corresponding parameter tiles of the frequency signals 111, para 134-138).
Claim 27: the combination of Kim and Ishikawa further teaches, according to claim 16 above, wherein determining the spatial metadata to output comprises merging spatial metadata from the first audio stream and spatial metadata from the second audio stream based on comparison of the first and second weights (Kim, combining the parameters through BOX 3, performing a combining or merging operation on the side info A and the side info B, para 140, and Ishikawa, based on comparison to the predetermined threshold value, combining the parameters through parameter combining unit 755 in fig. 11 and 506 in fig. 8).
Claim 29 has been analyzed and rejected according to claims 28, 17 above.
Claim 30 has been analyzed and rejected according to claims 28, 18 above.
Claim 31 has been analyzed and rejected according to claims 28, 21 above.
Claim 32 has been analyzed and rejected according to claims 28, 24 above.
Claim 33 has been analyzed and rejected according to claims 28, 25 above.
Claim 34 has been analyzed and rejected according to claims 28, 26 above.
Claim 35 has been analyzed and rejected according to claims 28, 27 above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589. The examiner can normally be reached Monday-Friday 6:30am-4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached at 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LESHUI ZHANG/
Primary Examiner,
Art Unit 2695