Last updated: May 29, 2026
Application No. 18/687,871
SILENCE DESCRIPTOR USING SPATIAL PARAMETERS

Final Rejection (§101, §103)
Filed: Feb 29, 2024
Priority: Aug 30, 2021 (nonprovisional of PCT/FI2021/050584, plus one additional priority claim)
Examiner: CASTILLO-TORRES, KEISHA Y
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Nokia Technologies Oy
OA Round: 2 (Final)
Interview: Optional
The examiner has a relatively high allowance rate (74%) and a +29.5% interview lift. A written response may suffice.
Based on 110 resolved cases, 2023–2026
Examiner Intelligence

CASTILLO-TORRES, KEISHA Y
Career allowance rate: 74% (82 granted / 110 resolved), above average; +12.5% vs TC avg
Interview lift: +29.5% (resolved cases with vs. without an interview)
Typical timeline: 2y 10m average prosecution; 20 currently pending
Career history: 142 total applications across all art units
Statute-Specific Performance

§101: 8.7% (-31.3% vs TC avg)
§103: 88.1% (+48.1% vs TC avg)
§102: 1.3% (-38.7% vs TC avg)
§112: 1.6% (-38.4% vs TC avg)
TC averages are estimates. Based on career data from 110 resolved cases.
Office Action

§101 §103
DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 03/16/2026. 
Claims 1-48 and 64-72 have been canceled by the Applicant.
Claims 49-63 are pending and have been examined. This action is made FINAL.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments and Amendments
Amendments to the claims by the Applicant have been considered and addressed below. 
With respect to the claim objections and the rejections under 35 USC §§ 101 and 103, the Applicant presents several arguments, to which the Examiner responds below.

Claim Objection(s)
Arguments:
An objection was raised with respect to Claims 54 and 69 because of informalities. Claim 69 has been cancelled as noted below such that the rejection thereof is now moot. However, Claim 54 has been amended to rewrite "liner interpolation" as "linear interpolation" as suggested by the Official Action. As such, the objection to Claim 54 is overcome. 
Examiner’s Response to Arguments:
Applicant’s arguments with respect to the claim objections have been fully considered and are persuasive. The objection to claim 54 has been withdrawn.

35 USC § 101 rejection(s)
Arguments on pages 7-9 of the Remarks filed on 03/16/2026

Examiner response to Arguments:
Applicant’s arguments with respect to the rejection of independent claim 49 under 35 USC 101 have been fully considered but are not persuasive.
The Applicant argues that:
Applicant disagrees and notes that MPEP 2106.04(a)(2) clarifies that "[c]laims do not recite a mental process when they do not contain limitations that can practically be performed in the human mind, for instance when the human mind is not equipped to perform the claim limitations." As such, claims including elements that cannot be "practically performed in the human mind" fail to satisfy step 2A, prong 1 and are therefore statutory. 
[…]
The transmission of the bitstream including a representation of the at least one spatial direction component value for one or more audio frames cannot practically be performed by a human. Indeed, the transmission of the bitstream now recited by amended independent Claim 49 cannot be practically performed either mentally or with pen and paper. Thus, independent Claim 49, as well as the claims dependent thereon, include elements that cannot be "practically performed in the human mind" and, as a result, are statutory. Thus, the rejection under 35 U.S.C. § 101 is overcome on this basis…

Please see the detailed Prong Two analysis below, which explains why the Examiner finds that the independent claims do not recite additional elements that integrate the judicial exception into a practical application and therefore do not qualify as patent-eligible subject matter under 35 U.S.C. § 101.

Please refer to MPEP 2106.04(II): Eligibility Step 2A: Whether a Claim is Directed to a Judicial Exception: (A) Step 2A is a Two-Prong Inquiry: 
(1) Prong One: 
Prong One asks does the claim recite an abstract idea, law of nature, or natural phenomenon? In Prong One examiners evaluate whether the claim recites a judicial exception, i.e. whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. While the terms "set forth" and "described" are thus both equated with "recite", their different language is intended to indicate that there are two ways in which an exception can be recited in a claim. For instance, the claims in Diehr, 450 U.S. at 178 n. 2, 179 n.5, 191-92, 209 USPQ at 4-5 (1981), clearly stated a mathematical equation in the repetitively calculating step, and the claims in Mayo, 566 U.S. 66, 75-77, 101 USPQ2d 1961, 1967-68 (2012), clearly stated laws of nature in the wherein clause, such that the claims "set forth" an identifiable judicial exception. Alternatively, the claims in Alice Corp., 573 U.S. at 218, 110 USPQ2d at 1982, described the concept of intermediated settlement without ever explicitly using the words "intermediated" or "settlement." […]
An example of a claim that recites a judicial exception is "A machine comprising elements that operate in accordance with F=ma." This claim sets forth the principle that force equals mass times acceleration (F=ma) and therefore recites a law of nature exception. Because F=ma represents a mathematical formula, the claim could alternatively be considered as reciting an abstract idea. Because this claim recites a judicial exception, it requires further analysis in Prong Two in order to answer the Step 2A inquiry. An example of a claim that merely involves, or is based on, an exception is a claim to "A teeter-totter comprising an elongated member pivotably attached to a base member, having seats and handles attached at opposing sides of the elongated member." This claim is based on the concept of a lever pivoting on a fulcrum, which involves the natural principles of mechanical advantage and the law of the lever. However, this claim does not recite these natural principles and therefore is not directed to a judicial exception (Step 2A: NO). Thus, the claim is eligible at Pathway B without further analysis.

From this analysis, in Step 2A, Prong One, the Examiner has evaluated the independent claims accordingly and determined that the amended independent claims, as drafted, describe a judicial exception (i.e., an abstract idea) that represents a mental process and/or mathematical concept (which can be performed by a human with pen and paper).
As discussed in the Non-Final Rejection mailed on 11/18/2025, the limitations as drafted cover performance by a human (i.e., a mental process and/or mathematical concept).
More specifically, the claim recitations of:
49. (Currently Amended) An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to:
determine an error of fit measure between a plurality of spatial direction component values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction component values;
compare the error of fit measure to a threshold value;
quantise a spatial direction component value for a first audio frame of an interval of audio frames to give a quantised spatial direction component value for the first audio frame;
depending on the comparison, either use a method of non-prediction for generating at least one spatial direction component value for each remaining audio frame of the interval of audio frames, or use a method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames, wherein all remaining audio frames comprise all but the first audio frame of the interval of audio frames; and
transmit a bitstream including a representation of the at least one spatial direction component value for one or more audio frames.

These limitations read on a human performing the following (e.g., mentally and/or using pen and paper):
Calculating/determining an error between a predefined point in space and a predefined curve;
Comparing said calculated error to a threshold;
Assigning a predefined/finite number of values to the amplitude of audio signals;
Depending on the results of the comparison, using one of two different predefined sets of rules; and
Writing on a piece of paper the calculated values associated with the points in space for the audio signals (i.e., mathematical concepts).

Please also refer to MPEP 2106.05(f)(2): Whether the claim invokes computers or other machinery merely as a tool to perform an existing process, and MPEP 2106.06(b): Clear Improvement to a Technology or to Computer Functionality.  

Please refer to MPEP 2106.04(II): Eligibility Step 2A: Whether a Claim is Directed to a Judicial Exception: (A) Step 2A is a Two-Prong Inquiry: 
(2) Prong Two:  
Prong Two asks does the claim recite additional elements that integrate the judicial exception into a practical application? In Prong Two, examiners evaluate whether the claim as a whole integrates the exception into a practical application of that exception. If the additional elements in the claim integrate the recited exception into a practical application of the exception, then the claim is not directed to the judicial exception (Step 2A: NO) and thus is eligible at Pathway B. This concludes the eligibility analysis. If, however, the additional elements do not integrate the exception into a practical application, then the claim is directed to the recited judicial exception (Step 2A: YES), and requires further analysis under Step 2B (where it may still be eligible if it amounts to an ‘‘inventive concept’’). For more information on how to evaluate whether a judicial exception is integrated into a practical application, see MPEP § 2106.04(d)(2).

From this analysis, in Step 2A, Prong Two, the Examiner has evaluated the independent claims accordingly and determined that the amended independent claims as drafted, considered as a whole, do not include additional elements that integrate the exception (i.e., the abstract idea) into a practical application of that exception. As discussed in the Non-Final Rejection mailed on 11/18/2025:
This judicial exception is not integrated into a practical application because, for example, claim 49 recites “at least one processor, memory, and/or computer program code”. As an example, page 53, line 27 – page 54, line 6 of the as-filed specification discloses: “…For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.” Therefore, the specification describes a general-purpose computer or computing device, which the claims use merely as a tool to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Please also refer to MPEP 2106.05(f)(2): Whether the claim invokes computers or other machinery merely as a tool to perform an existing process.
Finally, please refer to MPEP 2106.05(I)(A): Relevant Considerations for Evaluating Whether Additional Elements Amount to an Inventive Concept.
Limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception include: 

i. Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, e.g., a limitation indicating that a particular function such as creating and maintaining electronic records is performed by a computer, as discussed in Alice Corp., 573 U.S. at 225-26, 110 USPQ2d at 1984 (see MPEP § 2106.05(f)); 
ii. Simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception, e.g., a claim to an abstract idea requiring no more than a generic computer to perform generic computer functions that are well-understood, routine and conventional activities previously known to the industry, as discussed in Alice Corp., 573 U.S. at 225, 110 USPQ2d at 1984 (see MPEP § 2106.05(d));


From this analysis, in Step 2B, the Examiner has evaluated the independent claims accordingly and determined that the independent claims as drafted have limitations that the courts have found not to be enough to qualify as "significantly more" when recited in a claim with a judicial exception. Similar to what was discussed in the Non-Final Rejection mailed on 11/18/2025:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is recited only as a generic computing device. The claim is not patent eligible.
In summary, the Examiner respectfully disagrees with the arguments above. Please refer to analysis above.
	For more details, please refer to updated 35 U.S.C. § 103 rejections for claims 49-63, below.

35 USC § 103 rejection(s)
Arguments on pages 9-11 of the Remarks filed on 03/16/2026
Examiner’s Response to Arguments:
Arguments have been considered but are not persuasive. The Examiner respectfully disagrees with the argument that Eckert et al. does not teach the limitation of: "determin[ing] an error of fit measure between a plurality of spatial direction component values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction component values".
Specifically, the claim language as drafted does not particularly limit what constitutes curve fitting. Hence, under the broadest reasonable interpretation, the claimed curve fitting is interpreted to read on the smoothing described in Eckert et al. at paragraphs [0147], [0152]-[0154], and [0173]-[0174] (all citations reproduced below for reference), as follows:
determine an error of fit measure between a plurality of spatial direction component values from a plurality of audio frames (see ¶ [0147 and 0152-0154]: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame. By way of example, it may be determined whether the value of a distance measure between the upmixing metadata 105 for the current inactive frame and the upmixing metadata 105 of the previous inactive frame is greater than a pre-determined distance threshold. If this is the case, a SID frame may be inserted for the current inactive frame, in order to signal the changed upmixing metadata 105 to the decoding unit 150. If, on the other hand, the value of the distance measure is smaller than the distance threshold, the current inactive frame may be treated as an ND frame. [0152] The method 600 may further comprise encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, it is determined that the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed with regards to the subsequence of one or more previous inactive frames. Hence, the current frame may be encoded as a SID frame if, in particular, only if, the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed. [0153] Alternatively, or in addition, the method 600 may comprise determining that the current frame is an inactive frame following one or more previous inactive frames. 
In addition, the method 600 may comprise determining a value of a distance measure (e.g., a mean square error) between the covariance and/or the upmixing metadata 105 for the current frame and a previous covariance and/or previous upmixing metadata 105 for the one or more previous inactive frames. In other words, it may be determined by how much the covariance for the current frame deviates from the corresponding previous covariance for the one or more previous inactive frames, and/o by how much the upmixing metadata 105 for the current frame deviates from the previous upmixing metadata 105 for the one or more previous inactive frames. The previous upmixing metadata 105 may be the upmixing metadata that has been sent in the last SID frame. The previous covariance may be the covariance that has been used for generating the previous upmixing metadata 105. [0154] The method 600 may further comprise determining whether the value of the distance measure is greater than a pre-determined distance threshold. Encoding 604 the upmixing metadata 105 for the current frame into the bitstream may be performed, if, in particular only if, the value of the distance measure is greater than the pre-determined distance threshold. Alternatively, it may be refrained from encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, the value of the distance measure is smaller than the pre-determined distance threshold.”) and 
a curve fitted to a data set comprising the plurality of spatial direction component values (see ¶ [0147 and 0152-0154] citations as in limitation above and further ¶ [0173-0174]: “[0173] Hence, a method 600 of using spatial parameters 105 and same or different downmixes 103 used for active frames to model spatial characteristics of noise are described, thereby allowing comfort noise generation at the decoder 150 that is spatially consistent between active and non-active frames. The method 600 may comprise determining whether a voice signal is present in one or more frames of an audio input 101. In response to determining that no voice signal is present, a covariance may be estimated using frame to frame averaging. Furthermore, spatial noise parameters 105 may be calculated and entropy coding of the spatial noise parameters 105 may be performed. The entropy coded spatial noise parameters 107 may be packed into the bitstream for the one or more frames. [0174] The method 600 may comprise, in response to detecting transients in a frame of the one or more frames, removing the frame from covariance averaging. Calculating the spatial noise parameters 105 may be performed with a smoothed covariance estimation that smoothens across multiple frames to avoid spatial variability in the noise. The method 600 may comprise smoothing covariance across transients and short talk bursts and removing these from the calculation. Alternatively, or in addition, the method 600 may comprise using a limited set of bands and/or limited set of parameters to reduce parameter bit rate for noise and switching back to a full set when audio is present. Alternatively, or in addition, the method 600 may comprise calculating spatial elements separately from spectral elements of the noise to allow re-use of existing comfort noise generators.”);
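For illustration only, the distance-measure test described in Eckert et al. ¶ [0153]-[0154] can be sketched in Python as below. This is a minimal sketch, not Eckert's implementation: it assumes the upmixing metadata is a flat list of floats, uses the mean square error as the distance measure (the example Eckert names), and the function names are hypothetical.

```python
def mean_square_error(current, previous):
    """Mean square error between two equal-length parameter vectors."""
    if len(current) != len(previous) or not current:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((c - p) ** 2 for c, p in zip(current, previous)) / len(current)


def classify_inactive_frame(current_metadata, last_sid_metadata, distance_threshold):
    """Label an inactive frame 'SID' or 'ND' (no data).

    A SID frame is emitted when the metadata has moved away from the
    metadata sent in the last SID frame by more than the threshold.
    A distance exactly equal to the threshold is treated as 'ND' here,
    an assumption, since the cited paragraphs only address the
    greater-than and smaller-than cases.
    """
    distance = mean_square_error(current_metadata, last_sid_metadata)
    return "SID" if distance > distance_threshold else "ND"


# Unchanged metadata stays an ND frame; a large shift triggers a SID frame.
print(classify_inactive_frame([0.1, 0.2], [0.1, 0.2], 0.01))  # ND
print(classify_inactive_frame([1.0, 0.0], [0.0, 0.0], 0.01))  # SID
```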

For more details, please refer to updated 35 U.S.C. § 103 rejections for claims 49-63, below.
Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 49-63 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more; more specifically, it is directed to the abstract idea groupings of mathematical concepts and/or mental processes.
Independent claim 49 recites:
49. (Currently Amended) An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to:
determine an error of fit measure between a plurality of spatial direction component values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction component values;
compare the error of fit measure to a threshold value;
quantise a spatial direction component value for a first audio frame of an interval of audio frames to give a quantised spatial direction component value for the first audio frame; 
depending on the comparison, either use a method of non-prediction for generating at least one spatial direction component value for each remaining audio frame of the interval of audio frames, or use a method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames, wherein all remaining audio frames comprise all but the first audio frame of the interval of audio frames; and
transmit a bitstream including a representation of the at least one spatial direction component value for one or more audio frames.
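For illustration only, the comparison and branching recited in claim 49 can be sketched in Python as follows. The sketch makes several assumptions. The error of fit measure is taken to be a root mean square error against fitted-curve values that the caller supplies. An error below the threshold is assumed to select prediction, because the claim says only "depending on the comparison". Quantisation uses a hypothetical uniform quantiser with a caller-chosen step. All function names are hypothetical, and the bitstream step is not modeled.

```python
import math


def rms_error_of_fit(values, fitted_values):
    """Root mean square difference between observed and fitted values."""
    if len(values) != len(fitted_values) or not values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(v - f) ** 2 for v, f in zip(values, fitted_values)]
    return math.sqrt(sum(squared) / len(squared))


def quantise(value, step):
    """Hypothetical uniform scalar quantiser; claim 49 names no quantiser."""
    return step * round(value / step)


def choose_method(values, fitted_values, threshold):
    """Select how the remaining frames of the interval are generated.

    Assumption: an error of fit below the threshold (values follow the
    curve closely) selects prediction; otherwise non-prediction is used.
    """
    error = rms_error_of_fit(values, fitted_values)
    return "prediction" if error < threshold else "non-prediction"


def encode_interval(first_frame_value, values, fitted_values, threshold, step):
    """Quantise the interval's first frame and pick the method for the rest."""
    first_quantised = quantise(first_frame_value, step)
    method = choose_method(values, fitted_values, threshold)
    return first_quantised, method


print(choose_method([0.0, 1.0, 0.0], [0.5, 0.5, 0.5], 0.1))  # non-prediction
```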


This reads on a human (e.g., mentally and/or using pen and paper):
Calculating/determining an error between a predefined point in space and a predefined curve;
Comparing said calculated error to a threshold;
Assigning a predefined/finite number of values to the amplitude of audio signals;
Depending on the results of the comparison, using one of two different predefined sets of rules; and
Writing on a piece of paper the calculated values associated with the points in space for the audio signals (i.e., mathematical concepts).


This judicial exception is not integrated into a practical application because, for example, claim 49 recites “at least one processor, memory, and/or computer program code”. As an example, page 53, line 27 – page 54, line 6 of the as-filed specification discloses: “…For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.” Therefore, the specification describes a general-purpose computer or computing device, which the claims use merely as a tool to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is recited only as a generic computing device. The claim is not patent eligible.

With respect to claim 50, the claim(s) recite:
wherein the method of non-prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames causes the apparatus to:
store the quantised spatial direction component value of the first audio frame for use as a previous quantised spatial direction component value.


This reads on a human (e.g., mentally and/or using pen and paper):
Further defining the rules for deciding which of the two different predefined sets of rules to use, and writing down specific portions of the original input signal.
Writing down the audio signals with assigned predefined / finite number of values to the amplitude.
No additional limitations are present. 	

With respect to claim 51, the claim(s) recite:
wherein the method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio causes the apparatus to:
determine whether the interval of audio frames is a first interval of audio frames of a silence region of the spatial audio signal or whether the interval of audio frames is a further interval of audio frames of the silence region of the spatial audio signal.

This reads on a human (e.g., mentally and/or using pen and paper):
Determining whether the audio signal corresponds to silence or not.
No additional limitations are present. 	

With respect to claim 52, the claim(s) recite:
wherein when the interval of audio frames is determined as the first interval of audio frames of the silence region of the spatial audio signal, the apparatus is further caused to:
determine the coefficients of a backward predictor using a data set comprising a plurality of quantised spatial direction component values drawn from the plurality of audio frames;
initialise the backward predictor with the quantised spatial direction component value for the first audio frame of the interval of audio frames; and
use the backward predictor to predict the at least one spatial direction component value for each remaining audio frame of the first interval of audio frames of the silence region.

This reads on a human (e.g., mentally and/or using pen and paper):
Using a predetermined set of rules to determine the audio signals with assigned predefined/finite number of values to the amplitude;
Using another predetermined set of rules with the audio signals with assigned predefined/finite number of values to the amplitude; and
Using the predetermined set of rules to find a spatial component value for the silence region(s).
No additional limitations are present. 	

With respect to claim 53, the claim(s) recite:
wherein the backward predictor is a first order backward predictor, and wherein the coefficients of the backward predictor are determined using least mean square analysis of the data set comprising the plurality of average spatial direction component values drawn from the plurality of audio frames.
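One possible reading of the first-order backward predictor of claims 52 and 53 is sketched below in Python. This is an interpretation, not the applicant's disclosed method. It assumes the predictor models each value from the preceding one as x[n] ≈ a·x[n-1] + b, fits a and b by least squares over consecutive pairs of the data set, and then runs the recursion starting from the quantised first-frame value. The function names are hypothetical.

```python
def fit_first_order_predictor(data):
    """Least-squares fit of x[n] ~ a * x[n-1] + b over consecutive pairs."""
    if len(data) < 2:
        raise ValueError("need at least two values")
    prev, curr = data[:-1], data[1:]
    mean_p = sum(prev) / len(prev)
    mean_c = sum(curr) / len(curr)
    spp = sum((p - mean_p) ** 2 for p in prev)
    if spp == 0:
        # Constant history: fall back to predicting the mean of the targets.
        return 0.0, mean_c
    spc = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    a = spc / spp
    return a, mean_c - a * mean_p


def predict_interval(first_quantised, a, b, num_frames):
    """Start from the first frame's quantised value and run the recursion."""
    values = [first_quantised]
    for _ in range(num_frames - 1):
        values.append(a * values[-1] + b)
    return values


a, b = fit_first_order_predictor([1.0, 2.0, 3.0, 4.0])  # a = 1.0, b = 1.0
print(predict_interval(10.0, a, b, 3))  # [10.0, 11.0, 12.0]
```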

This reads on a human (e.g., mentally and/or using pen and paper):
Using mathematical concepts to define the predetermined set of rules (i.e., least mean square analysis)
No additional limitations are present. 	

With respect to claim 54, the claim(s) recite:
wherein when the interval of audio frames is determined as the further interval of audio frames of the silence region of the spatial audio signal, the apparatus is further caused to:
use linear interpolation to interpolate between the quantised spatial direction component value for the first audio frame of the further interval of audio frames of the silence region and a previous quantised spatial direction component value for a first audio frame from a previous interval of audio frames of the silence region;
extrapolate the linear interpolation to extend over remaining audio frames of the further interval of audio frames; and
assign at least one value from along the extrapolated part of the linear interpolation for each remaining audio frame of the further interval of audio frames, wherein the assigned at least one value is the at least one spatial direction component value for the each remaining audio frame of the further interval of audio frames.
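A minimal Python sketch of the interpolation and extrapolation recited in claim 54 follows. It assumes every interval has the same number of frames, so the two anchor values (the quantised values for the first frames of the previous and current intervals) lie interval_length frames apart; that spacing and the function name are assumptions for illustration.

```python
def extrapolate_interval(prev_first_q, curr_first_q, interval_length):
    """Values for frames 1..interval_length-1 of the current interval.

    The line through (0, prev_first_q) and (interval_length, curr_first_q)
    is extended past the current anchor; frame k of the current interval
    sits at position interval_length + k on that line.
    """
    if interval_length < 1:
        raise ValueError("interval_length must be at least 1")
    slope = (curr_first_q - prev_first_q) / interval_length
    return [curr_first_q + slope * k for k in range(1, interval_length)]


print(extrapolate_interval(0.0, 4.0, 4))  # [5.0, 6.0, 7.0]
```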

This reads on a human (e.g., mentally and/or using pen and paper):
Using mathematical concepts, namely:
linear interpolation on direction components of audio signals with assigned predefined/finite number of values to the amplitude and previous components from previous signals/regions;
extrapolation to extend the signal; and
assigning a value from the extrapolated part to the remaining signal.
No additional limitations are present. 	

With respect to claim 55, the claim(s) recite:
wherein the apparatus configured to determine an error of fit measure between a plurality of spatial direction component values from the plurality of audio frames and the curve fitted to a data set comprising the plurality of spatial direction component values is configured to: 
perform least mean squares analysis on the data set comprising the plurality of spatial direction component values to find coefficients for a polynomial for curve fitting to the data set;
determine for each spatial direction value of the plurality of spatial direction component values an error value between the each spatial direction component value and a point of the curve fitted to the data set; and
determine the error of fit measure as the root mean square of the error values.
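The computation recited in claim 55, using the first-order polynomial of claim 56, can be sketched in plain Python as follows. This is a minimal sketch assuming the values are indexed by frame number 0..N-1 and that at least two values are supplied; the function names are hypothetical.

```python
import math


def fit_first_order(values):
    """Least-squares line y = a*n + b over frame indices n = 0..N-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def error_of_fit(values):
    """Root mean square of the per-frame errors against the fitted line."""
    a, b = fit_first_order(values)
    errors = [y - (a * x + b) for x, y in enumerate(values)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


print(error_of_fit([0.0, 1.0, 2.0, 3.0]))  # 0.0, the points lie on a line
```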

This reads on a human (e.g., mentally and/or using pen and paper):
Using mathematical concepts, namely:
least mean squares analysis on the data to find coefficients for a polynomial for curve fitting; and
determining an error value (i.e., the root mean square of error values).
No additional limitations are present. 	

With respect to claim 56, the claim(s) recite:
56. (New) The apparatus as claimed in Claim 55, wherein the polynomial for curve fitting to the data set is a first order polynomial.

This reads on a human (e.g., mentally and/or using pen and paper):
Further defining mathematical concepts (i.e., polynomial).
No additional limitations are present. 	

With respect to claim 57, the claim(s) recite:
57. (New) The apparatus as claimed in Claim 54, wherein the curve fitted to the data set comprising the plurality of spatial direction component values is the linear interpolation between the quantised average spatial direction component value for the first audio frame of the further interval of audio frames of the silence region and a previous quantised average spatial direction component value for the first frame from the previous interval of audio frames of the silence region, wherein the plurality of spatial direction component values are original spatial direction components values for the previous interval of audio frames, wherein the apparatus caused to determine an error of fit measure between a plurality of spatial direction values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction values is caused to: determine for each spatial direction value of the plurality of spatial direction component values an error value between the each spatial direction component value and a point along the is the linear interpolation; and
determine the error of fit measure as the root mean square of the error values.

This reads on a human (e.g., mentally and/or using pen and paper):
Using mathematical concepts, namely:
linear interpolation on direction components of audio signals with assigned predefined/finite number of values to the amplitude and previous components from previous signals/regions; and
determining an error value (i.e., the root mean square of error values).
No additional limitations are present. 	
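Similarly, the linear-interpolation error of fit recited in claim 57 can be sketched as below. It assumes the interpolation line runs from the previous quantised average value at frame 0 of the previous interval to the current quantised average value at frame N (the first frame of the further interval), so that frame k of the previous interval is compared against the point at fraction k/N along the line; this frame alignment and the function name are assumptions, not details stated in the claim.

```python
import math
from typing import Sequence


def interpolation_rms_error(
    original_values: Sequence[float],
    prev_quantised: float,
    curr_quantised: float,
) -> float:
    """RMS error between the original values of the previous interval and a
    straight line from prev_quantised (frame 0 of that interval) to
    curr_quantised (frame N, i.e. the first frame of the further interval)."""
    n = len(original_values)
    if n == 0:
        raise ValueError("need at least one original value")
    errors = []
    for k, value in enumerate(original_values):
        # Point on the linear interpolation at frame k of N (assumed alignment).
        point = prev_quantised + (curr_quantised - prev_quantised) * k / n
        errors.append(value - point)
    return math.sqrt(sum(e * e for e in errors) / n)
```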

With respect to claim 58, the claim(s) recite:
58. (New) The apparatus as claimed in Claim 49, wherein the first audio frame of the interval audio frames comprises a plurality of subframes, wherein each of the plurality of subframes comprises a spatial direction component value and wherein the spatial direction component value is an average spatial direction component value comprising the mean of the plurality of subframe spatial direction component values, and the quantised spatial direction component value is a quantised average spatial direction component value.

This reads on a human (e.g., mentally and/or using pen and paper):
Wherein each audio frame comprises a plurality of subframes (e.g., subgroups or intervals), and the frame value is the mean of the subframe values.
No additional limitations are present. 	
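Claim 58's averaging of subframe values, followed by quantisation, can be illustrated with a short sketch. The uniform scalar quantiser and its step size are assumptions made for illustration only; the claim does not specify a quantiser, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def average_subframes(subframe_values: Sequence[float]) -> float:
    """Mean of the per-subframe spatial direction component values of one frame."""
    if not subframe_values:
        raise ValueError("frame has no subframes")
    return sum(subframe_values) / len(subframe_values)


def quantise_uniform(value: float, step: float) -> Tuple[int, float]:
    """Uniform scalar quantiser (assumed): returns (index, reconstructed value).
    Note that Python's round() rounds exact halves to the nearest even integer."""
    index = round(value / step)
    return index, index * step
```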

With respect to claim 59, the claim(s) recite:
wherein a spatial direction component value is related to a spatial direction parameter, wherein the spatial direction parameter comprises an azimuth component and an elevation component, and wherein the spatial direction component value is one of:
an x-cartesian component transformed from the azimuth component and elevation component;
a y-cartesian component transformed from the azimuth component and elevation component; and
a z-cartesian component transformed from the azimuth component and elevation component.

This reads on a human (e.g., mentally and/or using pen and paper):
Where the spatial direction parameter comprises predefined azimuth and elevation components, transformed into x-, y-, or z-cartesian components.
No additional limitations are present. 	
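The transformation referred to in claim 59 can be illustrated with the usual spherical-to-cartesian conversion on the unit sphere. The conventions used here (angles in degrees, azimuth measured from the x-axis towards the y-axis, elevation measured from the horizontal plane) are assumptions; the claim does not state them, and the function name is hypothetical.

```python
import math
from typing import Tuple


def direction_to_cartesian(
    azimuth_deg: float, elevation_deg: float
) -> Tuple[float, float, float]:
    """Map an (azimuth, elevation) direction to x-, y- and z-cartesian components."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return x, y, z
```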

With respect to claim 60, the claim(s) recite:
60. (New) The apparatus as claimed in Claim 49, wherein the plurality of audio frames comprises audio frames prior to the first audio frame of the interval of audio frames.

This reads on a human (e.g., mentally and/or using pen and paper):
Considering the audio signal together with previous (e.g., historical) data/frames.
No additional limitations are present. 	

With respect to claim 61, the claim(s) recite:
61. (New) The apparatus as claimed in Claim 49, wherein the plurality of audio frames comprises the first audio frame of the interval of audio frames and audio frames prior to the first audio frame of the interval of audio frames.

This reads on a human (e.g., mentally and/or using pen and paper):
Considering the audio signal together with previous (e.g., historical) data/frames.
No additional limitations are present. 	

With respect to claim 62, the claim(s) recite:
62. (New) The apparatus as claimed in Claim 49, wherein the determination of use of prediction or non-prediction is signalled as a 1-bit flag.

This reads on a human (e.g., mentally and/or using pen and paper):
Assigning a value (e.g., 0 or 1) to each of the two methods.
No additional limitations are present. 	
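To illustrate claim 62 together with the threshold comparison of claim 49, a sketch might map the comparison to a 1-bit flag and pack one flag per interval into bytes. The polarity (a small error of fit selecting prediction), the most-significant-bit-first packing order, and the function names are all assumptions; neither the claims nor this action specify them.

```python
from typing import List, Sequence


def choose_mode(error_of_fit: float, threshold: float) -> int:
    """Return a 1-bit flag: 1 = prediction, 0 = non-prediction.

    ASSUMPTION: a small error of fit (values follow the fitted curve well)
    selects prediction; the claims do not fix this polarity.
    """
    return 1 if error_of_fit <= threshold else 0


def pack_flags(flags: Sequence[int]) -> bytes:
    """Pack one flag per interval into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(flags), 8):
        chunk = flags[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        byte <<= 8 - len(chunk)  # left-align a partial final byte
        out.append(byte)
    return bytes(out)
```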

With respect to claim 63, the claim(s) recite:
63. (New) The apparatus as claimed in Claim 49, wherein the interval of audio frames is a silence descriptor (SID) interval.

This reads on a human (e.g., mentally and/or using pen and paper):
Identifying the audio signal as a silence interval.
No additional limitations are present. 	


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 49-53, 60-61, and 63 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eckert et al. (US 20230215445 A1) and further in view of Vasilache et al. (US 20150287418 A1).

As to independent claim 49, Eckert et al. teaches:
49. (New) An apparatus comprising at least one processor and at least one memory including computer program code (see ¶ [0009]: “According to a further aspect, a computer program product is described. The computer program may comprise executable instructions for performing the method steps outlined in the present document when executed on a computer.”), the at least one memory and the computer program code configured to, with the at least one processor (see ¶ [0176]: “Memory interface 814 is coupled to processors 801, peripherals interface 802 and memory 815 (e.g., flash, RAM, ROM). Memory 815 stores computer program instructions and data,…”), cause the apparatus to:
determine an error of fit measure between a plurality of spatial direction component values from a plurality of audio frames (see ¶ [0147 and 0152-0154]: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame. By way of example, it may be determined whether the value of a distance measure between the upmixing metadata 105 for the current inactive frame and the upmixing metadata 105 of the previous inactive frame is greater than a pre-determined distance threshold. If this is the case, a SID frame may be inserted for the current inactive frame, in order to signal the changed upmixing metadata 105 to the decoding unit 150. If, on the other hand, the value of the distance measure is smaller than the distance threshold, the current inactive frame may be treated as an ND frame. [0152] The method 600 may further comprise encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, it is determined that the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed with regards to the subsequence of one or more previous inactive frames. Hence, the current frame may be encoded as a SID frame if, in particular, only if, the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed. [0153] Alternatively, or in addition, the method 600 may comprise determining that the current frame is an inactive frame following one or more previous inactive frames. 
In addition, the method 600 may comprise determining a value of a distance measure (e.g., a mean square error) between the covariance and/or the upmixing metadata 105 for the current frame and a previous covariance and/or previous upmixing metadata 105 for the one or more previous inactive frames. In other words, it may be determined by how much the covariance for the current frame deviates from the corresponding previous covariance for the one or more previous inactive frames, and/o by how much the upmixing metadata 105 for the current frame deviates from the previous upmixing metadata 105 for the one or more previous inactive frames. The previous upmixing metadata 105 may be the upmixing metadata that has been sent in the last SID frame. The previous covariance may be the covariance that has been used for generating the previous upmixing metadata 105. [0154] The method 600 may further comprise determining whether the value of the distance measure is greater than a pre-determined distance threshold. Encoding 604 the upmixing metadata 105 for the current frame into the bitstream may be performed, if, in particular only if, the value of the distance measure is greater than the pre-determined distance threshold. Alternatively, it may be refrained from encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, the value of the distance measure is smaller than the pre-determined distance threshold.”) and a curve fitted to a data set comprising the plurality of spatial direction component values (see ¶ [0147 and 0152-0154] citations as in limitation above and further ¶ [0173-0174]: “[0173] Hence, a method 600 of using spatial parameters 105 and same or different downmixes 103 used for active frames to model spatial characteristics of noise are described, thereby allowing comfort noise generation at the decoder 150 that is spatially consistent between active and non-active frames. 
The method 600 may comprise determining whether a voice signal is present in one or more frames of an audio input 101. In response to determining that no voice signal is present, a covariance may be estimated using frame to frame averaging. Furthermore, spatial noise parameters 105 may be calculated and entropy coding of the spatial noise parameters 105 may be performed. The entropy coded spatial noise parameters 107 may be packed into the bitstream for the one or more frames. [0174] The method 600 may comprise, in response to detecting transients in a frame of the one or more frames, removing the frame from covariance averaging. Calculating the spatial noise parameters 105 may be performed with a smoothed covariance estimation that smoothens across multiple frames to avoid spatial variability in the noise. The method 600 may comprise smoothing covariance across transients and short talk bursts and removing these from the calculation. Alternatively, or in addition, the method 600 may comprise using a limited set of bands and/or limited set of parameters to reduce parameter bit rate for noise and switching back to a full set when audio is present. Alternatively, or in addition, the method 600 may comprise calculating spatial elements separately from spectral elements of the noise to allow re-use of existing comfort noise generators.”);
compare the error of fit measure to a threshold value (see ¶ [0147 and 0152-0154, and 0173-0174] citations as in limitation above. More specifically ¶ [0147]: “…By way of example, it may be determined whether the value of a distance measure between the upmixing metadata 105 for the current inactive frame and the upmixing metadata 105 of the previous inactive frame is greater than a pre-determined distance threshold…”);
quantise a spatial direction component value for a first audio frame of an interval of audio frames to give a quantised spatial direction component value for the first audio frame (see ¶ [0127]: “The method 600 may comprise quantizing the parameters from the set of parameters for encoding 604 the upmixing metadata 105 for the current frame into the bitstream, using a quantizer. In other words, a quantizer may be used to quantize the set of parameters, which is to be encoded into the bitstream. The quantizer, in particular the quantization step size and/or the number of quantization steps of the quantizer, may be dependent on whether the current frame is an active frame or an inactive frame. In particular, the quantization step size may be lower and/or the number of quantization steps may be higher for an active frame than for an inactive frame. Alternatively, or in addition, the quantizer, in particular the quantization step size and/or the number of quantization steps of the quantizer, may be dependent on the number of channels of the downmix signal. By doing this, the efficiency of encoding spatial background noise at high perceptual quality may be further increased.”); and
transmit a bitstream including a representation of the at least one spatial direction component value for one or more audio frames (see ¶ [0125]: “The method 600 may further comprise encoding 604 the upmixing metadata 105 into a bitstream (wherein the bitstream may be transmitted or provided to a corresponding decoding unit 150)…” and ¶ [0148-0149]: “[0148] As outlined above, an input audio signal 101 may be provided to the encoding unit 100, wherein the input audio signal 101 comprises a series of frames. The frames may e.g., have a temporal length of 20 ms. The series of frames may comprise a subset of audio or voice frames and a subset of frames which consist only of background noise. An example sequence of audio frames may be considered A---A--ST---S----S---S---S----S---S---S----S---S----S---S---S---ST----S---S---S----S---S---S----S---A---A--A—A, wherein “A” indicates an active speech and/or audio frame, and wherein “S” indicates a silence frame (also referred to herein as inactive frame) and “ST” indicates a silence transmitted frame, for which a change in spectral and/or spatial characteristic of background noise is detected and hence spatial and/or spectral parameters are to be coded and sent to the decoding unit 150.
[0149] For a discontinuous transmission (DTX) system, for which the actual bitrate of the codec is significantly reduced during inactive frames by only sending noise shaping parameters and assuming that background noise characteristics do not change as frequent as active speech or audio frames, the above sequence may be translated into the following sequence of frames by the encoding unit 100: AB-AB-SID-ND-ND-ND-ND-ND-ND-ND-ND-ND-ND-ND-ND-SID-ND-ND-ND-ND-ND-ND-ND-AB-AB-AB-AB, wherein “AB” indicates an encoder bitstream for an active frame, wherein “SID” indicates a silence indicator frame, which comprises a series of bits for comfort noise generation, and wherein “ND” indicates no data frames, i.e., nothing is transmitted to the decoding unit 150 during these frames. Note that the frequency of transmission of SID frames in the above sequence is not pre-determined and is dependent on change in spectral and/or spatial characteristics of input background noise.”).

However, Eckert et al. does not explicitly teach, but Vasilache et al. does teach:
depending on the comparison, either use a method of non-prediction for generating at least one spatial direction component value for each remaining audio frame of the interval of audio frames, or use a method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames (see ¶ [0090 and 0161]: “[0090] In embodiments the prediction mode determiner 402 may be arranged to determine whether the input vector should be quantised using either the predictive mode or the non-predictive (safety-net) mode. [0161] It is to be further understood that for other embodiments the quantiser 400 may operate according to the combined flow diagram of both FIGS. 5 and 3. In other words the prediction mode determiner 402 may determine the mode of operation of the quantiser 400 for quantising the next LSF vector based partly on the comparison of the distance measure against the pre-determined threshold. In addition the selected mode of operation of the quantiser 400 will also depend on the coding mode of the next audio frame and on the relative quantisation error produced by the quantiser 400 operating in both the predictive and non-predictive modes.”), wherein all remaining audio frames comprise all but the first audio frame of the interval of audio frames (see ¶ [0090 and 0161] citations as in limitation above and further ¶ [0058] of Eckert et al.: “[0058] In other words, the encoding unit 100 may be configured to send audio data 106 and encoded metadata 107 to the decoding unit 150 for every active frame. On the other hand, the encoding unit 100 may be configured to send only encoded metadata 107 (and no audio data 106) for a fraction of the inactive frames (i.e., for the SID frames). For the remaining inactive frames (i.e., for the ND frames), no data may be sent at all (not even encoded metadata 107).”).
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of depending on the comparison, either use a method of non-prediction for generating at least one spatial direction component value for each remaining audio frame of the interval of audio frames, or use a method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames, wherein all remaining audio frames comprise all but the first audio frame of the interval of audio frames which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).
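For illustration only, Eckert et al.'s inactive-frame decision as quoted above (¶ [0147] and [0153]-[0154]) can be sketched as follows. This is a simplified reading, not Eckert's implementation: the upmixing metadata 105 is modelled as a plain list of floats, the distance measure is the mean square error named in ¶ [0153], the first inactive frame after an active frame is always sent as a SID frame (consistent with the example sequence in ¶ [0149]), and the threshold value and function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def mean_square_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure between two metadata vectors of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("metadata vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def classify_frames(
    frames: Sequence[Tuple[bool, Sequence[float]]], threshold: float
) -> List[str]:
    """Label each (is_active, metadata) frame as 'AB', 'SID' or 'ND'."""
    labels: List[str] = []
    last_sent: Optional[Sequence[float]] = None  # metadata of the last SID frame
    for is_active, metadata in frames:
        if is_active:
            labels.append("AB")
            last_sent = None  # the next inactive frame opens a new silence run
        elif last_sent is None or mean_square_error(metadata, last_sent) > threshold:
            labels.append("SID")  # change detected: send updated metadata
            last_sent = metadata
        else:
            labels.append("ND")  # no significant change: transmit nothing
    return labels
```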

Regarding claim 50, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Vasilache et al. further teaches:
50. (New) The apparatus as claimed in Claim 49, wherein the method of non-prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames (see ¶ [0058, 0090 and 0161]: “[0058] In other words, the encoding unit 100 may be configured to send audio data 106 and encoded metadata 107 to the decoding unit 150 for every active frame. On the other hand, the encoding unit 100 may be configured to send only encoded metadata 107 (and no audio data 106) for a fraction of the inactive frames (i.e., for the SID frames). For the remaining inactive frames (i.e., for the ND frames), no data may be sent at all (not even encoded metadata 107). [0090] In embodiments the prediction mode determiner 402 may be arranged to determine whether the input vector should be quantised using either the predictive mode or the non-predictive (safety-net) mode. [0161] It is to be further understood that for other embodiments the quantiser 400 may operate according to the combined flow diagram of both FIGS. 5 and 3. In other words the prediction mode determiner 402 may determine the mode of operation of the quantiser 400 for quantising the next LSF vector based partly on the comparison of the distance measure against the pre-determined threshold. In addition the selected mode of operation of the quantiser 400 will also depend on the coding mode of the next audio frame and on the relative quantisation error produced by the quantiser 400 operating in both the predictive and non-predictive modes.”) causes the apparatus to:
store the quantised spatial direction component value of the first audio frame for use as a previous quantised spatial direction component value (see ¶ [0058, 0090 and 0161] citations as in limitation above and further ¶ [0072]: “The quantiser 400 can map each input vector 401 to one of a series of finite definite quantised values in order to produce a quantised vector. The quantised vector value may then be referenced by an index value. The index value may then be converted to a binary number in order to facilitate its storage and transmission.”).
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of wherein the method of non-prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames causes the apparatus to: store the quantised spatial direction component value of the first audio frame for use as a previous quantised spatial direction component value which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).
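Vasilache et al. ¶ [0072], cited for claim 50, describes mapping each input to one of a finite set of quantised values, referencing it by an index and converting the index to a binary number. A minimal sketch of that chain, plus storing the quantised first-frame value for later use as the previous quantised value, might look as follows; the scalar codebook, the state object and the helper names are hypothetical and are not taken from either reference.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def nearest_codebook_index(value: float, codebook: Sequence[float]) -> int:
    """Index of the codebook entry closest to value (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


def index_to_bits(index: int, codebook_size: int) -> str:
    """Fixed-width binary representation of a codebook index."""
    width = max(1, (codebook_size - 1).bit_length())
    return format(index, f"0{width}b")


@dataclass
class QuantiserState:
    previous_quantised: Optional[float] = None


def quantise_first_frame(
    value: float, codebook: Sequence[float], state: QuantiserState
) -> str:
    """Quantise the first frame's value, store it as the previous quantised value,
    and return the bits to be written to the bitstream."""
    index = nearest_codebook_index(value, codebook)
    state.previous_quantised = codebook[index]
    return index_to_bits(index, len(codebook))
```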

Regarding claim 51, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Vasilache et al. further teaches:
51. (New) The apparatus as claimed in Claim 49, wherein the method of prediction for generating the at least one spatial direction component value for each remaining audio frame of the interval of audio frames (see ¶ [0058, 0090 and 0161] citations as in claim 50 above.) causes the apparatus to:
determine whether the interval of audio frames is a first interval of audio frames of a silence region [i.e., silence region taught by Eckert et al.: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame.”] of the spatial audio signal or whether the interval of audio frames is a further interval of audio frames of the silence region of the spatial audio signal (see ¶ [0091] of Vasilache et al.: “In order to facilitate the decision of whether the prediction mode determiner 402 should determine that the quantiser 400 should be used in a predictive mode of operation or in a non-predictive (safety-net) mode of operation the prediction mode determiner 402 may be arranged to receive a further input 403. The further input 403 may convey to the prediction mode determiner 402 the type of coding regime (or coding mode) used to encode the input audio signal 110 to the encoder 104. In other words, it is to be appreciated that the encoder 104 may operate in one of a number of modes of operation, where each mode of operation can be tailored to suit a particular type of input audio frame.”).
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of determine whether the interval of audio frames is a first interval of audio frames of a silence region of the spatial audio signal or whether the interval of audio frames is a further interval of audio frames of the silence region of the spatial audio signal which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).

Regarding claim 52, Eckert et al. and Vasilache et al. teach the limitations as in claim 51, above.
Eckert et al. further teaches:
52. (New) The apparatus as claimed in Claim 51,
wherein when the interval of audio frames is determined as the first interval of audio frames of the silence region of the spatial audio signal (see ¶ [0057]: “Hence, the encoding unit 100 may be configured to classifying the different frames of the input signal 101 into active (A) or silent (S) frames (which are also referred to as inactive frames). Furthermore, the encoding unit 100 may be configured to determine and encode data for comfort noise generation within a “SID” frame (which corresponds e.g., to the current S frame of a series of S frames). The SID frames may be sent repeatedly, in particular periodically, for a series of S frames. By way of example, a SID frame may be sent every 8th frame (which corresponds to a time interval of 160 ms between subsequent SID frames, when using 20 ms frames)...”), the apparatus is further caused to:

Vasilache et al.  further teaches:
determine the coefficients of a backward predictor using a data set comprising a plurality of quantised spatial direction component values drawn from the plurality of audio frames (see Fig. 4 (404: backward predictor) and ¶ [0103 and 0106]: “[0103] Initially, for an input audio frame, the quantiser 400 may quantise the input LSF vector conveyed via the input 401 using the predictor 404. In other words a predicted LSF vector may be generated by the predictor 404 and passed to the summer 408. The summer 408 may then determine the difference between the input LSF vector 401 and the predicted LSF vector to provide a residual LSF vector 405…. [0106] With reference to FIG. 4, the quantised LSF vector 409 may be passed to the predictor 404 in order to populate the predictor memory. The predictor memory may be then used to generate the predicted LSF vector for subsequent input LSF vectors.”);
initialise the backward predictor with the quantised spatial direction component value for the first audio frame of the interval of audio frames (see Fig. 4 and ¶ [0103 and 0106] citations as in limitation above.); and
use the backward predictor to predict the at least one spatial direction component value for each remaining audio frame of the first interval of audio frames of the silence region [i.e., silence region taught by Eckert et al.: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame.”]  (see Fig. 4 and ¶ [0103 and 0106] of Vasilache et al. citations as in limitation above.).
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of determine the coefficients of a backward predictor using a data set comprising a plurality of quantised spatial direction component values drawn from the plurality of audio frames, initialise the backward predictor with the quantised spatial direction component value for the first audio frame of the interval of audio frames and use the backward predictor to predict the at least one spatial direction component value for each remaining audio frame of the first interval of audio frames which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).

Regarding claim 53, Eckert et al. and Vasilache et al. teach the limitations as in claim 52, above.
Vasilache et al. further teaches:
53. (New) The apparatus as claimed in Claim 52, wherein the backward predictor is a first order backward predictor (see Fig. 4 and ¶ [0103 and 0106] of Vasilache et al. citations as in limitation above.), and wherein the coefficients of the backward predictor are determined using least mean square analysis of the data set comprising the plurality of average spatial direction component values drawn from the plurality of audio frames (see Fig. 4 and ¶ [0103 and 0106] of Vasilache et al. citations as in claim 52 above and further ¶ [0126]: “In embodiments the quantisation error may be formulated as the mean square error between the LSF vector and the quantised LSF vector.”).
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of wherein the backward predictor is a first order backward predictor, and wherein the coefficients of the backward predictor are determined using least mean square analysis of the data set comprising the plurality of average spatial direction component values drawn from the plurality of audio frames which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).
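For illustration, one possible reading of the first-order backward predictor of claims 52 and 53 is a single-tap predictor v[n] = a * v[n-1] whose coefficient a is the least-squares solution over a history of quantised values; the predictor is then initialised with the quantised first-frame value and run forward over the remaining frames of the interval. The predictor form (no bias term), the zero fallback, and the helper names are assumptions, not the applicant's or Vasilache et al.'s implementation.

```python
from typing import List, Sequence


def fit_first_order_predictor(history: Sequence[float]) -> float:
    """Least-squares coefficient a minimising sum over n of (v[n] - a * v[n-1])^2."""
    if len(history) < 2:
        raise ValueError("need at least two past values")
    num = sum(cur * prev for prev, cur in zip(history, history[1:]))
    den = sum(prev * prev for prev in history[:-1])
    if den == 0.0:
        return 0.0  # all-zero history: fall back to predicting zero (assumption)
    return num / den


def predict_interval(
    first_quantised: float, coefficient: float, interval_length: int
) -> List[float]:
    """Initialise with the first frame's quantised value, then predict each remaining frame."""
    values = [first_quantised]
    for _ in range(interval_length - 1):
        values.append(coefficient * values[-1])
    return values
```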

Regarding claim 60, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Eckert et al. further teaches:
60. (New) The apparatus as claimed in Claim 49, wherein the plurality of audio frames comprises audio frames prior to the first audio frame of the interval of audio frames (see ¶ [0151]: “In other words, the method 600 may comprise determining that the current frame is an inactive frame following a subsequence of one or more previous inactive frames (which is directly preceding the current frame) …”).

Regarding claim 61, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Eckert et al. further teaches:
61. (New) The apparatus as claimed in Claim 49, wherein the plurality of audio frames comprises the first audio frame of the interval of audio frames and audio frames prior to the first audio frame of the interval of audio frames (see ¶ [0151] citation as in claim 60, above.).

Regarding claim 63, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Eckert et al. further teaches:
63. (New) The apparatus as claimed in Claim 49, wherein the interval of audio frames is a silence descriptor (SID) interval (see ¶ [0057]: “Hence, the encoding unit 100 may be configured to classifying the different frames of the input signal 101 into active (A) or silent (S) frames (which are also referred to as inactive frames). Furthermore, the encoding unit 100 may be configured to determine and encode data for comfort noise generation within a “SID” frame (which corresponds e.g., to the current S frame of a series of S frames). The SID frames may be sent repeatedly, in particular periodically, for a series of S frames. By way of example, a SID frame may be sent every 8th frame (which corresponds to a time interval of 160 ms between subsequent SID frames, when using 20 ms frames). No data may be transmitted during the one or more following S frames of the series of S frames. Hence, the encoding unit 100 may be configured to perform DTX (discontinuous transmission) or to switch to a DTX mode.”).
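The arithmetic in Eckert et al. ¶ [0057] (a SID frame every 8th frame with 20 ms frames, i.e. 160 ms between SID frames) can be sketched as below; labelling the frames in between as ND follows the same paragraph's statement that no data may be transmitted during the following S frames. The constants mirror the quoted example values, and the function names are hypothetical.

```python
from typing import List

FRAME_MS = 20     # frame length in the example of Eckert et al. ¶ [0057]
SID_INTERVAL = 8  # "a SID frame may be sent every 8th frame" (same paragraph)


def label_silence_run(num_frames: int, sid_interval: int = SID_INTERVAL) -> List[str]:
    """Label consecutive silence frames: SID at the start of each interval, ND otherwise."""
    return ["SID" if i % sid_interval == 0 else "ND" for i in range(num_frames)]


def sid_spacing_ms(sid_interval: int = SID_INTERVAL, frame_ms: int = FRAME_MS) -> int:
    """Time between consecutive SID frames in milliseconds."""
    return sid_interval * frame_ms
```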

Claims 54 and 58-59 is/are rejected under 35 U.S.C. 103 as being unpatentable over Eckert et al. (US 20230215445 A1) and further in view of Vasilache et al. (US 20150287418 A1) as applied to claims 49 and 52 above, and further in view of Fuchs et al. (US 20200265851 A1).

Regarding claim 54, Eckert et al. and Vasilache et al. teach the limitations as in claim 52, above.
54. (New) The apparatus as claimed in Claim 52,
wherein when the interval of audio frames is determined as the further interval of audio frames of the silence region of the spatial audio signal ([i.e., silence region taught by Eckert et al.: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame.”] see ¶ [0091] of Vasilache et al.: “In order to facilitate the decision of whether the prediction mode determiner 402 should determine that the quantiser 400 should be used in a predictive mode of operation or in a non-predictive (safety-net) mode of operation the prediction mode determiner 402 may be arranged to receive a further input 403. The further input 403 may convey to the prediction mode determiner 402 the type of coding regime (or coding mode) used to encode the input audio signal 110 to the encoder 104. In other words, it is to be appreciated that the encoder 104 may operate in one of a number of modes of operation, where each mode of operation can be tailored to suit a particular type of input audio frame.”), 
Eckert et al. and Vasilache et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. to incorporate the teachings of Vasilache et al. of wherein when the interval of audio frames is determined as the further interval of audio frames of the silence region of the spatial audio signal which provides the benefit of improving the coding efficiency (¶ [0171] of Vasilache et al.).

However, Eckert et al. and Vasilache et al. do not explicitly teach, but Fuchs et al. does teach:
the apparatus is further caused to: use linear interpolation to interpolate between the quantised spatial direction component value for the first audio frame of the further interval of audio frames of the silence region and a previous quantised spatial direction component value for a first audio frame from a previous interval of audio frames of the silence region (see ¶ [0143-0144]: “[0143] Advantageously, only the non-negative values from the ICC quantization table are used, as icc=[1.0, 0.937, 0.84118, 0.60092, 0.36764, 0.0], containing only 6 levels of the original 8. Because an ICC of 0.0 corresponds to a diffuseness of 1.0, and an ICC of 1.0 corresponds to a diffuseness of 0.0, a set of y coordinates are created as y=1.0−icc, with a corresponding set of x coordinates as x=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. A shape-preserving piecewise cubic interpolation method, known as Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), is used to derive a curve passing through the set of points defined by x and y. The number of steps of the diffuseness quantizer is diff_alph, which in the proposed implementation is 8, but it has no relation to the total number of levels of the ICC quantization table, which is also 8. [0144] A new set of diff_alph equally spaced coordinates x_interpolated from 0.0 to 1.0 (or close to, but smaller than 1.0, when the case of pure diffuseness of 1.0 is avoided because of sound rendering considerations) are generated, and the corresponding y values on the curve are used as the reconstruction values, those reconstruction values being non-linearly spaced. Points half-way between consecutive x_interpolated values are also generated, and the corresponding y values of the curve are used as threshold values to decide which values map to a particular diffuseness index and therefore reconstruction value. 
For the proposed implementation, the generated reconstruction and threshold values (rounded to 5 digits), computed by the generate_diffuseness_quantizer function are:…”);
extrapolate the linear interpolation to extend over remaining audio frames of the further interval of audio frames (see ¶ [0179-0182]: “[0179] FIG. 5d illustrates this second encoding mode which can, for example, be an entropy coding mode with modeling. The preprocessed indexes which are, for example, categorized for a mixed diffuseness frame as illustrated in FIG. 5a at 240 are input into a block 266 which collects corresponding quantization data such as elevation indexes, elevation alphabets, azimuth indexes, azimuth alphabets, and this data is collected into separate vectors for a frame. In block 267, averages are calculated for elevation and azimuth clearly based on information derived from dequantization and corresponding vector transformation as is discussed later on. These average values are quantized with the highest angular precision used in the frame indicated at block 268. Predicted elevation and azimuth indexes are generated from the average values as illustrated in block 269, and, signed distances for elevation and azimuth from the original indexes and related to the predicted elevation and azimuth indexes are computed and optionally reduced to another smaller interval of values. [0180] As illustrated in FIG. 5e, the data generated by the modeling operation using a projection operation for deriving prediction values illustrated in FIG. 5d is entropy encoded. This encoding operation illustrated in FIG. 5e finally generates encoding bits from the corresponding data. In block 271, the average values for azimuth and elevation are converted to signed values and, a certain reordering 272 is performed in order to have a more compact representation and, those average values are encoded 273 with a binary code or a punctured binary code in order to generate the elevation average bits 274 and the azimuth average bits. In block 275, a Golomb-Rice parameter is determined such as illustrated in FIG. 5f, and this parameter is then also encoded with a (punctured) binary code illustrated at block 276 in order to have the Golomb-Rice parameter for elevation and another Golomb-Rice parameter for azimuth illustrated at 277. In block 278, the (reduced) signed distances calculated by block 270 are reordered and then encoded with the extended Golomb-Rice method illustrated at 279 in order to have the encoded elevation distances and azimuth distances indicated at 280. [0181] FIG. 5f illustrates an implementation for the determination of the Golomb-Rice parameter in block 275 which is performed both for the determination of the elevation Golomb-Rice parameter or the azimuth Golomb-Rice parameter. In block 281, an interval is determined for the corresponding Golomb-Rice parameter. In block 282, the total number of bits for all reduced signed distances are computed, for each candidate value and, in block 283, the candidate value resulting in the smallest number of bits is selected as the Golomb-Rice parameter for either azimuth or elevation processing. [0182] Subsequently, FIG. 5g is discussed in order to further illustrate the procedure in block 279 of FIG. 5e, i.e., the extended Golomb-Rice method. Based on the selected Golomb-Rice parameter p, the distance index either for elevation or for azimuth is separated in a most significant part MSP and a least significant part LSP as illustrated to the right of block 284. In block 285, a terminating zero bit of the MSP part is eliminated, in the case when the MSP is the maximum possible value, and in block 286, the result is encoded with a (punctured) binary code.”); and
assign at least one value from along the extrapolated part of the line[a]r interpolation for each remaining audio frame of the further interval of audio frames, wherein the assigned at least one value is the at least one spatial direction component value for the each remaining audio frame of the further interval of audio frames (see ¶ [0143-0144 and 0179-0182] citations as in limitations above and ¶ [0183-0184]: “[0183] The LSP part is also encoded with a (punctured) binary code illustrated at 287. Thus, on lines 288 and 289, the encoded bits for the most significant part MSP and the encoded bits for the least significant part LSP are obtained which together represent the corresponding encoded reduced signed distances either for elevation or for azimuth. [0184] FIG. 8d illustrates an example for an encoded direction…”).
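The Golomb-Rice steps described in the quoted ¶ [0181]-[0182] (split each value into an MSP and an LSP using parameter p, and pick the candidate p that yields the fewest total bits) can be illustrated with a plain Rice code. This is a simplified sketch, assuming non-negative inputs; it does not model the extended method's removal of the terminating zero bit or its punctured binary codes, and the function names are hypothetical.

```python
def rice_code_length(value: int, p: int) -> int:
    """Bits for a plain Rice code: unary MSP (value >> p ones plus a 0), then p LSP bits."""
    return (value >> p) + 1 + p


def rice_encode(value: int, p: int) -> str:
    """Plain Rice codeword as a bit string (non-negative value assumed)."""
    msp, lsp = value >> p, value & ((1 << p) - 1)
    bits = "1" * msp + "0"                 # unary-coded most significant part
    if p:
        bits += format(lsp, f"0{p}b")      # p-bit least significant part
    return bits


def best_rice_parameter(values, candidates):
    """Candidate p with the smallest total bit count (ties go to the first candidate)."""
    return min(candidates, key=lambda p: sum(rice_code_length(v, p) for v in values))


print(rice_encode(9, 2))                            # 11001
print(best_rice_parameter([8, 9, 10, 11], range(4)))  # 2
```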
Eckert et al. and Vasilache et al. and Fuchs et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Fuchs et al. of use linear interpolation to interpolate between the quantised spatial direction component value for the first audio frame of the further interval of audio frames of the silence region and a previous quantised spatial direction component value for a first audio frame from a previous interval of audio frames of the silence region; extrapolate the linear interpolation to extend over remaining audio frames of the further interval of audio frames; and assign at least one value from along the extrapolated part of the line[a]r interpolation for each remaining audio frame of the further interval of audio frames, wherein the assigned at least one value is the at least one spatial direction component value for the each remaining audio frame of the further interval of audio frames which provides the benefit of providing an improved processing concept for the spatial audio coding parameters (¶ [0030] of Fuchs et al.).
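One reading of the claim 54 limitations, offered only to make the geometry concrete: a line passes through the previous quantised value at the first frame of the previous interval and the current quantised value at the first frame of the further interval, and each remaining frame of the further interval takes its value from the extension of that line. The sketch assumes equal-length intervals, one scalar value per frame, and no azimuth wrap-around; the function name is hypothetical.

```python
def extrapolate_interval(q_prev: float, q_curr: float, interval: int) -> list:
    """Values for frames 1..interval-1 of the further interval.

    The line runs through (0, q_prev), the first frame of the previous
    interval, and (interval, q_curr), the first frame of the further
    interval; the remaining frames are read off its extension.
    """
    if interval < 1:
        raise ValueError("interval must be at least one frame")
    slope = (q_curr - q_prev) / interval   # change per frame along the line
    return [q_curr + slope * k for k in range(1, interval)]


print(extrapolate_interval(10.0, 18.0, 8))  # [19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
```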

Regarding claim 58, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
However, Eckert et al. and Vasilache et al. do not explicitly teach, but Fuchs et al. does teach:
58. (New) The apparatus as claimed in Claim 49, 
wherein the first audio frame of the interval audio frames comprises a plurality of subframes (see ¶ [0029]: “[0029] It is of advantage to perform a processing of the parameters in frames, where each frame is organized in a certain number of bands, where each band comprises at least two original frequency bins, in which the parameters have been calculated…”), 
wherein each of the plurality of subframes comprises a spatial direction component value (see Fig. 7b and ¶ [0225]: “FIG. 7b illustrates an advantageous procedure performed by the parameter resolution converter. In block 721, the parameter resolution converter 710 obtains the diffuseness/direction parameters for a frame…”) and 
wherein the spatial direction component value is an average spatial direction component value comprising the mean of the plurality of subframe spatial direction component values, and the quantised spatial direction component value is a quantised average spatial direction component value (see ¶ [0234]: “When decoding with modeling was indicated by the mode bit 806, then the averages for the azimuth/elevation data in the band/frame is decoded as indicated by block 866. In block 868, distances for the azimuth/elevation information in the band are decoded and, in block 870, the corresponding quantized elevation and azimuth parameters are calculated using typically an addition operation…”).
Eckert et al. and Vasilache et al. and Fuchs et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Fuchs et al. of wherein the first audio frame of the interval audio frames comprises a plurality of subframes, wherein each of the plurality of subframes comprises a spatial direction component value and wherein the spatial direction component value is an average spatial direction component value comprising the mean of the plurality of subframe spatial direction component values, and the quantised spatial direction component value is a quantised average spatial direction component value which provides the benefit of providing an improved processing concept for the spatial audio coding parameters (¶ [0030] of Fuchs et al.).
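The averaging recited in claim 58 (the mean of the subframe spatial direction component values, which is then quantised) reduces to the arithmetic below. The uniform quantiser step is a hypothetical choice for illustration, not one quoted from either reference, and a plain arithmetic mean ignores angle wrap-around (e.g., azimuths near ±180°).

```python
def average_and_quantise(subframe_values, step):
    """Mean of the subframe values and its uniform quantisation (hypothetical step)."""
    if not subframe_values:
        raise ValueError("need at least one subframe value")
    mean = sum(subframe_values) / len(subframe_values)
    index = round(mean / step)          # nearest index; Python rounds exact halves to even
    return mean, index, index * step    # mean, quantiser index, reconstructed value


print(average_and_quantise([10.0, 12.0, 14.0, 16.0], step=5.0))  # (13.0, 3, 15.0)
```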

Regarding claim 59, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
However, Eckert et al. and Vasilache et al. do not explicitly teach, but Fuchs et al. does teach:
59. (New) The apparatus as claimed in Claim 49, wherein a spatial direction component value is related to a spatial direction parameter, wherein the spatial direction parameter comprises an azimuth component and an elevation component (see ¶ [0033]: “Another feature of the second aspect is that the direction parameters are converted into an azimuth/elevation representation. In this feature, the elevation value is used to determine the alphabet for the quantization and encoding of the azimuth value. Advantageously, the azimuth alphabet has the greatest amount of different values when the elevation indicates a zero angle or generally an equator angle on the unit sphere. The smallest amount of values in the azimuth alphabet is when the elevation indicates the north or south pole of the unit sphere. Hence, the alphabet value decreases with an increasing absolute value of the elevation angle counted from the equator.”), and wherein the spatial direction component value (see ¶ [0033] citation as in limitation above: direction parameters) is one of:
an x-cartesian component transformed from the azimuth component and elevation component (see Fig. 2d (i.e., direction parameter is now a unit vector (in a 2- or 3-dimensional region)) and ¶ [0100]: “FIG. 2d illustrates the calculation of the direction parameters with the second resolution. In block 146, the amplitude-related measure is calculated per bin in the third or fourth resolution similar to item 143 of FIG. 2c. In block 147, weighting factors are calculated for each bin, but not only dependent on the amplitude-related measure obtained from block 147 but also using the corresponding diffuseness parameter per bin as illustrated in FIG. 2d. Thus, for the same amplitude-related measure, a higher factor is typically calculated for a lower diffuseness. In block 148, a grouping and averaging is performed using a weighted combination such as an addition and the result can be normalized as illustrated in optional block 146. Thus, at the output of block 146, the direction parameter is obtained as a unit vector corresponding to a two-dimensional or three-dimensional region such as a Cartesian vector that can easily be converted into a polar form having an azimuth value and an elevation value.”);
a y-cartesian component transformed from the azimuth component and elevation component (see Fig. 2d (i.e., direction parameter is now a unit vector (in a 2- or 3-dimensional region)) and ¶ [0100] citation as in limitation above); and
a z-cartesian component transformed from the azimuth component and elevation component (see Fig. 2d (i.e., direction parameter is now a unit vector (in a 2- or 3-dimensional region)) and ¶ [0100] citation as in limitation above).
Eckert et al. and Vasilache et al. and Fuchs et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Fuchs et al. of wherein a spatial direction component value is related to a spatial direction parameter, wherein the spatial direction parameter comprises an azimuth component and an elevation component, and wherein the spatial direction component value is one of: an x-cartesian component transformed from the azimuth component and elevation component; a y-cartesian component transformed from the azimuth component and elevation component; and a z-cartesian component transformed from the azimuth component and elevation component which provides the benefit of providing an improved processing concept for the spatial audio coding parameters (¶ [0030] of Fuchs et al.).
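For claim 59, the x-, y- and z-cartesian components of a direction given by azimuth and elevation can be computed with the common unit-sphere convention below (azimuth measured in the horizontal plane from the x-axis, elevation measured from that plane). The convention is an assumption; the quoted ¶ [0100] of Fuchs et al. does not state one, and the function names are hypothetical.

```python
import math


def direction_to_cartesian(azimuth_deg: float, elevation_deg: float):
    """Unit vector (x, y, z) for an azimuth/elevation pair in degrees (assumed convention)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)
    y = math.cos(el) * math.sin(az)
    z = math.sin(el)
    return x, y, z


def cartesian_to_direction(x: float, y: float, z: float):
    """Inverse mapping back to (azimuth, elevation) in degrees."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```

Each cartesian component lies in [-1, 1], and the round trip through cartesian_to_direction recovers the original angles up to floating-point error.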

Claims 55-56 are rejected under 35 U.S.C. 103 as being unpatentable over Eckert et al. (US 20230215445 A1) and further in view of Vasilache et al. (US 20150287418 A1) as applied to claim 49 above and further in view of Asbeck et al. (US 20100295613 A1). 

Regarding claim 55, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
Eckert et al. further teaches:
55. (New) The apparatus as claimed in Claim 49, 
wherein the apparatus configured to determine an error of fit measure between a plurality of spatial direction component values from the plurality of audio frames and the curve fitted to a data set comprising the plurality of spatial direction component values (see ¶ [0147 and 0152-0154]: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame. By way of example, it may be determined whether the value of a distance measure between the upmixing metadata 105 for the current inactive frame and the upmixing metadata 105 of the previous inactive frame is greater than a pre-determined distance threshold… [0152] The method 600 may further comprise encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, it is determined that the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed with regards to the subsequence of one or more previous inactive frames… [0153] …In addition, the method 600 may comprise determining a value of a distance measure (e.g., a mean square error) between the covariance and/or the upmixing metadata 105 for the current frame and a previous covariance and/or previous upmixing metadata 105 for the one or more previous inactive frames… [0154] The method 600 may further comprise determining whether the value of the distance measure is greater than a pre-determined distance threshold. Encoding 604 the upmixing metadata 105 for the current frame into the bitstream may be performed, if, in particular only if, the value of the distance measure is greater than the pre-determined distance threshold…”) is configured to: 

While Eckert et al. and Vasilache et al. teach limitations mentioned above associated with the plurality of spatial direction component values from the plurality of audio frames, Eckert et al. and Vasilache et al. do not explicitly teach, but Asbeck et al. does teach:
perform least mean squares analysis on the data set (see ¶ [0029]: “…where N is the order of nonlinearity for x and M is the order of nonlinearity for v. The coefficients, c_nm, are found by a least mean square (LMS) algorithm and then the entries in the amplitude LUT are replaced with the curve fitted value calculated from equation (2). [Equation 2: image not reproduced] (Equation 2)”);
determine for each s(see ¶ [0039]: “…The LUT updating was iterated 14 times in order to provide sufficient LUT coverage to achieve a normalized RMS error of 3.98%.”); and
determine the error of fit measure as the root mean square of the error values (see ¶ [0039] citation as in limitation above.).
Eckert et al. and Vasilache et al. and Asbeck et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Asbeck et al. of perform least mean squares analysis on the data set comprising et al.).

Regarding claim 56, Eckert et al. and Vasilache et al. and Asbeck et al. teach the limitations as in claim 55, above.
Asbeck et al. further teaches: 
56. (New) The apparatus as claimed in Claim 55, wherein the polynomial for curve fitting to the data set is a first order polynomial (see ¶ [0029] and Equation 2 citations as in claim 55, above.).
Eckert et al. and Vasilache et al. and Asbeck et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Asbeck et al. of wherein the polynomial for curve fitting to the data set is a first order polynomial which provides the benefit of providing significant improvements in overall system efficiency (¶ [0041] of Asbeck et al.).
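Claims 55 and 56 recite fitting a polynomial (first order in claim 56) to the spatial direction component values and taking the root mean square of the per-frame errors as the error of fit measure. The sketch below uses the closed-form ordinary least-squares line over frame indices 0..N-1; this is a swapped-in technique for illustration, not the iterative LMS algorithm named in Asbeck et al. ¶ [0029], and the function names are hypothetical.

```python
import math


def fit_line(values):
    """Ordinary least-squares first-order fit v ~ a + b*n over frame indices n = 0..N-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    b = sxy / sxx                     # slope
    a = y_mean - b * x_mean           # intercept
    return a, b


def rms_error_of_fit(values):
    """Root mean square of the differences between each value and the fitted line."""
    a, b = fit_line(values)
    errors = [v - (a + b * n) for n, v in enumerate(values)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

For values lying exactly on a line, e.g. [0, 2, 4, 6], the fit is a = 0, b = 2 and the error of fit is zero; for [0, 1, 0, 1] the error of fit is the square root of 0.2.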

Claim 57 is rejected under 35 U.S.C. 103 as being unpatentable over Eckert et al. (US 20230215445 A1) and further in view of Vasilache et al. (US 20150287418 A1) and Fuchs et al. (US 20200265851 A1) as applied to claim 54 above and further in view of Asbeck et al. (US 20100295613 A1). 

Regarding claim 57, Eckert et al. and Vasilache et al. and Fuchs et al. teach the limitations as in claim 54, above.
Fuchs et al. further teaches:
57. (New) The apparatus as claimed in Claim 54, 
wherein the plurality of spatial direction component values are original spatial direction components values for the previous interval of audio frames (see ¶ [0147 and 0152-0154]: “[0147] In particular, the current inactive frame may be analyzed, in order to determine whether a chance in spatial and/or spectral characteristic of the noise within the current inactive frame has occurred with respect to the previous inactive frame. By way of example, it may be determined whether the value of a distance measure between the upmixing metadata 105 for the current inactive frame and the upmixing metadata 105 of the previous inactive frame is greater than a pre-determined distance threshold. If this is the case, a SID frame may be inserted for the current inactive frame, in order to signal the changed upmixing metadata 105 to the decoding unit 150. If, on the other hand, the value of the distance measure is smaller than the distance threshold, the current inactive frame may be treated as an ND frame. [0152] The method 600 may further comprise encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, it is determined that the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed with regards to the subsequence of one or more previous inactive frames. Hence, the current frame may be encoded as a SID frame if, in particular, only if, the spatial and/or spectral characteristic of background noise comprised within the current frame and/or the signal-to-noise ratio of the current frame has changed. [0153] Alternatively, or in addition, the method 600 may comprise determining that the current frame is an inactive frame following one or more previous inactive frames. 
In addition, the method 600 may comprise determining a value of a distance measure (e.g., a mean square error) between the covariance and/or the upmixing metadata 105 for the current frame and a previous covariance and/or previous upmixing metadata 105 for the one or more previous inactive frames. In other words, it may be determined by how much the covariance for the current frame deviates from the corresponding previous covariance for the one or more previous inactive frames, and/o by how much the upmixing metadata 105 for the current frame deviates from the previous upmixing metadata 105 for the one or more previous inactive frames. The previous upmixing metadata 105 may be the upmixing metadata that has been sent in the last SID frame. The previous covariance may be the covariance that has been used for generating the previous upmixing metadata 105. [0154] The method 600 may further comprise determining whether the value of the distance measure is greater than a pre-determined distance threshold. Encoding 604 the upmixing metadata 105 for the current frame into the bitstream may be performed, if, in particular only if, the value of the distance measure is greater than the pre-determined distance threshold. Alternatively, it may be refrained from encoding 604 the upmixing metadata 105 for the current frame into the bitstream, if, in particular only if, the value of the distance measure is smaller than the pre-determined distance threshold.”)
wherein the apparatus caused to determine an error of fit measure between a plurality of spatial direction values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction values (see ¶ [0147 and 0152-0154] citations as in limitation above and further ¶ [0173-0174]: “[0173] Hence, a method 600 of using spatial parameters 105 and same or different downmixes 103 used for active frames to model spatial characteristics of noise are described, thereby allowing comfort noise generation at the decoder 150 that is spatially consistent between active and non-active frames. The method 600 may comprise determining whether a voice signal is present in one or more frames of an audio input 101. In response to determining that no voice signal is present, a covariance may be estimated using frame to frame averaging. Furthermore, spatial noise parameters 105 may be calculated and entropy coding of the spatial noise parameters 105 may be performed. The entropy coded spatial noise parameters 107 may be packed into the bitstream for the one or more frames. [0174] The method 600 may comprise, in response to detecting transients in a frame of the one or more frames, removing the frame from covariance averaging. Calculating the spatial noise parameters 105 may be performed with a smoothed covariance estimation that smoothens across multiple frames to avoid spatial variability in the noise. The method 600 may comprise smoothing covariance across transients and short talk bursts and removing these from the calculation. Alternatively, or in addition, the method 600 may comprise using a limited set of bands and/or limited set of parameters to reduce parameter bit rate for noise and switching back to a full set when audio is present. Alternatively, or in addition, the method 600 may comprise calculating spatial elements separately from spectral elements of the noise to allow re-use of existing comfort noise generators.”) 
Eckert et al. and Vasilache et al. and Fuchs et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Fuchs et al. of wherein the plurality of spatial direction component values are original spatial direction components values for the previous interval of audio frames, wherein the apparatus caused to determine an error of fit measure between a plurality of spatial direction values from a plurality of audio frames and a curve fitted to a data set comprising the plurality of spatial direction values which provides the benefit of providing an improved processing concept for the spatial audio coding parameters (¶ [0030] of Fuchs et al.).
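The SID-insertion test in the passage quoted above (¶ [0147] and ¶ [0152]-[0154]) compares a distance measure, e.g. a mean square error, between the current parameters and the previously sent parameters against a pre-determined threshold. A minimal sketch, assuming the parameters are equal-length lists of numbers; the function names and threshold values are hypothetical.

```python
def mean_square_error(current, last_sent):
    """Mean square difference between two equal-length parameter vectors."""
    if len(current) != len(last_sent) or not current:
        raise ValueError("parameter vectors must be non-empty and of equal length")
    return sum((c - p) ** 2 for c, p in zip(current, last_sent)) / len(current)


def should_send_sid(current, last_sent, threshold):
    """True if the change since the last SID frame exceeds the threshold."""
    return mean_square_error(current, last_sent) > threshold


print(should_send_sid([1.0, 2.0], [3.0, 2.0], threshold=1.0))  # True  (MSE = 2.0)
print(should_send_sid([1.0, 2.0], [1.5, 2.0], threshold=1.0))  # False (MSE = 0.125)
```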

However, Eckert et al. and Vasilache et al. do not explicitly teach, but Fuchs et al. does teach:
wherein the curve fitted to the data set comprising the plurality of spatial direction component values is the linear interpolation between the quantised average spatial direction component value for the first audio frame of the further interval of audio frames of the silence region and a previous quantised average spatial direction component value for the first frame from the previous interval of audio frames of the silence region (see ¶ [0143-0144]: “[0143] Advantageously, only the non-negative values from the ICC quantization table are used, as icc=[1.0, 0.937, 0.84118, 0.60092, 0.36764, 0.0], containing only 6 levels of the original 8. Because an ICC of 0.0 corresponds to a diffuseness of 1.0, and an ICC of 1.0 corresponds to a diffuseness of 0.0, a set of y coordinates are created as y=1.0−icc, with a corresponding set of x coordinates as x=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. A shape-preserving piecewise cubic interpolation method, known as Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), is used to derive a curve passing through the set of points defined by x and y. The number of steps of the diffuseness quantizer is diff_alph, which in the proposed implementation is 8, but it has no relation to the total number of levels of the ICC quantization table, which is also 8. [0144] A new set of diff_alph equally spaced coordinates x_interpolated from 0.0 to 1.0 (or close to, but smaller than 1.0, when the case of pure diffuseness of 1.0 is avoided because of sound rendering considerations) are generated, and the corresponding y values on the curve are used as the reconstruction values, those reconstruction values being non-linearly spaced. Points half-way between consecutive x_interpolated values are also generated, and the corresponding y values of the curve are used as threshold values to decide which values map to a particular diffuseness index and therefore reconstruction value. 
For the proposed implementation, the generated reconstruction and threshold values (rounded to 5 digits), computed by the generate_diffuseness_quantizer function are:…”)
determine for each spatial direction value of the plurality of spatial direction component values an error value between the each spatial direction component value and a point along the linear interpolation (see ¶ [0143-0144] citations as in limitation above.); and
While Eckert et al. and Vasilache et al. and Fuchs et al. teach limitations mentioned above associated with the plurality of spatial direction component values from the plurality of audio frames, Eckert et al. and Vasilache et al. and Fuchs et al. do not explicitly teach, but Asbeck et al. does teach:
determine the error of fit measure as the root mean square of the error values (see ¶ [0039]: “…The LUT updating was iterated 14 times in order to provide sufficient LUT coverage to achieve a normalized RMS error of 3.98%.”).
Eckert et al. and Vasilache et al. and Fuchs et al. and Asbeck et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Asbeck et al. of determine the error of fit measure as the root mean square of the error values which provides the benefit of providing significant improvements in overall system efficiency (¶ [0041] of Asbeck et al.).
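Under one reading of claim 57, the error of fit compares each original value of the previous interval with the point, at the same frame, on the line joining the previous quantised average (first frame of the previous interval) and the current quantised average (first frame of the further interval), and reports the root mean square of those errors. The sketch assumes one scalar value per frame and a line anchored at frames 0 and N, where N is the interval length; the function name is hypothetical.

```python
import math


def rms_vs_interpolation(originals, q_prev, q_curr):
    """RMS error between original per-frame values and the interpolating line.

    The line runs from q_prev at frame 0 (first frame of the previous
    interval) to q_curr at frame N (first frame of the further interval).
    """
    n_frames = len(originals)
    if n_frames == 0:
        raise ValueError("need at least one original value")
    line = [q_prev + (q_curr - q_prev) * n / n_frames for n in range(n_frames)]
    errors = [o - l for o, l in zip(originals, line)]
    return math.sqrt(sum(e * e for e in errors) / n_frames)


print(rms_vs_interpolation([10.0, 12.0, 12.0, 13.0], q_prev=10.0, q_curr=14.0))  # 0.5
```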
Claim 62 is rejected under 35 U.S.C. 103 as being unpatentable over Eckert et al. (US 20230215445 A1) and further in view of Vasilache et al. (US 20150287418 A1) as applied to claim 49 above and further in view of Purnagen et al. (US 20130030819 A1). 

Regarding claim 62, Eckert et al. and Vasilache et al. teach the limitations as in claim 49, above.
However, Eckert et al. and Vasilache et al. do not explicitly teach, but Purnagen et al. does teach:
62. (New) The apparatus as claimed in Claim 49, wherein the determination of use of prediction or non-prediction is signalled as a 1-bit flag (see ¶ [0229]: “[0144] The following data elements are used for this tool: [0145] cplx_pred_all 0: Some bands use L/R coding, as signaled by cplx_pred_used[] [0146] 1: All bands use complex stereo prediction [0147] cplx_pred_used[g][sfb] One-bit flag per window group g and scalefactor band sfb (after mapping from prediction bands) indicating that [0148] 0: complex prediction is not being used, L/R coding is used [0149] 1: complex prediction is being used…”).
Eckert et al. and Vasilache et al. and Purnagen et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in encoding/decoding audio signals. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Eckert et al. and Vasilache et al. to incorporate the teachings of Purnagen et al. of wherein the determination of use of prediction or non-prediction is signalled as a 1-bit flag which provides the benefit of improved coding efficiency (¶ [0085] of Purnagen et al.).
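Claim 62's 1-bit prediction/non-prediction flag can be illustrated by packing it ahead of a quantiser index in a single word. The field layout (flag in the most significant position, followed by a fixed-width index) is a hypothetical choice, not taken from Purnagen et al. or Vasilache et al.

```python
def pack(use_prediction: bool, index: int, index_bits: int) -> int:
    """Place the 1-bit prediction flag above an index_bits-wide index (hypothetical layout)."""
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in index_bits")
    return (int(use_prediction) << index_bits) | index


def unpack(word: int, index_bits: int):
    """Recover (use_prediction, index) from a packed word."""
    use_prediction = bool((word >> index_bits) & 1)
    index = word & ((1 << index_bits) - 1)
    return use_prediction, index


print(pack(True, 5, 4))   # 21 (binary 1 0101)
print(unpack(21, 4))      # (True, 5)
```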

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659

/Keisha Y. Castillo-Torres/ Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659
Prosecution Timeline

Feb 29, 2024
Application Filed
Nov 18, 2025
Non-Final Rejection mailed — §101, §103
Mar 16, 2026
Response Filed
May 06, 2026
Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/467,236
Patent 12627724
SYSTEMS AND METHODS FOR ARTIFICIAL DUBBING
4y 8m to grant Granted May 12, 2026
17/865,788
Patent 12620410
ALIGNING PARAMETER DATA WITH AUDIO RECORDINGS
3y 9m to grant Granted May 05, 2026
18/441,704
Patent 12608546
PROCESSING EVENT DATA AND/OR TABULAR DATA FOR INPUT TO ONE OR MORE MACHINE LEARNING MODELS
2y 2m to grant Granted Apr 21, 2026
17/710,137
Patent 12573402
GENERATING AND/OR UTILIZING UNINTENTIONAL MEMORIZATION MEASURE(S) FOR AUTOMATIC SPEECH RECOGNITION MODEL(S)
3y 11m to grant Granted Mar 10, 2026
18/187,330
Patent 12536989
Language-agnostic Multilingual Modeling Using Effective Script Normalization
2y 10m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

3-4
Expected OA Rounds
74%
Grant Probability
99%
With Interview (+29.5%)
2y 10m (~7m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 110 resolved cases by this examiner. Grant probability derived from career allowance rate.