Last updated: May 29, 2026
Application No. 18/541,132
AUDIO RENDERING SYSTEM AND METHOD, AND ELECTRONIC DEVICE

Non-Final OA §103
Filed: Dec 15, 2023
Priority: Jun 15, 2021, CN PCT/CN2021/100062 +1 more
Examiner: ADESANYA, OLUJIMI A
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
OA Round: 3 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (66%) and a +25.6% interview lift. A written response may suffice.
Based on 660 resolved cases, 2023–2026
Examiner Intelligence

ADESANYA, OLUJIMI A
Grants 66% — above average
Career Allowance Rate
435 granted / 660 resolved
+3.9% vs TC avg
Strong +26% interview lift
+25.6%
Interview Lift
resolved cases with interview
Typical timeline
3y 6m
Avg Prosecution
22 currently pending
Career history
693
Total Applications
across all art units
Statute-Specific Performance

§101: 5.1% (-34.9% vs TC avg)
§103: 87.5% (+47.5% vs TC avg)
§102: 4.5% (-35.5% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
Based on career data from 660 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

                                 Continued Examination under 37 CFR 1.114
             A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 4/16/26 has been entered.

Response to Amendment
The prior 35 U.S.C. 112 rejection of claims (1/16/26) is hereby withdrawn in light of amendments to the claims.

Response to Arguments
Applicant's arguments filed 3/16/26 have been fully considered but they are not persuasive. 
  Regarding amended claim 1 and similar independent claims 8 and 15, previously rejected with reference to Sen as well as additional reference Stein, Applicant argues that the primary concept of Sen is separate encoding of different types of audio signals, but that Sen does not disclose encoding various types of input signals into spatially encoded signals in a single and common format, and that the acquisition and encoding in Sen is obviously different from the claimed language. On that basis, Applicant argues that Sen and Stein fail to disclose all the limitations of the claims (Arguments, pg. 13 – pg. 18, ln. 3).
   Examiner respectfully disagrees as Sen discloses acquiring an immersive audio content 111 that includes different audio scenes (i.e., specific audio formats) including multi-channel audio, audio objects, Higher order ambisonics (HOA) and dialogue (fig. 1; para. [0024]), where the Audio scenes of the immersive audio content 111 are represented by a number of channels/objects 150, HOA 154, and dialogue 158, and where each scene is accompanied by its metadata including channel/object metadata 151, HOA metadata 155, and dialogue metadata 159 (fig. 1; para. [0026]), and where each metadata includes properties of the corresponding sound/audio (para. [0026]), corresponding to limitation “an acquisition step of acquiring an audio signal in a specific audio content format and relevant parameters of the audio signal in the specific audio content format, wherein the relevant parameters are acquired based on metadata associated with the audio signal in the specific audio content format”. Sen also discloses the use of a hierarchical HOA spatial encoder 135 to spatially encode the HOA along with its metadata into a HOA audio stream 184 that is used by a HOA baseline encoder 141 to encode the different audio scenes (i.e., specific audio formats) of channel/object audio stream 180, HOA audio stream 184, and stereo audio stream 186 into a common audio stream 191 for subsequent reconstruction/playback (fig. 1; para. [0035]), corresponding to limitation “a spatial encoding step of spatially encoding the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal”. 
Sen further discloses the use of a dialogue spatial encoder 139 and an audio object spatial encoder 131 to respectively spatially encode a dialogue-based audio format along with its metadata (i.e., spatial attribute information) 159 (fig. 1; para. [0025]; para. [0034]) and spatially encode audio objects 160 along with metadata 161 (fig. 1; para. [0027]), corresponding to limitation “wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, spatially encoding the object-based audio representation signal based on spatial attribute information in the relevant parameters of the object-based audio representation signal”.
Also, as provided in the previous rejection of claim 4 and as identified by Applicant, Stein discloses encoding spatial audio objects using metadata directional information/depth information involving azimuth information relative to a listener (para. [0006]; para. [0026]; para. [0068]; para. [0081]-[0082]), corresponding to limitation “wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio representation signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity or nodes along the way of the spatial propagation path of the sound object to the listener”.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Copending Application (18/541,665)

1. An audio rendering system, comprising:
            an audio information processing module configured to acquire relevant parameters of an audio signal in a specific audio content format based on metadata associated with the audio signal in the specific audio content format, wherein the audio signal in the specific audio content format comprises at least one of an object-based audio representation signal, a scene-based audio representation signal, or a channel-based audio representation signal;
            an audio signal spatial encoding module configured to spatially encode the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal,
            wherein the audio signal spatial encoding module is further configured to in response to the audio signal in the specific audio content format comprising the object-based audio representation signal, spatially encode the object-based audio representation signal based on spatial attribute information in relevant parameters of the object-based audio representation signal, wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity and nodes along the way of the spatial propagation path of the sound object to the listener; and
            an audio signal decoding module configured to spatially decode the encoded audio signal to obtain a decoded audio signal for audio rendering.

Instant Application (18/541,132)

1. An audio processing method for audio rendering, comprising:
            an acquisition step of acquiring an audio signal in a specific audio content format and relevant parameters of the audio signal in the specific audio content format, wherein the relevant parameters are acquired based on metadata associated with the audio signal in the specific audio content format; and
            a spatial encoding step of spatially encoding the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal,
            wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, spatially encoding the object-based audio representation signal based on spatial attribute information in the relevant parameters of the object-based audio representation signal, and wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio representation signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity or nodes along the way of the spatial propagation path of the sound object to the listener.



Claims 1, 2, 4-9, 11-16 and 18-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 4, 5, 10, 11, 13, 14, 19 and 20 of copending Application 18/541,665. Although the claims at issue are not identical, as provided in the comparison above, they are not patentably distinct from each other because Claim 1 (as well as the other independent claims) of the instant application is anticipated by Claim 1 (as well as the other independent claims) of the copending application, in that Claim 1 of the copending application contains all the limitations of Claim 1 of the instant application. Claim 1 of the instant application therefore is not patentably distinct from the copending application claims and as such is unpatentable for provisional obviousness-type double patenting. Claim 1 of the instant application is broader in every aspect than Claim 1 of the copending application and is therefore an obvious variant thereof.
Although the conflicting claims are not identical, they are not patentably distinct from each other because removing inherent and/or unnecessary limitations/steps and rearranging the claims would be within the level of one of ordinary skill in the art. It is well settled that the omission of an element, e.g. “interference affected channel”, and its function is an obvious expedient if the remaining elements perform the same function as before. In re Karlson, 136 USPQ 184 (CCPA 1963). Also note Ex parte Rainu, 168 USPQ 375 (Bd. App. 1969). Omission of a reference element or step whose function is not needed would be obvious to one of ordinary skill in the art.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.         Claims 1, 2, 4-9, 11-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sen et al., US 2023/0360661 A1 (“Sen”) in view of Stein, US 2022/0345813 A1 (“Stein”).
          Per claim 1, Sen discloses an audio encoding method for audio rendering, comprising:
               an acquisition step of acquiring an audio signal in a specific audio content format and relevant parameters of the audio signal in the specific audio content format, wherein the relevant parameters are acquired based on metadata associated with the audio signal in the specific audio content format (The immersive audio content 111 may include various immersive audio input formats, also referred to as sound field representations, such as multi-channel audio, audio objects, HOA, dialogue, and the like…., para. [0024]; Audio scenes of the immersive audio content 111 may be represented by a number of channels/objects 150, HOA 154, and dialogue 158, accompanied by channel/object metadata 151, HOA metadata 155, and dialogue metadata 159, respectively. Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA to aid a renderer …, para. [0026], multi-channel audio, audio objects, HOA, dialogue as different specific audio content formats, metadata as including relevant parameters); and
               a spatial encoding step of spatially encoding the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal (fig. 1, elements 135, 184, 185, 141, 191; It is understood that the HOA may also include first-order ambisonics (FOA)…., para. [0024]; Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA …, para. [0025]; In one aspect, the HOA metadata 155 may provide information to guide the HOA priority decision module 125 in determining the HOA priority ranking 166…., para. [0031]; A hierarchical HOA spatial encoder 135 may spatially encode the HOA …, para. [0032]; A baseline encoder 141 may encode the channel/object audio stream 180, HOA audio stream 184, and stereo audio stream 186 into an audio stream 191 based on the target bitrate 190…., para. [0035]; para. [0045], encoding audio of different formats 111 including HOA format into audio stream 191 for subsequent reconstruction/playback as implying obtaining a spatially encoded audio signal in a common spatial format),
                wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, spatially encoding the object-based audio representation signal based on spatial attribute information in the relevant parameters of the object-based audio representation signal (fig. 1, elements 139, 158, 159; Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels …, para. [0025]-[0027]; audio scenes of the immersive audio content 111 may contain dialogue 158 and associated metadata 159. A dialogue spatial encoder 139 may encode the dialogue 158 and the associated metadata 159 based on the target bitrate 190 to generate a stream of speech 188 and speech metadata 189. In one aspect, the dialogue spatial encoder 139 may encode the dialogue 158 into a speech stream 188 of two channel …, para. [0034]), and
                 Sen does not explicitly disclose wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio representation signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity or nodes along the way of the spatial propagation path of the sound object to the listener.
               However, this feature is taught by Stein (para. [0006]; para. [0026]; The spatial analysis module 406 can be configured to use a frequency domain signal to determine a relative location of one or more signals or signal components thereof. For example, the spatial analysis module 406 can be configured to determine that a first sound source is or should be positioned in front (e.g., 0° azimuth) of a listener or a reference video location and a second sound source is or should be positioned to the right (e.g., 90° azimuth) of the listener or reference video location.…, para. [0068]; para. [0081]-[0082]).
               It would have been obvious to one of ordinary skill in the art at the time of the effective filing of the invention to combine the teachings of Stein with the method of Sen in arriving at the missing features of Sen, because such combination would have resulted in differentiating between nearfield and far field sources while rendering spatial audio signals that include depth characteristics (Stein, para. [0008]-[0009]).
            Per claim 2, Sen in view of Stein discloses the method of claim 1, 
                   Sen discloses wherein the audio signal in the specific audio content format comprises at least one of a scene-based audio representation signal, or a channel-based audio representation signal (The immersive audio content 111 may include various immersive audio input formats, also referred to as sound field representations, such as multi-channel audio, audio objects, HOA, dialogue …, para. [0024]), or
              wherein the Ambisonics type of audio signal comprises at least one of First Order Ambisonics (FOA), Higher Order Ambisonics (HOA) or Mixed-Order Ambisonics (MOA).
            Per claim 4, Sen in view of Stein discloses the method of claim 3, 
                 Stein discloses wherein, the spatial encoding step further comprises performing spatial encoding of the audio signal according to at least one of a filtering function that filters the audio signal based on the path energy intensity of the spatial propagation path of a sound object in the audio signal to a listener or a spherical harmonic function based on the azimuth information of the spatial propagation path (fig. 5; Abstract; para. [0006]; para. [0026]; para. [0063]; step 508 can include encoding the spatial audio signal based on direction information identified in step 504 or received with the depth characteristic at step 506 …, para. [0083]) or,
                 wherein the spatial encoding step further comprises encoding the audio signal by adopting at least one of a near-field compensation function or a diffusion function based on the length of a spatial propagation path of a sound object in the audio signal to a listener, or,
                wherein the spatial encoding step further comprises, in response to the audio signal containing a plurality of sound objects, for each sound object in the audio signal, spatially encoding the audio signal based on information related to the spatial propagation path of the sound object of the audio signal to the listener, and based on weights of sound objects defined in metadata, weightedly superposing the encoded signals of audio representation signals of respective sound objects.
            Per claim 5, Sen in view of Stein discloses the method of claim 1, 
                  Sen discloses: wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, acquiring a reverberation relevant signal of the object-based audio signal based on reverberation parameters in the relevant parameters of the object-based audio representation signal, or, 
                 wherein the spatial encoding step further comprises, in response to the audio signal in the specific audio content format comprising a scene-based audio representation signal, weighting the scene-based audio representation signal based on weight information in the relevant parameters of the scene-based audio representation signal, or, 
                  wherein the spatial encoding step further comprises, in response to the audio signal in the specific audio content format comprising a scene-based audio representation signal, performing a sound field rotation operation on the scene-based audio representation signal based on the rotation information indicated in the relevant parameters of the scene-based audio representation signal, or, 
                 wherein the spatial encoding step further comprises, in response to the audio signal in the specific audio content format comprising a specific type of channel signal in the channel-based audio representation signal, converting the specific type of channel signal into an object-based audio representation signal and then encoding it, or, 
                wherein the spatial encoding step further comprises, in response to the audio signal in the specific audio content format comprising a specific type of channel signal in the channel-based audio representation signal, splitting the specific type of channel signal into audio elements by channel and converting them into metadata for encoding (fig. 1; In one aspect, the priority ranking 162 may be determined based on the spatial saliency of the channels and objects, such as the position, direction, movement, density, etc., of the channels/objects 150. For example, channels/objects with greater movement near the perceived position of the dominant sound may be more spatially salient and thus may be ranked higher than channels/objects with less movement away from the perceived position of the dominant sound…., para. [0026], channel prioritization as requiring splitting/separation).
              Per claim 6, Sen in view of Stein discloses the method of claim 1,
                  Sen discloses: wherein the spatial attribute information of the object-based audio representation signal further comprises at least one of azimuth information of each audio element in the audio representation signal in the coordinate system, distance information of each audio element, or relative azimuth information of a sound source related to the audio signal relative to a listener, or,
                  in response to the audio signal in the specific audio content format being a scene-based audio representation signal, the relevant parameters comprise rotation information related to the audio signal, wherein the rotation information related to the audio signal comprises at least one of rotation information of the audio signal or rotation information of a listener of the audio signal, or,
                  in response to the audio signal in the specific audio content format being a specific type of channel signal in a channel-based audio signal, the relevant parameters comprise metadata that is obtained by splitting an audio representation of the specific type of channel signal into audio elements by channel and then performing conversion (fig. 1; In one aspect, the priority ranking 162 may be determined based on the spatial saliency of the channels and objects, such as the position, direction, movement, density, etc., of the channels/objects 150. For example, channels/objects with greater movement near the perceived position of the dominant sound may be more spatially salient and thus may be ranked higher than channels/objects with less movement away from the perceived position of the dominant sound…., para. [0026], channel prioritization as requiring splitting/separation).
                 Stein discloses: wherein the spatial attribute information of the object-based audio representation signal further comprises at least one of azimuth information of each audio element in the audio representation signal in the coordinate system, distance information of each audio element, or relative azimuth information of a sound source related to the audio signal relative to a listener (para. [0006]; para. [0068]; para. [0081]-[0082]).
              Per claim 7, Sen in view of Stein discloses the method of claim 1, 
                   Sen discloses wherein the audio signal in the specific audio content format is parsed from an input audio signal in a spatial audio exchange format (fig. 1; The immersive audio content 111 may include various immersive audio input formats, also referred to as sound field representations, such as multi-channel audio, audio objects, HOA, dialogue, and the like.…, para. [0024]).
             Per claim 8, Sen discloses an audio processing device for audio rendering, comprising:
                  an acquisition unit configured to acquire an audio signal in a specific audio content format and relevant parameters of the audio signal in the specific audio content format, wherein the relevant parameters are acquired based on metadata associated with the audio signal in the specific audio content format (The immersive audio content 111 may include various immersive audio input formats, also referred to as sound field representations, such as multi-channel audio, audio objects, HOA, dialogue, and the like…., para. [0024]; Audio scenes of the immersive audio content 111 may be represented by a number of channels/objects 150, HOA 154, and dialogue 158, accompanied by channel/object metadata 151, HOA metadata 155, and dialogue metadata 159, respectively. Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA to aid a renderer …, para. [0026], multi-channel audio, audio objects, HOA, dialogue as different specific audio content formats, metadata as including relevant parameters); and
               a spatial encoding unit configured to spatially encode the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal (fig. 1, elements 135, 184, 185, 141, 191; It is understood that the HOA may also include first-order ambisonics (FOA)…., para. [0024]; Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA …, para. [0025]; In one aspect, the HOA metadata 155 may provide information to guide the HOA priority decision module 125 in determining the HOA priority ranking 166…., para. [0031]; A hierarchical HOA spatial encoder 135 may spatially encode the HOA …, para. [0032]; A baseline encoder 141 may encode the channel/object audio stream 180, HOA audio stream 184, and stereo audio stream 186 into an audio stream 191 based on the target bitrate 190…., para. [0035]; para. [0045], encoding audio of different formats 111 including HOA format into audio stream 191 for subsequent reconstruction/playback as implying obtaining a spatially encoded audio signal in a common spatial format),
                wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, spatially encoding the object-based audio representation signal based on spatial attribute information in the relevant parameters of the object-based audio representation signal (fig. 1, elements 139, 158, 159; para. [0025]-[0027]; audio scenes of the immersive audio content 111 may contain dialogue 158 and associated metadata 159. A dialogue spatial encoder 139 may encode the dialogue 158 and the associated metadata 159 based on the target bitrate 190 to generate a stream of speech 188 and speech metadata 189. In one aspect, the dialogue spatial encoder 139 may encode the dialogue 158 into a speech stream 188 of two channel …, para. [0034]), and
                 Sen does not explicitly disclose wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio representation signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity or nodes along the way of the spatial propagation path of the sound object to the listener.
               However, this feature is taught by Stein (para. [0006]; para. [0026]; The spatial analysis module 406 can be configured to use a frequency domain signal to determine a relative location of one or more signals or signal components thereof. For example, the spatial analysis module 406 can be configured to determine that a first sound source is or should be positioned in front (e.g., 0° azimuth) of a listener or a reference video location and a second sound source is or should be positioned to the right (e.g., 90° azimuth) of the listener or reference video location. …, para. [0068]; para. [0082]).
               It would have been obvious to one of ordinary skill in the art at the time of the effective filing of the invention to combine the teachings of Stein with the device of Sen in arriving at the missing features of Sen, because such combination would have resulted in differentiating between nearfield and far field sources while rendering spatial audio signals that include depth characteristics (Stein, para. [0008]-[0009]).
                Per claim 9, Sen in view of Stein discloses the audio processing device of claim 8.
                    Encoder claim 8 and method claim 2 are related as Encoder and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 9 is similarly rejected under the same rationale as applied above with respect to claim 2.
                Per claim 11, Sen in view of Stein discloses the audio processing device of claim 10.
                    Encoder claim 11 and method claim 4 are related as Encoder and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 11 is similarly rejected under the same rationale as applied above with respect to claim 4.
                Per claim 12, Sen in view of Stein discloses the audio processing device of claim 8.
                    Encoder claim 12 and method claim 5 are related as Encoder and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 12 is similarly rejected under the same rationale as applied above with respect to claim 5.
                Per claim 13, Sen in view of Stein discloses the audio processing device of claim 8.
                    Encoder claim 13 and method claim 6 are related as Encoder and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 13 is similarly rejected under the same rationale as applied above with respect to claim 6.
                Per claim 14, Sen in view of Stein discloses the audio processing device of claim 8.
                    Encoder claim 14 and method claim 7 are related as Encoder and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 14 is similarly rejected under the same rationale as applied above with respect to claim 7.
           Per claim 15, Sen discloses an electronic apparatus, comprising:
                a memory (para. [0053]-[0054]), and
                a processor coupled to the memory (para. [0053]-[0054]); 
                the processor is configured to execute the following steps: an acquisition step of acquiring an audio signal in a specific audio content format and relevant parameters of the audio signal in the specific audio content format, wherein the relevant parameters are acquired based on metadata associated with the audio signal in the specific audio content format (The immersive audio content 111 may include various immersive audio input formats, also referred to as sound field representations, such as multi-channel audio, audio objects, HOA, dialogue, and the like…., para. [0024]; Audio scenes of the immersive audio content 111 may be represented by a number of channels/objects 150, HOA 154, and dialogue 158, accompanied by channel/object metadata 151, HOA metadata 155, and dialogue metadata 159, respectively. Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA to aid a renderer …, para. [0026], multi-channel audio, audio objects, HOA, dialogue as different specific audio content formats, metadata as including relevant parameters); and
               a spatial encoding step of spatially encoding the audio signal in the specific audio content format based on the relevant parameters of the audio signal in the specific audio content format to obtain a spatially encoded audio signal in a common spatial format, wherein the spatially encoded audio signal is an Ambisonics type of audio signal (fig. 1, elements 135, 184, 185, 1941, 191; It is understood that the HOA may also include first-order ambisonics (FOA)…., para. [0024]; Metadata may be used to describe properties of the associated sound field such as the layout configuration or directional parameters of the associated channels, or locations, sizes, direction, or spatial image parameters of the associated objects or HOA …, para. [0025]; In one aspect, the HOA metadata 155 may provide information to guide the HOA priority decision module 125 in determining the HOA priority ranking 166…., para. [0031]; A hierarchical HOA spatial encoder 135 may spatially encode the HOA …, para. [0032]; A baseline encoder 141 may encode the channel/object audio stream 180, HOA audio stream 184, and stereo audio stream 186 into an audio stream 191 based on the target bitrate 190…., para. [0035]; para. [0045], encoding audio of different formats 111 including HOA format into audio stream 191 for subsequent reconstruction/playback as implying obtaining a spatially encoded audio signal in a common spatial format),
                wherein the spatial encoding step further comprises in response to the audio signal in the specific audio content format comprising an object-based audio representation signal, spatially encoding the object-based audio representation signal based on spatial attribute information in the relevant parameters of the object-based audio representation signal (fig. 1, elements 139, 158, 159; para. [0025]-[0027]; audio scenes of the immersive audio content 111 may contain dialogue 158 and associated metadata 159. A dialogue spatial encoder 139 may encode the dialogue 158 and the associated metadata 159 based on the target bitrate 190 to generate a stream of speech 188 and speech metadata 189. In one aspect, the dialogue spatial encoder 139 may encode the dialogue 158 into a speech stream 188 of two channel …, para. [0034]), and
                 Sen does not explicitly disclose wherein the spatial attribute information of the object-based audio representation signal comprises information related to a spatial propagation path of a sound object of the audio representation signal to a listener, which comprises at least one of propagation duration, propagation distance, azimuth information, path energy intensity or nodes along the way of the spatial propagation path of the sound object to the listener.
               However, this feature is taught by Stein (para. [0006]; para. [0026]; The spatial analysis module 406 can be configured to use a frequency domain signal to determine a relative location of one or more signals or signal components thereof. For example, the spatial analysis module 406 can be configured to determine that a first sound source is or should be positioned in front (e.g., 0° azimuth) of a listener or a reference video location and a second sound source is or should be positioned to the right (e.g., 90° azimuth) of the listener or reference video location. …, para. [0068]; para. [0082]).
               It would have been obvious to one of ordinary skill in the art at the time of the effective filing of the invention to combine the teachings of Stein with the apparatus of Sen in arriving at the missing features of Sen, because such combination would have resulted in differentiating between nearfield and far field sources while rendering spatial audio signals that include depth characteristics (Stein, para. [0008]-[0009]).
                Per claim 16, Sen in view of Stein discloses the electronic apparatus of claim 15.
                    Apparatus claim 16 and method claim 2 are related as Apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 16 is similarly rejected under the same rationale as applied above with respect to claim 2.
                Per claim 18, Sen in view of Stein discloses the electronic apparatus of claim 17.
                    Apparatus claim 18 and method claim 4 are related as Apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 18 is similarly rejected under the same rationale as applied above with respect to claim 4.
                Per claim 19, Sen in view of Stein discloses the electronic apparatus of claim 15.
                    Apparatus claim 19 and method claim 5 are related as Apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to claim 5.
                Per claim 20, Sen in view of Stein discloses the electronic apparatus of claim 15.
                    Apparatus claim 20 and method claim 6 are related as Apparatus and the method of using same, with each claimed element's function corresponding to the claimed method step. Accordingly, claim 20 is similarly rejected under the same rationale as applied above with respect to claim 6.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO 892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUJIMI A ADESANYA whose telephone number is (571)270-3307. The examiner can normally be reached Monday-Friday 8:30-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUJIMI A ADESANYA/Primary Examiner, Art Unit 2658
Prosecution Timeline

Dec 15, 2023: Application Filed
Jul 29, 2025: Non-Final Rejection mailed (§103)
Oct 29, 2025: Response Filed
Jan 16, 2026: Final Rejection mailed (§103)
Mar 16, 2026: Response after Non-Final Action
Apr 16, 2026: Request for Continued Examination
Apr 19, 2026: Response after Non-Final Action
May 06, 2026: Non-Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/334,684 (Patent 12639519): TRAINING MASKED LANGUAGE MODELS BASED ON PARTIAL SEQUENCES OF TOKENS. 4y 12m to grant; granted May 26, 2026.
17/978,023 (Patent 12632668): PROHIBITING INCONSISTENT NAMED ENTITY RECOGNITION TAG SEQUENCES. 3y 6m to grant; granted May 19, 2026.
18/042,518 (Patent 12633294): MATRIX CODED STEREO SIGNAL WITH PERIPHONIC ELEMENTS. 3y 2m to grant; granted May 19, 2026.
18/148,045 (Patent 12632664): Description-driven Task-oriented Dialogue Modeling. 3y 4m to grant; granted May 19, 2026.
18/680,606 (Patent 12633298): APPARATUS FOR ENCODING A SPEECH SIGNAL EMPLOYING ACELP IN THE AUTOCORRELATION DOMAIN. 1y 11m to grant; granted May 19, 2026.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 66%
With Interview (+25.6%): 92%
Median Time to Grant: 3y 6m (~1y 0m remaining)
PTA Risk: High
Based on 660 resolved cases by this examiner. Grant probability derived from career allowance rate.