DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Marulescu (US 20220279303).
Regarding claim 1, Marulescu teaches a device comprising: a memory configured to store data associated with an immersive audio environment (memory, [0069]); and one or more processors (processor, [0069]) configured to: obtain contextual movement estimate data associated with a portion of the immersive audio environment (pose estimate, [0062]; pose metadata, [0069]); set a pose update parameter based on the contextual movement estimate data (update initial pose estimate, [0062]); obtain pose data based on the pose update parameter (the pose is obtained when it is updated, as is inherent in computer processing, [0062]); obtain rendered assets associated with the immersive audio environment based on the pose data (refined spatial audio data based on pose estimate, [0064]); and generate an output audio signal based on the rendered assets (output audio via speakers, [0067]).
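For illustration of the mapping above only, the following is a minimal Python sketch of a device loop containing the claim 1 elements as the examiner reads them onto Marulescu; it is not drawn from the reference, and every identifier (PoseSensor, estimate_contextual_movement, render_frame) is a hypothetical assumption.

    import numpy as np

    class PoseSensor:
        """Hypothetical pose sensor; its reporting rate is the pose update parameter."""
        def __init__(self, rate_hz=30.0):
            self.rate_hz = rate_hz

        def read(self):
            # Placeholder pose data: 3-D translation and 3-D rotation.
            return np.zeros(3), np.zeros(3)

    def estimate_contextual_movement(scene):
        # Contextual movement estimate for a portion of the environment,
        # e.g., how much listener motion the current scene suggests.
        return scene.get("expected_motion", "low")

    def render_frame(sensor, scene, assets):
        movement = estimate_contextual_movement(scene)
        # Set the pose update parameter based on the contextual estimate.
        sensor.rate_hz = 120.0 if movement == "high" else 30.0
        translation, rotation = sensor.read()  # obtain pose data
        # Obtain rendered assets based on the pose (a simple gain stands in
        # for a real spatial renderer) and generate the output audio signal.
        rendered = [gain * asset for gain, asset in zip([1.0, 0.5], assets)]
        return sum(rendered)

    output_signal = render_frame(PoseSensor(), {"expected_motion": "high"},
                                 [np.ones(480), np.ones(480)])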
Regarding claim 2, Marulescu teaches the device of claim 1, wherein the pose update parameter indicates a pose data update rate or an operational mode associated with a pose sensor (mode using pose update for rendering compared to pose estimation performed at companion device, [0066]).
Regarding claim 3, Marulescu teaches the device of claim 1, wherein, to set the pose update parameter, the one or more processors are configured to send the pose update parameter to a pose sensor to cause the pose sensor to provide the pose data at a rate associated with the pose update parameter (a sampling rate at which the pose metadata is provided is inherent in the “sampling” described, [0068]).
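As a sketch of the inherency position taken above, the following hypothetical Python fragment shows a sensor whose sampling period is set by a received pose update parameter, so that pose data is necessarily provided at some rate; the class and method names are illustrative assumptions, not Marulescu's.

    import time

    class RateConfigurableSensor:
        """Hypothetical pose sensor that reports at a commanded rate."""
        def __init__(self):
            self.period_s = 1.0 / 30.0  # default 30 Hz sampling period

        def set_update_rate(self, rate_hz):
            # Receiving the pose update parameter fixes the sampling rate.
            self.period_s = 1.0 / rate_hz

        def stream(self, n_samples):
            for _ in range(n_samples):
                time.sleep(self.period_s)
                yield (0.0, 0.0, 0.0)  # placeholder pose sample

    sensor = RateConfigurableSensor()
    sensor.set_update_rate(120.0)  # pose data now arrives at 120 Hz
    poses = list(sensor.stream(3))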
Regarding claim 4, Marulescu teaches the device of claim 1, wherein the one or more processors are configured to determine a listener pose associated with the immersive audio environment (processor 204 generates pose metadata based on the motion data or acceleration data generated by the motion sensor 206, [0069]), and wherein the contextual movement estimate data is based on the listener pose (pose estimate, [0066]).
Regarding claim 5, Marulescu teaches the device of claim 1, wherein the one or more processors are configured to obtain movement trace data associated with the immersive audio environment (processor 204 generates pose metadata based on the motion data or acceleration data generated by the motion sensor 206, [0069]; the camera 212 generates image data corresponding to the face of a wearer of the wearable device 104 or the environment in one or more directions around the wearer, [0069]), wherein the contextual movement estimate data is based on the movement trace data, and wherein the movement trace data is based on historical user interactions of one or more users associated with the immersive audio environment (Image data generated by the camera 212 can be included in the pose metadata or is used by the processor 204 to derive at least a portion of the pose metadata, [0069]).
Regarding claim 6, Marulescu teaches the device of claim 1, wherein the one or more processors are configured to obtain metadata associated with the immersive audio environment, and wherein the contextual movement estimate data is based on the metadata (pose metadata, [0069]).
Regarding claim 7, Marulescu teaches the device of claim 6, wherein the metadata indicates a genre associated with the immersive audio environment, and wherein the one or more processors are configured to determine the contextual movement estimate data based on the genre (spatial audio generated by processor according to an acoustic room model, [0071]).
Regarding claim 8, Marulescu teaches the device of claim 6, wherein the metadata includes one or more movement cues associated with the immersive audio environment, and wherein the one or more processors are configured to determine the contextual movement estimate data based on the one or more movement cues (processor 204 generates pose metadata based on the motion data or acceleration data generated by the motion sensor 206, [0069]).
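Purely as an illustration of claims 7 and 8, a contextual movement estimate could be derived from genre metadata or from explicit movement cues along the following lines; the mapping table and thresholds are hypothetical and do not appear in the reference.

    # Hypothetical mapping from content genre to a default movement estimate.
    GENRE_MOVEMENT = {"concert": "low", "game": "high", "meditation": "low"}

    def movement_from_metadata(metadata):
        # Explicit movement cues in the metadata override the genre default.
        cues = metadata.get("movement_cues")
        if cues:
            return "high" if max(cues) > 0.5 else "low"
        return GENRE_MOVEMENT.get(metadata.get("genre"), "low")

    print(movement_from_metadata({"genre": "game"}))              # "high"
    print(movement_from_metadata({"movement_cues": [0.1, 0.2]}))  # "low"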
Regarding claim 9, Marulescu teaches the device of claim 1, wherein the one or more processors are configured to, based on pose data associated with a first time: determine two or more predicted listener poses associated with a second time subsequent to the first time (second pose metadata, [0080]); obtain a first rendered asset associated with a first predicted listener pose; obtain a second rendered asset associated with a second predicted listener pose; and selectively generate the output audio signal based on either the first rendered asset or the second rendered asset (wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2. The wearable device 104 then refines the spatial audio data based on the updated pose estimate, [0080]).
Regarding claim 10, Marulescu teaches the device of claim 9, wherein, to selectively generate the output audio signal based on either the first rendered asset or the second rendered asset, the one or more processors are configured to: obtain a first target asset associated with the first predicted listener pose; render the first target asset to generate the first rendered asset (At time T1, the wearable device 104 transmits the first pose metadata 310 to the companion device 102, and the companion device 102 subsequently generates a pose estimate based on the first pose metadata 310, [0079]); obtain a second target asset associated with the second predicted listener pose; render the second target asset to generate the second rendered asset (wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2. The wearable device 104 then refines the spatial audio data based on the updated pose estimate, [0080]); obtain pose data associated with the second time; and select, based on the pose data associated with the second time, the first rendered asset or the second rendered asset for further processing (From time T1 to time T2 in this example, the pose of the wearable device 104 changes (e.g., due to the wearer turning or moving their head) by about 45 degrees. This change in pose is not indicated in the first pose metadata 310 or the corresponding pose estimate generated by the companion device 102. After receiving the pose estimate from the companion device 102 at time T2, the wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2, [0080]).
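The speculative scheme of claims 9 and 10 (render an asset for each of two predicted poses, then select the render matching the pose measured at the later time) can be sketched as follows; the prediction rule and the distance metric are assumptions for illustration and are not taken from Marulescu.

    import numpy as np

    def predict_poses(pose_t1):
        # Two hypothetical predicted listener poses for time T2: one assumes
        # the listener holds still, the other assumes a 45-degree head turn.
        return pose_t1, pose_t1 + np.array([0.0, 0.0, np.pi / 4])

    def render_for_pose(asset, pose):
        # Stand-in renderer: attenuate by how far the head is turned (yaw).
        return asset * np.cos(pose[2] / 2)

    def speculative_output(asset, pose_t1, measured_pose_t2):
        pose_a, pose_b = predict_poses(pose_t1)
        rendered_a = render_for_pose(asset, pose_a)  # first rendered asset
        rendered_b = render_for_pose(asset, pose_b)  # second rendered asset
        # Select whichever prediction is closer to the pose measured at T2.
        dist_a = np.linalg.norm(measured_pose_t2 - pose_a)
        dist_b = np.linalg.norm(measured_pose_t2 - pose_b)
        return rendered_a if dist_a <= dist_b else rendered_b

    out = speculative_output(np.ones(480), np.zeros(3),
                             np.array([0.0, 0.0, np.pi / 4]))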
Regarding claim 11, Marulescu teaches the device of claim 1, wherein, to obtain the rendered assets, the one or more processors are configured to: determine a target asset based on the pose data (companion device 102 may use a head-related transfer function (HRTF) model to render the spatial audio data for emulated reproduction at a particular spatial location based on the pose estimate, [0089]); and generate an asset retrieval request to retrieve the target asset from a storage location (companion device 102 can retrieve the audio data from a local memory of the companion device 102 or from a remote memory, [0089]).
Regarding claim 12, Marulescu teaches the device of claim 11, wherein the target asset is a pre-rendered asset, and wherein, to generate the output audio signal, the one or more processors are configured to apply head related transfer functions to the target asset to generate a binaural output signal (companion device 102 may use a head-related transfer function (HRTF) model to render the spatial audio data for emulated reproduction at a particular spatial location based on the pose estimate, [0089]).
Regarding claim 13, Marulescu teaches the device of claim 11, wherein, to obtain the rendered assets, the one or more processors are configured to render the target asset based on the pose data to generate a rendered asset (processor 204 generates pose metadata based on the motion data or acceleration data generated by the motion sensor 206, [0069]), and wherein, to generate the output audio signal, the one or more processors are configured to apply head related transfer functions to the rendered asset to generate a binaural output signal (companion device 102 may use a head-related transfer function (HRTF) model to render the spatial audio data for emulated reproduction at a particular spatial location based on the pose estimate, [0089]).
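For claims 12 and 13, applying head-related transfer functions to a rendered asset to produce a binaural output is conventionally a convolution with a left-ear and a right-ear impulse response; the toy impulse responses below are illustrative assumptions, since real HRTF rendering uses measured, pose-dependent filter sets.

    import numpy as np

    def apply_hrtf(mono_asset, hrir_left, hrir_right):
        # Convolve a rendered (mono) asset with left/right head-related
        # impulse responses to produce a two-channel binaural signal.
        left = np.convolve(mono_asset, hrir_left)
        right = np.convolve(mono_asset, hrir_right)
        return np.stack([left, right])

    # Toy impulse responses: the right ear hears a delayed, attenuated copy,
    # as it would for a source located to the listener's left.
    hrir_l = np.array([1.0, 0.0, 0.0])
    hrir_r = np.array([0.0, 0.0, 0.6])
    binaural = apply_hrtf(np.random.default_rng(0).standard_normal(480),
                          hrir_l, hrir_r)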
Regarding claim 14, Marulescu teaches the device of claim 1, wherein the pose data includes first data indicating a translational position of a listener in the immersive audio environment and second data indicating a rotational orientation of the listener in the immersive audio environment (real pose 416 is rotated with respect to the real pose 406, indicating that the user turned their head or body during the second time period, [0083]).
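A pose record of the kind recited in claim 14 (translational position plus rotational orientation) might be represented as below; the field layout is a hypothetical illustration, not a structure disclosed by Marulescu.

    from dataclasses import dataclass

    @dataclass
    class ListenerPose:
        # First data: translational position in the environment.
        x: float
        y: float
        z: float
        # Second data: rotational orientation of the listener.
        yaw: float
        pitch: float
        roll: float

    pose = ListenerPose(0.0, 0.0, 0.0, yaw=0.785, pitch=0.0, roll=0.0)  # ~45-degree turn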
Regarding claim 15, Marulescu teaches the device of claim 1, further comprising a pose sensor coupled to the one or more processors, wherein the pose sensor and the one or more processors are integrated within a head-mounted wearable device (wearable device 104, [0076], Fig. 1).
Regarding claim 16, Marulescu teaches the device of claim 1, further comprising a modem coupled to the one or more processors (the wearable device 104 may wirelessly transmit the first pose metadata to the companion device 102 using the transceiver 202, [0086]) and configured to send the pose update parameter to a device that includes a pose sensor (first pose metadata includes motion data or acceleration data generated by the motion sensor 206 of the wearable device 104, which may be an IMU, [0085]).
Claims 17 and 20 are each substantially similar to claim 1 and are rejected for the same reasons.
Regarding claim 18, Marulescu teaches the method of claim 17, further comprising, based on pose data associated with a first time: determining two or more predicted listener poses associated with a second time subsequent to the first time (second pose metadata, [0080]); obtaining a first rendered asset associated with a first predicted listener pose; obtaining a second rendered asset associated with a second predicted listener pose; and selectively generating the output audio signal based on either the first rendered asset or the second rendered asset (wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2. The wearable device 104 then refines the spatial audio data based on the updated pose estimate, [0080]), wherein selectively generating the output audio signal based on either the first rendered asset or the second rendered asset comprises: obtaining a first target asset associated with the first predicted listener pose; rendering the first target asset to generate the first rendered asset (At time T1, the wearable device 104 transmits the first pose metadata 310 to the companion device 102, and the companion device 102 subsequently generates a pose estimate based on the first pose metadata 310, [0079]); obtaining a second target asset associated with the second predicted listener pose; rendering the second target asset to generate the second rendered asset (wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2. The wearable device 104 then refines the spatial audio data based on the updated pose estimate, [0080]); obtaining pose data associated with the second time; and selecting, based on the pose data associated with the second time, the first rendered asset or the second rendered asset for further processing (From time T1 to time T2 in this example, the pose of the wearable device 104 changes (e.g., due to the wearer turning or moving their head) by about 45 degrees. This change in pose is not indicated in the first pose metadata 310 or the corresponding pose estimate generated by the companion device 102. After receiving the pose estimate from the companion device 102 at time T2, the wearable device 104 generates an updated pose estimate based on the second pose metadata 312, which takes into account the change in the real pose of the wearable device 104 that occurred from time T1 to time T2, [0080]).
Regarding claim 19, Marulescu teaches the method of claim 17, wherein the pose data includes first data indicating a translational position of a listener in the immersive audio environment and second data indicating a rotational orientation of the listener in the immersive audio environment (view 400 shows the real pose 416 of the wearable device 104 at the end of a second time period, from time T1 to time T2, in which the motion sensor 206 of the wearable device 104 generates second pose metadata, [0083]), and further comprising: receiving first translation data from a first device; receiving second translation data from a second device distinct from the first device (real pose 416 is rotated with respect to the real pose 406, indicating that the user turned their head or body during the second time period, [0083]); and determining the first data based on the first translation data and the second translation data (real pose 416 is rotated with respect to the real pose 406, indicating that the user turned their head or body during the second time period, [0083]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kile Blair whose telephone number is (571)270-3544. The examiner can normally be reached M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KILE O BLAIR/Primary Examiner, Art Unit 2691