DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/05/2025 has been entered.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-8, 14, and 16 of U.S. Patent No. 11,381,797 (hereinafter ‘797) in view of US 2021/0004201 by Munoz et al.
Regarding claim 1 of the instant application:
Claim 1 of the instant application:
A method comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 1 of ‘797:
A method comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
determining a temporal relationship between one or more audio portions of the plurality of audio portions and the visual content, the temporal relationship determined based on the context; and
presenting synthesized audio-visual content based on the temporal relationship.
The comparison above maps the limitations recited in claim 1 of the instant application to the equivalent limitations recited in claim 1 of ‘797.
However, claim 1 of ‘797 fails to teach, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 1 of ‘797, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
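For illustration only, the following is a minimal, hypothetical sketch of the distance-based stream selection described in the Munoz passage quoted above, in which a decoder selects a subset of audio streams based on audio location information (ALI) relative to the device location. The code and all names in it (AudioStream, select_stream_subset, max_distance) are invented for explanatory purposes and do not appear in Munoz, the ‘797 patent, or the claims.

import math
from dataclasses import dataclass

@dataclass
class AudioStream:
    stream_id: int
    # ALI: coordinates at which the stream was captured or synthesized
    capture_location: tuple

def select_stream_subset(streams, device_location, max_distance):
    """Select the subset of streams whose capture location lies within
    max_distance of the device location; excluded streams are not decoded."""
    def distance(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return [s for s in streams
            if distance(s.capture_location, device_location) <= max_distance]

Under this reading, moving the device changes which streams fall within the threshold, so the audio presented with the visual content changes with the user's position.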
Claim 2 of the instant application corresponds to claim 8 of the ‘797 patent.
Claim 3 of the instant application corresponds to claim 16 of the ‘797 patent.
Claim 4 of the instant application corresponds to claim 8 of the ‘797 patent.
Claim 5 of the instant application corresponds to claim 16 of the ‘797 patent.
Claim 6 of the instant application corresponds to claim 2 of the ‘797 patent.
Claim 7 of the instant application corresponds to claim 3 of the ‘797 patent.
Claim 8 of the instant application corresponds to claim 4 of the ‘797 patent.
Claim 9 of the instant application corresponds to claim 5 of the ‘797 patent.
Claim 10 of the instant application corresponds to claim 6 of the ‘797 patent.
Claim 11 of the instant application corresponds to claim 7 of the ‘797 patent.
Claim 12 of the instant application corresponds to claim 8 of the ‘797 patent.
Claim 13 of the instant application corresponds to claim 8 of the ‘797 patent.
Claim 14 of the instant application corresponds to claim 8 of the ‘797 patent.
Claim 15 of the instant application corresponds to claim 14 of the ‘797 patent.
Claim 16 of the instant application corresponds to claim 16 of the ‘797 patent.
Claim 17 of the instant application corresponds to claim 14 of the ‘797 patent.
Claim 18 of the instant application corresponds to claim 16 of the ‘797 patent.
Regarding claim 19 of the instant application:
Claim 19 of the instant application:
A system comprising:
a non-transitory computer-readable storage medium; and
one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 19 of ‘797:
A system comprising:
a non-transitory computer-readable storage medium; and
one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
determining a temporal relationship between one or more audio portions of the plurality of audio portions and the visual content, the temporal relationship determined based on the context; and
presenting synthesized audio-visual content based on the temporal relationship.
The comparison above maps the limitations recited in claim 19 of the instant application to the equivalent limitations recited in claim 19 of ‘797.
However, claim 19 of ‘797 fails to teach, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 19 of ‘797, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
Regarding claim 20 of the instant application:
Claim 20 of the instant application:
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 20 of ‘797:
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
determining a temporal relationship between one or more audio portions of the plurality of audio portions and the visual content, the temporal relationship determined based on the context; and
presenting synthesized audio-visual content based on the temporal relationship.
The comparison above maps the limitations recited in claim 20 of the instant application to the equivalent limitations recited in claim 20 of ‘797.
However, claim 20 of ‘797 fails to teach, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 20 of ‘797, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-8 and 11-13 of U.S. Patent No. 11,729,363 (hereinafter ‘363) in view of US 2021/0004201 by Munoz et al.
Regarding claim 1 of the instant application:
Claim 1 of the instant application:
A method comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 1 of ‘363:
A method comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
in accordance with a first context, presenting a first audio loop and a static representation of the visual content; and
in accordance with a second context, presenting a second audio loop and a looping representation of the visual content.
The comparison above maps the limitations recited in claim 1 of the instant application to the equivalent limitations recited in claim 1 of ‘363.
However, claim 1 of ‘363 fails to teach wherein the context comprises an action of a user interacting with the visual content, or a positional relationship of the user relative to the visual content; and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches wherein the context comprises: an action of a user interacting with the visual content; or a positional relationship of the user relative to the visual content (paragraph 0033 teaches that “While described in this disclosure with respect to the VR device, various aspects of the techniques may be performed in the context of other devices, such as a mobile device. In this instance, the mobile device (such as a so-called smartphone) may present the displayed world via a screen, which may be mounted to the head of the user or viewed as would be done when normally using the mobile device. As such, any information on the screen can be part of the mobile device. The mobile device may be able to provide tracking information and thereby allow for both a VR experience (when head mounted) and a normal experience to view the displayed world, where the normal experience may still allow the user to view the displayed world providing a VR-lite-type experience (e.g., holding up the device and rotating or translating the device to view different portions of the displayed world).” Fig. 5D and paragraphs 0138-0139 teach: “For example, if listener wants to hear the performers close up (e.g., near stage 444), but wants to see more of the stage (e.g., a wider view than a location closer to the stage) such that position 430 of the listener which is further away from the stage, then the listener may bias the audio source distance threshold of the snapping towards stage audio element S.sub.2 448 instead of closest audio element R.sub.2 456 (distance a>distance b). In some examples, with this bias towards the stage audio elements, the listener stays snapped to S.sub.2 448 as they move towards position 432. At position 432 the listener may snap to audio element S.sub.3 450 because the listener's distance (c) to audio element S.sub.3 450 is less than the listener's distance (d) to audio element S.sub.2 448. Without the bias, the listener would have snapped to audio elements R.sub.2 456 and R.sub.3 458 as the listener moved from position 430 to position 432.” Munoz et al. thus shows in Fig. 5D that the listener/user may want to view the stage differently and move from one place to another based on the user's interaction with the visual content); and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 1 of ‘363, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
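For illustration only, the following is a minimal, hypothetical sketch of the biased “snapping” behavior described in the passage from Fig. 5D and paragraphs 0138-0139 quoted above, in which the listener snaps to the audio element with the smallest effective distance and a bias factor favors stage elements. The code and all names in it (AudioElement, snap_to_element, stage_bias) are invented for explanatory purposes and do not appear in Munoz, the ‘363 patent, or the claims.

import math
from dataclasses import dataclass

@dataclass
class AudioElement:
    name: str
    position: tuple
    is_stage: bool  # True for stage elements (e.g., S2, S3), False otherwise (e.g., R2, R3)

def snap_to_element(elements, listener_position, stage_bias=0.5):
    """Snap to the element with the smallest effective distance; a stage_bias
    below 1.0 scales down distances to stage elements, so the listener stays
    snapped to the stage even when a non-stage element is physically closer."""
    def distance(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    def effective_distance(element):
        d = distance(element.position, listener_position)
        return d * stage_bias if element.is_stage else d
    return min(elements, key=effective_distance)

With stage_bias = 1.0 the function reduces to snapping to the nearest element, matching the unbiased behavior the quoted passage describes.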
Claim 2 of the instant application corresponds to claim 2 of the ‘363 patent.
Claim 3 of the instant application corresponds to claim 1 of the ‘363 patent.
Claim 4 of the instant application corresponds to claim 3 of the ‘363 patent.
Claim 5 of the instant application corresponds to claim 3 of the ‘363 patent.
Claim 6 of the instant application corresponds to claim 2 of the ‘363 patent.
Claim 7 of the instant application corresponds to claim 3 of the ‘363 patent.
Claim 8 of the instant application corresponds to claim 4 of the ‘363 patent.
Claim 9 of the instant application corresponds to claim 5 of the ‘363 patent.
Claim 10 of the instant application corresponds to claim 6 of the ‘363 patent.
Claim 11 of the instant application corresponds to claim 7 of the ‘363 patent.
Claim 12 of the instant application corresponds to claim 8 of the ‘363 patent.
Claim 13 of the instant application corresponds to claim 8 of the ‘363 patent.
Claim 14 of the instant application corresponds to claim 8 of the ‘363 patent.
Claim 15 of the instant application corresponds to claim 11 of the ‘363 patent.
Claim 16 of the instant application corresponds to claim 11 of the ‘363 patent.
Claim 17 of the instant application corresponds to claim 11 of the ‘363 patent.
Claim 18 of the instant application corresponds to claim 7 of the ‘363 patent.
Regarding claim 19 of the instant application:
Claim 19 of the instant application:
A system comprising:
a non-transitory computer-readable storage medium; and
one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 12 of ‘363:
A system comprising:
a non-transitory computer-readable storage medium; and
one or more processors coupled to the non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium comprises program instructions that, when executed on the one or more processors, cause the system to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
in accordance with a first context, presenting a first audio loop and a static representation of the visual content; and
in accordance with a second context, presenting a second audio loop and a looping representation of the visual content.
The comparison above maps the limitations recited in claim 19 of the instant application to the equivalent limitations recited in claim 12 of ‘363.
However, claim 12 of ‘363 fails to teach wherein the context comprises an action of a user interacting with the visual content, or a positional relationship of the user relative to the visual content; and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches wherein the context comprises: an action of a user interacting with the visual content; or a positional relationship of the user relative to the visual content (paragraph 0033 teaches that “While described in this disclosure with respect to the VR device, various aspects of the techniques may be performed in the context of other devices, such as a mobile device. In this instance, the mobile device (such as a so-called smartphone) may present the displayed world via a screen, which may be mounted to the head of the user or viewed as would be done when normally using the mobile device. As such, any information on the screen can be part of the mobile device. The mobile device may be able to provide tracking information and thereby allow for both a VR experience (when head mounted) and a normal experience to view the displayed world, where the normal experience may still allow the user to view the displayed world providing a VR-lite-type experience (e.g., holding up the device and rotating or translating the device to view different portions of the displayed world).” Fig. 5D and paragraphs 0138-0139 teach: “For example, if listener wants to hear the performers close up (e.g., near stage 444), but wants to see more of the stage (e.g., a wider view than a location closer to the stage) such that position 430 of the listener which is further away from the stage, then the listener may bias the audio source distance threshold of the snapping towards stage audio element S.sub.2 448 instead of closest audio element R.sub.2 456 (distance a>distance b). In some examples, with this bias towards the stage audio elements, the listener stays snapped to S.sub.2 448 as they move towards position 432. At position 432 the listener may snap to audio element S.sub.3 450 because the listener's distance (c) to audio element S.sub.3 450 is less than the listener's distance (d) to audio element S.sub.2 448. Without the bias, the listener would have snapped to audio elements R.sub.2 456 and R.sub.3 458 as the listener moved from position 430 to position 432.” Munoz et al. thus shows in Fig. 5D that the listener/user may want to view the stage differently and move from one place to another based on the user's interaction with the visual content); and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 12 of ‘363, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
Regarding claim 20 of the instant application:
Claim 20 of the instant application:
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content;
in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual content; and
presenting the visual content with the selected audio characteristics.

Claim 13 of ‘363:
A non-transitory computer-readable storage medium, storing program instructions computer-executable on a computer to perform operations comprising:
at an electronic device having a processor:
obtaining audio-visual content of a physical environment, wherein the audio-visual content comprises visual content and audio content comprising a plurality of audio portions corresponding to the visual content;
determining a context for presenting the audio-visual content;
in accordance with a first context, presenting a first audio loop and a static representation of the visual content; and
in accordance with a second context, presenting a second audio loop and a looping representation of the visual content.
The comparison above maps the limitations recited in claim 20 of the instant application to the equivalent limitations recited in claim 13 of ‘363.
However, claim 13 of ‘363 fails to teach wherein the context comprises an action of a user interacting with the visual content, or a positional relationship of the user relative to the visual content; and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content.
Munoz et al. teaches wherein the context comprises: an action of a user interacting with the visual content; or a positional relationship of the user relative to the visual content (paragraph 0033 teaches that “While described in this disclosure with respect to the VR device, various aspects of the techniques may be performed in the context of other devices, such as a mobile device. In this instance, the mobile device (such as a so-called smartphone) may present the displayed world via a screen, which may be mounted to the head of the user or viewed as would be done when normally using the mobile device. As such, any information on the screen can be part of the mobile device. The mobile device may be able to provide tracking information and thereby allow for both a VR experience (when head mounted) and a normal experience to view the displayed world, where the normal experience may still allow the user to view the displayed world providing a VR-lite-type experience (e.g., holding up the device and rotating or translating the device to view different portions of the displayed world).” Fig. 5D and paragraphs 0138-0139 teach: “For example, if listener wants to hear the performers close up (e.g., near stage 444), but wants to see more of the stage (e.g., a wider view than a location closer to the stage) such that position 430 of the listener which is further away from the stage, then the listener may bias the audio source distance threshold of the snapping towards stage audio element S.sub.2 448 instead of closest audio element R.sub.2 456 (distance a>distance b). In some examples, with this bias towards the stage audio elements, the listener stays snapped to S.sub.2 448 as they move towards position 432. At position 432 the listener may snap to audio element S.sub.3 450 because the listener's distance (c) to audio element S.sub.3 450 is less than the listener's distance (d) to audio element S.sub.2 448. Without the bias, the listener would have snapped to audio elements R.sub.2 456 and R.sub.3 458 as the listener moved from position 430 to position 432.” Munoz et al. thus shows in Fig. 5D that the listener/user may want to view the stage differently and move from one place to another based on the user's interaction with the visual content); and, in accordance with the determined context, selecting an audio characteristic for presenting the plurality of audio portions with the visual content (paragraphs 0079-0081 teach: “In the context of streaming application (live or recorded), there may be a large number of audio streams associated with varying levels of quality and/or content. The audio streams may represent any type of audio data, including scene-based audio data (e.g., ambisonic audio data, including FOA audio data, MOA audio data and/or HOA audio data), channel-based audio data, and object-based audio data……the audio decoding device 34 may adaptively select between audio streams available via the bitstream 27 (which are represented by the bitstream 27 and hence the bitstream 27 may be referred to as “audio streams 27”). The audio decoding device 34 may select between different audio streams of the audio streams 27 based on audio location information (ALI) (e.g., 45A in FIGS. 1A-1C), such as capture location information or location information relating to a synthesized audio source included as metadata accompanying the audio streams 27, where the audio location information may define coordinates in the displayed world for the microphones that capture the respective audio streams 27 or coordinates in an acoustical space. The ALI 45A may be representative of a capture location (or synthesize location) in a displayed world (or an acoustical space) at which the corresponding one of the audio streams 27 was captured or synthesized. The audio decoding device 34 may select, based on the ALI 45A, a subset of the audio streams 27, where the subset of the audio streams 27 excludes at least one of the audio streams 27. The audio decoding device 34 may output the subset of audio streams 27 as audio data 19′ (which may also be referred to as “audio streams 19′”). In some examples, the audio decoding device 34 may only decode the subset of the audio streams in response to the selection.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate selecting an audio characteristic for presenting the plurality of audio portions with the visual content, as taught by Munoz et al., into claim 13 of ‘363, because such incorporation would provide the benefit of accurately performing playback of the content using the selection, thus increasing user accessibility of the system.
Response to Arguments
Applicant's arguments filed 11/05/2025 have been fully considered but they are not persuasive.
In re pages 7-8, the applicant argues that “Fig. 5D and the related description of Munoz thus appears to describe that a listener/user may want to view the stage differently and move from one place to another. While user position affects audio elements (S i 446, S2 448, S 3 450, S4 452) near the stage and audio elements (R i 454, R 2 456, R 3 458, R4 460) away from the stage, Munoz snapping (change of audio source position) is based on changes in distance between the user 430/432 and these multiple audio elements. There is no suggestion of "determining a context for presenting the audio-visual content, wherein determining the context comprises: identifying an occurrence of an action of a user interacting with the visual content; or identifying a positional relationship of the user relative to the visual content; in accordance with the determined context based on identifying the occurrence of the action or the positional relationship, selecting an audio characteristic for presenting the plurality of audio portions with the visual context," as recited in the independent claims. Withdrawal of the rejections of all claims is respectfully requested for at least this reason.”
In response, the examiner respectfully disagrees. Munoz et al. discloses in paragraph 0033 that “While described in this disclosure with respect to the VR device, various aspects of the techniques may be performed in the context of other devices, such as a mobile device. In this instance, the mobile device (such as a so-called smartphone) may present the displayed world via a screen, which may be mounted to the head of the user or viewed as would be done when normally using the mobile device. As such, any information on the screen can be part of the mobile device. The mobile device may be able to provide tracking information and thereby allow for both a VR experience (when head mounted) and a normal experience to view the displayed world, where the normal experience may still allow the user to view the displayed world providing a VR-lite-type experience (e.g., holding up the device and rotating or translating the device to view different portions of the displayed world).” Fig. 5D and paragraphs 0138-0139 teach: “For example, if listener wants to hear the performers close up (e.g., near stage 444), but wants to see more of the stage (e.g., a wider view than a location closer to the stage) such that position 430 of the listener which is further away from the stage, then the listener may bias the audio source distance threshold of the snapping towards stage audio element S.sub.2 448 instead of closest audio element R.sub.2 456 (distance a>distance b). In some examples, with this bias towards the stage audio elements, the listener stays snapped to S.sub.2 448 as they move towards position 432. At position 432 the listener may snap to audio element S.sub.3 450 because the listener's distance (c) to audio element S.sub.3 450 is less than the listener's distance (d) to audio element S.sub.2 448. Without the bias, the listener would have snapped to audio elements R.sub.2 456 and R.sub.3 458 as the listener moved from position 430 to position 432.” Munoz et al. thus shows in Fig. 5D that the listener/user may want to view the stage differently and move from one place to another. Based on the action of the user interacting with the visual content, the position of the user changes relative to the visual content and different audio elements are selected, thus meeting the claimed invention.
In re page 8, the applicant argues that “Dependent claim 12 recites "The method of claim 1, wherein determining the context for presenting the audio-visual content comprises determining at least whether the audio-visual content is selected based on user actions and determining a spatial distance between the user and a representation of the audio-visual content." The Office Action cites paras. 0068, 0079-0086, and 0118 of Munoz. While these sections describe selecting between different audio streams based on audio location information (ALI) and device location information (DLI) (e.g., distances between), there is no suggestion of doing so based on a distance between a user and a representation of audio- visual content - in Munoz the audio location are not equivalent to audio-visual content that comprises both visual content and audio content. Accordingly, Munoz does not suggest the claimed subject matter and the rejection of this claim should be withdrawn.”
In response, the examiner respectfully disagrees. As discussed above, Munoz et al. shows in Fig. 5D that the listener/user may want to view the stage differently and move from one place to another. Based on the action of the user interacting with the visual content, the position of the user changes relative to the visual content and different audio elements are selected, thus meeting the claimed invention.
In re pages 8-9, the applicant argues that “Dependent claim 13 recites "The method of claim 1, wherein determining the context comprises determining that a user has selected the visual content." The Office Action cites paras. 0047, 0068, 0079-0086, 0118, and 0311 of Munoz. While these sections describe selecting between different audio streams based on audio location information (ALI) and device location information (DLI) (e.g., distances between), there is no suggestion of doing so based on a user selecting visual content. Accordingly, Munoz does not suggest the claimed subject matter and the rejection of this claim should be withdrawn.”
In response, the examiner respectfully disagrees. Munoz discloses in paragraph 0047 that “The bitstream 27 may represent a compressed version of the audio data 19 and any other different types of the content 21 (such as a compressed version of spherical video data, image data, or text data)”. Furthermore, Figs. 1A, 1B, and 1C show that source device 12A generates bitstream 27, which may be in the form of audio, video, image, or text data.
Claims 14-18 are rejected for the same reasons as discussed in the corresponding paragraphs above. In addition, paragraph 0049 teaches “The content consumer device 14A may be operated by an individual, and may represent a VR client device. Although described with respect to a VR client device, content consumer device 14A may represent other types of devices, such as an augmented reality (AR) client device, a mixed reality (MR) client device (or other XR client device), a standard computer, a headset, headphones, a mobile device (including a so-called smartphone), or any other device capable of tracking head movements and/or general translational movements of the individual operating the content consumer device 14A.”
Therefore, in view of the above, the examiner believes that the features of the claims are taught by the applied art. See also the rejections set forth below in this Office action.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by US 2021/0004201 by Munoz et al.
Regarding claim 1, Munoz et al. teaches a method comprising:
at an electronic device having a processor: