DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 9 is objected to because of the following informalities: On line 1, between the words “claim 8” and “wherein”, a comma “,” is missing. Appropriate correction is required.
Claim 19 is objected to because of the following informalities: On line 1, between the words “claim 18” and “wherein”, a comma “,” is missing. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-9 and 11-19 are rejected under 35 U.S.C. 103 as being unpatentable over Bouazizi et al. (US 2021/0409818, hereinafter Bouazizi’818) in view of Bouazizi et al. (US 2021/0099773, hereinafter Bouazizi’773).
Regarding claim 1, Bouazizi’818 teaches a method of using extensions for video texture formats in a Moving Pictures Expert Group (MPEG) scene description ([0050]: the presently disclosed media access architecture is described with respect to the Moving Picture Experts Group (MPEG) standard … The MPEG standard includes a proposed improvement to existing scene description formats in order to support immersive media, with a specific instantiation for glTF 2.0 (‘Graphics Language Transmission Format’). The media access architecture described herein can provide specific extensions that address identified gaps within the existing MPEG and glTF 2.0-based framework(s), while also retaining compatibility and/or interoperability; [0052]: the MAF can be configured to deliver any given media element in a file format that is requested or expected by any given presentation engine, in which case the MAF is not restricted to any particular presentation engine(s) … the MAF can process media content in any input file format for which the MAF can read or can retrieve an appropriate plugin for the input file format. In some examples, the MAF can be a plugin usable by one or more presentation engines, can be integrated with (e.g., integrated into the software of, etc.) one or more presentation engines, a combination of the two, and/or otherwise implemented; [0090]: scene description data can be utilized to provide support for immersive media presentation and experiences, leveraging but not limited to MPEG media and media objects. In some examples, scene description data can be based on glTF 2.0 (or other standards and formats) and can provide a set of specific extensions to address identified gaps. glTF is a file format (e.g., for three-dimensional (3D) scenes) and can be modeled using the JavaScript Object Notation (JSON) standard. For instance, glTF is an application programming interface (API)-neutral runtime asset delivery format), the method comprising:
receiving information on a MPEG-I scene ([0122]: the scene description can provide support for pipelines or media pipelines (e.g., the processing model of MPEG-I scene description can provide support for the concept of pipelines).; [0125]: In the example of pipeline 576, a single input track is received (e.g., track #4) and, after decoding and processing are performed, is provided to two separate output buffers (e.g., buffers #3 and #4).; [0181]: FIG. 8 is a flow diagram illustrating an example of a process 800 for using one or more media pipelines and buffers to fetch MPEG media objects; [0182]: At block 804, the process 800 includes parsing an MPEG-I scene description file or document. In one illustrative example, the scene description file can be based on the glTF 2.0 format and includes the scene description document 348 shown in FIG. 3 and/or the scene description and scene update documents shown in FIG. 4. In some examples, parsing the scene description file can be performed by the presentation engine upon initial receipt or retrieval of the scene description file. In some cases, the scene description file can be parsed in an ongoing fashion (e.g., in which the presentation engine parses the scene description document to obtain media information and/or buffer information as needed to generate one or more calls to a MAF API and/or a buffer API); [0183]: the output from parsing the MPEG-I scene description can be stored remote from the presentation engine. In some examples, the output from parsing the MPEG-I scene description can be stored remote from the presentation engine, such that the presentation engine obtains or receives the output from parsing the MPEG-I scene description rather than performing the parsing itself), wherein that information includes chroma texture ([0120]: One or more media tracks 522 can include media data and in some cases other data (e.g., metadata, etc.). For example, a media track 522 can include a full media object (such as a 3D asset). In some cases, a media track 522 can include one or more components of a media object (e.g., media components such as a color component (red, green, blue, chroma, luma, etc.), depth, vertex positions, polygons, textures, etc.); [0125]: A configuration such as pipeline 576 can be used in examples in which a single media track contains multiple (e.g., in this case, two) media components, such as one or more color components (e.g., red, green, blue, chroma, luma, etc.), depth information (e.g., for 3D scenes), vertex positions (e.g., for 3D scenes), polygons (e.g., for 3D scenes), textures, etc.);
processing the chroma texture using an extension to provide information to a presentation engine (PE) ([0050]: The MPEG standard includes a proposed improvement to existing scene description formats in order to support immersive media, with a specific instantiation for glTF 2.0 (‘Graphics Language Transmission Format’). The media access architecture described herein can provide specific extensions that address identified gaps within the existing MPEG and glTF 2.0-based framework(s), while also retaining compatibility and/or interoperability; [0090]: scene description data can be based on glTF 2.0 (or other standards and formats) and can provide a set of specific extensions to address identified gaps. glTF is a file format (e.g., for three-dimensional (3D) scenes); [0120]: For example, a media track 522 can include a full media object (such as a 3D asset). In some cases, a media track 522 can include one or more components of a media object (e.g., media components such as a color component (red, green, blue, chroma, luma, etc.), depth, vertex positions, polygons, textures, etc.); [125]: In the example of pipeline 576, a single input track is received (e.g., track #4) and, after decoding and processing are performed, is provided to two separate output buffers (e.g., buffers #3 and #4). In particular, the single input track is provided to one buffer in a decoded and processed form and provided to another buffer in a decoded, but un-processed form. A configuration such as pipeline 576 can be used in examples in which a single media track contains multiple (e.g., in this case, two) media components, such as one or more color components (e.g., red, green, blue, chroma, luma, etc.), depth information (e.g., for 3D scenes), vertex positions (e.g., for 3D scenes), polygons (e.g., for 3D scenes), textures, etc. For example, track #4 may comprise a single stream that stores both color information (as a first media component) and depth information (as a second media component). If the presentation engine 540 expects color information and depth information to be provided in separate output buffers, then media pipeline 576 can configure the decoder #4 to separate the two different media components after decoding, configure the processing element to process (as needed) only one of the media components (e.g., the color information or the depth information), and provide the media components to their own output buffers (e.g., to buffer #3 and buffer #4). The approach illustrated by pipeline 576 is in contrast with that of pipeline 574, the discussion of which contemplated the example of color information and depth information being obtained in two separate tracks and then being mixed by the pipeline into a single output buffer; [0180]: the MAF 720 can query or obtain additional information from the presentation engine 740 in order to optimize the delivery of one or more requested media objects. For example, MAF 720 can obtain additional information regarding a required quality for one or more of the output buffers (e.g., MPEG circular buffers 732a-c). The MAF 720 can also obtain timing information used by the presentation engine 740 for one or more of the requested media objects initially identified in the media information/call to MAF API 325, among other factors. The MAF 720 can also obtain GL Transmission Format (glTF) information, such as gITF accessors and buffer views 738 (e.g., identified by the bufferView variable noted herein). glTF is a standard file format for three-dimensional scenes and models. 
The gITF information can be stored in gITF buffer 736. Other information that can be used by the presentation engine 740 and/or the MAF 720 include MPEG visual timed accessors 742 and MPEG visual timed accessors 794 (which can be in communication with timed scene extensions and timed accessor sync 790, which can synchronize scene updates). In one example, an accessor in glTF 2.0 defines the types and layout of the data as stored in a buffer that is viewed through a bufferView. For instance, when timed data is read from a buffer, the data in the buffer could change dynamically with time. The buffer element can be extended to add support for a circular buffer that is used with timed data. Examples of buffer views that can be used by the system 700 include buffer view 746a and buffer view 746b. Scene update information 748 can also be used by the MAF 720 and/or the presentation engine 740 to update the scene. For example, when a media sample becomes active, the media player can load the sample data into the presentation engine 740, which can trigger a scene update performed by the presentation engine 740. If the scene update contains an addition of new information (e.g., glTF nodes and/or potential modifications of existing gITF nodes, such as one or more components), the presentation engine 740 can interacts with the MAF 720 to fetch any new content associated with the scene update and can present the new content accordingly; [0182]: At block 804, the process 800 includes parsing an MPEG-I scene description file or document. In one illustrative example, the scene description file can be based on the glTF 2.0 format and includes the scene description document 348 shown in FIG. 3 and/or the scene description and scene update documents shown in FIG. 4); and
rendering the scene using the received MPEG-I scene information and the processed chroma texture ([0051]: A presentation engine can be used to process and render the content of a scene. This scene content can be delivered or otherwise made available to the presentation engine by the MAF. The MAF can perform media fetching and related tasks such as pre-processing, decoding, format conversion, etc., while the presentation engine can perform rendering functions, resulting in the decoupling of the rendering by the presentation engine from the fetching of the media content from the media sources by the MAF. In some examples of this decoupling dynamic, the MAF can asynchronously retrieve and prepare media content for rendering, on request by the presentation engine. At some later time, the presentation engine can then access the media content retrieved and prepared by the MAF and render the content of a scene; [0181]: the presentation engine can be used to render, present, or otherwise display a 3D media scene or an immersive media scene; [0183]: At block 806, the process 800 includes completing the parsing of the MPEG-I scene description. In one illustrative example, the MPEG-I scene description contains information corresponding to one or more different media objects that are used by the presentation engine to render an MPEG media scene. In some examples, the parsing of the MPEG-I scene description can be performed by the presentation engine to generate media information (e.g., MediaInfo as described with respect to Tables 2 and 3) and/or to generate buffer information (e.g., BufferInfo as described with respect to Tables 2, 3, 5, and 6). The output from parsing the MPEG-I scene description can be stored locally at the presentation engine, for example for later use in populating or otherwise generating one or more MAF API or buffer API calls. In some examples, the output from parsing the MPEG-I scene description can be stored remote from the presentation engine. In some examples, the output from parsing the MPEG-I scene description can be stored remote from the presentation engine, such that the presentation engine obtains or receives the output from parsing the MPEG-I scene description rather than performing the parsing itself; [0190]: At block 820, the process 800 includes starting a render loop at the presentation engine. In one example, the presentation engine can start the render loop by accessing one or more output buffers to obtain previously requested media objects for rendering a media scene. In some cases, a MAF API call can correspond to a specific render loop, e.g., the MAF API can identify some or all of the media objects needed by the presentation engine to perform a given render loop. In some examples, one or more of the render loops performed by the presentation engine can be timed, e.g., a render loop can be associated with a specific start time, and the MAF can perform a scheduling and pipeline management process such that the appropriate media pipelines and buffers are all created, initialized, and triggered to start fetching at an earlier enough time to ensure that processed media objects are delivered to the output buffers at or before the render loop start time. In some examples, the presentation engine can request the MAF to provide processed media objects to the output buffers with some offset from the render loop start time. 
For example, the presentation engine could specify in the MAF API call, or as a default/pre-determined parameter, that processed media objects are to be provided to the output buffers at least 500 milliseconds prior to the start time (other amounts of time can also be specified, e.g., between 50 milliseconds and 3 seconds). In some examples, the presentation engine can include individual delivery time requirements or deadlines for one or more requested media objects. For example, the presentation engine can include in the MAF API call an additional parameter to specify a delivery time for a requested media object; [0191]: At block 822, the process 800 includes performing the render loop started at block 820 by iterating through the media objects contained in the output buffers provided by the MAF to the presentation engine. In one illustrative example, iterating through the media objects can be based on the MPEG-I scene description and/or the presentation engine's parsing of the MPEG-I scene description (e.g., the parsing performed at blocks 804 and/or 806). From the scene description, the presentation engine determines each media object that is to be rendered for the render loop. For each media object, the presentation engine can iterate through its constituent attributes or media elements (which can themselves be media objects). The presentation engine can then fetch a processed frame for each attribute by accessing and reading from one or more corresponding output buffers. The processed frames can be obtained from the processed media elements provided to the output buffer(s) by the media pipelines initialized by the MAF. In some examples, the presentation engine can determine the corresponding output buffer(s) for a given frame because the presentation engine's MAF API call(s) specified the output buffer(s) for each requested media object. In some examples, the render loop can proceed after the frame fetching described above to the presentation engine binding attribute data and rendering each object of the render loop).
Bouazizi’818 does not explicitly teach the chroma texture is processed using an extension to provide information to sample video texture.
Bouazizi’773 teaches the chroma texture (chrominance) is processed using an extension to provide information to sample video texture (MPEG_texture_video is an extension used to support video-based textures in glTF2 that converts the output luminance and chrominance to a texture format that is supported by the GPU; [0105]: encapsulation unit 30 may store a sequence data set in a video sample that does not include actual coded video data. A video sample may generally correspond to an access unit, which is a representation of a coded picture at a specific time instance; [0111]: Conventional glTF2 defines an extension mechanism (“Specifying Extensions”) that allows the base format to be extended with new capabilities. Any glTF2 object can have an optional extensions property that lists the extensions that are used by that object. According to glTF2, all extensions that are used in a glTF2 scene must be listed in the top-level extensionsUsed array object. Likewise, according to glTF2, extensions that are required to correctly render the scene must also be listed in the extensionsRequired array; [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; [0113]: If a texture contains an extension property and the extension property defines its MPEG_texture_video property, then client device 40 may extract the texture from an MPEG compressed video stream. Likewise, server device 60 and/or content preparation device 20 may provide the MPEG_texture_video property and a bitstream including the texture). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Bouazizi’773’s knowledge of using an extension to process the chroma texture to provide information to sample video texture, as taught, and to modify the process of Bouazizi’818, because such a process can present current timed media data in the correct position, such that a user can view/hear the media as if the media were being presented from the corresponding position in the three-dimensional space ([0006]).
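For illustration only of the extension mechanism relied upon above (Bouazizi’773, [0111]-[0113]), and not as a verbatim reproduction of either cited reference or of the claimed subject matter, the following sketch shows how a glTF-style document might mark a texture as video-backed through an extension property; the property names inside the extension object (e.g., "accessor", "format") are hypothetical placeholders.

```python
# Hypothetical, simplified glTF-style document illustrating the extension
# mechanism described in Bouazizi'773 [0111]-[0113]; the field names inside
# the extension object are illustrative placeholders, not the actual schema.
gltf_doc = {
    "extensionsUsed": ["MPEG_texture_video"],
    "textures": [
        {
            "sampler": 0,
            "extensions": {
                # Points the texture at a decoded-video buffer instead of a
                # static image source (placeholder property names).
                "MPEG_texture_video": {"accessor": 2, "format": "RGBA"}
            },
        },
        {"sampler": 0, "source": 0},  # ordinary still-image texture
    ],
}

def is_video_texture(texture: dict) -> bool:
    """Mirrors [0113]: if a texture carries an MPEG_texture_video extension
    property, its pixel data comes from a decoded video stream."""
    return "MPEG_texture_video" in texture.get("extensions", {})

for idx, tex in enumerate(gltf_doc["textures"]):
    kind = "video-backed" if is_video_texture(tex) else "static image"
    print(f"texture {idx}: {kind}")
```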
Claim 14 is similar in scope to claim 1, and therefore the examiner provides similar rationale to reject claim 14. Moreover, Bouazizi’818 teaches a wireless transmit/receive unit (WTRU) (Media system 300 comprising Media Access Function (MAF) unit that can wirelessly transmit and receive data from the presentation engine; Media Access Function 320, fig. 3; [0052]: the MAF can access or retrieve media content from any media source that is addressable or reachable (e.g., locally or over a wired or wireless network) by the MAF; [0110]: one or more of presentation engine 340 and MAF 320 can be provided remotely (e.g., over the Internet or a wired or wireless network) … the presentation engine 340 can transmit MAF API 325 calls over the Internet to the MAF 320, and the MAF 320 can deliver (e.g., stream) the requested media objects over the Internet to the presentation engine 340) configured to use extensions to support and allow usage of video texture formats in an MPEG-I scene description ([0050]: the presently disclosed media access architecture is described with respect to the Moving Picture Experts Group (MPEG) standard … The MPEG standard includes a proposed improvement to existing scene description formats in order to support immersive media, with a specific instantiation for glTF 2.0 (‘Graphics Language Transmission Format’). The media access architecture described herein can provide specific extensions that address identified gaps within the existing MPEG and glTF 2.0-based framework(s), while also retaining compatibility and/or interoperability; [0052]: the MAF can be configured to deliver any given media element in a file format that is requested or expected by any given presentation engine, in which case the MAF is not restricted to any particular presentation engine(s) … the MAF can process media content in any input file format for which the MAF can read or can retrieve an appropriate plugin for the input file format. In some examples, the MAF can be a plugin usable by one or more presentation engines, can be integrated with (e.g., integrated into the software of, etc.) one or more presentation engines, a combination of the two, and/or otherwise implemented; [0090]: scene description data can be utilized to provide support for immersive media presentation and experiences, leveraging but not limited to MPEG media and media objects. In some examples, scene description data can be based on glTF 2.0 (or other standards and formats) and can provide a set of specific extensions to address identified gaps. glTF is a file format (e.g., for three-dimensional (3D) scenes) and can be modeled using the JavaScript Object Notation (JSON) standard. For instance, glTF is an application programming interface (API)-neutral runtime asset delivery format), the WTRU comprising: a transceiver (communication interface 1240, fig. 
12; MAF can wirelessly transmit and receive data from the presentation engine and therefore inherently includes a transceiver; [0110]: one or more of presentation engine 340 and MAF 320 can be provided remotely (e.g., over the Internet or a wired or wireless network) … the presentation engine 340 can transmit MAF API 325 calls over the Internet to the MAF 320, and the MAF 320 can deliver (e.g., stream) the requested media objects over the Internet to the presentation engine 340; [0185]: The communication session between the MAF and the presentation engine permits the transmission and exchange of requested media objects and their corresponding media information and/or buffer information, for example via one or more calls to a MAF API (such as the MAF API 325 shown in FIGS. 3, 4, and 7). In some examples, completing the initialization of the MAF includes establishing (and/or confirming the ability to establish) a communication session between the MAF and a buffer API (such as the buffer API 335 shown in FIGS. 3, 4, and 7). Completing the initialization of the MAF can also include establishing (and/or confirming the ability to establish) communication sessions between the MAF and one or more media sources or media source locations, in some examples using one or more of the plugins initialized at the MAF as described above with respect to block 808; [0202]: the computing device may include a display, a network interface configured to communicate and/or receive the data, any combination thereof, and/or other component(s)); and a processor operatively coupled to the transceiver (processor 1210, fig. 12 and [0231]; processor 1210 is operatively coupled to the communication interface 1240 as shown in fig. 12).
Regarding claim 2, the combination of Bouazizi’818 and Bouazizi’773 does not explicitly teach the method of claim 1, wherein the chroma texture comprises YCbCr.
However, the combination of Bouazizi’818 and Bouazizi’773 teaches the chroma texture comprises YUV (Bouazizi’818 – [0104]: the media source location can provide the media object as a compressed video stream in YUV420 and the presentation engine 340 can request the media object be provided in a buffer as interleaved RGBA color data; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information). It would have been prima facie obvious to one skilled in the art before the effective filing date of the claimed invention for the chroma texture to be in YCbCr color space. Whether the chroma texture is in YUV color space or in YCbCr color space is solely a matter of aesthetic design choice, and would not be sufficient to distinguish over the prior art. See MPEP 2144.04.
Regarding claim 3, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, wherein the extension is configured for chroma texture (Bouazizi’818 – [0050]: The MPEG standard includes a proposed improvement to existing scene description formats in order to support immersive media, with a specific instantiation for glTF 2.0 (‘Graphics Language Transmission Format’). The media access architecture described herein can provide specific extensions that address identified gaps within the existing MPEG and glTF 2.0-based framework(s), while also retaining compatibility and/or interoperability; Bouazizi’818 – [0090]: scene description data can be based on glTF 2.0 (or other standards and formats) and can provide a set of specific extensions to address identified gaps. glTF is a file format (e.g., for three-dimensional (3D) scenes); Bouazizi’818 – [0120]: For example, a media track 522 can include a full media object (such as a 3D asset). In some cases, a media track 522 can include one or more components of a media object (e.g., media components such as a color component (red, green, blue, chroma, luma, etc.), depth, vertex positions, polygons, textures, etc.); Bouazizi’818 – [125]: In the example of pipeline 576, a single input track is received (e.g., track #4) and, after decoding and processing are performed, is provided to two separate output buffers (e.g., buffers #3 and #4). In particular, the single input track is provided to one buffer in a decoded and processed form and provided to another buffer in a decoded, but un-processed form. A configuration such as pipeline 576 can be used in examples in which a single media track contains multiple (e.g., in this case, two) media components, such as one or more color components (e.g., red, green, blue, chroma, luma, etc.), depth information (e.g., for 3D scenes), vertex positions (e.g., for 3D scenes), polygons (e.g., for 3D scenes), textures, etc. For example, track #4 may comprise a single stream that stores both color information (as a first media component) and depth information (as a second media component). If the presentation engine 540 expects color information and depth information to be provided in separate output buffers, then media pipeline 576 can configure the decoder #4 to separate the two different media components after decoding, configure the processing element to process (as needed) only one of the media components (e.g., the color information or the depth information), and provide the media components to their own output buffers (e.g., to buffer #3 and buffer #4). The approach illustrated by pipeline 576 is in contrast with that of pipeline 574, the discussion of which contemplated the example of color information and depth information being obtained in two separate tracks and then being mixed by the pipeline into a single output buffer; Bouazizi’818 – [0180]: the MAF 720 can query or obtain additional information from the presentation engine 740 in order to optimize the delivery of one or more requested media objects. For example, MAF 720 can obtain additional information regarding a required quality for one or more of the output buffers (e.g., MPEG circular buffers 732a-c). The MAF 720 can also obtain timing information used by the presentation engine 740 for one or more of the requested media objects initially identified in the media information/call to MAF API 325, among other factors. 
The MAF 720 can also obtain GL Transmission Format (glTF) information, such as gITF accessors and buffer views 738 (e.g., identified by the bufferView variable noted herein). glTF is a standard file format for three-dimensional scenes and models. The gITF information can be stored in gITF buffer 736. Other information that can be used by the presentation engine 740 and/or the MAF 720 include MPEG visual timed accessors 742 and MPEG visual timed accessors 794 (which can be in communication with timed scene extensions and timed accessor sync 790, which can synchronize scene updates). In one example, an accessor in glTF 2.0 defines the types and layout of the data as stored in a buffer that is viewed through a bufferView. For instance, when timed data is read from a buffer, the data in the buffer could change dynamically with time. The buffer element can be extended to add support for a circular buffer that is used with timed data. Examples of buffer views that can be used by the system 700 include buffer view 746a and buffer view 746b. Scene update information 748 can also be used by the MAF 720 and/or the presentation engine 740 to update the scene. For example, when a media sample becomes active, the media player can load the sample data into the presentation engine 740, which can trigger a scene update performed by the presentation engine 740. If the scene update contains an addition of new information (e.g., glTF nodes and/or potential modifications of existing gITF nodes, such as one or more components), the presentation engine 740 can interacts with the MAF 720 to fetch any new content associated with the scene update and can present the new content accordingly; Bouazizi’818 – [0182]: At block 804, the process 800 includes parsing an MPEG-I scene description file or document. In one illustrative example, the scene description file can be based on the glTF 2.0 format and includes the scene description document 348 shown in FIG. 3 and/or the scene description and scene update documents shown in FIG. 4; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information).
Regarding claim 4, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 3, wherein the extension adapts the chroma texture (Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information. The extension may be denoted “MPEG_texture_video”; Bouazizi’773 - [0113]: If a texture contains an extension property and the extension property defines its MPEG_texture_video property, then client device 40 may extract the texture from an MPEG compressed video stream. Likewise, server device 60 and/or content preparation device 20 may provide the MPEG_texture_video property and a bitstream including the texture).
Regarding claim 5, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 3, wherein the chroma texture is from a modern graphics API (Bouazizi’818 – [0010]: the media information and the buffer information are provided as parameters of a Media Access Function (MAF) Application Programming Interface (API) call made for the media object of the scene description; Bouazizi’818 – [0095]: The media objects retrieved by MAF 320 may include the media objects that comprise the media scene described by the scene description document 348. As illustrated, the presentation engine 340 can interact and communicate with MAF 320 through a MAF Application Programming Interface (API) 325, which sits between the presentation engine 340 and MAF 320. The media system 300 also includes a buffer API 335, which provides the presentation engine 340 and MAF 320 with an interface to buffer management 330; Bouazizi’818 – [0119]: The MAF 420 may itself be implemented with a plugin to the presentation engine 440 (e.g., as illustrated by MAF plugin 443). In some examples, the MAF 420 may be provided as a separate component of the media system 400 (in which case the MAF 420 is not implemented using a plugin). The MAF plugin 443 can provide the presentation engine 440 with the ability to utilize the MAF API 325 and request media objects from MAF 325. In some examples, the MAF plugin 443 can translate or convert media information (e.g., generated when the presentation engine 440 parses the scene description 448) into a properly formatted or constructed MAF API call. In some examples, the MAF plugin 443 can intercept media object requests that the presentation engine 440 generated with a first structure, and can convert (e.g., automatically convert) the requests from the first structure into the structure of a MAF API call (e.g., MAF plugin 443 could be used to convert existing requests into MAF API calls), providing compatibility with presentation engines that previously were not decoupled from media fetching; Bouazizi’818 – [0120]: a media track 522 can include a full media object (such as a 3D asset). In some cases, a media track 522 can include one or more components of a media object (e.g., media components such as a color component (red, green, blue, chroma, luma, etc.), depth, vertex positions, polygons, textures, etc.); Bouazizi’773 - [0112]: This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU); Bouazizi’773 - [0134]: In an extension to the above video element, a media source API may be defined as an element to allow scripting in order to generate media streams for playback).
Regarding claim 6, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, wherein the scene is processed with a shader implementation (Bouazizi’773 – [0130]: The transfer node may be used to convert the decoded video output color space and spatial packing into a GPU-friendly color space format. The transfer function is provided as a GLSL function implementation that is called from the fragment shader to extract the correct texture coordinates).
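As a numerical illustration only of the GPU-side color conversion referenced in Bouazizi’773 ([0112], [0130]), the sketch below applies one common set of full-range BT.601 luma/chroma-to-RGB coefficients; the cited references do not specify a particular conversion matrix, so the coefficients and array shapes are assumed for illustration.

```python
import numpy as np

def yuv_to_rgb(y: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Convert 8-bit full-range Y, Cb (U), Cr (V) planes to RGB.

    Illustrative only: uses full-range BT.601 coefficients, one common
    choice for the GPU-side conversion described in the cited references.
    """
    y = y.astype(np.float32)
    cb = u.astype(np.float32) - 128.0
    cr = v.astype(np.float32) - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

# A single mid-gray pixel with a positive Cr (red chroma) offset:
rgb = yuv_to_rgb(np.array([[128]]), np.array([[128]]), np.array([[200]]))
print(rgb)  # [[[229  77 128]]]
```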
Regarding claim 7, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, wherein the scene is processed with a GPU implementation (Bouazizi’773 - [0112]: This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU); Bouazizi’773 - [0176]: decoding encoded video data of the video object to produce decoded YUV data; converting the decoded YUV data to a texture format supported by a local graphics processing unit (GPU)).
Regarding claim 8, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, wherein the chroma texture is referenced through a glTF.textures array (Bouazizi’773 - [0111]: Conventional glTF2 defines an extension mechanism (“Specifying Extensions”) that allows the base format to be extended with new capabilities. Any glTF2 object can have an optional extensions property that lists the extensions that are used by that object. According to glTF2, all extensions that are used in a glTF2 scene must be listed in the top-level extensionsUsed array object. Likewise, according to glTF2, extensions that are required to correctly render the scene must also be listed in the extensionsRequired array; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; Bouazizi’773 - [0113]: If a texture contains an extension property and the extension property defines its MPEG_texture_video property, then client device 40 may extract the texture from an MPEG compressed video stream. Likewise, server device 60 and/or content preparation device 20 may provide the MPEG_texture_video property and a bitstream including the texture; Bouazizi’773 - [0114]: FIG. 5 is a conceptual diagram illustrating an example extended glTF2 schema 220 according to techniques of this disclosure. In this example, glTF2 schema 220 includes texture element 200, which includes timed accessor element 202, transfer element 204, and video_source element 206. Timed_accessor element 202 is a modification of a previous accessor that allows for rotating through a set of buffer_view elements 208A-208N (buffer_view elements 208), based on their timestamps, to access a corresponding texture).
Regarding claim 9, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 8, wherein the glTF.textures array has a MPEG_texture_video extension (Bouazizi’773 - [0111]: Conventional glTF2 defines an extension mechanism (“Specifying Extensions”) that allows the base format to be extended with new capabilities. Any glTF2 object can have an optional extensions property that lists the extensions that are used by that object. According to glTF2, all extensions that are used in a glTF2 scene must be listed in the top-level extensionsUsed array object. Likewise, according to glTF2, extensions that are required to correctly render the scene must also be listed in the extensionsRequired array; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; Bouazizi’773 - [0113]: If a texture contains an extension property and the extension property defines its MPEG_texture_video property, then client device 40 may extract the texture from an MPEG compressed video stream. Likewise, server device 60 and/or content preparation device 20 may provide the MPEG_texture_video property and a bitstream including the texture).
Regarding claim 11, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, further comprising converting YUV to RGB in a GPU using sampler information (Bouazizi’818 – [0104]: In some examples, the media information in the MAF API call can specify how the presentation engine 340 expects the MAF 320 to deliver a processed media object into one or more of the buffers 332a-n. For example, the media information can include a required or expected format for the processed media object, one or more buffer identifiers for delivery of the processed media object, an indication of one or more processing steps for MAF 320 to apply in order to generate the processed media object, etc. The format of a media object obtained from a media source location can differ from the expected format requested by the presentation engine 340. For example, the media source location can provide the media object as a compressed video stream in YUV420 and the presentation engine 340 can request the media object be provided in a buffer as interleaved RGBA color data; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; Bouazizi’773 - [0176]: decoding encoded video data of the video object to produce decoded YUV data; converting the decoded YUV data to a texture format supported by a local graphics processing unit (GPU)).
Regarding claim 12, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 11, wherein the converting occurs natively (Bouazizi’818 – [0104]: In some examples, the media information in the MAF API call can specify how the presentation engine 340 expects the MAF 320 to deliver a processed media object into one or more of the buffers 332a-n. For example, the media information can include a required or expected format for the processed media object, one or more buffer identifiers for delivery of the processed media object, an indication of one or more processing steps for MAF 320 to apply in order to generate the processed media object, etc. The format of a media object obtained from a media source location can differ from the expected format requested by the presentation engine 340. For example, the media source location can provide the media object as a compressed video stream in YUV420 and the presentation engine 340 can request the media object be provided in a buffer as interleaved RGBA color data; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; Bouazizi’773 – [0130]: The transfer node may be used to convert the decoded video output color space and spatial packing into a GPU-friendly color space format; Bouazizi’773 - [0176]: decoding encoded video data of the video object to produce decoded YUV data; converting the decoded YUV data to a texture format supported by a local graphics processing unit (GPU)).
Regarding claim 13, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 1, further comprising requesting, via a Media Access Function (MAF), the information on the MPEG-I scene from a server based on one or more views by viewers of the MPEG-I scene (Bouazizi’818 – [0122]: a Media Access Function (e.g., such as MAF 320) can perform setup and management (e.g., initialization) of the media pipelines 572-578. The media pipelines 572-578 can be constructed based on information determined from a scene description (e.g., the media information that the presentation engine 540 obtains by parsing a scene description and provides to the MAF via a MAF API call, as described previously). In some cases, the scene description can provide support for pipelines or media pipelines (e.g., the processing model of MPEG-I scene description can provide support for the concept of pipelines); Bouazizi’818 – [0141]: the “MediaInfo” parameter (e.g., of Table 2 and Table 3) can include location information corresponding to media objects that will be provided to a media pipeline. In some cases, the MediaInfo can include information provided by a scene description. In some examples, the MediaInfo can include information provided by a presentation engine (e.g., the information obtained from the presentation engine parsing a scene description document). In some examples, the MediaInfo can include information provided by the MPEG media extension. A MediaInfo can be assigned a name. In some cases, the MediaInfo can provide one or more alternative locations for the MAF or media pipeline to access a media object (e.g., the one or more alternative locations can be provided in an alternatives array); Bouazizi’818 – [0177]: for each of the MPEG circular buffers 732a, 732b, and 732c that are to be filled, the Media Access Function (MAF) 720 can have sufficient information to select an appropriate source for a requested media object. In some examples, when multiple sources are specified (e.g., a primary source and one or more alternative sources), the MAF 720 can select between the multiple sources. For example, MAF 720 can select an appropriate source from the multiple sources made available to it based on factors such as user and/or system preferences, capabilities of the presentation engine 740, capabilities of the MAF 720, current or projected network conditions and capabilities, etc.; Bouazizi’818 – [0180]: the MAF 720 can query or obtain additional information from the presentation engine 740 in order to optimize the delivery of one or more requested media objects. For example, MAF 720 can obtain additional information regarding a required quality for one or more of the output buffers (e.g., MPEG circular buffers 732a-c). The MAF 720 can also obtain timing information used by the presentation engine 740 for one or more of the requested media objects initially identified in the media information/call to MAF API 325, among other factors. The MAF 720 can also obtain GL Transmission Format (glTF) information, such as gITF accessors and buffer views 738 (e.g., identified by the bufferView variable noted herein). glTF is a standard file format for three-dimensional scenes and models. The gITF information can be stored in gITF buffer 736. 
Other information that can be used by the presentation engine 740 and/or the MAF 720 include MPEG visual timed accessors 742 and MPEG visual timed accessors 794 (which can be in communication with timed scene extensions and timed accessor sync 790, which can synchronize scene updates). In one example, an accessor in glTF 2.0 defines the types and layout of the data as stored in a buffer that is viewed through a bufferView. For instance, when timed data is read from a buffer, the data in the buffer could change dynamically with time. The buffer element can be extended to add support for a circular buffer that is used with timed data. Examples of buffer views that can be used by the system 700 include buffer view 746a and buffer view 746b. Scene update information 748 can also be used by the MAF 720 and/or the presentation engine 740 to update the scene. For example, when a media sample becomes active, the media player can load the sample data into the presentation engine 740, which can trigger a scene update performed by the presentation engine 740. If the scene update contains an addition of new information (e.g., glTF nodes and/or potential modifications of existing gITF nodes, such as one or more components), the presentation engine 740 can interacts with the MAF 720 to fetch any new content associated with the scene update and can present the new content accordingly; Bouazizi’818 – [0183]: At block 806, the process 800 includes completing the parsing of the MPEG-I scene description. In one illustrative example, the MPEG-I scene description contains information corresponding to one or more different media objects that are used by the presentation engine to render an MPEG media scene. In some examples, the parsing of the MPEG-I scene description can be performed by the presentation engine to generate media information (e.g., MediaInfo as described with respect to Tables 2 and 3) and/or to generate buffer information (e.g., BufferInfo as described with respect to Tables 2, 3, 5, and 6). The output from parsing the MPEG-I scene description can be stored locally at the presentation engine, for example for later use in populating or otherwise generating one or more MAF API or buffer API calls. In some examples, the output from parsing the MPEG-I scene description can be stored remote from the presentation engine. In some examples, the output from parsing the MPEG-I scene description can be stored remote from the presentation engine, such that the presentation engine obtains or receives the output from parsing the MPEG-I scene description rather than performing the parsing itself; Bouazizi’818 – [0191]: At block 822, the process 800 includes performing the render loop started at block 820 by iterating through the media objects contained in the output buffers provided by the MAF to the presentation engine. In one illustrative example, iterating through the media objects can be based on the MPEG-I scene description and/or the presentation engine's parsing of the MPEG-I scene description (e.g., the parsing performed at blocks 804 and/or 806)).
Claims 15-19 are similar in scope to claims 3, 5, 6, 8 and 9, and therefore the examiner provides similar rationale to reject claims 15-19.
Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bouazizi’818 in view of Bouazizi’773, and further in view of Manjunath et al. (US 6332030, hereinafter Manjunath).
Regarding claim 10, the combination of Bouazizi’818 and Bouazizi’773 teaches the method of claim 9, wherein the chroma texture refers to a sampler (Bouazizi’773 - [0111]: Conventional glTF2 defines an extension mechanism (“Specifying Extensions”) that allows the base format to be extended with new capabilities. Any glTF2 object can have an optional extensions property that lists the extensions that are used by that object. According to glTF2, all extensions that are used in a glTF2 scene must be listed in the top-level extensionsUsed array object. Likewise, according to glTF2, extensions that are required to correctly render the scene must also be listed in the extensionsRequired array; Bouazizi’773 - [0112]: In order to support video-based textures in glTF2, the techniques of this disclosure (e.g., performed by content preparation device 20, server device 60, and/or client device 40) may use a defined extension of glTF2 similar to the <video> element of HTML and the Media Source extensions thereof. In particular, these techniques include use of a texture element to define a new MotionPictureTexture object. This element may reference a new source type that will allow for accessing 2D video elements, either locally or remotely. The source should allow for decoding of the compressed 2D video, converting the output luminance and chrominance (YUV) to a texture format that is supported by the graphics processing unit (GPU), and making it available through a bufferView with the necessary synchronization information; Bouazizi’773 - [0113]: If a texture contains an extension property and the extension property defines its MPEG_texture_video property, then client device 40 may extract the texture from an MPEG compressed video stream. Likewise, server device 60 and/or content preparation device 20 may provide the MPEG_texture_video property and a bitstream including the texture; Bouazizi’773 – [0130]: The transfer node may be used to convert the decoded video output color space and spatial packing into a GPU-friendly color space format; Bouazizi’773 – [0140]: A media decoder, such as video decoder 258 (which may correspond to video decoder 48 of client device 40 of FIG. 1) may extract encoded media data from a track buffer corresponding to one of the track buffers, e.g., track_buffer 256A. Video decoder 258 may then decode the encoded media data and store decoded media data to decoded picture buffer 260. Decoded picture buffer 260, in turn, may output decoded video data to circular buffer 240. Decoded picture buffer 260 and circular buffer 240 may form part of one or more hardware memory devices of client device 40 of FIG. 1, such as random access memory (RAM); Bouazizi’773 - [0176]: decoding encoded video data of the video object to produce decoded YUV data; converting the decoded YUV data to a texture format supported by a local graphics processing unit (GPU)).
The combination of Bouazizi’818 and Bouazizi’773 does not explicitly teach the chroma texture refers to a sampler with "MPEG_YUV" extension which provides sampling information for a decoded video texture in nominal format.
Manjunath teaches the chroma texture refers to a sampler with "MPEG_YUV" extension which provides sampling information for a decoded video texture in nominal format (col. 4 lines 3-9: We use the YUV color space for representing color. The Y component is the luminance part of the signal, and U and V represent the chrominance components. Adopting the YUV color space facilitates a simple extension from images to digital video such as those in the MPEG format. The U, V components are down-sampled by a factor of two; col. 14 lines 8-14: We use the YUV color space for representing color. The Y component is the luminance part of the signal, and U and V represent the chrominance components. Adopting the YUV color space facilitates a simple extension from images to digital video such as those in the MPEG format. The U, V components are down-sampled by a factor of two). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Manjunath’s knowledge of using YUV color space facilitating an extension from images to digital video in MPEG format and to modify the process of Bouazizi’818 and Bouazizi’773 because such a process efficiently embeds a significant amount of data in images and/or video (col. 4 lines 25-26).
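For context only on the chrominance "down-sampled by a factor of two" handling that Manjunath describes, the following sketch illustrates, under assumed array shapes and a nearest-neighbor method not taken from any cited reference, how half-resolution (4:2:0-style) chroma planes can be restored to the luma grid before any color conversion is applied.

```python
import numpy as np

def upsample_chroma_420(plane: np.ndarray) -> np.ndarray:
    """Nearest-neighbor up-sampling of a chroma plane stored at half the
    luma resolution in each dimension. Illustrative assumption only."""
    return np.repeat(np.repeat(plane, 2, axis=0), 2, axis=1)

h, w = 4, 4                                                # toy luma resolution
y_plane = np.arange(h * w, dtype=np.uint8).reshape(h, w)
u_plane = np.full((h // 2, w // 2), 110, dtype=np.uint8)   # down-sampled by two
v_plane = np.full((h // 2, w // 2), 150, dtype=np.uint8)

u_full = upsample_chroma_420(u_plane)
v_full = upsample_chroma_420(v_plane)
assert u_full.shape == v_full.shape == y_plane.shape
print(y_plane.shape, u_plane.shape, u_full.shape)  # (4, 4) (2, 2) (4, 4)
```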
Claim 20 is similar in scope to claim 10, and therefore the examiner provides similar rationale to reject claim 20.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JWALANT B AMIN whose telephone number is (571)272-2455. The examiner can normally be reached Monday-Friday, 10:00 am - 6:30 pm CST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Said Broome can be reached at 571-272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JWALANT AMIN/Primary Examiner, Art Unit 2612