Prosecution Insights
Last updated: April 19, 2026
Application No. 18/549,827

A METHOD, AN APPARATUS AND A COMPUTER PROGRAM PRODUCT FOR VIDEO ENCODING AND VIDEO DECODING

Status: Non-Final OA (§103)
Filed: Sep 08, 2023
Examiner: ROBINSON, TERRELL M
Art Unit: 2614
Tech Center: 2600 — Communications
Assignee: Nokia Technologies Oy
OA Round: 3 (Non-Final)

Grant Probability: 83% (Favorable)
Projected OA Rounds: 3-4
Est. Time to Grant: 2y 3m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 83% (403 granted / 486 resolved; +20.9% vs TC avg) — grants above average
Interview Lift: +7.5% on resolved cases with interview (moderate, ~+8% lift)
Avg Prosecution: 2y 3m typical timeline (27 currently pending)
Total Applications: 513 career history, across all art units
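For readers who want to sanity-check these cards, here is a minimal Python sketch of the arithmetic, assuming the "With Interview" figure is simply the career allow rate plus the interview lift (an assumption about how this dashboard combines the two numbers, not a documented formula):

# Allow-rate arithmetic behind the cards above. The additive interview
# adjustment is an assumption, not a documented formula of this tool.
granted, resolved = 403, 486
interview_lift = 7.5  # percentage points, per the "Interview Lift" card

allow_rate = 100 * granted / resolved
print(f"Career allow rate: {allow_rate:.1f}%")      # ~82.9% -> shown as 83%

with_interview = allow_rate + interview_lift
print(f"With interview:    {with_interview:.1f}%")  # ~90.4% -> shown as 90%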

Statute-Specific Performance

Statute   Rate     vs TC avg
§101      7.0%     -33.0%
§103      54.5%    +14.5%
§102      11.7%    -28.3%
§112      17.2%    -22.8%

Tech Center average is an estimate. Based on career data from 486 resolved cases.
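One note on the deltas above: subtracting each "vs TC avg" figure from its rate yields a flat 40.0% implied Tech Center average for every statute, which suggests the TC baseline shown here is a single estimate rather than a per-statute figure. A quick Python check:

# Consistency check on the statute table: rate - delta gives the implied
# TC average, which comes out to 40.0% for every statute.
rates  = {"101": 7.0, "103": 54.5, "102": 11.7, "112": 17.2}
deltas = {"101": -33.0, "103": +14.5, "102": -28.3, "112": -22.8}

for statute, rate in rates.items():
    print(f"§{statute} implied TC avg: {rate - deltas[statute]:.1f}%")
# -> 40.0% for §101, §103, §102, and §112 alike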

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on February 3, 2026 has been entered.

Response to Arguments

Applicant's arguments, see pages 7-9, filed February 3, 2026, with respect to the rejections of previous claims 16-35 under 35 U.S.C. § 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Biocca (US 2008/0266323 A1).

In regards to independent claim 16, the Gladkov reference was previously cited as it discloses a distributed, pluggable architecture for an artificial reality (AR) system that enables concurrent execution and collaborative scene rendering for multiple artificial reality applications (see abstract). The MPEG reference was previously cited as it discloses use of extensions to existing scene description formats in order to support MPEG media, in particular immersive media (see 1 "Scope" section, page 1).

In regards to the applicant's arguments on page 7 regarding the Gladkov reference not teaching the amended language "determining dependency information for the objects, wherein the dependency information indicates an external factor on which the objects are dependent on for transformation during a runtime process, the external factor relating to a viewer of the three-dimensional media content", the Examiner respectfully disagrees, as the previously cited Gladkov reference disclosed that during operation, an artificial reality system 10 performs object recognition within image data captured by image capture devices 138 of HMD 112 to identify hand 132, including optionally identifying individual fingers or the thumb, and/or all or portions of arm 134 of user 110…elements which have been interpreted as external factors relating to a viewer of the 3D content regarding a viewer's position, hand position, viewer's rotation, and viewer's hand rotation. Next, the reference discloses that, as shown in FIG. 1B, artificial reality system 20 represents a multi-user environment in which a plurality of artificial reality applications executing on console 106 and/or HMDs 112 are concurrently running and displayed on a common rendered scene presented to each of users 110A-110C (collectively, "users 110") based on a current viewing perspective of a corresponding frame of reference for the respective user, further interpreted as another element which acknowledges the viewer scene interaction or external factor regarding a viewer's viewport (see paragraphs [0033] and [0051]), as further detailed in the rejections of the office action below. In addition, the previously cited MPEG reference disclosed that any glTF object (i.e. objects of a scene) can have an optional extensions property that lists the extensions that are used by that object.
Furthermore, the MPEG reference details various MPEG media extensions and definitions, such as "controls" that should be displayed and "alternatives" that allows for items indicating alternatives of the same media in which the client could select items depending on client capabilities; thus the use of these extensions by a user (i.e. the user being an external factor) to control media items and active processing for updating scene descriptions was interpreted as the dependency information indicating an external factor on which the objects are dependent on for transformation during a runtime process (see MPEG, 4.2 "glTF 2.0 Extension mechanisms" section, page 3 and 5.2.1.1 "Semantics" section, pages 8-9), as further detailed in the rejections of the office action below.

With respect to the amended limitation "wherein the external factor comprises marker information", the Examiner agrees that Gladkov and MPEG do not disclose this limitation; however, the Biocca reference has now been cited for the independent claims as it details an augmented reality user interaction system that includes a wearable computer equipped with a camera to detect one or more fiducial markers worn by a user (see abstract). Biocca discloses that in the illustrative embodiment depicted in FIG. 1, a menu system is located on the hand. A temporary, stick-on tattoo bearing a fiducial marker 102 is placed on the back of the palm or inside the palm. Another fiducial marker 100 is attached to a ring to detect the location of the other hand as an interaction tool. Virtual menus and objects, such as animations 108, scales 104, and models 106 can be displayed to the user based on the detected position and orientation of fiducial marker 102 as part of the user interface. This is interpreted as demonstrating a relationship between external factors regarding a user of an AR system and marker information for interacting with the scene (see paragraph [0023]), as further detailed in the rejections of the office action below.

The Examiner suggests incorporation of features that more clearly convey the technical effect regarding a creator being enabled to generate these object dependencies even when they are unavailable, as argued by applicant on page 8 and expressed in features of the objected claims listed below, to overcome the current rejections of record.

In regards to independent claims 22 and 30, these claims recite limitations similar in scope to that of claim 16, and therefore remain rejected under the same rationale as provided above and further detailed in the rejections of the office action below.

In regards to dependent claims 17, 21, 23, 24, 28, 29, 31, and 32, these claims depend from the rejected base claims 16, 22, and 30, and therefore remain rejected under the same rationale as provided above and further detailed in the rejections of the office action below.

Allowable Subject Matter

Claims 18-20, 25-27, and 33-35 are objected to as being dependent upon a rejected base claim, but would be allowable if the claims are incorporated into the corresponding independent claims including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: In regards to dependent claim 18, none of the cited prior art, alone or in combination, provides motivation to teach "wherein the external factor is unavailable during the creation of the scene structure". The references only teach functions for scene description improvements in addition to applications for concurrent collaborative scene rendering within artificial reality environments for various objects; however, the references fail to explicitly disclose functions which allow for enabling a content producer to generate dependencies for objects from external factors not available at the initial point of creation, which differs from the scene structure standards provided in the prior art, in conjunction with the remaining features of claim 16 from which it depends, for the purpose of providing scene description for media content. In addition, there is no teaching, suggestion, or motivation found in the current references, and none that can be inferred from the examiner's own knowledge, with respect to the current limitation.

In regards to dependent claims 25 and 33, these claims recite limitations similar in scope to that of claim 18, and thus are objected to based on the same rationale as provided above.

In regards to dependent claims 19, 20, 26, 27, 34, and 35, these claims depend from objected-to base claims, and thus are objected to based on the same rationale as provided above.

As allowable subject matter has been indicated, applicant's reply must either comply with all formal requirements or specifically traverse each requirement not complied with. See 37 CFR 1.111(b) and MPEP § 707.07(a).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 16, 17, 21-24, and 28-32 are rejected under 35 U.S.C. 103 as being unpatentable over Gladkov (US 2021/0090315 A1, hereinafter referenced "Gladkov") in view of MPEG (2021, "Potential improvement on ISO/IEC 23090-14 Scene Description for MPEG Media", hereinafter referenced "MPEG") in further view of Biocca (US 2008/0266323 A1, hereinafter referenced "Biocca").

Claims 1-15. (Canceled)

In regards to claim 16. (Currently Amended)
Gladkov discloses a method (Gladkov, Abstract), comprising:

- creating a scene structure for a three-dimensional media content, wherein the scene structure comprises three-dimensional data for objects of the three-dimensional media content (Gladkov, para [0034]; the reference discloses that, as further described below, concurrent application engine 107 includes a centralized scene controller (referred to as a "shell") that presents a client interface (e.g., application programming interface (API)) by which the artificial reality applications register with the shell and communicate modeling information of objects of artificial reality applications (i.e. the created scene structure for 3D media content regarding modeling info). The centralized scene controller aggregates the modeling information from each of the artificial reality applications, positions the respective objects within a common 3D scene, and renders the 3D visualization of the objects (i.e. AR objects represent three-dimensional data for objects of the three-dimensional media content) to the user such that the artificial reality applications are concurrently running and displayed on the common scene);

- the external factor relating to a viewer of the three-dimensional media content (Gladkov, para [0033] and [0051]; the reference at [0033] discloses that during operation, artificial reality system 10 performs object recognition within image data captured by image capture devices 138 of HMD 112 to identify hand 132, including optionally identifying individual fingers or the thumb, and/or all or portions of arm 134 of user 110. Further, artificial reality system 10 tracks the position, orientation, and configuration of hand 132 (optionally including particular digits of the hand), and/or portions of arm 134 over a sliding window of time (i.e. viewer's position, hand position, viewer's rotation, viewer's hand rotation). Para [0051] discloses that, as shown in FIG. 1B, artificial reality system 20 represents a multi-user environment in which a plurality of artificial reality applications executing on console 106 and/or HMDs 112 are concurrently running and displayed on a common rendered scene presented to each of users 110A-110C (collectively, "users 110") based on a current viewing perspective of a corresponding frame of reference for the respective user (i.e. viewer's viewport). The elements relating to an external user's interaction with the media content which causes a change to the scene based on the user, such as viewing perspective, are interpreted as the external factor relating to a viewer of the three-dimensional media content);

Gladkov does not explicitly disclose, but MPEG teaches:

- determining dependency information for the objects (MPEG, Fig. 1; the reference illustrates the scene structure which guides the dependency information for objects),

- wherein the dependency information indicates an external factor on which the objects are dependent on for transformation during a runtime process (MPEG, 4.2 "glTF 2.0 Extension mechanisms" section, page 3 and 5.2.1.1 "Semantics" section, pages 8-9; the reference at page 3 discloses that any glTF object can have an optional extensions property that lists the extensions that are used by that object. Similar to JavaScript for HTML documents, an active processing may be supported in order to update a glTF scene description. This allows updating the description object model in an asynchronous manner (based on events such as interactivity or server events) as well as in a synchronous manner with a media source (i.e. during a runtime process). Pages 8-9 disclose the MPEG media extensions and definitions, such as controls that should be displayed and alternatives that allows for items indicating alternatives of the same media in which the client could select items depending on client capabilities; thus the use of these extensions by a user to control media items is interpreted as dependency information indicating an external factor on which the objects are dependent on for transformation during a runtime process),

- storing a scene description defining the objects and their dependency information into a bitstream structure (MPEG, Fig. 1; the reference illustrates the glTF 2.0 scene structure and discloses that, in addition to the extensions, which provide a tight integration of MPEG media with the Scene Description, the interface between the Presentation Engine and the Media Retrieval Engine is defined. Finally, a processing model as well as conformance and validation definitions of scene descriptions according to this specification are provided (i.e. storing a scene description defining the objects and their dependency information into a bitstream structure));

- and transferring the scene description to a renderer (MPEG, 4.2 "General Architecture" section, page 4; the reference discloses that the scene description is consumed by a Presentation Engine to render a 3D scene to the viewer).

Gladkov and MPEG do not explicitly disclose, but Biocca teaches:

- wherein the external factor comprises marker information (Biocca, para [0023]; the reference discloses that in the illustrative embodiment depicted in FIG. 1, a menu system is located on the hand. A temporary, stick-on tattoo bearing a fiducial marker 102 is placed on the back of the palm or inside the palm. Another fiducial marker 100 is attached to a ring to detect the location of the other hand as an interaction tool. Virtual menus and objects, such as animations 108, scales 104, and models 106 can be displayed to the user based on the detected position and orientation of fiducial marker 102 as part of the user interface).

Gladkov and MPEG are combinable because they are in the same field of endeavor regarding scene construction. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov to include the scene description features of MPEG in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats, applicable to immersive media applications such as those taught in Gladkov.

Gladkov and Biocca are also combinable because they are in the same field of endeavor regarding virtual rendering. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov, in view of the scene description features of MPEG, to include the AR user interaction features of Biocca in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats. Further incorporating the AR user interaction features of Biocca allows for use of an augmented reality user interaction system that includes a wearable computer to detect one or more fiducial markers worn by a user to extract a position and orientation of the marker in an image, and superimpose on the image a visual representation of a user interface component directly on or near the user based on the position and orientation, to add virtual elements exactly as perceived by the user for more accurate rendering, applicable to virtual formatting and rendering systems such as those taught in Gladkov and MPEG.
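To make the extension mechanism relied on above concrete, here is a hypothetical glTF-style node sketched in Python/JSON form. The glTF 2.0 "extensions" property works as the MPEG citation describes; the extension name "EXAMPLE_object_dependency" and its fields are invented for illustration only and appear in neither the cited references nor the claims:

# Hypothetical illustration of glTF's per-object "extensions" property.
# The extension name and fields below are invented for illustration; they
# are not part of glTF 2.0, the MPEG scene description, or the claims.
import json

node = {
    "name": "menu_panel",
    "mesh": 0,
    "extensions": {
        "EXAMPLE_object_dependency": {          # hypothetical extension
            "externalFactors": [
                {"type": "viewer_pose"},        # viewer position/rotation
                {"type": "marker", "markerId": "fiducial_102"},  # marker info
            ],
            "transformAtRuntime": True,
        }
    },
}

print(json.dumps(node, indent=2))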
In regards to claim 17. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the method according to claim 16. Gladkov further discloses:

- wherein the objects are represented as a node hierarchy (Gladkov, para [0042]; the reference discloses, for example, that application developers may specify a scene graph including objects (referred to as "nodes" in a scene graph), modeling properties of the nodes, and relationships (e.g., spatial and logical) between the nodes of a graphical scene. A scene graph may be a general data structure, such as a graph or tree structure, with a parent/child hierarchy).

In regards to claim 21. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the method according to claim 16. Gladkov further discloses:

- wherein the bitstream structure is according to graphics language transmission format (glTF) (Gladkov, para [0092]; the reference discloses that the client interface and shell communicate using a serialization format protocol that defines a set of constructs, such as textures, meshes, nodes, and other abstractions for encoding objects. In some examples, the protocol is based on an extended GL Transmission Format (glTF) that is extended with 2D and animation extensions (e.g., animation can now control any plausibly-animatable property rather than just node transforms)).

In regards to claim 22. (Currently Amended) Gladkov discloses an apparatus comprising at least one processor and a memory including instructions that, when executed with the at least one processor (Gladkov, para [0082]), cause the apparatus to perform at least the following:

- creating a scene structure for a three-dimensional media content, wherein the scene structure comprises three-dimensional data for objects of the three-dimensional media content (Gladkov, para [0034]; the reference discloses that, as further described below, concurrent application engine 107 includes a centralized scene controller (referred to as a "shell") that presents a client interface (e.g., application programming interface (API)) by which the artificial reality applications register with the shell and communicate modeling information of objects of artificial reality applications (i.e. the created scene structure for 3D media content regarding modeling info). The centralized scene controller aggregates the modeling information from each of the artificial reality applications, positions the respective objects within a common 3D scene, and renders the 3D visualization of the objects (i.e. AR objects represent three-dimensional data for objects of the three-dimensional media content) to the user such that the artificial reality applications are concurrently running and displayed on the common scene);

- the external factor relating to a viewer of the three-dimensional media content (Gladkov, para [0033] and [0051]; the reference at [0033] discloses that during operation, artificial reality system 10 performs object recognition within image data captured by image capture devices 138 of HMD 112 to identify hand 132, including optionally identifying individual fingers or the thumb, and/or all or portions of arm 134 of user 110. Further, artificial reality system 10 tracks the position, orientation, and configuration of hand 132 (optionally including particular digits of the hand), and/or portions of arm 134 over a sliding window of time (i.e. viewer's position, hand position, viewer's rotation, viewer's hand rotation). Para [0051] discloses that, as shown in FIG. 1B, artificial reality system 20 represents a multi-user environment in which a plurality of artificial reality applications executing on console 106 and/or HMDs 112 are concurrently running and displayed on a common rendered scene presented to each of users 110A-110C (collectively, "users 110") based on a current viewing perspective of a corresponding frame of reference for the respective user (i.e. viewer's viewport). The elements relating to an external user's interaction with the media content which causes a change to the scene based on the user, such as viewing perspective, are interpreted as the external factor relating to a viewer of the three-dimensional media content);

Gladkov does not explicitly disclose, but MPEG teaches:

- determining dependency information for the objects (MPEG, Fig. 1; the reference illustrates the scene structure which guides the dependency information for objects),

- wherein the dependency information indicates an external factor on which the objects are dependent on for transformation during a runtime process (MPEG, 4.2 "glTF 2.0 Extension mechanisms" section, page 3 and 5.2.1.1 "Semantics" section, pages 8-9; the reference at page 3 discloses that any glTF object can have an optional extensions property that lists the extensions that are used by that object. Similar to JavaScript for HTML documents, an active processing may be supported in order to update a glTF scene description. This allows updating the description object model in an asynchronous manner (based on events such as interactivity or server events) as well as in a synchronous manner with a media source (i.e. during a runtime process). Pages 8-9 disclose the MPEG media extensions and definitions, such as controls that should be displayed and alternatives that allows for items indicating alternatives of the same media in which the client could select items depending on client capabilities; thus the use of these extensions by a user to control media items is interpreted as dependency information indicating an external factor on which the objects are dependent on for transformation during a runtime process),

- storing a scene description defining the objects and their dependency information into a bitstream structure (MPEG, Fig. 1; the reference illustrates the glTF 2.0 scene structure and discloses that, in addition to the extensions, which provide a tight integration of MPEG media with the Scene Description, the interface between the Presentation Engine and the Media Retrieval Engine is defined. Finally, a processing model as well as conformance and validation definitions of scene descriptions according to this specification are provided (i.e. storing a scene description defining the objects and their dependency information into a bitstream structure));

- and transferring the scene description to a renderer (MPEG, 4.2 "General Architecture" section, page 4; the reference discloses that the scene description is consumed by a Presentation Engine to render a 3D scene to the viewer).

Gladkov and MPEG do not explicitly disclose, but Biocca teaches:

- wherein the external factor comprises marker information (Biocca, para [0023]; the reference discloses that in the illustrative embodiment depicted in FIG. 1, a menu system is located on the hand. A temporary, stick-on tattoo bearing a fiducial marker 102 is placed on the back of the palm or inside the palm. Another fiducial marker 100 is attached to a ring to detect the location of the other hand as an interaction tool. Virtual menus and objects, such as animations 108, scales 104, and models 106 can be displayed to the user based on the detected position and orientation of fiducial marker 102 as part of the user interface).

Gladkov and MPEG are combinable because they are in the same field of endeavor regarding scene construction. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov to include the scene description features of MPEG in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats, applicable to immersive media applications such as those taught in Gladkov.

Gladkov and Biocca are also combinable because they are in the same field of endeavor regarding virtual rendering. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov, in view of the scene description features of MPEG, to include the AR user interaction features of Biocca in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats. Further incorporating the AR user interaction features of Biocca allows for use of an augmented reality user interaction system that includes a wearable computer to detect one or more fiducial markers worn by a user to extract a position and orientation of the marker in an image, and superimpose on the image a visual representation of a user interface component directly on or near the user based on the position and orientation, to add virtual elements exactly as perceived by the user for more accurate rendering, applicable to virtual formatting and rendering systems such as those taught in Gladkov and MPEG.
In regards to claim 23. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the apparatus according to claim 22. Gladkov further discloses:

- wherein the objects are represented as a node hierarchy (Gladkov, para [0042]; the reference discloses, for example, that application developers may specify a scene graph including objects (referred to as "nodes" in a scene graph), modeling properties of the nodes, and relationships (e.g., spatial and logical) between the nodes of a graphical scene. A scene graph may be a general data structure, such as a graph or tree structure, with a parent/child hierarchy).

In regards to claim 24. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the apparatus according to claim 23. Gladkov further discloses:

- wherein the dependency information is inherited from parent nodes of the node hierarchy to child nodes of the node hierarchy (Gladkov, para [0042]; the reference discloses, for example, that application developers may specify a scene graph including objects (referred to as "nodes" in a scene graph), modeling properties of the nodes, and relationships (e.g., spatial and logical) between the nodes of a graphical scene. A scene graph may be a general data structure, such as a graph or tree structure, with a parent/child hierarchy (i.e. sharing of info from parent to child is implicit to the scene graph hierarchy structure)).

In regards to claim 28. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the apparatus according to claim 22. Gladkov further discloses:

- wherein the bitstream structure is according to graphics language transmission format (glTF) (Gladkov, para [0092]; the reference discloses that the client interface and shell communicate using a serialization format protocol that defines a set of constructs, such as textures, meshes, nodes, and other abstractions for encoding objects. In some examples, the protocol is based on an extended GL Transmission Format (glTF) that is extended with 2D and animation extensions (e.g., animation can now control any plausibly-animatable property rather than just node transforms)).

In regards to claim 29. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the apparatus according to claim 28. Gladkov does not explicitly disclose, but MPEG teaches:

- wherein the dependency information is provided through an extension mechanism of graphics language transmission format (glTF) (MPEG, Fig. 1 and 4.2 "glTF 2.0 Extension Mechanisms" section, pages 3-4; the reference at Fig. 1 shows the scene hierarchies and dependencies, and the description at page 3 discloses that glTF 2.0 defines an extension mechanism ([glTF2.0]#specifying-extensions) that allows the base format to be extended with new capabilities. Any glTF object can have an optional extensions property that lists the extensions that are used by that object).

Gladkov and Biocca are also combinable because they are in the same field of endeavor regarding virtual rendering. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov, in view of the scene description features of MPEG, to include the AR user interaction features of Biocca in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats. Further incorporating the AR user interaction features of Biocca allows for use of an augmented reality user interaction system that includes a wearable computer to detect one or more fiducial markers worn by a user to extract a position and orientation of the marker in an image, and superimpose on the image a visual representation of a user interface component directly on or near the user based on the position and orientation, to add virtual elements exactly as perceived by the user for more accurate rendering, applicable to virtual formatting and rendering systems such as those taught in Gladkov and MPEG.

In regards to claim 30. (Currently Amended) Gladkov discloses a non-transitory computer readable medium comprising instructions configured to, when executed with at least one processor (Gladkov, para [0133]), cause an apparatus or a system to:

- create a scene structure for a three-dimensional media content, wherein the scene structure comprises three-dimensional data for objects of the three-dimensional media content (Gladkov, para [0034]; the reference discloses that, as further described below, concurrent application engine 107 includes a centralized scene controller (referred to as a "shell") that presents a client interface (e.g., application programming interface (API)) by which the artificial reality applications register with the shell and communicate modeling information of objects of artificial reality applications (i.e. the created scene structure for 3D media content regarding modeling info). The centralized scene controller aggregates the modeling information from each of the artificial reality applications, positions the respective objects within a common 3D scene, and renders the 3D visualization of the objects (i.e. AR objects represent three-dimensional data for objects of the three-dimensional media content) to the user such that the artificial reality applications are concurrently running and displayed on the common scene);

- the external factor relating to a viewer of the three-dimensional media content (Gladkov, para [0033] and [0051]; the reference at [0033] discloses that during operation, artificial reality system 10 performs object recognition within image data captured by image capture devices 138 of HMD 112 to identify hand 132, including optionally identifying individual fingers or the thumb, and/or all or portions of arm 134 of user 110. Further, artificial reality system 10 tracks the position, orientation, and configuration of hand 132 (optionally including particular digits of the hand), and/or portions of arm 134 over a sliding window of time (i.e. viewer's position, hand position, viewer's rotation, viewer's hand rotation). Para [0051] discloses that, as shown in FIG. 1B, artificial reality system 20 represents a multi-user environment in which a plurality of artificial reality applications executing on console 106 and/or HMDs 112 are concurrently running and displayed on a common rendered scene presented to each of users 110A-110C (collectively, "users 110") based on a current viewing perspective of a corresponding frame of reference for the respective user (i.e. viewer's viewport). The elements relating to an external user's interaction with the media content which causes a change to the scene based on the user, such as viewing perspective, are interpreted as the external factor relating to a viewer of the three-dimensional media content);

Gladkov does not explicitly disclose, but MPEG teaches:

- determine dependency information for the objects (MPEG, Fig. 1; the reference illustrates the scene structure which guides the dependency information for objects),

- wherein the dependency information indicates an external factor on which the objects are dependent on for transformation during a runtime process (MPEG, 4.2 "glTF 2.0 Extension mechanisms" section, page 3 and 5.2.1.1 "Semantics" section, pages 8-9; the reference at page 3 discloses that any glTF object can have an optional extensions property that lists the extensions that are used by that object. Similar to JavaScript for HTML documents, an active processing may be supported in order to update a glTF scene description. This allows updating the description object model in an asynchronous manner (based on events such as interactivity or server events) as well as in a synchronous manner with a media source (i.e. during a runtime process). Pages 8-9 disclose the MPEG media extensions and definitions, such as controls that should be displayed and alternatives that allows for items indicating alternatives of the same media in which the client could select items depending on client capabilities; thus the use of these extensions by a user to control media items is interpreted as dependency information indicating an external factor on which the objects are dependent on for transformation during a runtime process),

- store a scene description defining the objects and their dependency information into a bitstream structure (MPEG, Fig. 1; the reference illustrates the glTF 2.0 scene structure and discloses that, in addition to the extensions, which provide a tight integration of MPEG media with the Scene Description, the interface between the Presentation Engine and the Media Retrieval Engine is defined. Finally, a processing model as well as conformance and validation definitions of scene descriptions according to this specification are provided (i.e. storing a scene description defining the objects and their dependency information into a bitstream structure));

- and transfer the scene description to a renderer (MPEG, 4.2 "General Architecture" section, page 4; the reference discloses that the scene description is consumed by a Presentation Engine to render a 3D scene to the viewer).

Gladkov and MPEG do not explicitly disclose, but Biocca teaches:

- wherein the external factor comprises marker information (Biocca, para [0023]; the reference discloses that in the illustrative embodiment depicted in FIG. 1, a menu system is located on the hand. A temporary, stick-on tattoo bearing a fiducial marker 102 is placed on the back of the palm or inside the palm. Another fiducial marker 100 is attached to a ring to detect the location of the other hand as an interaction tool. Virtual menus and objects, such as animations 108, scales 104, and models 106 can be displayed to the user based on the detected position and orientation of fiducial marker 102 as part of the user interface).

Gladkov and MPEG are combinable because they are in the same field of endeavor regarding scene construction. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov to include the scene description features of MPEG in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats, applicable to immersive media applications such as those taught in Gladkov.

Gladkov and Biocca are also combinable because they are in the same field of endeavor regarding virtual rendering. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the artificial reality 3D scene rendering system of Gladkov, in view of the scene description features of MPEG, to include the AR user interaction features of Biocca in order to provide the user with a system for concurrent execution and collaborative scene rendering for multiple artificial reality applications as taught by Gladkov while incorporating the scene description features of MPEG in order to provide improvements regarding extensions to existing scene description formats. Further incorporating the AR user interaction features of Biocca allows for use of an augmented reality user interaction system that includes a wearable computer to detect one or more fiducial markers worn by a user to extract a position and orientation of the marker in an image, and superimpose on the image a visual representation of a user interface component directly on or near the user based on the position and orientation, to add virtual elements exactly as perceived by the user for more accurate rendering, applicable to virtual formatting and rendering systems such as those taught in Gladkov and MPEG.

In regards to claim 31. (Previously Presented) Gladkov in view of MPEG in further view of Biocca teach the non-transitory computer readable medium according to claim 30. Gladkov further discloses:

- wherein the objects are represented as a node hierarchy (Gladkov, para [0042]; the reference discloses, for example, that application developers may specify a scene graph including objects (referred to as "nodes" in a scene graph), modeling properties of the nodes, and relationships (e.g., spatial and logical) between the nodes of a graphical scene. A scene graph may be a general data structure, such as a graph or tree structure, with a parent/child hierarchy).

In regards to claim 32. (Currently Amended) Gladkov in view of MPEG in further view of Biocca teach the non-transitory computer readable medium according to claim 31. Gladkov further discloses:

- wherein the dependency information is inherited from parent nodes of the node hierarchy to child nodes of the node hierarchy (Gladkov, para [0042]; the reference discloses, for example, that application developers may specify a scene graph including objects (referred to as "nodes" in a scene graph), modeling properties of the nodes, and relationships (e.g., spatial and logical) between the nodes of a graphical scene. A scene graph may be a general data structure, such as a graph or tree structure, with a parent/child hierarchy (i.e. sharing of info from parent to child is implicit to the scene graph hierarchy structure)).
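As an illustration of the parent-to-child inheritance reading applied to claims 24 and 32, the following minimal Python sketch shows dependency information set on a parent node flowing down a scene-graph hierarchy. The class and field names are invented for illustration and are not taken from Gladkov, the MPEG reference, or the claims:

# Minimal sketch of dependency inheritance in a scene-graph hierarchy.
# Names are invented for illustration; this is not the applicant's or
# Gladkov's actual data model.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    dependencies: list = field(default_factory=list)  # external factors
    children: list = field(default_factory=list)

    def effective_dependencies(self, inherited=()):
        # A node depends on its own factors plus everything inherited
        # from its ancestors.
        return list(inherited) + self.dependencies

root = Node("hand_anchor", dependencies=["viewer_pose", "marker:fiducial_102"])
menu = Node("virtual_menu")
root.children.append(menu)

print(menu.effective_dependencies(root.effective_dependencies()))
# -> ['viewer_pose', 'marker:fiducial_102']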
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: see the Notice of References Cited (PTO-892).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to TERRELL M ROBINSON, whose telephone number is (571) 270-3526. The examiner can normally be reached 8am-5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, KENT CHANG, can be reached at 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TERRELL M ROBINSON/
Primary Examiner, Art Unit 2614

Prosecution Timeline

Sep 08, 2023
Application Filed
May 03, 2025
Non-Final Rejection — §103
Jul 02, 2025
Response Filed
Oct 01, 2025
Final Rejection — §103
Feb 03, 2026
Request for Continued Examination
Feb 17, 2026
Response after Non-Final Action
Feb 20, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602852
DYNAMIC GRAPHIC EDITING METHOD AND DEVICE
2y 5m to grant; granted Apr 14, 2026
Patent 12572196
MANAGING AN INDUSTRIAL ENVIRONMENT HAVING MACHINERY OPERATED BY REMOTE WORKERS AND PHYSICALLY PRESENT WORKERS
2y 5m to grant; granted Mar 10, 2026
Patent 12573124
PROGRESSIVE REAL-TIME DIFFUSION OF LAYERED CONTENT FILES WITH ANIMATED FEATURES
2y 5m to grant; granted Mar 10, 2026
Patent 12573111
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING SYSTEM, AND INFORMATION PROCESSING METHOD FOR APPROPRIATE DISPLAY OF PRESENTER AND PRESENTATION ITEM
2y 5m to grant; granted Mar 10, 2026
Patent 12561904
IMAGE PROCESSING DEVICE AND IMAGE PROCESSING METHOD FOR CORRECTING COMPUTER GRAPHICS IMAGE IN MIXED REALITY
2y 5m to grant; granted Feb 24, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 83%
With Interview: 90% (+7.5%)
Median Time to Grant: 2y 3m
PTA Risk: High

Based on 486 resolved cases by this examiner. Grant probability derived from career allow rate.
