DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Office Action is in response to Applicant’s RCE amendment filed 01/30/2026, which has been entered and made of record. Claims 1-5, 7-12, and 14-19 have been amended. No claims have been newly added or cancelled. Claims 1-20 are pending in the application.
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 8 and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument (the arguments are directed to newly amended limitation(s), which are addressed by the new prior art presented in this Office Action).
The arguments regarding the dependent claims, made by virtue of their dependency, are moot because the independent claims are not allowable.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4, 7, 11, 14, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 4, 11, and 18 recite the limitation "…and a prompt" in lines 5-6. There is insufficient antecedent basis for this limitation in the claims, because it is unclear whether this prompt is the same instance of “a prompt” recited in claims 1, 8 and 15 (respectively) or a new instance being recited.
Claims 7 and 14 are rejected under 35 U.S.C. 112(b) because they depend from claims that are rejected under 35 U.S.C. 112(b).
Note: these claims most likely depend from an incorrect dependent claim or are missing elements. To resolve this issue, the claim dependencies should be reviewed; the first instance of an element should be clearly introduced with “a” or “an” rather than “the,” and, where multiple instances exist, subsequent instances should be further distinguished, for example by reciting “first,” “second,” “third,” etc.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 3, 5, 8, 10, 12, 15, 17, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holland et al. (U.S. Patent Application Publication No. 2025/0054230), hereinafter referenced as Holland, in view of Dal Mutto et al. (U.S. Patent Application Publication No. 2019/0108396), hereinafter referenced as Dal, Mehr et al. (U.S. Patent Application Publication No. 2022/0245431), hereinafter referenced as Mehr and McIntyre-Kirwin (U.S. Patent Application Publication No. 2024/0303891), hereinafter referenced as McIntyre-Kirwin.
Regarding claim 1, Holland teaches a computer-implemented method for contextual adaptation of virtual objects in a volumetric interface, the computer-implemented method comprising: (Abstract teaches "a method can include obtaining, by a conditioning engine, a baseline model for a virtual representative; obtaining, by the conditioning engine, one or more conditioning inputs configured to condition an action in one or more multi-user experiences of the virtual representative"); the conditioning here is the contextual adaptation, and it is done for virtual objects/representatives in a 3D volumetric/virtual world; identifying, by a processor, a corpus of volumetric video content containing a first volumetric object, (paragraph 47 teaches "knowledge base 210 can include a plurality of sources of training data for training one or more virtual representative models"…"data collected from previous experiences"); the knowledge base acts as the corpus of volumetric video content since applicant's disclosure paragraph 20 mentions "the knowledge corpus is prior or historical knowledge, from the volumetric video" and, as shown in fig. 
2 of Holland, knowledge base 210 also includes personal identity data 212 (of the volumetric object) and previous experiences 216; in addition, the volumetric video is due to multiple frames of the virtual representative since they are used in a virtual session for a 3D collaborative virtual environment, as shown in the claim 3 explanation in this Office Action; wherein the corpus of volumetric video content is associated with a human avatar which is comprised of a plurality of human spoken contents and a plurality of human body language data (paragraph 47 teaches "the knowledge base data can include, without limitation, personal identity data 212 (e.g., name, age, nationality, languages spoken, favorite color, voice profile, facial shape and/or facial expression features, height, weight, eye color, any other data related to a specific individual, and/or any combination thereof),"); this describes the association of attributes of a human avatar, the languages spoken and voice profile show human spoken contents, and facial expressions show human body language; in response to the corpus of volumetric video content being associated with the human avatar, extracting, by the processor, a plurality of video frames from the corpus of volumetric video content; (paragraph 47 teaches “conditioning data in the knowledge base 210 can be organized in categories. 
For example, the knowledge base data can include, without limitation, personal identity data 212 (e.g., name, age, nationality, languages spoken, favorite color, voice profile, facial shape and/or facial expression features, height, weight, eye color, any other data related to a specific individual, and/or any combination thereof” and paragraph 50 teaches “first conditioned model 224 can be trained with personal identity data 212, a specific category of work emails 215, text messages, photographs, videos”); this shows the knowledge base/corpus of volumetric video content being associated with the human avatar (traits thereof), and subsequently a model being trained with videos, which must be extracted to do so; generating a response comprising a vocal human response and a visual facial change to the additional human avatar (Holland, paragraph 37 teaches "the sequence of words generated by audio understanding 132 can be an input to NLP 110, which can utilize NLU 112 to interpret the query and NLG 114 to generate an appropriate response"…" audio generation 134 can convert the text response output from the NLG 114 into an audio response (e.g., a synthesized voice). In some implementations, image generation 124 can be used to generate an avatar (e.g., a 2D model, a 3D model, or the like) that can be displayed and coordinated with the output of the audio response generated by audio generation 134."); the appropriate response here is a response to the prompt, the synthesized voice shows a vocal human response, and the coordination with the audio output requires a visual facial change.
However, Holland fails to explicitly teach extracting, by the processor, a plurality of video frames from the corpus of volumetric video content, wherein the volumetric video content contains a plurality of viewing angles of the first volumetric object, wherein extracting is based on applying a convolutional neural network to the corpus of volumetric video content; training, by the processor, a generative adversarial network to adapt a volumetric object, based at least in part on the plurality of video frames from the corpus of volumetric video content; receiving a prompt from a user viewing a video comprising an additional human avatar; selecting a knowledge corpus based on human spoken contents and human body language data from the additional human avatar; and replacing, by the processor, a frame within the video with a generated frame comprising point clouds adapting the additional human avatar based on the response and the generative adversarial network.
However, Dal explicitly teaches extracting, by the processor, a plurality of video frames from the corpus of volumetric video content, (Dal, fig. 12 rendering step teaches video frames being extracted from volumetric content); wherein the volumetric video content contains a plurality of viewing angles of the first volumetric object, (Dal, paragraph 132 teaches "Techniques for computing a descriptor of a 3-D model are based on a forward evaluation of a Multi-View Convolutional Neural Network (MV-CNN) or by a Volumetric Convolutional Neural Network (V-CNN)"); multi-view CNN would provide plurality of viewing angles for the volumetric object/virtual representative in 3D world; wherein extracting is based on applying a convolutional neural network to the corpus of volumetric video content (Dal, paragraph 142 and figs. 12-13 teach "FIGS. 12 and 13 are illustration of max-pooling according to one embodiment of the present invention. As shown in FIG. 13, each of the n views is supplied to the first stage CNN.sub.1"); as shown in fig. 12, the views/frames are extracted due to the need of being used by CNN (convolutional neural network); training, by the processor, a generative adversarial network to adapt a volumetric object, based at least in part on the plurality of video frames from the corpus of volumetric video content (Dal, paragraph 208 teaches "generative models (such as Generative Adversarial Networks or GANs) may be used to detect abnormal objects"…"the two components are trained in an adversarial way—the generator tries to fool the discriminator and the discriminator tries to catch the samples from the generator"); this shows training a GAN to detect abnormal objects, which would be done using the extracted video frames since those are what the abnormal objects would appear in. Dal is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of extracting volumetric multi-view frames based on CNN and training a GAN. 
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Holland's invention with the frame extraction using CNN and GAN techniques of Dal so that adding additional items for identification does not necessarily require a full retraining of a neural network, and techniques such as transfer learning and fine tuning can be used to improve the quality of the training (Dal, paragraph 149). This ensures more efficiency since full retraining will not be required if certain features change between extracted frames.
However, the combination of Holland and Dal fails to teach receiving a prompt from a user viewing a video comprising an additional human avatar; selecting a knowledge corpus based on human spoken contents and human body language data from the additional human avatar; and replacing, by the processor, a frame within the video with a generated frame comprising point clouds adapting the additional human avatar based on the response and the generative adversarial network.
However, Mehr teaches and replacing, by the processor, a frame within the video with a generated frame comprising point clouds (Mehr, paragraph 46
teaches “Any discrete geometrical representation herein may for example be a 3D point cloud… The 3D reconstruction process may comprise providing the real object, providing one or more physical sensors each configured for acquiring a respective physical signal, and acquiring one or more respective physical signals by operating the one or more physical sensors on the real object (i.e. scanning the real object with each sensor). The 3D reconstruction may then automatically determine a 3D point cloud…the one or more sensors may comprise a plurality of (e.g. RGB, and/or image or video) cameras”; this shows that the content scanned/imaged by the sensors (volumetric video when viewed in combination) is then generated/replaced into frames having point clouds due to the 3D reconstruction; also, the real object provided in such a process would be the one later mentioned that is adapted; adapting the additional human avatar based on the response and the generative adversarial network (Mehr, paragraph 6 teaches "The generative neural network is configured for generating a deformation basis of an input 3D modeled object." and fig. 1 shows deformation being applied to the 3D modeled object); the deformation applied to the object is the adapting of the volumetric object since it changes it, and this is based on a trained GAN. Mehr is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of changing a 3D/volumetric modeled object based on GAN training. 
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland and Dal with the adapting of volumetric objects based on GAN techniques of Mehr to output a deformation basis allowing the input objects to be deformed into realistic 3D modeled objects, as the adversarial training increases the level of exigency/requirement for assessing/evaluating realism while training the generative neural network to improve the realism of its output (Mehr, paragraph 36). This means more realistic outputs, leading to a better user experience.
However, the combination of Holland, Dal and Mehr fails to teach receiving a prompt from a user viewing a video comprising an additional human avatar; selecting a knowledge corpus based on human spoken contents and human body language data from the additional human avatar; and adapting the additional human avatar based on the response.
However, McIntyre-Kirwin teaches receiving a prompt from a user viewing a video comprising an additional human avatar (McIntyre-Kirwin, paragraph 30 teaches “Virtual characters (or “avatars”) may facilitate communication with a user via a user device”, paragraph 98 teaches “People, and virtual characters, can actuate communication in many forms, such as via speech 608 and/or body language 610 …virtual character may simulate animacy, and the virtual characters may respond in a semantically coherent and even interesting way. The virtual character can provide any of synthesized speech 612 and/or animated body language”, and figs. 3 and 6 visualize this); this shows virtual characters (plural, indicating an additional human avatar) are viewed by a user who sends a prompt, and the prompt is then received to provide the correct output; selecting a knowledge corpus based on human spoken contents and human body language data from the additional human avatar (McIntyre-Kirwin, paragraph 60 teaches “input and inspect multiple inputs to understand and generate more accurate animations/speech/outputs represented in actions by the virtual character”, paragraph 64 teaches “multi-modal model may generate an output for the virtual character that includes both animations and speech. The possible animations may be stored in a library of potential animations for a character. This may be associated with a similar library of pre-recorded audio files that correspond to each animation”, and fig. 3 shows output being based on body language such as facial expression 304 and speech to text 302); the similar library of pre-recorded audio files shows that the library acts as the knowledge corpus, and this corresponds to the output which is based on the human spoken contents and body language data of the additional avatar(s); adapting the additional human avatar based on the response (McIntyre-Kirwin, fig. 
3 and 6 show output being based on body language such as facial expression 304/610 and speech to text 302/610); this shows the output/additional human avatar is adapted based on the response comprising visual facial changes and a vocal human response. McIntyre-Kirwin is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of emotionally engaging responses by the virtual character/avatar due to user input/prompt. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland, Dal and Mehr with the prompt and knowledge corpus selection techniques of McIntyre-Kirwin to ensure accuracy is increased by jointly analyzing inputs from similar users and by continually retraining the internal models based on real-world data (McIntyre-Kirwin, abstract). This would be done by selecting a knowledge corpus with similar criteria and utilizing feedback from user prompt(s).
Regarding claim 3, the combination of Holland, Dal, Mehr and McIntyre-Kirwin teaches further comprising: identifying, by the processor, a plurality of volumetric objects in the volumetric video content environment, (Holland, paragraph 27 teaches "a virtual session provided by an XR system may include a 3D collaborative virtual environment for a group of users. The users may interact with one another via virtual avatars of the users in the virtual environment"); a 3D environment with multiple virtual avatars indicates a plurality of volumetric objects in the volumetric/3D session/video environment; wherein the plurality of volumetric objects is comprised of the human avatar, (Holland, paragraph 28 teaches "an avatar representing a user may mimic an appearance, movement, mannerisms, and/or other features of the user. A virtual avatar may be generated/animated in real-time based on captured input from users devices. Avatars may range from basic synthetic 3D representations to more realistic representations of the user"); an avatar that mimics user appearance and mannerisms and provides a realistic representation of the user indicates a human avatar; and wherein the volumetric video content is a virtual reality environment (Holland, paragraph 22 teaches "XR systems can include virtual reality (VR) systems facilitating interactions with VR environments"); and selecting, by the processor, a first volumetric object from the plurality of volumetric objects, wherein the first volumetric object is the human avatar, (Holland, paragraph 42 teaches "can select which virtual representative will attend a particular multi-user experience from a collection of virtual representatives included in a representative bank"..."a virtual representative with a multi-modal baseline model may be selected for a gathering that requires the use of 3D avatars for multi-user experience participants"); the selected virtual representative can be considered the first volumetric object, and it is a human avatar as described above 
in paragraph 28 of Holland; wherein a human avatar is a representation of a previously recorded human in the virtual reality environment (Holland, paragraph 42 teaches "can select the virtual representative based on one or more experience parameters. For example, the systems and techniques may select a particular virtual representative based on an experience parameter indicating that the particular virtual representative attended one or more previous meetings in a meeting series"); selecting a virtual representative that attended one or more previous meetings would mean a previously recorded human avatar/virtual representative.
Regarding claim 5, the combination of Holland, Dal, Mehr and McIntyre-Kirwin teaches further comprising: monitoring, by the processor, a body language of the user in response to adapting the first volumetric object, based on a virtual reality system (Holland, paragraph 47 teaches "knowledge base data can include,"…" facial expressions, any other data source, and/or any combination thereof. In some cases, the virtual representative conditioning system 200 can obtain data from personal interactions by a user and include personal interaction data in the knowledge base 210."); the personal interaction data obtained shows monitoring, the facial expression shows the user's body language, and this is in response to the adapting of the first volumetric object/virtual representative since it is based on a previous interaction after adaptation; determining, by the processor, a corresponding output to the body language, based at least in part on the generative adversarial network (Holland, paragraph 48 teaches "based on one or more baseline training data sets, the baseline models 222 can learn to understand and respond to queries to be able to understand and/or respond to queries"..." 
the conditioned models 224 can include, without limitation, generative models 102, NLP 110"); being able to understand and respond to queries shows a corresponding output to the monitored user's body language since the queries can include facial expressions due to both queries and facial expressions being part of knowledge base as personal interactions; and adapting, by the processor, the first volumetric object based on the corresponding output (Holland, paragraph 67 teaches "In some examples, generating the avatar representing the one or more physical characteristics of the individual includes generating a 3D model based on the one or more physical characteristics of the individual"); generation of the avatar in this case is adapting the first volumetric object since it's still from the baseline model and physical characteristics would include and be based on the facial expressions.
Regarding claim 8, the system claim 8 recites similar limitations as method claim 1, and thus is rejected under similar rationale. In addition, Holland, fig. 6 teaches memory 615, processor 610 and paragraph 79 teaches "process described herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof."
Regarding claim 10, the system claim 10 recites similar limitations as method claim 3, and thus is rejected under similar rationale.
Regarding claim 12, the system claim 12 recites similar limitations as method claim 5, and thus is rejected under similar rationale.
Regarding claim 15, the computer program product claim 15 recites similar limitations as system claim 8, and thus is rejected under similar rationale.
Regarding claim 17, the computer program product claim 17 recites similar limitations as system claim 10, and thus is rejected under similar rationale.
Regarding claim 19, the computer program product claim 19 recites similar limitations as system claim 12, and thus is rejected under similar rationale.
Claim(s) 2, 9 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Holland, Dal, Mehr and McIntyre-Kirwin as applied to claim 1 above, and further in view of Rakshit et al. (U.S. Patent Application Publication No. 2020/0151934), hereinafter referenced as Rakshit.
Regarding claim 2, the combination of Holland, Dal, Mehr and McIntyre-Kirwin fails to explicitly teach wherein the vocal human response has an associated emotional response.
However, Rakshit teaches wherein the vocal human response has an associated emotional response (Rakshit, paragraph 34 teaches "An emotional response can include, but need not necessarily include, vocal response,"); this shows the vocal response is associated with an emotional one. Rakshit is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of deriving avatar expressions based on emotion in VR settings. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland, Dal, Mehr and McIntyre-Kirwin with the avatar emotion expression techniques of Rakshit to provide visual, auditory, and other forms of sensory feedback (Rakshit, paragraph 2). Diverse feedback and responses from the virtual objects would lead to a better user experience and more engagement.
Regarding claim 9, the system claim 9 recites similar limitations as method claim 2, and thus is rejected under similar rationale.
Regarding claim 16, the computer program product claim 16 recites similar limitations as system claim 9, and thus is rejected under similar rationale.
Claim(s) 6, 13 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Holland, Dal, Mehr and McIntyre-Kirwin as applied to claim 1 above, and further in view of Miller et al. (U.S. Patent Application Publication No. 2019/0188895), hereinafter referenced as Miller.
Regarding claim 6, the combination of Holland, Dal, Mehr and McIntyre-Kirwin fails to teach wherein the corpus of volumetric video content is associated with a classroom setting and wherein the first volumetric object is an instructor within the classroom setting.
However, Miller teaches wherein the corpus of volumetric video content is associated with a classroom setting and wherein the first volumetric object is an instructor within the classroom setting (Miller, paragraph 52 teaches "The avatar can also provide a way for users to interact with each other and do things together in a shared virtual environment. For example, a student attending an online class can perceive other students' or teachers' avatars in a virtual classroom and can interact with the avatars of the other students or the teacher."); this shows that the data in the corpus of Holland can be associated with classroom experiences as well, and the virtual representative/avatar/volumetric object can be a teacher/instructor in the classroom setting. Miller is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of contextual based rendering of avatars/virtual objects in specific settings. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland, Dal, Mehr and McIntyre-Kirwin with the instructor as volumetric object and classroom setting techniques of Miller to perceive an avatar of another user in the viewer's environment and thereby create a tangible sense of the other user's presence in the viewer's environment (Miller, paragraph 52). This would create a more engaging and realistic experience leading to higher user engagement.
Regarding claim 13, the system claim 13 recites similar limitations as method claim 6, and thus is rejected under similar rationale.
Regarding claim 20, the computer program product claim 20 recites similar limitations as system claim 13, and thus is rejected under similar rationale.
Claim(s) 4, 7, 11, 14 and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Holland, Dal, Mehr and McIntyre-Kirwin as applied to claim 1 above, and further in view of Gupta et al. (U.S. Patent Application Publication No. 2023/0418364 A1), hereinafter referenced as Gupta, and Miller.
Regarding claim 4, the combination of Holland, Dal, Mehr and McIntyre-Kirwin teaches further comprising: identifying, by the processor, a position of an eye of the user (Holland, paragraph 72 teaches "the computing device (or component thereof) can obtain user feedback"…"the user feedback includes one or more of highlights, lowlights, survey responses, engagement metrics, eye-tracking, or emotion detection"); eye-tracking indicates position of eye of user; wherein the virtual reality device is a virtual reality headset (Holland, paragraph 76 teaches "The computing device can include any suitable device, such as"..."e.g., a VR headset, an AR headset"); activating, by the processor, the first volumetric object, based at least in part on the position of the eye and a prompt (Holland, paragraph 41 teaches "can select data from the knowledge base for conditioning the virtual representative automatically"... "For example, the individual may provide one or more text-based and/or audio prompts directing the behavior of the virtual representative. In some examples, the individual and the virtual representative can engage in a dialog as part of the conditioning process."); directing a behavior of a virtual representative and engaging dialog in conditioning process shows activating the virtual representative/first volumetric object, and this is based on the prompt mentioned and the eye position since the eye position is in the knowledge base mentioned as taught in paragraph 72.
However, the combination of Holland, Dal, Mehr and McIntyre-Kirwin fails to teach monitoring, by the processor, a spatial orientation of a virtual reality headset; and determining, by the processor, one or more viewing angles of the first volumetric object, based at least in part on monitoring the spatial orientation and an eye position of the virtual reality headset.
However, Gupta teaches monitoring, by the processor, a spatial orientation of a virtual reality headset (Gupta, paragraph 91 teaches "motion detectors can also be used to determine the spatial orientation of the virtual reality device 200 as well in three-dimensional space by detecting a gravitational direction”). Gupta is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of tracking the spatial orientation of a VR device. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland, Dal, Mehr and McIntyre-Kirwin with the spatial orientation tracking techniques of Gupta to improve the functioning of the electronic device itself and improve the overall user experience, overcoming problems specifically arising in the realm of the technology associated with electronic device user interaction (Gupta, paragraph 24). This would be done by improving the accuracy and depth of objects due to knowing the user's/VR system's spatial orientation.
However, the combination of Holland, Dal, Mehr, McIntyre-Kirwin and Gupta fails to teach and determining, by the processor, one or more viewing angles of the first volumetric object, based at least in part on monitoring the spatial orientation and an eye position of the virtual reality headset.
However, Miller teaches and determining, by the processor, one or more viewing angles of the first volumetric object, based at least in part on monitoring the spatial orientation and an eye position of the virtual reality headset (Miller, fig. 21a reference 2140a shows a gaze fixation point and ray direction in world frame); the ray direction in the world frame would be a viewing angle, and this is based on the eye tracking and head pose in world frame [spatial orientation] of the previous steps 2136a-2136b. Miller is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of contextual based rendering of avatars/virtual objects in specific settings. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Holland, Dal, Mehr, McIntyre-Kirwin and Gupta with the spatial orientation and eye position techniques of Miller to perceive an avatar of another user in the viewer's environment and thereby create a tangible sense of the other user's presence in the viewer's environment (Miller, paragraph 52). This would create a more engaging and realistic experience leading to higher user engagement.
Regarding claim 7, the combination of Holland, Dal, Mehr, McIntyre-Kirwin, Gupta and Miller teaches further comprising: continuously monitoring the spatial orientation of the virtual reality headset (Gupta, paragraph 91 teaches "motion detectors can also be used to determine the spatial orientation of the virtual reality device 200 as well in three-dimensional space by detecting a gravitational direction"); since this is done with motion detectors, the monitoring would be continuous with motion; and updating, by the processor, the video with additional frames via the generative adversarial network, continuously, in real-time, based at least in part on the monitoring of the virtual reality headset and a received query (Holland, paragraph 52 teaches "conditioning input 235 can include prompts to control the capabilities of the conditioned model. For example, the conditioned model may be prompted to limit the complexity of mathematical calculations, avoid using technical jargon, limit the scope of discussion relative to the total knowledge contained in the baseline mode, any other prompt for controlling the capabilities of the conditioned model, and/or any combination thereof" and paragraph 26 teaches "XR system can use tracking information to calculate the relative pose of devices, objects, and/or features of the real-world environment in order to match the relative position"..." relative pose information can be used to match virtual content with the user's perceived motion and the spatio-temporal state of the devices, objects, and real-world environment"); controlling the capabilities of the conditioned/adapted model shows updating the adaptation of the first volumetric object, the prompt used to do so is the query, and this is based on the monitoring of the VR headset since the virtual content matches the user's perceived motion and the spatio-temporal state of the devices. Also, this can be done using the GAN of Mehr, since it also adapts the objects. The same motivations used in claim 4 apply here in claim 7.
Regarding claim 11, system claim 11 recites limitations similar to those of method claim 4, and is thus rejected under a similar rationale.
Regarding claim 14, system claim 14 recites limitations similar to those of method claim 7, and is thus rejected under a similar rationale.
Regarding claim 18, computer program product claim 18 recites limitations similar to those of system claim 11, and is thus rejected under a similar rationale.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAUMAN U AHMAD whose telephone number is (703)756-5306. The examiner can normally be reached Monday - Friday 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.U.A./Examiner, Art Unit 2611
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611