DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Applicant(s) Response to Official Action
The response filed on 1/30/2026 has been entered and made of record.
Response to Arguments/Amendments
Examiner fully addresses below any arguments that were not rendered moot.
Claim Rejections - 35 USC § 103
Summary of Arguments:
Regarding claims 1 and 9, Applicant seems to argue that:
Claim 1 is directed to a technical system in which a film or television set includes a video screen displaying an artificial background, and in which pose data of a digital video camera is provided in real time to a VFX engine, such that the VFX engine adapts the artificial background displayed on the video screen in real time based on the pose of the digital video camera. The claim therefore requires a closed, real-time control loop between camera pose tracking, a rendering system, and a physical display device forming part of the set itself. The cited combination of references fails to teach or suggest this subject matter.
Gay does not disclose a set including a video screen that displays an artificial background, nor does it disclose adapting such a background in real time based on camera pose. The output of Gay is a signal delivered to a viewer or performer, not a dynamically updated physical background forming part of the filming environment. Accordingly, the reference lacks the core requirement of amended claim 1 that a VFX engine adapts a background shown on a video screen at the set itself in response to camera pose data.
Tamama does not disclose a film or television set including a video screen displaying an artificial background, nor does it disclose providing camera pose data to a VFX engine for real-time adaptation of such a background. The reference stops at spatial understanding and does not address rendering or adapting a background on a physical display in the environment.
Moreover, a person of ordinary skill in the art would not have been motivated to combine Tamama with Gay to arrive at amended claim 1. The two references address different technical problems and operate at different system levels. Applicant submits that Gay focuses on performer perception and interaction with virtual objects, whereas Tamama focuses on computational techniques for determining object position in 3D space. Neither reference addresses, recognizes, or suggests the problem solved by amended claim 1, namely maintaining a parallax-correct artificial background on a physical video screen in real time as a camera moves within a set.
Such a combination would still fail to disclose or suggest the claimed real-time feedback loop in which camera pose data is used by a VFX engine to adapt the content displayed on a video screen that constitutes the background of the filmed scene. Achieving such a system requires architectural decisions that go beyond routine design choice, including low-latency pose tracking, synchronization between camera movement and display update, and integration of rendering output directly into the physical set.
Claim 1 enables real-time, pose-correct background rendering on a physical screen at the set, allowing correct parallax, reduced post-production, and immediate visual feedback during filming. Neither cited reference teaches or suggests this effect, nor provides any indication that such an effect should be pursued.
Examiner’s Response:
Examiner respectfully disagrees. Regarding claims 1 and 9, Examiner contends:
In response to applicant’s argument that the references fail to show certain features of the invention, it is noted that the features upon which applicant relies (i.e., “a closed, real-time control loop between camera pose tracking, a rendering system, and a physical display device forming part of the set itself”, “maintaining a parallax-correct artificial background on a physical video screen in real time as a camera moves within a set”, “real-time feedback loop in which camera pose data is used by a VFX engine to adapt the content displayed on a video screen that constitutes the background of the filmed scene” and “real-time, pose-correct background rendering on a physical screen at the set, allowing correct parallax, reduced post-production, and immediate visual feedback during filming”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
In response to applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, it would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael- ¶0027).
In response to applicant’s arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
Accordingly, Examiner maintains the rejections.
CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) ELEMENT IN CLAIM FOR A COMBINATION.—An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as "configured to" or "so that"; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “imaging unit” (claim 1), “localization unit” (claim 1), “inertial measuring unit” (claims 2 & 12), “display unit” (claim 8).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5, 7-10, 12 and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Hideo Tamama et al. [US 20200027236 A1: already of record] in view of Michael F. Gay et al. [US 20120200667 A1: already of record].
Regarding claim 1, Hideo teaches:
1. A pose tracking system for continuously determining a pose of a video camera while filming a scene at a set of a film or TV production (i.e. An electronic device, method, and computer readable medium for 3D association of detected objects are provided- Abstract), the system comprising:
an imaging unit with one or more 2D cameras configured to provide two-dimensional image data of an environment (i.e. an RGB camera 306- fig. 3),
a pose tracking device that comprises the imaging unit and at least one time-of-flight camera comprising a sensor array and one or more laser emitters, the time-of-flight camera being configured for capturing three-dimensional point-cloud data of the set (i.e. The monochrome camera 314 can be a second RGB camera to provide data to the 3D head pose block 312, or can be another type of image sensing device. In other embodiments, other devices such as laser scanners or sonar devices can be used to provide images, 3D space environmental data for generating the 3D point cloud, and other environmental and SLAM data- ¶0058), wherein the pose tracking device:
is attached to or configured to be attached to the video camera so that the at least one time-of-flight camera is oriented to capture three-dimensional point-cloud data of the scene filmed by the video camera (i.e. FIG. 5 illustrates a diagrammatic view of a 3D feature point reprojection process for 3D association of detected objects according to embodiments of the present disclosure. The process can be performed by the electronic device 101, and can be executed by the processor 120. The reprojection process includes capturing an RGB frame 502 from a particular head pose (P_trgb) by a sensor such as the RGB camera 306. The processor detects, such as via the object detection block 304, an object 504 within the RGB frame 502 and defines a bounded area 506 around the detected object 504. The bounded area 506 has an (x,y) coordinate within the RGB frame 502, and the processor sizes the bounded area 506 to surround the detected object 504 according to a determined width (w) and height (h). Detected object and bounded area information, such as object ID, object class, timestamp, position, and size, can be stored in a memory to be used during 3D association, such as in the memory 310. Using SLAM, the processor generates a 3D point cloud 508, such as via the 3D head pose block 312, based on one or more parameters such as head pose data received from an IMU, such as IMU 318, and by determining a plurality of 3D feature points of an environment from image data generated, such as from monochrome camera 314. The SLAM data, such as head poses, 3D point clouds, images with 2D feature points, or other information can be stored in a memory to be used during 3D association, such as in the memory 322.- ¶0066);
comprises a localization unit that is configured to execute a pose determination functionality that comprises continuously capturing two-dimensional image data of the environment by the one or more 2D cameras, continuously determining a pose of the video camera based at least on the two-dimensional image data and/or the three-dimensional point-cloud data, and generating pose data based on the determined pose (i.e. a 3D head pose block 312 receives data from, for example, a monochrome camera 314 through an associated ISP 316. The data received at the 3D head pose block 312 includes monochrome frames or images, and each frame or image can include a timestamp- ¶0058… The 3D head pose block 312 executed by the processor generates a 3D point cloud of the environment using images captured by the monochrome camera 314 and movement and orientation data, such as head or camera poses, received from the IMU 318. The data from the 3D head pose block 312, such as head poses, 3D point clouds, images with 2D feature points, or other information can be stored in a memory 322. The memory 322 can be a temporary memory location or permanent storage location local to a processor executing the 3D head pose block 312, the object detection block 304, and/or the 3D association block 302, can be a remote storage location such as a server, or other storage location. The memory 322 can also be the same memory as memory 310, or a separate memory- ¶0058); and
comprises a data interface for providing a data connection with the video camera and/or a VFX engine (i.e. It should also be understood that the RGB camera 306, the monochrome camera 314, and the IMU 318 can be one of the sensors 180 described with respect to FIG. 1- ¶0062), particularly a VFX engine of the pose tracking system or of the film or TV production (i.e. a rendering block 326- ¶0061),
wherein the pose tracking device is configured to provide the pose data to the video camera and/or the VFX engine (i.e. The object information stored in the memory 324 is used by a rendering block 326 executed by a processor to attach graphics and virtual objects to the detected objects in the 3D space. The rendering block 326 retrieves object information from the memory 324, 3D models from a memory 328, and head pose data from the 3D head pose block 312- ¶0061), the provided pose data allowing adapting display on the video screen in real-time to the pose of the digital video camera (i.e. The inertial measurement unit can track head poses and/or other movement of the electronic device 101- ¶0045… The electronic device 101 can thus monitor current available system resources, and can also monitor the quality of the 3D point cloud, and switch 3D association methods (reprojection, 2D feature point triangulation, pixel matching triangulation) in real-time according to the current available system resources- ¶0079).
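For illustration, the reprojection flow Hideo describes (¶0066-¶0069), in which SLAM point-cloud features are projected into the RGB frame and those falling inside the bounded area around a detected object are kept, can be sketched with a minimal pinhole-camera model. The function names, intrinsics, and data layout below are hypothetical and are not taken from the reference:

```python
import numpy as np

def reproject_points(points_world, R, t, K):
    """Project 3D world points (e.g., SLAM feature points) into the image
    of a camera at pose (R, t), using a pinhole model.

    points_world: (N, 3) array of 3D feature points
    R, t: world-to-camera rotation (3, 3) and translation (3,)
    K: (3, 3) intrinsic matrix (focal lengths, principal point)
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera.
    """
    points_cam = points_world @ R.T + t       # world frame -> camera frame
    in_front = points_cam[:, 2] > 0           # keep positive-depth points only
    proj = points_cam @ K.T                   # apply intrinsics
    pixels = proj[:, :2] / proj[:, 2:3]       # perspective divide
    return pixels, in_front

def points_on_detected_object(pixels, in_front, x, y, w, h):
    """Keep reprojected points that land inside the (x, y, w, h) bounded
    area drawn around a detected object, mirroring the FOV filtering step."""
    return (in_front
            & (pixels[:, 0] >= x) & (pixels[:, 0] <= x + w)
            & (pixels[:, 1] >= y) & (pixels[:, 1] <= y + h))
```

The object's (x, y, z) position would then be estimated from the surviving points, for example by averaging their camera-frame coordinates, consistent with the surface-point selection described at ¶0069.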
However, Hideo does not teach explicitly:
time-of-flight camera, wherein the set comprises a video screen that displays an artificial background for the scene, wherein the pose tracking device is configured to provide at least the pose data to the VFX engine in real-time.
In the same field of endeavor, Michael teaches:
time-of-flight camera (i.e. A depth map may be generated, according to some embodiments, either by processing the video sequences of multiple views of the scene or by a three dimensional cameras such as a Light Detection And Ranging ("LIDAR") camera. A LIDAR camera may be associated with an optical remote sensing technology that measures the distance to, or other properties, of a target by illuminating the target with light (e.g., using laser pulses). A LIDAR camera may use ultraviolet, visible, or near infrared light to locate and image objects based on the reflected time of flight. This information may then be used in connection with any of the embodiments described herein. Other technologies utilizing RF, infrared, and Ultra-wideband signals may be used to measure relative distances of objects in the scene. Note that a similar effect might be achieved using sound waves to determine an anchorperson's location- ¶0036),
wherein the set comprises a video screen that displays an artificial background for the scene, wherein the pose tracking device is configured to provide at least the pose data to the VFX engine in real-time (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031…According to some embodiments, a 3D model of the person 620 (including his or her location, pose, surface, and texture and color characteristics) may be obtained through an analysis of the broadcast camera 640 and potentially auxiliary cameras and/or sensors 642 (attached to the person or external to the person). Once a 3D model of the person 620 is obtained, the image of the person 620 may be reconstructed into a new image that shows the person with a new pose and/or appearance relative to the virtual elements. According to some embodiments, the "viewer video" may be served to the audience- ¶0033).
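The "reflected time of flight" principle Michael invokes at ¶0036 reduces to a round-trip range computation; a minimal sketch (the function name is hypothetical):

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def tof_distance(round_trip_seconds: float) -> float:
    """Range from the round-trip travel time of an emitted light pulse;
    the pulse covers the sensor-to-target distance twice, hence the 1/2."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A pulse returning after about 33.4 nanoseconds indicates a target ~5 m away.
print(tof_distance(33.4e-9))  # ≈ 5.007
```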
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael- ¶0027).
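Schematically, the combination relied on above amounts to feeding a continuously tracked camera pose (Hideo) into a renderer that drives an on-set display (Michael). The following loop is illustrative only; the three component interfaces are hypothetical and no such code appears in either reference:

```python
import threading

def run_background_update_loop(pose_tracker, vfx_engine, video_screen,
                               stop: threading.Event):
    """Closed loop suggested by the combination: read the camera's 6-DoF
    pose, re-render the artificial background from that viewpoint, and
    push the frame to the on-set screen."""
    while not stop.is_set():
        pose = pose_tracker.latest_pose()            # pose data in real time
        frame = vfx_engine.render_background(pose)   # viewpoint-dependent render
        video_screen.display(frame)                  # update the physical display
```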
Regarding claim 2, Hideo and Michael teach all the limitations of claim 1 and Hideo further teaches:
wherein the pose tracking device comprises an inertial measuring unit, wherein the pose determination functionality comprises continuously (i.e. The triangulation process described herein allows the augmented reality tagging application to continue to operate efficiently even when less system resources are available. This is important for augmented reality applications in which users utilize a camera to provide a continuous view of a scene or environment, as the user expects the application to display on the user's device accurate and continuously updated information concerning the objects in the environment- ¶0087) capturing inertial data using the inertial measuring unit and generating the pose data is also based on the inertial data (i.e. The 3D point cloud can be generated from one or more captured images and data received by the processor from an inertial measurement unit- ¶0072);
the pose tracking device comprises a position-sensitive device, comprising a GNSS receiver and/or a compass, wherein the pose determination functionality comprises continuously capturing position data using the position-sensitive device and generating the pose data is also based on the position data; and/or the pose determination functionality comprises the one or more time-of-flight cameras continuously capturing three-dimensional point-cloud data of the environment, and generating the pose data is also based on the three-dimensional point-cloud data, wherein: the one or more 2D cameras are configured to capture the two-dimensional image data with a rate of at least 5 operations per second, particularly at least operations per second; and/or the localization unit is configured to continuously track the pose of the digital video camera in six degrees-of-freedom.
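The fusion of IMU data with image-based pose estimates that Hideo describes (¶0058, ¶0072) is commonly implemented with a complementary filter; a minimal single-axis sketch, not taken from the reference:

```python
def complementary_filter(angle_prev, gyro_rate, angle_visual, dt, alpha=0.98):
    """Fuse a high-rate gyroscope integration with a lower-rate visual
    estimate of the same angle. The gyro term tracks fast motion; the
    visual term corrects the slow drift of the integrated gyro.

    angle_prev:   previous fused angle (radians)
    gyro_rate:    angular rate from the IMU (radians/second)
    angle_visual: absolute angle estimated from image data (radians)
    dt:           time step (seconds)
    """
    gyro_angle = angle_prev + gyro_rate * dt
    return alpha * gyro_angle + (1.0 - alpha) * angle_visual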
Regarding claim 3, Hideo and Michael teach all the limitations of claim 1 and Hideo further teaches:
wherein the pose tracking device is configured to provide the three-dimensional point-cloud data of the set and the pose data to the VFX engine (i.e. The SLAM data, such as head poses, 3D point clouds, images with 2D feature points, or other information can be stored in a memory to be used during 3D association, such as in the memory 322- ¶0066), and to generate the three-dimensional point-cloud data of the set and the pose data in such a way that they can be used by the VFX engine for applying visual effects, augmented reality and/or an artificial background to the scene (i.e. The reprojection process further includes, such as via the 3D association block 302, the processor mapping or reprojecting the 3D point cloud through the captured RGB frame 502 and the bounded area 506, such that the 3D feature points are within the field of view (FOV) 510 of the RGB camera from pose P_trgb. RGB camera parameters 512 for the RGB frame 502 can be used to provide the FOV 510- ¶0068… The 3D position along with other object information such as an object ID and a timestamp is stored in a memory, such as in the memory 324, to be used by the augmented reality tagging application. The processor can use the determined 3D position of the detected object 504 to place the detected object 504 in 3D space 514 at rendering block 326 during operation of the augmented reality tagging application and for display on the display 332- ¶0069).
However, Hideo does not teach explicitly:
particularly wherein the set comprises at least one chroma keying screen or a video screen and/or is a virtual studio.
In the same field of endeavor, Michael teaches:
particularly wherein the set comprises at least one chroma keying screen or a video screen and/or is a virtual studio (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael- ¶0027).
Regarding claim 4, Hideo and Michael teach all the limitations of claim 1 and Hideo further teaches:
wherein the pose tracking device is configured to provide also the three-dimensional point-cloud data of the set to the VFX engine in real-time (i.e. Once the 3D feature points are reprojected into the FOV 510, the processor determines which 3D feature points within the bounded area 506 are located on the surface of the detected object 504. The processor removes from the bounded area 506 the 3D feature points that are not located on the surface of the detected object 504. The processor determines the 3D position, including (x,y,z) coordinates, of the detected object 504 in a 3D space 514 from the remaining 3D feature points on the surface of the detected object 504. The 3D position along with other object information such as an object ID and a timestamp is stored in a memory, such as in the memory 324, to be used by the augmented reality tagging application. The processor can use the determined 3D position of the detected object 504 to place the detected object 504 in 3D space 514 at rendering block 326 during operation of the augmented reality tagging application and for display on the display 332- ¶0069).
Regarding claim 5, Hideo and Michael teach all the limitations of claim 1 and Hideo further teaches:
wherein the scene involves objects and/or actors being present on the set, and the three-dimensional point-cloud data of the set captured by the at least one time-of-flight camera comprises point-cloud data of the objects and/or actors (i.e. The reprojection process further includes, such as via the 3D association block 302, the processor mapping or reprojecting the 3D point cloud through the captured RGB frame 502 and the bounded area 506, such that the 3D feature points are within the field of view (FOV) 510 of the RGB camera from pose P_trgb. RGB camera parameters 512 for the RGB frame 502 can be used to provide the FOV 510. The processor maps the bounded area 506 including the detected object 504 to the FOV 510- ¶0068).
the pose tracking device is configured: to detect the moving objects and/or actors in the three-dimensional point-cloud data, to track positions of the moving objects and/or actors during the take (i.e. determine a location of the detected object in a 3D space using the head pose data and the bounded area in the captured image- abstract), and to provide the point-cloud data of the set to the VFX engine, including the tracked positions of the moving objects and/or actors together with time stamps, particularly so that the movements of the objects and/or actors can be made visible using a timeline slider (i.e. At block 402, the processor controls one or more cameras to capture an image of an environment. At block 404, the processor detects an object in the captured image. At block 406, the processor defines a bounded area around the detected object. The bounded area includes an (x,y) position and a size (w,h) calculated to surround the detected object. At block 408, the processor generates head pose data. The head pose data can be head or camera poses provided by the IMU 318 to the processor, and can include other information such as a 3D point cloud, images with 2D features points, timestamps, or other information. At block 410, the processor determines a location of the detected object in the 3D space associated with the environment using the bounded area and the head pose data, in accordance with various embodiments of the present disclosure- ¶0064… Once the 3D feature points are reprojected into the FOV 510, the processor determines which 3D feature points within the bounded area 506 are located on the surface of the detected object 504. The processor removes from the bounded area 506 the 3D feature points that are not located on the surface of the detected object 504. The processor determines the 3D position, including (x,y,z) coordinates, of the detected object 504 in a 3D space 514 from the remaining 3D feature points on the surface of the detected object 504. The 3D position along with other object information such as an object ID and a timestamp is stored in a memory, such as in the memory 324, to be used by the augmented reality tagging application. The processor can use the determined 3D position of the detected object 504 to place the detected object 504 in 3D space 514 at rendering block 326 during operation of the augmented reality tagging application and for display on the display 332.- ¶0069); and/or
the three-dimensional point-cloud data of the set comprising the point-cloud data of the objects and/or actors is provided to the VFX engine so that it can be used by the VFX engine to determine a three-dimensional position for a visual effect to be applied to the scene relative to three-dimensional positions of the objects and/or actors, particularly wherein the three-dimensional point-cloud data of the set comprising the point-cloud data of the objects and/or actors is provided to the VFX engine in real-time.
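The timestamped object positions Hideo stores (¶0064, ¶0069) support exactly the kind of time-indexed lookup a timeline slider would perform; a minimal sketch with a hypothetical structure:

```python
import bisect

class ObjectTrack:
    """Timestamped 3D positions of one tracked object or actor."""

    def __init__(self, object_id):
        self.object_id = object_id
        self.timestamps = []   # strictly increasing capture times (seconds)
        self.positions = []    # (x, y, z) positions in set coordinates

    def add(self, timestamp, position):
        self.timestamps.append(timestamp)
        self.positions.append(position)

    def position_at(self, t):
        """Return the last recorded position at or before time t,
        as a timeline slider would query it."""
        i = bisect.bisect_right(self.timestamps, t) - 1
        return self.positions[i] if i >= 0 else None
```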
However, Hideo does not teach explicitly:
particularly wherein the set comprises a chroma keying screen used for adding visual effects, augmented reality, and/or artificial background to the scene in post-production, the objects and/or actors being in front of the chroma keying screen wherein: the objects and/or actors are moving during a take of the scene.
In the same field of endeavor, Michael teaches:
particularly wherein the set comprises a chroma keying screen used for adding visual effects, augmented reality, and/or artificial background to the scene in post-production, the objects and/or actors being in front of the chroma keying screen wherein: the objects and/or actors are moving during a take of the scene.
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael- ¶0027).
Regarding claim 7, Hideo and Michael teach all the limitations of claim 1.
However, Hideo does not teach explicitly:
wherein the laser emitters are configured to emit infrared light and at least a subset of the laser emitters is configured: to emit light pulses in the form of a pattern to generate a pattern of reflections of the light pulses, wherein the at least one time-of-flight camera is configured for capturing three-dimensional point-cloud data using the pattern of reflections, particularly wherein the subset of the laser emitters comprises an optical lens, grating or mesh to produce the pattern, and/or the localization unit is configured to use the three-dimensional point-cloud data of the pattern of reflections to perform a ToF SLAM functionality for simultaneous localization and mapping; or to emit diffused infrared lighting, wherein the sensor array of each of the time-of-flight cameras is configured to receive reflections of the diffused infrared lighting emitted by the one or more laser emitters, the time-of-flight cameras are configured to generate intensity images based on the received reflections of the diffused infrared lighting, and the localization unit is configured to execute a Visual-SLAM and/or ToF-SLAM functionality using the intensity images received from the time-of-flight cameras for simultaneous localization and mapping, particularly also using two-dimensional image data of the imaging unit and/or localization data of the localization unit.
In the same field of endeavor, Michael teaches:
wherein the laser emitters are configured to emit infrared light and at least a subset of the laser emitters is configured: to emit light pulses in the form of a pattern to generate a pattern of reflections of the light pulses, wherein the at least one time-of-flight camera is configured for capturing three-dimensional point-cloud data using the pattern of reflections, particularly wherein the subset of the laser emitters comprises an optical lens, grating or mesh to produce the pattern, and/or the localization unit is configured to use the three-dimensional point-cloud data of the pattern of reflections to perform a ToF SLAM functionality for simultaneous localization and mapping; or to emit diffused infrared lighting, wherein the sensor array of each of the time-of-flight cameras is configured to receive reflections of the diffused infrared lighting emitted by the one or more laser emitters, the time-of-flight cameras are configured to generate intensity images based on the received reflections of the diffused infrared lighting, and the localization unit is configured to execute a Visual-SLAM and/or ToF-SLAM functionality using the intensity images received from the time-of-flight cameras for simultaneous localization and mapping, particularly also using two-dimensional image data of the imaging unit and/or localization data of the localization unit (i.e. A depth map may be generated, according to some embodiments, either by processing the video sequences of multiple views of the scene or by a three dimensional cameras such as a Light Detection And Ranging ("LIDAR") camera. A LIDAR camera may be associated with an optical remote sensing technology that measures the distance to, or other properties, of a target by illuminating the target with light (e.g., using laser pulses). A LIDAR camera may use ultraviolet, visible, or near infrared light to locate and image objects based on the reflected time of flight. This information may then be used in connection with any of the embodiments described herein. Other technologies utilizing RF, infrared, and Ultra-wideband signals may be used to measure relative distances of objects in the scene. Note that a similar effect might be achieved using sound waves to determine an anchorperson's location- ¶0036).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael ¶0027).
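Turning time-of-flight returns into three-dimensional point-cloud data, as the depth-sensing passages of both references contemplate, is a standard unprojection; a minimal sketch assuming a pinhole depth camera with hypothetical parameters:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a dense depth image (meters) from a time-of-flight sensor
    into an (N, 3) point cloud in the camera frame, dropping invalid pixels.

    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # keep pixels with a valid return
```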
Regarding claim 8, Hideo and Michael teach all the limitations of claim 1 and Hideo further teaches:
the system comprising the VFX engine, wherein the VFX engine is configured:
to apply visual effects, augmented reality and/or an artificial background to the scene, particularly wherein the set comprises at least one chroma keying screen or a video screen and/or is a virtual studio, and to generate and/or adapt, using at least the pose data, VFX data that is related to the visual effects, the augmented reality and/or the artificial background, wherein the pose tracking device is configured to provide at least the pose data to the VFX engine in real-time and the VFX engine is configured: to adapt, in real-time and based on the pose data, an artificial background displayed on a video screen at the set to the pose of the digital video camera; and/or to generate, using the generated VFX data and video stream data generated by the digital video camera, live feedback data, and to provide, in real-time, the live feedback data to a display unit at the set, particularly to a display unit of the pose tracking device, for visualizing the live feedback data as a live feedback video, particularly wherein the VFX data is related to visual effects, the pose tracking device is configured to provide also the three-dimensional point-cloud data to the VFX engine in real-time, and the VFX engine is configured to use the three-dimensional point-cloud data for generating the VFX data, particularly for defining three-dimensional positions of the visual effects in the live feedback video (i.e. FIG. 3 illustrates an example of an augmented reality object detection and 3D association architecture 300 according to embodiments of the present disclosure. The architecture 300 includes a 3D association block 302. The 3D association block 302 defines various functions that are executed by a processor, such as processor 120. The processor receives data from two distinct data pipes to perform 3D association of detected objects. One pipe provides object detection information and the other pipe provides 3D head or camera pose and 3D environment information (SLAM data). For the object detection pipe, an object detection block 304 receives data from an RGB camera 306 through an associated image signal processor (ISP) 308. The data received at the object detection block 304 includes RGB frames or images. Each RGB frame or image can include a timestamp- ¶0032, ¶0046, ¶0053, ¶0056, ¶0076, ¶0086).
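The live-feedback limitation mapped above combines a rendered background with the camera's video stream; such compositing is conventionally an alpha blend. A minimal sketch (the interfaces are hypothetical and not disclosed verbatim by either reference):

```python
import numpy as np

def composite_live_feedback(camera_frame, background, matte):
    """Blend the rendered artificial background behind the filmed foreground.

    camera_frame: (H, W, 3) float frame from the digital video camera
    background:   (H, W, 3) float frame rendered by the VFX engine
    matte:        (H, W, 1) float alpha in [0, 1], 1 = keep foreground pixel
                  (e.g., produced by chroma keying)
    """
    return matte * camera_frame + (1.0 - matte) * background
```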
Regarding claim 9, method claim 9 corresponds to apparatus claim 1, and therefore is also rejected for the same reasons of obviousness as listed above.
Regarding claim 10, method claim 10 corresponds to apparatus claim 5, and therefore is also rejected for the same reasons of obviousness as listed above.
Regarding claim 12, method claim 12 corresponds to apparatus claim 2, and therefore is also rejected for the same reasons of obviousness as listed above.
Regarding claim 15, Hideo and Michael teach all the limitations of claim 9.
However, Hideo does not teach explicitly:
wherein the set comprises at least one chromakeying screen that is used for applying an artificial background to the scene, the method comprising: generating, using at least the pose data, VFX data that is related to the artificial background for the scene, generating, using the VFX data and video stream data generated by the digital video camera, live feedback data, providing the live feedback data to a display unit, particularly to a display unit of the pose tracking device, and visualizing, in real time, the live feedback data as a live feedback video, particularly to an operator of the video camera, wherein the artificial background is visualized on the chroma keying screen in the live feedback video.
In the same field of endeavor, Michael teaches:
wherein the set comprises at least one chromakeying screen that is used for applying an artificial background to the scene, the method comprising: generating, using at least the pose data, VFX data that is related to the artificial background for the scene, generating, using the VFX data and video stream data generated by the digital video camera, live feedback data, providing the live feedback data to a display unit, particularly to a display unit of the pose tracking device, and visualizing, in real time, the live feedback data as a live feedback video, particularly to an operator of the video camera, wherein the artificial background is visualized on the chroma keying screen in the live feedback video (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael ¶0027).
Regarding claim 16, Hideo and Michael teach all the limitations of claim 9.
However, Hideo does not teach explicitly:
comprising: adapting, in real-time and based on the pose data, an artificial background displayed on a video screen at the set to a current pose of the video camera; and/or generating, using the generated VFX data and video stream data generated by the video camera, live feedback data, providing, in real-time, the live feedback data to a display unit at the set, particularly to a display unit of the pose tracking device, and visualizing the live feedback data as a live feedback video, particularly wherein the VFX data is related to visual effects, and the three-dimensional point-cloud data is used for generating the VFX data, particularly for defining three-dimensional positions of the visual effects in the live feedback video.
In the same field of endeavor, Michael teaches:
comprising: adapting, in real-time and based on the pose data, an artificial background displayed on a video screen at the set to a current pose of the video camera; and/or generating, using the generated VFX data and video stream data generated by the video camera, live feedback data, providing, in real-time, the live feedback data to a display unit at the set, particularly to a display unit of the pose tracking device, and visualizing the live feedback data as a live feedback video, particularly wherein the VFX data is related to visual effects, and the three-dimensional point-cloud data is used for generating the VFX data, particularly for defining three-dimensional positions of the visual effects in the live feedback video (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031-0035).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael ¶0027).
Regarding claim 17, Hideo and Michael teach all the limitations of claim 16.
However, Hideo does not teach explicitly:
further comprising applying visual effects, augmented reality and/or an artificial background to the scene based on the VFX data.
In the same field of endeavor, Michael teaches:
further comprising applying visual effects, augmented reality and/or an artificial background to the scene based on the VFX data (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031-0035).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael ¶0027).
Regarding claim 18, Hideo and Michael teach all the limitations of claim 17.
However, Hideo does not teach explicitly:
wherein the set comprises at least one chroma keying screen or a video screen and/or is a virtual studio.
In the same field of endeavor, Michael teaches:
wherein the set comprises at least one chroma keying screen or a video screen and/or is a virtual studio (i.e. To improve interactions with virtual content, FIG. 6 is block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031-0035).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentation involving augmented reality technology (Michael ¶0027).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Hideo Tamama et al. [US 20200027236 A1: already of record] in view of Michael F. Gay et al. [US 20120200667 A1: already of record] and further in view of Rainer Wohlgenannt et al. [US 20220414915 A1: already of record].
Regarding claim 6, Hideo and Michael teach all the limitations of claim 1.
However, Hideo and Michael do not teach explicitly:
the pose tracking device comprises at least three 2D cameras that are arranged on the device to provide two-dimensional image data of different parts of the set; and/or at least one 2D camera is configured as a high-definition camera and arranged to provide two-dimensional image data of the scene; and/or at least one 2D camera is configured as a wide angle or fisheye camera arrangement, particularly wherein at least two or three 2D cameras are configured as wide angle or fisheye camera arrangements, the wide angle or fisheye camera arrangement comprising a high-resolution 2D camera and a wide angle or fisheye lens, particularly wherein the high-resolution 2D camera and a wide angle or fisheye lens are arranged and configured to capture image data covering a visual field of 360° around a first axis and at least 160°, particularly at least 190°, around a second axis that is orthogonal to the first axis.
In the same field of endeavor, Rainer teaches:
the pose tracking device comprises at least three 2D cameras that are arranged on the device to provide two-dimensional image data of different parts of the set; and/or at least one 2D camera is configured as a high-definition camera and arranged to provide two-dimensional image data of the scene; and/or at least one 2D camera is configured as a wide angle or fisheye camera arrangement, particularly wherein at least two or three 2D cameras are configured as wide angle or fisheye camera arrangements, the wide angle or fisheye camera arrangement comprising a high-resolution 2D camera and a wide angle or fisheye lens, particularly wherein the high-resolution 2D camera and a wide angle or fisheye lens are arranged and configured to capture image data covering a visual field of 360° around a first axis and at least 160°, particularly at least 190°, around a second axis that is orthogonal to the first axis (i.e. FIGS. 4a and 4b each show a sensor unit 3 of the reality capture devices of FIGS. 2a,b and 3a,b. The cameras on the backside and in the interior are made visible here using dashed lines. In both embodiments, the sensor unit 3 comprises three ToF cameras 30 that are arranged at the same distances around the housing of the sensor unit 3 to cover a hemispherical field of view- ¶0094-0095).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo and Michael with the teachings of Rainer to provide an improved reality capture device that allows instantly capturing image data and 3D point information in 360°, particularly with a field of view that is at least hemispherical (Rainer- ¶0011).
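Rainer's arrangement of three cameras spaced evenly around a housing (¶0094-0095) covers a full circle when the per-camera fields of view sum to at least 360°; a trivial check (the function is hypothetical, for illustration only):

```python
def covers_full_circle(num_cameras: int, horizontal_fov_deg: float) -> bool:
    """Check whether evenly spaced cameras cover 360° around one axis.
    With three fisheye cameras spaced 120° apart, each needs at least a
    120° horizontal field of view (more in practice, to allow overlap)."""
    return num_cameras * horizontal_fov_deg >= 360.0

print(covers_full_circle(3, 130.0))  # True: 3 x 130° = 390° >= 360°
```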
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Hideo Tamama et al. [US 20200027236 A1: already of record] in view of Michael F. Gay et al. [US 20120200667 A1: already of record] and further in view of Ke Huo et al. [US 20210256765 A1: already of record].
Regarding claim 11, Hideo and Michael teach all the limitations of claim 9.
However, Hideo does not teach explicitly:
wherein the digital camera is moved through the environment along a trajectory while capturing the video stream of the scene, and the pose data also relates to the trajectory of the video camera,
wherein the set comprises a chroma keying screen or a video screen for applying an artificial background to the scene, wherein: the scene comprises objects in front of the chromakeying screen or video screen, particularly wherein the objects and/or actors are moving while the video camera is moved along the trajectory, and/or the artificial background comprises three-dimensional virtual objects, particularly a landscape, moving virtual objects or visual effects.
In the same field of endeavor, Michael teaches wherein:
wherein the set comprises a chroma keying screen or a video screen for applying an artificial background to the scene, wherein: the scene comprises objects in front of the chroma keying screen or video screen, particularly wherein the objects and/or actors are moving while the video camera is moved along the trajectory, and/or the artificial background comprises three-dimensional virtual objects, particularly a landscape, moving virtual objects or visual effects (i.e. To improve interactions with virtual content, FIG. 6 is a block diagram of a system 600 that may be provided in accordance with some embodiments. The system 600 creates an augmented reality environment using a camera 640 to capture a video sequence of a real-world scene 610 including a person 620 and generates a graphical sequence that includes a virtual object 630. For example, in a studio, the broadcast camera 640 may record the person 620 (possibly using a "green screen" in the background) or on location "in the field." A "virtual camera" (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal-length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, capabilities to improve the accuracy and realism of the mixed real and virtual production may be provided by the system 600. Note that a person 620 who cannot "see" the virtual object 630 he or she interacts with will have less natural and accurate interactions with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production- ¶0031).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo with the teachings of Michael to improve the production of video presentations involving augmented reality technology (Michael- ¶0027).
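Purely as an illustrative aside, and not code appearing in Gay, the claims, or the record: a minimal Python sketch of the alignment Gay describes, in which a virtual render camera mirrors the tracked broadcast camera's perspective each frame so the rendered environment matches the live plate. All names below are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class CameraPerspective:
        # Broadcast-camera perspective as extracted by mounted sensors
        # or by analyzing the received video frames
        position: tuple        # (x, y, z) in world space
        roll: float            # degrees
        pan: float
        tilt: float
        focal_length: float

    def align_virtual_camera(broadcast: CameraPerspective) -> CameraPerspective:
        # Copy the tracked perspective onto the virtual camera so the
        # rendered environment matches the live plate's scale and movement.
        return CameraPerspective(broadcast.position, broadcast.roll,
                                 broadcast.pan, broadcast.tilt,
                                 broadcast.focal_length)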
However, Hideo and Michael do not teach explicitly:
wherein the digital camera is moved through the environment along a trajectory while capturing the video stream of the scene, and the pose data also relates to the trajectory of the video camera.
In the same field of endeavor, Ke teaches wherein:
wherein the digital camera is moved through the environment along a trajectory while capturing the video stream of the scene, and the pose data also relates to the trajectory of the video camera (i.e. In at least one embodiment, the program instructions stored on the memories 26 include a SLAM and Synchronization program 34. As discussed in further detail below, the processors 24 are configured to execute the SLAM and Synchronization program 34 to process a plurality of image frames captured of the scene 60 and/or inertial data received from the respective IMU 33 to perform visual and/or visual-inertial odometry to estimate the position, orientation, and trajectory of the respective camera 22 with respect to the scene 60 over the plurality of image frames. Based on the estimated position, orientation, and trajectory of the respective camera 22, the processors 24 are each configured to generate a three-dimensional model or map representation of the scene 60, referred to herein as a SLAM map 36, which is stored in the memories 26- ¶0032).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo and Michael with the teachings of Ke to enable a variety of spatially aware augmented reality features and interactions (Ke- Abstract).
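Again solely for illustration, and not drawn from Ke's disclosure: a minimal Python sketch, with hypothetical names, of chaining per-frame relative pose estimates of the kind visual or visual-inertial odometry produces into an absolute camera trajectory.

    import numpy as np

    def accumulate_trajectory(relative_poses):
        # Chain per-frame relative 4x4 pose matrices (frame t-1 -> frame t)
        # into absolute poses, i.e. the camera's trajectory through the scene.
        trajectory = [np.eye(4)]  # start at the world origin
        for delta in relative_poses:
            trajectory.append(trajectory[-1] @ delta)
        return trajectory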
Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Hideo Tamama et al. [US 20200027236 A1: already of record] in view of Michael F. Gay et al. [US 20120200667 A1: already of record] and further in view of Matthew Walker [US 20200344425 A1: already of record].
Regarding claim 13, Hideo and Michael teach all the limitations of claim 9.
However, Hideo and Michael do not teach explicitly:
wherein generating and/or adapting the VFX data is performed by a VFX engine, wherein video stream data from the digital video camera is received by the VFX engine.
In the same field of endeavor, Matthew teaches wherein:
wherein generating and/or adapting the VFX data is performed by a VFX engine (i.e. The present invention generally relates to visual effects (VFX) and the formation of a composite image using motion tracking. More specifically, the present invention relates to a system and method for real-time tracking of a camera providing live recording of a subject and to process the data received for a three-dimensional composite image, thereby obviating the need for match moving- ¶0002), wherein video stream data from the digital video camera is received by the VFX engine (i.e. The present invention discloses a system and method that automates match moving and provides for real time 3D tracking of a recording device such as a video camera. The system is configured to approximate at least the orientation and velocity of the camera to provide a visual effect in a live recording by utilizing smartphone hardware, a mobile application, servers, displays, networks, and/or dedicated software or firmware- ¶0008).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo and Michael with the teachings of Matthew to obviate manual match moving in VFX (Matthew- ¶0049).
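For illustration only, and forming no part of Walker's disclosure or of the claims: a minimal Python sketch of a VFX-engine loop that receives both the video stream and the pose data and adapts its rendered output per frame. Here render_background and composite are hypothetical callables supplied by the caller, not functions of any cited reference.

    def vfx_render_loop(pose_stream, video_stream, render_background, composite):
        # For each synchronized (pose, frame) pair, render a pose-dependent
        # background and composite it with the live frame in real time.
        for pose, frame in zip(pose_stream, video_stream):
            yield composite(frame, render_background(pose))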
Regarding claim 14, Hideo, Michael and Matthew teach all the limitations of claim 13.
However, Hideo and Michael do not teach explicitly:
generating and/or adapting the VFX data is also based on the video stream data; the video stream data and the pose data are continuously received by the VFX engine; the video stream data comprises the pose data as meta data; and/or the method comprises capturing, using the digital video camera, a video stream of the scene, wherein the video stream data is generated based on the video stream.
In the same field of endeavor, Matthew teaches wherein:
generating and/or adapting the VFX data is also based on the video stream data; the video stream data and the pose data are continuously received by the VFX engine; the video stream data comprises the pose data as meta data; and/or the method comprises capturing, using the digital video camera, a video stream of the scene, wherein the video stream data is generated based on the video stream (i.e. One such type of tracking that is being relied upon more and more is real-time tracking. Real-time tracking involves 3D tracking of cameras themselves. To achieve this, a number of components from hardware to software need to be combined. Software collects all of the six degrees of freedom movement of the camera as well as metadata such as zoom, focus, iris and shutter elements from many different types of hardware devices, ranging from motion capture systems such as active LED marker-based systems, passive systems, to rotary encoders fitted to camera cranes and dollies or inertia gyroscopic sensors mounted directly to the camera, the sensor then being hooked into the hardware and software components- ¶0008).
It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention, to modify the teachings of Hideo and Michael with the teachings of Matthew to obviate manual match moving in VFX (Matthew- ¶0049).
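Solely as an illustrative sketch, with field names that are assumptions rather than Walker's: a Python structure pairing each video frame with its six-degrees-of-freedom pose and the lens metadata (zoom, focus, iris) that Walker lists as collected alongside camera movement, in the spirit of the claimed video stream data comprising the pose data as meta data.

    from dataclasses import dataclass

    @dataclass
    class FrameWithPoseMetadata:
        pixels: bytes      # encoded frame data
        pose_6dof: tuple   # (x, y, z, roll, pan, tilt)
        zoom: float        # lens metadata carried with the pose
        focus: float
        iris: float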
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLIFFORD HILAIRE whose telephone number is (571)272-8397. The examiner can normally be reached 5:30-14:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, SATH V PERUNGAVOOR can be reached at (571)272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CLIFFORD HILAIRE/Primary Examiner, Art Unit 2488