Prosecution Insights
Last updated: April 18, 2026
Application No. 18/645,955

AUGMENTED REALITY STREAMING DEVICE, METHOD, AND SYSTEM INTEROPERATING WITH EDGE SERVER

Final Rejection — §103

Filed: Apr 25, 2024
Examiner: LE, JOHNNY TRAN
Art Unit: 2614
Tech Center: 2600 — Communications
Assignee: Korea Electronics Technology Institute
OA Round: 2 (Final)
Grant Probability: 67% (Favorable)
OA Rounds: 3-4
To Grant: 2y 9m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 67% (2 granted / 3 resolved; +4.7% vs TC avg) — above average
Interview Lift: -66.7% (grant rate with vs. without interview, among resolved cases with interview)
Avg Prosecution: 2y 9m (typical timeline)
Total Applications: 35 across all art units (32 currently pending)

Statute-Specific Performance

§101: 6.1% (-33.9% vs TC avg)
§103: 65.9% (+25.9% vs TC avg)
§102: 16.7% (-23.3% vs TC avg)
§112: 8.3% (-31.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 3 resolved cases.
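These percentages are simple ratios, so they are easy to sanity-check. Below is a minimal Python sketch of the arithmetic, assuming the definitions the dashboard implies (allow rate = granted / resolved; "vs TC avg" figures are percentage-point deltas). The counts come from the figures above; interestingly, back-solving the statute deltas suggests a Tech Center baseline of roughly 40% for every statute.

```python
# Minimal sketch of the dashboard arithmetic (illustrative only).
# Counts are from the figures above; TC averages are back-solved estimates.

granted, resolved = 2, 3
career_allow_rate = granted / resolved          # 0.667 -> shown as 67%

# Statute-specific allow rates and their percentage-point deltas vs TC average.
statute_rates = {"101": 0.061, "103": 0.659, "102": 0.167, "112": 0.083}
tc_deltas = {"101": -0.339, "103": 0.259, "102": -0.233, "112": -0.317}
tc_avgs = {s: statute_rates[s] - tc_deltas[s] for s in statute_rates}

print(f"career allow rate: {career_allow_rate:.0%}")
for s, avg in tc_avgs.items():
    print(f"§{s}: examiner {statute_rates[s]:.1%} vs TC avg ~{avg:.1%}")
```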

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

1. This action is in response to the amendment filed on 01/21/2026. Claims 1, 6 and 11 have been amended, and claim 12 has been added. Claims 1-11 remain rejected, and claim 12 is rejected.

Response to Arguments

2. Applicant’s arguments filed on 01/21/2026 with respect to independent claims 1, 6, and 11, concerning the rejection under 35 U.S.C. § 103 and asserting that the prior art does not teach the following limitations, among others: “a display module configured to stream AR video solely to stream AR video received from the edge server; perform synchronization and encode on the image data and the inertia data to transmit to the edge server through the communication module without performing AR rendering on the AR streaming device; in response to segmentation rendering-processed AR video comprising segmentation rendering AR video being received from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server, decode and blend on the segmentation rendering AR video to perform control to be streamed through the display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.” These arguments have been considered but are moot in view of the similar and new grounds of rejection set forth below.

3. Regarding the arguments concerning claims 2-5 and 7-10: these claims depend directly or indirectly on independent claims 1 and 6, respectively. Applicant does not argue anything other than independent claims 1, 6, and 11. The limitations of those claims, in the combination relied upon, were previously established as explained.

4. Claim 12 is newly added and depends from independent claim 1. It has been considered, but is moot under the new grounds of rejection.

Claim Rejections - 35 USC § 103

5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7. Claims 1, 3, 6, 8, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Gruteser et al. (US 20210110191 A1) in view of Varshney et al. (US 20200118342 A1) and Kamaraju et al. (US 20220334789 A1).
8. Regarding claim 1, Gruteser teaches an augmented reality (AR) streaming device interoperating with an edge server, the AR streaming device comprising ([Abstract] reciting “Systems and methods for edge assisted real-time object detection for mobile augmented reality are provided.”; [0027] reciting “The system further provides a parallel streaming and inference method to pipeline the streaming and inference processes to further reduce the offloading latency. The system creates a Motion Vector Based Object Tracking (MvOT) process to achieve fast and lightweight object tracking on the AR devices, based on the embedded motion vectors in the encoded video stream. The system is an end-to-end system based on commodity hardware, and can achieve 60 fps AR experience with accurate object detection.”): a sensing module including a camera ([0056] reciting “As shown in FIG. 1, the AR device 12 (i.e. smartphone or AR headset) is assumed to be connected to an edge cloud 14 through a wireless connection 16 (i.e. WiFi or LTE). The arrow 17 illustrates the critical path for a single frame. Let t.sub.e2e be the end-to-end latency, which includes the offloading latency t.sub.offload and the rendering latency t.sub.render. t.sub.offload is determined by three main components: (1) the time to stream a frame captured by the camera from the AR device 12 to the edge cloud”); a display module configured to stream AR video ([0057] reciting “An experiment can be conducted measure the latency and its impact on detection accuracy in the entire pipeline, and find that it is extremely challenging for existing AR system to achieve high object detection accuracy in 60 fps display systems.”); a communication module configured to transmit or receive data to or from the edge server and the sensing module ([0048] reciting “FIG. 1 is a diagram illustrating the system of the present disclosure. The system includes an AR device 12 which communicates with one or more remote (“edge” or “cloud”) computing systems 14 via a communications link 16, which could be a wireless network communications link (e.g., WiFi, Bluetooth, etc.).”); and a processor configured to execute the program stored in the memory to ([0071] reciting “Even further, it is possible that the approach discussed herein in connection with FIGS. 1 and 3 could be performed by a single computer having two separate processors (one processor being dedicated to the functions of the AR device 12, and a second (perhaps more powerful) processor being dedicated to the functions of the edge cloud 14).”): in response to image data and being obtained through the sensing module, perform synchronization and encode on the image data and the to transmit to the edge server through the communication module ([0089] reciting “In the rendering thread, the Motion Vector based Object Tracking module uses the extracted motion vectors and cached detection results to achieve fast object tracking. The system then renders virtual overlays based on the coordinates of the detection result.”; [Abstract] reciting “The system also includes dynamic RoI encoding and motion vector-based object tracking processes that operate in a tracking and rendering pipeline executing on the AR device.”) without performing AR rendering on the AR streaming device ([0069] reciting “Object Tracking (MvOT) process 22 is used to adjust the prior cached detection results with viewer or scene motion. Compared to traditional object tracking approaches that match image feature points (i.e. SIFT and Optical Flow) on two frames, this process 22 again reuses motion vectors embedded in the encoded video frames, which allows object tracking without any extra processing overhead.”; [Abstract] reciting “The system also includes dynamic RoI encoding and motion vector-based object tracking processes that operate in a tracking and rendering pipeline executing on the AR device.”), and based on ultra-low delay communication ([Abstract] reciting “The system employs a low latency offloading process, decouples the rendering pipeline from the offloading pipeline, and uses a fast object tracking method to maintain detection accuracy.”).

9. Gruteser does not explicitly teach a sensing module including a camera and a certain inertia sensor; a display module configured to stream AR video solely to stream AR video received from the edge server; … a memory storing a program for providing an AR streaming service, based on the data; in response to image data and inertia data being obtained through the sensing module, perform synchronization and encode on the image data and the inertia data to transmit to the edge server through the communication module without performing AR rendering on the AR streaming device, and in response to segmentation rendering-processed AR video comprising segmentation rendering AR video being received from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server, decode and blend on the segmentation rendering AR video to perform control to be streamed through the display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

10. Varshney teaches a sensing module including a camera and a certain inertia sensor; … a memory storing a program for providing an AR streaming service, based on the data; … in response to image data and inertia data being obtained through the sensing module, perform synchronization and encode on the image data and the inertia data to transmit to the edge server through the communication module without performing AR rendering on the AR streaming device ([0046] reciting “In an example embodiment, the user may control this orientation through input such as a mouse or rotation of inertia sensors in a phone or virtual reality (VR) headset.”; [0009] reciting “In accordance with another example embodiment, an apparatus may include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus at least to simultaneously capture 360 video data and audio data from a plurality of viewpoints within a real-world environment.”), and in response to segmentation rendering-processed AR video comprising segmentation rendering AR video being received from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server, decode and blend on the segmentation rendering AR video to perform control to be streamed through the display module ([0053] reciting “In certain example embodiments, each video may be encoded into a compressed video stream using certain video coding formats.”; [0086] reciting “According to another example embodiment, the background may be subtracted. For example, background segmentation may be performed to isolate the subject in each frame from the background pixels. A background frame may be captured prior to recording when no subject is in the capture volume. In addition, the background segmentation algorithm may use two frames for each camera, the background capture frame, and the subject capture frame.”)…

11. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser to incorporate the teachings of Varshney: to include a type of inertia sensor alongside the existing sensors (such as the camera) of Gruteser, to provide a clearer indication of a program stored in a memory for execution by a processor, and to provide a type of segmentation and blending for the videos of Gruteser using the methods of Varshney. Doing so would allow the image to be used for camera position and orientation in a calibrated scene, as stated by Varshney ([0084]).

12. Gruteser in view of Varshney does not explicitly teach a display module configured to stream AR video solely to stream AR video received from the edge server; … and in response to segmentation rendering-processed AR video comprising segmentation rendering AR video being received from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server, … wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

13. Kamaraju teaches a display module configured to stream AR video solely to stream AR video received from the edge server ([0063] reciting “In operation 414, the transmitting device may then, in some embodiments, display the video stream and AR objects on a display associated with (e.g., part of and/or communicatively coupled with) the transmitting device.”); … in response to segmentation rendering-processed AR video comprising segmentation rendering AR video being received from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server ([0024] reciting “Central server 108 may be needed to establish communication if one or more endpoints to a communication session, e.g., consumer device 102, service provider device 104, or other devices participating in the communication session, are behind a firewall, router, Network Address Translation (NAT), or other device interfering with or otherwise preventing a direct peer-to-peer connection between devices in the communication session… In some embodiments, central server 108 is a cloud service that may be offered by the third party.”; [0063] reciting “Once the transmitting device, e.g. consumer device 102, receives the AR data from the receiving device, e.g. service provider device 104, in operation 412 the transmitting device may then render the AR objects received from the receiving device, e.g., by extracting information from data received in-band and or out-of-band and/or in separate data streams, into the video stream that was previously transmitted, in operation 412… Thus, a user of the transmitting device can view the scene captured by the transmitting device's camera and remotely placed AR objects from the receiving device in real-time.”), … wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication ([0021] reciting “Other embodiments may implement either of devices 102 or 104 on a variety of different devices, such as a computer (desktop or laptop), tablet, two-in-one, hybrid, smart glasses, or any other computing device that can accept a camera and provide necessary positional information, as will be discussed in greater detail herein.”; [0072] reciting “Such embodiments may result in a slight delay or lag between capture of video and display of video; this lag, however, may be relatively minimal and acceptable to a user of consumer device 102, particularly where consumer device 102 and service provider device 104 are connected via a high-speed network connection.”).

14. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney to incorporate the teachings of Kamaraju: to dedicate the display module to the AR video only, to make the segmentation real-time and performed at the server, and to operate the device as a type of lightweight playback-oriented device relying on the servers, while utilizing the servers, AR devices, and low-latency methods taught by Gruteser in view of Varshney. Doing so would allow the AR methods to indicate positional changes of the one or more AR objects in each frame relative to the one or more referenced anchor points, as stated by Kamaraju.

15. Regarding claim 3, Gruteser in view of Varshney and Kamaraju teaches the AR streaming device of claim 1 (see claim 1 rejection above), wherein the processor is configured to buffer media chunk of the segmentation rendering AR video received from the edge server through the communication module in real time to perform decoding (Varshney; [0071] reciting “In certain example embodiments, compressed video data packets from the demultiplexer may be decoded into a picture that can be displayed to the user. For this, the playback software may initialize a decoding component and use it to decompress data packets before rendering. In an example embodiment, each separate decoder may have its own memory context and buffers that it uses during the decoding process.”; [0081] reciting “Certain example embodiments may provide a system that uses image/video data captured by a camera array to reconstruct view dependent holograms in real-time.”).
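Editorially, the architecture that claims 1 and 3 recite is easier to see as code than as claim language. The sketch below is a hypothetical illustration of the claimed device-side loop (capture, synchronize, encode, offload, then decode and display what comes back, with no on-device AR rendering); every name in it is invented for the example, and a real system would use a hardware codec and the transport discussed for claims 2 and 7.

```python
# Hypothetical sketch of the claimed "lightweight playback-oriented" device:
# it never renders AR itself; it offloads sensor data and plays back the
# segmentation-rendered video the edge server returns.

import time
from dataclasses import dataclass

@dataclass
class SensorSample:
    timestamp: float
    image: bytes      # one camera frame
    inertia: bytes    # one synchronized IMU reading

def synchronize(image: bytes, inertia: bytes) -> SensorSample:
    """Pair the frame with its nearest-in-time inertia reading (claim 1)."""
    return SensorSample(time.monotonic(), image, inertia)

def encode(sample: SensorSample) -> bytes:
    """Stand-in for a real encoder (e.g., hardware H.264 plus IMU metadata)."""
    return sample.image + sample.inertia

def device_loop(camera, imu, edge_link, decoder, display):
    buffer = []                                # media-chunk buffer (claim 3)
    while True:
        sample = synchronize(camera.read(), imu.read())
        edge_link.send(encode(sample))         # offload; no local AR rendering
        buffer.append(edge_link.receive())     # buffered in real time
        frame = decoder.decode(buffer.pop(0))  # decode the media chunk
        display.blend_and_show(frame)          # blend, then stream to display
```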
16. Regarding claim 6, Gruteser teaches a method performed by an augmented reality (AR) streaming device interoperating with an edge server, the method comprising ([Abstract] reciting “Systems and methods for edge assisted real-time object detection for mobile augmented reality are provided.”; [0027] reciting “The system further provides a parallel streaming and inference method to pipeline the streaming and inference processes to further reduce the offloading latency. The system creates a Motion Vector Based Object Tracking (MvOT) process to achieve fast and lightweight object tracking on the AR devices, based on the embedded motion vectors in the encoded video stream. The system is an end-to-end system based on commodity hardware, and can achieve 60 fps AR experience with accurate object detection.”): obtaining image data and through a camera and a ; performing synchronization and encoding on the image data and the without performing AR rendering on the AR streaming device; transmitting the encoding-completed image data and to the edge server ([0069] reciting “Object Tracking (MvOT) process 22 is used to adjust the prior cached detection results with viewer or scene motion. Compared to traditional object tracking approaches that match image feature points (i.e. SIFT and Optical Flow) on two frames, this process 22 again reuses motion vectors embedded in the encoded video frames, which allows object tracking without any extra processing overhead.”; [Abstract] reciting “The system also includes dynamic RoI encoding and motion vector-based object tracking processes that operate in a tracking and rendering pipeline executing on the AR device.”; [0089] reciting “In the rendering thread, the Motion Vector based Object Tracking module uses the extracted motion vectors and cached detection results to achieve fast object tracking. The system then renders virtual overlays based on the coordinates of the detection result.”); based on ultra-low delay communication ([Abstract] reciting “The system employs a low latency offloading process, decouples the rendering pipeline from the offloading pipeline, and uses a fast object tracking method to maintain detection accuracy.”).

17. Gruteser does not explicitly teach obtaining image data and inertia data through a camera and a certain inertia sensor; … performing synchronization and encoding on the image data and the inertia data without performing AR rendering on the AR streaming device; transmitting the encoding-completed image data and inertia data to the edge server; … receiving segmentation rendering-processed AR video (hereinafter referred to as segmentation rendering AR video) from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server; decoding and blending the segmentation rendering AR video; and streaming the AR video through a display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

18. Varshney teaches obtaining image data and inertia data through a camera and a certain inertia sensor; … performing synchronization and encoding on the image data and the inertia data without performing AR rendering on the AR streaming device; transmitting the encoding-completed image data and inertia data to the edge server ([0046] reciting “In an example embodiment, the user may control this orientation through input such as a mouse or rotation of inertia sensors in a phone or virtual reality (VR) headset.”); … receiving segmentation rendering-processed AR video (hereinafter referred to as segmentation rendering AR video) from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server; decoding and blending the segmentation rendering AR video; and streaming the AR video through a display module ([0053] reciting “In certain example embodiments, each video may be encoded into a compressed video stream using certain video coding formats.”; [0086] reciting “According to another example embodiment, the background may be subtracted. For example, background segmentation may be performed to isolate the subject in each frame from the background pixels. A background frame may be captured prior to recording when no subject is in the capture volume. In addition, the background segmentation algorithm may use two frames for each camera, the background capture frame, and the subject capture frame.”)…

19. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser to incorporate the teachings of Varshney: to include a type of inertia sensor alongside the existing sensors (such as the camera) of Gruteser, and to provide a type of segmentation and blending for the videos of Gruteser using the methods of Varshney. Doing so would allow the image to be used for camera position and orientation in a calibrated scene, as stated by Varshney ([0084]).

20. Gruteser in view of Varshney does not explicitly teach receiving segmentation rendering-processed AR video (hereinafter referred to as segmentation rendering AR video) from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server; … and streaming the AR video through a display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

21. Kamaraju teaches receiving segmentation rendering-processed AR video (hereinafter referred to as segmentation rendering AR video) from the edge server in which all real-time segmentation processing and AR rendering have been performed at the edge server ([0024] reciting “Central server 108 may be needed to establish communication if one or more endpoints to a communication session, e.g., consumer device 102, service provider device 104, or other devices participating in the communication session, are behind a firewall, router, Network Address Translation (NAT), or other device interfering with or otherwise preventing a direct peer-to-peer connection between devices in the communication session… In some embodiments, central server 108 is a cloud service that may be offered by the third party.”; [0063] reciting “Once the transmitting device, e.g. consumer device 102, receives the AR data from the receiving device, e.g. service provider device 104, in operation 412 the transmitting device may then render the AR objects received from the receiving device, e.g., by extracting information from data received in-band and or out-of-band and/or in separate data streams, into the video stream that was previously transmitted, in operation 412… Thus, a user of the transmitting device can view the scene captured by the transmitting device's camera and remotely placed AR objects from the receiving device in real-time.”); … and streaming the AR video through a display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication ([0021] reciting “Other embodiments may implement either of devices 102 or 104 on a variety of different devices, such as a computer (desktop or laptop), tablet, two-in-one, hybrid, smart glasses, or any other computing device that can accept a camera and provide necessary positional information, as will be discussed in greater detail herein.”; [0072] reciting “Such embodiments may result in a slight delay or lag between capture of video and display of video; this lag, however, may be relatively minimal and acceptable to a user of consumer device 102, particularly where consumer device 102 and service provider device 104 are connected via a high-speed network connection.”).

22. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney to incorporate the teachings of Kamaraju: to make the segmentation real-time and performed at the server, and to operate the device as a type of lightweight playback-oriented device relying on the servers, while utilizing the servers, AR devices, and low-latency methods taught by Gruteser in view of Varshney. Doing so would allow the AR methods to indicate positional changes of the one or more AR objects in each frame relative to the one or more referenced anchor points, as stated by Kamaraju.

23. Regarding claim 8, Gruteser in view of Varshney teaches the method of claim 6 (see claim 6 rejection above), wherein decoding and blending the segmentation rendering AR video buffers media chunk of the segmentation rendering AR video received from the edge server through the communication module in real time to perform decoding (Varshney; [0071] reciting “In certain example embodiments, compressed video data packets from the demultiplexer may be decoded into a picture that can be displayed to the user. For this, the playback software may initialize a decoding component and use it to decompress data packets before rendering. In an example embodiment, each separate decoder may have its own memory context and buffers that it uses during the decoding process.”; [0081] reciting “Certain example embodiments may provide a system that uses image/video data captured by a camera array to reconstruct view dependent holograms in real-time.”).
24. Regarding claim 11, Gruteser teaches a lightweight augmented reality (AR) streaming system comprising: an edge server configured to perform recognition of a posture and a position of a user wearing the AR streaming device ([Abstract] reciting “Systems and methods for edge assisted real-time object detection for mobile augmented reality are provided.”; [0026] reciting “Still further, the system results in a reduction in local processing on a device (phone, AR headset), as well as quantifying accuracy and latency requirements in an end-to-end AR system with the object detection task offloaded.”; [0027] reciting “The system creates a Motion Vector Based Object Tracking (MvOT) process to achieve fast and lightweight object tracking on the AR devices, based on the embedded motion vectors in the encoded video stream.”); and the AR streaming device configured to obtain image data and through a camera and a , perform synchronization and encode on the image data and the without performing AR rendering on the AR streaming device and transmit to the edge server ([0069] reciting “Object Tracking (MvOT) process 22 is used to adjust the prior cached detection results with viewer or scene motion. Compared to traditional object tracking approaches that match image feature points (i.e. SIFT and Optical Flow) on two frames, this process 22 again reuses motion vectors embedded in the encoded video frames, which allows object tracking without any extra processing overhead.”; [Abstract] reciting “The system also includes dynamic RoI encoding and motion vector-based object tracking processes that operate in a tracking and rendering pipeline executing on the AR device.”; [0089] reciting “In the rendering thread, the Motion Vector based Object Tracking module uses the extracted motion vectors and cached detection results to achieve fast object tracking. The system then renders virtual overlays based on the coordinates of the detection result.”), and based on ultra-low delay communication ([Abstract] reciting “The system employs a low latency offloading process, decouples the rendering pipeline from the offloading pipeline, and uses a fast object tracking method to maintain detection accuracy.”).

25. Gruteser does not explicitly teach to perform a space configuration and matching process, based on image data and inertia data from an AR streaming device, and then, transmit segmentation rendering-performed AR video comprising segmentation rendering AR video to the AR streaming device, the edge server configured to perform all real-time segmentation processing and AR rendering; and the AR streaming device configured to obtain image data and inertia data through a camera and a certain inertia sensor, perform synchronization and encode on the image data and the inertia data without performing AR rendering on the AR streaming device and transmit to the edge server, and in response to the segmentation rendering AR video being received from the edge server, decode and blend the segmentation rendering AR video to perform control to be streamed through the display module, wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

26. Varshney teaches to perform a space configuration and matching process, based on image data and inertia data from an AR streaming device ([0052] reciting “According to certain example embodiments, 360 cameras may include multiple camera lenses placed in a spherical configuration to have full coverage of the recorded environment. As such, proper video stitching may be implemented to convert individual recordings of different regions of the environment into a single 360 video recording. In one example embodiment, the stitching process may include finding and merging matching regions between camera frames and re-projecting them onto a spherical canvas.”; [0110] reciting “To achieve this, certain example embodiments provide a pipeline that estimates physical properties of the multi-view 360 camera captured scene such as 3D geometry of objects and relative position and orientation of viewpoints. The estimated values may be used to establish a correspondence between the multi-viewpoint 360 cameras captured scene and a virtual 3D space containing virtual objects.”)…; and the AR streaming device configured to obtain image data and inertia data through a camera and a certain inertia sensor, perform synchronization and encode on the image data and the inertia data without performing AR rendering on the AR streaming device and transmit to the edge server ([0046] reciting “In an example embodiment, the user may control this orientation through input such as a mouse or rotation of inertia sensors in a phone or virtual reality (VR) headset.”), and in response to the segmentation rendering AR video being received from the edge server, decode and blend the segmentation rendering AR video to perform control to be streamed through the display module ([0053] reciting “In certain example embodiments, each video may be encoded into a compressed video stream using certain video coding formats.”; [0086] reciting “According to another example embodiment, the background may be subtracted. For example, background segmentation may be performed to isolate the subject in each frame from the background pixels. A background frame may be captured prior to recording when no subject is in the capture volume. In addition, the background segmentation algorithm may use two frames for each camera, the background capture frame, and the subject capture frame.”)…

27. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser to incorporate the teachings of Varshney: to include a type of space configuration and matching using the image data of Gruteser, to include a type of inertia sensor alongside the existing sensors (such as the camera) of Gruteser, and to provide a type of segmentation and blending for the videos of Gruteser using the methods of Varshney. Doing so would allow the image to be used for camera position and orientation in a calibrated scene, as stated by Varshney ([0084]).

28. Gruteser in view of Varshney does not explicitly teach to transmit segmentation rendering-performed AR video comprising segmentation rendering AR video to the AR streaming device, the edge server configured to perform all real-time segmentation processing and AR rendering; … wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication.

29. Kamaraju teaches to transmit segmentation rendering-performed AR video comprising segmentation rendering AR video to the AR streaming device, the edge server configured to perform all real-time segmentation processing and AR rendering ([0024] reciting “Central server 108 may be needed to establish communication if one or more endpoints to a communication session, e.g., consumer device 102, service provider device 104, or other devices participating in the communication session, are behind a firewall, router, Network Address Translation (NAT), or other device interfering with or otherwise preventing a direct peer-to-peer connection between devices in the communication session… In some embodiments, central server 108 is a cloud service that may be offered by the third party.”; [0063] reciting “Once the transmitting device, e.g. consumer device 102, receives the AR data from the receiving device, e.g. service provider device 104, in operation 412 the transmitting device may then render the AR objects received from the receiving device, e.g., by extracting information from data received in-band and or out-of-band and/or in separate data streams, into the video stream that was previously transmitted, in operation 412… Thus, a user of the transmitting device can view the scene captured by the transmitting device's camera and remotely placed AR objects from the receiving device in real-time.”); … wherein the AR streaming device is configured to operate as a lightweight playback-oriented device relying on the edge server for real-time segmentation rendering based on ultra-low delay communication ([0021] reciting “Other embodiments may implement either of devices 102 or 104 on a variety of different devices, such as a computer (desktop or laptop), tablet, two-in-one, hybrid, smart glasses, or any other computing device that can accept a camera and provide necessary positional information, as will be discussed in greater detail herein.”; [0072] reciting “Such embodiments may result in a slight delay or lag between capture of video and display of video; this lag, however, may be relatively minimal and acceptable to a user of consumer device 102, particularly where consumer device 102 and service provider device 104 are connected via a high-speed network connection.”).

30. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney to incorporate the teachings of Kamaraju: to make the segmentation real-time and performed at the server, and to operate the device as a type of lightweight playback-oriented device relying on the servers, while utilizing the servers, AR devices, and low-latency methods taught by Gruteser in view of Varshney. Doing so would allow the AR methods to indicate positional changes of the one or more AR objects in each frame relative to the one or more referenced anchor points, as stated by Kamaraju.
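For symmetry with the device-side sketch above, here is the edge-server half that claim 11 recites: pose recognition plus space configuration and matching from the uploaded image and inertia data, then all real-time segmentation and AR rendering before the result is streamed back. Again, this is a hypothetical sketch; every function name is invented, and segmenter/renderer stand in for whatever real-time models a concrete implementation would use.

```python
# Hypothetical sketch of the claim 11 edge server: segmentation and AR
# rendering happen only here, never on the AR streaming device.

def demux_upload(payload: bytes):
    """Stand-in for splitting the device's chunk into image + inertia data."""
    return payload, b""

def update_pose(world, image, inertia):
    """Stand-in for posture/position recognition and space matching."""
    return world

def edge_loop(link, segmenter, renderer, encoder):
    world = None
    while True:
        image, inertia = demux_upload(link.receive())
        world = update_pose(world, image, inertia)  # space configuration/matching
        mask = segmenter(image)                     # real-time segmentation
        ar_frame = renderer(image, mask, world)     # AR rendering at the server
        link.send(encoder(ar_frame))                # stream the rendered video back
```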
31. Claims 2 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Gruteser et al. (US 20210110191 A1) in view of Varshney et al. (US 20200118342 A1) and Kamaraju et al. (US 20220334789 A1), as applied to claims 1 and 6 above, and further in view of Byagowi et al. (US 20210288738 A1).

32. Regarding claim 2, Gruteser in view of Varshney and Kamaraju teaches the AR streaming device of claim 1 (see claim 1 rejection above), but does not explicitly teach wherein the communication module is configured to transmit or receive the data by using QUIC protocol based on HTTP.

33. Byagowi teaches wherein the communication module is configured to transmit or receive the data by using QUIC protocol based on HTTP ([0066] reciting “Content delivery networks may typically utilize a distributed system architecture to deliver content (e.g., video, images, etc.) from a caching backend to downstream client devices for consumption by users”; [0097] reciting “The shared context may include destination information, stream identification, encryption information, and header/packet information. Sending the HTTP body transmission may further include sending a group of send packets containing instructions for a caching backend server 414 to transmit QUIC data representing the requested content (i.e., content 416) directly to client device 402.”).

34. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Byagowi, to provide a method that utilizes a QUIC protocol based on HTTP with content, such as the videos, taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow sending packet instructions in a variety of ways, as stated by Byagowi ([0097]).

35. Regarding claim 7, Gruteser in view of Varshney and Kamaraju teaches the method of claim 6 (see claim 6 rejection above), wherein transmitting the encoding-completed image data and inertia data to the edge server and the step of receiving the segmentation rendering-processed AR video comprising the segmentation rendering AR video from the edge server (Varshney; [0053] reciting “In certain example embodiments, each video may be encoded into a compressed video stream using certain video coding formats.”; [0086] reciting “According to another example embodiment, the background may be subtracted. For example, background segmentation may be performed to isolate the subject in each frame from the background pixels. A background frame may be captured prior to recording when no subject is in the capture volume. In addition, the background segmentation algorithm may use two frames for each camera, the background capture frame, and the subject capture frame.”).

36. Gruteser in view of Varshney and Kamaraju does not explicitly teach … transmits or receives the data by using quick user datagram protocol (UDP) Internet connections (QUIC) protocol based on hypertext transfer protocol (HTTP).

37. Byagowi teaches … transmits or receives the data by using quick user datagram protocol (UDP) Internet connections (QUIC) protocol based on hypertext transfer protocol (HTTP) ([0066] reciting “Content delivery networks may typically utilize a distributed system architecture to deliver content (e.g., video, images, etc.) from a caching backend to downstream client devices for consumption by users”; [0097] reciting “The shared context may include destination information, stream identification, encryption information, and header/packet information. Sending the HTTP body transmission may further include sending a group of send packets containing instructions for a caching backend server 414 to transmit QUIC data representing the requested content (i.e., content 416) directly to client device 402.”).

38. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Byagowi, to provide a method that utilizes a QUIC protocol based on HTTP with content, such as the videos, taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow sending packet instructions in a variety of ways, as stated by Byagowi ([0097]).

39. Claims 4 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Gruteser et al. (US 20210110191 A1) in view of Varshney et al. (US 20200118342 A1) and Kamaraju et al. (US 20220334789 A1), as applied to claims 1 and 6 above, and further in view of Tielemans et al. (US 20210044639 A1).

40. Regarding claim 4, Gruteser in view of Varshney and Kamaraju teaches the AR streaming device of claim 1 (see claim 1 rejection above), wherein the processor is configured to perform multiplexing of the encoded image data and inertia data by image frame units (Varshney; [0092] reciting “According to a further example embodiment, multi-view videos may be mapped into streams of a video to achieve multi-view encoding… Furthermore, in certain example embodiments, video streams of all cameras may be multiplexed into one single multimedia container.”), and perform de-multiplexing of the received segmentation rendering AR video (Varshney; [0066] reciting “The demultiplexer may also take in state information about the system and uses it to make decisions on which data packets to send to the decoder, and which to ignore. For example, the software system may keep track of which viewpoint is currently being displayed. In one example embodiment, the demultiplexer may use this information to discard packets from all bitstreams except for the bitstreams associated with the currently displayed viewpoint.”),

41. Gruteser in view of Varshney and Kamaraju does not explicitly teach …based on ISO 23000-19 common media application format (CMAF) standard video.

42. Tielemans teaches …based on ISO 23000-19 common media application format (CMAF) standard video ([0089] reciting “Information on the availability of the video in both the independent and dependent version may be provided in the form of a URL to a manifest file that is available on the server, for example a manifest file following the Common Media Application Format (CMAF) for segmented media according to ISO/IEC 23000-19.”).

43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Tielemans, to provide a method that utilizes ISO 23000-19 CMAF standard video with the videos taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow a server streaming a video to a client to make the video available from the server to the client upon request, as stated by Tielemans ([Abstract]).
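The transport limitations just discussed (QUIC over HTTP for claims 2 and 7, per-image-frame multiplexing of image and inertia data for claim 4) can be illustrated with a toy framing scheme. To be clear, the layout below is invented for this example and is not the CMAF (ISO 23000-19) container format; it only shows the idea of muxing one frame plus its inertia sample into a single length-prefixed chunk that could then be written to one QUIC stream by an HTTP/3 library such as aioquic.

```python
# Simplified stand-in for per-frame multiplexing of encoded image + inertia
# data into a length-prefixed chunk suitable for one QUIC stream write.
# This is NOT the CMAF (ISO 23000-19) format; the layout is invented here.

import struct

HEADER = struct.Struct("!dII")   # timestamp, image length, inertia length

def mux_frame(timestamp: float, image: bytes, inertia: bytes) -> bytes:
    return HEADER.pack(timestamp, len(image), len(inertia)) + image + inertia

def demux_frame(chunk: bytes):
    ts, ilen, nlen = HEADER.unpack_from(chunk)
    body = chunk[HEADER.size:]
    return ts, body[:ilen], body[ilen:ilen + nlen]

chunk = mux_frame(12.5, b"\x00" * 16, b"\x01" * 6)
assert demux_frame(chunk) == (12.5, b"\x00" * 16, b"\x01" * 6)
```

Carrying each chunk on its own QUIC stream avoids TCP-style head-of-line blocking, which is presumably why the claims tie the QUIC/HTTP transport to the ultra-low-delay requirement.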
44. Regarding claim 10, Gruteser in view of Varshney teaches the method of claim 6 (see claim 6 rejection above), further comprising: performing multiplexing of the encoded image data and inertia data by image frame units (Varshney; [0092] reciting “According to a further example embodiment, multi-view videos may be mapped into streams of a video to achieve multi-view encoding… Furthermore, in certain example embodiments, video streams of all cameras may be multiplexed into one single multimedia container.”); and performing de-multiplexing of the received segmentation rendering AR video (Varshney; [0066] reciting “The demultiplexer may also take in state information about the system and uses it to make decisions on which data packets to send to the decoder, and which to ignore. For example, the software system may keep track of which viewpoint is currently being displayed. In one example embodiment, the demultiplexer may use this information to discard packets from all bitstreams except for the bitstreams associated with the currently displayed viewpoint.”),

45. Gruteser in view of Varshney and Kamaraju does not explicitly teach …based on ISO 23000-19 common media application format (CMAF) standard video.

46. Tielemans teaches …based on ISO 23000-19 common media application format (CMAF) standard video ([0089] reciting “Information on the availability of the video in both the independent and dependent version may be provided in the form of a URL to a manifest file that is available on the server, for example a manifest file following the Common Media Application Format (CMAF) for segmented media according to ISO/IEC 23000-19.”).

47. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Tielemans, to provide a method that utilizes ISO 23000-19 CMAF standard video with the videos taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow a server streaming a video to a client to make the video available from the server to the client upon request, as stated by Tielemans ([Abstract]).

48. Claims 5 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Gruteser et al. (US 20210110191 A1) in view of Varshney et al. (US 20200118342 A1) and Kamaraju et al. (US 20220334789 A1), as applied to claims 1 and 6 above, and further in view of Morita et al. (US 20080291219 A1).

49. Regarding claim 5, Gruteser in view of Varshney teaches the AR streaming device of claim 1 (see claim 1 rejection above), and although the prior art may teach wherein the processor is configured to perform blending of a region, except a content region, of the decoded pixel of the segmentation rendering AR video to perform rendering (Varshney; [0103] reciting “FIG. 14 illustrates a blending of multiple camera views, according to an example embodiment. For example, as illustrated in FIG. 14, camera view textures of the three nearest cameras may be blended using barycentric weights. The light-gray shaded region in the majority of the image illustrate blending of all three camera textures, the white center shaded regions of the image represents blending of two camera textures, and the dark-gray shaded region that outlines the image represents one camera texture. Further, the white border regions in FIG. 11 represent uniform weight blending of the remaining valid camera views.”; [0104] reciting “In an example embodiment, multi-layer re-projection rendering may be fused using manual depth testing (with threshold depth region), and blending using barycentric weights and uniform weights.”), prior art from Morita can teach this limitation further.

50. Morita further teaches wherein the processor is configured to perform blending of a region, except a content region, of the decoded pixel of the segmentation rendering AR video to perform rendering ([0025] reciting “In a preferred embodiment, the composition unit composites the physical space image and the virtual space image by alpha blending.”; [0071] reciting “In FIG. 8, virtual space images 813 and 814 are rendered to have a black background. By compositing an image 605 of a physical space image onto the virtual space image 814 using alpha blending processing such as addition or the like, a translucent effect can be obtained. As a result, a translucent-processed composite image 815 can be obtained.”).

51. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Morita, to provide a clearer method of blending a segmented region from a video, such as a physical space, similar to the AR videos taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow a translucent-processed composite image to be obtained, as stated by Morita ([0071]).

52. Regarding claim 9, Gruteser in view of Varshney and Kamaraju teaches the method of claim 6 (see claim 6 rejection above), and although it may teach wherein decoding and blending the segmentation rendering AR video performs blending of a region, except a content region, of the decoded pixel of the segmentation rendering AR video to perform rendering (Varshney; [0103] reciting “FIG. 14 illustrates a blending of multiple camera views, according to an example embodiment. For example, as illustrated in FIG. 14, camera view textures of the three nearest cameras may be blended using barycentric weights. The light-gray shaded region in the majority of the image illustrate blending of all three camera textures, the white center shaded regions of the image represents blending of two camera textures, and the dark-gray shaded region that outlines the image represents one camera texture. Further, the white border regions in FIG. 11 represent uniform weight blending of the remaining valid camera views.”; [0104] reciting “In an example embodiment, multi-layer re-projection rendering may be fused using manual depth testing (with threshold depth region), and blending using barycentric weights and uniform weights.”), prior art from Morita can teach this limitation further.

53. Morita further teaches wherein decoding and blending the segmentation rendering AR video performs blending of a region, except a content region, of the decoded pixel of the segmentation rendering AR video to perform rendering ([0025] reciting “In a preferred embodiment, the composition unit composites the physical space image and the virtual space image by alpha blending.”; [0071] reciting “In FIG. 8, virtual space images 813 and 814 are rendered to have a black background. By compositing an image 605 of a physical space image onto the virtual space image 814 using alpha blending processing such as addition or the like, a translucent effect can be obtained. As a result, a translucent-processed composite image 815 can be obtained.”).

54. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Morita, to provide a clearer method of blending a segmented region from a video, such as a physical space, similar to the AR videos taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow a translucent-processed composite image to be obtained, as stated by Morita ([0071]).

55. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Gruteser et al. (US 20210110191 A1) in view of Varshney et al. (US 20200118342 A1) and Kamaraju et al. (US 20220334789 A1), as applied to claim 1 above, and further in view of Elliott et al. (US 20180322669 A1).

56. Regarding claim 12, Gruteser in view of Varshney and Kamaraju teaches the AR streaming device of claim 1 (see claim 1 rejection above), but does not explicitly teach wherein the processor is configured to convert the decoded segmentation rendering AR video into pixel data and supply the pixel data to a shader of a graphics processing unit for performing blending.

57. Elliott teaches wherein the processor is configured to convert the decoded segmentation rendering AR video into pixel data and supply the pixel data to a shader of a graphics processing unit for performing blending ([0021] reciting “As used herein, the term virtual reality (VR) relates to any at least partially virtual environment, and may include mixed reality (MR) (e.g., combining of at least two virtual environments) and augmented reality (AR) (e.g., combining of a real world environment with at least one virtual environment).”; [0022] reciting “The devices and methods may perform virtual reality image compositing in one step by blending together various image corrections at the same time to generate a composite image. The devices and methods may correlate final pixel positions in the final virtual environment of surfaces to original pixel positions in the original virtual environments of the surfaces, and may perform one time sampling and image correction (e.g., lens distortion, late latching of head position) of the original source pixels to generate the composite image in the final virtual environment.”; [0057] reciting “That is, compositor 28 may perform n position calculations (where n is a positive number) along with at least one distortion calculation in one step. Each position in the virtual space corresponding to a real display pixel may be worked back into various spaces using compositor 28. Example spaces include, but are not limited to, a physical display space, an undistorted virtual environment space, a distorted virtual environment space, a head position 1 space, a head position 2 space, and a head position current space.
For example, the n position calculations performed by compositor 28 are the pixel positions in the composited image 611, while the single distortion calculation includes lens distortion correction and transformation from the coordinates of the virtual environment of the composited image 611 to the coordinates of the mixed virtual world 614 of composite image 46… In an implementation, compositor 28 may be a single shader and the space conversion calculations and inverse distortion calculations may occur in the single shader without the need of storing images encoded in those intermediate spaces, and subsequently, needing to access and/or sample those surfaces.”; [0058] reciting “By knowing where to sample from in the original images and accounting for the latest location of the head position, compositor 28 may perform one step compositing and lens distortion correction of virtual reality image frames from different sources operates in a quick and resource efficient manner to generate a composite virtual environment frame.”).

58. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method taught by Gruteser in view of Varshney and Kamaraju to incorporate the teachings of Elliott, to provide a method that converts segmentation-rendered frame data into pixel data supplied to a shader for blending, utilizing the AR frames taught by Gruteser in view of Varshney and Kamaraju. Doing so would allow different sources to operate in a quick and resource-efficient manner to generate a composite virtual environment frame, as stated by Elliott ([0058]).

Conclusion

59. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

60. Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNNY TRAN LE whose telephone number is (571) 272-5680. The examiner can normally be reached Mon-Thu: 7:30am-5pm; First Fridays Off; Second Fridays: 7:30am-4pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JOHNNY T LE/
Examiner, Art Unit 2614

/KENT W CHANG/
Supervisory Patent Examiner, Art Unit 2614
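Stepping back from the examiner's text: the blending limitation running through the claim 5, 9, and 12 rejections above (blend every region of the decoded frame except the content region, optionally in a GPU shader) is, at bottom, per-pixel alpha compositing. A minimal NumPy sketch, with an assumed binary content mask (1 = AR content kept opaque, 0 = background blended with the camera image):

```python
# Minimal sketch of claim 5/9-style blending: keep the AR content region
# opaque and alpha-blend everything else with the camera image.
# The mask and alpha value are assumptions for illustration.

import numpy as np

def blend(ar_frame: np.ndarray, camera: np.ndarray,
          content_mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """ar_frame/camera: HxWx3 uint8; content_mask: HxW in {0, 1}."""
    m = content_mask[..., None].astype(np.float32)   # 1 inside AR content
    a = m + (1.0 - m) * alpha                        # per-pixel AR weight
    out = a * ar_frame.astype(np.float32) + (1.0 - a) * camera.astype(np.float32)
    return out.astype(np.uint8)

h, w = 4, 4
ar = np.full((h, w, 3), 200, np.uint8)
cam = np.zeros((h, w, 3), np.uint8)
mask = np.zeros((h, w), np.uint8)
mask[1:3, 1:3] = 1
print(blend(ar, cam, mask)[0, 0], blend(ar, cam, mask)[1, 1])  # blended vs. opaque
```

Claim 12's variant would perform the same arithmetic in a fragment shader after uploading the decoded pixel data as a GPU texture.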

Prosecution Timeline

Apr 25, 2024 — Application Filed
Dec 08, 2025 — Non-Final Rejection (§103)
Jan 21, 2026 — Response Filed
Apr 02, 2026 — Final Rejection (§103, current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 67%
With Interview: 0% (-66.7%)
Median Time to Grant: 2y 9m
PTA Risk: Moderate

Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
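A sketch of how these projection figures follow from the examiner statistics above; the split of the three resolved cases by interview status is not given, so the interview adjustment is applied as the reported percentage-point lift rather than recomputed from counts:

```python
# Sketch of the projection arithmetic (interview split is an assumption).
granted, resolved = 2, 3
grant_probability = granted / resolved        # 67%, the career allow rate
interview_lift = -0.667                       # reported, in percentage points
with_interview = max(grant_probability + interview_lift, 0.0)   # ~0%
print(f"{grant_probability:.0%} baseline; {with_interview:.0%} with interview")
```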
