DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This Office Action is in response to Applicant's amendment filed 02/23/2026, which has been entered and made of record. Claims 1, 3, 12, and 14-15 have been amended. No claim has been newly added. Claim 11 has been cancelled. Claims 1-10 and 12-17 are pending in the application. Applicant's amendments to the Claims have overcome each and every objection previously set forth in the Non-Final Office Action mailed November 28, 2025.
Response to Arguments
Applicant's arguments with respect to claims 1-10 and 12-17 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant's arguments are directed to newly amended limitations, which are addressed by the new prior art presented in this Office Action.
Claim Objections
Claims 14-15 are objected to because of the following informalities: the phrase "the distance" in the fourth-to-last line of claim 14 and in the last line of claim 15 should read "a distance". Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 10, and 12-17 are rejected under 35 U.S.C. 103 as being unpatentable over Suzuki et al. (U.S. Patent No. 6,573,912), hereinafter referenced as Suzuki, in view of Yerli (U.S. Patent Application Publication No. 2018/0288393), hereinafter referenced as Yerli, Islamov et al. (U.S. Patent Application Publication No. 2022/0207848), hereinafter referenced as Islamov, Pal et al. (U.S. Patent Application Publication No. 2014/0139746), hereinafter referenced as Pal, and Uchiyama et al. (U.S. Patent Application Publication No. 2017/0262993), hereinafter referenced as Uchiyama.
Regarding claim 1, Suzuki teaches a computer-implemented method of enabling a client device to render a three-dimensional [3D] scene comprising one or more objects, comprising: (col. 2, lines 30-33 teach "Video-based modeling and rendering methods explicitly create three-dimension model structures and use real video images as models of scene appearance"); - at a server system, streaming a video-based representation of an object as one or more video streams to the client device (col. 6, lines 39-42 teach "system includes several video cameras trained on the three-dimensional objects and a computer that calculates the voxel-representation of the solid: The server broadcasts each camera view and voxel information"); the server broadcasting shows the streaming, the computer is the client device, and the video cameras trained on 3D objects show the video-based representation of the object; - at the client device, rendering the scene from a viewing position of the client device within the scene to obtain a rendered view of the scene (col. 6, lines 43-46 teach "A user selects the viewing perspective desired, and the computer at the user's site receives the nearest camera views to the requested perspective and voxel information for the interpolation and renders the selected view."); the computer at the user's site is the client device, and the view with object information being rendered shows rendering of the scene from a viewing position within the scene; wherein the rendering of the scene comprises placing the video-based representation of the object at an object position within the scene (col. 10, lines 38-39 teach "cameras 404-406. Each captures real views A-C", lines 45-47 teach "result is a voxel database 426 that is supplied to a network server 428 along with background-subtracted real-views (A-C) 416-418", lines 50-53 teach "The voxel calculation and only one background-subtracted real-view or intermediate view are supplied to each unique network client 432 on demand. For example, FIG. 4 represents the selection of real view A", and lines 58-61 teach "A novel view (N) 440 is interpolated from the available information by a renderor 442 in response to a perspective selection 444 provided by a user. Such novel view N depends more on real view A"); view "A" is of the object only since it is background-subtracted, and view N depends on it (and is a novel view of the scene), meaning the object/view A would be placed at a position in the scene of view N, the perspective-selected view; wherein the method further comprises: - at the server system or at the client device, determining a relative position between the viewing position and the object position (col. 10, lines 48-49 teach "Intermediate views N1, N2 are computed by the network server 428"); as visualized in figs. 2-4, these intermediate views lie between the viewing position/cameras 304-306 and the object position/3D object; wherein the relative position is representable as a direction and a distance between the viewing position of the client device and the object position (figs. 2-4 teach N1 and N2 being a vector between the camera/viewing position and the 3D object); this shows the relative position represented as a direction and a distance, since it is a vector between the viewing/camera position and the object position. This vector definition, having direction and distance, is also consistent with applicant's disclosure, page 10, lines 1-6, mentioning "The relative position between the object position and the viewing position may be represented by a combination of distance and direction, which may in the following also be referred to as 'relative distance' and 'relative direction'. For example, if the relative position is expressed as a vector, the magnitude of the vector may represent the distance while the orientation of the vector may represent the direction."; wherein each of the videos shows the object from a different viewing angle (col. 6, lines 39-40 teach "several video cameras trained on the three-dimensional objects" and lines 42-43 teach "user selects the viewing perspective desired"); the viewing perspective/angle shows different viewing angles (also visualized in fig. 3), and the several cameras show that each of the videos would show the object from a different angle; this is also part of streaming and thus would be performed by the server system; wherein the set comprises at least three viewing angles (col. 6, lines 63-64 teach "the two or three video perspectives supplied"); this, alongside the three cameras shown in figs. 2-4, shows the set of videos comprising at least three viewing angles; and wherein the limited set of viewing angles is selected based on the relative position such that the viewing angle which geometrically corresponds to the direction to the object is included within the interval (col. 7, lines 51-60 teach "studio 102 is populated with many cameras set at different viewpoints around the object, e.g., front, left, right; top, and back. All these real viewpoints are represented by a group of cameras 106-108 and their corresponding perspective views 110-112. In practical applications, at least two cameras and perspectives will be needed. Any of a number of virtual perspectives that are not actually populated with a camera can be computed by interpolation of the real views. A pair of virtual perspectives 114 and 116 represent these novel views"); this corresponds to the direction to the object since the viewpoints are around the object; the real viewpoints and the number of virtual perspectives would be the limited set of viewing angles, and they are based on the relative position, such as the direction and distance between the viewing position and the object (since they are derived from such); and - at the client device, when changing the viewing position and the relative position, selecting a viewing angle from the limited set of viewing angles that corresponds to the changed relative position of the client device (abstract teaches "A user selects the viewing perspective desired, and the computer at the user's site receives the nearest camera views to the requested perspective and voxel information for the interpolation and renders the selected view."); the desired viewing perspective shows the viewing position and relative position being changed, and the nearest camera view shows a viewing angle from the limited set of viewing angles, corresponding to the changed relative position, being selected.
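For illustration of the mapping discussed above, the following is a minimal sketch (hypothetical; not the method of Suzuki or any other cited reference, and all function and variable names are the examiner's own) of representing the relative position as a direction and a distance, i.e., as a vector, and of selecting the nearest viewing angle from a limited set when the relative position changes:

```python
# Hypothetical sketch: relative position as (direction, distance), then
# nearest-angle selection from a limited set of viewing angles.
import math

def relative_position(viewing_pos, object_pos):
    """Return (direction_deg, distance) of the object as seen from the viewer."""
    dx = object_pos[0] - viewing_pos[0]
    dz = object_pos[1] - viewing_pos[1]
    direction = math.degrees(math.atan2(dz, dx)) % 360.0  # orientation of the vector
    distance = math.hypot(dx, dz)                         # magnitude of the vector
    return direction, distance

def select_viewing_angle(direction, available_angles):
    """Pick the angle in the limited set circularly closest to the relative direction."""
    return min(available_angles,
               key=lambda a: min(abs(a - direction), 360.0 - abs(a - direction)))

direction, distance = relative_position((0.0, 0.0), (3.0, 4.0))
angle = select_viewing_angle(direction, [40.0, 50.0, 60.0])  # a set of three angles
print(direction, distance, angle)  # 53.13..., 5.0, 50.0
```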
However, Suzuki fails to explicitly teach the viewing position being a viewing position of the client device, or a relative position of the client device (although Suzuki's mention of the user selecting a perspective, as cited above, could mean that the perspective coincides with the client device, especially when viewed alongside the combination of references below).
However, Yerli explicitly teaches the viewing position of the client device (Yerli, paragraph 8 teaches “generate the output stream in order to reflect a current viewing position associated with the output device”); this shows the viewing position above (from which the scene is rendered and from which the distance is taken) would be that of the client device; and the relative position of the client device (Yerli, paragraph 8 teaches “the viewing position may correspond to a device position… viewing position may also be determined based on a position of the user relative to the device.”); this shows that the relative position, to which the viewing angle selected above corresponds, would be the relative position of the client device. Yerli is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of accounting for the position of a client device when rendering views. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Suzuki's invention with the client-device positioning techniques of Yerli for an improved immersive experience of captured content by taking into account visual processing principles of humans (Yerli, paragraph 68). This would be done by the position of the client device being used in the views, leading to the more immersive experience.
However, the combination of Suzuki and Yerli fails to teach - at the server system, generating the one or more video streams wherein each viewing angle provides the angle from which a view to the object is provided, to show the object from a limited set of viewing angles which is limited to a set of angles within an interval, wherein the interval is within a range of possible angles from which the object can be rendered,
and placing the video-based representation of the object in accordance with said selected viewing angle in the scene.
However, Islamov teaches - at the server system, generating the one or more video streams wherein each viewing angle provides the angle from which a view to the object is provided (Islamov, paragraph 55 teaches "time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view"); this shows each viewing angle providing the angle from which the object is viewed (relative rotation of the object); to show the object from a limited set of viewing angles which is limited to a set of angles within an interval (Islamov, paragraph 55 teaches "The object 1 is rotated relative to the camera 4 during the recording of the video image, so that both the raw video data and the first set of video data comprise a time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view." and paragraph 107 teaches "In the embodiment described above the time sequence of video data showing relative rotation of the object 1 relative to the camera 4 extends through a full 360° rotation. This is not essential. In other examples...different range of angles. It will be understood that if the time sequence of video data corresponds to a range of angles of less than 360° there will be a corresponding limitation in the angles of view from which the object 1 can be shown."); this would be done by the server of Suzuki since it is part of broadcasting/streaming the video, and the object being shown from a limited set of angles within an interval is consistent with applicant's disclosure, page 10, lines 13-16, mentioning "The set of viewing angles may thus be limited to a sub-range of [0, 360], or to a sub-range of any other range of viewing angles from which the object could be rendered. This sub-range may here and elsewhere also be referred to as an 'interval' within the larger range of possible angles."; wherein the interval is within a range of possible angles from which the object can be rendered (Islamov, paragraph 14 teaches "capturing a first set of video image data of a region of record including an object, the first set of video image data comprising a time sequence showing rotation of the object at a range of angles of view"); the object at a range of angles of view shows the interval being within a range of possible angles from which the object can be rendered;
and placing the video-based representation of the object in accordance with said selected viewing angle in the scene (Islamov, paragraph 101 teaches "video data showing relative rotation of the object 1 relative to the camera 4 so that the displayed frame of the video model shows the object 1 as viewed by the camera 4 at a viewing angle corresponding to the current angle of view"); the rotation of the object relative to the camera, showing the object at the current angle of view, shows the object [the video-based representation of the object from Suzuki] being placed in accordance with the viewing angle of the scene. Islamov is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of video data with objects having specific viewing angles. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki and Yerli's invention with the viewing-angle techniques of Islamov to require less computing resources and improve the quality of the model produced (Islamov, paragraph 74). This would be due to the specific processing techniques and angles used.
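For illustration of the "interval within a range of possible angles" concept mapped above, the following is a minimal sketch (hypothetical; an assumption for discussion, not Islamov's disclosed method) of constructing a limited set of viewing angles as a sub-range of [0, 360] centered on the direction to the object:

```python
# Hypothetical sketch: a limited set of viewing angles spanning an interval
# within the full range of possible angles, centered on the relative direction.
def limited_angle_set(direction_deg, interval_width_deg, count):
    """Return `count` evenly spaced angles spanning an interval centered on the direction."""
    assert count >= 3, "the claimed set comprises at least three viewing angles"
    start = direction_deg - interval_width_deg / 2.0
    step = interval_width_deg / (count - 1)
    return [(start + i * step) % 360.0 for i in range(count)]

# A sub-range of [0, 360]: five angles over a 40-degree interval around 90 degrees.
print(limited_angle_set(90.0, 40.0, 5))  # [70.0, 80.0, 90.0, 100.0, 110.0]
```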
However, the combination of Suzuki, Yerli and Islamov fails to teach wherein generating the one or more video streams comprises spatially multiplexing a set of videos.
However, Pal teaches wherein generating the one or more video streams comprises spatially multiplexing a set of videos (Pal, claim 8 teaches “spatial multiplexer to multiplex a first set of video frames of a first set of analog videos to form a first array multiplexed video frames; a spatial multiplexer to multiplex second set of video frames”, and fig. 2 and paragraph 52 teach “compressing form the 4 CIF videos 203 and 204 is broadcasted by first set of four and second set of four broadcasting channels respectively. The spatial multiplexing of four videos has been discussed for describing the best mode of the present invention in accordance with an exemplary embodiment. However, the invention is enabled to multiplex any number of `N` videos, where N is the number of videos”); this shows the video set from the combination above can be multiplexed to generate the stream(s). Pal is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of spatially multiplexing a set of videos. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli and Islamov with the spatial multiplexing techniques of Pal to ensure the invention enables efficient usage of the available bandwidth in a smarter way by multiplexing (Pal, paragraph 65). This would increase the desirability of users using the invention due to the increased efficiency.
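For illustration of spatial multiplexing as discussed above, the following is a minimal sketch (hypothetical; Pal's system operates on analog video channels, whereas this sketch tiles digital frames) in which four equally sized frames, e.g., CIF frames of 352x288 pixels, are composed into one 2x2 multiplexed frame of a single output stream:

```python
# Hypothetical sketch: spatially multiplexing four video frames into one
# composite frame per time instant (a 2x2 tiling).
import numpy as np

def spatially_multiplex(frames):
    """Tile four HxWxC frames into one 2Hx2WxC frame."""
    assert len(frames) == 4 and all(f.shape == frames[0].shape for f in frames)
    top = np.hstack((frames[0], frames[1]))
    bottom = np.hstack((frames[2], frames[3]))
    return np.vstack((top, bottom))

views = [np.full((288, 352, 3), v, dtype=np.uint8) for v in (0, 85, 170, 255)]
composite = spatially_multiplex(views)   # one multiplexed frame of the output stream
print(composite.shape)                   # (576, 704, 3)
```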
However, the combination of Suzuki, Yerli, Islamov and Pal fails to teach and a spacing between the viewing angles within the interval is selected based on the distance.
However, Uchiyama teaches and a spacing between the viewing angles within the interval is selected based on the distance (Uchiyama, paragraph 23 teaches "Changing the focal length changes a range of a space (an angle of view) that is an object of which distance data is taken"); this shows the focal length/distance (the distance from the above combination, when viewed in combination) impacting the selection of viewing angles within the interval by changing the spacing of the viewing angles. Uchiyama is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of distance impacting the angle of view. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov and Pal with the viewing-angle spacing techniques of Uchiyama to correct an angle of view and a position (Uchiyama, paragraph 23). This would be done by selecting and updating the viewing angle in accordance with the focal length/distance.
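For illustration of spacing selected based on distance, the following is a minimal sketch (hypothetical; an assumption for discussion, not Uchiyama's method) in which the angular spacing between adjacent viewing angles widens with distance, on the premise that a distant object's appearance changes less per degree of viewing angle:

```python
# Hypothetical sketch: angular spacing between viewing angles as a
# function of the distance between the viewing position and the object.
def angle_spacing(distance, near=1.0, far=50.0, min_step=5.0, max_step=30.0):
    """Interpolate the spacing linearly between min_step (near) and max_step (far)."""
    t = min(max((distance - near) / (far - near), 0.0), 1.0)
    return min_step + t * (max_step - min_step)

print(angle_spacing(1.0))   # 5.0 degrees between adjacent viewing angles when near
print(angle_spacing(50.0))  # 30.0 degrees when far
```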
Regarding claim 2, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein the limited set of viewing angles is further selected based on the distance (Suzuki, abstract teaches "user selects the viewing perspective desired, and the computer at the user's site receives the nearest camera views."); this shows the viewing angles/perspectives being selected from the nearest camera views, i.e., based on distance.
Regarding claim 3, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein at least one of: a width of the interval, and a number of viewing angles within the interval, is selected based on the distance (Islamov, paragraph 107 teaches "It will be understood that if the time sequence of video data corresponds to a range of angles of less than 360° there will be a corresponding limitation in the angles of view from which the object 1 can be shown."); one of ordinary skill in the art would understand that there would be a number of viewing angles within an interval; this shows the width of the interval being less than 360°, and both are selected based on the distance because this selection happens in a step after the distance determination.
Regarding claim 10, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches further comprising, at the client device: - rendering the scene within a viewport, wherein the viewport is defined by a viewing direction and/or a field of view (Islamov, fig. 3, reference 7, figs. 5-6, and fig. 9 show the viewport; paragraph 38 teaches "Images are displayed on the display device together with a live camera view to create the illusion that the subject of the video (the model) are present in the field of view of the camera in real time" and paragraph 50 teaches "object 1 can move, but should stay within the region of record 7 defined by the field of view of the camera 4 and in front of the background 2 and floor 3 from the point of view of the camera 4"); this shows the scene rendered in a viewport which has a viewing direction and a field of view as described;
- providing metadata to the server system, wherein the metadata is indicative of the viewing direction and/or the field of view (Islamov, paragraph 55 teaches "The object 1 is rotated relative to the camera 4 during the recording of the video image, so that both the raw video data and the first set of video data comprise a time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view."); the time sequence in the data, alongside the object being rotated relative to the camera, both show metadata provided to the server (because this is part of the streaming process, which the server in Suzuki handles), and the rotation relative to the camera shows an orientation, which includes the viewing direction and directional information;
and at the server system:
- determining a visibility of the object at the client device based on the metadata and controlling the streaming of the one or more video streams based on the visibility (Islamov, paragraph 89 teaches "the display of the model may be based upon a location of a trigger object visible in the field of view of a camera of the display device which generated the background for the AR display of the model"); since this is part of the streaming process, which the server in Suzuki handles, this would be handled by the server system of Suzuki. This passage mentions the display of the model (the streaming) being controlled (based on a trigger) by the visibility of an object, meaning the object's visibility would have to be determined first; the metadata above orients the object, leading to the impacted visibility (and thus the determination being based on such), and the figures mentioned above show the client device which would display the object whose visibility is tested. The same motivations used in claim 1 apply here in claim 10.
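For illustration of the server-side visibility determination discussed above, the following is a minimal sketch (hypothetical; not the disclosed mechanism of Islamov or Suzuki) in which metadata carrying the client's viewing direction and field of view gates whether the object's stream continues:

```python
# Hypothetical sketch: gate streaming on whether the direction to the
# object falls within the client's reported field of view.
def object_visible(viewing_dir_deg, fov_deg, direction_to_object_deg):
    """True when the direction to the object lies within the field of view."""
    diff = abs((direction_to_object_deg - viewing_dir_deg + 180.0) % 360.0 - 180.0)
    return diff <= fov_deg / 2.0

# Stream only while the object is inside the reported viewport.
if object_visible(viewing_dir_deg=90.0, fov_deg=110.0, direction_to_object_deg=130.0):
    pass  # continue streaming the object's video stream(s)
```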
Regarding claim 12, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein generating the one or more video streams comprises - using a multi-view coding technique to encode the set of videos (Suzuki, col. 5, lines 14-16 teach "multi-camera video, each video could be encoded using MPEG algorithms").
Regarding claim 13, the non-transitory computer-readable medium claim 13 recites similar limitations as method claim 1, and thus is rejected under similar rationale. In addition, Islamov, paragraph 19 teaches "methods described herein may be performed by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory cards etc. and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor"; this shows non-transitory storage/medium with program/instructions, executed by processor to perform the method. The same motivations used in claim 1 apply here in claim 13.
Regarding claim 14, Suzuki teaches a client device configured to render a three-dimensional [3D] scene comprising one or more objects (col. 2, lines 30-33 teach "Video-based modeling and rendering methods explicitly create three-dimension model structures and use real video images as models of scene appearance"); comprising: - a network interface to a network (col. 9, line 22 teaches "signal 144 is transmitted by a network interface controller" and fig. 2 shows network server 228 and network client 232); - a processor subsystem configured to: - from a server system and via the network, receive one or more video streams comprising a video-based representation of an object (fig. 1, reference 132 teaches a rendering processor and col. 6, lines 39-42 teach "system includes several video cameras trained on the three-dimensional objects and a computer that calculates the voxel-representation of the solid: The server broadcasts each camera view and voxel information"); the server broadcasting shows the streaming, the computer is the processor subsystem, and the video cameras trained on 3D objects show the video-based representation of the object; - render the scene from a viewing position, within the scene to obtain a rendered view of the scene (Suzuki, col. 6, lines 43-46 teach "A user selects the viewing perspective desired, and the computer at the user's site receives the nearest camera views to the requested perspective and voxel information for the interpolation and renders the selected view."); the view with object information being rendered shows rendering of the scene from a viewing position within the scene; wherein the rendering of the scene comprises placing the video-based representation of the object at an object position within the scene (col. 10, lines 38-39 teach "cameras 404-406. Each captures real views A-C", lines 45-47 teach "result is a voxel database 426 that is supplied to a network server 428 along with background-subtracted real-views (A-C) 416-418", lines 50-53 teach "The voxel calculation and only one background-subtracted real-view or intermediate view are supplied to each unique network client 432 on demand. For example, FIG. 4 represents the selection of real view A", and lines 58-61 teach "A novel view (N) 440 is interpolated from the available information by a renderor 442 in response to a perspective selection 444 provided by a user. Such novel view N depends more on real view A"); view "A" is of the object only since it is background-subtracted, and view N depends on it (and is a novel view of the scene), meaning the object/view A would be placed at a position in the scene of view N, the perspective-selected view; wherein the processor subsystem is further configured to: - determine a relative position between the viewing position of the client device and the object position (Suzuki, col. 10, lines 48-49 teach "Intermediate views N1, N2 are computed by the network server 428"); as visualized in figs. 2-4, these intermediate views lie between the viewing position/cameras 304-306 and the object position/3D object; wherein each of the videos shows the object from a different viewing angle (col. 6, lines 39-40 teach "several video cameras trained on the three-dimensional objects" and lines 42-43 teach "user selects the viewing perspective desired"); the viewing perspective/angle shows different viewing angles (also visualized in fig. 3), and the several cameras show that each of the videos would show the object from a different angle; this is also part of streaming and thus would be performed by the server system; which is limited to a set of angles within an interval wherein the set comprises at least three viewing angles (col. 6, lines 63-64 teach "the two or three video perspectives supplied"); this, alongside the three cameras shown in figs. 2-4, shows the set of videos comprising at least three viewing angles; and wherein the limited set of viewing angles is selected based on the relative position such that the viewing angle which geometrically corresponds to the direction to the object is included within the interval (col. 7, lines 51-60 teach "studio 102 is populated with many cameras set at different viewpoints around the object, e.g., front, left, right; top, and back. All these real viewpoints are represented by a group of cameras 106-108 and their corresponding perspective views 110-112. In practical applications, at least two cameras and perspectives will be needed. Any of a number of virtual perspectives that are not actually populated with a camera can be computed by interpolation of the real views. A pair of virtual perspectives 114 and 116 represent these novel views"); this corresponds to the direction to the object since the viewpoints are around the object; the real viewpoints and the number of virtual perspectives would be the limited set of viewing angles, and they are based on the relative position, such as the direction and distance between the viewing position and the object (since they are derived from such);
However, Suzuki fails to explicitly teach from which viewing position the client device renders the scene; the viewing position being a viewing position of the client device; or a relative position of the client device (although Suzuki's mention of the user selecting a perspective, as cited above, could mean that the perspective coincides with the client device, especially when viewed alongside the combination of references below).
However, Yerli explicitly teaches from which viewing position the client device renders the scene (Yerli, paragraph 8 teaches “generate the output stream in order to reflect a current viewing position associated with the output device”); and the viewing position of the client device (Yerli, paragraph 8, same passage); this shows the viewing position above (from which the scene is rendered and from which the distance is taken) would be that of the client device; and the relative position of the client device (Yerli, paragraph 8 teaches “the viewing position may correspond to a device position…viewing position may also be determined based on a position of the user relative to the device.”); this shows that the relative position from below would be the relative position of the client device. Yerli is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of accounting for the position of a client device when rendering views. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Suzuki's invention with the client-device positioning techniques of Yerli for an improved immersive experience of captured content by taking into account visual processing principles of humans (Yerli, paragraph 68). This would be done by the position of the client device being used in the views, leading to the more immersive experience.
However, the combination of Suzuki and Yerli fails to teach - provide metadata indicative of the relative position of the client device to the server system to cause the server system to generate the one or more video streams wherein each viewing angle provides the angle from which a view to the object is provided, to show the object from a limited set of viewing angles, wherein the interval is within a range of possible angles from which the object can be rendered,
and - select a viewing angle from the limited set of viewing angles and place the video-based representation of the object in accordance with said selected viewing angle in the scene.
However, Islamov teaches - provide metadata indicative of the relative position of the client device to the server system to cause the server system to generate the one or more video streams to show the object from a limited set of viewing angles (Islamov, paragraph 55 teaches "The object 1 is rotated relative to the camera 4 during the recording of the video image, so that both the raw video data and the first set of video data comprise a time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view." and paragraph 107 teaches "In the embodiment described above the time sequence of video data showing relative rotation of the object 1 relative to the camera 4 extends through a full 360° rotation. This is not essential. In other examples...different range of angles. It will be understood that if the time sequence of video data corresponds to a range of angles of less than 360° there will be a corresponding limitation in the angles of view from which the object 1 can be shown."); the time sequence in the data, alongside the object being rotated relative to the camera, both show metadata indicating the relative position; this would be done by the server of Suzuki since it is part of broadcasting/streaming the video, and the object being shown from a limited set of angles is consistent with applicant's disclosure, page 10, lines 13-15, mentioning "The set of viewing angles may thus be limited to a sub-range of [0, 360], or to a sub-range of any other range of viewing angles from which the object could be rendered."; wherein each viewing angle provides the angle from which a view to the object is provided (Islamov, paragraph 55 teaches "time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view"); this shows each viewing angle providing the angle from which the object is viewed (relative rotation of the object); wherein the interval is within a range of possible angles from which the object can be rendered (Islamov, paragraph 14 teaches "capturing a first set of video image data of a region of record including an object, the first set of video image data comprising a time sequence showing rotation of the object at a range of angles of view"); the object at a range of angles of view shows the interval being within a range of possible angles from which the object can be rendered;
and - select a viewing angle from the limited set of viewing angles and place the video-based representation of the object in accordance with said selected viewing angle in the scene (Islamov, paragraph 101 teaches "video data showing relative rotation of the object 1 relative to the camera 4 so that the displayed frame of the video model shows the object 1 as viewed by the camera 4 at a viewing angle corresponding to the current angle of view"); the rotation of the object relative to the camera, showing the object at the current angle of view, shows the object [the video-based representation of the object from Suzuki] being placed in accordance with the viewing angle of the scene, and this would follow from the user selecting the viewing perspective per Suzuki's abstract. The same motivations used in claim 1 apply here in claim 14.
However, the combination of Suzuki, Yerli and Islamov fails to teach wherein generating the one or more video streams comprises spatially multiplexing a set of videos.
However, Pal teaches wherein generating the one or more video streams comprises spatially multiplexing a set of videos (Pal, claim 8 teaches “spatial multiplexer to multiplex a first set of video frames of a first set of analog videos to form a first array multiplexed video frames; a spatial multiplexer to multiplex second set of video frames”, and fig. 2 and paragraph 52 teach “compressing form the 4 CIF videos 203 and 204 is broadcasted by first set of four and second set of four broadcasting channels respectively. The spatial multiplexing of four videos has been discussed for describing the best mode of the present invention in accordance with an exemplary embodiment. However, the invention is enabled to multiplex any number of `N` videos, where N is the number of videos”); this shows the video set from the combination above can be multiplexed to generate the stream(s). Pal is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of spatially multiplexing a set of videos. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli and Islamov with the spatial multiplexing techniques of Pal to ensure the invention enables efficient usage of the available bandwidth in a smarter way by multiplexing (Pal, paragraph 65). This would increase the desirability of users using the invention due to the increased efficiency.
However, the combination of Suzuki, Yerli, Islamov and Pal fails to teach and a spacing between the viewing angles within the interval is selected based on the distance.
However, Uchiyama teaches and a spacing between the viewing angles within the interval is selected based on the distance (Uchiyama, paragraph 23 teaches "Changing the focal length changes a range of a space (an angle of view) that is an object of which distance data is taken"); this shows the focal length/distance (the distance from the above combination, when viewed in combination) impacting the selection of viewing angles within the interval by changing the spacing of the viewing angles. Uchiyama is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of distance impacting the angle of view. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov and Pal with the viewing-angle spacing techniques of Uchiyama to correct an angle of view and a position (Uchiyama, paragraph 23). This would be done by selecting and updating the viewing angle in accordance with the focal length/distance.
Regarding claim 15, Suzuki teaches a server system for streaming a video-based representation of an object as one or more video streams to a client device (col. 6, lines 39-42 teach "system includes several video cameras trained on the three-dimensional objects and a computer that calculates the voxel-representation of the solid: The server broadcasts each camera view and voxel information"); the server broadcasting shows the streaming, the computer is the client device, and the video cameras trained on 3D objects show the video-based representation of the object; wherein the client device is configured to render the object as part of a three-dimensional [3D] scene, comprising: - a network interface to a network (col. 6, lines 43-46 teach "A user selects the viewing perspective desired, and the computer at the user's site receives the nearest camera views to the requested perspective and voxel information for the interpolation and renders the selected view.", col. 9, line 22 teaches "signal 144 is transmitted by a network interface controller" and fig. 2 shows network server 228 and network client 232); the computer at the user's site is the client device, and the view with object information being rendered shows rendering of the scene from a viewing position within the scene; - a processor subsystem configured to: - determine a relative position between a viewing position, from which viewing position the client device renders the scene, and an object position, wherein the video-based representation of the object is placed at the object position within the scene (fig. 1, reference 132 teaches a rendering processor and col. 10, lines 48-49 teach "Intermediate views N1, N2 are computed by the network server 428"); as visualized in figs. 2-4, these intermediate views lie between the viewing position/cameras 304-306 and the object position/3D object (which would be the video-based representation of the object, since view "A" is of the object only and is background-subtracted, as explained in col. 10, lines 45-47, 50-53, and 58-61 of Suzuki and as described in claim 1 above); this is also at the object position since the user can choose anywhere to place it; wherein each of the videos shows the object from a different viewing angle (col. 6, lines 39-40 teach "several video cameras trained on the three-dimensional objects" and lines 42-43 teach "user selects the viewing perspective desired"); the viewing perspective/angle shows different viewing angles (also visualized in fig. 3), and the several cameras show that each of the videos would show the object from a different angle; this is also part of streaming and thus would be performed by the server system; which is limited to a set of angles within an interval wherein the set comprises at least three viewing angles (col. 6, lines 63-64 teach "the two or three video perspectives supplied"); this, alongside the three cameras shown in figs. 2-4, shows the set of videos comprising at least three viewing angles; wherein the limited set of viewing angles is selected based on the relative position such that the viewing angle which geometrically corresponds to the direction to the object is included within the interval (col. 7, lines 51-60 teach "studio 102 is populated with many cameras set at different viewpoints around the object, e.g., front, left, right; top, and back. All these real viewpoints are represented by a group of cameras 106-108 and their corresponding perspective views 110-112. In practical applications, at least two cameras and perspectives will be needed. Any of a number of virtual perspectives that are not actually populated with a camera can be computed by interpolation of the real views. A pair of virtual perspectives 114 and 116 represent these novel views"); this corresponds to the direction to the object since the viewpoints are around the object; the real viewpoints and the number of virtual perspectives would be the limited set of viewing angles, and they are based on the relative position, such as the direction and distance between the viewing position and the object (since they are derived from such);
However, Suzuki fails to teach a viewing position from which viewing position the client device renders the scene (although Suzuki's mention of the user selecting a perspective, as cited above, could mean that the perspective coincides with the client device, especially when viewed alongside the combination of references below).
However, Yerli explicitly teaches a viewing position, from which viewing position the client device renders the scene (Yerli, paragraph 8 teaches “generate the output stream in order to reflect a current viewing position associated with the output device”); this shows the viewing position above (from which the scene is rendered) would be that of the client device. Yerli is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of accounting for the position of a client device when rendering views. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Suzuki's invention with the client-device positioning techniques of Yerli for an improved immersive experience of captured content by taking into account visual processing principles of humans (Yerli, paragraph 68). This would be done by the position of the client device being used in the views, leading to the more immersive experience.
However, the combination of Suzuki and Yerli fails to teach and - generate the one or more video streams wherein each viewing angle provides the angle from which a view to the object is provided, to show the object from a limited set of viewing angles, wherein the interval is within a range of possible angles from which the object can be rendered,
However, Islamov teaches and - generate the one or more video streams wherein each viewing angle provides the angle from which a view to the object is provided (Islamov, paragraph 55 teaches "time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view"); this shows each viewing angle providing the angle from which the object is viewed (relative rotation of the object); to show the object from a limited set of viewing angles (Islamov, paragraph 55 teaches "The object 1 is rotated relative to the camera 4 during the recording of the video image, so that both the raw video data and the first set of video data comprise a time sequence of video data showing relative rotation of the object 1 relative to the camera 4 through a full rotation of 360° of a range of angles of view." and paragraph 107 teaches "In the embodiment described above the time sequence of video data showing relative rotation of the object 1 relative to the camera 4 extends through a full 360° rotation. This is not essential. In other examples...different range of angles. It will be understood that if the time sequence of video data corresponds to a range of angles of less than 360° there will be a corresponding limitation in the angles of view from which the object 1 can be shown."); the object being shown from a limited set of angles is consistent with applicant's disclosure, page 10, lines 13-15, mentioning "The set of viewing angles may thus be limited to a sub-range of [0, 360], or to a sub-range of any other range of viewing angles from which the object could be rendered."; wherein the interval is within a range of possible angles from which the object can be rendered (Islamov, paragraph 14 teaches "capturing a first set of video image data of a region of record including an object, the first set of video image data comprising a time sequence showing rotation of the object at a range of angles of view"); the object at a range of angles of view shows the interval being within a range of possible angles from which the object can be rendered. The same motivations used in claim 1 apply here in claim 15.
However, the combination of Suzuki, Yerli and Islamov fails to teach wherein generating the one or more video streams comprises spatially multiplexing a set of videos.
However, Pal teaches wherein generating the one or more video streams comprises spatially multiplexing a set of videos (Pal, claim 8 teaches “spatial multiplexer to multiplex a first set of video frames of a first set of analog videos to form a first array multiplexed video frames; a spatial multiplexer to multiplex second set of video frames”, and fig. 2 and paragraph 52 teach “compressing form the 4 CIF videos 203 and 204 is broadcasted by first set of four and second set of four broadcasting channels respectively. The spatial multiplexing of four videos has been discussed for describing the best mode of the present invention in accordance with an exemplary embodiment. However, the invention is enabled to multiplex any number of `N` videos, where N is the number of videos”); this shows the video set from the combination above can be multiplexed to generate the stream(s). Pal is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of spatially multiplexing a set of videos. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli and Islamov with the spatial multiplexing techniques of Pal to ensure the invention enables efficient usage of the available bandwidth in a smarter way by multiplexing (Pal, paragraph 65). This would increase the desirability of users using the invention due to the increased efficiency.
However, the combination of Suzuki, Yerli, Islamov and Pal fails to teach and a spacing between the viewing angles within the interval is selected based on the distance.
However, Uchiyama teaches and a spacing between the viewing angles within the interval is selected based on the distance (Uchiyama, paragraph 23 teaches "Changing the focal length changes a range of a space (an angle of view) that is an object of which distance data is taken"); this shows the focal length/distance (the distance from the above combination, when viewed in combination) impacting the selection of viewing angles within the interval by changing the spacing of the viewing angles. Uchiyama is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of distance impacting the angle of view. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov and Pal with the viewing-angle spacing techniques of Uchiyama to correct an angle of view and a position (Uchiyama, paragraph 23). This would be done by selecting and updating the viewing angle in accordance with the focal length/distance.
Regarding claim 16, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein the processor subsystem is configured to receive metadata from the client device comprising at least one of: the viewing position, the object position and the relative position (Islamov, paragraph 55 teaches "The object 1 is rotated relative to the camera 4 during the recording of the video image,"); this shows the metadata mentioned above, with the object rotating relative to the camera, which means the viewing/camera position is known as well as the object position; this would also change the relative-position computation of the intermediate views N1 and N2 from Suzuki, meaning the metadata also comprises and affects the relative position from above, since the computation depends on the viewing and object positions. The same motivations used in claim 15 apply here in claim 16.
Regarding claim 17, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein the processor subsystem is configured to generate one or more visual representations of the object to show the object at different viewing angles by at least one of: - rendering a 3D graphics-based object from the different viewing angles; - synthesizing the one or more visual representations of the object from at least one other visual representation of the object (Islamov, paragraph 93 teaches "user is able to view the object 1 from any desired angle on the display device 106 rather than being limited to viewing object 1 from an angle corresponding to the viewing angle of the camera"); this shows a 3D object rendered from different viewing angles. The same motivations used in claim 15 apply here in claim 17.
Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama as applied to claim 1 above, and further in view of Nishiyama (U.S. Patent Application Publication No. 2019/0051035), hereinafter referenced as Nishiyama.
Regarding claim 4, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama fails to teach further comprising, at the server system, adjusting a spatial resolution, a temporal framerate, or another video quality parameter of the one or more video streams based on the distance.
However, Nishiyama teaches further comprising, at the server system, adjusting a spatial resolution, a temporal framerate, or another video quality parameter of the one or more video streams based on the distance (Nishiyama, paragraph 72 teaches "it is possible to cause the spatial resolution to change for each object in accordance with the distance to the object and the acceptable resolution. In such a case, it is possible to suppress a fluctuation of the resolution between frames for each object."); this shows adjusting the spatial resolution, and suppressing fluctuation, in accordance with/based on the distance, and this would happen at the server of Suzuki since it is part of the streaming process. Nishiyama is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of video quality and its improvement. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama with the quality improvement techniques of Nishiyama to suppress a fluctuation (Nishiyama, paragraph 72). This would lead to a better user experience.
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama as applied to claim 1 above, and further in view of Hostyn et al. (U.S. Patent Application Publication No. 2008/0022007), hereinafter referenced as Hostyn.
Regarding claim 5, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches wherein the limited set of angles is selected further based on the latency (Suzuki, col. 5, lines 1-2 and 4 teach "constructing the system several factors influence the overall design…and communication latency."); this shows that, since the design of the system is influenced by latency, the set of angles is selected based on such.
However, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama fails to teach further comprising estimating a latency associated with the streaming of the one or more video streams from the server system to the client device.
However, Hostyn teaches further comprising estimating a latency associated with the streaming of the one or more video streams from the server system to the client device (Hostyn, paragraph 5 teaches "an estimate of the network latency, it is possible for the client device to estimate how much additional data of the audio/video stream of the first content will be present on the network"); this shows estimating a latency associated with the streaming, and the streaming is done from the server to the client as explained in the combination above. Hostyn is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of estimating latency for streaming. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama with the latency estimation techniques of Hostyn to improve response time, where marker data can be sent from the server to the client device to indicate the start of the second content (Hostyn, abstract). This means a faster system due to the effects of the estimated latency.
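For illustration of latency estimation as discussed above, the following is a minimal sketch (hypothetical; not Hostyn's disclosed protocol) in which a client estimates the one-way streaming latency as half the measured round-trip time to the server, averaged over several probes:

```python
# Hypothetical sketch: estimate one-way latency from round-trip probes,
# assuming a roughly symmetric network path.
import time

def estimate_latency(ping_server, probes=5):
    """Average one-way latency in seconds over several request/response probes."""
    samples = []
    for _ in range(probes):
        start = time.monotonic()
        ping_server()                   # request/response exchange with the server
        samples.append((time.monotonic() - start) / 2.0)
    return sum(samples) / len(samples)

latency = estimate_latency(lambda: time.sleep(0.02))  # stand-in for a real ping
print(f"{latency:.3f} s")  # roughly 0.010 s
```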
Claim(s) 6-9 is/are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama as applied to claim 1 above, and further in view of Varshney et al. (U.S. Patent No. 11,087,549), hereinafter referenced as Varshney.
Regarding claim 6, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama teaches further comprising, at the client device, moving the viewing position within the scene over time (Islamov, paragraph 53 teaches "In other examples the camera 4 could be rotated around the object 1"); this shows moving the viewing/camera position within the scene over time; wherein the limited set of angles is selected based on a prediction of a change in the relative position due to said movement of the viewing position (Islamov, paragraph 93 teaches "The apparent angle of view of the object 1 displayed on the display device 106 is based on the position of the display device 1 relative to the fixed virtual location at which the object 1 is apparently displayed"); this shows the angle of view (which makes up the limited set of angles) being based on the position due to movement.
However, the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama fails to teach prediction of a change in the relative position.
However, Varshney teaches prediction of a change in the relative position (Varshney, col. 22, lines 8-9 teach "raw camera footage used to calibrate camera locations and estimate environment geometry." and col. 16, lines 14-16 teach "points from two different images (e.g., corners of a table) may be used to estimate camera position and orientation in a calibrated scene"); estimating positions and environment geometry here shows that the change in the relative position from the combination of Suzuki and Islamov would be predicted/estimated. Varshney is considered to be analogous art because it is reasonably pertinent to the problem faced by the inventor of estimating/predicting position in a virtual scene. Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Suzuki, Yerli, Islamov, Pal, and Uchiyama with the position prediction techniques of Varshney to improve upon these 360-camera-based navigable virtual environments and establish a correspondence between the multi-viewpoint 360-camera captured scene and a virtual 3D space containing virtual objects (Varshney, col. 22, lines 22-23 and 34-36). This results in an enhanced experience.
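For illustration of predicting a change in the relative position, the following is a minimal sketch (hypothetical; Varshney estimates camera pose from image features rather than extrapolating motion, so this stands in for the general idea only) that linearly extrapolates the viewer's recent movement one step ahead, so that a limited set of viewing angles could be selected for where the viewer will be:

```python
# Hypothetical sketch: predict the next relative position by linear
# extrapolation of the viewer's movement between two known positions.
def predict_relative_position(prev_pos, curr_pos, object_pos, dt=1.0):
    """Extrapolate the viewing position one step ahead and return the new offset."""
    vx = (curr_pos[0] - prev_pos[0]) / dt
    vz = (curr_pos[1] - prev_pos[1]) / dt
    next_pos = (curr_pos[0] + vx * dt, curr_pos[1] + vz * dt)
    return (object_pos[0] - next_pos[0], object_pos[1] - next_pos[1])

print(predict_relative_position((0.0, 0.0), (1.0, 0.0), (5.0, 0.0)))  # (3.0, 0.0)
```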
Regarding claim 7, the combination of Suzuki, Yerli, Islamov, Pal, Uchiyama and Varshney teaches wherein the movement of the viewing position is planned to follow a path to a next viewing position in the scene (Islamov, paragraph 53 teaches "In other examples the camera 4 could be rotated around the object 1"); rotation means a circular path followed to the next viewing/camera position; wherein the limited set of viewing angles is selected based on the next viewing position or an intermediate viewing position along the path to the next viewing position (Suzuki, col. 10, lines 48-49 teach "Intermediate views N1, N2 are computed by the network server 428"); as visualized in figs. 2-4, these intermediate views lie between the viewing position/cameras 304-306 and the object position/3D object; thus this shows an intermediate viewing position along the path to the next viewing position acting as a viewing angle (meaning the limited set of viewing angles is selected based on such). The same motivations used in claim 1 apply here in claim 7.
Regarding claim 8, the combination of Suzuki, Yerli, Islamov, Pal, Uchiyama and Varshney teaches further comprising: - at the server system, streaming a panoramic video to the client device to serve as a video-based representation of at least part of the scene (Varshney, abstract teaches "creating a blended virtual environment by combining the captured 360 video data and the audio data with the rendered virtual environment"); one of ordinary skill in the art would understand that 360 video is a type of panoramic video, and the streaming is done by the server of Suzuki as aforementioned; - at the client device, rendering the panoramic video as a background to the video-based representation of the object (Varshney, abstract teaches "method may also include preprocessing and compressing the 360 video data and the audio data into a three-dimensional representation suitable for display. The method may further include rendering a virtual environment of the real-world environment"); Varshney, fig. 21 shows the client device on which this would occur, and since this 360 video is a virtual environment of the real-world environment, it can be treated as a background to the background-subtracted object from Suzuki mentioned above. The same motivations used in claim 6 apply here in claim 8.
Regarding claim 9, the combination of Suzuki, Yerli, Islamov, Pal, Uchiyama and Varshney teaches wherein the panoramic video comprises presentation timestamps, wherein the method further comprises: - at the client device, providing a presentation timestamp to the server system during playout of the panoramic video (Varshney, col. 3, lines 57-63 teach "In one example embodiment, if it is desired that each video and audio recording is preserved in its entirety, synchronization may also happen by creating and storing a timestamp in the metadata for each recording. These timestamps may mark the progress in the recordings that contain synchronization points in the scene. These timestamps may then be read by the playback software"); this shows the 360/panoramic video having timestamps, and those being provided during playout (handled by the Suzuki server, so they would be provided to such); - at the server system, generating the one or more video streams to show the object at a temporal state which is determined based on the presentation timestamp (Varshney, col. 6, lines 57-61 teach "multiple 360 video cameras that may capture the same environment from different locations. As described herein, certain example embodiments may synchronize videos to allow users to view the same scene from multiple viewpoints at the same progress time"); the scene at the same progress time would mean the objects therein are at a temporal state based on the same timestamp for synchronization, and this would be done by the server of Suzuki.
The same motivations used in claim 6 apply here in claim 9.
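For illustration of timestamp-driven synchronization as discussed for claim 9, the following is a minimal sketch (hypothetical; not the mechanism of Varshney or Suzuki) in which the server selects the object's temporal state by looking up the frame whose presentation timestamp is at or before the timestamp reported by the client during playout:

```python
# Hypothetical sketch: pick the object's frame from a sorted list of
# per-frame presentation timestamps, given the client's reported timestamp.
import bisect

def frame_for_timestamp(frame_pts, client_pts):
    """Return the index of the latest frame at or before the client's timestamp."""
    i = bisect.bisect_right(frame_pts, client_pts) - 1
    return max(i, 0)

pts = [0.0, 0.04, 0.08, 0.12]            # per-frame presentation timestamps (seconds)
print(frame_for_timestamp(pts, 0.10))    # 2 -> the frame with pts 0.08
```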
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAUMAN U AHMAD whose telephone number is (703)756-5306. The examiner can normally be reached Monday - Friday 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung can be reached at (571) 272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KEE M TUNG/Supervisory Patent Examiner, Art Unit 2611
/N.U.A./Examiner, Art Unit 2611