DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on September 22, 2023 was filed after the mailing date of the application on March 31, 2023. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claim(s) 15-17 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wang (US011200700B2).
As per Claim 15, Wang teaches an apparatus comprising: network interface circuitry to access a plurality of video streams corresponding to a scene, respective ones of the video streams including respective pluralities of tiles representative of the scene from different viewpoints (video processor processes the video received from the cameras 102A-102N, such as stitching, the encoder encodes the video data, the decoding device may receive the video data through a broadcast network, col. 7, lines 3-11; due to current network bandwidth limitations, the 3D spherical VR content is first processed onto a 2D plane and then encapsulated in a number of tile-based and segmented files for delivery and playback, col. 7, lines 47-54; multiple viewpoints can be used when there are multiple cameras, col. 9, lines 56-57; viewport can change and is therefore not static, for example, as a user moves their head, then the system needs to fetch neighboring tiles to cover the content of what the user wants to view next, col. 8, lines 8-12); and tile selection circuitry to select at least one of the tiles for presentation by a device (it is based on user’s selection on a viewport that some of these variants of different tiles that, when put together, provide a coverage of the selected viewport, and are delivered to the receiver, and then decoded to construct and render the desired viewport, col. 7, lines 62-67).
As per Claim 16, Wang teaches wherein the tile selection circuitry is to select the at least one of the tiles based on user input (it is based on user’s selection on a viewport that some of these variants of different tiles that, when put together, provide a coverage of the selected viewport, and are delivered to the receiver, and then decoded to construct and render the desired viewport, col. 7, lines 62-67).
As per Claim 17, Wang teaches further including user detection circuitry to detect, based on image data, at least one of a position or an orientation of a user of the device, the tile selection circuitry to select the at least one of the tiles based on the at least one of the position or the orientation (for example, as a user moves their head, the system needs to fetch neighboring tiles to cover the content of what the user wants to view next, col. 8, lines 9-12).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-3 and 8-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Moezzi (US005850352A) and Wang (US011200700B2).
As per Claim 1, Moezzi teaches an apparatus comprising: memory (content of a dynamic three-dimensional model, which model is a three-dimensional video memory, col. 18, lines 6-7); instructions; and programmable circuitry to at least one of execute or instantiate the instructions (immersive video system includes a computer running a software program, col. 11, lines 58-59). Moezzi teaches portions of each of multiple video streams showing a single scene, each from a different spatial perspective, that are identified to be static by running comparison (col. 19, lines 16-20). Thus, a first video stream of the multiple video streams includes a first video frame including a first portion representative of the scene from a first viewpoint (spatial perspective) and a second portion representative of the scene from a second viewpoint (spatial perspective), the second viewpoint (spatial perspective) different from the first viewpoint (spatial perspective); and a second video stream of the multiple video streams includes a second video frame including a third portion representative of the scene from a third viewpoint (spatial perspective) and a fourth portion representative of the scene from a fourth viewpoint (spatial perspective), the fourth viewpoint (spatial perspective) different from the third viewpoint (spatial perspective) (col. 19, lines 16-20).
Thus, Moezzi teaches accessing a first video stream corresponding to a scene, the first video stream including a first video frame, the first video frame including a first portion representative of the scene from a first viewpoint and a second portion representative of the scene from a second viewpoint, the second viewpoint different from the first viewpoint; access a second video stream corresponding to the scene, the second video stream synchronized with the first video stream, the second video stream including a second video frame, the second video frame including a third portion representative of the scene from a third viewpoint and a fourth portion representative of the scene from a fourth viewpoint, the fourth viewpoint different from the third viewpoint (portions of each of multiple video streams showing a single scene, each from a different spatial perspective, that are identified to be static by running comparison, col. 19, lines 16-20; the underlying task in video mosaicing is to create larger images from frames obtained from cameras, col. 7, lines 47-49; maintaining spatial-temporal coherence and consistency is integral to generation of such a HyperMosaic, in order to obtain 3D description, multiple perspectives that provide simultaneous coverage must therefore be used and their associated visual information integrated, another necessary feature would be to provide a viewpoint that may be selected, the immersive video system of the present invention caters to these needs, col. 34, lines 44-52; the method being directed to generating a spatial-temporally coherent and consistent three-dimensional video mosaic from multiple individual video streams arising from each of multiple video cameras each of which is imaging at least a part of the scene from a perspective that is at least in part different from other ones of the multiple video cameras, the method being called video hypermosaicing, col. 52, lines 49-56; synchronized video streams, col. 37, lines 26-27); and select at least one of the first portion, the second portion, the third portion, or the fourth portion for presentation by a device (another necessary feature would be to provide a viewpoint that may be selected, the immersive video system of the present invention caters to these needs, col. 34, lines 44-52).
However, Moezzi does not expressly teach that the portions are tiles. However, Wang teaches capturing video of a scene using a plurality of cameras (col. 2, line 8) and multiple viewpoints are used when there are multiple cameras (col. 9, lines 55-58). It is based on user’s selection on a viewport that some of these variants of different tiles that, when put together, provide a coverage of the selected viewport, are delivered to the receiver, and then decoded to construct and render the desired viewport (col. 7, lines 62-67). The techniques deliver the needed tiles to the client to cover what the user will view (col. 8, lines 3-5). Thus, this teaching of the tiles from Wang can be implemented into the device of Moezzi so that the portions are tiles.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Moezzi so that the portions are tiles because Wang suggests that it is efficient to divide into tiles (col. 7, lines 29-34).
As per Claim 2, Moezzi teaches wherein the programmable circuitry is to select the at least one of the first portion, the second portion, the third portion, or the fourth portion based on user input (portions of each of multiple video streams showing a single scene, each from a different spatial perspective, col. 19, lines 16-20; a user can select from multiple views, col. 42, lines 35-38).
However, Moezzi does not expressly teach that the portions are tiles. However, the teaching of the tiles from Wang can be implemented into the device of Moezzi so that the portions are tiles, as discussed in the rejection for Claim 1.
As per Claim 3, Moezzi does not expressly teach detecting, based on image data, at least one of a position or an orientation of a user of the device; and select the at least one of the first tile, the second tile, the third tile, or the fourth tile based on the at least one of the position or the orientation. However, Wang teaches wherein the programmable circuitry is to: detect, based on image data, at least one of a position or an orientation of a user of the device; and select the at least one of the first tile, the second tile, the third tile, or the fourth tile based on the at least one of the position or the orientation (for example, as a user moves their head, the system needs to fetch neighboring tiles to cover the content of what the user wants to view next, col. 8, lines 9-12).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Moezzi to include detecting, based on image data, at least one of a position or an orientation of a user of the device; and select the at least one of the first tile, the second tile, the third tile, or the fourth tile based on the at least one of the position or the orientation because Wang suggests that this is needed for immersive video, so that the simulated environment feels lifelike and interactive, so that the user sees the appropriate views as they move their head (col. 8, lines 9-12).
As per Claim 8, Claim 8 is similar in scope to Claim 1, except that Claim 8 is directed to at least one non-transitory computer readable medium comprising the instructions of Claim 1. Moezzi teaches the immersive video system includes a computer running a software program (col. 11, lines 58-59). Thus, it would have been obvious to one of ordinary skill in the art that there is a non-transitory computer readable medium that comprises the instructions (software program) in order for the computer to access the software program in order to run the software program (col. 11, lines 58-59). Thus, Claim 8 is rejected under the same rationale as Claim 1.
As per Claims 9-10, these claims are similar in scope to Claims 2-3 respectively, and therefore are rejected under the same rationale.
Claim(s) 4-6 and 11-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Moezzi (US005850352A) and Wang (US011200700B2) in view of Pettersson (US 20210058633A1).
As per Claim 4, Moezzi and Wang are relied upon for the teachings as discussed above relative to Claim 1.
However, Moezzi and Wang do not teach selecting one of the first tile or the second tile for presentation by the device; and in response to the selection, (a) access a third video stream different from the first video stream and the second video stream and (b) halt access to the second video stream. However, Pettersson teaches in many video services, including immersive video, partial video data may be extracted from the video. Having the video data grouped into chunks instead of scattered in different places speeds up the extraction. Dividing the picture into parts may be resolved using tiles [0081]. Pettersson teaches extracting a part of a picture from a picture with the tiles grouped with tile group IDs from the received bit stream. The decoder parses the tile structure syntax for the picture to obtain the group ID for each tile. Then, for each tile, the decoder determines to which tile group T1, T2 the tile belongs based on the tile group ID. The decoder then selects which tile groups T1, T2 to extract from the received bitstream, and extracts the tiles of the selected tile groups T1, T2 from the bit stream. The tile groups T1, T2 are scanned and extracted in the explicit order of the tile group IDs [0098]. Since it is extracting a tile [0081, 0098], the programmable circuitry is to: select one of the first tile or the second tile for presentation by the device. Thus, for example, the first tile in T1 (first video stream) is selected out of the tiles in T1 (first video stream) and T2 (second video stream). After the first tile in T1 (first video stream) is selected, then it moves on in the explicit order of the tile group IDs, and thus moves on to a third video stream T3 different from T1 (first video stream) and T2 (second video stream). 
Thus, in response to the selection of the first tile in T1 (first video stream) out of the tiles in T1 (first video stream) and T2 (second video stream), the tiles in T2 are not selected [0081, 0098], and thus it would have been obvious to one of ordinary skill in the art that access to T2 (second video stream) is halted. Thus, Pettersson teaches in response to the selection, (a) access a third video stream different from the first video stream and the second video stream and (b) halt access to the second video stream [0081, 0098].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Moezzi and Wang to include selecting one of the first tile or the second tile for presentation by the device; and in response to the selection, (a) access a third video stream different from the first video stream and the second video stream and (b) halt access to the second video stream because Pettersson suggests that this speeds up the extraction [0081].
As per Claim 5, Moezzi and Wang do not teach selecting the third video stream from a plurality of video streams based on a sequence of tile numbers assigned to tiles included in frames of the plurality of video streams. However, Pettersson teaches that the tiles of each tile group T1, T2 are scanned in a predefined order, are grouped and located contiguously to each other [0098]. Thus, it would have been obvious to one of ordinary skill in the art that this means that the next video stream is selected based on a sequence of tile numbers assigned to the tiles in order to scan the tiles that are grouped and located contiguously to each other in the predefined order. Thus, Pettersson teaches wherein the programmable circuitry is to select the third video stream from a plurality of video streams based on a sequence of tile numbers assigned to tiles included in frames of the plurality of video streams [0098], as discussed in the rejection for Claim 4. This would be obvious for the reasons given in the rejection for Claim 4.
As per Claim 6, Moezzi and Wang do not teach selecting the third video stream from a plurality of video streams based on a determination that the third video stream includes a third video frame with a fifth tile included in a neighborhood of the selected one of the first tile or the second tile, the neighborhood based on a sequence of tile numbers. However, since Pettersson teaches that the tiles of each tile group T1, T2 are scanned in a predefined order, and are grouped and located contiguously to each other, and the tile groups T1, T2 are also scanned and extracted in the explicit order of their tile group IDs [0098], it would have been obvious to one of ordinary skill in the art that this means that the programmable circuitry is to select the third video stream from a plurality of video streams based on a determination that the third video stream includes a third video frame with a fifth tile included in a neighborhood of the selected one of the first tile or the second tile, the neighborhood based on a sequence of tile numbers [0081, 0098], as discussed in the rejection for Claim 4. This would be obvious for the reasons given in the rejection for Claim 4.
As per Claims 11-13, these claims are similar in scope to Claims 4-6 respectively, and therefore are rejected under the same rationale.
Claim(s) 7 and 14 is/are rejected under 35 U.S.C. 103 as being unpatentable over Moezzi (US005850352A) and Wang (US011200700B2) in view of Parmar (US 20180052595A1).
As per Claim 7, Moezzi and Wang are relied upon for the teachings as discussed above relative to Claim 1. Moezzi teaches causing presentation of the at least one of the first portion, the second portion, the third portion, or the fourth portion (portions of each of multiple video streams showing a single scene, each from a different spatial perspective, col. 19, lines 16-20; another necessary feature would be to provide a viewpoint that may be selected, the immersive video system of the present invention caters to these needs, col. 34, lines 44-52).
However, Moezzi does not expressly teach that the portions are tiles. However, the teaching of the tiles from Wang can be implemented into the device of Moezzi so that the portions are tiles, as discussed in the rejection for Claim 1.
However, Moezzi and Wang do not teach accessing the first and second video streams in response to navigation of the device to a web page via a web browser; and causing presentation of the at least one of the first tile, the second tile, the third tile, or the fourth tile in the web browser. However, Parmar teaches wherein the programmable circuitry is to: access the first and second video streams in response to navigation of the device to a web page via a web browser; and cause presentation of the at least one of the first viewpoint, the second viewpoint, the third viewpoint, or the fourth viewpoint in the web browser (viewer of a recorded video item is provided with the ability to select to watch the video item from different angles, the video item is recorded from live events using a plurality of cameras and therefore comprises a plurality of recorded video tracks, the video tracks include data representing views spaced 360° around the video item such that if there are 8 video tracks available, eight cameras are angularly spaced and located with respect to the item being filmed, the viewer can select any of the views and the corresponding video track from the recording media being played, the selection can be performed during viewing, [0006]; referring to Fig. 5E, upon receiving the selection of the viewing angle via the first GUI control element 510 or the second GUI control element 511, the rendering unit rotates the multimedia content 501 to the selected viewing angle, renders the multimedia content at the selected angle 501-2-1 in the preview window 509 on the display, thus, the same multimedia content is being rendered at different angles in the web browser and the preview window 509 simultaneously along with separate controls, [0103]; Fig. 5E).
Since the combination of Moezzi and Wang teaches causing presentation of the at least one of the first tile, the second tile, the third tile, or the fourth tile, this teaching of the web browser from Parmar can be implemented into the combination of Moezzi and Wang so that it causes presentation of the at least one of the first tile, the second tile, the third tile, or the fourth tile in the web browser.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Moezzi and Wang to include accessing the first and second video streams in response to navigation of the device to a web page via a web browser; and causing presentation of the at least one of the first tile, the second tile, the third tile, or the fourth tile in the web browser because Parmar suggests that this way, a user is able to access the web browser from anywhere that has an internet connection, and thus is able to watch the immersive video and control which angle to watch the view from anywhere that has an internet connection [0042].
As per Claim 14, Claim 14 is similar in scope to Claim 7, and therefore is rejected under the same rationale.
Claim(s) 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US011200700B2) in view of Pettersson (US 20210058633A1).
As per Claim 18, Wang is relied upon for the teachings as discussed above relative to Claim 15.
However, Wang does not teach wherein the plurality of video streams is a first plurality of video streams, the network interface circuitry to, in response to the selection of the at least one of the tiles, (a) access a second plurality of video streams different from the first plurality of video streams and (b) halt access to the first plurality of video streams. However, Pettersson teaches in many video services, including immersive video, partial video data may be extracted from the video. Having the video data grouped into chunks instead of scattered in different places speeds up the extraction. Dividing the picture into parts may be resolved using tiles [0081]. Pettersson teaches extracting a part of a picture from a picture with the tiles grouped with tile group IDs from the received bit stream. The decoder parses the tile structure syntax for the picture to obtain the group ID for each tile. Then, for each tile, the decoder determines to which tile group T1, T2 the tile belongs based on the tile group ID. The decoder then selects which tile groups T1, T2 to extract from the received bitstream, and extracts the tiles of the selected tile groups T1, T2 from the bit stream. The tile groups T1, T2 are scanned and extracted in the explicit order of the tile group IDs [0098]. Since it is extracting a tile [0081, 0098], it is selecting at least one of the tiles. T1 and T2 are video streams in a first plurality of video streams. Thus, for example, a tile is selected out of the tiles in T3, which is a video stream in a second plurality of video streams different from the first plurality of video streams (T1, T2). After the tile in T3 is selected, it accesses T3, so it accesses the second plurality of video streams. 
Thus, in response to the selection of the tile in T3, which is a video stream in the second plurality of video streams, the tiles in T1 and T2 (first plurality of video streams) are not selected [0081, 0098], and thus it would have been obvious to one of ordinary skill in the art that access to T1 and T2 (first plurality of video streams) is halted. Thus, Pettersson teaches wherein the plurality of video streams is a first plurality of video streams (T1, T2), the network interface circuitry to, in response to the selection of the at least one of the tiles, (a) access a second plurality of video streams (video streams that include T3) different from the first plurality of video streams (T1, T2) and (b) halt access to the first plurality of video streams (T1, T2) [0081, 0098]. This would be obvious for the reasons given in the rejection for Claim 4.
As per Claim 19, Wang does not teach stream selection circuitry to select the second plurality of video streams from a third plurality of video streams based on a sequence of tile numbers assigned to tiles included in frames of the third plurality of video streams, the third plurality of video streams including the first plurality of video streams and the second plurality of video streams. However, Pettersson teaches that the tiles of each tile group T1, T2 are scanned in a predefined order, are grouped and located contiguously to each other [0098]. Thus, it would have been obvious to one of ordinary skill in the art that this means that the next video stream is selected based on a sequence of tile numbers assigned to the tiles in order to scan the tiles that are grouped and located contiguously to each other in the predefined order. Thus, Pettersson teaches further including stream selection circuitry to select the second plurality of video streams from a third plurality of video streams based on a sequence of tile numbers assigned to tiles included in frames of the third plurality of video streams, the third plurality of video streams including the first plurality of video streams and the second plurality of video streams [0098], as discussed in the rejection for Claim 18. This would be obvious for the reasons given in the rejection for Claim 4.
As per Claim 20, Wang does not teach wherein the stream selection circuitry is to select the second plurality of video streams based on a determination that the second plurality of video streams includes a first one of the frames with a first one of the tiles included in a neighborhood of the selected at least one of the tiles, the neighborhood based on the sequence of tile numbers. However, since Pettersson teaches that the tiles of each tile group T1, T2 are scanned in a predefined order, and are grouped and located contiguously to each other, and the tile groups T1, T2 are also scanned and extracted in the explicit order of their tile group IDs [0098], it would have been obvious to one of ordinary skill in the art that this means that the stream selection circuitry is to select the second plurality of video streams based on a determination that the second plurality of video streams includes a first one of the frames with a first one of the tiles included in a neighborhood of the selected at least one of the tiles, the neighborhood based on the sequence of tile numbers [0081, 0098], as discussed in the rejection for Claim 18. This would be obvious for the reasons given in the rejection for Claim 4.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONI HSU whose telephone number is (571)272-7785. The examiner can normally be reached M-F 10am-6:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JH
/JONI HSU/Primary Examiner, Art Unit 2611