DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 08, 2026 has been entered.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 7-14, 17-20, 51 and 52 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Watanabe et al. (US 2016/0255412) and Phillips et al. (US 10,440,416).
Regarding claim 1, Watanabe et al. discloses a method comprising:
determining (i) a first partitioning of at least one frame of video content into a first set of zones of a first spatial zone size (“In the example in the drawings, a rectangular region (tile1) is defined in the interior of the full frame region (tile0)” at paragraph 0049, line 1) and (ii) a second partitioning of at least one frame of the video content into a second set of zones of a second spatial zone size smaller than the first spatial zone size, wherein each zone of the second set of zones comprises a smaller respective portion of the at least one frame than each zone of the first set of zones (“Specifically, the additional region (tile2) that is a rectangular region adjacent to the left side of the basic region and with the same height as the basic region, the additional region (tile3) that is a rectangular region adjacent to the right side of the basic region and is has the same height as the basic region” at paragraph 0050, line 6);
receiving a message including a request for the at least one frame of the video content (“distributing content associated with the content via a communication network according to a request from the reproduction device 200” at paragraph 0042, line 12);
determining, based on the received message, a viewing region of interest (“Here, in a case where there is displayable range content as illustrated in the lower left of FIG. 2, the part corresponding to the basic region is first displayed as illustrated in the upper right of the drawing” at paragraph 0053, line 1);
determining a plurality of viewing regions based on proximity to the ROI, including a first viewing region corresponding to the ROI and one or more other viewing regions (“Specifically, the additional region (tile2) that is a rectangular region adjacent to the left side of the basic region and with the same height as the basic region, the additional region (tile3) that is a rectangular region adjacent to the right side of the basic region and is has the same height as the basic region, the additional region (tile4) that is a rectangular region adjacent to the upper side of the basic region and has the same width as the basic region, and the additional region (tile5) that is a rectangular region adjacent to the lower side of the basic region and is a rectangular region same width as the basic region. An additional region (tile 6) that is a rectangular region positioned to the upper left of the basic region, and with the same width as the additional region (tile2) and the same height as the additional region (tile4), an additional region (tile7) that is a rectangular region positioned to the upper right of the basic region and with the same width as the additional region (tile3) and the same height as the additional region (tile 4), an additional region (tile8) that is a rectangular region positioned to the lower left of the basic region and with the same width as the additional region (tile2) and the same height as the additional region (tile5), and an additional region (tile9) that is a rectangular region positioned to the lower right of the basic region and with the same width as the additional region (tile3) and the same height as the additional region (tile5)” at paragraph 0050, line 6);
based on the determined plurality of viewing regions, selecting for streaming a selection of non-homogenous zones, including: (i) selecting, from the first set of zones having the first spatial zone size, a first one or more zones within the first viewing region; and (ii) selecting, from the second set of zones having the second spatial zone size, a second one or more zones within the one or more other viewing regions (“Here, when a user performs an operation that causes an image in a region adjacent to the basic region to the left, as illustrated in the center right side of the drawing, a part corresponding to the region combined with the additional region (tile2) positioned adjacent to the basic region on the left is displayed” at paragraph 0053, line 4);
transmitting a zone stream for each zone in the selection of non-homogenous zones (the video data corresponding to each of the tiles is combined to form the display).
Watanabe et al. does not explicitly disclose that the viewing region is a viewport region.
Phillips et al. teaches a method in the same field of endeavor of video data rendering comprising:
determining (i) a first partitioning of at least one frame of video content into a first set of zones of a first zone size (“In accordance with the teachings herein, ROI 1402 may be formed from splicing high quality tiles” at col. 30, line 48) and (ii) a second partitioning of at least one frame of the video content into a second set of zones (“Regions or fields disposed proximate/adjacent to ROI 1402 may have medium quality tiles (e.g., field 1404). On the other hand, fields or regions distally disposed from ROI 1402, e.g., those farther away from the viewport, may be formed from lower quality tiles, as exemplified by regions 1406 and 1408” at col. 30, line 52);
receiving a message including a request for the at least one frame of the video content (“When a user request for a particular media asset is received and processed, a tile selection process based on control inputs e.g., transmission conditions, bandwidth allocation and/or gaze vector input, etc., may be effectuated for selecting tiles from different bitrate representations (i.e., different qualities) of the media asset (block 512)” at col. 18, line 63);
determining, based on the received message, a viewport region of interest (“wherein a field 1402 which may correspond to an ROI of the frame 1400 based on the viewport or gaze vector location” at col. 30, line 46);
determining a plurality of viewport regions based on proximity to the ROI, including a first viewport region corresponding to the ROI and one or more other viewport regions (as demonstrated above, the tile groups are based upon proximity to the ROI); and based on the determined plurality of viewport regions, selecting for streaming a selection of non-homogenous zones, including: (i) selecting a first one or more zones within the first viewport region (the ROI contains tiles corresponding to the main viewport); and (ii) selecting a second one or more zones within the one or more other viewport regions (the proximal tiles corresponding to 1404 form a particular viewport and the tiles corresponding to regions 1406 and 1408 form respective viewports);
transmitting a zone stream for each of the zones in the selection of non-homogenous zones (“A media input stream 202 is illustrative of a video stream corresponding to a 360° video asset that may be suitably stitched, projection-mapped and/or encoded as set forth in FIG. 1, which may be distributed, uploaded or otherwise provided to a CDN origin server 204 associated with an operator content delivery network 206” at col. 11, line 61; the above stitched tile frame is displayed accordingly).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the video processing of Watanabe et al. using viewport regions as taught by Phillips et al. to enable adaptive display for an HMD.
Regarding claim 11, Watanabe et al. discloses a system comprising:
control circuitry configured to:
determine (i) a first partitioning of at least one frame of video content into a first set of zones of a first spatial zone size (“In the example in the drawings, a rectangular region (tile1) is defined in the interior of the full frame region (tile0)” at paragraph 0049, line 1) and (ii) a second partitioning of at least one frame of the video content into a second set of zones of a second spatial zone size smaller than the first spatial zone size, wherein each zone of the second set of zones comprises a smaller respective portion of the at least one frame than each zone of the first set of zones (“Specifically, the additional region (tile2) that is a rectangular region adjacent to the left side of the basic region and with the same height as the basic region, the additional region (tile3) that is a rectangular region adjacent to the right side of the basic region and is has the same height as the basic region” at paragraph 0050, line 6);
receive a message including a request for the at least one frame of the video content (“distributing content associated with the content via a communication network according to a request from the reproduction device 200” at paragraph 0042, line 12);
determine, based on the received message, a viewing region of interest (“Here, in a case where there is displayable range content as illustrated in the lower left of FIG. 2, the part corresponding to the basic region is first displayed as illustrated in the upper right of the drawing” at paragraph 0053, line 1);
determine a plurality of viewing regions based on proximity to the ROI, including a first viewing region corresponding to the ROI and one or more other viewing regions (“Specifically, the additional region (tile2) that is a rectangular region adjacent to the left side of the basic region and with the same height as the basic region, the additional region (tile3) that is a rectangular region adjacent to the right side of the basic region and is has the same height as the basic region, the additional region (tile4) that is a rectangular region adjacent to the upper side of the basic region and has the same width as the basic region, and the additional region (tile5) that is a rectangular region adjacent to the lower side of the basic region and is a rectangular region same width as the basic region. An additional region (tile 6) that is a rectangular region positioned to the upper left of the basic region, and with the same width as the additional region (tile2) and the same height as the additional region (tile4), an additional region (tile7) that is a rectangular region positioned to the upper right of the basic region and with the same width as the additional region (tile3) and the same height as the additional region (tile 4), an additional region (tile8) that is a rectangular region positioned to the lower left of the basic region and with the same width as the additional region (tile2) and the same height as the additional region (tile5), and an additional region (tile9) that is a rectangular region positioned to the lower right of the basic region and with the same width as the additional region (tile3) and the same height as the additional region (tile5)” at paragraph 0050, line 6);
based on the determined plurality of viewing regions, select for streaming a selection of non-homogenous zones, including: (i) select, from the first set of zones having the first spatial zone size, a first one or more zones within the first viewing region; and (ii) select, from the second set of zones having the second spatial zone size, a second one or more zones within the one or more other viewing regions (“Here, when a user performs an operation that causes an image in a region adjacent to the basic region to the left, as illustrated in the center right side of the drawing, a part corresponding to the region combined with the additional region (tile2) positioned adjacent to the basic region on the left is displayed” at paragraph 0053, line 4);
transmit a zone stream for each zone in the selection of non-homogenous zones (the video data corresponding to each of the tiles is combined to form the display).
Watanabe et al. does not explicitly disclose that the viewing region is a viewport region.
Phillips et al. discloses a system in the same field of endeavor of video data rendering comprising:
control circuitry configured to:
determine (i) a first partitioning of at least one frame of video content into a first set of zones of a first zone size (“In accordance with the teachings herein, ROI 1402 may be formed from splicing high quality tiles” at col. 30, line 48) and (ii) a second partitioning of the at least one frame of video content into a second set of zones (“Regions or fields disposed proximate/adjacent to ROI 1402 may have medium quality tiles (e.g., field 1404). On the other hand, fields or regions distally disposed from ROI 1402, e.g., those farther away from the viewport, may be formed from lower quality tiles, as exemplified by regions 1406 and 1408” at col. 30, line 52);
receive a message including a request for the at least one frame of the video content (“When a user request for a particular media asset is received and processed, a tile selection process based on control inputs e.g., transmission conditions, bandwidth allocation and/or gaze vector input, etc., may be effectuated for selecting tiles from different bitrate representations (i.e., different qualities) of the media asset (block 512)” at col. 18, line 63);
determine, based on the received message, a viewport region of interest (“wherein a field 1402 which may correspond to an ROI of the frame 1400 based on the viewport or gaze vector location” at col. 30, line 46);
determine a plurality of viewport regions based on proximity to the ROI, including a first viewport region corresponding to the ROI and one or more other viewport regions (as demonstrated above, the tile groups are based upon proximity to the ROI); and based on the determined plurality of viewport regions, select for streaming a selection of non-homogenous zones, including: (i) select a first one or more zones within the first viewport region (the ROI contains tiles corresponding to the main viewport); and (ii) select a second one or more zones within the one or more other viewport regions (the proximal tiles corresponding to 1404 form a particular viewport and the tiles corresponding to regions 1406 and 1408 form respective viewports); and
transmit a zone stream for each of the zones in the selection of non-homogenous zones (“A media input stream 202 is illustrative of a video stream corresponding to a 360° video asset that may be suitably stitched, projection-mapped and/or encoded as set forth in FIG. 1, which may be distributed, uploaded or otherwise provided to a CDN origin server 204 associated with an operator content delivery network 206” at col. 11, line 61; the above stitched tile frame is displayed accordingly).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to apply the video processing of Watanabe et al. using viewport regions as taught by Phillips et al. to enable adaptive display for an HMD.
Regarding claims 2 and 12, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the zones within the first set of zones are uniform relative to each other (looking at Figure 2 of Watanabe et al., tile 1 is of uniform size) and wherein the zones within the second set of zones are uniform relative to each other (“Specifically, the additional region (tile2) that is a rectangular region adjacent to the left side of the basic region and with the same height as the basic region, the additional region (tile3) that is a rectangular region adjacent to the right side of the basic region and is has the same height as the basic region” Watanabe et al. at paragraph 0050, line 6; “In accordance with the teachings herein, ROI 1402 may be formed from splicing high quality tiles (i.e., tiles selected from coded bitstreams having low QPs, e.g., QP-16 at 105.6 Mbps, and concatenated in a stitching process). Regions or fields disposed proximate/adjacent to ROI 1402 may have medium quality tiles (e.g., field 1404). On the other hand, fields or regions distally disposed from ROI 1402, e.g., those farther away from the viewport, may be formed from lower quality tiles, as exemplified by regions 1406 and 1408” Phillips et al. at col. 30, line 48; each group contains uniform bitrate in Phillips et al., which would translate to uniform resolution).
Regarding claims 3 and 13, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the zones within the first set of zones are sized according to proximity to a predicted ROI (“To facilitate gaze-based tile selection control, additional embodiments of the present invention involve monitoring where a user is viewing in a 360° immersive video program (i.e., the user's viewport) and determining appropriate tile weights based on the user's gaze. In general, a gaze vector (GV) may be returned by the user/client device defining a gaze direction in a 3D immersive space displaying 360° video, e.g., where the headset is pointed” Phillips et al. at col. 30, line 58; the tiles in region 1402 are located proximate to a region established by the gaze vector; “Here, when a user performs an operation that causes an image in a region adjacent to the basic region to the left, as illustrated in the center right side of the drawing, a part corresponding to the region combined with the additional region (tile2) positioned adjacent to the basic region on the left is displayed.” Watanabe et al. at paragraph 0053, line 4; “When display is performed as in the center right side of the drawing, when the user performs an operation that causes the image of the region to the upper side thereof to be displayed, an image corresponding to the region in which the basic region, the additional region (tile2), the additional region (tile4), and the additional region (tile6) are combined are displayed” Watanabe et al. at paragraph 0054, line 1).
Regarding claims 4 and 14, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the ROI is based on at least one of:
a center of a field of view of a user associated with a head mounted display requesting the portion of the video content (“To facilitate gaze-based tile selection control, additional embodiments of the present invention involve monitoring where a user is viewing in a 360° immersive video program (i.e., the user's viewport) and determining appropriate tile weights based on the user's gaze. In general, a gaze vector (GV) may be returned by the user/client device defining a gaze direction in a 3D immersive space displaying 360° video, e.g., where the headset is pointed” Phillips et al. at col. 30, line 58; implied that the gaze vector is centered based upon the user’s gaze); or
detected objects in the portion of the video content.
Regarding claims 7 and 17, the Watanabe et al. and Phillips et al. combination discloses a method and system
wherein the control circuitry is further configured to:
determine an additional viewport ROI (“user changes gaze direction” Phillips et al. at col. 24, line 47; therefore, a new ROI is determined based upon a new gaze direction; “When display is performed as in the center right side of the drawing, when the user performs an operation that causes the image of the region to the upper side thereof to be displayed, an image corresponding to the region in which the basic region, the additional region (tile2), the additional region (tile4), and the additional region (tile6) are combined are displayed” Watanabe et al. at paragraph 0054, line 1);
determine an additional viewport region corresponding to the additional ROI (the combined display is updated as the gaze changes); and
select from the first set of zones having the first spatial zone size, an additional one or more zones within the additional viewport region (tiles 4 and 6 are added to the combined display), and
wherein the input/output circuitry is configured to transmit the zone stream for each of the zones, including the additional one or more zones (the combined display incorporates the video data corresponding to each of the tiles).
Regarding claims 8 and 18, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the first viewport region is a same size as the ROI (“By way of illustration, video frame 1400 is formed from 128 tiles (16 columns by 8 rows) of a 4K video input, shown in unwrapped format (i.e., not projected in a 3D spherical space), wherein a field 1402 which may correspond to an ROI of the frame 1400 based on the viewport or gaze vector location” Phillips et al. at col. 30, line 42; “In the example in the drawings, a rectangular region (tile1) is defined in the interior of the full frame region (tile0)” Watanabe et al. at paragraph 0049, line 1).
Regarding claims 9 and 19, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the first viewport region is a percentage of an area of a field of view of a user associated with a head mounted display requesting the at least one frame of the video content (as seen in Figure 14 of Phillips et al., region 1402 occupies a space corresponding to a percentage of the user’s total field of view defined by frame 1400; “In the example in the drawings, a rectangular region (tile1) is defined in the interior of the full frame region (tile0)” Watanabe et al. at paragraph 0049, line 1).
Regarding claims 10 and 20, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein the ROI is a point that is at a center of a field of view of a user associated with a head mounted display requesting the at least one frame of the video content (“To facilitate gaze-based tile selection control, additional embodiments of the present invention involve monitoring where a user is viewing in a 360° immersive video program (i.e., the user's viewport) and determining appropriate tile weights based on the user's gaze. In general, a gaze vector (GV) may be returned by the user/client device defining a gaze direction in a 3D immersive space displaying 360° video, e.g., where the headset is pointed” Phillips et al. at col. 30, line 58; implied that the gaze vector is centered based upon the user’s gaze; the gaze vector therefore defines a center point around which the viewport is established).
Regarding claims 51 and 52, the Watanabe et al. and Phillips et al. combination discloses a method and system wherein:
the first and second sets of zones comprise one of:
first and second pluralities of tiles of the at least one frame (“In accordance with the teachings herein, ROI 1402 may be formed from splicing high quality tiles” Phillips et al. at col. 30, line 48; “Regions or fields disposed proximate/adjacent to ROI 1402 may have medium quality tiles (e.g., field 1404). On the other hand, fields or regions distally disposed from ROI 1402, e.g., those farther away from the viewport, may be formed from lower quality tiles, as exemplified by regions 1406 and 1408” Phillips et al. at col. 30, line 52; “Region-wise mixed resolution (RWMR): Tiles are encoded at multiple resolutions. Player devices select a combination of high-resolution tiles covering the viewport and low-resolution tiles for the remaining areas” Hourunranta et al. at paragraph 0138);
first and second pluralities of slices of the at least one frame; or
first and second pluralities of subpictures of the at least one frame; and
the first and second spatial zone sizes comprise one of:
first and second spatial tile sizes (looking at Figure 2 of Watanabe et al., tile 1 is larger than tiles 2 and 3);
first and second spatial slice sizes; or
first and second spatial subpicture sizes.
Claim(s) 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Watanabe et al. and Phillips et al. as applied to claims 1 and 11 above, and further in view of Xie et al. (US 2020/0204810).
The Watanabe et al. and Phillips et al. combination discloses the elements of claims 1 and 11 as described above.
The Watanabe et al. and Phillips et al. combination does not explicitly disclose that the ROI is based on an analysis of a center of a field of view of one or more other users that viewed the at least one frame of the video content that is within a field of view of a user associated with a head mounted display requesting the at least one frame of the video content.
Xie et al. teaches a method and system wherein the ROI is based on an analysis of a center of a field of view of one or more other users that viewed the at least one frame of the video content that is within a field of view of a user associated with a head mounted display requesting the at least one frame of the video content (“Referring back to FIG. 3, to calculate the cross-user-based viewport prediction, cross-user-based viewport aggregator 304 collects other users' historical fixations during 360-degree video sessions for the same video (e.g., from sessions before the current playback session). Specifically, for the users that have watched the same video, cross-user-based viewport aggregator 304 calculates the viewport probability distribution for each video segment. For a specific video segment, cross-user-based viewport aggregator 304 may perform the following process. FIG. 5 depicts a simplified flowchart 500 of a method for calculating the cross-user-based viewport probability distribution according to some embodiments. The probability distribution may be different probabilities that a user might view a tile in each video segment of a video. At 502, for a user u, cross-user-based viewport aggregator 304 determines the user's fixation on the video segment, which may be denoted as f.sub.u=(x.sub.u, y.sub.u, z.sub.u), which is a spherical coordinate. Cross-user-based viewport aggregator 304 may aggregate the fixations for multiple users. For example, in FIG. 3, an interaction interface 316 receives user input to move a viewport 208” at paragraph 0033).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the cross-user-based viewport prediction as taught by Xie et al. in the system of the Watanabe et al. and Phillips et al. combination as it “improves the prediction precision by predicting the future viewport location. Taking into consideration the viewport prediction result, the adaptive bitrate process can choose the optimal bitrate for tiles, such as different bitrates for some tiles, to maximize the overall quality of the video” (Xie et al. at paragraph 0014, second to last sentence).
Response to Arguments
Summary of Remarks (response at page 7): Phillips et al. and Hourunranta et al. do not teach or disclose first and second spatial zone sizes.
Examiner’s Response: This argument is moot in view of the newly cited Watanabe et al. reference.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA whose telephone number is (571) 270-1574. The examiner can normally be reached Monday through Friday, 9:30 am - 5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KATRINA R FUJITA/Primary Examiner, Art Unit 2672