DETAILED ACTION
This action is in response to remarks filed 12/29/2025:
Claims 1 – 7 and 21 – 33 are pending.
Claims 8 – 20 are cancelled.
Response to Arguments
Applicant’s arguments with respect to claims 1 – 7 and 21 – 33 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Response to Amendment
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3 – 5, 7, 21, 23 – 25, 27 – 28, and 30 – 32 are rejected under 35 U.S.C. 103 as being unpatentable over Riley et al. (U.S. Pub. No. 2017/0270633, hereinafter “Riley”) in view of Archibong et al. (U.S. Patent No. 10,425,671, hereinafter “Archibong”) and Zhang et al. (U.S. Pub. No. 2022/0368979, hereinafter “Zhang”).
Regarding Claim 1, Riley teaches
A method (see Riley Paragraph [0063], method), the method comprising:
receiving a first video input corresponding to a 360-degree video conference (see Riley Paragraph [0039], Remote cameras 124A-124N are configured to capture views of respective remote users (e.g., remote participants of a video conference). For example, first remote camera 124A may be configured to capture a first remote user who owns or otherwise has access to first remote computing device 106A. Remote cameras 124A-124N are further configured to generate respective remote video streams 132A-132N based on the respective views that are captured by respective remote cameras 124A-124N, and Paragraph [0041], Remote bowtie view logic 128A-128N are configured to perform one or more of the operations described herein to provide a bowtie view (e.g., any one or more of bowtie views 123A-123N). In a first example, first video information 138A may include the 360-degree image that is captured by 360-degree camera 114);
receiving one or more second video inputs (see Riley Paragraph [0039], Remote cameras 124A-124N are configured to capture views of respective remote users (e.g., remote participants of a video conference). Nth remote camera 124N may be configured to capture an Nth remote user who owns or otherwise has access to Nth remote computing device 106N. Remote cameras 124A-124N are further configured to generate respective remote video streams 132A-132N based on the respective views that are captured by respective remote cameras 124A-124N);
Riley does not expressly teach
for video decoding in a decoder;
determining one or more regions in the first video input as occlude-free regions that do not overlap with any other image or video;
rendering an output video that includes the first video input with the one or more second video inputs overlaid in a region not including the one or more occlude-free regions.
However, Archibong teaches
for video decoding in a decoder (see Archibong Column 25, lines 44 – 45, decode incoming video stream 850 into a series of incoming video frames 1120);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a method of 360-degree video conferencing (as taught in Riley) with decoding video (as taught in Archibong), the motivation being to support synchronization and multiple streams (see Archibong Column 1, lines 65 – 67, and Column 2, lines 1 – 9).
Riley in view of Archibong does not expressly teach
determining one or more regions in the first video input as occlude-free regions that do not overlap with any other image or video;
rendering an output video that includes the first video input with the one or more second video inputs overlaid in a region not including the one or more occlude-free regions.
However, Zhang teaches
determining one or more regions in the first video input as occlude-free regions that do not overlap with any other image or video (see Zhang Paragraph [0003], identifying, for a video with which overlaid content is to be displayed, a video category of the video from a set of predefined video categories; for each video frame in a set of sampled video frames of the video: determining, for each video feature type of a set of video features types and for each location of multiple locations in the video frame, a confidence score that indicates a likelihood that the location in the video frame includes a feature of the video feature type; determining, based on the video category, a weight for each video feature type that reflects an importance of not occluding a video feature of the video feature type when a video of the video category is displayed; and adjusting, for each video feature type of the set of video features types, the confidence scores for the multiple locations in the video frame based on the determined weight for the video feature type, to generate adjusted confidence scores; aggregating the adjusted confidence scores for each location for each video frame in the set of sampled video frames to generate aggregated and adjusted confidence scores; determining, based on the aggregated and adjusted confidence scores, a location at which to position overlaid content during video display; and providing the overlaid content for display at the determined location in the video, Paragraph [0012], Aspects of the present disclosure provide the advantage of identifying feature-containing locations in the video frames from which to exclude overlaid content, because overlaying content over these locations would block or obscure important content (e.g., content classified as important) that is included in the underlying video stream, which would result in wasted computing resources by delivering video to users when the important content is not perceivable to the users, thereby rendering delivery of the video incomplete or ineffective. In some situations, machine learning engines (such as Bayesian classifiers, optical character recognition systems, or neural networks) can identify important features within the video stream, such as faces or other human portions, text, or other significant objects such as foreground or moving objects. Areas can be identified that encompass these important features; and then the overlaid content can be displayed outside of these identified areas, e.g., at location(s) that have been determined to not have (or at least have a least likelihood of having) an important feature. As a result, the user can receive the overlaid content without obstruction of the important content of the underlying video stream, such that the computing resources required to deliver the video are not wasted. This results in a more efficient video distribution system that prevents computing system resources (e.g., network bandwidth, memory, processor cycles, and limited client device display space) from being wasted through the delivery of videos in which the important content is occluded, or otherwise not perceivable by the user);
rendering an output video that includes the first video input with the one or more second video inputs overlaid in a region not including the one or more occlude-free regions (see Zhang Paragraph [0003], identifying, for a video with which overlaid content is to be displayed, a video category of the video from a set of predefined video categories; for each video frame in a set of sampled video frames of the video: determining, for each video feature type of a set of video features types and for each location of multiple locations in the video frame, a confidence score that indicates a likelihood that the location in the video frame includes a feature of the video feature type; determining, based on the video category, a weight for each video feature type that reflects an importance of not occluding a video feature of the video feature type when a video of the video category is displayed; and adjusting, for each video feature type of the set of video features types, the confidence scores for the multiple locations in the video frame based on the determined weight for the video feature type, to generate adjusted confidence scores; aggregating the adjusted confidence scores for each location for each video frame in the set of sampled video frames to generate aggregated and adjusted confidence scores; determining, based on the aggregated and adjusted confidence scores, a location at which to position overlaid content during video display; and providing the overlaid content for display at the determined location in the video, Paragraph [0012], Aspects of the present disclosure provide the advantage of identifying feature-containing locations in the video frames from which to exclude overlaid content, because overlaying content over these locations would block or obscure important content (e.g., content classified as important) that is included in the underlying video stream, which would result in wasted computing resources by delivering video to users when the important content is not perceivable to the users, thereby rendering delivery of the video incomplete or ineffective. In some situations, machine learning engines (such as Bayesian classifiers, optical character recognition systems, or neural networks) can identify important features within the video stream, such as faces or other human portions, text, or other significant objects such as foreground or moving objects. Areas can be identified that encompass these important features; and then the overlaid content can be displayed outside of these identified areas, e.g., at location(s) that have been determined to not have (or at least have a least likelihood of having) an important feature. As a result, the user can receive the overlaid content without obstruction of the important content of the underlying video stream, such that the computing resources required to deliver the video are not wasted. This results in a more efficient video distribution system that prevents computing system resources (e.g., network bandwidth, memory, processor cycles, and limited client device display space) from being wasted through the delivery of videos in which the important content is occluded, or otherwise not perceivable by the user, and Paragraph [0076], The output(s) 320 can be provided, for example, to the content platform 106, to enable the content platform 106 to display overlaid content item(s) corresponding to the output(s), at the recommended overlaid content location(s) 320a, beginning at corresponding time offset(s) 320b during playback of the input video 302. As another example, such as during development or troubleshooting, the output(s) 320 can be provided to the visualizer 316, and the visualizer 316 can render, for example, overlaid content outlines (or in some cases actual overlaid content), for monitoring or troubleshooting purposes, e.g., for viewing by an administrator or developer).
It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a method of 360-degree video conferencing with a decoder (as taught in Riley in view of Archibong), with determining regions within a video as occlude-free regions and rendering a video that includes overlaid input videos in a region not including the one or more occlude-free regions (as taught in Zhang), the motivation being to provide the advantage of identifying feature-containing locations in the video frames from which to exclude overlaid content, because overlaying content over these locations would block or obscure important content (e.g., content classified as important) that is included in the underlying video stream, which would result in wasted computing resources (e.g., network bandwidth, memory, processor cycles, and limited client device display space) by delivering video to users when the important content is not perceivable to the users (see Zhang Paragraph [0012]).
Regarding Claim 3, Riley in view of Archibong and Zhang teaches
The method of claim 1, wherein the one or more second video inputs is a 360-degree video or a 2-D video (see Riley Paragraph [0039], Remote cameras 124A-124N are configured to capture views of respective remote users (e.g., remote participants of a video conference). For example, first remote camera 124A may be configured to capture a first remote user who owns or otherwise has access to first remote computing device 106A. Remote cameras 124A-124N are further configured to generate respective remote video streams 132A-132N based on the respective views that are captured by respective remote cameras 124A-124N, and Paragraph [0041], Remote bowtie view logic 128A-128N are configured to perform one or more of the operations described herein to provide a bowtie view (e.g., any one or more of bowtie views 123A-123N). In a first example, first video information 138A may include the 360-degree image that is captured by 360-degree camera 114).
Regarding Claim 4, Riley in view of Archibong and Zhang teaches
The method of claim 1, wherein the occlude-free regions are dynamic and change during a video conferencing session (see Zhang Paragraph [0027], The techniques described herein provide a video system that can be configured to identify portions of the video where overlaid content can be provided without occluding important video content. Important content can include, for example, video features such as text, human faces, human torsos, moving objects, or portions of video that undergo color variance changes between video frames, Paragraph [0038], However, there is a technical problem of determining how to place the overlaid content so that it does not occlude important content in the underlying video. This is a particularly difficult problem in the context of overlaying content on video because the locations of important content in a video can change quickly over time. As such, even if a particular location within the video is a good candidate for overlay content at one point in time (e.g., in one frame), that location may be a bad candidate for overlay content at a later point in time/in subsequent frames (e.g., due to movement of characters within the video), Paragraph [0039], Using location information of important content determined by the video processing system 110, the content platform 106 can overlay content on top of a video stream, while at the same time avoiding areas of the video screen that feature important content in the underlying video stream, e.g., areas in the original video stream that contain faces, text, or significant objects such as moving objects. The video processing system 110 can, for example, include machine learning methods and engines that can identify locations in the video that are less likely to include important content, so that overlaid content displayed at those locations is less likely to obstruct important content in the underlying video. As described in more detail below, different types (e.g., categories) of videos can include different types of important content. Accordingly, the video processing system 110 can prioritize selection of locations for overlaid content based on avoiding video features that are particularly important for a video category that has been determined for the video, and Paragraph [0049], A video sampler 308 can capture a subset of frames of the input video 302. For example, the video sampler 308 can determine a sampling rate, which can be, for example, a number of frames per second or a number of seconds per frame. For instance, the video sampler 308 can capture three frames per second, one frame per second, or one frame every three seconds. A sampling rate can be determined based on a variety of factors. In some cases, a sampling rate can be determined based on the video category. For example, some types of video content may be more dynamic (e.g., as far as changing of content between frames) than other types of content that are more static. For example, prior analysis may show that entertainment videos may be generally more dynamic than knowledge videos. Accordingly, a sampling rate for an entertainment video may be higher than for a knowledge video).
Regarding Claim 5, Riley in view of Archibong and Zhang teaches
The method of claim 1, further comprising: responding to a change in the at least one information in the input video by changing the rendering in the output video (see Zhang Paragraph [0027], The techniques described herein provide a video system that can be configured to identify portions of the video where overlaid content can be provided without occluding important video content. Important content can include, for example, video features such as text, human faces, human torsos, moving objects, or portions of video that undergo color variance changes between video frames, Paragraph [0038], However, there is a technical problem of determining how to place the overlaid content so that it does not occlude important content in the underlying video. This is a particularly difficult problem in the context of overlaying content on video because the locations of important content in a video can change quickly over time. As such, even if a particular location within the video is a good candidate for overlay content at one point in time (e.g., in one frame), that location may be a bad candidate for overlay content at a later point in time/in subsequent frames (e.g., due to movement of characters within the video), Paragraph [0039], Using location information of important content determined by the video processing system 110, the content platform 106 can overlay content on top of a video stream, while at the same time avoiding areas of the video screen that feature important content in the underlying video stream, e.g., areas in the original video stream that contain faces, text, or significant objects such as moving objects. The video processing system 110 can, for example, include machine learning methods and engines that can identify locations in the video that are less likely to include important content, so that overlaid content displayed at those locations is less likely to obstruct important content in the underlying video. As described in more detail below, different types (e.g., categories) of videos can include different types of important content. Accordingly, the video processing system 110 can prioritize selection of locations for overlaid content based on avoiding video features that are particularly important for a video category that has been determined for the video, Paragraph [0049], A video sampler 308 can capture a subset of frames of the input video 302. For example, the video sampler 308 can determine a sampling rate, which can be, for example, a number of frames per second or a number of seconds per frame. For instance, the video sampler 308 can capture three frames per second, one frame per second, or one frame every three seconds. A sampling rate can be determined based on a variety of factors. In some cases, a sampling rate can be determined based on the video category. For example, some types of video content may be more dynamic (e.g., as far as changing of content between frames) than other types of content that are more static. Accordingly, a sampling rate for an entertainment video may be higher than for a knowledge video, and Figure 6, which is a flow diagram of an example process 600 for determining a location within a video at which to display overlaid content; the video processing system 110 performs processing for each video frame in a set of sampled video frames of the video (at 604)).
Regarding Claim 7, Riley in view of Archibong and Zhang teaches
The method of claim 1, wherein the information of each occlude-free region is updated during the session, and a new occlude-free region is added or an existing occlude-free region is removed (see Archibong Column 27, lines 28 – 48, the shape or location of social area 1140 may be dynamically adjusted based on the determined important areas of the screen discussed above. For example, if social area 1140 is displayed at a first location at a first time in a show, it may be resized or moved at a later time in the show if social TV dongle 810 determines that the social area 1140 is overlapping an important area of the screen. For illustrative purposes only, consider a televised singing competition in which a contestant performs during a first portion of the show and then a telephone number to vote for the contestant is displayed on the screen at a later point in the show. If social area 1140 is displayed in the lower center portion of the screen during the contestant's performance during the first portion of the show (e.g., to avoid overlapping the singer's face), it may overlap the telephone number when it is displayed later in the show. To avoid this, social TV dongle 810 may detect that a new important area of the screen has appeared (i.e., the telephone number) and either adjust the size or shape of social area 1140 to avoid the telephone number, or move social area 1140 to avoid the telephone number; thus, occluded and occlude-free regions are dynamic, changing with the video content, and occlude-free regions are added and removed depending on the areas of importance).
Regarding Claims 21, 23 – 25, and 27, they are rejected similarly to Claims 1, 3 – 5, and 7, respectively.
Regarding Claims 28 and 30 – 32, they are rejected similarly to Claims 1 and 3 – 5, respectively. A method of encoding visual media can be found in Zhang (Abstract and Paragraph [0096], encoding).
Claims 2, 22, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Riley et al. (U.S. Pub. No. 2017/0270633, hereinafter “Riley”) in view of Archibong et al. (U.S. Patent No. 10,425,671, hereinafter “Archibong”), Zhang et al. (U.S. Pub. No. 2022/0368979, hereinafter “Zhang”) and Corwin et al. (U.S. Patent No. 10,757,347, hereinafter “Corwin”).
Regarding Claim 2, Riley in view of Archibong and Zhang teaches all of the limitations of Claim 1, but does not teach
The method of claim 1, wherein the occlude-free region is defined by a location of the occlude-free region in a coordinate system.
However, Corwin teaches
The method of claim 1, wherein the occlude-free region is defined by a location of the occlude-free region in a coordinate system (see Corwin Abstract, A client device receives video data and displays the video data via a display device. An overlay including content other than the video data is also displayed in a specific area of the display device and at least partially occludes the video data displayed within the specific area of the display device. The client device identifies coordinates of regions of interest within frames of the video data. When the client device determines that at least a threshold amount of a region of interest within the video data is displayed within the specific area of the display device, where the overlay is displayed, for at least a threshold amount of time, the client device increases a transparency of the overlay, repositions the overlay, or otherwise modifies the overlay to prevent the overlay from occluding the region of interest).
It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of a method of 360-degree video conferencing that decodes video streams, identifies occlude-free regions in a primary video, and overlays additional video streams outside important regions (as taught in Riley in view of Archibong and Zhang), with using a coordinate system when overlaying an object over a video, to prevent obstruction of important regions (as taught in Corwin), the motivation being to provide a method of using a descriptive representation of the location of an item or region within a video frame (see Corwin Column 2, lines 41 – 45).
Regarding Claims 22 and 29, they are rejected similarly to Claim 2.
Allowable Subject Matter
Claims 6, 26, and 33 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Refer to PTO-892, Notice of References Cited for a listing of analogous art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARISSA A JONES whose telephone number is (703) 756-1677. The examiner can normally be reached via telework, M-F, 6:30 AM - 4:00 PM CT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CARISSA A JONES/Examiner, Art Unit 2691
/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2691