Office Action Analysis: 18776953 — TRANSFORMING DIGITAL FRAMES USING RELATIONSHIPS BETWEEN THE DIGITAL FRAMES

Office Action

§102 §103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/16/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement has been considered by the examiner.

Specification
1	The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Claim Objections
2	Claims 11 and 12 objected to because of the following informalities: Missing colon (“:”) between the words/phrase “further comprising” and “obtaining”.  Appropriate correction is required.

Claim Rejections - 35 USC § 102
3	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

4	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

5	Claim(s) 1, 3-5, 7-8, 10, 14, and 16-17 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Lin et al. (US 20220101539 A1).

6	Regarding claim 1, Lin teaches a method comprising: receiving, by a computing device, a plurality of digital frames and a plurality of masks ([0067] reciting “The optical flow computation engine 310 can receive the correlation volume computation from the correlation volume engine 308. The optical flow computation engine 310 can use the features in the correlation volume computation to perform pointwise (e.g., pixel-wise) optical flow estimations for the pixels identified in the mask (e.g., mask A, mask D, or other mask)…In some cases, mask D may be similar to or the same as mask A, B, and/or mask C. In other cases, mask D may be configured to be different than mask A, B, and/or mask C. In some examples, the features corresponding to the source frame Is and the target frame IT can have a same resolution as the source frame Is and the target frame IT.”);
determining, by the computing device, displacements of attributes between sequential digital frames of the plurality of digital frames ([0030] reciting “The optical flow estimation system can estimate motion of the object between the frames by determining an optical flow vector that corresponds to the displacement and/or distance between the pixel in the initial frame and the corresponding pixel in the subsequent frame. For instance, the optical flow vector can indicate the displacement (e.g., corresponding to the direction and distance of movement) between coordinates corresponding to the initial pixel and coordinates corresponding to the subsequent pixel.”);
obtaining, by the computing device, one or more pixel values associated with a portion of at least one digital frame of the plurality of digital frames based on one or more corresponding pixel values associated with other digital frames of the plurality of digital frames and the displacements of the attributes, the portion of the at least one digital frame defined using at least one mask of the plurality of masks ([Abstract] reciting “For example, a process can include determining a subset of pixels of at least one of a first frame and a second frame, and generating a mask indicating the subset of pixels.”; [0030] reciting “The optical flow estimation system can estimate motion of the object between the frames by determining an optical flow vector that corresponds to the displacement and/or distance between the pixel in the initial frame and the corresponding pixel in the subsequent frame.”); and
transforming, by the computing device, the portion of the at least one digital frame based on the one or more pixel values ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”).

7	Regarding claim 3, Lin teaches the method of claim 1, wherein transforming the portion of the at least one digital frame comprises updating one or more original pixel values associated with the portion of the at least one digital frame based on the one or more pixel values associated with the portion of the at least one digital frame ([0050] reciting “In other examples, the optical flow map engine 106 can generate a cumulative optical flow map (in which case the optical flow map is adjusted or updated at each frame) that corresponds to motion estimations between two frames having one or more intermediate frames between them.”).

8	Regarding claim 4, Lin teaches the method of claim 1 (see claim 1 rejection above), further comprising: generating, by the computing device and based on providing the at least one digital frame and the at least one mask as input to a learning model, one or more additional pixel values associated with a digital frame of the at least one digital frame ([0008] reciting “In another example, an apparatus for encoding video data is provided. The apparatus includes: means for determining a subset of pixels of at least one of a first frame and a second frame; means for generating a mask indicating the subset of pixels; means for determining, based on the mask…”);
transforming, by the computing device, one or more original pixel values associated with the digital frame based on the one or more additional pixel values associated with the digital frame ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”);
obtaining, by the computing device and in response to transforming the one or more original pixel values of the digital frame, updated displacements of the attributes between the sequential digital frames ([0049] reciting “The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame.”);
obtaining, by the computing device, one or more additional pixel values associated with the at least one digital frame based on one or more corresponding pixel values associated with the other digital frames of the plurality of digital frames and the updated displacements of the attributes ([Abstract] reciting “For example, a process can include determining a subset of pixels of at least one of a first frame and a second frame, and generating a mask indicating the subset of pixels.”; [0030] reciting “The optical flow estimation system can estimate motion of the object between the frames by determining an optical flow vector that corresponds to the displacement and/or distance between the pixel in the initial frame and the corresponding pixel in the subsequent frame.”); and
transforming, by the computing device, one or more original pixel values associated with the at least one digital frame based on the one or more additional pixel values ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”).
9	Regarding claim 5, Lin teaches the method of claim 4 (see claims 1 and 4 rejections above), further comprising selecting, by the computing device, the digital frame from the plurality of digital frames that maximizes a numerical quantity of connections to pixel values within a portion of other digital frames in the plurality of digital frames, the portion of the other digital frames defined using respective masks of the plurality of masks ([0107] reciting “The final layer of connections in the network is a fully-connected layer that connects every node from the pooling hidden layer 722b to every one of the output nodes in the output layer 724… and the pooling layer 722b includes a layer of 3×12×12 hidden feature nodes based on application of max-pooling filter to 2×2 regions across each of the three feature maps.”).

10	Regarding claim 7, Lin teaches the method of claim 4 (see claims 1 and 4 rejections above), wherein generating the one or more additional pixel values comprises determining, after transforming the portion of the at least one digital frame, one or more remaining pixels of the portion of the at least one digital frame are to be transformed ([0100] reciting “Processing the filter at each unique location of the input volume produces a number representing the filter results for that location, resulting in a total sum value being determined for each node of the convolutional hidden layer 722a.”).

11	Regarding claim 8, Lin teaches the method of claim 1 (see claim 1 rejection above), wherein obtaining the one or more pixel values associated with the portion of the at least one digital frame comprises mapping the one or more pixel values associated with the portion of the at least one digital frame to the one or more corresponding pixel values associated with the other digital frames based on respective displacements of the attributes between the at least one digital frame and the other digital frames ([0005] reciting “The method includes: determining a subset of pixels of at least one of a first frame and a second frame; generating a mask indicating the subset of pixels; determining, based on the mask, one or more features associated with the subset of pixels of at least the first frame and the second frame…”; [0043] reciting “The optical flow estimation system 100 can process frames 103 to generate an optical flow map (e.g., an optical flow map 108) by performing optical flow estimation for pixels within a frame of the frames 103. The optical flow map 108 can include one or more optical flow vectors corresponding to the movement of pixels between two frames.”; [0049] reciting “The optical flow map engine 106 of the optical flow estimation system 100 can generate values for the optical flow map 108 for the target frame based on the optical flow vectors. The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame.”).

12	Regarding claim 10, Lin teaches the method of claim 1 (see claim 1 rejection above), wherein obtaining the one or more pixel values associated with the portion of the at least one digital frame comprises: obtaining one or more first pixel values associated with the portion of the at least one digital frame based on traversing the plurality of digital frames in a first direction ([0030] reciting “For instance, the optical flow vector can indicate the displacement (e.g., corresponding to the direction and distance of movement) between coordinates corresponding to the initial pixel and coordinates corresponding to the subsequent pixel.”); 
obtaining one or more second pixel values associated with the portion of the at least one digital frame based on traversing the plurality of digital frames in a second direction, the first direction being different than the second direction ([Abstract] reciting “For example, a process can include determining a subset of pixels of at least one of a first frame and a second frame, and generating a mask indicating the subset of pixels.”); and
determining respective differences between the one or more first pixel values associated with the portion of the at least one digital frame and the one or more second pixel values associated with the portion of the at least one digital frame ([0046] reciting “In some examples, the selection techniques may include a differential field of two frames, intensity smoothness (or discontinuity), object regions and/or boundaries, semantic regions (e.g., foreground objects versus the background of the scene), semantic uncertainty, attention regions for activities, any combination thereof, and/or other properties.”).

13	Regarding claim 14, Lin teaches a system comprising: a memory component; and a computing device coupled to the memory component, the computing device to perform operations including ([0006] reciting “In another example, an apparatus for encoding video data is provided that includes a memory and one or more processors (e.g., implemented in circuitry) coupled to the memory. In some examples, more than one processor can be coupled to the memory and can be used to perform one or more of the operations described herein.”):
obtaining a plurality of masked digital frames based on applying a plurality of masks to a plurality of digital frames ([0005] reciting “The method includes: determining a subset of pixels of at least one of a first frame and a second frame; generating a mask indicating the subset of pixels; determining, based on the mask, one or more features associated with the subset of pixels of at least the first frame and the second frame; determining optical flow vectors between the subset of pixels of the first frame and corresponding pixels of the second frame; and generating an optical flow map for the second frame using the optical flow vectors.”);
generating a mapping between a plurality of pixels associated with the plurality of masked digital frames based on traversing the plurality of masked digital frames to obtain respective displacements of the plurality of pixels occurring between sequential masked digital frames of the plurality of masked digital frames ([0030] reciting “For example, an optical flow estimation system can identify a pixel of an initial frame that corresponds to a portion of a real-world object. The optical flow estimation system can determine a corresponding pixel (e.g., a pixel that depicts the same portion of the real-world object) within a subsequent frame.”; [0049] reciting “The optical flow map engine 106 of the optical flow estimation system 100 can generate values for the optical flow map 108 for the target frame based on the optical flow vectors. The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame.”);
obtaining one or more pixel values associated with at least one masked digital frame of the plurality of masked digital frames based on one or more corresponding pixel values associated with other masked digital frames of the plurality of masked digital frames and the mapping between the plurality of pixels associated with the plurality of masked digital frames ([0049] reciting “The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame. In some examples, the optical flow map engine 106 can determine values of the optical flow map 108 for the target frame using the optical flow vectors determined for the pixels in the target frame identified by the mask.”); and
transforming the at least one masked digital frame based on the one or more pixel values associated with the at least one masked digital frame ([Abstract] reciting “The process can include determining, based on the mask, one or more features associated with the subset of pixels of at least the first frame and the second frame.”; [0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”).

14	Regarding claim 16, Lin teaches the system of claim 14 (see claim 14 rejection above), wherein to transform the at least one masked digital frame the operations further include updating one or more original pixel values associated with the at least one masked digital frame based on the one or more pixel values associated with the at least one masked digital frame ([0050] reciting “In other examples, the optical flow map engine 106 can generate a cumulative optical flow map (in which case the optical flow map is adjusted or updated at each frame) that corresponds to motion estimations between two frames having one or more intermediate frames between them.”).

15	Regarding claim 17, Lin teaches the system of claim 14, wherein the operations further include: generating, based on providing the at least one masked digital frame as input to a learning model, one or more additional pixel values associated with a masked digital frame of the at least one masked digital frame ([0008] reciting “In another example, an apparatus for encoding video data is provided. The apparatus includes: means for determining a subset of pixels of at least one of a first frame and a second frame; means for generating a mask indicating the subset of pixels; means for determining, based on the mask…”);
transforming one or more original pixel values associated with the masked digital frame based on the one or more additional pixel values associated with the masked digital frame ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”);
generating, in response to transforming the one or more original pixel values of the masked digital frame, an updated mapping between the plurality of pixels associated with the plurality of masked digital frames based on traversing the plurality of masked digital frames to obtain updated respective displacements of the plurality of pixels occurring between the sequential masked digital frames in the plurality of masked digital frames ([0049] reciting “In some cases, the optical flow vector engine 104 can determine optical flow vectors between each pixel in a frame (referred to as a source frame) identified by the mask and corresponding pixels of a subsequent frame (referred to as a target frame). The optical flow map engine 106 of the optical flow estimation system 100 can generate values for the optical flow map 108 for the target frame based on the optical flow vectors. The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame.”; [0050] reciting “In other examples, the optical flow map engine 106 can generate a cumulative optical flow map (in which case the optical flow map is adjusted or updated at each frame) that corresponds to motion estimations between two frames having one or more intermediate frames between them.”);
obtaining one or more additional pixel values associated with the at least one masked digital frame based on one or more corresponding pixel values associated with the other masked digital frames of the plurality of masked digital frames and the updated mapping between the plurality of pixels associated with the plurality of masked digital frames ([Abstract] reciting “For example, a process can include determining a subset of pixels of at least one of a first frame and a second frame, and generating a mask indicating the subset of pixels.”; [0030] reciting “The optical flow estimation system can estimate motion of the object between the frames by determining an optical flow vector that corresponds to the displacement and/or distance between the pixel in the initial frame and the corresponding pixel in the subsequent frame.”); and
transforming one or more original pixel values associated with the at least one masked digital frame based on the one or more additional pixel values associated with the at least one masked digital frame ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”).

Claim Rejections - 35 USC § 103
16	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

17	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

18	Claim(s) 2, 6, 11, 15, 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 20220101539 A1) in view of Dharur et al. (US 20200193609 A1).

19	Regarding claim 2, Lin teaches the method of claim 1, wherein transforming the portion of the at least one digital frame comprises (see claim 1 rejection above), but does not explicitly teach removing one or more original pixel values associated with the portion of the at least one digital frame; and replacing the one or more original pixel values associated with the portion of the at least one digital frame with the one or more pixel values associated with the portion of the at least one digital frame.

20	Dharur teaches removing one or more original pixel values associated with the portion of the at least one digital frame ([0089] reciting “In some examples, the output frame can be rendered at block 410C with background pixels having a visual effect (e.g., the background pixels are rendered as black pixels, white pixels, blurred pixels, or other visual effect)… FIG. 7C illustrates an example of a segmentation overlay resulting from the segmentation mask shown in FIG. 7B (generated by performing image segmentation on the image shown in FIG. 7A) being overlaid over the image shown in FIG. 7A. FIG. 7D illustrates an example of a rendered output frame using the segmentation mask, where the background is removed.”); and 
replacing the one or more original pixel values associated with the portion of the at least one digital frame with the one or more pixel values associated with the portion of the at least one digital frame ([0061] reciting “In another example, the visual effect can include modifying the foreground pixels (e.g., changing the lighting, blurring, or the like) of the output frame or replacing the foreground pixels with a different object, such as a computer-generated object, an augmented reality (AR) object, or other suitable object.”).

It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Dharur to provide a method that involves removing and replacing various pixel values for the digital frames taught by Lin. Doing so would allow the pixels to be rendered having some type of visual effect as stated by Dharur ([0089] recited).

21	Regarding claim 6, Lin teaches the method of claim 4, wherein generating the one or more additional pixel values comprises (see claims 1 and 4 rejections above), but does not explicitly teach obtaining, via one or more interactable elements of a user interface associated with the computing device, a prompt corresponding to an intent associated with the transforming of the one or more original pixel values associated with the digital frame; and determining the intent is associated with replacing the one or more original pixel values to remove at least one attribute associated with the digital frame from the digital frame; or determining the intent is associated with updating the one or more original pixel values to add a new attribute associated with the digital frame or to modify an existing attribute associated with the digital frame.  

22	Dharur teaches obtaining, via one or more interactable elements of a user interface associated with the computing device, a prompt corresponding to an intent associated with the transforming of the one or more original pixel values associated with the digital frame ([0059] reciting “A frame can include a video frame of a video sequence or a still image of a set of consecutively captured still images. In one illustrative example, a set of consecutively captured still images can be captured and displayed to the user as a preview of what is in the field-of-view of the camera, which can help the user decide when to capture an image for storage.”); and 
determining the intent is associated with replacing the one or more original pixel values to remove at least one attribute associated with the digital frame from the digital frame ([0056] reciting “In another example, a user of a computing devices may prefer to manipulate certain portions of an image of the user. For instance, the background portion and/or the foreground portion of the scene in a frame can be modified.”; [0089] reciting “In some examples, the output frame can be rendered with foreground pixels having a visual effect (e.g., some or all of the foreground pixels are replaced with an alternative object, such as an augmented reality object, an avatar, or the like, or other visual effect). FIG. 7C illustrates an example of a segmentation overlay resulting from the segmentation mask shown in FIG. 7B (generated by performing image segmentation on the image shown in FIG. 7A) being overlaid over the image shown in FIG. 7A. FIG. 7D illustrates an example of a rendered output frame using the segmentation mask, where the background is removed.”); or 
determining the intent is associated with updating the one or more original pixel values to add a new attribute associated with the digital frame or to modify an existing attribute associated with the digital frame ([0006] reciting “Using the segmentation mask, an output frame can then be generated with a modified foreground or background.”; [0061] reciting “In another example, the visual effect can include modifying the foreground pixels (e.g., changing the lighting, blurring, or the like) of the output frame or replacing the foreground pixels with a different object, such as a computer-generated object, an augmented reality (AR) object, or other suitable object.”).  

23	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Dharur to provide a type of prompt for the user for the methods like removing and replacing the pixels for the digital frames provided by the teachings of Lin. Doing so would allow the pixels to be rendered having some type of visual effect as stated by Dharur ([0089] recited).

24	Regarding claim 11, Lin teaches the method of claim 10 (see claims 1 and 10 rejections above), and although could teach further comprising obtaining, for pixels corresponding to differences of the respective differences that satisfy a threshold value, an average value between first pixel values of the one or more first pixel values that correspond to the pixels and second pixel values of the one or more second pixel values that correspond to the pixels, the one or more pixel values associated with the portion of the at least one digital frame including the average value ([0014] reciting “In some cases, the methods, apparatuses, and computer-readable medium described above comprise selecting the second frame for performing optical flow estimation based on determining an importance value of the second frame exceeds a threshold importance value.”; [0046] reciting “In some examples, the selection techniques may include a differential field of two frames, intensity smoothness (or discontinuity), object regions and/or boundaries, semantic regions (e.g., foreground objects versus the background of the scene), semantic uncertainty, attention regions for activities, any combination thereof, and/or other properties.”; [0070] reciting “Frames with importance values that meet or exceed (are greater than) a threshold importance value can be designated as key frames for which optical flow estimation is to be performed.”), Dharur can teach this limitation further.

25	Dharur teaches further comprising obtaining, for pixels corresponding to differences of the respective differences that satisfy a threshold value, an average value between first pixel values of the one or more first pixel values that correspond to the pixels and second pixel values of the one or more second pixel values that correspond to the pixels, the one or more pixel values associated with the portion of the at least one digital frame including the average value ([0083] reciting “In some examples, a single representative displacement value can be determined based on a weighted displacement (of the motion vectors) for every single pixel in the frame. For instance, on a pixel by pixel basis, the difference in the motion of each pixel can be computed, and then the differences can be added together (and in some cases averaged).”; [0084] reciting “In one illustrative example, to quantify and generate a value against a motion vector map (calculated between a previous frame and a current frame), and to compare with the motion threshold, an average or weighted average can be determined for the motion-difference of the pixels being tracked.”).

26	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Dharur to provide a clearer method of differences between two pixels and digital frames provided by the teachings of Lin, and finding the average values. Doing so would provide hyper-parameter tuning based on the particular application as stated by Dharur ([0084] recited).

27	Regarding claim 15, Lin teaches the system of claim 14, wherein to transform the at least one masked digital frame the operations further include (see claim 14 rejection above), but does not explicitly teach removing one or more original pixel values associated with the at least one masked digital frame; and replacing the one or more original pixel values associated with the at least one masked digital frame with the one or more pixel values associated with the at least one masked digital frame.

28	Dharur teaches removing one or more original pixel values associated with the at least one masked digital frame ([Abstract] reciting “Pixels of the first frame can be modified using the segmentation mask.”; [0056] reciting “In another example, a user of a computing devices may prefer to manipulate certain portions of an image of the user. For instance, the background portion and/or the foreground portion of the scene in a frame can be modified.”; [0089] reciting “In some examples, the output frame can be rendered with foreground pixels having a visual effect (e.g., some or all of the foreground pixels are replaced with an alternative object, such as an augmented reality object, an avatar, or the like, or other visual effect). FIG. 7C illustrates an example of a segmentation overlay resulting from the segmentation mask shown in FIG. 7B (generated by performing image segmentation on the image shown in FIG. 7A) being overlaid over the image shown in FIG. 7A. FIG. 7D illustrates an example of a rendered output frame using the segmentation mask, where the background is removed.”); and 
replacing the one or more original pixel values associated with the at least one masked digital frame with the one or more pixel values associated with the at least one masked digital frame ([0061] reciting “In another example, the visual effect can include modifying the foreground pixels (e.g., changing the lighting, blurring, or the like) of the output frame or replacing the foreground pixels with a different object, such as a computer-generated object, an augmented reality (AR) object, or other suitable object.”).

29	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Dharur to provide a method that involves removing and replacing various pixel values for the digital masked frames taught by Lin. Doing so would allow the pixels to be rendered having some type of visual effect as stated by Dharur ([0089] recited).

30	Regarding claim 18, Lin teaches a method comprising: 
generating, by the computing device and based on providing the plurality of digital frames and the prompt as input to a learning model, one or more pixel values associated with a digital frame of the plurality of digital frames, the one or more pixel values corresponding to the intent ([0008] reciting “In another example, an apparatus for encoding video data is provided. The apparatus includes: means for determining a subset of pixels of at least one of a first frame and a second frame; means for generating a mask indicating the subset of pixels; means for determining, based on the mask…”);
transforming, by the computing device, the one or more respective original pixel values associated with the digital frame based on the one or more pixel values associated with the digital frame ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”);
obtaining, by the computing device and in response to transforming the one or more respective original pixel values of the digital frame, relationships between respective digital frames in the plurality of digital frames based on respective displacements of attributes associated with the respective digital frames, the respective displacements of the attributes occurring between sequential digital frames in the plurality of digital frames ([0048] reciting “The optical flow vector engine 104 of the optical flow estimation system 100 can determine optical flow vectors corresponding to pixels identified by the mask. In some cases, an optical flow vector can indicate a direction and magnitude of the movement of the pixel. For example, an optical flow vector can describe a displacement between a coordinate corresponding to the location of the pixel within an initial frame and a coordinate corresponding to the location of the pixel within a subsequent frame…In an illustrative example, the optical flow vector engine 104 can determine an optical flow vector using a differential motion estimation technique (e.g., a Taylor series approximation).”);
obtaining, by the computing device, one or more pixel values associated with at least one digital frame of the plurality of digital frames based on one or more corresponding pixel values associated with other digital frames of the plurality of digital frames and the relationships between the respective digital frames in the plurality of digital frames ([Abstract] reciting “For example, a process can include determining a subset of pixels of at least one of a first frame and a second frame, and generating a mask indicating the subset of pixels. The process can include determining, based on the mask, one or more features associated with the subset of pixels of at least the first frame and the second frame. The process can include determining optical flow vectors between the subset of pixels of the first frame and corresponding pixels of a second frame.”; [0030] reciting “The optical flow estimation system can estimate motion of the object between the frames by determining an optical flow vector that corresponds to the displacement and/or distance between the pixel in the initial frame and the corresponding pixel in the subsequent frame.”); and
transforming, by the computing device, one or more original pixel values associated with the at least one digital frame based on the one or more pixel values associated with the at least one digital frame ([0099] reciting “Each node of the convolutional hidden layer 722a is connected to a region of nodes (pixels) of the input image called a receptive field.”; [0063] reciting “By associating contextual features of neighbor pixels with contextual features of a center pixel, the feature extraction engine 304 can improve the accuracy of sparse optical flow estimation. For instance, by determining and storing the contextual features of neighbor pixels in connection with a center pixel, the feature extraction engine 304 can help the optical flow estimation system 300 accurately identify a pixel that corresponds to the center pixel within a subsequent frame.”; [0089] reciting “For example, as shown, each of the input nodes of the input layer 620 is connected to each of the nodes of the first hidden layer 622a. The nodes of the hidden layers 622a, 622b, through 622n can transform the information of each input node by applying activation functions to these information.”).

31	Lin does not explicitly teach obtaining, via one or more interactable elements of a user interface associated with a computing device, a prompt corresponding to an intent associated with transforming of one or more respective original pixel values associated with a plurality of digital frames;

32	Dharur teaches obtaining, via one or more interactable elements of a user interface associated with a computing device, a prompt corresponding to an intent associated with transforming of one or more respective original pixel values associated with a plurality of digital frames ([0059] reciting “A frame can include a video frame of a video sequence or a still image of a set of consecutively captured still images. In one illustrative example, a set of consecutively captured still images can be captured and displayed to the user as a preview of what is in the field-of-view of the camera, which can help the user decide when to capture an image for storage.”);

33	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Dharur to provide a type of prompt for the user for the methods like transforming and obtaining the pixels for the digital frames provided by the teachings of Lin. Doing so would allow the pixels to be rendered having some type of visual effect as stated by Dharur ([0089] recited).

34	Regarding claim 19, Lin in view of Dharur teaches the method of claim 18 (see claim 18 rejection above), wherein transforming the one or more respective original pixel values associated with the digital frame comprises: determining the intent is associated with replacing the one or more respective original pixel values to remove at least one attribute associated with the digital frame from the digital frame; removing the one or more respective original pixel values associated with the digital frame; and replacing the one or more respective original pixel values associated with the digital frame with the one or more pixel values associated with the digital frame (Dharur; [0056] reciting “In another example, a user of a computing devices may prefer to manipulate certain portions of an image of the user. For instance, the background portion and/or the foreground portion of the scene in a frame can be modified.”; [0089] reciting “In some examples, the output frame can be rendered with foreground pixels having a visual effect (e.g., some or all of the foreground pixels are replaced with an alternative object, such as an augmented reality object, an avatar, or the like, or other visual effect). FIG. 7C illustrates an example of a segmentation overlay resulting from the segmentation mask shown in FIG. 7B (generated by performing image segmentation on the image shown in FIG. 7A) being overlaid over the image shown in FIG. 7A. FIG. 7D illustrates an example of a rendered output frame using the segmentation mask, where the background is removed.”).

35	Regarding claim 20, Lin in view of Dharur teaches the method of claim 18 (see claim 18 rejection above), wherein transforming the one or more respective original pixel values associated with the digital frame comprises: determining the intent is associated with updating the one or more respective original pixel values to add a new attribute associated with the digital frame or to modify an existing attribute associated with the digital frame ([0006] reciting “Using the segmentation mask, an output frame can then be generated with a modified foreground or background.”; [0061] reciting “In another example, the visual effect can include modifying the foreground pixels (e.g., changing the lighting, blurring, or the like) of the output frame or replacing the foreground pixels with a different object, such as a computer-generated object, an augmented reality (AR) object, or other suitable object.”); and
updating the one or more respective original pixel values associated with the digital frame based on the one or more pixel values associated with the digital frame ([0061] reciting “In another example, the visual effect can include modifying the foreground pixels (e.g., changing the lighting, blurring, or the like) of the output frame or replacing the foreground pixels with a different object, such as a computer-generated object, an augmented reality (AR) object, or other suitable object.”; [0152] reciting “At each convolutional iteration, the values of the filter are multiplied with a corresponding number of the original pixel values of the image (e.g., the 5×5 filter array is multipled by a 5×5 array of input pixel values at the top-left corner of the input image array). The multiplications from each convolutional iteration can be summed together to obtain a total sum for that iteration or node.”).

36	Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 20220101539 A1) in view of Breton et al. (US 20100060638 A1).

37	Regarding claim 9, Lin teaches the method of claim 1 (see claim 1 rejection above), further comprising: 
 
and obtaining,  and based on the respective displacements of the attributes, a mapping between pixel values associated with the first digital frame and corresponding pixel values associated with the subsequent digital frames, the one or more pixel values obtained based on the mapping ([0005] reciting “The method includes: determining a subset of pixels of at least one of a first frame and a second frame; generating a mask indicating the subset of pixels; determining, based on the mask, one or more features associated with the subset of pixels of at least the first frame and the second frame…”; [0043] reciting “The optical flow estimation system 100 can process frames 103 to generate an optical flow map (e.g., an optical flow map 108) by performing optical flow estimation for pixels within a frame of the frames 103. The optical flow map 108 can include one or more optical flow vectors corresponding to the movement of pixels between two frames.”; [0049] reciting “The optical flow map engine 106 of the optical flow estimation system 100 can generate values for the optical flow map 108 for the target frame based on the optical flow vectors. The values of the mask can indicate the displacement of the pixels within the subsequent target frame relative to the source frame.”).  

38	Lin does not explicitly teach applying a grid overlay to a first digital frame in the plurality of digital frames; transforming the grid overlay based on respective displacements of the attributes between the first digital frame and subsequent digital frames of the plurality of digital frames; and obtaining, using the grid overlay as a reference and based on the respective displacements of the attributes…

39	Breton teaches applying a grid overlay to a first digital frame in the plurality of digital frames; transforming the grid overlay based on respective displacements of the attributes between the first digital frame and subsequent digital frames of the plurality of digital frames; and obtaining, using the grid overlay as a reference and based on the respective displacements of the attributes ([Abstract] reciting “An overlay grid with lighting values may be superimposed on an area defined by a light meter on the 3-D graphics model. The lighting values on the overlay grid are associated with the light meter and may vary frame-over-frame. In another embodiment, a JPEG image with a superimposed overlay grid with per-pixel lighting values covering a 3-D graphics model is generated for each frame that includes the 3-D graphics model.”; [0024] reciting “The GPU 130 is configured to perform various tasks related to producing pixel data from the graphics data supplied by the CPU 122. Further, the GPU 130 is configured to store and update the produced pixel data in the frame buffer 126 via the communication path 128 and/or transmit the produced pixel data to the display screen 134 via the communication path 132 for display.”)…

40	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Breton to provide a method that provides a grid overlay for the digital frames and pixels provided by the teachings of Lin, while Lin also providing the methods of mapping. Doing so would provide the ability to compute lighting values for every pixel of the 3-D graphics model on a frame-by-frame basis based on the graphics data as stated by Breton ([0027] recited).

41	Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 20220101539 A1) in view of Ravirala et al. (US 11373281 B1).

42	Regarding claim 12, Lin teaches the method of claim 10 (see claims 1 and 10 rejections above), but does not explicitly teach further comprising obtaining, for pixels corresponding to differences of the respective differences that fail to satisfy a threshold value, respective pixel values associated with the pixels as output from a learning model based on providing the at least one digital frame as input to the learning model, the one or more pixel values associated with the portion of the at least one digital frame including the respective pixel values associated with the pixels.

43	Ravirala teaches further comprising obtaining, for pixels corresponding to differences of the respective differences that fail to satisfy a threshold value, respective pixel values associated with the pixels as output from a learning model based on providing the at least one digital frame as input to the learning model, the one or more pixel values associated with the portion of the at least one digital frame including the respective pixel values associated with the pixels ([Page 15; Column 6, Lines 35-38] reciting “For example, the HDR manager 135 may process image data (e.g., image data) from and/or write image data to a local memory of the device 105 or to the database 115.”; [Page 16; Column 7, Lines 61-67; Column 8, Lines 1-6] reciting “In some examples, the device 105 may determine that the set of motion pixels associated with the region of the pixel map fails to satisfy the threshold, and may maintain the first frame of the first subset of frames as the anchor frame based at least in part on the determining that the set of motion pixels associated with the region of the pixel map fails to satisfy the threshold. Alternatively, the device 105 may determine that the set of motion pixels associated with the region of the pixel map satisfies the threshold, and select a second frame of the second subset of frames as the anchor frame based at least in part on the determining that the set of motion pixels associated with the region of the pixel map satisfies the threshold.”).

44	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Ravirala to provide a methos that can determine if a threshold fails between two digital frames from a learning model and the frames provided by the teachings of Lin ([0009] recited “In some aspects, the methods, apparatuses, and computer-readable medium described above comprise determining the subset of pixels of at least the first frame and the second frame using a machine learning algorithm.”). Doing so would satisfy the threshold from the association of the pixel map as stated by Ravirala ([Page 16; Column 7, Lines 61-67]).

45	Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lin et al. (US 20220101539 A1) in view of Hsiao et al. (US 20230274399 A1).

46	Regarding claim 13, Lin teaches the method of claim 1 (see claim 1 rejection above), but does not explicitly teach wherein the plurality of masks includes one or more of a first plurality of masks associated with a target attribute to be transformed or a second plurality of masks associated with an attribute that at least partially overlaps with the target attribute in the at least one digital frame, the first plurality of masks defining the portion of the at least one digital frame.

47	Hsiao teaches wherein the plurality of masks includes one or more of a first plurality of masks associated with a target attribute to be transformed or a second plurality of masks associated with an attribute that at least partially overlaps with the target attribute in the at least one digital frame, the first plurality of masks defining the portion of the at least one digital frame ([0004] reciting “The selecting includes determining that an overlap between the combined mask in the reference video frame and the combined mask in the target video frame is less than a first threshold. The selecting further includes determining that an optical flow metric of the combined mask in the reference video frame compared to the combined mask in the target video frame is less than a second threshold. The method further aligns the reference video frame with the target video frame.”; [0007] reciting “In an embodiment, the determining the overlap includes identifying, in the reference video frame, a first set of pixels that correspond to pixels in the combined mask. The locating includes locating, in the target video frame, a second set of pixels that corresponds to the first set of pixels. The determining includes computing a stability score using the first set of pixels and the second set of pixels.”).

48	It would have been obvious to one with ordinary skill before the effective filing date of the claimed invention, to have modified the method (taught by Lin) to incorporate the teachings of Hsiao to provide a way to have overlaps between two frames surrounding the masks provided by the teachings of Lin. Doing so would allow the ability to compute a stability score using the first set of pixels and the second set of pixels as stated by Hsiao ([0007] recited).

Conclusion
49	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNNY TRAN LE whose telephone number is (571)272-5680. The examiner can normally be reached Mon-Thu: 7:30am-5pm; First Fridays Off; Second Fridays: 7:30am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHNNY T LE/Examiner, Art Unit 2614                                                                                                                                                                                                        
/KENT W CHANG/Supervisory Patent Examiner, Art Unit 2614
Read full office action
TRANSFORMING DIGITAL FRAMES USING RELATIONSHIPS BETWEEN THE DIGITAL FRAMES

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

AI Strategy Recommendation

Prosecution Projections

Ready to respond to this office action?

TRANSFORMING DIGITAL FRAMES USING RELATIONSHIPS BETWEEN THE DIGITAL FRAMES

Interview Optional

Examiner Intelligence

Statute-Specific Performance

Office Action

Prosecution Timeline

AI Strategy Recommendation

Prosecution Projections

Ready to respond to this office action?

Sign in with your work email