Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 6, 7, 9, 12, 14, 15, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Learning Perspective Undistortion of Portraits) in view of Kwon (US 2009/0059041 A1).
Regarding independent claims 1, 9, and 17:
Zhao teaches a method of image processing using a network (Zhao Introduction “cascaded network”), comprising the steps of: processing an input image including a face (Zhao Abstract “Near-range portrait photographs often contain perspective distortion artifacts that bias human perception and challenge both facial recognition and reconstruction techniques. We present the first deep learning based approach to remove such artifacts from unconstrained portraits” Note: Here Zhao teaches that images with faces, or portraits, are processed to remove artifacts.). Zhao also teaches performing translation of a warped image to generate an improved image of the face with reduced face distortion by setting a longer camera-to-face distance. (Zhao Abstract “we predict a distortion correction flow map that encodes a per-pixel displacement that removes distortion artifacts when applied to the input image.” 1. Introduction “Perspective distortion artifacts are often observed in portrait photographs, in part due to the popularity of the “selfie” image captured at a near-range distance. The inset images, where a person is photographed from distances of 160cm and 25cm… we propose a cascaded network that maps a near-range portrait with perspective distortion to its distortion-free counterpart at a canonical distance of 1.6m (although any distance between 1.4m ∼ 2m could be used as the target distance for good portraits). Our cascaded network includes a distortion correction flow network and a completion network. Our distortion correction flow method encodes a per-pixel displacement, maintaining the original image’s resolution and its high frequency details in the output.” 3. Overview “The estimated distance and the portrait are fed into our cascaded network including FlowNet, which predicts a distortion correction flow map” Note: Zhao specifically states that a “distortion-free counterpart” is created by removing “perspective distortion” in images “captured at a near-range distance”. 
Specifically, images captured at a range as close as 25cm are corrected to a “camera-to-subject” distance of 1.6m, and potentially up to 2m. In other words, the distortion caused by near-range capture is corrected by setting the camera-to-subject distance to be farther away. Because the images are explicitly stated to be portraits, the subjects are faces. The distortion-free counterparts are produced by a “distortion correction flow network,” later named “FlowNet.” FlowNet creates a “distortion correction flow map” that assigns a per-pixel displacement, mapping each individual pixel to a new location. Zhao thus teaches the removal of distortion from an image by setting a longer camera-to-face distance. It is also of note that this is accomplished with a distortion correction flow map that undoes distortion by mapping individual pixels back to the locations they would occupy absent the distortion, a concept very similar to a backward warping map.)
Zhao does not explicitly teach the creation of a “backward warping map” or the performance of backward warping on an input image to generate a backward warped image; Zhao teaches only a distortion correction map.
The creation of a backward warping map that removes distortion present in an image, and the application of backward warping to an input image to generate a backward warped image, are taught in Kwon:
Kwon teaches performing backward warping on the input image using the backward warping map to generate a backward warped image; (Kwon ¶33 “The distortion correcting unit 240 corrects the image through backward mapping by using the extracted distortion coefficient. Specifically, in order to correct the radial distortion created in the image, a distortion coefficient is extracted by using a warping equation or a lens distortion model, and the distortion is corrected by image warping by using the extracted distortion coefficient. Image warping is divided into forward mapping and backward mapping … the backward mapping does not generate the holes since coordinates of a distorted image are calculated by using coordinates of a corrected image.” Note: Kwon specifically teaches that backward warping is performed on an input image to remove distortion and produce an output image free of distortion. The output image, being the output of backward warping, is a backward warped image. The warping is specifically stated to be done through “backward mapping,” which teaches that the backward warping of the input image uses a backward warping map, and that the map is generated or created, as backward mapping could not be performed without a map.)
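For illustration only, and not as part of either reference’s disclosure, the hole-avoidance property of backward mapping that Kwon describes can be sketched in a few lines of Python (all function names here are hypothetical):

```python
import numpy as np

def forward_map(src, mapping):
    """Forward mapping: push each source pixel to its mapped output
    location. Output locations nothing maps to remain holes (-1)."""
    h, w = src.shape
    out = np.full(src.shape, -1, dtype=src.dtype)
    for y in range(h):
        for x in range(w):
            ty, tx = mapping(y, x)
            if 0 <= ty < h and 0 <= tx < w:
                out[ty, tx] = src[y, x]
    return out

def backward_map(src, inverse_mapping):
    """Backward mapping: for each output pixel, look up the source
    coordinate that lands there, so every output pixel gets a value."""
    h, w = src.shape
    out = np.empty_like(src)
    for y in range(h):
        for x in range(w):
            sy, sx = inverse_mapping(y, x)
            # clamp to image bounds and nearest-neighbour sample
            sy = min(max(int(round(sy)), 0), h - 1)
            sx = min(max(int(round(sx)), 0), w - 1)
            out[y, x] = src[sy, sx]
    return out

# Toy 1.5x horizontal stretch standing in for a distortion correction
src = np.arange(16, dtype=np.int64).reshape(4, 4)
stretched_fwd = forward_map(src, lambda y, x: (y, int(x * 1.5)))
stretched_bwd = backward_map(src, lambda y, x: (y, x / 1.5))
```

In this toy example the forward-mapped result contains hole pixels (columns no source pixel lands on), while the backward-mapped result assigns a value to every output pixel, which is the distinction Kwon draws between the two mapping directions.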
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao’s invention with Kwon such that a backward warping map for the input image of a face is generated, a backward warped image is produced by performing backward warping on the input image, and the camera-to-face distortion correction is applied to the image output by the backward warping.
Many reasons would motivate one to combine Zhao with Kwon’s teachings, one of which is a desire for more accuracy, i.e., less distortion, in an image of a face. Zhao explicitly teaches a method of removing distortion caused by the camera, in this case by camera-to-face distance, and employs a map to do so; the idea of further employing another type of map, such as the backward warping map detailed in Kwon, to remove distortion would have been obvious.
Regarding claims 4, 12, and 20, dependent on claims 1, 9, and 17:
Zhao teaches an image warping network that receives the input image and generates a map (Zhao 3. Overview “The estimated distance and the portrait are fed into our cascaded network including FlowNet, which predicts a distortion correction flow map,” Note: Zhao does not use the language of a “backward warping map,” but here it teaches that another type of map, a flow map that corrects distortion, is created based on an input image. It also teaches that this flow map, which corrects distortion and necessarily involves warping to undo a warp, is created by a network. This teaches “an image warping network” that generates an undistortion map.). Zhao also teaches that for each pixel in the image a grid-sampled value is retrieved from the input image based on a flow predicted at that pixel location. (Zhao 3. Overview “Perspective undistortion is not a typical image-to-image translation problem, because the input and output pixels are not spatially corresponded. Thus, we factor this challenging problem into two sub tasks: first finding a per-pixel undistortion flow map, and then image completion via inpainting. In particular, the vectorized flow representation undistorts an input image at its original resolution, preserving its high frequency details, which would be challenging if using only generative image synthesis techniques” Note: Zhao teaches that individual pixels are sampled and assigned their own displacements to form the undistorted image, i.e., each pixel in the output of its mapping comes from the input image based on a flow predicted for that specific pixel location. Because each pixel is sampled individually from the original “grid” of pixels in which the image arrives, as opposed to a method such as area sampling that considers multiple pixels at once, the sampled pixels of the input image are the claim’s “grid-sampled” values.)
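As an illustrative sketch only, not drawn from Zhao or the application, retrieving a grid-sampled value for each output pixel based on a per-pixel flow can be modeled as follows; the function name and the flow convention (a backward displacement added to each output coordinate, sampled with bilinear interpolation) are hypothetical:

```python
import numpy as np

def apply_backward_flow(img, flow):
    """For each output pixel (y, x), retrieve a grid-sampled value from
    the input image at (y, x) + flow[y, x], bilinearly interpolated.
    Every output pixel is assigned a value, so no holes appear."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            sy = np.clip(y + flow[y, x, 0], 0, h - 1)
            sx = np.clip(x + flow[y, x, 1], 0, w - 1)
            y0, x0 = int(np.floor(sy)), int(np.floor(sx))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = sy - y0, sx - x0
            # blend the four neighbouring grid samples
            out[y, x] = ((1 - wy) * (1 - wx) * img[y0, x0]
                         + (1 - wy) * wx * img[y0, x1]
                         + wy * (1 - wx) * img[y1, x0]
                         + wy * wx * img[y1, x1])
    return out
```

A zero flow leaves the image unchanged, while a uniform displacement of one pixel shifts every sampled value, illustrating how a predicted per-pixel flow determines which grid sample each output pixel receives.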
While Zhao teaches an image warping network that creates a distortion correction map, wherein each pixel in the undistorted image is a grid-sampled value retrieved based on a flow predicted at that specific pixel location, it does not teach that the map it creates is a backward warping map.
The creation of a backward warping map is taught in Kwon. Specifically, Kwon teaches that it receives the input image and generates the backward warping map (Kwon ¶33 “The distortion correcting unit 240 corrects the image through backward mapping by using the extracted distortion coefficient. Specifically, in order to correct the radial distortion created in the image, a distortion coefficient is extracted by using a warping equation or a lens distortion model, and the distortion is corrected by image warping by using the extracted distortion coefficient. Image warping is divided into forward mapping and backward mapping … the backward mapping does not generate the holes since coordinates of a distorted image are calculated by using coordinates of a corrected image.” Note: Kwon specifically teaches that backward warping is performed on an input image to remove distortion and produce an output image free of distortion. The output image, being the output of backward warping, is a backward warped image. The warping is specifically stated to be done through “backward mapping,” which teaches that the backward warping of the input image uses a backward warping map, and that the map is generated or created, as backward mapping could not be performed without a map.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Kwon such that the image warping network does not merely receive an image and produce a distortion/warping map but specifically produces a backward warping map, which Zhao then uses to produce an output image wherein each pixel is a grid-sampled value retrieved from the input image based on a flow predicted at that pixel location.
Several reasons would motivate one to do so, one of which is maintaining the high frequency details retained by Zhao’s distortion correction flow map while achieving even higher accuracy through the additional distortion removed by Kwon’s backward warping map.
Regarding claims 6 and 14, dependent on claims 4 and 12:
Zhao teaches that the warping enables training of the warping network without direct flow supervision. (Zhao 1. Introduction “Our cascaded network includes a distortion correction flow network and a completion network … Training our proposed networks requires a large corpus of paired portraits with and without perspective distortion.” 4.2 FlowNet “The FlowNet operates on the normalized input image (512 × 512) A and estimates a correction forward flow F that rectifies the facial distortions. However, due to the immense range of possible perspective distortions, the correction displacement for portraits taken at different distances will exhibit different distributions. Directly predicting such high-dimensional per-pixel displacement is highly underconstrained and often leads to inferior results (Figure 11). To ensure more efficient learning … FlowNet takes A and distance label L as input, and it will predict a forward flow map FAB, which can be used to obtain undistorted output B when applied to A” Note: Because the specification provides no explicit definition of the “direct flow supervision” that is not performed, we default to the common usage of the term. Direct flow supervision, in training a model to predict flows, uses data that specifies the correct individual flow for each pixel of every image in the training set; once a learning model attempts to create its own flows, it is directly supervised and evaluated on each per-pixel flow it creates against that ground-truth flow data. The approach the application specifies to not use direct flow supervision is described in ¶62 “At block 1304, image warping network 502 outputs a backward warping flow map 510 and then backward warping is performed on the input image 102 to generate a backward warped image 504.
For each pixel in the backward warped image 504, a grid-sampled value is retrieved from the input image 102 based on the flow predicted on that pixel location. The backward warping is a surjective mapping, therefore ensuring value assignment to every pixel location in the warped results, … The differentiable nature of backward warping enables training of the backward image warping network 502 without direct flow supervision … image translation network 506 performs translation of the backward warped image 504 to generate an improved and undistorted image 106 of the face. Image translation network 506 processes the backward warped image 504 and creates the reconstructed and undistorted output image 106. Image translation network 506, formulated as a U-Net with skipped connections, takes as input the backward warped image 504 and synthesizes its output to match with the ground truth undistorted image. Formally speaking, the image translation network 506 learns a mapping from the warped image domain to the natural image domain under a conditional GAN objective.” The described learning method here trains a network to create a map which will create an output where each individual pixel originates from a “grid-sampled value”, for example a pixel, in the input distorted image. In other words, each pixel in our distorted image is assigned a flow from the predicted map which will displace the pixels to produce an undistorted image. This method is the mapping approach Zhao describes exactly to undistort its images using an undistortion flow map. The application’s network learns only using an undistorted image and a distorted image to learn how to make a complete map which displaces each pixel, and as described this method is said to not use direct flow supervision. 
It can then be confidently said that Zhao does not rely on direct flow supervision either: Zhao describes an extremely similar method in which, per the first citation, the training of Zhao’s network uses paired portraits with and without distortion, and, per the second citation, the network generates a map that assigns a flow to each pixel individually.)
Zhao does not, however, teach the explicit language of “backward warping.” The use of backward warping to remove distortion is taught in Kwon, which teaches:
The method of claim 4, wherein the backward warping (Kwon ¶33 “The distortion correcting unit 240 corrects the image through backward mapping by using the extracted distortion coefficient. Specifically, in order to correct the radial distortion created in the image, a distortion coefficient is extracted by using a warping equation or a lens distortion model, and the distortion is corrected by image warping by using the extracted distortion coefficient. Image warping is divided into forward mapping and backward mapping … the backward mapping does not generate the holes since coordinates of a distorted image are calculated by using coordinates of a corrected image.”) enables training of the warping network without direct flow supervision.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Kwon’s teachings such that the training of a network without direct flow supervision is facilitated not merely by undistorting an image with a flow map, but specifically by backward warping, which is specified to use a backward warping map to undo distortion.
Several reasons would motivate one to combine these teachings, one of which is further removing distortion to gain higher output image accuracy. Zhao already seeks to undistort an image with a map, so employing a backward warping map, a map that undoes a warp/distortion, would have been obvious.
Regarding claims 7 and 15, dependent on claims 4 and 12:
Zhao teaches that the warped image is refined by an image translation network to generate a final output image that has less distortion than the input image. (Zhao 1 “Our distortion correction flow method encodes a per-pixel displacement, maintaining the original image’s resolution and its high frequency details in the output. However, as near-range portraits often suffer from significant perspective occlusions, flowing individual pixels often does not yield a complete final image. Thus, the completion network inpaints any missing features. A final texture blending step combines the face from the completion network and the warped output from the distortion correction network.” Note: Here Zhao teaches that information lost to distortion may be reconstructed in a step that occurs after the warped image is created. The completion network refines the image by filling in details that are missing or incomplete due to distortion, creating a final output image with less distortion than the input, teaching all aspects of this claim other than the backward warped language.)
Zhao does not, however, teach that the image in question is a backward warped image, i.e., the result of backward warping applied to it.
The presence of backward warped images output by a backward warping process is detailed in Kwon, which teaches the backward warped image that has less distortion than the input image (Kwon ¶33 “The distortion correcting unit 240 corrects the image through backward mapping by using the extracted distortion coefficient. Specifically, in order to correct the radial distortion created in the image, a distortion coefficient is extracted by using a warping equation or a lens distortion model, and the distortion is corrected by image warping by using the extracted distortion coefficient. Image warping is divided into forward mapping and backward mapping … the backward mapping does not generate the holes since coordinates of a distorted image are calculated by using coordinates of a corrected image.” Note: As Kwon teaches that images are corrected through backward warping, the corrected images are backward warped images.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Kwon’s teachings such that the image refined to produce a final, less-distorted output is not merely an image to which an undistortion map has been applied, but specifically a backward warped image.
Several reasons would motivate one to do so, one of which is further removing distortion to gain higher output image accuracy. Zhao already seeks to undistort images and then increase their accuracy using an image translation network, so the idea of starting from an image that has been definitively unwarped, as a backward warped image is, to gain higher accuracy in the final image would have been obvious.
Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Learning Perspective Undistortion of Portraits) in view of Feng (Learning an Animatable Detailed 3D Face Model from In-The-Wild Images) and further in view of Kwon (US 2009/0059041 A1).
Regarding claims 2, 10, and 18:
Zhao teaches that it generates output camera parameters z or din of the face of the input image, wherein din is a camera-to-face distance of the input image. (Zhao 3. Overview “We pre-process the input portraits with background segmentation, scaling, and spatial alignment (see appendix), and then feed them to a camera distance prediction network to estimates camera-to-subject distance” Note: Here we see Zhao explicitly teaches finding output camera parameters like camera-to-face distance. Referred to in Zhao as camera-to-subject distance, we know the subjects are faces as the input images are portraits.)
While Zhao teaches finding camera parameters such as camera-to-face distance, it does not teach the use of a detailed expression capture and animation (DECA) model or the generation of a 3D representation of a face from the input image.
Doing so is taught in Feng, which teaches a perspective-aware detailed expression capture and animation (DECA) model (Feng Abstract “We present the first approach that regresses 3D face shape and animatable details that are specific to an individual but change with expression. Our model, DECA (Detailed Expression Capture and Animation), is trained to robustly produce a UV displacement map from a low-dimensional latent representation that consists of person-specific detail parameters and generic expression parameters,” Feng 7 “DECA uses a weak perspective camera model” Note: Here, in the research paper that debuts DECA, the model is already described as being aware of perspective to an extent.). Feng also teaches that the DECA model generates output camera parameters and 3D representations of the face of the input image (Feng 1.6 “Given a single face image, DECA reconstructs the 3D face shape with mid-frequency geometric details” 3 Preliminaries “Camera model: Photographs in existing in-the-wild face datasets are often taken from a distance. We, therefore, use an orthographic camera model c to project the 3D mesh into image space. Face vertices are projected into the image as v = 𝑠Π(𝑀𝑖 ) + t, … The parameters 𝑠, and t are summarized as 𝒄.” Note: Feng teaches the DECA model and its ability to generate 3D representations, as well as its ability to generate camera parameters, as the parameters s and t that describe the camera are summarized as c.)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Feng such that camera parameters are not merely generated but are generated with a DECA model that also generates a 3D face representation, and a camera-to-face distance is also calculated.
There are several reasons that would motivate one to do so, one of which is increased accuracy of the 3D face models DECA produces. As the camera-to-face distance is an aspect which helps Zhao remove distortion it could also help an extended DECA model do the same. In fact, Zhao is specifically cited as a solution to this problem in Feng 7 “DECA uses a weak perspective camera model. To use DECA to recover head geometry from “selfies”, we would need to extend the method to include the focal length … inferring 3D geometry and focal length from a single image under perspective projection for in-the-wild images is unsolved and likely requires explicit supervision during training (cf. [Zhao et al. 2019]).”
Claims 3, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Learning Perspective Undistortion of Portraits) in view of Kwon (US 2009/0059041 A1), further in view of Feng (Learning an Animatable Detailed 3D Face Model from In-The-Wild Images), further in view of Markhasin (US 2024/0193891 A1), and further in view of Kato (Neural 3D Mesh Renderer).
Regarding claims 3, 11, and 19, dependent on claims 2, 10, and 18:
Zhao and Kwon teach the method of claim 2.
Neither teaches the use of the DECA model, which is taught in Feng. Specifically, Feng teaches that the perspective-aware DECA model (Feng 7 “DECA uses a weak perspective camera model”) includes an image encoder (Feng 4.1 “We first learn a coarse reconstruction (i.e. in FLAME’s model space) in an analysis-by-synthesis way: given a 2D image 𝐼 as input, we encode the image into a latent code” Note: Here DECA is explicitly stated to encode images, teaching the inclusion of an image encoder.)
Feng does not teach that a DECA model uses differentiable rendering however, doing so is taught in Markhasin which teaches that the perspective-aware DECA includes a differentiable renderer (Markhasin ¶142 “An RGB image (upper row, left) is encoded by a pretrained encoder DECA (which is described in more detail for example in [10]) to extract shape {right arrow over (β)}, pose {right arrow over (θ)} and expression {right arrow over (ψ)} coefficents in FLAME's latent space, which are decoded by FLAME mode (which is described in more detail in [30]) to deformed mesh vertices … Rather than learn texturing from ground truth UV textures, we instead leverage large-scale RGB image datasets of faces, which we use in an adversarial formulation through differentiable rendering.” Note: Here we see that Markhasin teaches a DECA model which includes a differentiable renderer to handle the rendering of the 3D generations it produces.)
Markhasin does not, however, teach that its differentiable rendering utilizes perspective projection or calculates gradients of 3D objects to be propagated.
The use of differentiable rendering to do so is taught in Kato, which teaches a differentiable renderer (Kato 1 “The machines, too, can act more intelligently by explicitly modeling the 3D world behind 2D images. The process of generating an image from the 3D world is called rendering. Because this lies on the border between the 3D world and 2D images, it is crucially important in computer vision. In recent years, convolutional neural networks (CNNs) have achieved considerable success in 2D image understanding [7, 13]. Therefore, incorporating rendering into neural networks has a high potential for 3D understanding.” Note: Kato teaches a rendering method involving a network that gains an understanding of 3D properties from observing 2D images; this practice of relating 3D qualities to 2D images in a manner that supports gradient-based learning is differentiable rendering.), wherein the differentiable renderer utilizes a perspective projection, and calculates gradients of 3D objects and allows the gradients of 3D objects to be propagated through images. (Kato 2.3 “Using a differentiable feature extractor and loss function, an image that minimizes the loss can be generated via backpropagation and gradient descent.” Kato 1 “Therefore, to enable back-propagation with rendering, we propose an approximate gradient for rendering peculiar to neural networks, which facilitates end-to-end training of a system including rendering. Our proposed renderer can flow gradients into texture, lighting, and cameras as well as object shapes” Note: Kato teaches that gradients computed on images can be propagated back to 3D quantities such as object shapes, lighting, texture, and cameras. By cameras, Kato refers to a camera’s perception of a 3D object from a certain position to create a 2D image, i.e., the ability to propagate gradients through the perspective projection of the 3D objects.)
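As a minimal mathematical sketch only, and not Kato’s approximate rasterization gradient, the idea of propagating image-space gradients back to a 3D object through a perspective projection can be illustrated with the projection (X, Y, Z) → (fX/Z, fY/Z) and its analytic Jacobian; the function names are hypothetical:

```python
import numpy as np

def project(p, f=1.0):
    """Perspective projection of a 3D point p = (X, Y, Z) onto the
    image plane: (f*X/Z, f*Y/Z)."""
    X, Y, Z = p
    return np.array([f * X / Z, f * Y / Z])

def project_jacobian(p, f=1.0):
    """Analytic Jacobian of the projection with respect to the 3D
    point, the quantity a differentiable renderer backpropagates
    through to flow image-space gradients into 3D coordinates."""
    X, Y, Z = p
    return np.array([[f / Z, 0.0, -f * X / Z**2],
                     [0.0, f / Z, -f * Y / Z**2]])
```

Because this Jacobian exists and is smooth away from Z = 0, a loss defined on the projected 2D image can be differentiated with respect to the 3D point, which is the sense in which gradients of 3D objects are propagated through images.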
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify Feng with Markhasin’s teachings such that the DECA model includes differentiable rendering as its rendering method. It would have similarly been obvious to further modify Markhasin’s teachings such that the DECA model’s differentiable rendering specifically utilizes perspective projection and calculates gradients of 3D objects, allowing them to be propagated through images.
Several reasons would motivate one to do so, one of which is the ability to enhance the 3D generations DECA produces with insights learned from differentiable rendering and its capacity for projection and propagation. Differentiable rendering allows 3D characteristics such as lighting and texture to be learned from a 2D image; by propagating these 3D gradient quantities, DECA could generate a face with more detail. This is especially relevant for DECA because its 3D generations come from 2D images, just as Kato’s 3D characteristics are derived from 2D images.
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Learning Perspective Undistortion of Portraits) in view of Kwon (US 2009/0059041 A1) and further in view of Beeler (High-quality passive facial performance capture using anchor frames).
Regarding claims 8 and 16, dependent on claims 1 and 14:
Zhao teaches further comprising performing video processing by undistorting frames (Zhao Abstract “we predict a distortion correction flow map that encodes a per-pixel displacement that removes distortion artifacts when applied to the input image.” Note: Zhao teaches the ability to undistort frames as frames are images.)
Zhao, however, does not teach the use of anchor frames, or their propagation to reduce computation and provide temporal consistency in video processing, wherein the processing is performed offline.
Doing so is taught in Beeler, which teaches performing offline video processing with anchor frames and then propagating them to additional frames to reduce computation and provide temporal consistency. (Beeler 3 “Stage 2: Anchoring. One frame is manually identified as the reference frame (marked “R” in Fig. 2). Frames with similar image appearance (similar face expression and head orientation) are detected automatically and labelled as anchor frames (marked “A” in Fig. 2 and Fig. 3). Anchor frames will provide a way to partition the complete sequence into clips of frames for the processing in Stage 3.” Beeler 2 “Unlike the marker-based and active light approaches, our method can capture detailed expressive performances in full temporal correspondence, with an entirely passive approach.” Beeler Abstract “We present a new technique for passive and markerless facial performance capture based on anchor frames … Our anchored reconstruction approach also limits tracker drift and robustly handles occlusions and motion blur … offer low computation times” Note: Here Beeler discusses a method in which a 3D model can be generated from a video’s frames; to save computation and ensure temporal consistency, it identifies anchor frames. Anchor frames are frames whose face most closely resembles that of the surrounding frames; information learned about the face in an anchor frame can be shared with similar frames, which Beeler specifically notes improves temporal consistency and reduces computation. As Beeler’s method of processing does not require a live internet connection, it implicitly teaches offline video processing.)
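For illustration only, and not Beeler’s actual appearance metric, the anchoring idea of automatically labelling as anchors those frames similar to a chosen reference frame can be sketched as follows (the similarity measure, a mean absolute pixel difference, and all names are hypothetical):

```python
import numpy as np

def select_anchor_frames(frames, reference, threshold):
    """Return the indices of frames whose appearance is similar to the
    manually chosen reference frame, here measured by mean absolute
    pixel difference falling below a threshold."""
    return [i for i, fr in enumerate(frames)
            if np.abs(fr - reference).mean() < threshold]

# Toy sequence: two frames close to the reference, one dissimilar
ref = np.zeros((2, 2))
frames = [ref, ref + 0.1, ref + 5.0, ref + 0.2]
anchors = select_anchor_frames(frames, ref, threshold=1.0)
```

Work done on the reference frame could then be propagated only to the selected anchor indices rather than recomputed per frame, which mirrors the computation-saving role Beeler assigns to anchor frames.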
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Beeler’s teachings in which images/frames are undistorted and frames are also identified as anchor frames allowing undistortion information to be shared to similar frames, and said processing is performed offline.
Several reasons would motivate one to do so, one of which is overcoming the computation and time cost of undistorting many similar frames in a video. Rather than undistorting each frame individually, identifying anchor frames allows what is learned about the warping to be reused across similar frames, saving time and reducing computation. The offline component Beeler implicitly teaches is also relevant here, as speed could be increased when not limited by the speed of a connection.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao (Learning Perspective Undistortion of Portraits) in view of Kwon (US 2009 0059041 A1) and further in view of Feng (Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network).
Zhao teaches that the warping network accepts information to guide the warping (Zhao 1 Introduction “As the possible range of per-pixel flow values vary by camera distance, we first train a camera distance prediction network, and feed this prediction along with the input portrait to the distortion correction network.” Note: Here we see Zhao’s distortion correction network, which performs the warping, accepts distance information to help guide its undistortion, a similar process to backwards warping)
Zhao does not, however, use the explicit language of “backwards warping”. Doing so is taught in Kwon, which teaches accepting information to guide the backward warping (Kwon ¶33 “The distortion correcting unit 240 corrects the image through backward mapping by using the extracted distortion coefficient. Specifically, in order to correct the radial distortion created in the image, a distortion coefficient is extracted by using a warping equation or a lens distortion model, and the distortion is corrected by image warping by using the extracted distortion coefficient. Image warping is divided into forward mapping and backward mapping … the backward mapping does not generate the holes since coordinates of a distorted image are calculated by using coordinates of a corrected image.” Note: Kwon specifies that when it accepts distorted images, this implicitly comes with the information or assumed context that the image has not been distorted randomly but specifically by lens distortion. This allows the backwards warping to be guided by knowledge of a lens distortion model, as stated above.)
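The distinction Kwon draws between forward and backward mapping can be illustrated with a short sketch. This is the examiner’s hypothetical illustration only: the function name, the per-pixel flow representation, and the nearest-neighbor sampling with clamping are assumptions and are not Kwon’s disclosed implementation; the sketch shows only why backward mapping leaves no holes.

```python
import numpy as np

def backward_warp(distorted, correction_flow):
    """Backward mapping: for every pixel (y, x) of the *corrected* output,
    look up a source coordinate in the distorted input. Because every
    output pixel is assigned a value, no holes appear (unlike forward
    mapping, which can leave output pixels unassigned)."""
    h, w = distorted.shape
    out = np.zeros_like(distorted)
    for y in range(h):
        for x in range(w):
            # The flow map encodes a per-pixel displacement back
            # into the input image (cf. Zhao's correction flow).
            sy = int(round(y + correction_flow[y, x, 0]))
            sx = int(round(x + correction_flow[y, x, 1]))
            sy = min(max(sy, 0), h - 1)  # clamp to image bounds
            sx = min(max(sx, 0), w - 1)
            out[y, x] = distorted[sy, sx]
    return out
```

With a zero flow map this sketch returns the input unchanged; a nonzero map resamples each output pixel from its displaced source location, which is the guided-by-information behavior relied on in the rejection.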
Zhao and Kwon do not teach the ability to use information of a warped face parsing map, a 2D projection of a 3D face, or a previous frame result.
The use of a 2D projection of a 3D face for making a map is taught in Feng, satisfying the limitation that the information is selected from the group of: a warped face parsing map, a 2D projection of a 3D face, or a previous frame result. (Feng Introduction “In this paper, we propose an end-to-end method called Position map Regression Network (PRN) to jointly predict dense alignment and reconstruct 3D face shape. … these are achieved by the elaborate design of the 2D representation of 3D facial structure and the corresponding loss function. Specifically, we design a UV position map, which is a 2D image recording the 3D coordinates of a complete facial point cloud” Note: Here we see Feng teaches that a 2D image recording, or projection, of a 3D face can be constructed. As the network develops a position map to map, or warp, coordinates, this also teaches that the information is used to guide warping.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Zhao with Kwon, such that the warping network that is guided in its warping by information specifically performs a backwards warping method, which is similarly guided by information. It would further have been obvious to modify these teachings with Feng, such that the information which guides the warping is information of a 2D projection or representation of a 3D face.
There are several benefits of doing so. One is that useful information present in a 3D face structure, such as depth or 3D lighting, can be considered when undistorting and applying the backward warping map without having to store and handle a full 3D structure; only the smaller 2D representation needs to be handled.
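The UV position map Feng describes, a 2D image whose channels record 3D coordinates, can be illustrated with a short sketch. This is the examiner’s hypothetical illustration only: the function name, the nearest-pixel rasterization, and the map size are assumptions; Feng’s PRN regresses such a map with a network rather than rasterizing it directly.

```python
import numpy as np

def make_position_map(vertices, uv_coords, size=4):
    """Store a 3D facial point cloud in a UV position map: a 2D image
    whose three channels record the (x, y, z) coordinate of the surface
    point assigned to each UV location (cf. Feng's "2D image recording
    the 3D coordinates of a complete facial point cloud")."""
    pos_map = np.zeros((size, size, 3))
    for vertex, (u, v) in zip(vertices, uv_coords):
        col = min(int(u * size), size - 1)  # nearest UV pixel
        row = min(int(v * size), size - 1)
        pos_map[row, col] = vertex  # channels hold the 3D coordinate
    return pos_map
```

The resulting map is an ordinary 2D image, which illustrates the stated benefit: depth information (the z channel) is retained and can guide warping while only a 2D representation is stored and handled.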
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALAN GREGORY HAKALA whose telephone number is (571)272-7863. The examiner can normally be reached 8:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ALAN GREGORY HAKALA/Examiner, Art Unit 2617 /KING Y POON/Supervisory Patent Examiner, Art Unit 2617