DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on January 2, 2026 has been entered.
Response to Amendment
The amendment filed January 2, 2026 has been entered. Claims 1-8, 12, 18-21, and 23-24 remain pending in the application. Applicant’s amendments to the Claims have overcome each and every objection previously set forth in the Final Office Action mailed October 29, 2025.
Response to Arguments
Applicant’s arguments, see Pages 6-7 of Response to Final Office Action, filed January 2, 2026, with respect to the rejection(s) of claim(s) 1-8, 12, 18-21, and 23-24 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of Adachi et al. (JP 2000172829 A).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 12, 18-21, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Black et al. (US 10529137 B1) in view of Martin et al. (US 20240005590 A1), Mildenhall et al. (NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis), and Adachi et al. (JP 2000172829 A), hereinafter Black, Martin, Mildenhall, and Adachi respectively.
Regarding claim 1, Black teaches an image deformation apparatus, the apparatus comprising one or more processors and a memory storing in non-transient form data defining program code executable by the one or more processors (Col. 5 lines 53-60, Col. 15 lines 56-64 – “The image augmentation system 205 includes at least one memory 220 and one or more processing units (or processor(s)) 242… When the process 300 is initiated, a set of executable program instructions stored on one or more non-transitory computer-readable media (e.g., hard drive, flash memory, removable media, etc.) may be loaded into memory (e.g., random access memory or “RAM”) of a server or other computing device of the computing environment 200. The executable instructions may then be executed by a hardware-based computer processor (e.g., a central processing unit or “CPU”) of the computing device”; Note: the image augmentation system is equivalent to the image deformation apparatus), the apparatus being configured to:
receive an input image (Col. 16 lines 1-4 – “the image augmentation system 205 receives an input image depicting a human in a certain pose. The input image may also depict various foreground and background objects”);
extract arrangement parameters of a feature from the input image, each arrangement parameter defining a location of a point of the feature (Col. 16 lines 5-10 and 33-37 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels… the pose can be defined using a 2D array of joint vertices identified in the image data, and the pose-to-model-pose comparison can involve comparison of these joint locations and/or of segments connecting these joints”; Note: the pose identification model extracts the location of the joints of the human, which is equivalent to the arrangement parameters. The body of the human is equivalent to the feature);
extract appearance parameters of the feature from the input image, each appearance parameter defining appearance information of a point of the feature (Col. 16 lines 46-49 and 54-57 – “the image augmentation system 205 inputs the image data into a second machine learning model, for example, shape identification model 225B, trained to identify the shape of the human body…The shape detection model 225B can also generate a texture map representing the skin, hair, and clothing of the human and/or a displacement map representing the actual contours of the human body surfaces”; Note: the shape detection model extracts the texture map of the human, which is equivalent to the appearance parameter);
generate deformed arrangement parameters by modifying the location of at least one point of the feature (Col. 17 lines 7-16 – “the image augmentation rules can specify a model pose representing a model of correct body form/posture for the identified pose. The rules can also specify how the identified pose should be compared to the model pose, and further how the image representation of the human body is to be morphed based on comparison of the identified pose with the model pose. The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose”; Note: the location of the human is modified based on the model pose, which results in deformed arrangement parameters);
and render an output image comprising a deformed feature corresponding to the feature in dependence on the deformed arrangement parameters and the appearance parameters (Col. 17 lines 19-28 and 38-39 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules…an underlying skeletal structure of the body model in a rest pose can be aligned into the new pose. Blend skinning can be applied to move the ‘skin vertices’—the vertices of the model of the exterior surfaces of the body—according to the movement of the underlying skeletal structure to which it was attached in the rest pose. The rendering engine 226 can then apply the texture map and/or displacement map generated at block 320 to the morphed body model in order to recreate the particular appearance of the depicted human in the new pose…the image augmentation system 205 outputs the augmented image for display to the user”; Note: the rendering engine produces an output image that corresponds to a new pose, which is the deformed feature, and corresponds to the same appearance as the input),
wherein the one or more processors are configured to (Col. 15 lines 55-64 – “When the process 300 is initiated, a set of executable program instructions stored on one or more non-transitory computer-readable media (e.g., hard drive, flash memory, removable media, etc.) may be loaded into memory (e.g., random access memory or “RAM”) of a server or other computing device of the computing environment 200. The executable instructions may then be executed by a hardware-based computer processor (e.g., a central processing unit or “CPU”) of the computing device”; Note: the processor is configured to execute the process) repeat the steps for at least one subsequent input image to render a corresponding subsequent output image (Col. 15 lines 37-46 and 52-56, Col. 17 lines 38-43 – “The rendering engine 226 takes the pose identification 227A, body shape 227B, and scene elements 227C as inputs together with rules 237 from the image augmentation rules data repository 236. In one example, the rules 237 can specify a model pose representing a model of correct body form/posture for the identified pose, and can specify how the image representation of the human body is to be morphed based on comparison of the pose of the pose identification 227A with the model pose. Based on these inputs, the rendering engine 226 outputs the augmented image 228…The process 300 may begin in response to an event, such as on a predetermined or dynamically determined schedule, on demand when initiated by a user, or in response to some other event…the image augmentation system 205 outputs the augmented image for display to the user. This can serve to provide visual form correction feedback to the user, which can beneficially assist the user in athletic training, physical therapy, or other endeavors that involve precision with body posture”; Note: the process can be repeated multiple times depending on the schedule or as long as the user continues to initiate it. 
Additionally, a subsequent input image would result in a subsequent output image); and render a 3D output image (Col. 17 lines 13-18 and 35-39 – “The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image…the image augmentation system 205 outputs the augmented image for display to the user”; Note: a 3D representation is rendered and outputted).
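As context for the blend-skinning step quoted above from Black, the following is a minimal illustrative sketch of linear blend skinning (not code from any cited reference; the function and data are hypothetical): each skin vertex is moved by a weighted blend of the rigid transforms of the joints to which it is attached in the rest pose.

```python
import numpy as np

def blend_skinning(rest_vertices, weights, joint_transforms):
    """Linear blend skinning: move each skin vertex by a weighted
    combination of its joints' rigid 4x4 transforms."""
    # Homogeneous rest-pose vertices, shape (V, 4).
    v_h = np.hstack([rest_vertices, np.ones((len(rest_vertices), 1))])
    # Per-vertex blended transform: weights (V, J) against transforms (J, 4, 4).
    blended = np.einsum('vj,jrc->vrc', weights, joint_transforms)
    posed = np.einsum('vrc,vc->vr', blended, v_h)
    return posed[:, :3]

# Two vertices, each fully bound to one joint; joint 1 translates by +1 in x.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.0, 1.0]])
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
posed = blend_skinning(rest, w, np.stack([T0, T1]))
```

In this toy example the vertex bound to the translated joint follows that joint, while the other vertex stays in place, which is the behavior Black relies on to repose the body model before reapplying the texture map.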
Black does not teach implementing an image deformation model; nor that the output image and the subsequent output image are 2D images and comprise the same deformed feature from different viewpoints. However, Martin teaches implementing an image deformation model (Paragraph 0004 – “generating a deformation model based on the image data, the deformation model describing movements made by the non-rigidly deforming object while the image data was generated”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Martin to implement an image deformation model because using an AI model to perform the process would make it more efficient. Additionally, Martin teaches that the output image and the subsequent output image are 2D images and comprise the same deformed feature from different viewpoints (Fig. 4A and 4B, Paragraph 0016 and 0069 – “To render this five-dimensional function, or NeRF, one can: 1) march camera rays through the scene to generate a sampled set of 3D points, 2) use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities, and 3) use classical volume rendering techniques to accumulate those colors and densities into a 2D image…The effect of the deformation field on a subject is illustrated in FIGS. 4A, 4B, and 4C. FIG. 4A is a diagram that illustrates an example pose 400 of a human subject in an observation frame. FIG. 4B is a diagram that illustrates an example pose 450 of a human subject in a canonical frame. In both FIGS. 4A and 4B, the subjects are shown with insets showing orthographic views in the forward and left directions. In FIG. 4A (the observation frame), note the right-to-left and front-to-back displacements between the observation and canonical model, which are modeled by the deformation field for this observation”; Note: Fig. 
4A and 4B show 2D output images of the same deformed feature, a person, from different viewpoints. The subsequent output image was previously taught by Black). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Martin to have the output images be 2D and comprise the same deformed feature from different viewpoints for the benefit of enhancing the user experience by allowing the user to see the feature from multiple angles. For instance, in the invention of Black, the output is used to show correct yoga poses. Being able to see the poses from different angles would help the user better understand and replicate the form.
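The three rendering steps Martin quotes from the NeRF formulation (march camera rays through the scene to sample 3D points, query the network for colors and densities, and volume-render those samples into a 2D image) can be illustrated for a single ray as follows. This is a simplified sketch with a toy stand-in for the neural field, not an implementation from any cited reference; all names are hypothetical.

```python
import numpy as np

def render_ray(origin, direction, field, t_near=0.0, t_far=4.0, n=64):
    """Volume-render one camera ray: sample 3D points along the ray,
    query a radiance field for (color, density), and alpha-composite
    the samples into a single RGB pixel."""
    ts = np.linspace(t_near, t_far, n)
    pts = origin + ts[:, None] * direction            # (n, 3) sample points
    rgb, sigma = field(pts, direction)                # colors (n, 3), densities (n,)
    delta = np.diff(ts, append=ts[-1] + (ts[1] - ts[0]))
    alpha = 1.0 - np.exp(-sigma * delta)              # per-sample opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alpha))[:-1]  # accumulated transmittance
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)       # composited pixel color

# Toy stand-in for the learned field: a solid red unit ball at the origin.
def toy_field(pts, d):
    inside = (np.linalg.norm(pts, axis=-1) < 1.0).astype(float)
    return np.tile([1.0, 0.0, 0.0], (len(pts), 1)), 50.0 * inside

pixel = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]), toy_field)
```

Rendering one such ray per pixel from two different camera poses yields the two 2D views of the same deformed feature discussed above.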
[Image: media_image1.png, 354×250, greyscale]
Screenshot of Fig. 4A and 4B (taken from Martin)
Black modified by Martin still does not teach rendering a 3D output image from the at least two output images. However, Mildenhall teaches rendering a 3D output image from the at least two output images (Fig. 1, Paragraph 1 on Page 2 – “We present a method that optimizes a continuous 5D neural radiance field representation (volume density and view-dependent color at any continuous location) of a scene from a set of input images. We use techniques from volume rendering to accumulate samples of this scene representation along rays to render the scene from any viewpoint. Here, we visualize the set of 100 input views of the synthetic Drums scene randomly captured on a surrounding hemisphere, and we show two novel views rendered from our optimized NeRF representation”; Note: a rendered image from a 3D scene is generated from multiple 2D images of the drums; see screenshot of Fig. 1 below).
[Image: media_image2.png, 358×1434, greyscale]
Screenshot of Fig. 1 (taken from Mildenhall)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Mildenhall to render a 3D image from multiple other images for the benefit of view synthesis, which is not only common in the art, but allows for realistic and detailed 3D renderings.
Finally, Black modified by Martin and Mildenhall still does not teach wherein the appearance parameters comprise a density of pixels for each of a plurality of points of the feature, wherein a first point of the feature has a different pixel density than a second point of the feature. However, Adachi teaches wherein the appearance parameters comprise a density of pixels for each of a plurality of points of the feature, wherein a first point of the feature has a different pixel density than a second point of the feature (Paragraph 0008, 0023-0024 – “the feature is the minimum and maximum density values of the surrounding pixels including the specified position and the average or maximum error of the density difference between the pixels adjacent to the specified position, and these feature are used to determine the boundary points of the region of interest…Each pixel has a density value G. For example, if the input data is 12 bits, the density value G has a gray scale value of 2^12 gradations. The input data may be either 10 bits or 8 bits. Therefore, if the densities of the 81 dots of pixels in the initial region 123 are G(j+4,k+4), ..., G(j+4,k-4), ..., G(j-4,k+4), ..., G(j-4,k-4), by calculating the maximum value Gmx and minimum value Gmn of these density values, these values indicate the trend of the entire initial region 123 and therefore become initial reference values for determining global changes within the initial region 123”; Note: the region of interest is equivalent to the feature, and the pixel densities are the appearance parameters. The point containing the maximum density value is a first point, which has a different pixel density than a second point containing the minimum density value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Adachi to have the appearance parameters comprise pixel densities for points of a feature for the benefit of being able to maintain the feature’s brightness/darkness even when the position of the feature changes. This will help create a natural and realistic appearance for the feature since pixel density can greatly affect how a feature looks. Additionally, while the method of Adachi pertains to biological imaging, it can be applied to any kind of digital image, as all digital images contain pixels and pixel densities. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Adachi to have the points of the features have different pixel densities because, as a practical matter, a feature in an image is unlikely to have a uniform pixel density throughout. Lighting typically renders some parts of an image darker or lighter than others, so at least one point will commonly have a pixel density different from that of another point.
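The Gmx/Gmn computation Adachi describes over the 81-pixel (9×9) initial region centred on a specified position can be sketched as follows. This is an illustrative example only, with a hypothetical image; it is not code from the reference.

```python
import numpy as np

def region_density_stats(image, row, col, half=4):
    """Scan the 9x9 (81-pixel) initial region centred on a specified
    position and return the maximum and minimum grey-level density
    values (Gmx, Gmn) of that region."""
    region = image[row - half:row + half + 1, col - half:col + half + 1]
    return int(region.max()), int(region.min())

# Hypothetical 12-bit image (densities up to 2^12 - 1) with a bright
# spot at the specified position.
img = np.full((32, 32), 100, dtype=np.uint16)
img[16, 16] = 4000
gmx, gmn = region_density_stats(img, 16, 16)
```

Here Gmx and Gmn differ, corresponding to a first point of the feature having a different pixel density than a second point.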
Regarding claim 2, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black further teaches wherein the one or more processors are configured to (Col. 15 lines 45-47 and 55-64 – “the rendering engine 226 outputs the augmented image 228…When the process 300 is initiated, a set of executable program instructions stored on one or more non-transitory computer-readable media (e.g., hard drive, flash memory, removable media, etc.) may be loaded into memory (e.g., random access memory or “RAM”) of a server or other computing device of the computing environment 200. The executable instructions may then be executed by a hardware-based computer processor (e.g., a central processing unit or “CPU”) of the computing device”; Note: the processor is configured to execute the process) render the output image (Col. 17 lines 13-18 and 35-39 – “The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image…the image augmentation system 205 outputs the augmented image for display to the user”; Note: a 3D representation is rendered and outputted), the location of the deformed feature being defined by the deformed arrangement parameters and the appearance of the pixels being defined by the appearance parameters (Col. 17 lines 19-28 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules…an underlying skeletal structure of the body model in a rest pose can be aligned into the new pose. 
Blend skinning can be applied to move the ‘skin vertices’—the vertices of the model of the exterior surfaces of the body—according to the movement of the underlying skeletal structure to which it was attached in the rest pose. The rendering engine 226 can then apply the texture map and/or displacement map generated at block 320 to the morphed body model in order to recreate the particular appearance of the depicted human in the new pose”; Note: the rendering engine produces an output image that corresponds to a new pose, which is the deformed feature, and corresponds to the same appearance as the input. Having a new pose means that the location of part of the body is modified. The appearance of the pixels is defined by the texture map, which can be considered the appearance parameters). Black does not teach rendering the output image by casting rays from pixels of the output image. However, Mildenhall teaches rendering the output image by casting rays from pixels of the output image (Paragraph 2 on Page 6 – “The function T(t) denotes the accumulated transmittance along the ray from tn to t, i.e., the probability that the ray travels from tn to t without hitting any other particle. Rendering a view from our continuous neural radiance field requires estimating this integral C(r) for a camera ray traced through each pixel of the desired virtual camera”; Note: rays are cast from pixels to render the image). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Mildenhall to render the output image by casting rays from the pixels of the image because ray-tracing is a common rendering technique in the art, and it would assist in providing realistic images, especially in terms of lighting. Using ray-tracing may provide a better visual experience for the user.
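The integral quoted above from Mildenhall is the standard volume-rendering equation of the paper; in that notation, the expected color of the camera ray r(t) = o + td traced through a pixel is:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
```

Estimating C(r) for a ray traced through each pixel of the desired virtual camera yields the rendered view.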
Regarding claim 3, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black further teaches wherein the one or more processors are configured to (Col. 15 lines 55-64 – “When the process 300 is initiated, a set of executable program instructions stored on one or more non-transitory computer-readable media (e.g., hard drive, flash memory, removable media, etc.) may be loaded into memory (e.g., random access memory or “RAM”) of a server or other computing device of the computing environment 200. The executable instructions may then be executed by a hardware-based computer processor (e.g., a central processing unit or “CPU”) of the computing device”; Note: the processor is configured to execute the process) generate further deformed arrangement parameters by further modifying the location of at least one point of the feature (Col. 18 lines 16-20 – “FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: further deformation occurs, and as shown in the screenshot of Fig. 4E below, the location of the different parts of the body were modified); and render a further output image comprising a further deformed feature corresponding to the feature in dependence on the further deformed arrangement parameters and the appearance parameters (Col. 17 lines 4-16 and 38-39, Col. 18 lines 16-20 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules. For example, the image augmentation rules can specify a model pose representing a model of correct body form/posture for the identified pose. 
The rules can also specify how the identified pose should be compared to the model pose, and further how the image representation of the human body is to be morphed based on comparison of the identified pose with the model pose. The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose… the image augmentation system 205 outputs the augmented image for display to the user…FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: a further output image is generated, as shown in Fig. 4E, where the appearance is maintained by the texture map, and the arrangement parameters are deformed to create different poses).
[Image: media_image3.png, 714×565, greyscale]
Screenshot of Fig. 4E (taken from Black)
Regarding claim 4, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black further teaches wherein the one or more processors are configured to (Col. 15 lines 45-47 and 55-64 – “the rendering engine 226 outputs the augmented image 228…When the process 300 is initiated, a set of executable program instructions stored on one or more non-transitory computer-readable media (e.g., hard drive, flash memory, removable media, etc.) may be loaded into memory (e.g., random access memory or “RAM”) of a server or other computing device of the computing environment 200. The executable instructions may then be executed by a hardware-based computer processor (e.g., a central processing unit or “CPU”) of the computing device”; Note: the processor is configured to execute the process) render the further output image (Col. 17 lines 13-18 and 35-39, Col. 18 lines 16-20 – “The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image…the image augmentation system 205 outputs the augmented image for display to the user…FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: a 3D representation is rendered and outputted. Additionally, see screenshot of Fig. 4E above, which shows further output images), the location of the further deformed feature being defined by the further deformed arrangement parameters (Col. 18 lines 16-20 – “FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. 
This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: further deformation occurs, and as shown in Fig. 4E above, the location of the different parts of the body were modified) and the appearance of the pixels being defined by the appearance parameters (Col. 17 lines 19-28 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules…an underlying skeletal structure of the body model in a rest pose can be aligned into the new pose. Blend skinning can be applied to move the ‘skin vertices’—the vertices of the model of the exterior surfaces of the body—according to the movement of the underlying skeletal structure to which it was attached in the rest pose. The rendering engine 226 can then apply the texture map and/or displacement map generated at block 320 to the morphed body model in order to recreate the particular appearance of the depicted human in the new pose”; Note: the rendering engine produces an output image that corresponds to a new pose, which is the deformed feature, and corresponds to the same appearance as the input. Having a new pose means that the location of part of the body is modified. The appearance of the pixels is defined by the texture map, which can be considered the appearance parameters). Black does not teach rendering the further output image by casting rays from pixels of the further output image. However, Mildenhall teaches rendering the output image by casting rays from pixels of the output image (Paragraph 2 on Page 6 – “The function T(t) denotes the accumulated transmittance along the ray from tn to t, i.e., the probability that the ray travels from tn to t without hitting any other particle. 
Rendering a view from our continuous neural radiance field requires estimating this integral C(r) for a camera ray traced through each pixel of the desired virtual camera”; Note: rays are cast from pixels to render the image). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Mildenhall to render the output image by casting rays from the pixels of the image because ray-tracing is a common rendering technique in the art, and it would assist in providing realistic images, especially in terms of lighting. Using ray-tracing may provide a better visual experience for the user.
Regarding claim 5, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black further teaches wherein the feature comprises a human or animal body (Col. 16 lines 1-2 and 5-10 – “the image augmentation system 205 receives an input image depicting a human in a certain pose… the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels”; Note: the feature is a human body).
Regarding claim 6, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black further teaches wherein the arrangement parameters are indicative of a pose of the feature (Col. 16 lines 5-13 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels. For example, the pose identification model 225A can be trained to identify a set of yoga poses, body poses during a batting swing or golf swing, physical therapy poses, or other sets of poses”; Note: the arrangement parameters indicate a pose of the human body).
Regarding claim 7, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black does not directly teach wherein the arrangement parameters are indicative of a shape of the feature. However, Black separately teaches arrangement parameters (Col. 16 lines 5-10 and 33-37 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels… the pose can be defined using a 2D array of joint vertices identified in the image data, and the pose-to-model-pose comparison can involve comparison of these joint locations and/or of segments connecting these joints”; Note: the pose identification model extracts the location of the joints of the human, which is equivalent to the arrangement parameters) and a shape of the feature (Col. 19 lines 21-26 – “The body model is defined as a function M(β, θ, γ), parameterized by shape β, pose θ, and translation γ. The output of the function is a triangulated surface, M, with 6890 vertices in this example. Shape parameters are coefficients of a low-dimensional shape space, learned from a training set of thousands of registered scans”; Note: there are parameters indicative of the shape of the body). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Black to have the shape be an arrangement parameter because it would allow for the shape of the subject to be modified for instances like changing facial expressions or changing height. This would be beneficial for use in different fields like character customization, where a user may prefer to have their character appear like themselves.
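The shape parameterization Black describes, a body model M(β, θ, γ) whose shape parameters β are coefficients of a learned low-dimensional shape space, can be illustrated with a minimal sketch (hypothetical template mesh and shape basis, not data from the reference): the shaped mesh is the template plus a linear combination of learned shape directions weighted by β.

```python
import numpy as np

def shaped_vertices(template, shape_basis, beta):
    """Apply low-dimensional shape coefficients: vertices are a template
    mesh plus a beta-weighted linear combination of shape directions."""
    # template: (V, 3); shape_basis: (K, V, 3); beta: (K,)
    return template + np.tensordot(beta, shape_basis, axes=1)

# Hypothetical two-vertex mesh with one "taller" shape direction that
# moves vertex 1 upward.
template = np.zeros((2, 3))
taller = np.array([[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
verts = shaped_vertices(template, taller, np.array([0.5]))
```

Varying β in such a space is what would allow the shape of the subject to be modified as an arrangement parameter, per the combination rationale above.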
Regarding claim 8, Black in view of Martin, Mildenhall, and Adachi teaches the image deformation apparatus of claim 1. Black does not directly teach wherein the appearance parameters comprise a colour of pixels of the feature. However, Black separately teaches appearance parameters (Col. 16 lines 46-49 and 54-57 – “the image augmentation system 205 inputs the image data into a second machine learning model, for example, shape identification model 225B, trained to identify the shape of the human body…The shape detection model 225B can also generate a texture map representing the skin, hair, and clothing of the human and/or a displacement map representing the actual contours of the human body surfaces”; Note: the shape detection model extracts the texture map of the human, which is equivalent to the appearance parameter) and color of pixels (Col. 6 lines 35-47 – “The pose detection model can utilize methods including fine-grained body-aware image analysis such as per-pixel image segmentation, depth estimation, lighting estimation and color/texture extraction to understand lighting conditions, details about the body and any clothing, occlusion (body parts and objects that are hidden by other objects), and to detect planes (surfaces, ceilings, walls, floors, etc.)”; Note: color is extracted). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Black to have the pixel color be an appearance parameter because the color of the pixels is an important part of the appearance of an image. Therefore, maintaining the pixel color would help ensure that the appearance of the subject of an image remains similar or the same to the original.
Regarding claim 12, Black teaches a method for deforming an image (Col. 15 lines 48-51, Col. 17 lines 35-37 – “FIG. 3 is a flow diagram of an illustrative machine learning process 300 for generating augmented images using the computing environment 200 of FIG. 2A or another suitable computing system…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image”; Note: Fig. 3 shows a method, which involves morphing an image of a person, which is equivalent to deforming the image), the method comprising:
receiving an input image (Col. 16 lines 1-4 – “the image augmentation system 205 receives an input image depicting a human in a certain pose. The input image may also depict various foreground and background objects”);
extracting arrangement parameters of a feature from the input image, each arrangement parameter defining a location of a point of the feature (Col. 16 lines 5-10 and 33-37 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels… the pose can be defined using a 2D array of joint vertices identified in the image data, and the pose-to-model-pose comparison can involve comparison of these joint locations and/or of segments connecting these joints”; Note: the pose identification model extracts the location of the joints of the human, which is equivalent to the arrangement parameters. The body of the human is equivalent to the feature);
extracting appearance parameters of the feature from the input image, each appearance parameter defining appearance information of a point of the feature (Col. 16 lines 46-49 and 54-57 – “the image augmentation system 205 inputs the image data into a second machine learning model, for example, shape identification model 225B, trained to identify the shape of the human body…The shape detection model 225B can also generate a texture map representing the skin, hair, and clothing of the human and/or a displacement map representing the actual contours of the human body surfaces”; Note: the shape detection model extracts the texture map of the human, which is equivalent to the appearance parameter);
generating deformed arrangement parameters by modifying the location of at least one point of the feature (Col. 17 lines 7-16 – “the image augmentation rules can specify a model pose representing a model of correct body form/posture for the identified pose. The rules can also specify how the identified pose should be compared to the model pose, and further how the image representation of the human body is to be morphed based on comparison of the identified pose with the model pose. The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose”; Note: the location of the human is modified based on the model pose, which results in deformed arrangement parameters);
and rendering an output image comprising a deformed feature corresponding to the feature in dependence on the deformed arrangement parameters and the appearance parameters (Col. 17 lines 19-28 and 38-39 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules…an underlying skeletal structure of the body model in a rest pose can be aligned into the new pose. Blend skinning can be applied to move the ‘skin vertices’—the vertices of the model of the exterior surfaces of the body—according to the movement of the underlying skeletal structure to which it was attached in the rest pose. The rendering engine 226 can then apply the texture map and/or displacement map generated at block 320 to the morphed body model in order to recreate the particular appearance of the depicted human in the new pose…the image augmentation system 205 outputs the augmented image for display to the user”; Note: the rendering engine produces an output image that corresponds to a new pose, which is the deformed feature, and corresponds to the same appearance as the input),
repeat the steps for at least one subsequent input image to render a corresponding subsequent output image (Col. 15 lines 37-46 and 52-56, Col. 17 lines 38-43 – “The rendering engine 226 takes the pose identification 227A, body shape 227B, and scene elements 227C as inputs together with rules 237 from the image augmentation rules data repository 236. In one example, the rules 237 can specify a model pose representing a model of correct body form/posture for the identified pose, and can specify how the image representation of the human body is to be morphed based on comparison of the pose of the pose identification 227A with the model pose. Based on these inputs, the rendering engine 226 outputs the augmented image 228…The process 300 may begin in response to an event, such as on a predetermined or dynamically determined schedule, on demand when initiated by a user, or in response to some other event…the image augmentation system 205 outputs the augmented image for display to the user. This can serve to provide visual form correction feedback to the user, which can beneficially assist the user in athletic training, physical therapy, or other endeavors that involve precision with body posture”; Note: the process can be repeated multiple times depending on the schedule or as long as the user continues to initiate it. Additionally, a subsequent input image would result in a subsequent output image); and render a 3D output image (Col. 17 lines 13-18 and 35-39 – “The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image…the image augmentation system 205 outputs the augmented image for display to the user”; Note: a 3D representation is rendered and outputted).
Black does not teach implementing an image deformation model; nor that the output image and the subsequent output image are 2D images and comprise the same deformed feature from different viewpoints. However, Martin teaches implementing an image deformation model (Paragraph 0004 – “generating a deformation model based on the image data, the deformation model describing movements made by the non-rigidly deforming object while the image data was generated”). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Martin to implement an image deformation model because using an AI model to perform the process would make it more efficient. Additionally, Martin teaches that the output image and the subsequent output image are 2D images and comprise the same deformed feature from different viewpoints (Fig. 4A and 4B, Paragraph 0016 and 0069 – “To render this five-dimensional function, or NeRF, one can: 1) march camera rays through the scene to generate a sampled set of 3D points, 2) use those points and their corresponding 2D viewing directions as input to the neural network to produce an output set of colors and densities, and 3) use classical volume rendering techniques to accumulate those colors and densities into a 2D image…The effect of the deformation field on a subject is illustrated in FIGS. 4A, 4B, and 4C. FIG. 4A is a diagram that illustrates an example pose 400 of a human subject in an observation frame. FIG. 4B is a diagram that illustrates an example pose 450 of a human subject in a canonical frame. In both FIGS. 4A and 4B, the subjects are shown with insets showing orthographic views in the forward and left directions. In FIG. 4A (the observation frame), note the right-to-left and front-to-back displacements between the observation and canonical model, which are modeled by the deformation field for this observation”; Note: Fig. 
4A and 4B show 2D output images of the same deformed feature, a person, from different viewpoints. The subsequent output image was previously taught by Black). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Martin to have the output images be 2D and comprise the same deformed feature from different viewpoints for the benefit of enhancing the user experience by allowing the user to see the feature from multiple angles. For instance, in the invention of Black, the output is used to show correct yoga poses. Being able to see the poses from different angles would help the user understand and copy the form better.
Black modified by Martin still does not teach rendering a 3D output image from the at least two output images. However, Mildenhall teaches rendering a 3D output image from the at least two output images (Fig. 1, Paragraph 1 on Page 2 – “We present a method that optimizes a continuous 5D neural radiance field representation (volume density and view-dependent color at any continuous location) of a scene from a set of input images. We use techniques from volume rendering to accumulate samples of this scene representation along rays to render the scene from any viewpoint. Here, we visualize the set of 100 input views of the synthetic Drums scene randomly captured on a surrounding hemisphere, and we show two novel views rendered from our optimized NeRF representation”; Note: a rendered image from a 3D scene is generated from multiple 2D images of the drums; see screenshot of Fig. 1 above). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Mildenhall to render a 3D image from multiple other images for the benefit of view synthesis, which is not only common in the art, but allows for realistic and detailed 3D renderings.
Finally, Black modified by Martin and Mildenhall still does not teach wherein the appearance parameters comprise a density of pixels for each of a plurality of points of the feature, wherein a first point of the feature has a different pixel density than a second point of the feature. However, Adachi teaches wherein the appearance parameters comprise a density of pixels for each of a plurality of points of the feature, wherein a first point of the feature has a different pixel density than a second point of the feature (Paragraph 0008, 0023-0024 – “the feature is the minimum and maximum density values of the surrounding pixels including the specified position and the average or maximum error of the density difference between the pixels adjacent to the specified position, and these feature are used to determine the boundary points of the region of interest…Each pixel has a density value G. For example, if the input data is 12 bits, the density value G has a gray scale value of 2@12 gradations. The input data may be either 10 bits or 8 bits. Therefore, if the densities of the 81 dots of pixels in the initial region 123 are G(j+4,k+4), ..., G(j+4,k-4), ..., G(j-4,k+4), ..., G(j-4,k-4), by calculating the maximum value Gmx and minimum value Gmn of these density values, these values indicate the trend of the entire initial region 123 and therefore become initial reference values for determining global changes within the initial region 123”; Note: the region of interest is equivalent to the feature, and the densities of the pixels are the appearance parameters. The point containing the maximum density value is a first point, which has a different pixel density than a second point containing the minimum density value). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Adachi to have the appearance parameters comprise pixel densities for points of a feature for the benefit of being able to maintain the feature’s brightness/darkness even when the position of the feature changes. This would help create a natural and realistic appearance for the feature since pixel density can greatly affect how a feature looks. Additionally, while the method of Adachi pertains to biological imaging, it can be applied to any kind of digital image, as all digital images contain pixels and pixel densities. It also would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Adachi to have the points of the features have different pixel densities because logically, it is not likely for a feature in an image to have the same pixel density throughout. Lighting usually makes it so that parts of an image may be darker or lighter than other parts, which means that there will commonly be at least one point with a different pixel density from another point.
Regarding claim 18, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black further teaches wherein the feature comprises a human or animal body (Col. 16 lines 1-2 and 5-10 – “the image augmentation system 205 receives an input image depicting a human in a certain pose… the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels”; Note: the feature is a human body).
Regarding claim 19, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black further teaches wherein the arrangement parameters are indicative of a pose of the feature (Col. 16 lines 5-13 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels. For example, the pose identification model 225A can be trained to identify a set of yoga poses, body poses during a batting swing or golf swing, physical therapy poses, or other sets of poses”; Note: the arrangement parameters indicate a pose of the human body).
Regarding claim 20, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black does not directly teach wherein the arrangement parameters are indicative of a shape of the feature. However, Black separately teaches arrangement parameters (Col. 16 lines 5-10 and 33-37 – “the image augmentation system 205 inputs the image data into a first machine learning model, for instance pose identification model 225A, that is trained to identify pixels corresponding to the human body and a pose depicted by those pixels… the pose can be defined using a 2D array of joint vertices identified in the image data, and the pose-to-model-pose comparison can involve comparison of these joint locations and/or of segments connecting these joints”; Note: the pose identification model extracts the location of the joints of the human, which is equivalent to the arrangement parameters) and a shape of the feature (Col. 19 lines 21-26 – “The body model is defined as a function M(β, θ, γ), parameterized by shape β, pose θ, and translation γ. The output of the function is a triangulated surface, M, with 6890 vertices in this example. Shape parameters are coefficients of a low-dimensional shape space, learned from a training set of thousands of registered scans”; Note: there are parameters indicative of the shape of the body). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Black to have the shape be an arrangement parameter because it would allow for the shape of the subject to be modified for instances like changing facial expressions or changing height. This would be beneficial for use in different fields like character customization, where a user may prefer to have their character appear like themselves.
Regarding claim 21, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black does not directly teach wherein the appearance parameters comprise a colour of pixels of the feature. However, Black separately teaches appearance parameters (Col. 16 lines 46-49 and 54-57 – “the image augmentation system 205 inputs the image data into a second machine learning model, for example, shape identification model 225B, trained to identify the shape of the human body…The shape detection model 225B can also generate a texture map representing the skin, hair, and clothing of the human and/or a displacement map representing the actual contours of the human body surfaces”; Note: the shape detection model extracts the texture map of the human, which is equivalent to the appearance parameter) and color of pixels (Col. 6 lines 35-47 – “The pose detection model can utilize methods including fine-grained body-aware image analysis such as per-pixel image segmentation, depth estimation, lighting estimation and color/texture extraction to understand lighting conditions, details about the body and any clothing, occlusion (body parts and objects that are hidden by other objects), and to detect planes (surfaces, ceilings, walls, floors, etc.)”; Note: color is extracted). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Black to have the pixel color be an appearance parameter because the color of the pixels is an important part of the appearance of an image. Therefore, maintaining the pixel color would help ensure that the appearance of the subject of an image remains the same as, or similar to, the original.
Regarding claim 23, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black further teaches rendering the output image (Col. 17 lines 13-18 and 35-39 – “The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose…the rendering engine 226 can morph the depiction of the human body into the correct pose while maintaining the other semantic content of the image…the image augmentation system 205 outputs the augmented image for display to the user”; Note: a 3D representation is rendered and outputted), the location of the deformed feature being defined by the deformed arrangement parameters and the appearance of the pixels being defined by the appearance parameters (Col. 17 lines 19-28 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules…an underlying skeletal structure of the body model in a rest pose can be aligned into the new pose. Blend skinning can be applied to move the ‘skin vertices’—the vertices of the model of the exterior surfaces of the body—according to the movement of the underlying skeletal structure to which it was attached in the rest pose. The rendering engine 226 can then apply the texture map and/or displacement map generated at block 320 to the morphed body model in order to recreate the particular appearance of the depicted human in the new pose”; Note: the rendering engine produces an output image that corresponds to a new pose, which is the deformed feature, and corresponds to the same appearance as the input. Having a new pose means that the location of part of the body is modified. The appearance of the pixels is defined by the texture map, which can be considered the appearance parameters). Black does not teach rendering the output image by casting rays from pixels of the output image. 
However, Mildenhall teaches rendering the output image by casting rays from pixels of the output image (Paragraph 2 on Page 6 – “The function T(t) denotes the accumulated transmittance along the ray from tn to t, i.e., the probability that the ray travels from tn to t without hitting any other particle. Rendering a view from our continuous neural radiance field requires estimating this integral C(r) for a camera ray traced through each pixel of the desired virtual camera”; Note: rays are cast from pixels to render the image). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Black to incorporate the teachings of Mildenhall to render the output image by casting rays from the pixels of the image because ray-tracing is a common rendering technique in the art, and it would assist in providing realistic images, especially in terms of lighting. Using ray-tracing may provide a better visual experience for the user.
Regarding claim 24, Black in view of Martin, Mildenhall, and Adachi teaches the method of claim 12. Black further teaches generating further deformed arrangement parameters by further modifying the location of at least one point of the feature (Col. 18 lines 16-20 – “FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: further deformation occurs, and as shown in the screenshot of Fig. 4E below, the locations of the different parts of the body were modified); and rendering a further output image comprising a further deformed feature corresponding to the feature in dependence on the further deformed arrangement parameters and the appearance parameters (Col. 17 lines 4-16 and 38-39, Col. 18 lines 16-20 – “the rendering engine 226 of the image augmentation system 205 applies image augmentation based on the outputs of machine learning models and on image augmentation rules. For example, the image augmentation rules can specify a model pose representing a model of correct body form/posture for the identified pose. The rules can also specify how the identified pose should be compared to the model pose, and further how the image representation of the human body is to be morphed based on comparison of the identified pose with the model pose. The image augmentation can include morphing the depicted human body into the correct pose by moving the determined 3D representation into alignment with the correct pose… the image augmentation system 205 outputs the augmented image for display to the user…FIG. 4E depicts representations of the human depicted in the data illustrated in FIGS. 4A-4D morphed into multiple different poses. This illustrates the textured model 440 reposed and with the texturing of the texture map 415 applied”; Note: a further output image is generated, as shown in Fig. 
4E, where the appearance is maintained by the texture map, and the arrangement parameters are deformed to create different poses).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yamazaki (JP 2014238751 A) teaches a method of generating an image, which involves capturing an image of an object, changing the position of the image data while maintaining the object appearance, and outputting an image with the changed position.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE HAU MA whose telephone number is (571)272-2187. The examiner can normally be reached M-Th 7-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, King Poon can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE HAU MA/
Examiner, Art Unit 2617

/KING Y POON/
Supervisory Patent Examiner, Art Unit 2617