DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of the Claims
Claims 1-15 are currently pending in the present application, with claims 1, 8, and 15 being independent.
Response to Amendments / Arguments
Applicant's arguments filed 10/20/2025 have been fully considered but they are not persuasive.
Applicant argues: Lee et al. (KR 2022/0059434) (“Lee”) fails to disclose “wherein the obtaining the frame data of the second rendered video includes visualizing only a specific region of the digital human when rendering the digital human, by detecting a facial region, detecting facial feature points in the facial region using the frame data of the first rendered video, and calculating transformation information for extracting the facial region based on the detected facial feature points, wherein the transformation information includes information about at least one of movement, enlargement, reduction or rotation of the facial feature points”.
Examiner replies: Fig. 2, Paragraph [0027], Paragraph [0031], and Paragraphs [0033]-[0035] of Lee recite “the foreground area (particularly, the foreground character) separated from the key frame by the first separation unit (12) is provided to the first information extraction unit (13) and the learning engine (30)…second separation unit (21) receives a video as input and operates to create realistic facial expressions and motions of the character…second information extraction unit (22) is configured to extract motion information from a video…third information extraction unit (23) is configured to extract facial expression information from a video…motion information provided to the learning engine (30) includes information on a partial region (e.g., image data) in which movement is detected among the object regions”. Thus, Lee teaches wherein the obtaining the frame data of the second rendered video includes visualizing only a specific region of the digital human when rendering the digital human.
Paragraph [0034] of Lee recites “a face recognition algorithm is used in a video to recognize an object's face”. Thus, Lee teaches by detecting a facial region.
Paragraphs [0034]-[0035] of Lee recite “feature points such as lips and eyes …in the case of the shape of smiling lips, the lip feature point information when the face is smiling…partial area corresponding to the lips in the smiling face area…facial expression information provided to the learning engine (30) includes feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected” and Paragraph [0053] of Lee recites “motion information and facial expression information are acquired from input data such as input video”. Thus, Lee teaches detecting facial feature points in the facial region using the frame data of the first rendered video.
Paragraphs [0034]-[0036] of Lee recite “a face recognition algorithm is used in a video to recognize…the movement of feature points such as lips and eyes is extracted from the recognized face… lip feature point information when the face is smiling is extracted along with the partial area corresponding to the lips in the smiling face area…feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected”. Thus, Lee teaches calculating transformation information for extracting the facial region based on the detected facial feature points, wherein the transformation information includes information about at least one of movement, enlargement, reduction or rotation of the facial feature points.
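By way of illustration only, and not as a characterization of Lee's actual implementation or of the claimed method, the following minimal Python sketch shows one generic way a facial region may be detected, facial feature points located, and transformation information (movement, enlargement/reduction, rotation) computed for extracting the facial region. The landmark detector and the reference point values are hypothetical placeholders.

```python
# Illustrative sketch only; the landmark detector and reference values are hypothetical.
import cv2
import numpy as np

# Hypothetical canonical positions for three feature points (eyes, mouth center).
REFERENCE_POINTS = np.float32([[60, 80], [140, 80], [100, 150]])

def extract_facial_region(frame, detect_landmarks):
    """Detect a facial region, locate feature points, and compute a similarity
    transform (translation = movement, uniform scale = enlargement/reduction,
    rotation) used to crop only the face from a rendered frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None

    # detect_landmarks is a hypothetical callable returning points that
    # correspond one-to-one with REFERENCE_POINTS.
    points = np.float32(detect_landmarks(frame, faces[0]))

    # Transformation information: a 2x3 similarity matrix encoding movement,
    # enlargement/reduction, and rotation of the detected feature points.
    matrix, _ = cv2.estimateAffinePartial2D(points, REFERENCE_POINTS)
    if matrix is None:
        return None
    face_only = cv2.warpAffine(frame, matrix, (200, 200))
    return face_only, matrix
```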
Regarding the remaining arguments, Applicant argues with respect to the amended claim language; those arguments are fully addressed in the prior art rejections set forth below.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2 and 8-9 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Lee et al. (KR 20220059434), hereinafter referred to as “Lee”.
Regarding claim 1, Lee discloses a method for visualization of a digital human (Par. 0003; method and device capable of maximizing the realism of a 3D character through dual rendering), the method comprising:
setting a specific action of the digital human (Par. 0025; videos can be acquired of a person performing set movements, such as national gymnastics, making set facial expressions, such as happiness or sadness, and saying set words, such as the national anthem),
determining a scene including the specific action of the digital human (Par. 0025-0026; through this process, we acquire images of various environments in which general digital humans can be created…the images acquired in this manner are provided to the learning data generation device (1) as input data) and rendering the determined scene to generate a first rendered video (Par. 0043; 3D model is first rendered using motion information such as HTR (Hierarchical Translation-Rotation)),
capturing images (Par. 0026; input data is videos captured in various lighting environments, various backgrounds, and various poses) constituting the first rendered video for each frame to obtain frame data (Par. 004; The first rendering unit (110) performs rendering on a 3D character to generate a 2D frame (RF1). The 2D frame (RF1), which is the result of the first rendering, includes multiple frames, i.e., rendered frame 1, …, rendered frame n, as shown in Fig. 4.),
inputting each piece of the frame data of the first rendered video (Par. 0045; The second rendering unit (120) receives the 2D frame (RF1), which is the result of the first rendering, and performs additional rendering) to two or more visualization modules (Par. 0037-0038; learning data for dual rendering…is generated through the first data acquisition unit (10) and the second data acquisition unit (20) …learning engine (30). See Par. 0026-0029 on first data acquisition unit (10) and Par. 0030-0036 on second data acquisition unit (20)) to obtain frame data of a second rendered video (Par. 0045; The secondary rendering unit (120) performs additional rendering using the learning engine (30). Par. 0044; the learning engine (30) is configured to transform a high-quality rendering image into a photorealistic image, so that better results can be obtained when the result of the first rendering (RF1) has a similar quality to the learning data.),
and combining the frame data of the second rendered video to generate a visualized scene (Par. 0058; secondary rendering is performed using the results of the previous frame to ensure continuity in the time series. Accordingly, a frame (secondary rendered frame) with guaranteed continuity in color, appearance, etc. is obtained (S23). See more on secondary rendering in Par. 0045-0048),
wherein obtaining the frame data of the second rendered video includes visualizing only a specific region of the digital human when rendering the digital human (Fig. 2 and Par. 0027; The foreground area (particularly, the foreground character) separated from the key frame by the first separation unit (12) is provided to the first information extraction unit (13) and the learning engine (30)…Par. 0031; second separation unit (21) receives a video as input and operates to create realistic facial expressions and motions of the character…Par. 0033; second information extraction unit (22) is configured to extract motion information from a video…Par. 0034; third information extraction unit (23) is configured to extract facial expression information from a video…Par. 0035; motion information provided to the learning engine (30) includes information on a partial region (e.g., image data) in which movement is detected among the object regions), by
detecting a facial region (Par. 0034; a face recognition algorithm is used in a video to recognize an object's face),
detecting facial feature points in the facial region (Par. 0034-0035; feature points such as lips and eyes …in the case of the shape of smiling lips, the lip feature point information when the face is smiling…partial area corresponding to the lips in the smiling face area…facial expression information provided to the learning engine (30) includes feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected) using the frame data of the first rendered video (Par. 0053; motion information and facial expression information are acquired from input data such as input video), and
calculating transformation information for extracting the facial region based on the detected facial feature points (Par. 0034-0036; a face recognition algorithm is used in a video to recognize…the movement of feature points such as lips and eyes is extracted from the recognized face… lip feature point information when the face is smiling is extracted along with the partial area corresponding to the lips in the smiling face area…feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected),
wherein the transformation information includes information about at least one of movement, enlargement, reduction or rotation of the facial feature points (Par. 0035; the facial expression information provided to the learning engine (30) includes feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected).
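For illustration only, and not as a representation of Lee's code or of the claimed system, the following minimal Python sketch traces the recited flow in which each piece of frame data from a first rendered video is input to two or more visualization modules and the resulting second-pass frames are combined into a visualized scene; all names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical.
from typing import Any, Callable, List, Sequence

def obtain_second_rendered_frames(first_pass_frames: Sequence[Any],
                                  visualization_modules: List[Callable[[Any], Any]]
                                  ) -> List[Any]:
    """Input each piece of first-pass frame data to two or more visualization
    modules and collect the second-pass frame data."""
    second_pass = []
    for frame in first_pass_frames:
        refined = frame
        for module in visualization_modules:  # e.g., face refinement, relighting
            refined = module(refined)
        second_pass.append(refined)
    return second_pass

def combine_into_scene(second_pass_frames: List[Any]) -> List[Any]:
    """Combine the second-pass frames into a visualized scene (stub that simply
    orders the frames; an actual system would encode them as video)."""
    return list(second_pass_frames)
```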
Regarding claim 2, Lee discloses the method of claim 1, and further discloses wherein the determining of the scene includes determining a scene including at least one of a posture of the digital human, a camera, lighting, a background, a viewing angle, a distance, and coordinate information (Par. 0026; The key frame extraction unit (11) extracts key frames from input data according to preset frame extraction parameters (specific angle of view to be extracted, specific lighting, specific pose, etc.)).
Regarding claim 8, claim 8 is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 1 and is accordingly rejected using substantially the same rationale as that set forth with respect to claim 1.
Regarding claim 9, claim 9 recites limitations similar to those of claim 2, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 2; it is therefore rejected under the same rationale as claim 2.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 3-5, 10-12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (KR 20220059434), hereinafter referred to as “Lee”, in view of Lim et al. (KR 20180066702), hereinafter referred to as “Lim”.
Regarding claim 3, Lee discloses the method of claim 1, but does not disclose wherein the two or more visualization modules include at least one pair of visualization modules connected in parallel.
In the same art of 3D rendering, Lim discloses wherein the two or more visualization modules include at least one pair of visualization modules (Parallel rendering module (100)) connected in parallel (Par. 0052-0054 on parallel rendering devices).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the parallel configuration of realistic visualization modules as taught by Lim into the dual-rendering system of Lee. The motivation lies in the advantage of enabling synchronized rendering across multiple GPUs, which allows a large number of 3D objects to be processed in real time and increases rendering throughput. The combination yields predictable results in achieving photorealistic digital human visualization efficiently.
Regarding claim 4, Lee in view of Lim discloses the method of claim 3, but Lee does not disclose wherein each of the pair of visualization modules connected in parallel includes one or more visualization modules connected thereto in series.
In the same art of 3D rendering, Lim discloses wherein each of the pair of visualization modules connected in parallel includes one or more visualization modules connected thereto in series (Par. 0032; Each parallel rendering device (100a, 100b, 100c, ...) is composed of a content loading unit (110), a segmented area loading unit (120), a motion processing unit (130), a rendering unit (140), a segmented content transmission unit (150), and a database unit (160), as shown in Fig. 2).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the rendering structure of Lee by integrating the arrangement of parallel rendering devices that communicate in a series-like manner as taught by Lim. The motivation lies in the advantage of a synchronized, distributed rendering flow, ensuring rendering consistency and improving system scalability.
Regarding claim 5, Lee in view of Lim discloses the method of claim 4, but Lee does not disclose wherein the one or more visualization modules connected in series to each of the pair of visualization modules connected in parallel are provided in different types.
In the same art of 3D rendering, Lim discloses wherein the one or more visualization modules connected in series to each of the pair of visualization modules connected in parallel are provided in different types (Par. 0032; The identification number of the above configuration with an alphabet added means the configuration of a specific parallel rendering device (100b) (rendering unit (140b), database unit (160b), etc.), and the identification number without an alphabet added (identification number composed of only numbers) means each parallel rendering device (100a, 100b, 100c, ...) is a general term for the components (rendering sections (140a, 140b, 140c, ...)) such as the rendering unit (140), etc.).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate different types of visualization modules as taught by Lim with the dual-rendering system of Lee. The motivation lies in the advantage of facilitating modular processing, allowing specific and additional rendering tasks to be handled by designated devices and outputs, increasing system flexibility and the overall realism of the visualization.
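For illustration only, and not as a depiction of Lim's parallel rendering devices, the following minimal Python sketch shows visualization modules arranged as parallel branches, where each branch is a series-connected chain of modules that may be of different types; all names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

def run_series_chain(frame: Any, chain: Sequence[Callable[[Any], Any]]) -> Any:
    """Apply a series-connected chain of visualization modules to one frame."""
    out = frame
    for module in chain:
        out = module(out)
    return out

def run_parallel_branches(frame: Any,
                          branches: List[Sequence[Callable[[Any], Any]]]
                          ) -> List[Any]:
    """Execute each series chain on its own worker so the branches run in
    parallel; each branch may chain modules of different types."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(run_series_chain, frame, chain) for chain in branches]
        return [f.result() for f in futures]

# Hypothetical usage: two parallel branches, each a different series of module types.
# results = run_parallel_branches(frame, [[refine_face, relight], [upscale, denoise]])
```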
Regarding claim 10, claim 10 recites limitations similar to those of claim 3, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 3; it is therefore rejected under the same rationale as claim 3.
Regarding claim 11, claim 11 recites limitations similar to those of claim 4, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 4; it is therefore rejected under the same rationale as claim 4.
Regarding claim 12, claim 12 recites limitations similar to those of claim 5, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 5; it is therefore rejected under the same rationale as claim 5.
Regarding claim 15, Lee discloses a method for visualization of a specific region of a digital human (Par. 0003; method and device capable of maximizing the realism of a 3D character through dual rendering), the method comprising:
determining a scene including a specific action of the digital human (Par. 0025-0026; through this process, we acquire images of various environments in which general digital humans can be created…the images acquired in this manner are provided to the learning data generation device (1) as input data) and rendering the determined scene to generate a first rendered video (Par. 0043; 3D model is first rendered using motion information such as HTR (Hierarchical Translation-Rotation));
extracting a facial region of the digital human (Par. 0034; the third information extraction unit (23) is configured to extract facial expression information from a video…),
performing a visualization operation on the facial region using two or more visualization modules (Par. 0035-0038; The partial area information included in this facial expression information and motion information can be used as guide information during learning and can be created like an actual video.) to generate frame data of a visualized facial region video (Par. 0055; semantic attribute information, foreground area, motion information, and facial expression information are used as learning data, and a learning engine for rendering is trained using these learning data (S16, S17)),
and synthesizing each piece of the frame data of the visualized facial region video and each piece of frame data of the first rendered video (Par. 0058-0059),
wherein the performing the visualization operation on the facial region includes visualizing only a specific region of the digital human when rendering the digital human (Fig. 2 and Par. 0027; The foreground area (particularly, the foreground character) separated from the key frame by the first separation unit (12) is provided to the first information extraction unit (13) and the learning engine (30)…Par. 0031; second separation unit (21) receives a video as input and operates to create realistic facial expressions and motions of the character…Par. 0033; second information extraction unit (22) is configured to extract motion information from a video…Par. 0034; third information extraction unit (23) is configured to extract facial expression information from a video…Par. 0035; motion information provided to the learning engine (30) includes information on a partial region (e.g., image data) in which movement is detected among the object regions), by
detecting a facial region (Par. 0034; a face recognition algorithm is used in a video to recognize an object's face),
detecting facial feature points in the facial region (Par. 0034-0035; feature points such as lips and eyes …in the case of the shape of smiling lips, the lip feature point information when the face is smiling…partial area corresponding to the lips in the smiling face area…facial expression information provided to the learning engine (30) includes feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected) using the frame data of the first rendered video (Par. 0053; motion information and facial expression information are acquired from input data such as input video), and
calculating transformation information for extracting the facial region based on the detected facial feature points (Par. 0034-0036; a face recognition algorithm is used in a video to recognize…the movement of feature points such as lips and eyes is extracted from the recognized face… lip feature point information when the face is smiling is extracted along with the partial area corresponding to the lips in the smiling face area…feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected),
wherein the transformation information includes information about at least one of movement, enlargement, reduction or rotation of the facial feature points (Par. 0035; the facial expression information provided to the learning engine (30) includes feature points where movement occurred on the face, related movement information, and partial region information (e.g., image data) where feature point movement was detected).
Lee does not disclose two or more realistic visualization modules connected in parallel.
In the same art of 3D rendering, Lim discloses two or more visualization modules connected in parallel (Par. 0025; motion recognition module (300) recognizes the user's motion to generate motion data and transmits the data to the parallel rendering module (100), and can recognize the user's gaze…specific parallel rendering device (100a) is used as a server, and the remaining parallel rendering devices (100b, 100c, …) are designated as clients and are connected to each other through a network. This is for synchronization of rendering).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the parallel configuration of realistic visualization modules as taught by Lim into the dual-rendering system of Lee. The motivation lies in the advantage of enabling synchronized rendering across multiple GPUs, which allows a large number of 3D objects to be processed in real time and increases rendering throughput. The combination yields predictable results in achieving photorealistic digital human visualization efficiently.
Claims 6-7 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al. (KR 20220059434), hereinafter referred to as “Lee”, in view of Yun et al. (KR 20100066289), hereinafter referred to as “Yun”, and in further view of Lister et al., "A Key‐Pose Caching System for Rendering an Animated Crowd in Real‐Time," Computer Graphics Forum, vol. 29, no. 8, hereinafter referred to as “Lister”.
Regarding claim 6, Lee discloses the method of claim 1, but does not disclose generating identification information for the visualized scene.
Yun discloses generating identification information for the visualized scene (Par. 0026-0031 and Fig. 1; the region or object identification information of the stereoscopic image).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teaching of Yun of generating identification information for scenes into the rendering system of Lee. The motivation lies in the advantage of enabling scene tracking and indexing, which improves the efficiency of the rendering system by allowing visualized scenes to be stored and accessed using their identification information.
Lee in view of Yun does not disclose, when the identification information for the visualized scene matches identification information for a newly input third rendered video, caching the second rendered video to generate a visualized video.
Lister discloses when the identification information for the visualized scene (Section 4.2, Pg. 2309; The key-poses were selected with normal distributions to approximate the non-uniform distribution of animations in an urban scene…were then used to populate various cache sizes…) matches identification information for a newly input third rendered video (Section 5, Pg. 2310; Each animation introduced thereafter coincides with a new "step" on the graph. Here, a combination of cached and skinned meshes are used to render the crowd because the set of required key-poses is larger than can be stored in the cache at a given time…additional cache updates are performed to maintain the "best" set of poses…), caching the second rendered video to generate a visualized video (Section 5, Pg. 2310; if both its source and destination poses are present at given moment in time…100% of the crowd can be rendered from the cache).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the key-pose caching technique of Lister into the rendering system of Lee in view of Yun’s scene identification technique. Lister’s system teaches caching and reusing rendered animation segments based on key-poses to avoid redundant rendering. The motivation to combine lies in the advantage of reduced computational load (Lister Section 6, Pg. 2310; the technique retains the fidelity of skeletal animation whilst approaching the performance of rendering static geometry. Moreover, the system adapts to the current state of a crowd simulation and the allocated cache size can be chosen to balance memory consumption against computational cost).
Regarding claim 7, Lee in view of Yun and in further view of Lister discloses the method of claim 6, and further discloses wherein the identification information includes at least one of scene identification information (scene_ID) (Yun Par. 0026-0031; identification information of each region included in the stereoscopic image. Fig. 1; scene 104 may be a stereoscopic image composed of three-dimensional objects), action identification information (action_ID) (Yun Par. 0026-0031; identification information of each sound source of stereoscopic audio…symphony concert scene played by an orchestra using stereoscopic audio…the violin player. Fig. 1; orchestra plays music through a 5.1 channel audio environment 102), and query identification information (query_id) including the scene identification information and the action identification information (Yun Par. 0026-0027; The linkage information includes identification information of each sound source of stereoscopic audio and identification information of each region included in the stereoscopic image….linkage information includes identification information of each sound source of stereoscopic audio and identification information of each region or object included in the stereoscopic image).
It would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the identification information for scene, action, and query as taught by Yun with the rendering system of Lee. The motivation lies in the advantage of precise scene-action tracking, metadata linkage, and efficient scene organization. The combination yields predictable results in improving retrieval accuracy and overall render-time efficiency.
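For illustration only, and not as a depiction of Yun's or Lister's systems, the following minimal Python sketch shows a cache keyed by scene and action identification information that returns a previously generated second rendered video when a newly input video carries matching identifiers; the class and method names are hypothetical, with scene_id/action_id/query_id following the claim terminology.

```python
# Illustrative sketch only; class and method names are hypothetical.
from typing import Any, Dict, Optional, Tuple

class VisualizedVideoCache:
    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], Any] = {}

    @staticmethod
    def query_id(scene_id: str, action_id: str) -> Tuple[str, str]:
        """Query identification information built from scene_ID and action_ID."""
        return (scene_id, action_id)

    def store(self, scene_id: str, action_id: str, second_rendered_video: Any) -> None:
        """Associate identification information with a visualized (second rendered) video."""
        self._cache[self.query_id(scene_id, action_id)] = second_rendered_video

    def lookup(self, scene_id: str, action_id: str) -> Optional[Any]:
        """If the identifiers of a newly input (third) rendered video match a cached
        entry, return the cached second rendered video instead of re-running the
        visualization modules; otherwise return None."""
        return self._cache.get(self.query_id(scene_id, action_id))
```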
Regarding claim 13, claim 13 recites limitations similar to those of claim 6, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 6; it is therefore rejected under the same rationale as claim 6.
Regarding claim 14, claim 14 recites limitations similar to those of claim 7, except that it is the system claim (Lee Par. 0064-0067; computing device (700), processor (710)) corresponding to method claim 7; it is therefore rejected under the same rationale as claim 7.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JENNY NGAN TRAN whose telephone number is (571)272-6888. The examiner can normally be reached Mon-Thurs 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alicia Harrington can be reached at (571) 272-2330. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JENNY N TRAN/Examiner, Art Unit 2615
/ALICIA M HARRINGTON/Supervisory Patent Examiner, Art Unit 2615