Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/25/2025 has been entered.
Allowable Subject Matter
Claims 2 and 3 are allowed.
Claims 4, 6, 8, 9, 12-14, 16, 18, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Applicant’s Arguments
Applicant’s arguments submitted with the Request for Continued Examination have been considered but are not persuasive. Applicant contends that the cited prior art fails to teach or suggest the amended limitations of independent Claim 1; however, Hong et al. (NPL) teaches obtaining multi-view images of an avatar model within a language-driven, CLIP-guided avatar generation pipeline, and Michel et al. (NPL) teaches obtaining per-vertex geometric displacement and color information from a language description and updating a three-dimensional mesh using information derived from multiple rendered views.
The combination of Hong, Michel, and Chen therefore teaches or suggests the amended limitations as claimed, and Applicant has not shown that the proposed combination would have required more than ordinary skill in the art or would have produced unpredictable results. Accordingly, the rejections of Claim 1 and other claims under 35 U.S.C. § 103 are maintained, and the arguments directed to the dependent claims are likewise unpersuasive for at least the same reasons.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
(Where reference text is paraphrased, please refer to the cited paragraph(s) of the references, or nearby paragraphs, for the original text.)
Claims 1 and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over Hong et al. (NPL) in view of Michel et al. (NPL).
As per Claim 1, Hong teaches the following portion of Claim 1, which recites:
“A method for editing avatar model based on language-driven, the method comprising:
receiving a first input including language description;”
Hong et al (NPL) discloses: “The inputs are natural languages, text = {tshape, tapp, tmotion}.” — Hong et al (NPL), Section 3 “OUR APPROACH”, p. 3.
“Natural languages” corresponds directly to the claimed language description input.
Hong teaches the following portion of Claim 1, which recites:
“obtaining a first latent vector based on the first input;”
Hong et al (NPL) discloses: “the encoders are trained in the way that the latent codes of paired images and texts are pulled together … aligning with the latent code of the text description.” — Hong et al (NPL), Section 3.1 “CLIP”, p. 4.
The “latent code of the text description” is the claimed first latent vector derived from the language input.
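For illustration only, this text-to-latent step can be sketched with the open-source OpenAI CLIP package; the snippet below is an assumption-laden sketch, not Hong's actual code, and the prompt string is hypothetical.

```python
# Illustrative sketch only; assumes the open-source OpenAI CLIP package and a
# generic text prompt, not necessarily Hong's exact implementation.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)        # pretrained CLIP encoders

description = "a tall man wearing a red jacket"        # the claimed language description
tokens = clip.tokenize([description]).to(device)

with torch.no_grad():
    text_latent = model.encode_text(tokens)            # latent code of the text description
    text_latent = text_latent / text_latent.norm(dim=-1, keepdim=True)
```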
Hong teaches the following portion of Claim 1, which recites:
“updating an initial avatar model to a first three-dimensional avatar model based on the first latent vector;”
Hong et al (NPL) discloses: “Guided by the appearance description tapp, N is further optimized by CLIP … After that, the target static 3D avatar mesh M = {V, F, C} is extracted.” — Hong et al (NPL), Section 3.2 “Pipeline Overview”, p. 4.
Hong updates an initial avatar representation into a first 3D avatar mesh using CLIP guidance derived from the text latent.
Hong teaches the following portion of Claim 1, which recites:
“obtaining at least one two-dimensional image for a plurality of viewpoints from the first three-dimensional avatar model;”
Hong et al (NPL) discloses: “Mt is then rendered to multi-view images …” — Hong et al (NPL), Section 3.2 “Pipeline Overview”, p. 4.
“Multi-view images” are 2D images obtained from a plurality of viewpoints.
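A minimal sketch of obtaining two-dimensional images for a plurality of viewpoints is shown below; render_view is a hypothetical helper standing in for whatever renderer produces the multi-view images, not a call from either reference.

```python
# Illustrative sketch; render_view is a hypothetical stand-in for whatever
# (differentiable) renderer produces the multi-view images.
def multi_view_images(mesh, n_views=8, distance=2.5, elevation=15.0):
    """Render the 3D avatar mesh from n_views azimuths spaced around it."""
    images = []
    for i in range(n_views):
        azimuth = 360.0 * i / n_views                   # one viewpoint per azimuth
        camera = {"azim": azimuth, "elev": elevation, "dist": distance}
        images.append(render_view(mesh, camera))        # hypothetical renderer call
    return images
```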
Hong alone does not explicitly teach every limitation of the claim. However, when combined with Michel et al. (NPL), the references collectively teach all of the limitations.
Michel et al. (NPL) teaches the following portion of Claim 1, which recites:
“obtaining information regarding changes in at least one vertex position and at least one color of the first three-dimensional avatar model based on the language description included in the first input;”
Michel et al. (NPL) discloses: “map points on the mesh surface p ∈ V to an RGB color and displacement along the normal direction.” — Michel et al. (NPL), Section 3 “Method”, p. 4.
Michel et al. (NPL) further discloses: “every point p is displaced … and colored by cp.” — Michel et al. (NPL), Section 3.1 “Neural Style Field Network”, p. 5.
Michel et al. (NPL) also discloses that the mesh “is modified to conform to a target text prompt.” — Michel et al. (NPL), Section 3 “Method”, p. 4.
Displacement corresponds to vertex-position change and RGB color corresponds to color change, both driven by the language description.
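A minimal PyTorch sketch of a network in the spirit of Michel's neural style field is shown below; the layer sizes, heads, and displacement bound are assumptions for illustration, not Michel's exact network.

```python
# Illustrative sketch in the spirit of Michel's neural style field; the layer
# sizes and heads are assumptions, not the paper's exact network.
import torch
import torch.nn as nn

class StyleField(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.color_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())  # RGB color
        self.disp_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())      # signed offset

    def forward(self, points):                  # points: (N, 3) mesh surface positions
        feat = self.backbone(points)
        color = self.color_head(feat)
        displacement = 0.1 * self.disp_head(feat)        # bounded displacement magnitude
        return color, displacement
```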
Michel et al. (NPL) teaches the following portion of Claim 1, which recites:
“updating the first three-dimensional avatar model to a second three-dimensional avatar model based on the plurality of viewpoints and the obtained information regarding the changes in the at least one vertex position and at least one color; and”
Michel et al. (NPL) discloses: “We render MS from multiple views … render two 2D projections …” — Michel et al. (NPL), Section 3.2 “Text-based correspondence”, p. 5.
Michel et al. (NPL) further discloses: “The weights … are optimized by rendering multiple 2D images …” — Michel et al. (NPL), Fig. 4 caption, p. 4.
Michel et al. (NPL) also describes the updated mesh as defined by its “displaced … and colored” vertices. — Michel et al. (NPL), Section 3.1, p. 5.
Michel updates the 3D mesh using multi-view images and the obtained vertex-position and color change information, yielding a second 3D model.
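A minimal sketch of such a multi-view render-and-optimize update is shown below; render_view and clip_similarity are hypothetical helpers, and the loop is a simplification of the idea rather than Michel's implementation.

```python
# Illustrative sketch of the multi-view render-and-optimize update; render_view
# and clip_similarity are hypothetical helpers, and the loop simplifies the idea.
import torch

def update_avatar(vertices, normals, style_field, text_latent, cameras, steps=200):
    optimizer = torch.optim.Adam(style_field.parameters(), lr=1e-4)
    for _ in range(steps):
        colors, disp = style_field(vertices)
        new_vertices = vertices + disp * normals         # vertex-position changes
        loss = torch.zeros((), device=vertices.device)
        for camera in cameras:                           # plurality of viewpoints
            image = render_view(new_vertices, colors, camera)   # hypothetical renderer
            loss = loss - clip_similarity(image, text_latent)   # hypothetical CLIP score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return new_vertices.detach(), colors.detach()        # the second 3D avatar model
```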
Michel et al. (NPL) teaches the following portion of Claim 1, which recites:
“displaying the second three-dimensional avatar model.”
Michel et al (NPL) discloses: “use a differentiable renderer to visualize the style …” — Michel et al (NPL), Section 3.1 “Neural Style Field Network”, p. 4.
Visualization via rendering corresponds to displaying the second 3D avatar model.
Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Hong et al (NPL) with Michel et al (NPL) because Hong provides a language-driven avatar generation pipeline using CLIP and multi-view renderings, while Michel provides a compatible text-guided mesh-editing technique that derives per-vertex displacement and color using a multi-view render-and-optimize loop. Integrating Michel’s per-vertex geometry and color refinement into Hong’s avatar pipeline would have predictably enhanced the level of detail and control in language-driven avatar editing, yielding expected improvements without changing the fundamental operation of the system.
Device Claim 11 does not include any additional limitations that would significantly distinguish it from method Claim 1. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
Claims 5, 7, 10, 15, 17, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Hong et al. (NPL) in view of Michel et al. (NPL), and further in view of Chen et al. (US20220157036A1).
As per Claim 5, Hong and Michel alone do not explicitly teach every limitation of the claim. However, when combined with Chen, the references collectively teach all of the limitations.
Chen teaches the limitation(s) of Claim 5 that recites:
"The method of claim 1, wherein the language description is obtained based on at least one of audio, video, text, photo, compiled instructions, customized files, sensor data, user selected option or multi-modal input."
Chen teaches:
"acquiring a language description generated by a user for a target virtual character". — Chen et al., Abstract. Although Chen explicitly mentions a language description without initially specifying modalities, it implicitly and reasonably encompasses textual or audio input. Moreover, Chen broadly states its application in fields involving "artificial intelligence, Internet of Things, voice technology, cloud computing" — Chen et al., Abstract, which inherently suggests multiple input modalities such as audio (voice), text, and customized files.
Further, Chen describes:
"The language description may include the language description in the form of voice or text, which is not limited in the embodiments of the present disclosure. Wherein, for the language description in the form of voice, ... semantic requirement of user for the target virtual character may be captured through automatic voice recognition ASR technology". — Chen et al., [0110]
Chen’s teaching matches Claim 5’s limitation of obtaining a language description from "at least one of audio, ... text," thereby covering audio and text modalities.
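A minimal sketch of handling audio or text input in the manner Chen describes is shown below; asr_transcribe is a hypothetical stand-in for an automatic speech recognition service, not a call from Chen's disclosure.

```python
# Illustrative sketch; asr_transcribe is a hypothetical stand-in for the
# automatic speech recognition (ASR) step Chen describes for voice input.
def obtain_language_description(user_input, modality):
    """Return a textual language description from an audio or text input."""
    if modality == "audio":
        return asr_transcribe(user_input)   # hypothetical ASR call: speech -> text
    if modality == "text":
        return user_input                   # already a textual language description
    raise ValueError(f"unsupported modality: {modality}")
```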
Before the effective filing date of the claimed invention, a POSITA would have been motivated to combine Chen with Hong and Michel because Chen explicitly teaches the use of multiple input modalities (including audio and text) for obtaining language descriptions, a natural and predictable extension to Hong’s and Michel’s CLIP-based avatar editing frameworks. Given that Hong and Michel explicitly operate within the same CLIP latent embedding space and utilize gradient-based optimization and render-and-compare workflows, incorporating Chen’s explicit disclosure of multimodal inputs would provide straightforward improvements in flexibility, accessibility, and ease-of-use, enhancing user interaction without yielding unexpected results.
As per Claim 7, Hong and Michel alone do not explicitly teach every limitation of the claim. However, when combined with Chen, the references collectively teach all of the limitations of Claim 7, which recites:
“The method of Claim 1, further comprising: in case that the second input corresponds with a plurality of queries of the first input, displaying stored at least one of the first three-dimensional avatar model or the second three-dimensional avatar model corresponding with the first input.”
Limitation: “in case that the second input corresponds with a plurality of queries of the first input,”
Hong et al (NPL) teaches forming a plurality of prompts (queries) corresponding to an input description: “if 𝑡app = ‘Steve Jobs’, we would augment 𝑡app to two additional prompts 𝑡face = ‘the face of Steve Jobs’ and 𝑡back = ‘the back of Steve Jobs’.” — Hong et al (NPL), Section 3.3.2 (prompt augmentation), p. 7.
The “two additional prompts” are a plurality of queries derived from the first input (𝑡app), matching the claimed condition.
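Hong's example can be sketched as a simple prompt-augmentation helper; the function below is illustrative only and merely restates the quoted example.

```python
# Illustrative sketch of Hong's prompt-augmentation example: one appearance
# description is expanded into a plurality of view-specific prompts (queries).
def augment_prompts(t_app):
    return [
        t_app,                       # original description, e.g. "Steve Jobs"
        f"the face of {t_app}",      # face-focused prompt
        f"the back of {t_app}",      # back-view prompt
    ]
```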
Limitation: “displaying stored at least one of the first three-dimensional avatar model or the second three-dimensional avatar model corresponding with the first input.”
Chen et al teaches storing character-related content and then outputting a user-described character using that stored content: “the skeleton and skinning information and the generated slider are stored in the same file.” — Chen et al, ¶[0170].
And: “the virtual character slider and the reference virtual character are stored in the same file, and … may be … driven … to quickly output a target virtual character described by the user.” — Chen et al, ¶[0172].
Chen’s “stored in the same file” and “quickly output” teachings together disclose displaying or outputting a character model corresponding to the user’s input, consistent with the claimed “displaying stored” avatar models tied to the first input.
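A minimal sketch of such store-then-output behavior is shown below; the cache layout and the display() call are assumptions used only for illustration, not Chen's implementation.

```python
# Illustrative sketch; the cache structure and display() call are assumptions
# used only to show store-then-output behavior analogous to Chen's stored file.
avatar_store = {}   # language description -> (first_model, second_model)

def handle_input(description, first_model=None, second_model=None):
    if description in avatar_store:                  # later input repeats an earlier query
        stored_first, stored_second = avatar_store[description]
        display(stored_second if stored_second is not None else stored_first)  # hypothetical
    else:
        avatar_store[description] = (first_model, second_model)
```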
Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Hong et al (NPL) with Chen et al because Hong teaches improving language-driven avatar generation using a plurality of prompt queries derived from an input description, while Chen teaches storing character-related content and then quickly outputting a user-described target character using that stored content. Integrating Chen’s storage-and-output mechanism into Hong’s multi-prompt workflow would predictably enhance responsiveness and reuse by enabling retrieval and display of previously generated avatar results corresponding to earlier language inputs, with expected improvements and predictable results.
As per Claim 10, Hong and Michel alone do not explicitly teach every limitation of the claim. However, when combined with Chen, the references collectively teach all of the limitations.
Chen teaches the limitation(s) of Claim 10 that recites:
"The method of claim 1, further comprising: displaying at least one of the first three-dimensional avatar model or the second three-dimensional avatar model in an animation mode."
Chen discloses displaying a virtual character in real time on a smart device so that it vividly interacts and communicates with the user. Specifically, Chen states:
"After the virtual character is sent to the IoT device, the IoT device may be triggered to display the virtual character in real time, so that the virtual character vividly communicates with the user in a functional or meaningless dialogue as a front-end carrier of an intelligent voice dialogue system." — Chen et al., [0038]
Similar statements about the virtual character "vividly perform[ing]" dialogue or interaction also appear in Chen around paragraphs [0050] and [0065].
The disclosure of "vividly communicates" at least implicitly conveys displaying the virtual character in an animated manner to achieve lifelike interaction, meeting the limitation of displaying the avatar model in an animation mode.
A POSITA would have found it obvious, before the effective filing date of the claimed invention, to integrate Chen’s explicit teaching of displaying virtual characters vividly in animated interactions with the avatar generation and customization techniques taught by Hong and Michel. Such a combination would have predictably enhanced realism, user engagement, and the interactivity of avatars, clearly addressing market demands for animated and interactive virtual characters. This combination represents a predictable and routine step in enhancing user experience, yielding no unexpected results.
Device Claim 15 does not include any additional limitations that would significantly distinguish it from method Claim 5. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
Device Claim 17 does not include any additional limitations that would significantly distinguish it from method Claim 7. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
Device Claim 20 does not include any additional limitations that would significantly distinguish it from method Claim 10. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADEEL BASHIR, whose telephone number is (571) 272-0440. The examiner can normally be reached Monday through Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik, can be reached at (571) 276-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ADEEL BASHIR/
Examiner, Art Unit 2616
/DANIEL F HAJNIK/Supervisory Patent Examiner, Art Unit 2616