Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 4, 8, 14, and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The term “substantially disentangled latent” in claim 4 is a relative term which renders the claim indefinite. The term “substantially disentangled latent” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. It is unclear what degree of disentanglement rises to the level of “substantially” disentangled, and the claim therefore lacks clear metes and bounds.
Concerning claims 8, 14, and 18, see the rejection of claim 4 above.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-9, and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication 2024/0386623 A1 to Yu et al. (hereinafter Yu) in view of US Patent 12406419 B1 to Villanueva Aylagas et al. (hereinafter Villanueva Aylagas).
Concerning claim 1,
Yu discloses receiving a combined representation of multimodal input data based on a plurality of input modalities (0030-0031; 0036; Figure 3);
processing the combined representation via reverse diffusion to generate an intermediate representation (0027); and
iteratively processing the intermediate representation via a U-Net structure (0031).
Yu does not disclose generating, using machine learning, face vertex displacement data and joint trajectory data for a character model.
Villanueva Aylagas teaches generating, using machine learning, face vertex displacement data and joint trajectory data for a character model (Col. 12, ln 64-Col. 13, ln 27).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the machine learning animation control of Villanueva Aylagas into the machine learning image generation mechanics of Yu, as both concern machine learning generation. The generation and control of character model animations shown in Villanueva Aylagas would make the generation systems of Yu more robust.
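For clarity of the record, the following is a minimal, purely illustrative Python (PyTorch) sketch of the architecture mapped to Yu above: a U-Net that iteratively refines a noisy intermediate representation during reverse diffusion, conditioned on a combined multimodal representation. All names, dimensions, and the simplified update rule are the undersigned's assumptions and are not asserted to be Yu's actual implementation.

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """Stand-in for a U-Net denoiser: one down path, one up path."""
        def __init__(self, ch=64, cond_dim=128):
            super().__init__()
            self.down = nn.Conv2d(4, ch, 3, stride=2, padding=1)
            self.cond_proj = nn.Linear(cond_dim, ch)  # injects the combined representation
            self.up = nn.ConvTranspose2d(ch, 4, 4, stride=2, padding=1)

        def forward(self, x, cond):
            h = torch.relu(self.down(x))
            h = h + self.cond_proj(cond)[:, :, None, None]  # broadcast conditioning over H x W
            return self.up(h)

    def reverse_diffusion(unet, cond, steps=50):
        x = torch.randn(1, 4, 32, 32)  # intermediate representation, initialized as noise
        for _ in range(steps):         # iterative processing via the U-Net structure
            eps = unet(x, cond)        # predicted noise at this step
            x = x - eps / steps        # simplified update; real samplers follow a noise schedule
        return x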
Concerning claim 2,
Yu discloses that receiving the combined representation of multimodal input data comprises receiving a fused multimodal representation of the multimodal input data and of a corresponding plurality of substantially disentangled latent representations of input modalities of the multimodal input data (0029-0030; 0036; Figure 3; wherein the encoding of the task instruction (206) into the visual condition (204) is considered a fused multimodal representation, and the text prompt (202) and time (304) are considered disentangled. See the 112(b) rejection above).
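As a purely illustrative aid to the fused-representation reading above, the following hypothetical sketch keeps one latent per input modality (the "disentangled" representations) while also producing a single fused multimodal representation. The encoder names and dimensions are the undersigned's assumptions, not Yu's disclosure.

    import torch
    import torch.nn as nn

    class MultimodalFusion(nn.Module):
        def __init__(self, text_dim=512, image_dim=768, fused_dim=128):
            super().__init__()
            self.text_enc = nn.Linear(text_dim, fused_dim)    # text-modality latent, kept separate
            self.image_enc = nn.Linear(image_dim, fused_dim)  # visual-condition latent, kept separate
            self.fuse = nn.Linear(2 * fused_dim, fused_dim)   # fused multimodal representation

        def forward(self, text_feat, image_feat):
            z_text = self.text_enc(text_feat)     # disentangled latent for the text modality
            z_image = self.image_enc(image_feat)  # disentangled latent for the image modality
            fused = self.fuse(torch.cat([z_text, z_image], dim=-1))
            return fused, (z_text, z_image)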
Concerning claim 3,
Yu discloses that iteratively processing the intermediate representation via a U-Net structure comprises refining the intermediate representation via a control network coupled to the U-Net structure, the control network comprising a decoder network having a plurality of zero-convolution layers (0040-0043; Figure 3; Figure 5).
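The zero-convolution limitation follows the well-known ControlNet pattern. The generic sketch below (not Yu's code; all names illustrative) shows why a zero-initialized convolution lets a control branch refine U-Net features without disturbing them at the start of training.

    import torch
    import torch.nn as nn

    def zero_conv(channels):
        conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(conv.weight)  # zero-initialized weights...
        nn.init.zeros_(conv.bias)    # ...and bias, so the layer initially outputs zero
        return conv

    class ControlBranch(nn.Module):
        """Decoder-side control block that refines U-Net features residually."""
        def __init__(self, channels=64):
            super().__init__()
            self.decoder = nn.Conv2d(channels, channels, 3, padding=1)  # stand-in decoder layer
            self.zc = zero_conv(channels)

        def forward(self, unet_feat, control_feat):
            # At initialization the zero-conv contributes nothing; training
            # gradually blends the control signal into the U-Net feature.
            return unet_feat + self.zc(torch.relu(self.decoder(control_feat)))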
Concerning claim 5,
Yu discloses that processing the combined representation comprises applying a time encoder to the combined representation to incorporate temporal information into the intermediate representation (0040; Figure 3).
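A time encoder of the kind mapped above is conventionally a sinusoidal timestep embedding. The sketch below is one common formulation, offered as illustration only; Yu's encoder may differ.

    import math
    import torch

    def time_embedding(t: torch.Tensor, dim: int = 128) -> torch.Tensor:
        """Map integer diffusion steps t to fixed-length embedding vectors."""
        half = dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
        angles = t[:, None].float() * freqs[None, :]  # shape (batch, half)
        return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

    # e.g., time_embedding(torch.tensor([0, 10, 999])) has shape (3, 128)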
Concerning claim 6,
Yu does not disclose generating an animated representation of the character model based at least in part on the face vertex displacement data and the joint trajectory data; and
providing the animated representation to an environment-specific adapter to animate the character model within a virtual digital environment corresponding to the environment-specific adapter.
Villanueva Aylagas teaches generating an animated representation of the character model based at least in part on the face vertex displacement data and the joint trajectory data (Col. 12, ln 64-Col. 13, ln 27); and
providing the animated representation to an environment-specific adapter to animate the character model within a virtual digital environment corresponding to the environment-specific adapter (Col. 3, ln 66-Col. 4, ln 4; Col. 5, ln 7-26; Col. 7, ln 1-18). The motivation to combine remains the same as set forth in the rejection of claim 1 above.
Concerning claim 7, see the rejection of claim 1.
Concerning claim 8, see the rejection of claim 2.
Concerning claim 9, see the rejection of claim 3.
Concerning claim 11, see the rejection of claim 5.
Concerning claim 12, see the rejection of claim 6.
Claims 4, 10, 13, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication 2024/0386623 A1 to Yu et al. in view of US Patent 12406419 B1 to Villanueva Aylagas et al., and further in view of "Future of NPCs? OpenAI + UnrealEngine5 + Text to speech" by Sushidad (hereinafter Sushidad).
Concerning claim 4,
Yu discloses multimodal input data.
Yu does not disclose receiving generated speech data from a large language model (LLM), the generated speech data being output by the LLM based on input data.
Sushidad teaches receiving generated speech data from a large language model (LLM), the generated speech data being output by the LLM based on input data (0:00-1:00; Description).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the LLM integration into an NPC avatar as taught by Sushidad with the machine learning image generation mechanics of Yu, as both concern machine learning generation. The LLM integration into an NPC avatar would make the generation systems of Yu more multifaceted.
Concerning claim 13,
Yu discloses receiving multimodal input data comprising a plurality of input modalities (0030-0031; 0036; Figure 3); and
providing the multimodal input data as input to one or more neural networks (0030-0031; 0036; Figure 3).
Yu does not disclose interaction with a non-player character (NPC) in a virtual digital environment, the NPC having a set of body features and a set of facial features; and
based on output of the one or more neural networks in response to the input data, generating one or more animation sequences for both the set of body features and the set of facial features.
Villanueva Aylagas teaches interaction with an avatar in a virtual digital environment, the avatar having a set of body features and a set of facial features (Col. 12, ln 64-Col. 13, ln 27);
based on output of the one or more neural networks in response to the input data, generating one or more animation sequences for both the set of body features and the set of facial features (Col. 12, ln 64-Col. 13, ln 27).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the machine learning animation control of Villanueva Aylagas into the machine learning image generation mechanics of Yu, as both concern machine learning generation. The generation and control of character model animations shown in Villanueva Aylagas would make the generation systems of Yu more robust.
Sushidad teaches a non-player character (NPC) (0:00-1:00; Description).
It would have been obvious to try to make the avatar disclosed in Villanueva Aylagas an NPC as taught by Sushidad, as there are only a few known configurations of what an avatar in a game can be (player character, non-player character, etc.). In other words, there are a finite number of identified, predictable solutions, and a person of ordinary skill has good reason to pursue the known options within his or her technical grasp; the fact that a combination was obvious to try may show that it was obvious under 35 U.S.C. 103. See KSR Int'l Co. v. Teleflex Inc., 127 S.Ct. 1727, 1742, 82 USPQ2d 1385, 1396 (2007).
Concerning claim 10, see the rejection of claim 4.
Concerning claim 17, see the rejection of claim 13.
Claims 14-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Publication 2024/0386623 A1 to Yu et al. in view of US Patent 12406419 B1 to Villanueva Aylagas et al., further in view of "Future of NPCs? OpenAI + UnrealEngine5 + Text to speech" by Sushidad, and further in view of US Publication 2024/0013464 A1 to Ravichandran et al. (hereinafter Ravichandran).
Concerning claim 14,
Yu discloses generating, via the one or more neural networks, a combined representation of the multimodal input data and the substantially disentangled latent representations (0029-0030; 0036; Figure 3; See the 112(b) rejection above).
Yu does not disclose disentangling, via the one or more neural networks, a set of encoded latent representations of the plurality of input modalities to generate a substantially disentangled latent representation corresponding to each input modality of the plurality of input modalities; and
generating, via the one or more neural networks, speech data for the NPC based on providing the combined representation to a large-language model (LLM).
Ravichandran teaches disentangling, via the one or more neural networks, a set of encoded latent representations of the plurality of input modalities to generate a substantially disentangled latent representation corresponding to each input modality of the plurality of input modalities (0075; Figure 6; See the 112(b) rejection above).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the multimodal disentanglement in the context of machine learning and avatars from Ravichandran with the machine learning image generation mechanics of Yu, as both concern machine learning generation. The machine learning disentanglement process of Ravichandran would make the generation systems of Yu more robust.
Sushidad teaches generating, via the one or more neural networks, speech data for the NPC based on providing the combined representation to a large-language model (LLM) (0:00-1:00; Description). The motivation to combine remains the same as set forth in the rejection of claim 4 above.
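As an illustrative aid to the disentangling limitation mapped to Ravichandran above, the following hypothetical sketch splits a shared encoding into one latent per input modality via per-modality heads. This is the undersigned's illustration only, not Ravichandran's disclosed architecture; all names and dimensions are assumptions.

    import torch
    import torch.nn as nn

    class Disentangler(nn.Module):
        def __init__(self, enc_dim=256, latent_dim=64,
                     modalities=("text", "audio", "video")):
            super().__init__()
            # One projection head per input modality.
            self.heads = nn.ModuleDict({m: nn.Linear(enc_dim, latent_dim)
                                        for m in modalities})

        def forward(self, encoded: torch.Tensor) -> dict:
            # Produce a separate ("disentangled") latent for each modality.
            return {m: head(encoded) for m, head in self.heads.items()}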
Concerning claim 15,
Yu discloses the one or more neural networks, the use of reverse diffusion, and the combined representation (0027; 0030-0031; 0036; Figure 3).
Yu does not disclose generating face vertex displacement data and joint trajectory data for the NPC based at least in part on the generated speech data and on the combined representation.
Villanueva Aylagas teaches generating face vertex displacement data and joint trajectory data for the NPC (Col. 12, ln 64-Col. 13, ln 27). The motivation to combine remains the same as set forth in the rejection of claim 1 above.
Concerning claim 16,
Villanueva Aylagas teaches generating an animated representation of the avatar based at least in part on the face vertex displacement data, the joint trajectory data, and the generated speech data (Col. 3, ln 66-Col. 4, ln 4; Col. 5, ln 7-26; Col. 7, ln 1-18); and
providing the animated representation to one or more environment-specific adapters to animate the avatar within the virtual digital environment (Col. 3, ln 66-Col. 4, ln 4; Col. 5, ln 7-26; Col. 7, ln 1-18).
Concerning claim 18, see the rejection of claim 14.
Concerning claim 19, see the rejection of claim 15.
Concerning claim 20, see the rejection of claim 16.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISHAYU SINGH whose telephone number is (571)272-3179. The examiner can normally be reached on a flexible schedule.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Dmitry Suhol can be reached at (571) 272-4430. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/I.S./Examiner, Art Unit 3715
/DMITRY SUHOL/Supervisory Patent Examiner, Art Unit 3715