Last updated: May 29, 2026
Application No. 18/749,032
MULTIMODAL CONTEXTUALIZER FOR NON-PLAYER CHARACTER GENERATION AND CONFIGURATION

Non-Final OA §103
Filed
Jun 20, 2024
Priority
Jun 20, 2023 — provisional 63/521,981
Examiner
SHENG, XIN
Art Unit
2619
Tech Center
2600 — Communications
Assignee
Advanced Micro Devices, Inc.
OA Round
1 (Non-Final)
Interview Optional

— +17.2% interview lift. Examiner has a relatively high allowance rate (72%); +17.2% interview lift. A written response may suffice.
Based on 404 resolved cases, 2023–2026
Examiner Intelligence

SHENG, XIN View full profile →
Grants 72% — above average
Career Allowance Rate
293 granted / 404 resolved
+10.5% vs TC avg
Strong +17% interview lift
Without
With
+17.2%
Interview Lift
resolved cases with interview
Typical timeline
2y 4m
Avg Prosecution
14 currently pending
Career history
421
Total Applications
across all art units
Statute-Specific Performance

§101
1.6%
-38.4% vs TC avg
§103
94.5%
+54.5% vs TC avg
§102
1.0%
-39.0% vs TC avg
§112
0.3%
-39.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 404 resolved cases
Office Action

§103
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 9-12, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Fares et al ("Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding." Frontiers in Artificial Intelligence 6 (2023): 1142997, 06/12/2023) in view of Gonzalez-Garcia et al ("Image-to-image translation for cross-domain disentanglement." Advances in neural information processing systems 31 (2018)).

Regarding Claim 1. Fares teaches A method, comprising:
receiving multimodal input data comprising a plurality of input modalities (Fares, abstract, the paper describes an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database containing videos of various speakers. We view style as being pervasive; while speaking, it colors the communicative behaviors expressivity while speech content is carried by multimodal signals and text. This disentanglement scheme of content and style allows us to directly infer the style embedding even of a speaker whose data are not part of the training phase, without requiring any further training or fine-tuning.);
encoding the multimodal input data into a respective latent representation for each input modality of the plurality of input modalities (Fares, page 2, col 2, par 2, We propose a novel approach to model behavior style in ECAs (Embodied conversational agents) and to tackle the different behavior style modeling challenges. We view behavior style as being pervasive while speaking; it colors the communicative behaviors expressivity while speech content is carried by multimodal signals and text. To design our approach, we make the following assumptions for the separation of style and content information: style is possibly encoded across all modalities (text, speech, and pose) and varies little or not over time; content is encoded only by text and speech modalities and varies over time.
Page 2, col 2, par 3, Our model consists of two main components: first (1) a speaker style encoder network which goal is to model a specific target speaker style extracted from three input modalities—Mel spectrogram, upper-body gestures, and text semantics and second (2) a sequence-to-sequence synthesis network that generates a sequence of upper-body gestures based on the content of two input modalities—Mel spectrogram and text semantics—of a source speaker and conditioned on the target speaker style embedding.);
disentangling the encoded latent representations to generate a substantially disentangled latent representation corresponding to each input modality of the plurality of input modalities (Fares, page 6, col 1, par 2, Our approach of disentangling style from content relies on the fader network disentangling approach (Lample et al., 2017), where a fader loss is introduced to effectively separate content and style encodings, as depicted in Figure 2. The fundamental feature of our disentangling scheme is to constrain the latent space of hcontent to be independent of the style embeddings hstyle. Concretely, it means that the distribution over hcontent of the latent representations should not contain the style information. A fader network is composed of an encoder which encodes the input information X into the latent code hcontent, a decoder which decodes the original data from the latent, and an additional variable hstyle used to condition the decoder with the desired information (a face attribute in the original paper).); and

Fares fails to explicitly teach, however, Gonzalez-Garcia teaches based on the disentangling, generating a direct cross-modal translation for each pair of input modalities in the multimodal input data (Gonzalez-Garcia, abstract, the paper describes the concept of cross-domain disentanglement in deep image translation. We aim to separate the internal representation learned by deep methods into three parts. The shared part contains information for both domains. The exclusive parts, on the other hand, contain only factors of variation that are particular to each domain. We achieve this through bidirectional image translation based on Generative Adversarial Networks and cross-domain autoencoders, a novel network component. Our model offers multiple advantages. We can output diverse samples covering multiple modes of the distributions of both domains, perform domain-specific image transfer and interpolation, and cross-domain retrieval without the need of labeled data, only paired images. We compare our model to the state-of-the-art in multi-modal image translation and achieve better results for translation on challenging datasets as well as for cross-domain retrieval on realistic datasets.
Page 2, par 1, In this paper, we combine the disentanglement objective with image-to-image translation, and introduce the concept of cross-domain disentanglement. The aim is to disentangle the domain specific factors from the factors that are shared across the domains. To do so, we partition the representation into three parts; the shared part containing information that is common to both domains, and two exclusive parts, which only represent those factors of variation that are particular to each domain (see example in figure 1).
Page 5, par 2, 2.3 Bi-directional image translation: Given the multi-modal nature of our system in both domains, our architecture is unified to perform image translation in the two directions simultaneously. This is paramount to learn how to disentangle what part of the representation can be shared across domains and what parts are exclusive to each. We train our model jointly in an end-to-end manner, minimizing the following total loss.
Fares, page 5, col 2, par 1, The stylized 2D poses are generated given the sequence of content representation hcontent of the source speaker’s Mel spectrogram and text embeddings obtained at the S-level and conditioned by the style vector embedding hstyle generated from a target speaker’s multimodal data. For decoding the stylized 2Dposes, the sequence of hcontent and the vector hstyle are concatenated (by repeating the hstyle vector for each segment of the sequence) and passed through a Dense layer of size dmodel…. The output predictions are offset by one position. This masking makes sure that the predictions for position index j depends only on the known outputs at positions that are less than j. For the last step, we perform a permutation of the first and the second dimensions of the vector generated by the transformer decoder. The resulting vector is a sequence of 2D-poses which corresponds to 
    PNG
    media_image1.png
    37
    411
    media_image1.png
    Greyscale
).
Fares and Gonzalez-Garcia are analogous art because they both teach method of disentangling multimodal input data. Gonzalez-Garcia further teaches creating cross-modal translation between data of different modals. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the method of disentangling multimodal input data (taught in Fares) to further create cross-modal translation between data of different modals (taught in Gonzalez-Garcia), so as to create a more precise control for tasks that perform actions based on the disentangled representations (Gonzalez-Garcia, page 1, par 2).

Regarding Claim 2. The combination of Fares and Gonzalez-Garcia further teaches The method of claim 1, further comprising using the cross-modal translations to generate a substantially disentangled representation of each input modality of the multimodal input data for use in subsequent processing (Gonzalez-Garcia, page 2, par 3, The goal of our method is to learn deep structured representations that are clearly separated in three parts. Let X,Y be two image domains (e.g. fig. 1) and let R be an image representation in either domain. We split R into sub-representations depending on whether the information contained in that part belongs exclusively to domain X (EX), domain Y (EY), or it is shared between both domains (SX/SY). Figure 1 depicts an example of this representation for images of digits with colors in different areas (digit or background). In this case, the shared part of the representation is the actual digit without color information, i.e. “the image contains a 5”. The exclusive parts are the color information in the different parts of the image, e.g. “the digit is yellow” or “the background is purple”.).
The reasoning for combination of Fares and Gonzalez-Garcia is the same as described in Claim 1.

Regarding Claim 3. The combination of Fares and Gonzalez-Garcia further teaches The method of claim 1, wherein generating a direct cross-modal translation comprises generating a modality translation codebook mapping a direct modality translation between a pair of input modalities (Gonzalez-Garcia, page 2, par 4, Figure 2 presents an overview of our model, which can be separated into image translation modules (left) and cross-domain autoencoders (right). The translation modules G and F translate images from domain X to domain Y, and from Y to X, respectively. They follow an encoder-decoder architecture. Encoders Ge and Fe process the input image through a series of convolutional layers and output a latent representation R. Traditionally in these architectures (e.g. [23, 49, 50]), the decoder takes the full representation R and generates an image in the corresponding output domain. In our model, however, the latent representation is split into shared and exclusive parts, i.e. R = (S,E), and only the shared part of the representation is used for translation. Decoders Gd and Fd combine S with random noise z that accounts for the missing exclusive part, which is unknown for the other domain at test time. This enables the generation of multiple plausible translations given an input image. The other component of the model, the cross-domain autoencoders, is a new type of module that helps aligning the latent distributions and enforce representation disentanglement. The following sections describe all the components of the model and detail how we achieve the necessary constraints on the learned representation. For simplicity, we focus on input domain X, the model for Y is analogous.).
The reasoning for combination of Fares and Gonzalez-Garcia is the same as described in Claim 1.

Regarding Claim 4. The combination of Fares and Gonzalez-Garcia further teaches The method of claim 3, wherein generating a modality translation codebook comprises:
collecting data pairs from the latent representations of the input modalities (Gonzalez-Garcia, page 2, par 3, The goal of our method is to learn deep structured representations that are clearly separated in three parts. Let X,Y be two image domains (e.g. fig. 1) and let R be an image representation in either domain. We split R into sub-representations depending on whether the information contained in that part belongs exclusively to domain X (EX), domain Y (EY), or it is shared between both domains (SX/SY). Figure 1 depicts an example of this representation for images of digits with colors in different areas (digit or background). In this case, the shared part of the representation is the actual digit without color information, i.e. “the image contains a 5”. The exclusive parts are the color information in the different parts of the image, e.g. “the digit is yellow” or “the background is purple”.
Page 2, par 4, Figure 2 presents an overview of our model, which can be separated into image translation modules (left) and cross-domain autoencoders (right). The translation modules G and F translate images from domain X to domain Y, and from Y to X, respectively. They follow an encoder-decoder architecture. Encoders Ge and Fe process the input image through a series of convolutional layers and output a latent representation R.);
training a neural network to map the latent representation of one input modality to the latent representation of another input modality (Gonzalez-Garcia, page 2, par 3, The goal of our method is to learn deep structured representations that are clearly separated in three parts. Let X,Y be two image domains (e.g. fig. 1) and let R be an image representation in either domain. We split R into sub-representations depending on whether the information contained in that part belongs exclusively to domain X (EX), domain Y (EY), or it is shared between both domains (SX/SY). Figure 1 depicts an example of this representation for images of digits with colors in different areas (digit or background). In this case, the shared part of the representation is the actual digit without color information, i.e. “the image contains a 5”. The exclusive parts are the color information in the different parts of the image, e.g. “the digit is yellow” or “the background is purple”.);
quantizing the mapped latent representations into discrete vectors (Gonzalez-Garcia, page 4, par 1, …Therefore, instead of using skip connections, we reduce the architectural bottleneck by increasing the size of the latent representation. In fact, we only increase the spatial dimensions of the shared part of the representation, from 1X1X512 to 8X8X512. We found out that in the considered domains, the exclusive part can be successfully modeled by a 1X1X8 vector, which is later tiled and concatenated with the shared part before decoding. We implement the different size of the latent representation by parallel last layers in the encoder, convolutional for the shared part and fully connected for the exclusive part.); and
storing the discrete vectors in the modality translation codebook to represent the direct translation between the pair of input modalities (Gonzalez-Garcia, page 4, par 2, Reconstructing the latent space. The input of the translation decoders is the shared representation S and random input noise that takes the role of the exclusive part of the representation. Concretely, we use an 8-dimensional noise vector z sampled from N(0, I). The exclusive representation must be approximately distributed like the input noise, as both take the same place in the input of the decoder (see sec. 2.2). To achieve this, we add a discriminator Dz that tries to distinguish between the output exclusive representation EX and input noise z, and train it with the original GAN loss [17]. This pushes the distribution of EX towards N(0, I) and makes the input of the decoder consistent.
Page 4, par 5, The image translation modules impose three main constraints: (1) the shared part of the representation must be identical for both domains, (2) the exclusive part only has information about its own domain, and (3) the generated output must belong to the other domain. However, there is no force that aligns the generated output with the corresponding input image to show the same concept (e.g. same number) but in different domains. In fact, the generated images need not correspond to the input if the encoders learn to map different concepts to the same shared latent representation. In order to achieve consistency across domains, we introduce the idea of cross-domain autoencoders (fig. 2, right).).
The reasoning for combination of Fares and Gonzalez-Garcia is the same as described in Claim 1.

Claim 9 is similar in scope as Claim 1, and thus is rejected under same rationale.
Claim 10 is similar in scope as Claim 2, and thus is rejected under same rationale.
Claim 11 is similar in scope as Claim 3, and thus is rejected under same rationale.
Claim 12 is similar in scope as Claim 4, and thus is rejected under same rationale.
Claim 17 is similar in scope as Claim 1, and thus is rejected under same rationale.
Claim 18 is similar in scope as Claim 2, and thus is rejected under same rationale.

Claims 5, 7-8, 13, 15-16, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fares et al in view of Gonzalez-Garcia et al further in view of Richard et al (US20220309724).

Regarding Claim 5. The combination of Fares and Gonzalez-Garcia fails to explicitly teach, however, Richard teaches The method of claim 1, wherein encoding the multimodal input data into a respective latent representation for each input modality comprises encoding the multimodal input data into a continuous latent space, and wherein disentangling the encoded latent representations includes disentangling the encoded latent representations into a modality-specific feature space for each input modality of the plurality of input modalities (Richard, abstract, the invention describes a method for training a three-dimensional model face animation model from speech. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face, based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject.
[0025] To address these technical problems arising in the field of computer networks, computer simulations and immersive reality applications, embodiments as disclosed herein include technical aspects such as an audio-driven facial animation approach that enables highly realistic motion synthesis for the entire face and also generalizes to unseen identities. Accordingly, a machine learning application includes a categorical latent space of facial animation that disentangles audio-correlated and audio-uncorrelated information. For example, eye closure may not be bound to a specific lip shape. The latent space is trained based on a novel cross-modality loss that encourages the model to have an accurate upper face reconstruction independent of the audio input and accurate mouth area that only depends on the provided audio input. This disentangles the motion of the lower and upper face region and prevents over-smoothed results.
[0060] FIG. 5 illustrates a chart 500 of a latent categorical space 540 (e.g., categorical spaces 340 and 440) with classifiers clustered according to expression inputs, according to some embodiments. Chart 500 includes lower face meshes 521A, synthesized meshes 521B, and upper face meshes 521C (hereinafter, collectively referred to as "face meshes 521") in latent categorical space 540. Synthesized meshes 521B successfully merge upper face motion and lip synchronization from different input modalities. In some embodiments, categorical latent space 540 may be preferable over a continuous latent space, to reduce computational complexity. In some embodiments, a continuous latent space may provide higher rendition fidelity.
[0061] Cross-modal disentanglement leads to a structured categorical latent space 540 wherein each input modality has different effects on face meshes 521.).
Fares, Gonzalez-Garcia and Richard are analogous art because they all teach method of disentangling multimodal input data. Richard further teaches disentangling continuous latent space for multimodal data into categorical latent space. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the method of disentangling multimodal input data (taught in Fares and Gonzalez-Garcia) to further disentangle latent space for multimodal data into categorial latent space (taught in Richard), so as to incorporate effects from different modality data and prevent over-smoothed results in facial animation (Richard, [0025, 0061]).

Regarding Claim 7. The combination of Fares, Gonzalez-Garcia and Richard further teaches The method of claim 1, further comprising using reconstruction loss between the respective latent representation for an input modality and the substantially disentangled latent representation corresponding to that input modality to optimize the substantially disentangled latent representation (Gonzalez-Garcia, page 4, par 2, Reconstructing the latent space. The input of the translation decoders is the shared representation S and random input noise that takes the role of the exclusive part of the representation. Concretely, we use an 8-dimensional noise vector z sampled from N(0, I). The exclusive representation must be approximately distributed like the input noise, as both take the same place in the input of the decoder (see sec. 2.2). To achieve this, we add a discriminator Dz that tries to distinguish between the output exclusive representation EX and input noise z, and train it with the original GAN loss [17]. This pushes the distribution of EX towards N(0, I) and makes the input of the decoder consistent.
Richard, [0059] Eq. 9 includes a temporal causality in the decomposition, i.e., a category ct,h at time t only depends on current and past audio information a≤t rather than on future context a1:T. In some embodiments, autoregressive block 445 is a temporal CNN including four convolutional layers with increasing dilation along the temporal axis. In some embodiments, convolutions are masked such that for the prediction of ct,h the model only has access to information from all categorical heads in the past, c<t,1:H and the preceding categorical heads at the current time step, ct,<h (cf. blocks before selected block 405 in timeline). To train autoregressive block 445, audio encoder 442 maps the expression and audio sequences (x1:T, a1:T) in the training set to their categorical embeddings (cf. Eq. 1). Autoregressive block 445 is optimized using teacher forcing and a cross-entropy loss over the latent categorical labels. At inference time, a categorical expression code is sequentially sampled for each position ct,h using autoregressive temporal model 400.).
The reasoning for combination of Fares, Gonzalez-Garcia and Richard is the same as described in Claim 1&5.

Regarding Claim 8. The combination of Fares, Gonzalez-Garcia and Richard further teaches The method of claim 1, wherein encoding the multimodal input data into a respective latent representation for each input modality comprises using a respective pretrained encoder for each input modality to encode the latent representation (Richard, [0032] In that regard, 3D speech animation engine 232 may be configured to create, store, update, and maintain a multimodal encoder 240, as disclosed herein. Multimodal encoder 240 may include an audio encoder 242, a facial expression encoder 244, a convolution tool 246, and a synthetic encoder 248. 3D speech animation engine 232 may also include a synthetic decoder 248.).
The reasoning for combination of Fares, Gonzalez-Garcia and Richard is the same as described in Claim 1&5.

Claim 13 is similar in scope as Claim 5, and thus is rejected under same rationale.
Claim 15 is similar in scope as Claim 7, and thus is rejected under same rationale.
Claim 16 is similar in scope as Claim 8, and thus is rejected under same rationale.
Claim 19 is similar in scope as Claim 7, and thus is rejected under same rationale.
Claim 20 is similar in scope as Claim 8, and thus is rejected under same rationale.

Claims 6, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Fares et al in view of Gonzalez-Garcia et al further in view of Vanrompay et al (US20240280522).

Regarding Claim 6. The combination of Fares and Gonzalez-Garcia fails to explicitly teach, however, Vanrompay teaches The method of claim 1, wherein disentangling the encoded latent representations to generate a substantially disentangled latent representation corresponding to each input modality comprises:
iteratively pairing the respective latent representations for each input
modality of the plurality of input modalities (Vanrompay, abstract, the invention describes scientific instrument including detectors supporting one or more spectroscopic modalities and an imaging modality and further including an electronic controller configured to process streams of measurements received from the detectors. The electronic controller operates to generate a base image of the sample based on the measurements corresponding to the imaging modality and further operates to generate an anomaly map of the sample based on the base image and further based on differences between measured and autoencoder-reconstructed spectra corresponding to different pixels of the base image. In at least some instances, the anomaly map can beneficially be used in a quality-control procedure to identify, within seconds, specific problem spots in the sample for more-detailed inspection and/or analyses.
[0064] In some examples, during the training mode of operation, the autoencoder 500 is trained on a training set {xi} to approximately minimize the loss function L(θ, ϕ, x, z) with respect to the parameters θ of the encoder 510 and the parameters ø of the decoder 520 using suitable gradient descent methods. For example, for stochastic gradient descent with a step size p, the encoder parameters θ are recursively and iteratively updated based on the gradient of the loss function as follows: 

    PNG
    media_image2.png
    58
    136
    media_image2.png
    Greyscale

The decoder parameters ɸ are updated in a similar manner. The iterations stop when the convergence criteria are met. The parameters θ and ɸ are fixed thereafter for being used in the testing (working) mode of operation of the autoencoder 500.); and 
learning one or more cross-modal relationships between the paired latent
representations (Vanrompay, [0069] In some instances, the autoencoder 500 
representing two selected modalities of the scientific instrument 100 is trained using a training dataset that is structurally similar to the data structure 300 (FIG. 3). In other words, the training dataset used in this approach has a respective pair of spectra for each pixel of the corresponding image, both of which spectra are fed into the encoder 510 in the form of the input vector 502 during the training mode of operation of the autoencoder 500. Compared to the above-described stacking approach, the joint bimodal training enables correlations between the two modalities to be better reflected in the latent space and then exploited during the working mode of operation of the autoencoder 500.).
Fares, Gonzalez-Garcia and Vanrompay are analogous art because they all teach method of disentangling multimodal input data. Vanrompay further teaches iteratively updating the encoder parameter during the pairing between modal data and its latent representation. Therefore, it would have been obvious to a person with ordinary skill in the art before the effective filing date of the claimed invention, to modify the method of disentangling multimodal input data (taught in Fares and Gonzalez-Garcia) to further use the iterative updating/refining encoder method (taught in Vanrompay), so as to fine-tune the accuracy of the pairing of the input model data and its latent representation.

Claim 14 is similar in scope as Claim 6, and thus is rejected under same rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIN SHENG whose telephone number is (571)272-5734. The examiner can normally be reached M-F 9:30AM-3:30PM 6:00PM-8:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at 5712723022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Xin Sheng/           Primary Examiner, Art Unit 2619
Read full office action
Prosecution Timeline

Jun 20, 2024
Application Filed
Mar 18, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/173,623
Patent 12626326
IMAGE STITCHING WITH AN ADAPTIVE THREE-DIMENSIONAL BOWL MODEL OF THE SURROUNDING ENVIRONMENT FOR SURROUND VIEW VISUALIZATION
3y 2m to grant Granted May 12, 2026
18/367,119
Patent 12620165
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS
2y 7m to grant Granted May 05, 2026
18/367,115
Patent 12614341
SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR POPULATING ENVIRONMENT MODELS
2y 7m to grant Granted Apr 28, 2026
18/490,458
Patent 12614337
SYSTEM AND METHODS FOR CUSTOMIZING 3D MODELS
2y 6m to grant Granted Apr 28, 2026
18/796,576
Patent 12614366
AUTOMATIC POINT CLOUD BUILDING ENVELOPE SEGMENTATION (AUTO-CuBES) USING MACHINE LEARNING
1y 8m to grant Granted Apr 28, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

1-2
Expected OA Rounds
72%
Grant Probability
90%
With Interview (+17.2%)
2y 4m (~5m remaining)
Median Time to Grant
Low
PTA Risk
Based on 404 resolved cases by this examiner. Grant probability derived from career allowance rate.