Prosecution Insights
Last updated: April 18, 2026
Application No. 18/766,462

SYSTEM AND METHOD FOR MACHINE LEARNING ARCHITECTURE FOR PARTIALLY-OBSERVED MULTIMODAL DATA

Non-Final OA §DP
Filed: Jul 08, 2024
Examiner: OPSASNICK, MICHAEL N
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Royal Bank of Canada
OA Round: 1 (Non-Final)

Grant Probability: 82% (Favorable)
Expected OA Rounds: 1-2
Median Time to Grant: 3y 3m
Grant Probability With Interview: 92%

Examiner Intelligence

Career Allow Rate: 82%, above average (737 granted / 900 resolved; +19.9% vs TC avg)
Interview Lift: +10.5% (moderate, roughly +10%) for resolved cases with interview
Avg Prosecution: 3y 3m (typical timeline)
Currently Pending: 46
Total Applications: 946 (career history, across all art units)

Statute-Specific Performance

Statute   Examiner Rate   vs TC Avg
§101      17.7%           -22.3%
§103      33.0%           -7.0%
§102      29.9%           -10.1%
§112      6.3%            -33.7%

Deltas are measured against the Tech Center average estimate; based on career data from 900 resolved cases.
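The implied Tech Center averages can be back-solved from each examiner rate and its delta. A minimal sketch, assuming each delta is the examiner rate minus the TC average in percentage points (an inference from the panel's labels, not a documented formula):

```python
# Recover implied Tech Center average rejection rates from the
# examiner's per-statute rates and their "vs TC avg" deltas.
# Assumption: delta = examiner_rate - tc_average (percentage points).
rates = {"101": 17.7, "103": 33.0, "102": 29.9, "112": 6.3}
deltas = {"101": -22.3, "103": -7.0, "102": -10.1, "112": -33.7}

tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
for statute in ("101", "103", "102", "112"):
    print(f"§{statute}: examiner {rates[statute]}% vs TC avg {tc_avg[statute]}%")
```

Notably, every statute's implied TC average comes out to 40.0%, suggesting the panel measures all deltas against a single baseline estimate rather than per-statute averages.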

Office Action

§DP
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

Content of Specification: (b) CROSS-REFERENCES TO RELATED APPLICATIONS: See 37 CFR 1.78 and MPEP § 211 et seq. The disclosure is objected to because of the following informalities: the parent application of the instant application has now issued as a US patent. Please update the specification to reflect this change. Appropriate correction is required.

Claim Objections

Claims 2, 7, 12, 17 are objected to because of the following informalities: in claims 2, 7, 12, 17, the word "input"/"inputted" is misspelled as "impute"/"imputed". Appropriate correction is required.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.
For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 8, 9, 11, 18-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3, 4, 11, 13, 14, 20 of U.S. Patent No. 12,033,083. Although the claims at issue are not identical, they are not patentably distinct from each other: although the altered preambles are given patentable weight within the body of the claims, these altered preambles do not change the functionality of the claims, and hence claims 1, 11, 20 of the '083 patent meet the scope of the independent claims in the instant invention.

Claims of instant application 18/766,462 (claims 1, 8, 9, 11, 18-20):

1. A machine learning model architecture system trained for conducting machine learning using partially-observed data by using a variational selective auto-encoder (VSAE) machine learning model framework, the system comprising: the processor configured to provide: a data receiver adapted to receive one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and a machine learning data architecture engine adapted to: maintain a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintain a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintain a first generative network including a first
set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data. 8. The system of claim 1, wherein the partially-observed data is heterogeneous data or multimodal data sets. 9. The system of claim 1, wherein the output estimated data includes estimated values corresponding to at least one unobserved modality and the output estimated data can be combined with the partially-observed data. 11. A method for training a machine learning model architecture for conducting machine learning using partially-observed data by using a variational selective auto-encoder (VSAE) machine learning model framework, the method comprising: receiving one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and maintaining a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintaining a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection 
of a proposal distribution for an unobserved modality; and maintaining a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data. 18. The method of claim 11, wherein the partially-observed data is heterogeneous data or multimodal data sets. 19. The method of claim 11, wherein the output estimated data includes estimated values corresponding to at least one unobserved modality and the output estimated data can be combined with the partially-observed data. 20. 
A non-transitory computer readable medium storing machine interpretable data structures representing a machine learning model architecture, the machine learning model architecture trained using a method for conducting machine learning using partially-observed data by using a variational selective auto-encoder (VSAE) machine learning model framework, the method comprising: receiving one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and maintaining a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintaining a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintaining a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data. 
Claims of U.S. Patent No. 12,033,083 (claims 1, 3, 4, 11, 13, 14, 20):

1. A computer implemented system for conducting machine learning using partially-observed data by using a variational selective auto-encoder (VSAE) machine learning model framework, the system including a processor operating in conjunction with computer memory, the system comprising: the processor configured to provide: a data receiver adapted to receive one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and a machine learning data architecture engine adapted to: maintain a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintain a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintain a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data. 3.
The system of claim 1, wherein the partially-observed data is heterogeneous data. 4. The system of claim 1, wherein the output estimated data includes estimated values corresponding to at least one unobserved modality and the output estimated data can be combined with the partially-observed data. 11. A computer implemented method for conducting machine learning using partially-observed data, the method comprising: receiving one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data by using a variational selective auto-encoder (VSAE) machine learning model framework, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and maintaining a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintaining a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintaining a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data 
structure can be learned from the partially-observed data. 13. The method of claim 11, wherein the partially-observed data is heterogeneous data. 14. The method of claim 11, wherein the output estimated data includes estimated values corresponding to at least one unobserved modality and the output estimated data can be combined with the partially-observed data. 20. A non-transitory computer readable medium storing machine interpretable instructions, which when executed, cause a processor to perform a computer implemented method for conducting machine learning using partially-observed data by using a variational selective auto-encoder (VSAE) machine learning model framework, the method comprising: receiving one or more data sets representative of the partially-observed data, each having a subset of observed data and a subset of unobserved data, the data receiver configured to extract a mask data structure from each data set of the one or more data sets representative of which modalities are observed and which modalities are unobserved; and maintaining a attributive proposal network for processing the one or more data sets, the attributive proposal network including a set of individual encoders, each individual encoder adapted for a corresponding observed modality; maintaining a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintaining a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network 
is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data.

Allowable Subject Matter

Claims 1-20 are allowable over the prior art of record, and the case would be in condition for allowance when the above-presented obviousness-type double patenting rejection is overcome. The following is an examiner's statement of reasons for indicating allowable subject matter: as per the independent claims, a compare/contrast of the current claim scope to the applied representative prior art shows:

Bui et al (2020/0160042) is directed to generating modified digital images based on verbal or gesture input, in particular, to using a vision neural network to identify pixels within a digital image that correspond to an object indicated by the verbal input and/or the gesture input. Bui discloses an approach for machine learning using partially-observed data, citing Bui at para. 0006 and FIG. 2. In Bui, a mask is described, but it is an entirely different type of mask than the claimed embodiment, used for an entirely different purpose and mode of operation. Bui's output is a binary mask used to select the pixels corresponding to the user's input, whether verbal or gesture-based. See Bui at paras. 0112 and 0113, and FIG. 7D. The input in paragraph 0113 is to "Delete this girl". At para. 0114, the multimodal input is used to track the intent of the user.
As noted in the last response, the teaching of Bui is very different from the claimed embodiments, which are directed to an approach for addressing issues with partially-observed multimodal data whereby data is incomplete due to a lack of labels, etc., and a specific combination of networks is used in a machine learning architecture topology that uses mask data structures representative of which modalities are observed and which modalities are unobserved.

The claimed embodiments adapt to improve learning despite "missingness" relative to previous approaches where, for example, there was a restrictive assumption on the missingness mechanism that data is missing completely at random (MCAR). MCAR assumes missingness (the manner in which data are missing) occurs independent of the data. However, the claim scope embodies a relaxation of this technical assumption by learning the joint distribution of the data and the imputation mask (which denotes the missingness pattern). Accordingly, a broader range of technical challenges can be addressed, not limited solely to MCAR (e.g., addressing incomplete data sets where data is not missing at random).

The missingness in high-dimensional multi-modal data can be intra-modality and/or inter-modality. For intra-modality incomplete data, the missing features are within one modality or attribute, and the target is to impute the missing entries given the observed entries in a single attribute, as in image inpainting. However, in a more realistic scenario, the missingness in high-dimensional multi-modal data is more likely to follow an inter-modality pattern. This is common in high-dimensional multimedia data. On social media, users' behavior depends on their posts, which include one or multiple attributes from image, audio, video, caption or utterance. Those attributes may be mutually correlated in some way and are usually collected in an incomplete way with some missing attributes.
Only when one is aware of what the user might tag on a given image, or which images the user might post based on certain tags, can an approach attempt to understand the user's behavior (see para. 0055 of the instant invention).

The Examiner's Report at page 5 states that Bui does not explicitly teach calculating a joint distribution probability including parameters of unobservable/observable information. Eck is directed to a sensor data fusion system, where an initially empty inference model is extended with the set of variables to obtain an extended model (Eck, at Abstract). In particular, Eck discloses, among paragraphs 0027, 0033, 0049, 0050, 0051:

[0027] One or more embodiments employ data and semantic modeling tools; a machine learning inference model; an analytics engine; and one or more machine learning modules. The data and semantic modelling tools manage sensor data, variables and semantic relations. The machine learning inference model represents the analytics relations between variables. The analytics engine is configured to interpret analytics relations and run inference(s) on the inference model. Non-limiting examples of analytic relations include: deterministic functions, joint/conditional probability distributions, and the like. The machine learning modules are configured to learn new analytics relations.

[0033] As noted above, one or more embodiments advantageously provide a system which computes observations of system variables ("observe" aspect) by running an inference model derived from analytical relations between requested variables and other known variables, and from a mapping between variables and sensor observations; the answer can be, for example, the estimate of the requested observations, or "Variable is unobservable" (missing analytic relations or sensor data required to make the query observable are also returned). Thus, in FIG. 1, suppose it is desired to observe x1; x1 can be inferred from the data of x2 and x3.
On the other hand, suppose it is desired to observe x5. This variable is unobservable, as it requires data for x5 or x6 or analytic relations on x5 or x6. To address this, register variable x8 by providing an analytic relation 117 to x6 and a mapping of x8 to the data store(s) 105. Furthermore, register variable x9 by providing a mapping of x9 to the data store(s) 105. The system then learns an analytic relation 127; it derives a connection to x5 from semantics (or data) and then learns the analytic relation 127. X9 is now used to observe x5 and x6.

[0049] Further steps include extending an initially empty inference model 101 with the set of one or more variables of interest, to obtain an extended inference model; obtaining, from the user, a request to observe a given one of the set of one or more variables of interest at a given timestamp; responsive to the request, retrieving time series data for the set of registered variables in the extended inference model; and running the extended inference model with the retrieved data to obtain an estimate (for example, an optimal estimate) of the given one of the set of variables at the given timestamp.

[0050] In some cases, no analytical relations exist to link the set of one or more variables of interest to the variables in the semantic data store, and the extending of the inference model includes navigating the semantic store 103 to identify new relationships; and associating a parametric relation to each of the new relationships. A further step includes learning parameters of the new relationships by extracting historical data from the at least one time series of sensor data.

[0051] In some cases, in the associating sub-step, the parametric relation includes at least one of a joint probability density and a conditional probability density.

In Eck, a register variable is made to establish analytic relations for unobservable variables.
In Eck, the register variable is used as a proxy relation, shown as analytic relation 127 in the above-reproduced figure, so that the unobserved variable x5 can be indirectly observed using this analytical relation. This is explained in Eck at para. 0055 (emphasis added):

[0055] Some embodiments further include obtaining, from the user, registration of known analytic relations between any subset of variables in the semantic store (not necessarily the registered variables). In such cases, the extending of the inference model further includes extending the initially empty inference model with the known analytic relations, to obtain the extended inference model. Refer to the above discussion where a known analytical relation between two semantic entities is received and integrated into the underlying inference model. Non-limiting examples of the known analytic relations include at least one of functional relations and parametric joint densities. For example, if, in the joint density, there is a Gaussian distribution, the parameters would be a mean and a covariance matrix.
On the other hand, the present claimed embodiment of claim 1 is directed to "maintain[ing] a collective proposal network for processing the corresponding mask data structure, the collective proposal network including a collective encoder corresponding to all of the unobserved modalities, the mask data structure utilized for conditional selection of a proposal distribution for an unobserved modality; and maintain a first generative network including a first set of one or more decoders, each decoder of the first set of the one or more decoders configured to generate output estimated data proposed by the attributive proposal network and the collective proposal network wherein, for the unobserved modalities, expectation over collective observation from the collective proposal network is applied as a corresponding proposal distribution as an approximation of a true posterior distribution based on the mask data structure such that a joint distribution of all attributes and mask data structure can be learned from the partially-observed data".

In the present claimed embodiment, the approach does not relate to identifying a proxy relationship for an unobserved variable through the register variable of Eck; rather, a mask data structure is maintained and used as an input for machine learning for expectation over collective observation.

Oono is then cited in respect of machine learning systems using variational autoencoders. Li is directed to multimodal data sets using a plurality of different machine learning approaches, where a number of informative and less informative modalities can be utilized in combination (e.g., discarding noise information). Li does not remedy the deficiencies of Bui, Eck, or Oono. Rothberg et al (2019/0347523) teaches differing encoders for modalities A/B with a combined database influencing the feature space (Fig. 2a).
None of the prior art of record, alone or in combination, explicitly teaches the claim features of the independent claims, as compared/contrasted above.

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see the related art listed on the PTO-892 form. Furthermore, references that are pertinent to applicant's claim features are discussed above in the stated reasons for allowable subject matter.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571) 272-7623, who is available Monday-Friday, 9am-5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/Michael N Opsasnick/
Primary Examiner, Art Unit 2658
03/31/2026
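For readers parsing the claim language, the mask-driven routing the independent claims recite (per-modality "attributive" encoders for observed modalities, a mask-conditioned "collective" encoder standing in for unobserved ones, and decoders generating estimates for all modalities) can be sketched in a few lines of NumPy. Everything here, from the linear encoders to the dimensions, is an illustrative assumption rather than the applicant's actual implementation, and the variational machinery (Gaussian proposals, reparameterization, the ELBO over the joint distribution of data and mask) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_modalities, feat_dim, latent_dim = 3, 4, 2

# Illustrative per-modality "attributive" encoders/decoders: plain linear
# maps standing in for neural networks (an assumption for brevity).
enc = [rng.standard_normal((latent_dim, feat_dim)) for _ in range(n_modalities)]
dec = [rng.standard_normal((feat_dim, latent_dim)) for _ in range(n_modalities)]
# "Collective" encoder: a function of the mask alone, since the data for
# unobserved modalities is, by definition, unavailable.
collective_enc = rng.standard_normal((latent_dim, n_modalities))

x = rng.standard_normal((n_modalities, feat_dim))  # full data (partially hidden below)
mask = np.array([1, 0, 1])  # 1 = modality observed, 0 = unobserved

# Conditional selection of the proposal per modality: observed modalities use
# their own attributive encoder; unobserved ones fall back to the
# mask-conditioned collective proposal.
collective_z = collective_enc @ mask.astype(float)
z = np.stack([enc[m] @ x[m] if mask[m] else collective_z
              for m in range(n_modalities)])

# Decoders generate output estimates for every modality, observed or not, so
# imputations for unobserved modalities derive from the mask-driven proposal.
x_hat = np.stack([dec[m] @ z[m] for m in range(n_modalities)])
print(x_hat.shape)  # (3, 4)
```

In the full VSAE formulation the selected proposals are distributions and an expectation over the collective observation is taken under the training objective; this sketch keeps only the mask-based conditional selection that the reasons for allowance emphasize.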

Prosecution Timeline

Jul 08, 2024: Application Filed
Mar 31, 2026: Non-Final Rejection, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602554: SYSTEMS AND METHODS FOR PRODUCING RELIABLE TRANSLATION IN NEAR REAL-TIME (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592246: SYSTEM AND METHOD FOR EXTRACTING HIDDEN CUES IN INTERACTIVE COMMUNICATIONS (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586580: System For Recognizing and Responding to Environmental Noises (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579995: Automatic Speech Recognition Accuracy With Multimodal Embeddings Search (granted Mar 17, 2026; 2y 5m to grant)
Patent 12567432: VOICE SIGNAL ESTIMATION METHOD AND APPARATUS USING ATTENTION MECHANISM (granted Mar 03, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 82%
With Interview: 92% (+10.5%)
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 900 resolved cases by this examiner. Grant probability derived from career allow rate.
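The headline figures are consistent with simple arithmetic on the examiner's career counts. A quick sketch, assuming (as the caption implies) that grant probability equals the career allow rate and that the interview lift is additive in percentage points:

```python
# Reproduce the panel's projections from the examiner's career counts.
granted, resolved, pending = 737, 900, 46

allow_rate = 100 * granted / resolved   # career allow rate, in percent
total_apps = resolved + pending         # career total across all art units
with_interview = allow_rate + 10.5      # assumed additive interview lift

print(round(allow_rate))       # 82
print(total_apps)              # 946
print(round(with_interview))   # 92
```

737/900 is 81.9%, which rounds to the displayed 82%, and adding the +10.5% interview lift yields roughly the 92% shown.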
