Status of Claims
In response to communications filed 30 October 2025, claims 1, 3, 12-13, and 18-19 are amended; claim 14 is cancelled; and claim 21 is added per applicant’s request. Claims 1-13 and 15-21 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Allowable Subject Matter
Claims 12-13, 15-17, and 21 are allowed.
Response to Arguments
Applicant’s arguments, see section “Rejections based on 35 U.S.C. § 101,” filed 30 October 2025, with respect to claims 1, 12, and 18 have been fully considered. The arguments in subsection B are persuasive for claim 12, but are not persuasive for claims 1 and 18. The rejection of claims 12-13 and 15-17 has been withdrawn.
In subsection A, applicant argues that the claims are patent eligible because they require a computer.
Various independent claims and dependent claims cannot be performed without a computer and, in fact, require specific computing components.
However, these arguments are not persuasive, because a claim that requires a computer may still recite a mental process. See MPEP § 2106.04(a)(2)(III)(C). Regarding claim 1, applicant argues that
A human being cannot create or manipulate computer pointers, allocate digital storage media, or associate data structures in computer memory for example. These are computer-specific technical operations tied to data organization in storage systems.
Regarding claim 18, applicant argues that the claims recite
a data center system comprising multiple computing nodes with GPUs that receive natural language queries, parses them to identify multimodal conditions, and then map those conditions to distinct index data structures by associating them through pointers or keys.
First, these features are not recited in claim 18, so applicant’s arguments with respect to claim 18 are not persuasive. Second, the examiner notes that the “computer retrieval operation,” as recited in claim 1, amounts to an intended use of the “reference point to perform a computer retrieval operation.” Regarding claim 1, the features at issue merely use a computer as a tool, in its ordinary capacity, to allocate digital storage and to retrieve information from the digital storage using pointers or keys. Accordingly, these arguments are not persuasive.
In subsection B, applicant argues that claims 1 and 12 are patent eligible because they amount to an improvement in technology.
claim 12 recites parsing a natural language query that includes conditions for different modalities and then mapping those conditions into distinct index data structures. This directly aligns with the disclosure in [0032], which explains that various embodiments enable true multimodal search by integrating multiple types of data (text, visual, and spatial) within a unified system, allowing queries such as "all wooden chairs in a kitchen" that existing single-modality systems cannot process.
These arguments are persuasive for claim 12, and the rejection of claims 12-13 and 15-17 is withdrawn. Regarding claim 1, the features of claim 12 supporting the improvement in technology are not recited. Applicant does not present argument specifically addressing claim 18, but the features of claim 12 are likewise not recited in claim 18.
In subsection C, applicant argues that claims 1 and 18 are patent eligible because they are analogous to Enfish. Regarding claim 1, applicant argues
Claim 1, for example, recites storing multimodal data using distinct index data structures in computer storage media and then generating a composite index by associating those indices under a single asset identifier via pointers or keys.
However, these arguments are not persuasive. First, applicant does not present arguments specifically addressing the features of claim 18. In addition, the features of generating a composite index by associating those indices under a single asset identifier via pointers or keys fall within the Mental Processes grouping of abstract ideas. One cannot argue a specific improvement in computer functionality based on features that fall within a judicial exception. The additional elements of “storing” do not amount to a specific improvement in computer functionality as discussed further in the instant rejection.
In subsection D, applicant argues that claims 1 and 18 are patent eligible because the steps of “storing” are not well-understood, routine, and conventional activities. First, these features are not recited in claim 18, and applicant does not present arguments specifically addressing the features of claim 18. Regarding claim 1, paragraph [0002] of the specification provides evidence that “storing” was a well-understood, routine, and conventional activity of “Digital asset management and search.” Applicant does not address this evidence. In addition, applicant’s arguments drawn to the “uses of a composite index is to reorganize how computers handle multimodal asset data” are directed to features that fall within the Mental Processes grouping of abstract ideas. Accordingly, these arguments are not persuasive.
Applicant’s arguments, see section “Objections,” filed 30 October 2025, with respect to claim 13 have been fully considered and are persuasive. The objection of claim 13 has been withdrawn.
Applicant’s arguments, see section “Rejections based on 35 U.S.C. § 102(a)(2),” filed 30 October 2025, with respect to claim 1 have been fully considered but are not persuasive.
In subsection A.i, applicant argues that Wang does not teach distinct indices.
Wang consistently describes that both textual tokens and image tokens are stored in a single search index. [emphasis in original]
However, these arguments are not persuasive. Applicant is correct that Wang teaches the “Search Text Tokens” index (Fig. 2, element 122) and “Search Image Tokens” index (Fig. 2, element 124) as sub-indices of the “Search Index” (Fig. 2, element 118). However, these sub-indices teach the “first index” and “second index” as understood in light of the specification. Paragraph [0027] of the specification discloses that the “first index” and “second index” are “multiple sub-indices [where] Each sub-index stores information related to a specific data type or modality.” Wang therefore teaches the limitations at issue, where the “search image tokens” (“search text tokens”) are a first index (second index) representative of a first “image” (“text”) data type. Figure 4 of the application further shows the “Composite Index” (a single search index) comprising the sub-indices “Visual Data” and “Text Data,” similar to how the “Search Index” (Fig. 2, element 118) taught by Wang comprises elements 122 and 124. Applicant’s arguments are therefore not persuasive when the claims are interpreted in light of the specification.
In subsection A.ii, applicant argues Wang does not teach the composite index associating a “single asset identifier” with both the first and second index.
Wang consistently describes the use of separate identifiers for different types of tokens, each stored in the same search index. [emphasis in original]
However, these arguments are not persuasive, because Wang teaches a single “identifier.” Wang teaches in paragraph [0045] that, upon a search of the composite “search index 118,” a “list of identifiers for the resulting images can be determined for images in the search index 118.” Each of these identifiers is a “single asset identifier.” Wang further teaches that this single identifier is associated with both the first index and the second index in paragraphs [0060] and [0064], where the “textual token also includes a corresponding identifier for each image” and the “image token . . . includes a corresponding identifier for each image,” respectively. Accordingly, Wang teaches the limitations at issue.
In section B, regarding claim 9, applicant argues
Claim 9 recites obtaining both a user-defined natural-language term and a set of images and computing a multimodal embedding by mapping the term and the set of images into a shared embedding space to form a single vector representation encoding both inputs. Wang's UI accepts textual data or image data as query input (Wang ¶[0068]), not both together as recited. [emphasis in original]
However, these arguments are not persuasive, because the features upon which applicant relies are not recited in claim 9. Claim 9 does not recite “to form a single vector representation encoding both inputs.” In addition, Wang does teach a shared embedding space, because in paragraph [0071] the “search tokens” are compared directly to the “textual tokens” and “image tokens.” If these (search, textual, and image) tokens were not all mapped to a shared embedding space, such a comparison would not be possible.
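The examiner’s inference (that a direct token-to-token comparison presupposes a shared embedding space) can be illustrated with a minimal sketch. The sketch below is purely illustrative and is not code from Wang; the encoders, dimensions, and names are hypothetical stand-ins.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; meaningful only if a and b lie in the same space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical encoders mapping text and images into one shared 4-dim space.
def encode_text(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(4)

def encode_image(pixels: np.ndarray) -> np.ndarray:
    return pixels.reshape(-1)[:4].astype(float)  # toy projection to 4 dims

search_token = encode_text("wooden chair")
image_token = encode_image(np.ones((2, 2)))
text_token = encode_text("a wooden chair in a kitchen")

# Because all three tokens live in the same 4-dimensional space, the
# comparisons below are well-defined; with mismatched dimensions they
# would fail outright.
print(cosine_similarity(search_token, image_token))
print(cosine_similarity(search_token, text_token))
```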
Applicant’s arguments, see section “Rejections based on 35 U.S.C. § 103,” subsection IV.A.ii, paragraph 5, filed 30 October 2025, with respect to claim 12 have been fully considered and are persuasive. The rejection of claims 12-13, 15-17, and 21 is withdrawn.
Applicant’s arguments, see section “Rejections based on 35 U.S.C. § 103,” subsection IV.B, filed 30 October 2025, with respect to claim 18 have been fully considered but are not persuasive.
In subsection IV.B.i, applicant argues against Wang individually.
Wang does not disclose aggregating or combining data from distinct indices (text and image) under a shared asset identifier as recited in amended claim 18.
However, these arguments are not persuasive, because Wang is not relied upon to teach aggregating data. Instead, Krishnan is relied upon to teach these features. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In subsection IV.B.ii, applicant argues against Krishnan individually.
Krishnan's tensor is an intermediate feature representation, not an index-based aggregation that connects multiple modality-specific indices into a unified asset-level profile . . . The claimed "multimodal profile" depends on index-level association and retrieval logic, not embedding fusion.
However, these arguments are not persuasive, because Krishnan is not relied upon to teach the multiple modality-specific indices. Instead, Wang is relied upon to teach these features. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).
In subsection IV.B.iii, applicant argues that Wang and Krishnan do not teach the relevance score computation as recited.
claim 18 recites computing a relevance score "based at least on the associating of the asset identifier to each index," which ensures that scoring reflects correlation across all modalities for the same digital asset, rather than token or tensor similarity in isolation.
However, these arguments are not persuasive, because the combination of references teaches the limitations at issue. Krishnan teaches in [0052] to “generate a multimodal tensor containing various vector embeddings.” By aggregating the “embeddings” of textual and image features, as taught by Wang in paragraphs [0060] and [0064], respectively, into a “multimodal tensor containing various vector embeddings,” as taught by Krishnan, the art of record teaches that the “similarity score” (see Krishnan [0055]) will reflect correlations across all modalities. Applicant’s argument that the token and tensor similarity is calculated in isolation is therefore not persuasive in view of Wang as modified by Krishnan, because it would have been obvious to one of ordinary skill in the art to aggregate the vector embeddings taught by Wang into the multimodal tensor taught by Krishnan.
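A minimal sketch may illustrate the combination as articulated above: per-modality embeddings of the kind taught by Wang are aggregated into a single multimodal tensor of the kind taught by Krishnan, so that one similarity computation reflects every modality. All names and data below are hypothetical; this is not code from either reference.

```python
import numpy as np

def aggregate_profile(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Stack per-modality embeddings into one multimodal tensor (profile)."""
    return np.stack([text_emb, image_emb])  # shape: (modalities, dims)

def tensor_similarity(query: np.ndarray, profile: np.ndarray) -> float:
    """Tensor-to-tensor score: mean per-modality cosine similarity, so the
    result reflects correlation across all modalities rather than any one
    modality in isolation."""
    sims = [
        np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p))
        for q, p in zip(query, profile)
    ]
    return float(np.mean(sims))

asset_profile = aggregate_profile(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
query_tensor = aggregate_profile(np.array([0.9, 0.1]), np.array([0.4, 0.6]))
print(tensor_similarity(query_tensor, asset_profile))
```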
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-11 and 18-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 1 recites steps to
generate a composite index at least by associating, via at least one of: a pointer or a key, a single asset identifier to both the first index and the second index, the single asset identifier representing an identity of the digital asset and being a reference point to perform a retrieval operation of at least one of the first data or the second data using at least one of the first index or the second index for a query associated with the digital asset.
These limitations, under their broadest reasonable interpretation, fall within the Mental Processes grouping of abstract ideas. The step to “generate” can be performed as mental observations and/or evaluations, perhaps using pen and paper. Accordingly, claim 1 recites an abstract idea.
This judicial exception is not integrated into a practical application. Claim 1 further recites to
obtain first data and second data associated with a digital asset;
store the first data using a first index, that represents an association between the first data and a first data type; and
store the second data using a second index, that represents an association between the second data and a second data type.
However, these additional elements do not integrate the judicial exception into a practical application, because they amount to an insignificant extra-solution activity, such as mere data gathering (“obtain” and “store”). See MPEP § 2106.05(g).
Claim 1 further recites “One or more processors comprising one or more processing units to” perform the steps analyzed above. In addition, claim 1 further recites that the first data and second data are stored “to [the] computer storage media,” and that the retrieval operation is a “computer” retrieval operation that retrieves at least one of the first data or the second data “from the computer storage media.” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to merely implementing the judicial exception on a generic computer system. See MPEP § 2106.05(f).
Claim 3 further recites to “receive a user query.” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to an insignificant extra-solution activity, such as mere data gathering (“receive”). See MPEP § 2106.05(g).
Claim 4 further recites to “cause presentation, at a user device, of a representation of the digital asset.” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to an insignificant extra-solution activity, such as mere data outputting (“cause presentation”). See MPEP § 2106.05(g).
Claim 8 further recites that the scene is “represented using a universal scene descriptor (USD) data format.” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to merely implementing the judicial exception on a generic computer system. See MPEP § 2106.05(f).
Claim 9 further recites to “obtain a user-defined natural language term and a set of images.” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to an insignificant extra-solution activity, such as mere data gathering (“obtain”). See MPEP § 2106.05(g).
Claim 10 further recites to “provide an image of the digital asset as input to a vision language model (VLM).” However, these additional elements do not integrate the judicial exception into a practical application, because they amount to merely implementing the judicial exception on a generic computer system. See MPEP § 2106.05(f).
Claim 11 further recites “wherein the one or more processors is comprised in at least one of” a list of technological environments. However, these additional elements do not integrate the judicial exception into a practical application, because they amount to generally linking the use of a judicial exception to a particular technological environment or field of use. See MPEP § 2106.05(h).
Claim 18 recites the steps of
mapping one or more parameters or one or more conditions of a user query to a plurality of indices, each index of the plurality of indices representing an association of the data with a respective modality associated with a digital asset;
generating a multimodal profile for the digital asset by aggregating data obtained from the plurality of indices by associating an asset identifier to each index, of the plurality of indices, the asset identifier indicating an identity of the digital asset; and
based at least on the associating of the asset identifier to each index, of the plurality of indices, computing a relevance score for the digital asset, the relevance score indicating a measure in which the multimodal profile satisfies the one or more query parameters or conditions across each respective modality.
These limitations, under their broadest reasonable interpretation, fall within the Mental Processes grouping of abstract ideas. The steps of “mapping,” “generating,” and “computing” can be performed as mental observations and/or evaluations, perhaps using pen and paper. Accordingly, claim 18 recites an abstract idea.
This judicial exception is not integrated into a practical application. Claim 18 further recites
based at least on the relevance score, causing presentation of an indicator that represents at least the digital asset.
However, these additional elements do not integrate the judicial exception into a practical application, because they amount to an insignificant extra-solution activity, such as mere data outputting (“causing presentation”). See MPEP § 2106.05(g).
Claim 20 further recites “wherein the method is performed by at least one of” a list of technological environments. However, these additional elements do not integrate the judicial exception into a practical application, because they amount to generally linking the use of a judicial exception to a particular technological environment or field of use. See MPEP § 2106.05(h).
The additional limitations of claims 2-11 and 19-20 not explicitly addressed above likewise fall within the Mental Processes grouping of abstract ideas. Considering the limitations as an ordered combination adds nothing that is not already present when considering the elements individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than implementing the judicial exception on a generic computer system; insignificant extra-solution activities; and/or generally linking the use of a judicial exception to a particular technological environment or field of use. In addition, paragraph [0002] of the specification provides evidence that the steps to “obtain,” “store,” “receive . . . query,” and “cause presentation” were well-understood, routine, and conventional activities of “Digital asset management and search” systems. See also MPEP § 2106.05(d)(II). Therefore, claims 1-11 and 18-20 are not patent eligible.
Claim Objections
Claim 19 is objected to because of the following informalities: the dependency to “claim 1” appears to be a typographical error for --claim 18-- (line 1).
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-5, 9, and 11 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Wang (US 2025/0217407 A1).
Regarding claim 1, Wang teaches one or more processors comprising one or more processing units to (see Wang [0080]):
obtain first data and second data associated with a digital asset (see Wang [0060], [0064], and [0038]-[0039], the “search image token” and “search text token” are first and second data, respectively, associated with a utility asset image);
store, to computer storage media, the first data using a first index, that represents an association between the first data and a first data type (see Wang Fig. 2, element 124, [0064] and [0038]-[0039], the “search image tokens” are a first index representative of a first “image” data type);
store, to the computer storage media, the second data using a second index, that represents an association between the second data and a second data type (see Wang Fig. 2, element 122, [0060] and [0038]-[0039], the “search text tokens” are a second index representative of a second “text” data type); and
generate a composite index at least by associating, via at least one of: a pointer or a key, a single asset identifier to both the first index and the second index (see Wang Fig. 2, element 118, and [0038]-[0039], the “search index” is a composite index associating an “identifier for each image,” i.e., a single asset identifier, with both the first index and the second index),
the single asset identifier representing an identity of the digital asset and being a reference point to perform a computer retrieval operation of at least one of the first data or the second data from the computer storage media using at least one of the first index or the second index for a query associated with the digital asset (see Wang [0060], [0064], and [0071], “identify a set of candidate images by comparing search tokens . . . to the textual tokens and image tokens stored in a search index,” where [0045] teaches a “list of identifiers . . . determined for images in the search index,” i.e., a list of single asset identifiers).
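For illustration only, a minimal sketch of the claimed arrangement as mapped above is provided below. The structures and identifiers are hypothetical and are not drawn from Wang; the sketch merely shows one asset identifier keyed, via dictionary keys, into two modality-specific sub-indices, either of which can serve as a retrieval path for the same digital asset.

```python
# Hypothetical sketch of a composite index: one asset identifier keyed
# into two modality-specific sub-indices, so either sub-index can serve
# as a retrieval path for the same digital asset.
image_index = {"asset-001": {"image_token": [0.12, 0.87, 0.33]}}      # first index
text_index = {"asset-001": {"text_token": "wooden chair, kitchen"}}   # second index

composite_index = {
    "asset-001": {            # single asset identifier (the key)
        "first_index": image_index,
        "second_index": text_index,
    }
}

def retrieve(asset_id: str, modality: str):
    """Use the asset identifier as the reference point for retrieval."""
    entry = composite_index[asset_id]
    sub_index = entry["first_index"] if modality == "image" else entry["second_index"]
    return sub_index[asset_id]

print(retrieve("asset-001", "text"))
```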
Regarding claim 2, Wang teaches wherein the first data type and the second data type are distinct data types among two or more of: visual data that indicates attributes of the digital asset, text data that describes the digital asset in natural language, spatial data associated with the digital asset, or material data that describes one or more material properties of the digital asset (see Wang [0058], “image pixels,” and [0064], “natural language”).
Regarding claim 3, Wang teaches wherein the one or more processing units are further to:
receive a user query associated with the digital asset (see Wang [0067]-[0068], “search input”); and
compute a relevance score for at least a subset of digital assets of a plurality of digital assets, based at least in part on a measure in which the first data from the first index and the second data from the second index satisfies specific parameters or conditions in the user query (see Wang [0074], “computes a similarity score indicating a likelihood of an image from the candidate images matching the textual token and the image token from the search input”).
Regarding claim 4, Wang teaches wherein the one or more processing units are further to:
rank at least each asset of the subset of digital assets of the plurality of assets based at least on the relevance score for each digital asset of the subset of digital assets (see Wang [0074], “ranks the candidate images based on the similarity score”); and
based at least on the rank of each digital asset of the subset of digital assets, cause presentation, at a user device, of a representation of the digital asset (see Wang [0047]-[0048] and [0076], “provide ranked candidate images . . . for output”).
Regarding claim 5, Wang teaches wherein the user query includes at least one of: a set of natural language characters, an image or other visual input, or a multimodal query combining multiple criteria (see Wang [0067]-[0068]).
Regarding claim 9, Wang teaches wherein the one or more processing units are further to:
obtain a user-defined natural language term and a set of images (see Wang [0058]-[0060], “annotation describing the object” and “subset of images”); and
compute a multimodal embedding by mapping the user-defined natural language term and the set of images into a shared embedding space, wherein the multimodal embedding is a vector representation that encodes data from the user-defined natural language term and the set of images into the shared embedding space, wherein the multimodal embedding is included in the second data (see Wang [0039] and [0060], “embedding and/or encoding processes to map text feature data into a search text token”).
Regarding claim 11, Wang teaches wherein the one or more processors is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system for generating synthetic data using one or more large language models (LLMs); a system for generating synthetic data using one or more vision language models (VLMs); a system for generating synthetic data using one or more multi-modal language models; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (see Wang [0052] and/or [0083]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US 2025/0217407 A1) in view of Krishnan et al. (US 2023/0306087 A1).
Regarding claim 18, Wang teaches a method comprising:
mapping one or more parameters or one or more conditions of a user query to a plurality of indices (see Wang [0071], “comparing search tokens . . . to the textual tokens and image tokens stored in a search index,” where [0067]-[0068] teaches “text input” and “image input” query parameters or conditions),
each index of the plurality of indices representing an association of the data with a respective modality associated with a digital asset (see Wang Fig. 2, elements 122 and 124, [0060], [0064] and [0038]-[0039], the “search image tokens” and “search text tokens” are indices representative of “image” and “text” modalities).
Wang does not explicitly teach generating a multimodal profile for the digital asset by aggregating data.
However, Krishnan teaches generating a multimodal profile for the digital asset by aggregating data (see Krishnan [0052], “generate a multimodal tensor containing various vector embeddings,” where aggregating the vector embeddings into a tensor generates a multimodal profile).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate a multimodal profile, as taught by Krishnan, in combination with the techniques taught by Wang, because “The standardization of the summary tensors enables the use of ANN techniques to reduce the size of the original asset index library 174 by an order of magnitude (e.g., from millions to a few thousand candidates) to create a smaller candidate set for a given query” (see Krishnan [0054]).
Wang as modified teaches
generating a multimodal profile for the digital asset by aggregating data obtained from the plurality of indices by associating an asset identifier to each index, of the plurality of indices, the asset identifier indicating an identity of the digital asset (see Krishnan [0052] and Wang [0038]-[0039], where the multimodal profile is generated, as taught by Krishnan, using data obtained from the plurality of indices by associating an “identifier for each image,” i.e., asset identifier, as taught by Wang);
based at least on the associating of the asset identifier to each index, of the plurality of indices, computing a relevance score for the digital asset, the relevance score indicating a measure in which the multimodal profile satisfies the one or more query parameters or conditions across each respective modality (see Wang [0045] and [0074] and Krishnan [0055], where “comput[ing] a similarity score indicating a likelihood of an image from the candidate images matching the textual token and the image token from the search input,” as taught by Wang, is based on the “tensor-to-tensor matching” of the multimodal profile, as taught by Krishnan); and
based at least on the relevance score, causing presentation of an indicator that represents at least the digital asset (see Wang [0047]-[0048] and [0076], “provide ranked candidate images . . . for output”).
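Krishnan’s stated rationale in [0054] (reducing the asset index library by an order of magnitude via ANN techniques over standardized summary tensors) can be illustrated with the following sketch. The library, query, and prefilter below are invented stand-ins, not Krishnan’s implementation, and the brute-force nearest-neighbor search is a crude substitute for a true ANN index.

```python
import numpy as np

# Hypothetical sketch of the rationale quoted from Krishnan [0054]:
# standardized summary tensors allow an approximate-nearest-neighbor style
# prefilter to shrink a large asset library to a small candidate set before
# the full relevance scoring runs. All names and data here are invented.
rng = np.random.default_rng(0)
library = {f"asset-{i}": rng.standard_normal(8) for i in range(10_000)}

def prefilter(query: np.ndarray, k: int = 100) -> list[str]:
    """Crude stand-in for ANN: keep the k nearest summary tensors."""
    dists = {aid: float(np.linalg.norm(vec - query)) for aid, vec in library.items()}
    return sorted(dists, key=dists.get)[:k]

query = rng.standard_normal(8)
candidates = prefilter(query)          # 10,000 assets -> 100 candidates
print(len(candidates), candidates[:3])
```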
Regarding claim 19, Wang as modified teaches wherein the plurality of indices include at least two or more of: visual data that indicates attributes of the digital asset, text data that describes the digital asset in natural language, spatial data associated with the digital asset, or material data that describes one or more material properties of the digital asset (see Wang [0058], “image pixels,” and [0064], “natural language”).
Regarding claim 20, Wang as modified teaches wherein the method is performed by at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system for generating synthetic data using one or more large language models (LLMs); a system for generating synthetic data using one or more vision language models (VLMs); a system for generating synthetic data using one or more multi-modal language models; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources (see Wang [0052] and/or [0083]).
Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US 2025/0217407 A1) as applied to claim 1 above, and further in view of Lebaredian et al. (US 2022/0101619 A1).
Regarding claim 7, Wang does not explicitly teach wherein the one or more processing units are further to: convert a file in a universal scene descriptor (USD) data format into one or more image previews of one or more scenes of the file.
However, Lebaredian teaches wherein the one or more processing units are further to: convert a file in a universal scene descriptor (USD) data format into one or more image previews of one or more scenes of the file (see Lebaredian [0050], “Universal Scene Description (USD) framework . . . renderers,” where [0071] and [0156] teach image previews by rendering the scene description and to “output images in response to rendering commands”).
Wang as modified teaches to
convert the one or more image previews into one or more respective embeddings (see Lebaredian [0050] and [0156] and Wang [0038], where the image previews, as taught by Lebaredian, are converted to embeddings, as taught by Wang),
wherein at least one of the one or more image previews or the one or more respective embeddings are included in the second data (see Lebaredian [0050] and [0156] and Wang [0038]-[0039], where the embeddings, as taught by Lebaredian in view of Wang, are included in the second data taught by Wang).
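For illustration of the claim 7 combination as mapped above, a minimal pipeline sketch follows: image previews are rendered from a USD file and each preview is converted into an embedding included in the second data. Every helper below is a hypothetical stub; none is code from Wang or Lebaredian.

```python
import numpy as np

# Hypothetical pipeline sketch: render image previews from a USD file,
# then convert each preview into an embedding.
def render_previews(usd_path: str, num_views: int = 3) -> list[np.ndarray]:
    """Stub renderer: stands in for a USD renderer producing RGB previews."""
    return [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(num_views)]

def embed_image(image: np.ndarray) -> np.ndarray:
    """Stub encoder: stands in for an image-embedding model."""
    return image.astype(np.float32).mean(axis=(0, 1))  # 3-dim toy embedding

previews = render_previews("scene.usd")
second_data = {"previews": previews,
               "embeddings": [embed_image(p) for p in previews]}
print(len(second_data["embeddings"]), second_data["embeddings"][0].shape)
```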
Regarding claim 8, Wang does not explicitly teach wherein the one or more processing units are further to:
determine spatial data of one or more elements in a scene represented using a universal scene descriptor (USD) data format, the spatial data including at least one of: one or more positions of the elements, one or more orientations of the elements, one or more hierarchical relationships between the elements, and one or more spatial dependencies between the elements; and
based on the spatial data, construct a graph data structure that includes graph data.
However, Lebaredian teaches to
determine spatial data of one or more elements in a scene represented using a universal scene descriptor (USD) data format, the spatial data including at least one of: one or more positions of the elements, one or more orientations of the elements, one or more hierarchical relationships between the elements, and one or more spatial dependencies between the elements (see Lebaredian [0050], “format . . . the Universal Scene Description,” and [0058], “scene description of a virtual environment may be resolved to a tree structure of a transformation hierarchy”); and
based on the spatial data, construct a graph data structure that includes graph data (see Lebaredian [0058], “scene graph”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to determine the spatial data using a USD data format, as taught by Lebaredian, in combination with the techniques taught by Wang, because “Each asset may be defined in terms of one or more properties, one or more values of the one or more properties . . . The assets of a virtual environment may be defined in a scene description” (see Lebaredian [0048]).
Wang as modified teaches wherein the graph data is included in the first data (see Lebaredian [0050] and [0058] and Wang [0038]-[0039], where the graph data, taught by Lebaredian, is included in the first data taught by Wang).
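The claim 8 combination as mapped above can be illustrated with a short sketch: spatial data (positions plus a parent/child hierarchy) is used to construct a graph data structure whose edges encode the hierarchical relationships. The scene elements and layout below are invented; this is not Lebaredian’s scene-graph implementation.

```python
# Hypothetical sketch: spatial data from a scene (positions plus
# parent/child hierarchy) is used to construct a graph data structure.
spatial_data = {
    "kitchen":  {"position": (0.0, 0.0, 0.0), "parent": None},
    "table":    {"position": (1.0, 0.0, 2.0), "parent": "kitchen"},
    "chair_01": {"position": (1.5, 0.0, 2.5), "parent": "kitchen"},
}

graph = {"nodes": {}, "edges": []}
for element, data in spatial_data.items():
    graph["nodes"][element] = {"position": data["position"]}
    if data["parent"] is not None:
        # Edge encodes the hierarchical (transform) relationship.
        graph["edges"].append((data["parent"], element))

print(graph["edges"])  # [('kitchen', 'table'), ('kitchen', 'chair_01')]
```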
Claims 6 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang (US 2025/0217407 A1) as applied to claim 1 above, and further in view of Deschaintre et al. (US 2025/0078387 A1).
Regarding claim 6, Wang does not explicitly teach wherein the one or more processing units are further to: compute a multimodal embedding from an image that represents the digital asset, wherein the multimodal embedding comprises a vector representation that encodes data from a text modality and an image modality into a shared embedding space.
However, Deschaintre teaches wherein the one or more processing units are further to: compute a multimodal embedding from an image that represents the digital asset, wherein the multimodal embedding comprises a vector representation that encodes data from a text modality and an image modality into a shared embedding space (see Deschaintre [0030], “embedding space in which image features and text features can be commonly encoded”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to compute a multimodal embedding, as taught by Deschaintre, in combination with the techniques taught by Wang, because the “joint feature comparison space improves searching for surface materials that can be applied to three-dimensional graphical objects” (see Deschaintre [0014]).
Wang as modified teaches wherein the multimodal embedding is included in the first data (see Wang [0038]-[0039] and Deschaintre [0030], where the multimodal embedding, taught by Deschaintre, is included in the first data taught by Wang).
Regarding claim 10, Wang does not explicitly teach wherein the one or more processing units are further to: provide an image of the digital asset as input to a vision language model (VLM) to generate metadata, the metadata including at least one of: a material property of an object in the image, a color of the object, an object type identifier of the object, scene context of the image, positioning of the object, or lighting and shading information associated with the image.
However, Deschaintre teaches wherein the one or more processing units are further to: provide an image of the digital asset as input to a vision language model (VLM) to generate metadata, the metadata including at least one of: a material property of an object in the image, a color of the object, an object type identifier of the object, scene context of the image, positioning of the object, or lighting and shading information associated with the image (see Deschaintre [0030], “vision-language model,” where [0028] teaches to generate “rendering criteria . . . lighting parameters . . . geometry parameters” metadata).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to generate metadata, as taught by Deschaintre, in combination with the techniques taught by Wang, because “contemporary techniques for material searching using text input can disregard characteristics of the materials that are being searched” (see Deschaintre [0002]).
Wang as modified teaches wherein the metadata is included in the first data (see Wang [0038]-[0039] and Deschaintre [0028] and [0030], where the metadata taught by Deschaintre is included in the first data taught by Wang).
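For illustration of the claim 10 combination as mapped above, the sketch below provides an asset image to a stand-in vision language model that emits metadata fields of the kind recited in claim 10. The model call is a stub; no real VLM API or output format is implied.

```python
import numpy as np

def vlm_describe(image: np.ndarray) -> dict:
    """Stand-in for a VLM that emits metadata fields named in claim 10."""
    return {
        "material_property": "wood",
        "color": "brown",
        "object_type": "chair",
        "scene_context": "kitchen",
    }

asset_image = np.zeros((224, 224, 3), dtype=np.uint8)
first_data = {"metadata": vlm_describe(asset_image)}
print(first_data["metadata"]["object_type"])
```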
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Kristopher Andersen whose telephone number is (571)270-5743. The examiner can normally be reached 8:30 AM-5:00 PM ET, Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached at (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Kristopher Andersen/Primary Examiner, Art Unit 2159