DETAILED ACTION
This office action is in response to the communication filed on December 08, 2025. Claims 1-8, 10-17, and 19-23 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/12/25 has been considered by the examiner.
Response to Arguments
Applicant's arguments filed on December 08, 2025 have been fully considered but they are not persuasive for the following reasons:
Applicant in Pages 10-13 of the Remarks argues that the amended claims are not directed towards non-statutory subject matter because the claims do not fall within an abstract subject matter grouping.
Applicant in Pages 13-15 of the Remarks further argues that the claimed approach improves technology, and because the limitations of the amended claims are specifically tied to those improvements, the amended claims necessarily integrate any purported abstract idea into a practical application.
Examiner respectfully disagrees. It is important to note that the judicial exception alone cannot provide the improvement; the improvement can be provided by one or more additional elements (MPEP 2106.05(a)).
Independent claim 1, and similarly independent claims 11 and 20, cover several steps, namely the determining-embeddings step, the step of matching embeddings by computing similarities and applying a threshold, and the step of identifying based on the matching, that recite an abstract idea within the “Mental Processes” grouping of abstract ideas, because a person can perform the limitations recited in said steps mentally or using a pen and paper, as discussed in detail below.
“determining…(i) a first embedding for a first query shape, wherein the first embedding is associated with a first query and a first format, and (ii) a first plurality of target embeddings for a first plurality of target shapes, wherein the first plurality of target embeddings is associated with at least a second format, wherein the first embedding and the first plurality of target embeddings are generated by one or more trained machine learning models based on the first query shape and the first plurality of target shapes, wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats”;
A person can mentally or using a pen and paper determine a first embedding for a first query shape associated with a first query and a first format, and the person can mentally or using a pen and paper determine a first plurality of target embeddings for a first plurality of target shapes associated with a second format, wherein the first format and the second format comprise different three-dimensional shape formats, and wherein the embeddings the person analyzes for the determination were generated by one or more trained machine learning models, which were trained based on different types of information such as augmented shape representations as well as contrastive losses.
Here the claim is applying trained machine learning models generally to generate embeddings, which are used for performing the step of determining, which is a determination process that can be performed in the human mind or by a human using a pen and paper.
“matching the first embedding and the first plurality of target embeddings by: computing a plurality of similarities between the first embedding and the first plurality of target embeddings, and applying a threshold to the plurality of similarities to identify one or more matching target embeddings”;
A person can mentally or using a pen and paper match a first embedding and a first plurality of target embeddings by mentally or using a pen and paper computing a plurality of similarities between the first embedding and the first plurality of target embeddings and mentally or using a pen and paper applying a threshold to the plurality of similarities to identify one or more matching target embeddings.
“identifying, based on the one or more matching target embeddings, one or more target shapes included in the first plurality of target shapes”;
A person can mentally identify, based on one or more matching target embeddings, one or more target shapes included in a first plurality of target shapes.
The limitations, as recited above in claims 1, 11, and 20, are processes that, under their broadest reasonable interpretation, cover steps that can be performed in the human mind or by a human using a pen and paper, but for the recitation of generic computer components.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
Here the claim is applying one or more trained machine learning models generally to generate the embeddings, which are used for performing the steps of determining, matching and identifying, which are processes that can be performed in the human mind or by a human using a pen and paper.
The claims do not provide any limitations that are directed to a specific improvement in computer technology, because the determining and matching steps, which Applicant argues are directed to such an improvement, are all recited in the claims as limitations that have been identified as abstract ideas.
The remaining steps in the claims that are identified as reciting additional elements only add insignificant extra-solution activity to the judicial exception and are recognized as well-understood, routine, and conventional activity within the field of computer functions, which is not sufficient to amount to significantly more than the judicial exception and is not directed to any specific improvement in computer technology.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Applicant in Pages 15-17 of the Remarks argues that the cited prior art references Tasse, Dibra, and Kelkar do not teach or even suggest the feature "wherein the first format and at least the second format comprise different three-dimensional shape formats", as recited in amended independent claim 1 and similarly recited in amended independent claims 11 and 20.
Applicant’s arguments have been considered but are moot in view of new grounds of rejection, discussed in detail in the 103 rejection below.
For the above reasons, Examiner maintains that the rejections set forth in the current Office action are proper.
Claim Objections
Claim 19 is objected to because of the following informalities:
Dependent claim 19 is objected to because it currently depends from cancelled claim 18 and should instead depend from independent claim 11.
Appropriate correction is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8, 10-17, and 19-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
At step 1:
Independent claims 1, 11, and 20 respectively recite a computer-implemented
method, one or more non-transitory computer-readable media, and a system, which are directed to a statutory category such as a process, machine, or an article of manufacture.
At step 2A, prong one:
Independent claim 1 and similarly independent claims 11 and 20 recite the limitations:
“determining…(i) a first embedding for a first query shape, wherein the first embedding is associated with a first query and a first format, and (ii) a first plurality of target embeddings for a first plurality of target shapes, wherein the first plurality of target embeddings is associated with at least a second format, wherein the first embedding and the first plurality of target embeddings are generated by one or more trained machine learning models based on the first query shape and the first plurality of target shapes, wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats”;
A person can mentally or using a pen and paper determine a first embedding for a first query shape associated with a first query and a first format, and the person can mentally or using a pen and paper determine a first plurality of target embeddings for a first plurality of target shapes associated with a second format, wherein the first format and the second format comprise different three-dimensional shape formats, and wherein the embeddings the person analyzes for the determination were generated by one or more trained machine learning models, which were trained based on different types of information such as augmented shape representations as well as contrastive losses.
Here the claim is applying trained machine learning models generally to generate embeddings, which are used for performing the step of determining, which is a determination process that can be performed in the human mind or by a human using a pen and paper.
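For context only, and not as a characterization of Applicant's disclosure, the kind of pairwise contrastive loss the claim language describes (a loss computed between positive pairs of the same shape and negative pairs of different shapes) can be sketched as follows; the margin value and the use of similarity scores as inputs are assumptions, since the claim does not specify the loss formulation:

```python
def contrastive_loss(sim_positive, sim_negative, margin=0.5):
    # Positive pairs (same shape) are pulled together: the loss falls
    # as their similarity approaches 1. Negative pairs (different
    # shapes) are pushed apart: their term is zero once similarity
    # drops below the margin. The margin of 0.5 is an assumption.
    return (1.0 - sim_positive) + max(0.0, sim_negative - margin)
```

On this sketch, a perfectly matched positive pair (similarity 1.0) and a well-separated negative pair (similarity 0.0) yield zero loss.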
“matching the first embedding and the first plurality of target embeddings by: computing a plurality of similarities between the first embedding and the first plurality of target embeddings, and applying a threshold to the plurality of similarities to identify one or more matching target embeddings”;
A person can mentally or using a pen and paper match a first embedding and a first plurality of target embeddings by mentally or using a pen and paper computing a plurality of similarities between the first embedding and the first plurality of target embeddings and mentally or using a pen and paper applying a threshold to the plurality of similarities to identify one or more matching target embeddings.
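For illustration only, the computing-similarities-and-applying-a-threshold operation that this limitation covers can be sketched as below; the choice of cosine similarity and the particular threshold value are assumptions, since the claim does not limit the similarity measure:

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two embeddings divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def match_embeddings(query_embedding, target_embeddings, threshold):
    # Compute a similarity for each target embedding, then apply the
    # threshold to identify the matching target embeddings (by index).
    similarities = [cosine_similarity(query_embedding, t) for t in target_embeddings]
    return [i for i, s in enumerate(similarities) if s >= threshold]
```

For example, a query embedding [1, 0] compared against targets [1, 0] and [0, 1] with a threshold of 0.9 matches only the first target.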
“identifying, based on the one or more matching target embeddings, one or more target shapes included in the first plurality of target shapes”;
A person can mentally identify, based on one or more matching target embeddings, one or more target shapes included in a first plurality of target shapes.
The limitations, as recited above in claims 1, 11, and 20, are processes that, under their broadest reasonable interpretation, cover steps that can be performed in the human mind or by a human using a pen and paper, but for the recitation of generic computer components.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claims recite an abstract idea.
At step 2A, prong two:
This judicial exception is not integrated into a practical application.
Independent claim 1 and similarly independent claims 11 and 20 recite the limitations:
“outputting the one or more shapes in a first response to the first query”, which is a step of outputting information in response to a query, and is just adding insignificant extra-solution activity to the judicial exception as a form of selecting information for collection, analysis, and display (MPEP 2106.05(g)(iii)).
The additional elements “a computer-implemented method”, “via one or more trained machine learning models configured to process augmented shape representations”, and “by the one or more trained machine learning models” in the steps in claim 1 are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
The additional elements “one or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to analyze similarities associated with a plurality of shapes when processing a query, by performing the steps of”, “via one or more trained machine learning models configured to process augmented shape representations”, and “by the one or more trained machine learning models” in the steps in claim 11 are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
The additional elements “a system, comprising: one or more memories that store instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to perform the steps”, “via one or more trained machine learning models configured to process augmented shape representations”, and “by the one or more trained machine learning models” in the determining, matching, and outputting steps in claim 20 are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
The high-level application of a trained machine learning model merely adds the words “apply it” (or an equivalent) to the judicial exception, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
At step 2B:
Independent claims 1, 11, and 20 recite the same additional elements as identified in step 2A prong two above. These additional elements are not sufficient to amount to significantly more than the judicial exception.
Independent claim 1 and similarly independent claims 11 and 20 recite the limitations:
“outputting the one or more shapes in a first response to the first query”, which is a step of outputting information in response to a query, and is recognized as a well-understood, routine, and conventional activity within the field of computer functions, as an element of presenting offers and gathering statistics (MPEP 2106.05(d)(II)(iv)).
The high-level application of a trained machine learning model merely adds the words “apply it” (or an equivalent) to the judicial exception, amounts to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).
Accordingly, the additional limitations are not sufficient to amount to significantly more than the judicial exception. Therefore, the claims are directed to an abstract idea and are not patent eligible.
Dependent claim 2 recites additional limitations, such as:
“determining a second embedding for a second query that includes a textual description of a second query shape; and
matching the textual description to one or more additional shapes included in the first plurality of target shapes based on the second embedding and the first plurality of target embeddings”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claims 1 and 11, because a person can mentally or using a pen and paper determine a second embedding for a second query that includes a textual description of a second query shape and the person can mentally or using a pen and paper match the textual description to one or more additional shapes included in a first plurality of target shapes based on the second embedding and the first plurality of target embeddings, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 3 recites additional limitations, such as:
“matching the first query shape to one or more additional shapes associated with a third format based on the first embedding and a second plurality of target embeddings for a second plurality of target shapes associated with the third format”;
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper match a first query shape to one or more additional shapes associated with a third format based on a first embedding and a second plurality of target embeddings for a second plurality of target shapes associated with the third format, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
“outputting the one or more additional shapes in the first response to the first query”, which is a step of outputting information.
At step 2A prong two, the step is recited at a high level of generality, and amounts to mere data gathering and outputting, which is a form of insignificant extra-solution activity.
At step 2B, the step is recognized as a well-understood, routine, and conventional activity within the field of computer functions, as an element of presenting offers and gathering statistics (MPEP 2106.05(d)(II)(iv)).
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 4 recites additional limitations, such as:
“determining a second embedding for a second query shape associated with a second query and at least one of the first format or the second format”;
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a second embedding for a second query shape associated with a second query and at least one of a first format or the second format, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
“…convert the second embedding into a target shape in a third format”;
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper convert a second embedding into a target shape in a third format, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing a decoder neural network to” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
“outputting the target shape in a second response associated with the second query”, which is a step of outputting information.
At step 2A prong two, the step is recited at a high level of generality, and amounts to mere data gathering and outputting, which is a form of insignificant extra-solution activity.
At step 2B, the step is recognized as a well-understood, routine, and conventional activity within the field of computer functions, as an element of presenting offers and gathering statistics (MPEP 2106.05(d)(II)(iv)).
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 5 recites additional limitations such as:
wherein determining the first embedding comprises:
“determining the first query shape based on one or more attributes of the first query”;
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first query shape based on one or more attributes of the first query, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
“…generate the first embedding based on the first query shape”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper generate a first embedding based on a first query shape, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing a first machine learning model that is included in the one or more trained machine learning models and is associated with the first format to” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 6 recites additional limitations such as:
wherein determining the first plurality of target embeddings further comprises:
“determining the second format based on one or more additional attributes of the first query”;
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first plurality of target embeddings by mentally or using a pen and paper determining a second format based on one or more additional attributes of a first query, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
“…generate the first plurality of target embeddings based on the first plurality of target shapes”.
This limitation is directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper generate a first plurality of target embeddings based on a first plurality of target shapes, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing a second machine learning model that is included in the one or more trained machine learning models and is associated with the second format to” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 7 recites additional limitations, such as:
wherein determining the first embedding comprises:
“determining a plurality of query shapes to be combined into the first query shape based on one or more attributes of the first query; and
“aggregating a plurality of embeddings associated with the plurality of query shapes into the first embedding”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first embedding by mentally or using a pen and paper determining a plurality of query shapes to be combined into a first query shape based on one or more attributes of a first query and by mentally or using a pen and paper aggregating a plurality of embeddings associated with the plurality of query shapes into the first embedding, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 8 recites additional limitations, such as:
wherein determining the first embedding comprises:
“determining the first query shape based on one or more attributes of the first query”; and
“matching the first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with the first format and a second plurality of target embeddings for the second plurality of target shapes”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first embedding by mentally or using a pen and paper determining a first query shape based on one or more attributes of the first query and by mentally or using a pen and paper matching the first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with a first format and a second plurality of target embeddings for the second plurality of target shapes, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 10 recites additional limitations, such as:
“wherein the first format and the second format are determined based on one or more attributes of the first query and comprise at least one of: a three-dimensional (3D) model, a mesh, a boundary representation, a point cloud, or a construction model”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first format and a second format based on one or more attributes of a first query, and because the limitation does not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 12 recites additional limitations, such as:
wherein the instructions further cause the one or more processors to perform the steps of:
“determining a second embedding for a second query that includes a textual description of a second query shape; and
matching the textual description to one or more additional shapes included in the first plurality of target shapes based on the second embedding and the first plurality of target embeddings”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a second embedding for a second query that includes a textual description of a second query shape and the person can mentally or using a pen and paper match the textual description to one or more additional shapes included in a first plurality of target shapes based on the second embedding and a first plurality of target embeddings, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 13 recites additional limitations, such as:
wherein the instructions further cause the one or more processors to perform the steps of:
“determining a second embedding for a second query shape that is associated with the first query and the second format”; and
“determining the one or more shapes based on a combination of the first embedding and the second embedding”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a second embedding for a second query shape that is associated with a first query and a second format and the person can mentally or using a pen and paper determine one or more shapes based on a combination of a first embedding and the second embedding, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 14 recites additional limitations, such as:
“wherein the combination of the first embedding and the second embedding comprises at least one of an average, a weighted average, a sum, or a difference”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a combination of a first embedding and a second embedding comprising at least one of an average, a weighted average, a sum, or a difference, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
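For context only, and not as a characterization of Applicant's disclosure, the four recited combination operations can be sketched element-wise as follows; treating embeddings as equal-length numeric lists and using equal default weights are assumptions:

```python
def combine_embeddings(a, b, mode="average", weights=(0.5, 0.5)):
    # Element-wise combination of two equal-length embedding vectors,
    # covering the four operations recited in the claim. Mode names
    # and the default weights are illustrative assumptions.
    if mode == "average":
        return [(x + y) / 2.0 for x, y in zip(a, b)]
    if mode == "weighted_average":
        wa, wb = weights
        return [wa * x + wb * y for x, y in zip(a, b)]
    if mode == "sum":
        return [x + y for x, y in zip(a, b)]
    if mode == "difference":
        return [x - y for x, y in zip(a, b)]
    raise ValueError(f"unknown mode: {mode}")
```

For example, combining [2, 4] and [0, 2] yields [1.0, 3.0] as an average, [2, 6] as a sum, and [2, 2] as a difference.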
Dependent claim 15 recites additional limitations, such as:
wherein the instructions further cause the one or more processors to perform the steps of:
“determining a third format based on one or more additional attributes of the first query; and
matching the first query shape to one or more additional shapes associated with the third format based on the first embedding and a second plurality of embeddings for a second plurality of target shapes associated with the third format”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a third format based on one or more additional attributes of a first query and the person can mentally or using a pen and paper match a first query shape to one or more additional shapes associated with the third format based on a first embedding and a second plurality of embeddings for a second plurality of target shapes associated with the third format, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 16 recites additional limitations, such as:
wherein determining the first embedding and the first plurality of target embeddings comprises:
“determining at least one of the first format or the second format based on one or more attributes of the first query”;
“…generate the first embedding based on the first query shape”; and
“…generate the first plurality of target embeddings based on the first plurality of target shapes”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a first embedding and a first plurality of target embeddings by mentally or using a pen and paper determining at least one of a first format or a second format based on one or more attributes of a first query, generating the first embedding based on a first query shape, and generating a first plurality of target embeddings based on a first plurality of target shapes, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional elements “executing a first encoder neural network that is included in the one or more trained machine learning models and associated with the first format to” and “executing a second encoder neural network that is included in the one or more trained machine learning models and associated with the second format to” are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 17 recites additional limitations, such as:
wherein determining the first embedding comprises:
“determining the first format based on one or more additional attributes of the first query;
matching the first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with the first format and a second plurality of embeddings generated from the second plurality of target shapes”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a first embedding by mentally or using a pen and paper determining a first format based on one or more additional attributes of a first query and by mentally or using a pen and paper matching a first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with the first format and a second plurality of embeddings generated from the second plurality of target shapes, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 19 recites additional limitations, such as:
“wherein the plurality of similarities comprises at least one of a cosine similarity, a Euclidean distance, or a dot product”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 11, because a person can mentally or using a pen and paper determine a plurality of similarities comprising at least one of a cosine similarity, a Euclidean distance, or a dot product, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
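For illustration only, the similarity measures recited in dependent claim 19 are standard vector computations; a minimal sketch (the function name and return structure are illustrative assumptions, not drawn from the claims):

```python
import numpy as np

def similarities(query: np.ndarray, target: np.ndarray) -> dict[str, float]:
    """Compute the three recited similarity measures between two embeddings."""
    dot = float(np.dot(query, target))                       # dot product
    cosine = dot / (np.linalg.norm(query) * np.linalg.norm(target))
    euclidean = float(np.linalg.norm(query - target))        # Euclidean distance
    return {"cosine": cosine, "euclidean_distance": euclidean, "dot_product": dot}
```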
Dependent claim 21 recites additional limitations, such as:
“…based on one or more query shape attributes associated with the first query shape…convert one or more embeddings associated with the one or more target shapes included in the first plurality of target shapes into one or more target shapes in the first format”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper convert one or more embeddings associated with one or more target shapes included in a first plurality of target shapes into one or more target shapes in a first format based on one or more query shape attributes associated with a first query shape, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing… a decoder neural network trained based on the contrastive losses to convert” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 22 recites additional limitations, such as:
“…convert the first embedding into a target shape in the second format”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper convert a first embedding into a target shape in a second format, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing a decoder neural network associated with the second format to convert” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Dependent claim 23 recites additional limitations, such as:
“wherein the first format comprises a point cloud and the second format comprises a mesh”.
These limitations are directed to the same abstract idea under the mental processes grouping as independent claim 1, because a person can mentally or using a pen and paper determine a first format comprising a point cloud and a second format comprising a mesh, and because the limitations do not recite any additional elements that are sufficient to amount to significantly more.
The additional element “executing a decoder neural network associated with the second format to convert” is recited at a high level of generality, such that it amounts to no more than mere instructions to apply the exception using generic computer components.
Accordingly, the additional elements, individually or in combination, do not integrate the abstract idea into a practical application, even viewing the claims as a whole, because they do not impose any meaningful limits on practicing the abstract idea.
Accordingly, dependent claims 2-8, 10, 12-17, 19, and 21-23 are also directed to an abstract idea without significantly more and are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8, 10-17, 19, 20, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Tasse (US Pub 2020/0104318) in view of Dibra (US Pub 2021/0201565), in view of Kelkar (US Pub 2024/0020954), and in further view of Thacker (US Pat 10,930,066).
With respect to claim 1, Tasse discloses a computer-implemented method for analyzing similarities associated with a plurality of shapes when processing a query, the method comprising:
determining, via one or more trained machine learning models configured to process…shape representations: (i) a first embedding for a first query shape, wherein the first embedding is associated with a first query and a first format and (ii) a first plurality of target embeddings for a first plurality of target shapes, wherein the first plurality of target embeddings is associated with at least a second format, wherein the first embedding and the first plurality of target embeddings are generated by one or more trained machine learning models based on the first query shape and the first plurality of target shapes, wherein the one or more trained machine learning models are trained based upon one or more…shape representations… (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query; Tasse in [0007], [0008], and [0019] discloses using machine learning performed through the use of one or more neural network, using deep learning to embed images and shapes to one common vector space to directly search for shapes; Tasse in [0018], [0020], and [0049] discloses vector representation of image data performed using a neural network, training to identify image labels and generate descriptors, 
training a neural network to map an image or shape to a vector; here Tasse does not explicitly disclose one or more trained machine learning models configured to process augmented shape representations, wherein the one or more machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Dibra, Kelkar, and Thacker references disclose these features, as discussed below);
matching the first embedding and the first plurality of target embeddings by: computing a plurality of similarities between the first embedding and the first plurality of target embeddings, and applying a threshold to the plurality of similarities to identify one or more matching target embeddings (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0015] discloses allowing similar multi-modal data to be associated by proximity; Tasse in [0022], [0025], [0029], and [0073] discloses retrieve top objects in a collection based on multimodal query using smallest Euclidean distance, calculate Euclidean difference between two points, calculating a shape descriptor according to an average of one or more descriptors in relation to each of the one or more embedded images, average biases according to one or more weights, calculating weighted average; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images);
identifying, based on the one or more matching target embeddings, one or more target shapes included in the first plurality of target shapes (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images); and
outputting the one or more shapes in a first response to the first query (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
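For illustration only, the matching step mapped above (computing similarities between a query embedding and target embeddings, then applying a threshold to identify matches) can be sketched as follows; the function name, the choice of cosine similarity, and the default threshold are illustrative assumptions, not drawn from the claims or the cited references:

```python
import numpy as np

def match_targets(query_emb: np.ndarray, target_embs: np.ndarray,
                  threshold: float = 0.8) -> list[int]:
    """Return indices of target embeddings whose cosine similarity to the
    query embedding meets or exceeds the threshold."""
    q = query_emb / np.linalg.norm(query_emb)
    t = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    sims = t @ q                      # one cosine similarity per target embedding
    return [i for i, s in enumerate(sims) if s >= threshold]
```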
Tasse discloses training one or more machine learning models based on one or more shape representations; however, Tasse does not explicitly disclose:
one or more trained machine learning models configured to process augmented shape representations…wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Dibra reference discloses one or more trained machine learning models configured to process augmented shape representations, wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations (Dibra in [0020] and [0050] discloses training a neural network with training images from image models, using a machine learning engine for generating a model using data and images; Dibra in [0052], [0058], and [0075] discloses a neural network trained from pairs of images using shapes; Dibra in [0084] and [0094] and in Figures 1 and 10 discloses machine learning learns by gathering datasets of models, machine learning algorithms trained on data augmented by changing shapes; Dibra in [0102] discloses a computer system including memory or non-transitory media storing instructions executed by a processor; here Dibra does not explicitly disclose wherein the one or more trained machine learning models are trained based upon contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Kelkar and Thacker references disclose these features, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse and Dibra, to have combined Tasse and Dibra. The motivation to combine Tasse and Dibra would be to create real-time visualizations using augmented or mixed reality based on machine learning algorithms trained on data augmented by changing shapes (Dibra: [0003] and [0094]).
Tasse discloses training one or more machine learning models based on one or more image or shape representations and Dibra discloses machine learning algorithms trained on pairs of images and augmented shapes; however, Tasse and Dibra do not explicitly disclose:
one or more trained machine learning models are trained based upon… contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Kelkar reference discloses one or more trained machine learning models trained based upon contrastive losses calculated between positive pairs corresponding to the same and between negative pairs corresponding to different (Kelkar in [0001] and [0004] and in Figure 6 discloses images encoded into different representations such as multidimensional vectors, comparing image representations using distance metric to determine similarity, responding to image search query based on similarity, train based on contrastive learning loss, form positive sample pair and negative sample pair, train to encode the positive sample pair to have similar representations and negative sample pair to have dissimilar representations; Kelkar in [0043] discloses locating objects and boundaries, such as lines, curves, etc. in images; Kelkar in [0071] and [0097] and in Figure 6 discloses once a set of image pairs are generated using a contrasting training method to train a machine learning model using the set of image pairs, augmenting each image of an image pair as positive sample inputs to the training component, using a training image of an image pair and a different image chosen from input images as negative sample inputs, training machine learning model using the inputs based on contrastive learning loss, computing contrastive learning loss based on positive sample pair and negative sample pair, using pairs of data points that are similar and pairs that are different in order to learn higher level features about the data, data points that are similar are positive pairs and data points that are different are negative pairs; here Tasse, Dibra, and Kelkar do not explicitly disclose wherein the first format and the second format comprise different three-dimensional shape formats, but the Thacker reference discloses the feature, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, and Kelkar, to have combined Tasse, Dibra, and Kelkar. The motivation to combine Tasse, Dibra, and Kelkar would be to train images into similar and dissimilar representations based on contrastive learning loss (Kelkar [0001] and [0004]).
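For illustration only, the contrastive training principle attributed to Kelkar above (pulling positive pairs together and pushing negative pairs apart) can be sketched with one common pairwise formulation; this particular loss, the function name, and the margin parameter are illustrative assumptions, not a representation of the claimed training procedure or of Kelkar's disclosure:

```python
import numpy as np

def contrastive_loss(e1: np.ndarray, e2: np.ndarray, same: bool,
                     margin: float = 1.0) -> float:
    """Pairwise contrastive loss: pull positive pairs (same shape) together,
    push negative pairs (different shapes) at least `margin` apart."""
    d = float(np.linalg.norm(e1 - e2))      # Euclidean distance between embeddings
    if same:                                 # positive pair: penalize any distance
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2   # negative pair: penalize closeness
```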
Tasse discloses searching for objects using a plurality of three-dimensional shapes, Dibra discloses training machine learning models with changing shapes, and Kelkar discloses locating objects and boundaries, such as lines, curves, etc. in images; however, Tasse, Dibra, and Kelkar do not explicitly disclose:
wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Thacker reference discloses wherein a first format and a second format comprise different three-dimensional shape formats (Thacker in Column 1 lines 22-47 and in Column 4 lines 18-34 discloses using natural language processing to automatically generate three-dimensional objects, a generative neural network, as part of an autoencoder that includes an encoder and a decoder, is trained on a set of three-dimensional models; Thacker in Column 5 lines 9-53 discloses individual objects using different three-dimensional formats, a first object using a first three-dimensional format, a second object using a second three-dimensional format, one or more three-dimensional formats based on vectors, one or more three-dimensional formats based on point clouds, one or more three-dimensional formats based on polygons, an object described using a triangle mesh, objects using a three-dimensional format based on triangles, using three-dimensional formats to describe a three-dimensional shape of individual objects).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, Kelkar, and Thacker, to have combined Tasse, Dibra, Kelkar, and Thacker. The motivation to combine Tasse, Dibra, Kelkar, and Thacker would be to automatically generate three-dimensional objects using natural language processing (Thacker: Column 1, lines 9-11).
With respect to claim 2, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, further comprising:
determining a second embedding for a second query that includes a textual description of a second query shape (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
matching the textual description to one or more additional shapes included in the first plurality of target shapes based on the second embedding and the first plurality of target embeddings (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 3, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, further comprising:
matching the first query shape to one or more additional shapes associated with a third format based on the first embedding and a second plurality of target embeddings for a second plurality of target shapes associated with the third format (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images); and
outputting the one or more additional shapes in the first response to the first query (Tasse in [0006], [0028], [0031], and [0074] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors, searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 4, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, further comprising:
determining a second embedding for a second query shape associated with a second query and at least one of the first format or the second format (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query);
executing a decoder neural network to convert the second embedding into a target shape in a third format (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using machine learning performed through the use of one or more neural network, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector); and
outputting the target shape in a second response associated with the second query (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 5, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, wherein determining the first embedding comprises:
determining the first query shape based on one or more attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
executing a first machine learning model that is included in the one or more trained machine learning models and is associated with the first format to generate the first embedding based on the first query shape (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using a neural network using machine learning, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector).
With respect to claim 6, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 5, wherein determining the first plurality of target embeddings further comprises:
determining the second format based on one or more additional attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
executing a second machine learning model that is included in the one or more trained machine learning models and is associated with the second format to generate the first plurality of target embeddings based on the first plurality of target shapes (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using a neural network using machine learning, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector).
With respect to claim 7, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, wherein determining the first embedding comprises:
determining a plurality of query shapes to be combined into the first query shape based on one or more attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
aggregating a plurality of embeddings associated with the plurality of query shapes into the first embedding (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query).
With respect to claim 8, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, wherein determining the first embedding comprises:
determining the first query shape based on one or more attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
matching the first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with the first format and a second plurality of target embeddings for the second plurality of target shapes (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 10, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, wherein the first format and the second format are determined based on one or more attributes of the first query and comprise at least one of: a three-dimensional (3D) model, a mesh, a boundary representation, a point cloud, or a construction model (Tasse in [0006], [0028], [0031], and [0074] discloses create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images; Thacker in Column 5 lines 9-53 discloses individual objects using different three-dimensional formats, a first object using a first three-dimensional format, a second object using a second three-dimensional format, one or more three-dimensional formats based on vectors, one or more three-dimensional formats based on point clouds, one or more three-dimensional formats based on polygons, object described using a triangle mesh, objects using a three dimensional format based on triangles, using three-dimensional formats to describe a three-dimensional shape of individual objects).
With respect to claim 11, Tasse discloses one or more non-transitory computer-readable media storing instructions that, when executed by one or more processors, cause the one or more processors to analyze similarities associated with a plurality of shapes when processing a query by performing the steps of:
determining, via one or more trained machine learning models configured to process…shape representations: (i) a first embedding for a first query shape, wherein the first embedding is associated with a first query and a first format and (ii) a first plurality of target embeddings for a first plurality of target shapes, wherein the first plurality of target embeddings is associated with at least a second format, wherein the first embedding and the first plurality of target embeddings are generated by one or more trained machine learning models based on the first query shape and the first plurality of target shapes, wherein the one or more trained machine learning models are trained based upon one or more…shape representations… (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query; Tasse in [0007], [0008], and [0019] discloses using machine learning performed through the use of one or more neural network, using deep learning to embed images and shapes to one common vector space to directly search for shapes; Tasse in [0018], [0020], and [0049] discloses vector representation of image data performed using a neural network, training to identify image labels and generate descriptors, 
training neural network to map image or shape to a vector; here Tasse does not explicitly disclose one or more trained machine learning models configured to process augmented shape representations, wherein the one or more machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Dibra, Kelkar, and Thacker references disclose the features, as discussed below);
matching the first embedding and the first plurality of target embeddings by: computing a plurality of similarities between the first embedding and the first plurality of target embeddings, and applying a threshold to the plurality of similarities to identify one or more matching target embeddings (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0015] discloses allowing similar multi-modal data to be associated by proximity; Tasse in [0022], [0025], [0029], and [0073] discloses retrieve top objects in a collection based on multimodal query using smallest Euclidean distance, calculate Euclidean difference between two points, calculating a shape descriptor according to an average of one or more descriptors in relation to each of the one or more embedded images, average biases according to one or more weights, calculating weighted average; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images);
identifying, based on the one or more matching target embeddings, one or more target shapes included in the first plurality of target shapes (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images); and
outputting the one or more shapes in a first response to the first query (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
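By way of illustration only, the "computing a plurality of similarities … and applying a threshold" matching operation mapped above corresponds to standard embedding retrieval. The following minimal sketch is not drawn from any cited reference; all function and variable names are hypothetical:

```python
import numpy as np

def match_embeddings(query_emb, target_embs, threshold=0.8):
    """Compute a cosine similarity between the query embedding and each
    target embedding, then keep the indices of targets whose similarity
    meets the threshold (illustrative only; names are hypothetical)."""
    q = query_emb / np.linalg.norm(query_emb)
    t = target_embs / np.linalg.norm(target_embs, axis=1, keepdims=True)
    sims = t @ q  # one similarity score per target embedding
    return [i for i, s in enumerate(sims) if s >= threshold]

# Example: the first target points the same way as the query (similarity 1.0),
# the second is orthogonal (similarity 0.0); only the first passes the threshold.
query = np.array([1.0, 0.0])
targets = np.array([[2.0, 0.0], [0.0, 3.0]])
print(match_embeddings(query, targets))  # [0]
```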
Tasse discloses training one or more machine learning models based on one or more shape representations; however, Tasse does not explicitly disclose:
one or more trained machine learning models configured to process augmented shape representations…wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Dibra reference discloses one or more trained machine learning models configured to process augmented shape representations, wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations (Dibra in [0020] and [0050] discloses training neural network with training images from image models, using machine learning engine for generating a model using data and images; Dibra in [0052], [0058], and [0075] discloses neural network trained from pairs of images using shapes; Dibra in [0084] and [0094] and in Figures 1 and 10 discloses machine learning learns by gathering datasets of models, machine learning algorithms trained on data augmented by changing shapes; Dibra in [0102] discloses system including memory or non-transitory media storing instructions executed by processor; here Dibra does not explicitly disclose wherein the one or more trained machine learning models are trained based upon contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Kelkar and Thacker references disclose the features, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse and Dibra, to have combined Tasse and Dibra. The motivation to combine Tasse and Dibra would be to create real-time visualizations using augmented or mixed reality based on machine learning algorithms trained on data augmented by changing shapes (Dibra: [0003] and [0094]).
Tasse discloses training one or more machine learning models based on one or more image or shape representations and Dibra discloses machine learning algorithms trained on pairs of images and augmented shapes; however, Tasse and Dibra do not explicitly disclose:
one or more trained machine learning models are trained based upon… contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Kelkar reference discloses one or more trained machine learning models trained based upon contrastive losses calculated between positive pairs corresponding to the same and between negative pairs corresponding to different (Kelkar in [0001] and [0004] and in Figure 6 discloses images encoded into different representations such as multidimensional vectors, comparing image representations using distance metric to determine similarity, responding to image search query based on similarity, train based on contrastive learning loss, form positive sample pair and negative sample pair, train to encode the positive sample pair to have similar representations and negative sample pair to have dissimilar representations; Kelkar in [0043] discloses locating objects and boundaries, such as lines, curves, etc. in images; Kelkar in [0071] and [0097] and in Figure 6 discloses once a set of image pairs are generated using a contrasting training method to train a machine learning model using the set of image pairs, augmenting each image of an image pair as positive sample inputs to the training component, using a training image of an image pair and a different image chosen from input images as negative sample inputs, training machine learning model using the inputs based on contrastive learning loss, computing contrastive learning loss based on positive sample pair and negative sample pair, using pairs of data points that are similar and pairs that are different in order to learn higher level features about the data, data points that are similar are positive pairs and data points that are different are negative pairs; here Tasse, Dibra, and Kelkar do not explicitly disclose wherein the first format and the second format comprise different three-dimensional shape formats, but the Thacker reference discloses the feature, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, and Kelkar, to have combined Tasse, Dibra, and Kelkar. The motivation to combine Tasse, Dibra, and Kelkar would be to train images into similar and dissimilar representations based on contrastive learning loss (Kelkar [0001] and [0004]).
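By way of illustration only, the contrastive-loss training relied upon in the Kelkar mapping above can be sketched with a minimal margin-based formulation. The sketch is not drawn from Kelkar itself; all names and the choice of margin are hypothetical:

```python
import numpy as np

def contrastive_loss(z_a, z_b, same_shape, margin=1.0):
    """Margin-based contrastive loss over one pair of embeddings:
    positive pairs (same shape) are penalized for any distance between
    them, while negative pairs (different shapes) are penalized only
    when they fall inside the margin (illustrative; names hypothetical)."""
    d = np.linalg.norm(z_a - z_b)
    if same_shape:                      # positive pair: pull together
        return d ** 2
    return max(0.0, margin - d) ** 2    # negative pair: push apart

# A positive pair with identical embeddings incurs zero loss, while a
# negative pair closer than the margin incurs a positive loss.
print(contrastive_loss(np.zeros(3), np.zeros(3), same_shape=True))   # 0.0
print(contrastive_loss(np.zeros(3), np.zeros(3), same_shape=False))  # 1.0
```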
Tasse discloses searching for objects using a plurality of three-dimensional shapes, Dibra discloses training machine learning models with changing shapes, and Kelkar discloses locating objects and boundaries, such as lines, curves, etc. in images; however, Tasse, Dibra, and Kelkar do not explicitly disclose:
wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Thacker reference discloses wherein a first format and a second format comprise different three-dimensional shape formats (Thacker in Column 1 lines 22-47 and in Column 4 lines 18-34 discloses using natural language processing to automatically generate three-dimensional objects, a generative neural network, as part of an autoencoder that includes an encoder and a decoder, is trained on a set of three-dimensional models; Thacker in Column 5 lines 9-53 discloses individual objects using different three-dimensional formats, a first object using a first three-dimensional format, a second object using a second three-dimensional format, one or more three-dimensional formats based on vectors, one or more three-dimensional formats based on point clouds, one or more three-dimensional formats based on polygons, object described using a triangle mesh, objects using a three dimensional format based on triangles, using three-dimensional formats to describe a three-dimensional shape of individual objects).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, Kelkar, and Thacker to have combined Tasse, Dibra, Kelkar, and Thacker. The motivation to combine Tasse, Dibra, Kelkar, and Thacker would be to automatically generate three-dimensional objects using natural language processing (Thacker: Column 1, lines 9-11).
With respect to claim 12, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 11, wherein the instructions further cause the one or more processors to perform the steps of:
determining a second embedding for a second query that includes a textual description of a second query shape (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
matching the textual description to one or more additional shapes included in the first plurality of target shapes based on the second embedding and the first plurality of target embeddings (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 13, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 11, wherein the instructions further cause the one or more processors to perform the steps of:
determining a second embedding for a second query shape that is associated with the first query and the second format (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
determining the one or more shapes based on a combination of the first embedding and the second embedding (Tasse in [0006], [0028], [0031], and [0074] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors, searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 14, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 13, wherein the combination of the first embedding and the second embedding comprises at least one of an average, a weighted average, a sum, or a difference (Tasse in [0022], [0025], [0029], and [0073] discloses retrieve top objects in a collection based on multimodal query using smallest Euclidean distance, calculate Euclidean difference between two points, calculating a shape descriptor according to an average of one or more descriptors in relation to each of the one or more embedded images, average biases according to one or more weights, calculating weighted average).
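By way of illustration only, the combinations recited in claim 14 (an average, a weighted average, a sum, or a difference of two embeddings) reduce to elementwise vector arithmetic, as in the following sketch (not drawn from any cited reference; names and the weighting scheme are hypothetical):

```python
import numpy as np

def combine_embeddings(e1, e2, mode="average", w=0.5):
    """Combine two embeddings by one of the recited operations
    (illustrative only; names are hypothetical)."""
    if mode == "average":
        return (e1 + e2) / 2.0
    if mode == "weighted_average":
        return w * e1 + (1.0 - w) * e2
    if mode == "sum":
        return e1 + e2
    if mode == "difference":
        return e1 - e2
    raise ValueError(f"unknown mode: {mode}")

a, b = np.array([2.0, 0.0]), np.array([0.0, 2.0])
print(combine_embeddings(a, b))         # [1. 1.]
print(combine_embeddings(a, b, "sum"))  # [2. 2.]
```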
With respect to claim 15, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 11, wherein the instructions further cause the one or more processors to perform the steps of:
determining a third format based on one or more additional attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
matching the first query shape to one or more additional shapes associated with the third format based on the first embedding and a second plurality of embeddings for a second plurality of target shapes associated with the third format (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 16, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 11, wherein determining the first embedding and the first plurality of target embeddings comprises:
determining at least one of the first format or the second format based on one or more attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query);
executing a first encoder neural network that is included in the one or more trained machine learning models and associated with the first format to generate the first embedding based on the first query shape (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using machine learning performed through the use of one or more neural network, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector); and
executing a second encoder neural network that is included in the one or more trained machine learning models and associated with the second format to generate the first plurality of target embeddings based on the first plurality of target shapes (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using machine learning performed through the use of one or more neural network, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector).
With respect to claim 17, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 11, wherein determining the first embedding comprises:
determining the first format based on one or more additional attributes of the first query (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query); and
matching the first query shape to the first embedding based on a plurality of mappings between a second plurality of target shapes associated with the first format and a second plurality of embeddings generated from the second plurality of target shapes (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
With respect to claim 19, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the one or more non-transitory computer-readable media of claim 18, wherein the plurality of similarities comprises at least one of a cosine similarity, a Euclidean distance, or a dot product (Tasse in [0022], [0025], [0029], and [0073] discloses retrieve top objects in a collection based on multimodal query using smallest Euclidean distance, calculate Euclidean difference between two points, calculating a shape descriptor according to an average of one or more descriptors in relation to each of the one or more embedded images, average biases according to one or more weights, calculating weighted average).
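For illustration only, the similarity measures recited in claim 19 (cosine similarity, Euclidean distance, and dot product), together with the threshold-based matching step recited in claims 1, 11, and 20, can be sketched as follows. The embedding vectors, target names, and threshold value below are hypothetical examples and are not drawn from any cited reference:

```python
import math

def dot(a, b):
    # Dot product: unnormalized measure of alignment between two embeddings.
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Cosine similarity: dot product scaled by the vectors' magnitudes.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    # Euclidean distance: smaller values indicate closer embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical query embedding and target embeddings.
query = [1.0, 0.0, 2.0]
targets = {"A": [0.5, 0.5, 2.0], "B": [-1.0, 0.0, 0.0]}

# Applying a threshold to the computed similarities to identify
# one or more matching target embeddings.
threshold = 0.9
matches = [name for name, t in targets.items()
           if cosine_similarity(query, t) >= threshold]
```

Under these hypothetical values, only target "A" exceeds the similarity threshold and would be identified as a matching target embedding.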
With respect to claim 20, Tasse discloses a system, comprising: one or more memories that store instructions, and one or more processors that are coupled to the one or more memories and, when executing the instructions, are configured to perform the steps of:
determining, via one or more trained machine learning models configured to process…shape representations: (i) a first embedding for a first query shape, wherein the first embedding is associated with a first query and a first format and (ii) a first plurality of target embeddings for a first plurality of target shapes, wherein the first plurality of target embeddings is associated with at least a second format, wherein the first embedding and the first plurality of target embeddings are generated by one or more trained machine learning models based on the first query shape and the first plurality of target shapes, wherein the one or more trained machine learning models are trained based upon one or more…shape representations… (Tasse in [0001] discloses searching for two-dimensional or three-dimensional objects in a collection using a multi-modal query of image and/or tag data; Tasse in [0010] and [0014] discloses searching for digital objects using any combination of images, three-dimensional shapes, and text by embedding the vector representations for these multiple modes in the same space, embedding the image data and tag data in a vector space of words; Tasse in [0026] and [0027] discloses images and shapes embedded in the same word vector space ensuring that all modalities share a common representation, searching a collection of objects based on visual and semantic similarity, search query comprising both image or object data and word or tag data, determining one or more objects in the collection having spatially close vector representation to the search query; Tasse in [0007], [0008], and [0019] discloses using machine learning performed through the use of one or more neural network, using deep learning to embed images and shapes to one common vector space to directly search for shapes; Tasse in [0018], [0020], and [0049] discloses vector representation of image data performed using a neural network, training to identify image labels and generate descriptors, 
training neural network to map image or shape to a vector; here Tasse does not explicitly disclose one or more trained machine learning models configured to process augmented shape representations, wherein the one or more machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Dibra, Kelkar, and Thacker references disclose the features, as discussed below);
matching the first embedding and the first plurality of target embeddings by: computing a plurality of similarities between the first embedding and the first plurality of target embeddings, and applying a threshold to the plurality of similarities to identify one or more matching target embeddings (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0015] discloses allowing similar multi-modal data to be associated by proximity; Tasse in [0022], [0025], [0029], and [0073] discloses retrieve top objects in a collection based on multimodal query using smallest Euclidean distance, calculate Euclidean difference between two points, calculating a shape descriptor according to an average of one or more descriptors in relation to each of the one or more embedded images, average biases according to one or more weights, calculating weighted average; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images);
identifying, based on the one or more matching target embeddings, one or more target shapes included in the first plurality of target shapes (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images); and
outputting the one or more shapes in a first response to the first query (Tasse in [0006] discloses comparing the similarity between two images consists of computing a vector representation of each image description and then estimating the distance between the two descriptors; Tasse in [0028], [0031], and [0074] and in Figures 5 and 6 discloses searching a collection of objects based on visual and semantic similarity, retrieve objects based on multi-modal search queries using image and/or text, searching for an image or shape based on a query comprising tag and image data, create a word space in which images, three dimensional objects, text and combinations of the same are embedded, determining vector representations for each of the images, three dimensional objects, text, and combinations, determining vector representation of the query, determining which one or more of the images, three dimensional objects, text and combinations have a spatially close vector representation to the vector representation of the query, retrieving corresponding query results comprising databases of 3D models, sketches, and images).
Tasse discloses training one or more machine learning models based on one or more shape representations; however, Tasse does not explicitly disclose:
one or more trained machine learning models configured to process augmented shape representations…wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations as well as contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Dibra reference discloses one or more trained machine learning models configured to process augmented shape representations, wherein the one or more trained machine learning models are trained based upon one or more augmented shape representations (Dibra in [0020] and [0050] discloses training neural network with training images from image models, using machine learning engine for generating a model using data and images; Dibra in [0052], [0058], and [0075] discloses neural network trained from pairs of images using shapes; Dibra in [0084] and [0094] and in Figures 1 and 10 discloses machine learning learns by gathering datasets of models, machine learning algorithms trained on data augmented by changing shapes; Dibra in [0102] discloses system including memory or non-transitory media storing instructions executed by processor; here Dibra does not explicitly disclose wherein the one or more trained machine learning models are trained based upon contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and the second format comprise different shape formats, but the Kelkar and Thacker references disclose the features, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse and Dibra, to have combined Tasse and Dibra. The motivation to combine Tasse and Dibra would be to create real-time visualizations using augmented or mixed reality based on machine learning algorithms trained on data augmented by changing shapes (Dibra: [0003] and [0094]).
Tasse discloses training one or more machine learning models based on one or more image or shape representations and Dibra discloses machine learning algorithms trained on pairs of images and augmented shapes; however, Tasse and Dibra do not explicitly disclose:
one or more trained machine learning models are trained based upon… contrastive losses calculated between positive shape pairs corresponding to the same shape and between negative shape pairs corresponding to different shapes, and wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Kelkar reference discloses one or more trained machine learning models trained based upon contrastive losses calculated between positive pairs corresponding to the same and between negative pairs corresponding to different (Kelkar in [0001] and [0004] and in Figure 6 discloses images encoded into different representations such as multidimensional vectors, comparing image representations using distance metric to determine similarity, responding to image search query based on similarity, train based on contrastive learning loss, form positive sample pair and negative sample pair, train to encode the positive sample pair to have similar representations and negative sample pair to have dissimilar representations; Kelkar in [0043] discloses locating objects and boundaries, such as lines, curves, etc. in images; Kelkar in [0071] and [0097] and in Figure 6 discloses once a set of image pairs are generated using a contrasting training method to train a machine learning model using the set of image pairs, augmenting each image of an image pair as positive sample inputs to the training component, using a training image of an image pair and a different image chosen from input images as negative sample inputs, training machine learning model using the inputs based on contrastive learning loss, computing contrastive learning loss based on positive sample pair and negative sample pair, using pairs of data points that are similar and pairs that are different in order to learn higher level features about the data, data points that are similar are positive pairs and data points that are different are negative pairs; here Tasse, Dibra, and Kelkar do not explicitly disclose wherein the first format and the second format comprise different three-dimensional shape formats, but the Thacker reference discloses the feature, as discussed below).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, and Kelkar, to have combined Tasse, Dibra, and Kelkar. The motivation to combine Tasse, Dibra, and Kelkar would be to train images into similar and dissimilar representations based on contrastive learning loss (Kelkar [0001] and [0004]).
Tasse discloses searching for objects using a plurality of three-dimensional shapes, Dibra discloses training machine learning models with changing shapes, and Kelkar discloses locating objects and boundaries, such as lines, curves, etc. in images; however, Tasse, Dibra, and Kelkar do not explicitly disclose:
wherein the first format and at least the second format comprise different three-dimensional shape formats;
The Thacker reference discloses wherein a first format and a second format comprise different three-dimensional shape formats (Thacker in Column 1 lines 22-47 and in Column 4 lines 18-34 discloses using natural language processing to automatically generate three-dimensional objects, a generative neural network, as part of an autoencoder that includes an encoder and a decoder, is trained on a set of three-dimensional models; Thacker in Column 5 lines 9-53 discloses individual objects using different three-dimensional formats, a first object using a first three-dimensional format, a second object using a second three-dimensional format, one or more three-dimensional formats based on vectors, one or more three-dimensional formats based on point clouds, one or more three-dimensional formats based on polygons, object described using a triangle mesh, objects using a three dimensional format based on triangles, using three-dimensional formats to describe a three-dimensional shape of individual objects).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, Kelkar, and Thacker to have combined Tasse, Dibra, Kelkar, and Thacker. The motivation to combine Tasse, Dibra, Kelkar, and Thacker would be to automatically generate three-dimensional objects using natural language processing (Thacker: Column 1, lines 9-11).
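For illustration only, a contrastive loss of the kind discussed above (positive pairs of the same shape drawn together, negative pairs of different shapes pushed apart) can be sketched with a margin-based formulation. The function, margin value, and embeddings below are hypothetical and do not reproduce the implementation of Kelkar or any other cited reference:

```python
import math

def contrastive_loss(a, b, is_positive_pair, margin=1.0):
    # Margin-based contrastive loss for one pair of embeddings:
    # a positive pair (same shape) is penalized by its squared distance;
    # a negative pair (different shapes) is penalized only when the pair
    # falls inside the margin, pushing dissimilar shapes apart.
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if is_positive_pair:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

A positive pair with identical embeddings incurs zero loss, while a negative pair incurs loss only when its embeddings are closer than the margin.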
With respect to claim 22, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the method of claim 1, further comprising executing a decoder neural network associated with the second format to convert the first embedding into a target shape in the second format (Tasse in [0007], [0008], [0018]-[0020], and [0049] discloses using machine learning performed through the use of one or more neural network, learn image descriptors from data to produce better retrieval results, using learning to embed both images and shapes to one common vector space, making it possible to directly search shapes using an image, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector; Thacker in Column 1 lines 22-47 and in Column 4 lines 18-34 discloses using natural language processing to automatically generate three-dimensional objects, a generative neural network, as part of an autoencoder that includes an encoder and a decoder, is trained on a set of three-dimensional models).
With respect to claim 23, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, wherein the first format comprises a point cloud and the second format comprises a mesh (Thacker in Column 5 lines 9-53 discloses individual objects using different three-dimensional formats, a first object using a first three-dimensional format, a second object using a second three-dimensional format, one or more three-dimensional formats based on vectors, one or more three-dimensional formats based on point clouds, one or more three-dimensional formats based on polygons, object described using a triangle mesh, objects using a three dimensional format based on triangles, using three-dimensional formats to describe a three-dimensional shape of individual objects).
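For illustration only, the distinction between the two recited three-dimensional formats of claim 23 (a point cloud versus a mesh) can be sketched with minimal data structures. The vertices, faces, and conversion routine below are hypothetical examples, not structures from any cited reference:

```python
# A mesh represents a shape as vertices plus faces (connectivity);
# a point cloud represents the same shape as unstructured points only.

# Hypothetical triangle mesh: a single triangle in 3-D space.
mesh_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mesh_faces = [(0, 1, 2)]  # vertex indices into mesh_vertices

def mesh_to_point_cloud(vertices, faces):
    # Crude conversion: keep each face's corner vertices and centroid as
    # points, discarding connectivity (the defining difference between
    # the two formats).
    points = []
    for face in faces:
        corners = [vertices[i] for i in face]
        points.extend(corners)
        centroid = tuple(sum(c[k] for c in corners) / len(corners)
                         for k in range(3))
        points.append(centroid)
    return points

point_cloud = mesh_to_point_cloud(mesh_vertices, mesh_faces)
```

The resulting point cloud retains the sampled positions of the triangle but no face connectivity, which is the structural difference between the two formats.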
Claim(s) 21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Tasse (US Pub 2020/0104318) in view of Dibra (US Pub 2021/0201565) in view of Kelkar (US Pub 2024/0020954) in view of Thacker (US Pat 10,930,066) and in further view of Tal (US Pub 2023/0038256).
With respect to claim 21, Tasse in view of Dibra and in view of Kelkar and in further view of Thacker discloses the computer-implemented method of claim 1, further comprising:
executing, based on one or more query shape attributes associated with the first query shape, a decoder neural network trained based on the contrastive losses… (Tasse in [0007], [0008], and [0019] discloses using machine learning performed through the use of one or more neural network, using deep learning to embed images and shapes to one common vector space to directly search for shapes; Tasse in [0018], [0020], and [0049] discloses vector representation of image data performed using a neural network, training to identify image labels and generate descriptors, training neural network to map image or shape to a vector; Kelkar in [0001] and [0004] discloses images encoded into different representations such as multidimensional vectors, comparing image representations using distance metric to determine similarity, responding to image search query based on similarity, train based on contrastive learning loss, form positive sample pair and negative sample pair, train to encode the positive sample pair to have similar representations and negative sample pair to have dissimilar representations; Kelkar in [0071] and [0097] and in Figure 6 discloses once a set of image pairs are generated using a contrasting training method to train a machine learning model using the set of image pairs, augmenting each image of an image pair as positive sample inputs to the training component, using a training image of an image pair and a different image chosen from input images as negative sample inputs, training machine learning model using the inputs based on contrastive learning loss, computing contrastive learning loss based on positive sample pair and negative sample pair, using pairs of data points that are similar and pairs that are different in order to learn higher level features about the data, data points that are similar are positive pairs and data points that are different are negative pairs; here Tasse, Dibra, Kelkar, and Thacker do not explicitly disclose a decoder neural 
network trained based on the contrastive losses to convert one or more embeddings associated with the one or more shapes included in the first plurality of shapes into one or more shapes in the first format, but the Tal reference discloses the feature, as discussed below).
Tasse discloses training a decoder neural network and Kelkar discloses training machine learning model using inputs based on contrastive learning loss; however, Tasse, Dibra, Kelkar, and Thacker do not explicitly disclose:
a decoder neural network trained based on the contrastive losses to convert one or more embeddings associated with the one or more target shapes included in the first plurality of target shapes into one or more target shapes in the first format.
The Tal reference discloses a decoder neural network trained based on contrastive losses to convert one or more embeddings associated with one or more target shapes included in a first plurality of target shapes into one or more target shapes in a first format (Tal in [0130] and [0135] discloses neural networks build a model from training data including shapes, encoder portion of the neural network reduces the dimensionality, learning a model from which the decoder portion recreates the input; Tal in [0171] discloses training embeddings using contrastive optimization; Tal in [0175], [0197] and [0218] discloses using neural network to convert embeddings associated with shapes into similar shapes, embeddings constructed using contrastive loss).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the teachings of Tasse, Dibra, Kelkar, Thacker, and Tal, to have combined Tasse, Dibra, Kelkar, Thacker, and Tal. The motivation to combine Tasse, Dibra, Kelkar, Thacker, and Tal would be to produce clusters of similar substructures using contrastive optimization (Tal: [0175]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to REZWANUL MAHMOOD whose telephone number is (571)272-5625. The examiner can normally be reached M-F 9-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J. Lo, can be reached at 571-272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/R.M/Examiner, Art Unit 2159
/ANN J LO/Supervisory Patent Examiner, Art Unit 2159