DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment/Status of Claims
Claims 1-15 were amended.
Claims 1-15 are pending and examined herein.
Claims 1-15 are rejected under 35 U.S.C. 112(a).
Claims 1-15 are rejected under 35 U.S.C. 101.
Claims 1-3, 5-8, 10-13, and 15 are rejected under 35 U.S.C. 103.
Response to Arguments
Applicant’s arguments, see page 9, filed 12/04/2025, with respect to the objection to claim 9 have been fully considered and are persuasive. The objection to claim 9 has been withdrawn.
Applicant’s arguments, see page 9, filed 12/04/2025, with respect to the 35 U.S.C. 112(b) rejection of claims 1-15 have been fully considered and are persuasive. The 35 U.S.C. 112(b) rejection of claims 1-15 has been withdrawn.
Applicant's arguments filed 12/04/2025 regarding the 35 U.S.C. 101 rejection of claims 1-15 have been fully considered but they are not persuasive. Applicant argues, see pages 9-11, that "[The cited disclosure alone represents a specific technical improvement that is reflected in the language of the independent claims. For example, claim 1 recites: training a deep neural network using a contrastive learning algorithm to learn a representation of the input sample set of data by modelling a plurality of points in the input sample set of data based on the set of rules, wherein the representation has a lower dimensionality than the input space. One of ordinary skill in the computing arts would readily understand that a representation with a lower dimensionality than the input space would utilize less memory as well."
MPEP 2106.05(a) states “It is important to note, the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements.”
The cited limitation is interpreted below as an abstract idea falling within the mathematical concepts grouping. Therefore, the limitation alone cannot provide the improvement.
Additionally, MPEP 2106.05(a) states "An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107. In this respect, the improvement consideration overlaps with other considerations, specifically the particular machine consideration (see MPEP § 2106.05(b)), and the mere instructions to apply an exception consideration (see MPEP § 2106.05(f)). Thus, evaluation of those other considerations may assist examiners in making a determination of whether a claim satisfies the improvement consideration."
Even if the cited limitation were not directed to an abstract idea, the limitation recites generic contrastive learning at a high level and does not cover a particular solution to the problem. Additionally, the portion of the limitation “wherein the representation has a lower dimensionality than the input space” does not cover a particular solution, but rather an outcome. Therefore, the claims do not represent an improvement to technology and are directed to an abstract idea.
Applicant’s arguments, see page 13, filed 12/04/2025, with respect to the rejection(s) of claim(s) 1-3, 5-8, 10-13, and 15 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of He (“SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding”, September 14, 2022), Kusner (“Grammar Variational Autoencoder”, 2017), and Zhou (“Generative Melody Composition with Human-in-the-Loop Bayesian Optimization”, October 7, 2020).
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claims 1-15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claims 1, 6, and 11 recite the limitation “providing, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design.” However, [0056] of the specification recites "In one aspect of the embodiments, a "score" (e.g., a value) is assigned to each datapoint. In turn, a search for each datapoint that optimizes the score in a short amount of time is performed. In an example implementation, the search is performed by using a search algorithm that finds a datapoint in a representation that has an optimal score (or value) for the search problem of interest." Therefore, the search described in the specification does not provide a datapoint from the input sample set of data, rather a datapoint from the representation of the data. For purposes of examination, this limitation will be interpreted as “providing, by the search system and based on the representation, a datapoint from the representation of the input sample set of data relating to audio selection or user interface design.”
Dependent claims 2-5, 7-10, and 12-15 fail to resolve the issue and are rejected with the same rationale.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-15, in accordance with these steps, follows.
Step 1 Analysis:
Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-5 are directed to a process, claims 6-10 are directed to a machine, and claims 11-15 are directed to an article of manufacture. All claims are directed to statutory categories and the analysis proceeds.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.
None of the claims represent an improvement to technology.
Regarding claim 1, the following are abstract ideas:
A method for performing contrastive embedding of a structured space, comprising the steps of: (Performing contrastive embedding can be practically performed in the human mind, i.e. group data based on their similarity/dissimilarity to other data, then map each group to a number based on their grouping. This limitation recites a mental process.)
training a deep neural network using a contrastive learning algorithm to learn a representation of the input sample set of data by modelling a plurality of points in the input sample set of data based on the set of rules, wherein the representation has a lower dimensionality than the input space; and (Using an algorithm to train a deep neural network is performing mathematical calculations, which is a mathematical concept. Modelling a plurality of points based on the set of rules can be practically performed in the human mind, i.e. deciding where each point belongs based on the rules.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
receiving an input sample set of data corresponding to an input space, wherein the input space relates to a structure of a context-free grammar; (Receiving data is a known process on a computer; this is mere instructions to apply an exception. See MPEP § 2106.05(f)(2). Restricting the input space to a structure of a context-free grammar merely indicates a field of use. See MPEP § 2106.05(h).)
obtaining a set of rules defining similarities in an embedding of the input sample set of data; (Receiving data is a known process on a computer; this is mere instructions to apply an exception.)
supplying the representation of the sample set of data (x) to a search system; and (Supplying a representation of data is transmitting data, which is a known process on a computer. This is mere instructions to apply an exception.)
providing, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design. (Providing data is transmitting data, which is a known process on a computer; this is mere instructions to apply an exception. See MPEP § 2106.05(f)(2). Restricting the input sample set of data to audio selection or user interface design merely indicates a field of use. See MPEP § 2106.05(h).)
Regarding claim 2, the rejection of claim 1 is incorporated herein. The following are abstract ideas:
selecting from the representation of the sample set of data (x) a datapoint having a score that is higher than a score associated with other datapoints in the representation of the sample set of data (x), using a search algorithm. (Using an algorithm is performing mathematical calculations, which is a mathematical concept. Selecting the data using the higher score is organizing information and manipulating information through mathematical correlations, which is a mathematical relationship. See MPEP § 2106.04(a)(2)(I)(A), example iv.)
Claim 2 does not recite any additional elements.
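For illustration only, the following is a minimal sketch of the kind of score-maximizing selection recited in claim 2; the scoring function, array shapes, and names are hypothetical stand-ins, not the claimed implementation.

```python
import numpy as np

def select_best(representation, score_fn):
    # Score every datapoint in the representation, then return the one
    # whose score is higher than the scores of the other datapoints.
    scores = np.array([score_fn(z) for z in representation])
    return representation[int(np.argmax(scores))]

# Hypothetical example: 100 embedded datapoints of dimension 16.
representation = np.random.randn(100, 16)
best = select_best(representation, score_fn=lambda z: -np.linalg.norm(z))
```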
Regarding claim 3, the rejection of claim 1 is incorporated herein. The following are abstract ideas:
generating, for every sample of the input sample set of data (x), a tree representation (xtree), thereby creating a set of tree representations; (Generating a tree representation of data can be practically performed in the human mind with the aid of pen and paper, i.e. drawing out a tree for each sample. This is a mental process.)
for each tree in the set of tree representations, generating a list of trees that are similar, thereby generating a list of similar trees; (Generating a list of similar trees can be practically performed in the human mind with the aid of pen and paper, i.e. drawing out trees that are similar. This is a mental process.)
generating a set of similar trees based on the list of similar trees; (Generating a set of similar trees based on the list of similar trees can be practically performed in the human mind, i.e. analyzing the similar trees and adding them to the set. This is a mental process.)
applying the contrastive learning algorithm to the set of similar trees to obtain a deep neural network; and (Using an algorithm to train a deep neural network is performing mathematical calculations, which is a mathematical concept.)
mapping … each tree in the list of similar trees to a corresponding vector representation having a predetermined length. (Mapping the trees to corresponding vector representations can be practically performed in the human mind, given the information which values to map to which vectors. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
obtaining a subtree replacement lookup dictionary (L); (Obtaining a dictionary is receiving data, which is a known process on a computer. This amounts to mere instructions to apply an exception.)
using the deep neural network, (This is a generic recitation of a deep neural network used to embed data; this amounts to mere instructions to apply an exception.)
Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, the following are abstract ideas:
parsing the anchor sequence into a tree representation; (Parsing the sequence into a tree representation can be practically performed in the human mind using the aid of pen and paper, i.e. drawing out the tree based on the anchor sequence. This is a mental process.)
randomly selecting a random choice from the number of replacements; and (Randomly selecting a number can be practically performed in the human mind. This is a mental process.)
randomly selecting a subtree in the tree representation having a height less than the maximum subtree height, (Randomly selecting a subtree can be practically performed in the human mind. This is a mental process.)
selecting a replacement subtree from the subtree replacement lookup dictionary, and (Selecting a replacement subtree from a dictionary can be practically performed in the human mind. This is a mental process.)
replacing the subtree from the tree representation with the replacement subtree to generate an updated tree representation. (One could practically perform replacing a subtree with the replacement subtree in the human mind with the aid of pen and paper, i.e. writing down the new tree. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
receiving an anchor sequence, a number of replacements, a maximum subtree height and a subtree replacement lookup dictionary; (This limitation recites receiving data, which is a known process in computers. This amounts to mere instructions to apply an exception.)
from 1 to the random choice: (This is the insignificant extra-solution activity of performing repetitive calculations. See MPEP § 2106.05(d)(II), list 1, example ii.)
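For illustration only, the following is a minimal sketch of the subtree-replacement steps recited in claim 4, expressed with nltk parse trees; the function name, toy anchor tree, and lookup dictionary are hypothetical assumptions, not the application's implementation.

```python
import random
from nltk.tree import Tree

def replace_random_subtrees(tree, lookup, max_subtree_height, n_replacements):
    # Randomly select a random choice from the number of replacements.
    choice = random.randint(1, n_replacements)
    for _ in range(choice):  # from 1 to the random choice
        # Randomly select a subtree with height less than the maximum.
        candidates = [p for p in tree.treepositions() if p != ()
                      and isinstance(tree[p], Tree)
                      and tree[p].height() < max_subtree_height
                      and tree[p].label() in lookup]
        if not candidates:
            break
        pos = random.choice(candidates)
        # Select a replacement subtree from the lookup dictionary and
        # replace the selected subtree to generate an updated tree.
        tree[pos] = random.choice(lookup[tree[pos].label()])
    return tree

anchor = Tree.fromstring("(S (NP the dog) (VP (V ran)))")  # parsed anchor sequence
lookup = {"NP": [Tree.fromstring("(NP a cat)")]}           # hypothetical dictionary
print(replace_random_subtrees(anchor, lookup, max_subtree_height=3, n_replacements=2))
```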
Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, the following is an abstract idea:
wherein the search algorithm is a Bayesian optimization algorithm. (This limitation recites an algorithm; performing an algorithm is performing mathematical calculations, which is a mathematical concept.)
Claim 5 does not recite any additional elements.
Regarding claim 6, the following are abstract ideas:
A system for performing contrastive embedding of a structured space, comprising: (Performing contrastive embedding can be practically performed in the human mind, i.e. group data based on their similarity/dissimilarity to other data, then map each group to a number based on their grouping. This limitation recites a mental process.)
train a deep neural network using a contrastive learning algorithm to learn a representation of the input sample set of data by modelling a plurality of points in the sample set of data based on the set of rules; and (Using an algorithm to train a deep neural network is performing mathematical calculations, which is a mathematical concept.)
apply a search algorithm on the representation of the input sample set of data. (Using an algorithm is performing mathematical calculations, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
a rules database configured to store rules defining similarities in embeddings of the sample sets of data; (A database is a generic computing component, and storing data is a known process in computers. This limitation amounts to mere instructions to apply an exception.)
an input sample receiver configured to receive an input sample set of data corresponding to an input space; (Receiving data is a known process on a computer and a “receiver” is a generic computer component; this is mere instructions to apply an exception.)
a machine learning kernel configured to: (A machine learning kernel is a generic machine learning component. This amounts to mere instructions to apply an exception.)
obtain, from the rules database, a set of rules defining similarities in an embedding of the sample set of data, and (Receiving data is a known process on a computer; this is mere instructions to apply an exception.)
a network access device configured to supply the representation of the sample set of data to a search system to enable the search system to … (A network access device is a generic computer component. Supplying a representation of data is transmitting data, which is a known process on a computer. This is mere instructions to apply an exception.)
Regarding claim 7, the rejection of claim 6 is incorporated herein. Further, the following is an abstract idea:
a search system configured to select from the representation of the sample set of data a datapoint having a score that is higher than a score associated with other datapoints in the representation of the sample set of data, using the search algorithm. (Using an algorithm is performing mathematical calculations, which is a mathematical concept. Selecting the data using the higher score is organizing information and manipulating information through mathematical correlations, which is a mathematical relationship. See MPEP § 2106.04(a)(2)(I)(A), example iv.)
Claim 7 does not recite any additional elements.
Regarding claim 8, the rejection of claim 6 is incorporated herein. The following are abstract ideas:
generate, for every sample of the input sample set of data, a tree representation, thereby creating a set of tree representations (Generating a tree representation of data can be practically performed in the human mind with the aid of pen and paper, i.e. drawing out a tree for each sample. This is a mental process.)
for each tree in the set of tree representations, generate a list of trees that are similar, thereby generating a list of similar trees, and (Generating a list of similar trees can be practically performed in the human mind with the aid of pen and paper, i.e. drawing out trees that are similar. This is a mental process.)
generate a set of similar trees based on the list of similar trees; and (Generating a set of similar trees based on the list of similar trees can be practically performed in the human mind, i.e. analyzing the similar trees to add to the set.)
the machine learning kernel configured to apply the contrastive learning algorithm to the set of similar trees to obtain a deep neural network; and (Using an algorithm to train a deep neural network is performing mathematical calculations, which is a mathematical concept.)
a mapper configured to map each tree in the list of similar trees to a corresponding vector representation having a predetermined length. (Mapping the trees to corresponding vector representations can be practically performed in the human mind, given the information which values to map to which vectors. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
a subtree replacement lookup dictionary database configured to store one or more subtree replacement lookup dictionaries; (A database is a generic computing component, and storing data is a known process in computers. This limitation amounts to mere instructions to apply an exception.)
a tree representation generator configured to: (The broadest reasonable interpretation of this element is a code component, which is a generic computing component. This is mere instructions to apply an exception.)
obtain a subtree replacement lookup dictionary from the subtree replacement lookup dictionary database and (Obtaining a dictionary is receiving data, which is a known process on a computer. This amounts to mere instructions to apply an exception.)
using the deep neural network, (This is a generic recitation of a deep neural network used to embed data; this amounts to mere instructions to apply an exception.)
Regarding claim 9, the rejection of claim 6 is incorporated herein. The following are abstract ideas:
a parser configured to parse the anchor sequence into a tree representation; (Parsing the sequence into a tree representation can be practically performed in the human mind using the aid of pen and paper, i.e. drawing out the tree based on the anchor sequence. This is a mental process.)
a random choice selector operable to randomly select a random choice from the number of replacements; and (Randomly selecting a number can be practically performed in the human mind. This is a mental process.)
randomly select a subtree in the tree representation having a height less than the maximum subtree height, (Randomly selecting a subtree can be practically performed in the human mind. This is a mental process.)
select a replacement subtree from the subtree replacement lookup dictionary, and replace the subtree (t) from the tree representation with the replacement subtree to generate an updated tree representation. (Selecting a replacement subtree from a dictionary can be practically performed in the human mind. This is a mental process.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
the input data set receiver further configured to receive an anchor sequence, a number of replacements, a maximum subtree height and a subtree replacement lookup dictionary; (This limitation recites receiving data, which is a known process in computers. This amounts to mere instructions to apply an exception.)
a tree representation generator configured to: (The broadest reasonable interpretation of this element is a code component, which is a generic computing component. This is mere instructions to apply an exception.)
from 1 to the random choice: (This is the insignificant extra-solution activity of performing repetitive calculations. See MPEP § 2106.05(d)(II), list 1, example ii.)
Regarding claim 10, the rejection of claim 7 is incorporated herein. Further, the following is an abstract idea:
wherein the search algorithm is a Bayesian optimization algorithm. (This limitation recites an algorithm; performing an algorithm is performing mathematical calculations, which is a mathematical concept.)
Claim 10 does not recite any additional elements.
Regarding claim 11, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
A non-transitory computer-readable medium having stored thereon one or more sequences of instructions for causing one or more processors to perform: (This limitation recites generic computer components and processes. This is mere instructions to apply an exception.)
The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claims 12-15 recite substantially similar subject matter to claims 2-5 respectively and are rejected with the same rationale, mutatis mutandis.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1, 2, 5, 11, 12, and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over He (“SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding”, September 14, 2022), Kusner (“Grammar Variational Autoencoder”, 2017), and Zhou (“Generative Melody Composition with Human-in-the-Loop Bayesian Optimization”, October 7, 2020).
Regarding claim 1, He teaches
A method for performing contrastive embedding of a structured space, comprising the steps of: (The conclusion states "In this work, we propose a new pre-trained conversation model named SPACE-2, which learns dialog representations from both labeled and unlabeled corpora via tree-structured semi-supervised contrastive learning (CL)." The representations are interpreted as embeddings, which are contrastive as they are learned from contrastive learning. The dialog is interpreted as the structured space.)
receiving an input sample set of data corresponding to an input space; (Page 3, section 3.1 states "To provide sufficient high-quality dialog corpora to train our pre-trained conversation model, we use a labeled dialog dataset called AnPreDial, which contains 32 existing labeled TOD datasets, ranging from single-turn question answering to multiturn dialogs. We also use a large-scale unlabeled dialog corpus called UnPreDial with 21 dialog datasets, ranging from open-domain online forums to document-grounded dialogs." These dialog data sets are interpreted as the input sample set of data, which corresponds to an input space, as all sample data represents an input space.)
obtaining a set of rules defining similarities in an embedding of the input sample set of data; (Page 3 states "Different from the vanilla input representations as in BERT (Devlin et al., 2019), we set our input embeddings consisting of four elements: tokens, roles, turns, and positions. Role embeddings are used to segment which role the current token belongs to either user or system. Turn embeddings are assigned to each token according to its turn number in the dialog. Position embeddings are assigned to each token according to its relative position within its belonging sentence." Therefore, the sample set of data is embedded before input to the model. Page 5 states "Since we compute multiple scores f^k_(i,j) for each STS pair, different types of scoring functions can be used to construct the CL loss." Page 5 also states "Similar to the common practice in the current CL, we simply average the K scores into a single value to weigh the semantic similarity among samples:" Therefore, the scoring functions for computing the loss are for defining similarities in embeddings of the sample set of data, and must have been obtained in order to have been used in the experiments.)
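For illustration only, the following is a minimal sketch of the score-averaging step quoted above from He (averaging the K per-pair scores f^k_(i,j) into a single similarity weight); the values shown are hypothetical.

```python
import numpy as np

def similarity_weight(pair_scores):
    # Average the K scores for an STS pair (i, j) into a single value
    # used to weigh semantic similarity among samples.
    return float(np.mean(pair_scores))

# Hypothetical example: K = 4 sub-space scores for one sample pair.
print(similarity_weight([0.9, 0.7, 1.0, 0.5]))  # 0.775
```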
training a deep neural network using a contrastive learning algorithm to learn a representation of the sample set of data by modelling a plurality of points in the input sample set of data based on the set of rules; and (Page 4 states "As illustrated in Figure 3, we build our SPACE-2 model based on the bidirectional Transformer architecture (Vaswani et al., 2017)." One of ordinary skill in the art would realize that the bidirectional Transformer architecture is a deep neural network. The experiment results on page 8 require that the deep neural network be trained. The caption of figure 5 on page 6 states "(a) Self-supervised CL only predicts augmented itself from in-batch negatives, with different dropout masks applied. (b) Supervised CL considers samples of the exact same label as positives. (c) Tree-structured supervised CL considers all in-batch samples as positives with soft scores. Only scores of ISV set are depicted here." CL is contrastive learning, and therefore, the training was contrastive learning. Page 5 states "Figure 6 illustrates the 2D t-SNE visualization of the output unit vectors σ_k(z) for test dialog samples from the MULTIWOZ dataset. Due to the limited space, we only show the sub-spaces of D, I, S, V here. As we can see, the hidden representations of SPACE-2_multi are able to differentiate the similar and dissimilar parts in different semantic sub-spaces. The learned latent sub-space is highly correlated with the dialog annotations of domain, intent, slot or value, which confirms our assumption." Therefore, a representation is generated.)
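For illustration only, the following is a minimal sketch of a generic contrastive (InfoNCE-style) training step of the kind characterized in the rejection as mathematical calculations; the encoder, dimensions, and data are hypothetical stand-ins, and this is not SPACE-2's code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_a, z_b, temperature=0.1):
    # z_a, z_b: (batch, dim) embeddings of two views of the same samples;
    # matching rows are positives, all other rows are in-batch negatives.
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = (z_a @ z_b.t()) / temperature
    labels = torch.arange(z_a.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Encoder mapping a 128-dimensional input space to a 16-dimensional
# representation (i.e., a lower dimensionality than the input space).
encoder = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 16))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
x = torch.randn(32, 128)  # stand-in batch of input samples
loss = info_nce_loss(encoder(x), encoder(x + 0.01 * torch.randn_like(x)))
loss.backward()
opt.step()
```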
He does not appear to explicitly teach
wherein the input space relates to a structure of a context-free grammar;
wherein the representation has a lower dimensionality than the input space;
supplying the representation to a search system; and
providing, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design.
However, Kusner—directed to analogous art—teaches
wherein the input space relates to a structure of a context-free grammar; (Page 3 states "Consider a subset of the SMILES grammar as shown in Figure 1, box 1. These are the possible production rules that can be used for constructing a molecule. Imagine we are given as input the SMILES string for benzene: ‘c1ccccc1’. Figure 1, box 3 shows this molecule. To encode this molecule into a continuous latent representation we begin by using the SMILES grammar to parse this string into a parse tree (partially shown in box 2). This tree describes how ‘c1ccccc1’ is generated by the grammar. We decompose this tree into a sequence of production rules by performing a pre-order traversal on the branches of the parse tree from left-to-right, shown in box 4. We convert these rules into 1-hot indicator vectors, where each dimension corresponds to a rule in the SMILES grammar, box 5." Page 3 further states "We use a deep convolutional neural network to map the collection of 1-hot vectors X to a continuous latent vector z." Therefore, the input to the deep convolutional neural network, interpreted as the input space, relates to the grammar, which is a context-free grammar as described in section 2.2.)
wherein the representation has a lower dimensionality than the input space; (Page 11 states "We then pass these vectors through our encoder network which consists of 3 layers of one-dimensional convolutions (Kalchbrenner et al., 2014). We then flatten the resulting sequence into a vector and pass it through a fully connected layer. We then pass the resulting vector through two separate fully connected layers to produce the mean and variance of the latent distribution q(z|X) over z." As the sequence is flattened, the representation will have a lower dimension than the input space.)
supplying the representation to a search system; and (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. After training the SGP, we then perform 5 iterations of batch Bayesian optimization using the expected improvement (EI) heuristic (Jones et al., 1998)." The SGP and the Bayesian optimization are interpreted as the search system.)
providing, by the search system and based on the representation, a datapoint from the input sample set of data … (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. After training the SGP, we then perform 5 iterations of batch Bayesian optimization using the expected improvement (EI) heuristic (Jones et al., 1998)." The SGP and the Bayesian optimization are interpreted as the search system. The new points in latent space, for which properties are predicted, are interpreted as the provided datapoints from the representation of the input sample set of data.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kusner because, as Kusner states on page 2, "We show not only does our model produce a higher proportion of valid outputs than a character based autoencoder, it also produces smoother latent representations. We also show that this learned latent space is effective for searching for arithmetic expressions that fit data, for finding better drug-like molecules, and for making accurate predictions about target properties."
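For illustration only, the following is a minimal sketch, under a toy context-free grammar, of the parse-then-encode pipeline Kusner describes (string → parse tree → sequence of production rules → 1-hot vectors); the grammar and input string are hypothetical, and this is not Kusner's code.

```python
import numpy as np
import nltk

# Toy context-free grammar standing in for the SMILES grammar.
grammar = nltk.CFG.fromstring("""
S -> S '+' T | T
T -> '(' S ')' | 'x'
""")
rules = list(grammar.productions())        # one 1-hot dimension per rule
parser = nltk.ChartParser(grammar)

tree = next(parser.parse(list("x+(x)")))   # parse the input string
sequence = tree.productions()              # pre-order rule sequence
one_hot = np.zeros((len(sequence), len(rules)))
for t, rule in enumerate(sequence):
    one_hot[t, rules.index(rule)] = 1.0    # 1-hot indicator per production
print(one_hot.shape)                       # (n_timesteps, n_rules)
```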
The combination of He and Kusner does not appear to explicitly teach
[providing, by the search system and based on the representation, a datapoint from the input sample set of data] relating to audio selection or user interface design
However, Zhou—directed to analogous art—teaches
[providing, by the search system and based on the representation, a datapoint from the input sample set of data] relating to audio selection or user interface design (Page 2 states "Our system iteratively asks the user to compare multiple melody candidates sampled from a one-dimensional subspace (Koyama et al., 2017) and to select the preferred one." Page 4 states "The searching mode is designed to iteratively show multiple candidates generated by BO based on the user's preference. In each iteration, it shows n candidates (n = 4 in our case), from which it allows the user to select the one they prefer." Therefore, the candidates, interpreted as data points, relate to audio selection.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kusner with the teachings of Zhou because, as stated by Zhou on page 1, "Deep generative models, such as generative adversarial networks (GANs) (Goodfellow et al., 2014) and variational auto-encoders (VAEs) (Kingma & Welling, 2014), have been applied to support creative work, showing great potentials in images, videos, and music (Vondrick et al., 2016; Zhu et al., 2016; Dong et al., 2018). When we focus our scope on the music domain, many research projects have tried to learn latent spaces of melodies (Yang et al., 2017; Fernandez & Vico, 2013; Sim~oes et al., 2019; Herremans et al., 2017). By sampling latent vectors from the smooth latent space produced by one of these generative melody models, even novice composers can quickly generate various meaningful melodies (Ghosh et al., 2019; Roberts, Engel, et al., 2018)."
Regarding claim 2, the rejection of claim 1 is incorporated herein. He does not appear to explicitly teach
selecting from the representation of the sample set of data (x) a datapoint having a score that is higher than a score associated with other datapoints in the representation of the sample set of data (x), using a search algorithm.
However, Kusner—directed to analogous art—teaches
selecting from the representation of the sample set of data (x) a datapoint having a score that is higher than a score associated with other datapoints in the representation of the sample set of data (x), using a search algorithm. (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. On each iteration, we select a batch of 50 latent vectors by sequentially maximizing the EI acquisition function." Therefore, as the EI acquisition function is evaluated using the representation from the encoder, and the maximum of the evaluation is taken, a datapoint having a score that is higher than a score associated with other datapoints in the representation of the data is selected.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kusner for the reasons given above in regards to claim 1.
Regarding claim 5, the rejection of claim 1 is incorporated herein. He does not appear to explicitly teach
wherein the search system uses a search algorithm that includes a Bayesian optimization algorithm.
However, Kusner—directed to analogous art—teaches
wherein the search system uses a search algorithm that includes a Bayesian optimization algorithm. (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. After training the SGP, we then perform 5 iterations of batch Bayesian optimization using the expected improvement (EI) heuristic (Jones et al., 1998)." The SGP and the Bayesian optimization are interpreted as the search system, and the search algorithm includes the Bayesian optimization algorithm.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kusner for the reasons given above in regards to claim 1.
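For illustration only, the following is a minimal sketch of Bayesian optimization over a learned representation using the expected-improvement heuristic, with a scikit-learn Gaussian process standing in for Kusner's sparse GP; the latent data and property scores are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X, y_best):
    # EI heuristic (Jones et al., 1998) for maximization.
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

latent = np.random.randn(200, 16)             # latent feature vectors
y = -np.linalg.norm(latent, axis=1)           # stand-in property estimates
gp = GaussianProcessRegressor().fit(latent[:50], y[:50])
for _ in range(5):                            # 5 BO iterations, as in Kusner
    ei = expected_improvement(gp, latent, y_best=gp.y_train_.max())
    pick = int(np.argmax(ei))                 # next datapoint to evaluate
    gp.fit(np.vstack([gp.X_train_, latent[pick:pick + 1]]),
           np.append(gp.y_train_, y[pick]))
```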
Regarding claim 11, He teaches
A non-transitory computer-readable medium having stored thereon one or more sequences of instructions for causing one or more processors to perform: (The abstract states "In this paper, we propose SPACE-2, a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised contrastive pre-training." One of ordinary skill would realize that this model would be implemented on a computer with a non-transitory computer-readable medium storing the instructions for the processors to perform the method.)
The remainder of claim 11 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claims 12 and 15 recite substantially similar subject matter to claims 2 and 5 respectively and are rejected with the same rationale, mutatis mutandis.
Claim(s) 3 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over He (“SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding”, September 14, 2022), Kusner (“Grammar Variational Autoencoder”, 2017), and Zhou (“Generative Melody Composition with Human-in-the-Loop Bayesian Optimization”, October 7, 2020) as applied to claim 1 above, and further in view of Kim (“ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification”, February 2022) and Vaswani (“Attention Is All You Need”, 2017).
Regarding claim 3, the rejection of claim 1 is incorporated herein. He teaches
generating, for every sample of the input sample set of data, a tree representation, thereby creating a set of tree representations; (Page 4 states "Therefore, in order to utilize all possible data in AnPreDial to pre-train our model, we adopt a unified schema called semantic tree structure (STS) that is suitable for every TOD sample." Page 4 states "We aim to leverage semi-supervised pre-training to learn better pre-trained representations from both the labeled and unlabeled data. Concretely, we adopt a tree-structured supervised contrastive objective on the labeled dataset AnPreDial, while a tree-structured self-supervised contrastive objective on the unlabeled dataset UnPreDial." Therefore, for every sample (labeled and unlabeled), an STS, which is a tree representation, is generated. As the dataset is then used to train the model, a dataset of tree representations is created.)
[generating a list of similar trees] (Page 5 states "For unlabeled data, since there are no available labels, we adopt a tree-structured self-supervised contrastive objective in a similar way in Gao et al. (2021), where only the augmented data by dropout is deemed as a positive sample." The augmented data by dropout is interpreted as the list of similar trees, and as the contrastive objective uses trees, the augmented data must be in the form of trees.)
generating a set of similar trees based on the list of similar trees; (The broadest reasonable interpretation of “set” includes any form of dataset, which is what is created when the augmented data is added.)
applying the contrastive learning algorithm to the set of similar trees to obtain a deep neural network; and (Page 5 states "For unlabeled data, since there are no available labels, we adopt a tree-structured self-supervised contrastive objective in a similar way in Gao et al. (2021), where only the augmented data by dropout is deemed as a positive sample." As the augmented data, interpreted as the set of similar trees, is used for the contrastive objective, which one of ordinary skill in the art would realize is represented as a loss function in this case, the deep neural network is obtained using the contrastive learning algorithm. One of ordinary skill would realize that the loss is used to train the neural network, which is a bidirectional Transformer architecture, which is a deep neural network.)
mapping, using the deep neural network, each tree in the list of similar trees to a corresponding vector representation (Page 5 states "Given the dialog context, our model output a pooled representation at the position as the [CLS] sentence embedding z of the whole context. Thus for any sample pair (i, j), the output sentence embeddings are denoted as z_i, z_j." As the sentences are represented as trees, the deep neural network model outputs a vector representation (embedding) of each tree, including the similar trees.)
The combination of He, Kusner, and Zhou does not appear to explicitly teach
obtaining a subtree replacement lookup dictionary;
for each tree in the set of tree representations, generating a list of trees that are similar, thereby generating a list of similar trees;
[a vector representation] having a predetermined length.
However, Kim—directed to analogous art—teaches
obtaining a subtree replacement lookup dictionary; (Page 10896, Fig. 2 states, in reference to “Stage 3. Extract Subtrees with Lexical Heads”, "A collection of subtrees with lexical heads of the input sentence. The lexical heads are positions that can be syntactically augmented." This collection of subtrees is interpreted as the subtree replacement lookup dictionary.)
for each tree in the set of tree representations, generating a list of trees that are similar, thereby generating a list of similar trees; (Page 10897 states "Stage 3. Extract subtrees with lexical heads After collecting all the plausible tree rules to use, we extract subtrees using lexical heads as the position information to swap. Figure 2 shows an example of VP as the lexical head. ALP swaps sub-subtrees with other types of lexical heads such as NP or PP within the subtrees if available." As can be seen in Fig. 2, Stage 4, the dictionary is used to replace subtrees, and as they have “similar lexical heads”, they are similar trees. Page 10898 states "Stages 1–2. Parse with probabilistic threshold to select more trees We first extract all the valid parse trees using probabilistic threshold τ, instead of picking a single tree with the maximum probability." Therefore, this is the set of tree representations. Page 10897 states "We use all the plausible trees generated from sentences in the same class if they are available." Therefore, a list of similar trees for each tree in the set of tree representations is created.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kusner, and Zhou with the augmentation method taught by Kim because, as Kim states on page 10894, "Our approach aims to reach theoretical guarantees of increasing both the amount and the diversity of a given dataset in a pretty label-preserving manner. As such, ALP is designed to produce augmented samples with diverse sentence structures, each still respecting the linguistic rules and preserving the corresponding class label."
The combination of He, Kusner, Zhou and Kim does not appear to explicitly teach
[a vector representation] having a predetermined length.
However, Vaswani—directed to analogous art—teaches
[a vector representation] having a predetermined length. (Page 3 states "To facilitate these residual connections, all sub-layers in the model, as well as the embedding layers, produce outputs of dimension d_model = 512." Therefore, the output of the embedding layer, which one of ordinary skill in the art would recognize is a vector representation, has the predetermined length of 512. Additionally, page 5 states "Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to vectors of dimension d_model.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kusner, Zhou and Kim with the dimension taught by Vaswani because, as stated by Vaswani on page 2, "The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs."
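For illustration only, the following is a minimal PyTorch sketch of an embedding layer whose output vectors have the predetermined length d_model = 512 quoted from Vaswani; the vocabulary size and token ids are hypothetical.

```python
import torch

d_model = 512  # predetermined output length, per the quoted passage
embed = torch.nn.Embedding(num_embeddings=10000, embedding_dim=d_model)
tokens = torch.randint(0, 10000, (1, 7))  # a hypothetical 7-token input
print(embed(tokens).shape)                # torch.Size([1, 7, 512])
```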
Claim 13 recites substantially similar subject matter to claim 3 and is rejected with the same rationale, mutatis mutandis.
Claim(s) 6, 7, and 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over He (“SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding”, September 14, 2022), Kim (“ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification”, February 2022), Kusner (“Grammar Variational Autoencoder”, 2017), and Zhou (“Generative Melody Composition with Human-in-the-Loop Bayesian Optimization”, October 7, 2020).
Regarding claim 6, He teaches
A system for performing contrastive embedding of a structured space, comprising: (The conclusion states "In this work, we propose a new pre-trained conversation model named SPACE-2, which learns dialog representations from both labeled and unlabeled corpora via tree-structured semi-supervised contrastive learning (CL)." The representations are interpreted as embeddings, which are contrastive as they are learned from contrastive learning. The dialog is interpreted as the structured space.)
an input sample receiver configured to receive an input sample set of data corresponding to an input space (Page 3, section 3.1 states "To provide sufficient high-quality dialog corpora to train our pre-trained conversation model, we use a labeled dialog dataset called AnPreDial, which contains 32 existing labeled TOD datasets, ranging from single-turn question answering to multiturn dialogs. We also use a large-scale unlabeled dialog corpus called UnPreDial with 21 dialog datasets, ranging from open-domain online forums to document-grounded dialogs." These dialog data sets are interpreted as the input sample set of data, which corresponds to an input space, as all sample data represents an input space. As one of ordinary skill in the art would understand, the system of He is implemented on a computer. The processor would receive the data from memory as instructed by the code stored in the memory device of the computer, as one of ordinary skill in the art would understand. Therefore, the part of the processor that receives the data and the code that implements it is interpreted as the input sample receiver.)
a machine learning kernel configured to: (The portion of code that implements the below operations is interpreted as the machine learning kernel.)
[obtaining rules defining similarities] in an embedding [of the sample set of data (x)] (Page 3 states "Different from the vanilla input representations as in BERT (Devlin et al., 2019), we set our input embeddings consisting of four elements: tokens, roles, turns, and positions. Role embeddings are used to segment which role the current token belongs to either user or system. Turn embeddings are assigned to each token according to its turn number in the dialog. Position embeddings are assigned to each token according to its relative position within its belonging sentence." Therefore, the sample set of data is embedded before input to the model. Page 5 states "Since we compute multiple scores f^k_(i,j) for each STS pair, different types of scoring functions can be used to construct the CL loss." Page 5 also states "Similar to the common practice in the current CL, we simply average the K scores into a single value to weigh the semantic similarity among samples:" Therefore, the scoring functions for computing the loss are for defining similarities in embeddings of the sample set of data, and must have been obtained in order to have been used in the experiments.)
train a deep neural network using a contrastive learning algorithm to learn a representation of the sample set of data by modelling a plurality of points in the input sample set of data based on [the augmented data]; and (Page 4 states "As illustrated in Figure 3, we build our SPACE-2 model based on the bidirectional Transformer architecture (Vaswani et al., 2017)." One of ordinary skill in the art would realize that the bidirectional Transformer architecture is a deep neural network. The experiment results on page 8 require that the deep neural network be trained. Page 5 states "For unlabeled data, since there are no available labels, we adopt a tree-structured self-supervised contrastive objective in a similar way in Gao et al. (2021), where only the augmented data by dropout is deemed as a positive sample." The losses represented in eqs. 7 and 8 show that the augmented data is used in the loss function, which would be used to train the deep neural network, as one of ordinary skill in the art would understand. The caption of figure 5 on page 6 states "(a) Self-supervised CL only predicts augmented itself from in-batch negatives, with different dropout masks applied. (b) Supervised CL considers samples of the exact same label as positives. (c) Tree-structured supervised CL considers all in-batch samples as positives with soft scores. Only scores of ISV set are depicted here." CL is contrastive learning, and therefore, the training was contrastive learning. Page 5 states "Figure 6 illustrates the 2D t-SNE visualization of the output unit vectors σ_k(z) for test dialog samples from the MULTIWOZ dataset. Due to the limited space, we only show the sub-spaces of D, I, S, V here. As we can see, the hidden representations of SPACE-2_multi are able to differentiate the similar and dissimilar parts in different semantic sub-spaces. The learned latent sub-space is highly correlated with the dialog annotations of domain, intent, slot or value, which confirms our assumption." Therefore, a representation is generated.)
He does not appear to explicitly teach
wherein the input space relates to a structure of context-free grammar
a rules database configured to store rules defining similarities … of the sample sets of data;
obtain, from the rules database, a set of rules defining similarities in an embedding of the input sample set of data, and
wherein the representation has a lower dimensionality than the input space;
a network access device configured to supply the representation of the input sample set of data to a search system to enable the search system to apply a search algorithm on the representation of the input sample set of data, wherein the search system is configured to provide, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design.
However, Kim—directed to analogous art—teaches
a rules database configured to store rules defining similarities … of the sample sets of data; (Page 10896 states "To explain the Lexicalized PCFGs, we first introduce the context-free grammar (CFG). CFG is a list of rules that define well-structured sentences in a language." Page 10896 further states "Lexicalized PCFGs (L-PCFGs) extends PCFGs by incorporating lexical information to further disambiguate the parsing decisions." Page 10897 states "Lexical information serves as the additional criteria to produce parse trees that are valid in the corresponding grammar rules." As the rules provide a way to produce similar trees, it defines similarities. The broadest reasonable interpretation of database is anywhere data is stored. Therefore, the rules must be stored in order to be used, and that storage is interpreted as the rules database.)
obtain, from the rules database, a set of rules defining similarities in an embedding of the input sample set of data (x), and (Page 10896 states "To explain the Lexicalized PCFGs, we first introduce the context-free grammar (CFG). CFG is a list of rules that define well-structured sentences in a language." Page 10896 further states "Lexicalized PCFGs (L-PCFGs) extends PCFGs by incorporating lexical information to further disambiguate the parsing decisions." Page 10897 states "Lexical information serves as the additional criteria to produce parse trees that are valid in the corresponding grammar rules." As the rules provide a way to produce similar trees, it defines similarities. As the rules are used to parse trees, they must have been received from the rules database.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He with the augmentation method taught by Kim because, as Kim states on page 10894, "Our approach aims to reach theoretical guarantees of increasing both the amount and the diversity of a given dataset in a pretty label-preserving manner. As such, ALP is designed to produce augmented samples with diverse sentence structures, each still respecting the linguistic rules and preserving the corresponding class label."
The combination of He and Kim does not appear to explicitly teach
wherein the input space relates to a structure of context-free grammar
wherein the representation has a lower dimensionality than the input space;
a network access device configured to supply the representation of the input sample set of data to a search system to enable the search system to apply a search algorithm on the representation of the input sample set of data, wherein the search system is configured to provide, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design.
However, Kusner—directed to analogous art—teaches
a network access device configured to supply the representation of the input sample set of data to a search system to enable the search system to apply a search algorithm on the representation of the input sample set of data, wherein the search system is configured to provide, by the search system and based on the representation, a datapoint from the input sample set of data relating to audio selection or user interface design. (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. After training the SGP, we then perform 5 iterations of batch Bayesian optimization using the expected improvement (EI) heuristic (Jones et al., 1998)." The SGP and the Bayesian optimization are interpreted as the search system. The prediction for the properties of new points in latent space is interpreted as the provided datapoint from the representation of the input sample set of data. As one of ordinary skill in the art would understand, a computer is used to carry out this method. A modern computer has the ability to access networks. Therefore, the computer is a network access device. Additionally, the place in memory where the representation is stored must supply the representation to the processor executing the method, in order for the method to apply the search algorithm.)
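The latent-space search Kusner describes, a Gaussian process surrogate queried by Bayesian optimization with the expected improvement heuristic, can be sketched as follows. This is a simplified illustration in which a plain GP stands in for the 500-inducing-point SGP; all identifiers are hypothetical.

```python
# Hedged sketch of Bayesian optimization over a learned latent space:
# fit a GP surrogate to (latent vector, property) pairs, then score
# candidate latent points with expected improvement (EI).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, candidates, best_y):
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)            # guard against zero variance
    z = (best_y - mu) / sigma                  # minimization convention
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_step(latent_vectors, scores, candidates):
    # latent_vectors: (N, d) encodings; scores: (N,) property estimates
    gp = GaussianProcessRegressor().fit(latent_vectors, scores)
    ei = expected_improvement(gp, candidates, scores.min())
    return candidates[np.argmax(ei)]           # next latent point to decode
```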
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kim with the teachings of Kusner because, as Kusner states on page 2, "We show not only does our model produce a higher proportion of valid outputs than a character based autoencoder, it also produces smoother latent representations. We also show that this learned latent space is effective for searching for arithmetic expressions that fit data, for finding better drug-like molecules, and for making accurate predictions about target properties".
The combination of He, Kim, and Kusner does not appear to explicitly teach
[providing, by the search system and based on the representation, a datapoint from the input sample set of data] relating to audio selection or user interface design
However, Zhou—directed to analogous art—teaches
[providing, by the search system and based on the representation, a datapoint from the input sample set of data] relating to audio selection or user interface design (Page 2 states "Our system iteratively asks the user to compare multiple melody candidates sampled from a one-dimensional subspace (Koyama et al., 2017) and to select the preferred one." Page 4 states "The searching mode is designed to iteratively show multiple candidates generated by BO based on the user's preference. In each iteration, it shows n candidates (n = 4 in our case), from which it allows the user to select the one they prefer." Therefore, the candidates, interpreted as datapoints, relate to audio selection.)
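The iterative comparison loop Zhou describes can be sketched, purely for illustration, as follows; sample_candidates and ask_user stand in for the generative model and the user interface and are hypothetical placeholders.

```python
# Hedged sketch of human-in-the-loop search: show n candidates per
# iteration and let the user's pick steer the next round.
def interactive_search(sample_candidates, ask_user, iterations=5, n=4):
    preferred = None
    for _ in range(iterations):
        # e.g., candidates sampled from a one-dimensional subspace
        candidates = sample_candidates(preferred, n)
        preferred = ask_user(candidates)   # the user selects one of n
    return preferred
```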
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kim, and Kusner with the teachings of Zhou because, as stated by Zhou on page 1, "Deep generative models, such as generative adversarial networks (GANs) (Goodfellow et al., 2014) and variational auto-encoders (VAEs) (Kingma & Welling, 2014), have been applied to support creative work, showing great potentials in images, videos, and music (Vondrick et al., 2016; Zhu et al., 2016; Dong et al., 2018). When we focus our scope on the music domain, many research projects have tried to learn latent spaces of melodies (Yang et al., 2017; Fernandez & Vico, 2013; Simões et al., 2019; Herremans et al., 2017). By sampling latent vectors from the smooth latent space produced by one of these generative melody models, even novice composers can quickly generate various meaningful melodies (Ghosh et al., 2019; Roberts, Engel, et al., 2018)."
Regarding claim 7, the rejection of claim 6 is incorporated herein. The combination of He and Kim does not appear to explicitly teach
wherein the search system configured to select from the representation of the input sample set of data a datapoint having a score that is higher than a score associated with other datapoints in the representation of the input sample set of data, using the search algorithm.
However, Kusner—directed to analogous art—teaches
wherein the search system configured to select from the representation of the input sample set of data a datapoint having a score that is higher than a score associated with other datapoints in the representation of the input sample set of data, using the search algorithm. (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. On each iteration, we select a batch of 50 latent vectors by sequentially maximizing the EI acquisition function." Therefore, as the EI acquisition function is evaluated using the representation from the encoder, and the maximum of the evaluation is taken, a datapoint having a score that is higher than a score associated with other datapoints in the representation of the data is selected.)
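Selecting the datapoint whose score exceeds all others, as in Kusner's sequential maximization of the EI acquisition function, reduces to a greedy argmax over candidate scores. A simplified sketch (omitting the surrogate refitting a full batch method would perform between picks; names are hypothetical):

```python
# Hedged sketch: greedily pick the highest-scoring candidates, one at a
# time, under an acquisition function such as EI.
import numpy as np

def select_batch(candidates, score_fn, batch_size=50):
    # candidates: (M, d) array; score_fn returns one score per row
    chosen, remaining = [], list(range(len(candidates)))
    for _ in range(batch_size):
        scores = score_fn(candidates[remaining])
        best = remaining[int(np.argmax(scores))]   # highest score wins
        chosen.append(best)
        remaining.remove(best)
    return candidates[chosen]
```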
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kusner, and Kim for the reasons given above with regard to claim 6.
Regarding claim 10, the rejection of claim 7 is incorporated herein. The combination of He and Kim does not appear to explicitly teach
wherein the search algorithm is a Bayesian optimization algorithm.
However, Kusner—directed to analogous art—teaches
wherein the search algorithm is a Bayesian optimization algorithm. (Page 6 states "After training the GVAE, we obtain a latent feature vector for each sequence in the training data, given by the mean of the variational encoding distributions. We use these vectors and their corresponding property estimates to train a sparse Gaussian process (SGP) model with 500 inducing points (Snelson & Ghahramani, 2005), which is used to make predictions for the properties of new points in latent space. After training the SGP, we then perform 5 iterations of batch Bayesian optimization using the expected improvement (EI) heuristic (Jones et al., 1998)." The SGP and the Bayesian optimization are interpreted as the search system. The prediction for the properties of new points in latent space is interpreted as the provided datapoint from the representation of the input sample set of data.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kusner, and Kim for the reasons given above with regard to claim 6.
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over He (“SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding”, September 14, 2022), Kim (“ALP: Data Augmentation Using Lexicalized PCFGs for Few-Shot Text Classification”, February 2022), Kusner (“Grammar Variational Autoencoder”, 2017), and Zhou (“Generative Melody Composition with Human-in-the-Loop Bayesian Optimization”, October 7, 2020) as applied to claim 6 above, and further in view of Vaswani (“Attention Is All You Need”, 2017).
Regarding claim 8, the rejection of claim 6 is incorporated herein. He teaches
generate, for every sample of the input sample set of data (x), a tree representation (xtree), thereby creating a set of tree representations, (Page 4 states "Therefore, in order to utilize all possible data in AnPreDial to pre-train our model, we adopt a unified schema called semantic tree structure (STS) that is suitable for every TOD sample." Page 4 states "We aim to leverage semi-supervised pre-training to learn better pre-trained representations from both the labeled and unlabeled data. Concretely, we adopt a tree-structured supervised contrastive objective on the labeled dataset AnPreDial, while a tree-structured self-supervised contrastive objective on the unlabeled dataset UnPreDial." Therefore, for every sample (labeled and unlabeled), an STS, which is a tree representation, is generated. As the dataset is then used to train the model, a dataset of tree representations is created.)
[generate a list of similar trees] (Page 5 states "For unlabeled data, since there are no available labels, we adopt a tree-structured self-supervised contrastive objective in a similar way in Gao et al. (2021), where only the augmented data by dropout is deemed as a positive sample." The augmented data by dropout is interpreted as the set of similar trees, and as the contrastive objective uses trees, the augmented data must be in the form of trees.)
generate a set of similar trees based on the list of similar trees; and (The broadest reasonable interpretation of set includes any form of dataset, which is what is created when the augmented data is added.)
the machine learning kernel configured to apply the contrastive learning algorithm to the set of similar trees to obtain a deep neural network; and (Page 5 states "For unlabeled data, since there are no available labels, we adopt a tree-structured self-supervised contrastive objective in a similar way in Gao et al. (2021), where only the augmented data by dropout is deemed as a positive sample." As the augmented data, interpreted as the set of similar trees, is used for the contrastive objective, which one of ordinary skill in the art would realize is represented as a loss function in this case, the deep neural network is obtained using the contrastive learning algorithm. One of ordinary skill would realize that the loss is used to train the neural network, which is a bidirectional Transformer architecture, which is a deep neural network.)
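For illustration, a contrastive objective that treats all in-batch samples as soft positives, loosely echoing the tree-structured supervised CL that He describes, could be sketched as follows; the similarity weighting is an assumption of the sketch, not He's equations.

```python
# Hedged sketch of a soft-positive contrastive loss: in-batch pairs are
# weighted by a precomputed tree-similarity score instead of hard labels.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(z, soft_scores, temperature=0.05):
    # z: (N, d) L2-normalized embeddings; soft_scores: (N, N) similarities
    sim = z @ z.T / temperature
    log_p = F.log_softmax(sim, dim=-1)
    weights = soft_scores / soft_scores.sum(dim=-1, keepdim=True)
    return -(weights * log_p).sum(dim=-1).mean()
```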
a mapper configured to map, using the deep neural network, each tree in the list of similar trees to a corresponding vector representation having a predetermined length. (Page 5 states "Given the dialog context, our model output a pooled representation at the position as the [CLS] sentence embedding z of the whole context. Thus for any sample pair i, j the output sentence embeddings are denoted as z_i, z_j." As the sentences are represented as trees, the deep neural network model outputs a vector representation (embedding) of each tree, including the similar trees. The part of the code configured to perform this action is interpreted as the mapper.)
He does not appear to explicitly teach
a subtree replacement lookup dictionary database configured to store one or more subtree replacement lookup dictionaries;
a tree representation generator configured to:
obtain a subtree replacement lookup dictionary (L) from the subtree replacement lookup dictionary database and
for each tree in the set of tree representations, generate a list of trees that are similar, thereby generating a list of similar trees, and
However, Kim—directed to analogous art—teaches
a subtree replacement lookup dictionary database configured to store one or more subtree replacement lookup dictionaries; (Page 10896, Fig. 2 states, in reference to “Stage 3. Extract Subtrees with Lexical Heads”, "A collection of subtrees with lexical heads of the input sentence. The lexical heads are positions that can be syntactically augmented." This collection of subtrees is interpreted as the subtree replacement lookup dictionary. The broadest reasonable interpretation of database is anywhere data is stored. Therefore, the place in memory where the dictionary is stored is interpreted as the database.)
a tree representation generator configured to: (The section of code configured to perform the following actions is interpreted as the tree representation generator.)
obtain a subtree replacement lookup dictionary (L) from the subtree replacement lookup dictionary database and (As the dictionary is used for generating the similar trees, the dictionary is obtained from the database.)
for each tree in the set of tree representations, generate a list of trees that are similar, thereby generating a list of similar trees, and (Page 10897 states "Stage 3. Extract subtrees with lexical heads After collecting all the plausible tree rules to use, we extract subtrees using lexical heads as the position information to swap. Figure 2 shows an example of VP as the lexical head. ALP swaps sub-subtrees with other types of lexical heads such as NP or PP within the subtrees if available." As can be seen in Fig. 2, Stage 4, the dictionary is used to replace subtrees, and as they have “similar lexical heads”, they are similar trees. Page 10898 states "Stages 1–2. Parse with probabilistic threshold to select more trees We first extract all the valid parse trees using probabilistic threshold τ, instead of picking a single tree with the maximum probability." Therefore, this is the set of tree representations. Page 10897 states "We use all the plausible trees generated from sentences in the same class if they are available." Therefore, a list of similar trees for each tree in the set of tree representations is created.)
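The subtree-swapping stage can be illustrated with NLTK trees as follows; the lookup dictionary and example sentence are invented for the sketch and are not drawn from Kim.

```python
# Hedged sketch of generating "similar trees" by swapping subtrees that
# share a head label, in the spirit of Kim's ALP stages 3-4. The lookup
# dictionary and example trees below are hypothetical.
from nltk import Tree

def similar_trees(tree, lookup):
    """Yield copies of `tree` with one subtree replaced by a stored
    alternative having the same label (e.g., NP, VP, PP)."""
    for pos in tree.treepositions():
        if not pos:
            continue  # skip the root; it cannot be reassigned in place
        sub = tree[pos]
        if isinstance(sub, Tree) and sub.label() in lookup:
            for replacement in lookup[sub.label()]:
                variant = tree.copy(deep=True)
                variant[pos] = replacement
                yield variant

lookup = {"NP": [Tree.fromstring("(NP (DT the) (NN melody))")]}
base = Tree.fromstring(
    "(S (NP (DT a) (NN user)) (VP (VB selects) (NP (DT a) (NN sample))))")
variants = list(similar_trees(base, lookup))  # two NP swaps -> two variants
```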
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He and Kim for the reasons given above in regards to claim 6.
The combination of He, Kim, Kusner, and Zhou does not appear to explicitly teach
[a vector representation] having a predetermined length.
However, Vaswani—directed to analogous art—teaches
[a vector representation] having a predetermined length. (Page 3 states "To facilitate these residual connections, all sub-layers in the model, as well as the embedding layers, produce outputs of dimension d_model = 512." Therefore, the output of the embedding layer, which one of ordinary skill in the art would recognize is a vector representation, has the predetermined length of 512. Additionally, page 5 states "Similarly to other sequence transduction models, we use learned embeddings to convert the input tokens and output tokens to vectors of dimension d_model.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of He, Kim, Kusner, and Zhou with the dimension taught by Vaswani because, as stated by Vaswani on page 2, "The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs."
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571)272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.P./Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121