Prosecution Insights
Last updated: April 19, 2026
Application No. 18/193,367

SYSTEMS AND METHODS FOR INCORPORATING SUPPLEMENTAL SHAPE INFORMATION IN A LINEAR DISCRIMINANT ANALYSIS

Non-Final OA · §101, §103

Filed: Mar 30, 2023
Examiner: CADY, MATTHEW ALAN
Art Unit: 2145
Tech Center: 2100 — Computer Architecture & Software
Assignee: Wells Fargo Bank N.A.
OA Round: 1 (Non-Final)
Grant Probability: Favorable
OA Rounds: 1-2
To Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (grants only 0% of cases; 0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift; based on resolved cases with interview)
Avg Prosecution (typical timeline): 3y 3m
Career history: 11 total applications across all art units (11 currently pending)

Statute-Specific Performance

§101: 24.3% (-15.7% vs TC avg)
§103: 43.2% (+3.2% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§112: 18.9% (-21.1% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 0 resolved cases

Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

According to the first part of the analysis, in the instant case, claims 1-12 are directed to a method and claims 13-19 are directed to a machine. Each of these claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). Claim 20 is directed to a computer program product, and is therefore rejected under 35 USC § 101.

Regarding claim 9:

Step 2A, Prong One: "The method of claim 5, wherein optimizing each knot value vector comprises utilizing an Adaptive Moment Estimation (Adam) stochastic gradient descent algorithm." (This step of using Adam stochastic gradient descent (SGD) is understood to be a recitation of a mathematical concept, which is an abstract idea.)

Step 2A, Prong Two: The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mathematical concepts such as using Adam SGD to optimize knot value vectors without any technological improvement or inventive step.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: "A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made."

Claims 1, 2, 4, 11-14, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Maya Gupta et al. (hereinafter Gupta) ("Deep Lattice Networks and Partial Monotonic Functions", 09/19/2017) in view of Choi Yong Jun et al. (hereinafter Choi) (KR 20220109258 A, 08/04/2022).

Regarding claim 1, Gupta teaches a lattice model; ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints) NOTE: Teaches a machine learning model that integrates an ensemble of lattices and calibrators (which are 1D lattices).

selecting ([pg. 3, section 2] We also experimented with constraining all calibrators to be monotonic (even for non-monotonic inputs) for more stable/regularized training.)
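For orientation on the Adam technique identified as a mathematical concept in the §101 analysis above, the following is a minimal sketch of Adam-style updates applied to a knot value vector. It is illustrative only, not the applicant's claimed implementation: the 5-element knot vector, the target values, and the squared-error loss are hypothetical stand-ins for whatever loss actually trains the lattice.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m) and its
    square (v), with bias correction, scale the step size per parameter."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)            # bias-corrected first moment
    v_hat = v / (1 - b2**t)            # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Hypothetical toy objective: pull a knot value vector toward target values
# by minimizing squared error.
knots = np.zeros(5)
target = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
m, v = np.zeros(5), np.zeros(5)
for t in range(1, 1001):
    grad = 2.0 * (knots - target)      # gradient of sum((knots - target)^2)
    knots, m, v = adam_step(knots, grad, m, v, t)
```

After the loop the knot vector sits close to the target, which is all the sketch is meant to show: Adam is a generic parameter-update rule, applicable to knot values as to any other parameters.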
NOTE: This excerpt discloses that every input (and therefore each feature) can be constrained to be monotonic {every input will pass through the [shape-constrained] calibrators, as shown in the figure below}. A set can be as few as one element, and this selected set of shape constraints contains one element (the monotonic shape constraint), which is selected to be the shape constraint for each feature in the training dataset.

[Image: media_image1.png]

…and the selected set of shape constraints, wherein: ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.) NOTE: Gupta teaches training (training the resulting network) the model with the selected set of shape constraints (constraints for monotonicity).

the Lattice ([pg. 3, Section 2] Here each c_{t,d} is a 1D lattice with K key-value pairs (a ∈ R^K, b ∈ R^K), and the function for each input is linearly interpolated between the two b values corresponding to the input's surrounding a values. An example is shown on the left in Fig. 1. Each 1D calibration function is equivalent to a sum of weighted-and-shifted Rectified Linear Units (ReLU), that is, a calibrator function c(x[d]; a, b) can be equivalently expressed as)

[Image: media_image2.png]

NOTE: Each 1D lattice (calibration function, or calibrator) is an additive form of a plurality of nonlinear functions (a sum of ReLUs, each of which is a nonlinear function). ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.)
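The interpolation/ReLU equivalence cited from Gupta's Section 2 can be checked numerically. The sketch below is the standard construction, not code from either reference; the keys `a` and values `b` are made-up examples.

```python
import numpy as np

def calibrate(x, a, b):
    """1D lattice calibrator: piecewise-linear interpolation through the
    K key-value pairs (a, b)."""
    return np.interp(x, a, b)

def calibrate_as_relu_sum(x, a, b):
    """The same calibrator written as a sum of weighted-and-shifted ReLUs:
    one slope change per interior key."""
    slopes = np.diff(b) / np.diff(a)
    out = b[0] + slopes[0] * (x - a[0])           # first linear segment
    for k in range(1, len(slopes)):
        out = out + (slopes[k] - slopes[k - 1]) * np.maximum(x - a[k], 0.0)
    return out

a = np.array([0.0, 1.0, 2.0, 3.0])                # keys (hypothetical)
b = np.array([0.0, 0.5, 0.6, 1.0])                # values (monotonic here)
xs = np.linspace(0.0, 3.0, 13)
```

On the key range [a[0], a[-1]] the two forms agree pointwise, which is the sense in which the calibrator is "an additive form of a plurality of nonlinear functions": each ReLU term contributes one kink.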
NOTE: Teaches generating a model based on the lattices (layers of the model are ensembles of lattices and calibrators). Gupta therefore teaches the lattice model being generated based on an additive form of a plurality of nonlinear functions.

Shape-restricted model predictions ([pg. 1, introduction] For example, if one is predicting whether to give someone else a loan, we expect and would like to constrain the prediction to be monotonically increasing with respect to the applicant's income, if all other features are unchanged. Imposing monotonicity acts as a regularizer, improves generalization to test data, and makes the end-to-end model more interpretable, debuggable, and trustworthy.) NOTE: The disclosure of Gupta pertains to applying shape restrictions (monotonicity) to model predictions.

Gupta does not teach, but Choi teaches:

A method for training a ([pg. 16, 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution.) NOTE: This excerpt discloses that the classifier in the disclosure of Choi can be an LDA model. ([pg. 5, 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: This excerpt teaches training the LDA model.

receiving, by communications hardware, a training dataset comprising one or more features; ([pg. 5, 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: The computing device receives (which is considered a communication) a training dataset comprising one or more features (health checkup information or DNA data). ([pg. 22, 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The computing device contains processors and is therefore hardware. Therefore, Choi teaches communications hardware receiving a training dataset comprising one or more features (health checkup information or DNA data).

training circuitry, ([pg. 3, 4th paragraph] As used herein, the term "unit" or "module" refers to a hardware component such as software, FPGA, or ASIC, and "unit" or "module" performs certain roles.) NOTE: The term "unit" can refer to an ASIC, which is circuitry. ([pg. 8, 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20.) NOTE: The computing device (on which the model is implemented and trained) contains the network unit ("unit" indicates circuitry) and is therefore also considered training circuitry.

and training, by the training circuitry, the Lattice-LDA using the training dataset… ([pg. 5, 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches the training circuitry (computing device) training the LDA model using the training data set.
and training the Lattice-LDA generates a ([pg. 16, 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution. The processor 130 identifies an axis capable of maximally separating the target value classes of the data embedded in the vector space, making the variance between each class as large as possible, and making the variance within each class as small as possible, so that chromosome 13, chromosome 18, and chromosome 21 can each be clearly distinguished, and variance within each class can be minimized.) NOTE: Teaches classification of the data by generating a hyperplane (decision boundary) for separating classes. ([pg. 17, 3rd paragraph] As a specific example, the classification model may classify the first DNA data as at least one of normal or abnormal (i.e., first classification) when the first DNA data of the first user (i.e., mother) is input.) NOTE: Teaches separating a first class of data points from a second class of data points (normal or abnormal DNA data). ([pg. 5, 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches classification on the training dataset.

Therefore, Choi teaches that training the LDA model generates a hyperplane (decision boundary) separating a first class of data points in the training dataset from a second class of data points in the training dataset (normal or abnormal classes).

OBVIOUSNESS TO COMBINE GUPTA AND CHOI: Gupta and Choi are analogous art to each other and to the present disclosure, as they both relate to methods of machine learning and classification. Gupta pertains to a machine learning model utilizing a plurality of lattices, while Choi pertains to a machine learning model utilizing LDA. Gupta also states: ([pg. 1, section 1] Lattices have been shown to be an efficient nonlinear function class that can be constrained to be monotonic by adding appropriate sparse linear inequalities on the parameters [1], and can be trained in a standard empirical risk minimization framework [2, 1]. Recent work showed lattices could be jointly trained as an ensemble to learn flexible monotonic functions for an arbitrary number of inputs [3].) NOTE: Lattices have been shown to be an efficient nonlinear function class that can learn flexible shape constraints (flexible monotonic functions) for inputs.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to train a lattice-LDA model (training an LDA model as taught by Choi utilizing the lattice methods of the model presented in the disclosure of Gupta) to allow the model to efficiently represent nonlinear data distributions and apply shape constraints.

Additionally: There is no explicit recitation in Gupta of selecting the set of shape constraints via the training circuitry; however, the training circuitry (computing device) of the disclosure of Choi has been determined to have processing capabilities. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the processing capabilities of the training circuitry (computing device) taught by Choi to act as a medium to select the set of shape constraints for each feature in the training dataset (as taught by Gupta).
Additionally: Gupta further states: ([pg. 1, introduction] For example, if one is predicting whether to give someone else a loan, we expect and would like to constrain the prediction to be monotonically increasing with respect to the applicant's income, if all other features are unchanged. Imposing monotonicity acts as a regularizer, improves generalization to test data, and makes the end-to-end model more interpretable, debuggable, and trustworthy.) NOTE: Imposing shape constraints (monotonicity) on the model acts as a regularizer, improves generalization, and improves interpretability, debuggability, and trustworthiness.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to additionally integrate the selected set of shape constraints from Gupta into the training process and the generated hyperplane as described by Choi, to act as a regularizer and to improve generalization, interpretability, debuggability, and trustworthiness of the model.

Regarding claim 2, Gupta teaches: The method of claim 1, wherein selecting the set of shape constraints comprises: receiving ([pg. 6, section 6] We present results on the same benchmark dataset (Adult) with the same monotonic features as in Canini et al. [3], and for three problems from a large internet services company where the monotonicity constraints were specified by product groups.) NOTE: Discloses receiving user input comprising a shape constraint selection (monotonicity constraints specified by product groups from the company) for a feature (monotonic features) in the training dataset (benchmark dataset) {the associated Table 2 details that the benchmark dataset includes training data}.

[Image: media_image3.png]

and selecting the set of shape constraints based on the received user input (product groups from the company). (A set can be as few as one element, and a set of one shape constraint is selected here, being monotonicity.)

Gupta fails to teach, but Choi teaches [in claim 1]:

Communications hardware ([pg. 5, 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: The computing device is capable of receiving (which is considered communication) information (health checkup information or DNA data). ([pg. 22, 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The computing device contains processors and is therefore hardware. Therefore, Choi teaches communications hardware capable of receiving data.

Training circuitry ([pg. 3, 4th paragraph] As used herein, the term "unit" or "module" refers to a hardware component such as software, FPGA, or ASIC, and "unit" or "module" performs certain roles.) NOTE: The term "unit" can refer to an ASIC, which is circuitry. ([pg. 8, 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20.) NOTE: The computing device (on which the model is implemented and trained) contains the network unit ("unit" indicates circuitry) and is therefore also considered training circuitry.
The network unit is also capable of transmitting and receiving data, and is therefore capable of making selections.

OBVIOUSNESS: It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to perform the receiving of the shape constraint selection by the communications hardware and the selecting of the user-inputted shape constraints by the training circuitry, to provide a physical interface for the user to communicate with the system to select the specified shape constraints.

Regarding claim 4, Gupta teaches: The method of claim 1, wherein training the Lattice-LDA [as taught in claim 1] includes: generating, ([pg. 1, section 1] Calibrators are one-dimensional lattices, which nonlinearly transform a single input [1]) NOTE: Calibrators are 1D lattices. ([pg. 3, section 2] For monotonic inputs, we can constrain the calibrator functions [note: calibrators] to be monotonic by constraining the calibrator parameters b ∈ [0, 1]^K to be monotonic, by adding the linear inequality constraints) NOTE: Discloses generating one or more lattices (calibrators) based on the selected set of shape constraints (constraining the calibrator parameters to be monotonic).

wherein each lattice corresponds to one or more of the one or more features in the training dataset,

Figure 1: [Image: media_image4.png]

NOTE: As shown in Fig. 1 above, each input (and thus every input feature) must pass through a lattice (calibrator). Therefore, each lattice corresponds to one or more of the features in the training dataset (the training dataset for the model taught in claim 1).

wherein the Lattice-LDA comprises the generated lattices; (Gupta [pg. 3, section 2] When creating the DLN, if the t + 1th layer is an ensemble of lattices, we randomly permute the outputs of the previous layer to be assigned to the G_{t+1}S_{t+1} inputs of the ensemble.) NOTE: The model in the disclosure of Gupta comprises the generated lattices.
It therefore would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, for the Lattice-LDA to comprise the generated lattices (using the Lattice-LDA teaching and motivation of Gupta in view of Choi in claim 1).

and combining ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network. We implement the layers and projections with new computational graph nodes in TensorFlow and use the ADAM optimizer and batched stochastic gradients. Experiments on benchmark and real-world datasets show that six-layer monotonic deep lattice networks achieve state-of-the-art performance for classification and regression with monotonicity guarantees.) NOTE: Gupta discloses combining the generated lattices in the model of their disclosure, and further states that this process has been shown to perform very well for classification with flexible shape constraints (monotonicity).

Gupta fails to teach generating the lattices using the training circuitry or using the generated lattices to generate a shape-restricted hyperplane; however, Choi teaches using the training circuitry to generate the LDA model in claim 1, which would therefore also be capable of generating lattices associated with the model and combining them into the model.

OBVIOUSNESS: [Abstract, see above] Gupta discloses combining the generated lattices with the model of their disclosure, and further states that this process has been shown to perform very well for classification with flexible shape constraints (monotonicity). It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to generate the lattices using the training circuitry of Choi and to use the generated lattices to generate the shape-restricted hyperplane (taught in claim 1) to achieve state-of-the-art performance for the model.

Regarding claim 11, Gupta fails to teach, but Choi teaches: The method of claim 1, further comprising outputting, by the communications hardware, the trained Lattice-LDA [as taught in claim 1].

[Image: media_image5.png]

([pg. 9, 7th paragraph] The processor 130 may read a computer program stored in the memory 120 to provide a classification model according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the processor 130 may perform a calculation for classifying DNA data. According to an embodiment of the present disclosure, the processor 130 may perform calculation for training the classification model.) NOTE: The processor is coupled to the communications hardware (computing device) as shown in Fig. 2 above. Therefore, this teaches the communications hardware outputting (providing a classification model according to the present disclosure) the trained model.

Regarding claim 12, Gupta fails to teach, but Choi teaches: The method of claim 1, further comprising: receiving, by the communications hardware, a target data point; ([pg. 8, 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20. The network unit 110 may transmit/receive data for performing a method for generating a classification model for performing classification on DNA data according to an embodiment of the present disclosure to other computing devices, servers, and the like. That is, the network unit 110 may provide a communication function between the computing device 100, the user terminal 10, and the external server 20. For example, the network unit 110 may receive electronic health records or DNA test records for a plurality of users from a hospital server. Additionally, the network unit 110 may allow information transfer between the computing device 100 and the user terminal 10 and the external server 20 by calling a procedure to the computing device 100.) NOTE: The communications hardware (computing device as taught in claim 1) receives a target data point (DNA data).

classifying, by classifier circuitry and by using the trained Lattice-LDA, the target data point as a first classification or a second classification; ([pg. 17, 3rd paragraph] As a specific example, the classification model may classify the first DNA data as at least one of normal or abnormal (i.e., first classification) when the first DNA data of the first user (i.e., mother) is input. As a result of the classification, if it is classified as normal, it may mean that there is a very high probability that a genetic abnormality will occur in the fetus. In other words, by classifying the user's DNA data as at least one of normal and abnormal, the classification model may provide analysis information regarding whether the fetus is deformed.) NOTE: Discloses classifying the target data point (the first DNA data) as a first or second classification (normal or abnormal). The classifier is implemented on the computing device, which includes circuitry (as taught in claim 1), and is therefore considered classifier circuitry. Therefore, Choi teaches classifying, by classifier circuitry (computing device) and by using the trained Lattice-LDA (as taught in claim 1), the target data point (DNA data) as a first classification or a second classification (normal or abnormal class).
and outputting, by the communications hardware, an indication of whether the target data point corresponds to the first classification or the second classification.

Figure 2: [Image: media_image5.png]

([pg. 8] For example, the memory 120 stores input/output data (e.g., the user's DNA data, and classification result information corresponding to the DNA data (e.g., information on whether a fetus is genetically deformed or not, or related to a malformation)).) NOTE: Teaches outputting, by the communications hardware (the memory is part of the computing device as shown in Fig. 2, and the computing device is considered communications hardware as taught in claim 1), an indication of whether the target data point corresponds to the first classification or the second classification (the classification result information indicates whether the DNA is in the normal or abnormal class).

Regarding claim 13, Gupta teaches a lattice model; ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints) NOTE: Teaches a machine learning model that integrates an ensemble of lattices and calibrators (which are 1D lattices).

select a set of shape constraints, the set of shape constraints including a shape constraint for each feature in the training dataset, ([pg. 3, section 2] We also experimented with constraining all calibrators to be monotonic (even for non-monotonic inputs) for more stable/regularized training.) NOTE: This excerpt discloses that every input (and therefore each feature) can be constrained to be monotonic {every input will pass through the [shape-constrained] calibrators, as shown in the figure below}. A set can be as few as one element, so the selected set of shape constraints is the monotonic shape constraint, which is selected to be the shape constraint for each feature in the training dataset.

[Image: media_image1.png]

…and the selected set of shape constraints, wherein: ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.) NOTE: Gupta teaches training (training the resulting network) the model with the selected set of shape constraints (constraints for monotonicity).

the Lattice- ([pg. 3, Section 2] Here each c_{t,d} is a 1D lattice with K key-value pairs (a ∈ R^K, b ∈ R^K), and the function for each input is linearly interpolated between the two b values corresponding to the input's surrounding a values. An example is shown on the left in Fig. 1. Each 1D calibration function is equivalent to a sum of weighted-and-shifted Rectified Linear Units (ReLU), that is, a calibrator function c(x[d]; a, b) can be equivalently expressed as) NOTE: Each 1D lattice (calibration function, or calibrator) is an additive form of a plurality of nonlinear functions (a sum of ReLUs, each of which is a nonlinear function). ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.) NOTE: Teaches generating a model based on the lattices (layers of the model are ensembles of lattices and calibrators). Gupta therefore teaches the model being generated based on an additive form of a plurality of nonlinear functions.
Shape-restricted model predictions ([pg. 1, introduction] For example, if one is predicting whether to give someone else a loan, we expect and would like to constrain the prediction to be monotonically increasing with respect to the applicant's income, if all other features are unchanged. Imposing monotonicity acts as a regularizer, improves generalization to test data, and makes the end-to-end model more interpretable, debuggable, and trustworthy.) NOTE: The disclosure of Gupta pertains to applying shape restrictions (monotonicity) to model predictions. This excerpt states the benefits of applying shape restrictions to the model predictions, including regularizing the model, improving generalization, and improving interpretability, debuggability, and trustworthiness.

Gupta fails to teach, but Choi teaches:

An apparatus for training a ([pg. 16, 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution.) NOTE: This excerpt discloses that the classifier in the disclosure of Choi can be an LDA model. ([pg. 22, 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The processor on which the model is implemented is hardware, and is therefore considered an apparatus for training the model. ([pg. 5, 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: This excerpt teaches training the model.

communications hardware configured to: receive a training dataset comprising one or more features; ([pg. 5, 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: The computing device receives (which is considered a communication) a training dataset comprising one or more features (health checkup information or DNA data). ([pg. 22, 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The computing device contains processors and is therefore hardware. Therefore, Choi teaches communications hardware receiving a training dataset comprising one or more features (health checkup information or DNA data).

and training circuitry configured to: ([pg. 3, 4th paragraph] As used herein, the term "unit" or "module" refers to a hardware component such as software, FPGA, or ASIC, and "unit" or "module" performs certain roles.) NOTE: The term "unit" can refer to an ASIC, which is circuitry. ([pg. 8, 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20.) NOTE: The computing device (on which the model is implemented and trained) contains the network unit ("unit" indicates circuitry) and is therefore considered training circuitry.
and train the ([pg.5 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches the training circuitry (computing device) training the model using the training data set. and training the ([pg.16 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution. The processor 130 identifies an axis capable of maximally separating the target value classes of the data embedded in the vector space, making the variance between each class as large as possible, and making the variance within each class as small as possible, so that chromosome 13, chromosome 18, each of chromosome 21 can be clearly distinguished, and variance within each class can be minimized.) NOTE: Teaches classification of the data by generating a hyperplane (decision boundary) for separating classes. ([pg.17 3rd paragraph] As a specific example, the classification model may classify the first DNA data as at least one of normal or abnormal (ie, first classification) when the first DNA data of the first user (ie, mother) is input.) NOTE: Teaches separating a first class of data points from a second class of data points (normal or abnormal DNA data). ([pg.5 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches classification on the training dataset. 
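The LDA mechanics the examiner maps from Choi — identifying an axis that maximizes between-class variance while minimizing within-class variance, yielding a decision boundary — can be sketched in a few lines of numpy. This is an illustrative two-class Fisher LDA on synthetic data, not the classifier of record in Choi; all data and variable names are hypothetical.

```python
import numpy as np

# Minimal two-class Fisher LDA sketch (illustrative only). The learned
# direction w and threshold c define the hyperplane w.x = c that separates
# the two classes, analogous to the decision boundary described in Choi.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))  # first class
class1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))  # second class

mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)
# Within-class scatter matrix S_W (sum of per-class scatter)
Sw = (class0 - mu0).T @ (class0 - mu0) + (class1 - mu1).T @ (class1 - mu1)
w = np.linalg.solve(Sw, mu1 - mu0)   # LDA axis: maximizes between/within variance ratio
c = w @ (mu0 + mu1) / 2.0            # decision threshold at the projected midpoint

predict = lambda X: (X @ w > c).astype(int)
train_acc = np.concatenate([predict(class0) == 0, predict(class1) == 1]).mean()
```

Projecting onto `w` collapses each class around its own mean while pushing the class means apart, which is the "axis capable of maximally separating the target value classes" in the quoted excerpt.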
Therefore, Choi teaches that training the model generates a hyperplane (decision boundary) separating a first class of data points in the training dataset from a second class of data points in the training dataset (the normal and abnormal classes). OBVIOUSNESS TO COMBINE GUPTA AND CHOI: Gupta and Choi are analogous art to each other and to the present disclosure, as both relate to methods of machine learning and classification. Gupta pertains to a machine learning model utilizing a plurality of lattices, while Choi pertains to a machine learning model utilizing LDA. Gupta also states: ([pg.1 section 1] Lattices have been shown to be an efficient nonlinear function class that can be constrained to be monotonic by adding appropriate sparse linear inequalities on the parameters [1], and can be trained in a standard empirical risk minimization framework [2, 1]. Recent work showed lattices could be jointly trained as an ensemble to learn flexible monotonic functions for an arbitrary number of inputs [3].) NOTE: Lattices have been shown to be an efficient nonlinear function class that can learn flexible shape restraints (flexible monotonic functions) for inputs. From this, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to train a lattice-LDA model (the LDA model taught by Choi utilizing the lattice methods presented in the disclosure of Gupta) to allow the model to efficiently represent nonlinear data distributions and apply shape constraints. 
Additionally, there is no explicit recitation of selecting the set of shape constraints via the training circuitry in Gupta; however, the training circuitry (computing device) of the disclosure of Choi has been determined to have processing capabilities. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the processing capabilities of the training circuitry (computing device) taught by Choi as a medium to select the set of shape constraints for each feature in the training dataset (as taught by Gupta). Additionally, Gupta further states: ([pg.1 introduction] For example, if one is predicting whether to give someone else a loan, we expect and would like to constrain the prediction to be monotonically increasing with respect to the applicant’s income, if all other features are unchanged. Imposing monotonicity acts as a regularizer, improves generalization to test data, and makes the end-to-end model more interpretable, debuggable, and trustworthy.) NOTE: Imposing shape restraints (monotonicity) on the model acts as a regularizer, improves generalization, and improves interpretability, debuggability, and trustworthiness. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to integrate the selected set of shape constraints into the described training process for the Lattice-LDA, and into the generated hyperplane, to act as a regularizer, improve generalization, and improve interpretability, debuggability, and trustworthiness for the model. Claim 14 is an apparatus claim directly corresponding to claim 2, and is therefore rejected using the same reasoning as the 103 rejection for claim 2 taught by Gupta in view of Choi. Claim 16 is an apparatus claim directly corresponding to claim 4, and is therefore rejected using the same reasoning as the 103 rejection for claim 4 taught by Gupta in view of Choi. 
Regarding claim 20, Gupta teaches; a lattice model; ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints) NOTE: Teaches a machine learning model that integrates an ensemble of lattices and calibrators (which are 1D lattices). select a set of shape constraints, the set of shape constraints including a shape constraint for each feature in the training dataset; ([pg.3 section 2] We also experimented with constraining all calibrators to be monotonic (even for non-monotonic inputs) for more stable/regularized training.) NOTE: This excerpt discloses that every input (and therefore each feature) can be constrained to be monotonic {every input will pass through the [shape constrained] calibrators, as shown in the figure below}. A set can be as few as one element, so the selected set of shape constraints is the monotonic shape constraint, which is selected to be the shape constraint for each feature in the training dataset. [Figure: annotated model architecture diagram (media_image1.png)] and the selected set of shape constraints, wherein: ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.) NOTE: Gupta teaches training (training the resulting network) the model with the selected set of shape constraints (constraints for monotonicity). the Lattice- ([pg.3 Section 2] Here each c_{t,d} is a 1D lattice with K key-value pairs (a ∈ ℝ^K, b ∈ ℝ^K), and the function for each input is linearly interpolated between the two b values corresponding to the input’s surrounding a values. An example is shown on the left in Fig. 1. 
Each 1D calibration function is equivalent to a sum of weighted-and-shifted Rectified Linear Units (ReLU), that is, a calibrator function c(x[d]; a; b) can be equivalently expressed as) NOTE: Each 1D lattice (calibration function or calibrator) is an additive form of a plurality of nonlinear functions (a sum of ReLUs, which are non-linear functions) ([Abstract] We propose learning deep models that are monotonic with respect to a user specified set of inputs by alternating layers of linear embeddings, ensembles of lattices, and calibrators (piecewise linear functions), with appropriate constraints for monotonicity, and jointly training the resulting network.) NOTE: Teaches generating a model based on the lattices (layers of the model are ensembles of lattices and calibrators). Gupta therefore teaches the lattice model being generated based on an additive form of a plurality of nonlinear functions. Gupta fails to teach but Choi teaches; A computer program product for training a ([pg.9 1st paragraph] According to an embodiment of the present disclosure, the memory 120 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk. The computing device 100 may operate in relation to a web storage that performs a storage function of the memory 120 on the Internet. The description of the above-described memory is only an example, and the present disclosure is not limited thereto.) NOTE: Teaches that the memory is a tangible, physical storage medium and is therefore non-transitory. 
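The equivalence the examiner quotes from Gupta — a 1D calibrator (piecewise-linear interpolation through K key-value pairs (a, b)) being expressible as a sum of weighted-and-shifted ReLUs — can be verified numerically. The sketch below uses hypothetical knot values and one standard way of writing the equivalence; it is not reproduced from Gupta's paper.

```python
import numpy as np

# Hypothetical calibrator: knot locations a (keys) and knot values b (values).
a = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([0.0, 0.5, 0.6, 1.0])

slopes = np.diff(b) / np.diff(a)       # slope of each linear segment
deltas = np.diff(slopes, prepend=0.0)  # slope change introduced at each knot

def calibrator_relu(x):
    """Sum-of-ReLUs form: c(x) = b[0] + sum_k delta_k * relu(x - a_k)."""
    return b[0] + np.sum(deltas * np.maximum(0.0, x[:, None] - a[:-1]), axis=1)

# Inside the knot range the ReLU form matches plain linear interpolation.
x = np.linspace(0.0, 3.0, 61)
assert np.allclose(calibrator_relu(x), np.interp(x, a, b))
```

Each ReLU term switches on at a knot and contributes the change in slope there, so the sum reproduces the piecewise-linear interpolant exactly between the first and last knot.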
([pg.9 7th paragraph] The processor 130 may read a computer program stored in the memory 120 to provide a classification model according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the processor 130 may perform a calculation for classifying DNA data. According to an embodiment of the present disclosure, the processor 130 may perform calculation for training the classification model.) NOTE: Teaches a computer program product comprising at least one non-transitory computer-readable storage medium storing software instructions (computer program stored in memory, which is non-transitory) that, when executed (the processor may read the computer program), cause an apparatus to perform the operations of the classification model presented in the disclosure of Choi. ([pg.16 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution.) NOTE: This excerpt discloses that the classifier in the disclosure of Choi can be an LDA model. ([pg.5 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: This excerpt teaches training the LDA model. receive a training dataset comprising one or more features; ([pg.5 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) 
NOTE: The computing device receives a training dataset comprising one or more features. and train the Lattice-LDA using the training dataset ([pg.5 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches training the model using the training data set. and training the Lattice-LDA generates a ([pg.16 5th paragraph] As another example, the processor 130 may perform classification on data based on linear discriminant analysis (LDA) in which data classification is performed by generating a decision boundary based on data distribution. The processor 130 identifies an axis capable of maximally separating the target value classes of the data embedded in the vector space, making the variance between each class as large as possible, and making the variance within each class as small as possible, so that chromosome 13, chromosome 18, each of chromosome 21 can be clearly distinguished, and variance within each class can be minimized.) NOTE: Teaches classification of the data by generating a hyperplane (decision boundary) for separating classes. ([pg.17 3rd paragraph] As a specific example, the classification model may classify the first DNA data as at least one of normal or abnormal (ie, first classification) when the first DNA data of the first user (ie, mother) is input.) NOTE: Teaches separating a first class of data points from a second class of data points (normal or abnormal DNA data). ([pg.5 3rd paragraph] The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: Teaches classification on the training dataset. 
Therefore, Choi teaches that training the model generates a hyperplane (decision boundary) separating a first class of data points in the training dataset from a second class of data points in the training dataset (the normal and abnormal classes). OBVIOUSNESS TO COMBINE CHOI AND GUPTA: Gupta and Choi are analogous art to each other and to the present disclosure, as both relate to methods of machine learning and classification. Gupta pertains to a machine learning model utilizing a plurality of lattices, while Choi pertains to a machine learning model utilizing LDA. Gupta also states: ([pg.1 section 1] Lattices have been shown to be an efficient nonlinear function class that can be constrained to be monotonic by adding appropriate sparse linear inequalities on the parameters [1], and can be trained in a standard empirical risk minimization framework [2, 1]. Recent work showed lattices could be jointly trained as an ensemble to learn flexible monotonic functions for an arbitrary number of inputs [3].) NOTE: Lattices have been shown to be an efficient nonlinear function class that can learn flexible shape restraints (flexible monotonic functions) for inputs. From this, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to train a lattice-LDA model (the LDA model taught by Choi utilizing the lattice methods presented in the disclosure of Gupta) to allow the model to efficiently represent nonlinear data distributions and apply shape constraints. Additionally, Gupta further states: ([pg.1 introduction] For example, if one is predicting whether to give someone else a loan, we expect and would like to constrain the prediction to be monotonically increasing with respect to the applicant’s income, if all other features are unchanged. 
Imposing monotonicity acts as a regularizer, improves generalization to test data, and makes the end-to-end model more interpretable, debuggable, and trustworthy.) NOTE: Imposing shape restraints (monotonicity) on the model acts as a regularizer, improves generalization, and improves interpretability, debuggability, and trustworthiness. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to integrate the selected set of shape constraints into the described training process for the model, and into the generated hyperplane, to act as a regularizer, improve generalization, and improve interpretability, debuggability, and trustworthiness for the model. Claim(s) 3, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gupta (“Deep Lattice Networks and Partial Monotonic Functions”, 09/19/2017) in view of Choi (KR 20220109258 A, 08/04/2022) further in view of Natalya Pya (hereinafter Pya) (“Shape Constrained Additive Models”, 02/25/2014). Regarding claim 3, Gupta and Choi teach; The method of claim 1, (Using the same reasoning as the 103 rejection for claim 1 by Gupta in view of Choi) Gupta and Choi do not teach but Pya teaches; wherein the set of shape constraints is selected from a set of candidate shape constraints, the set of candidate shape constraints comprising: a linear shape constraint, a monotone increasing shape constraint, a monotone decreasing shape constraint, a convex shape constraint, a concave shape constraint, and a combination of two or more of the above. ([pg.546 section 2.2.2] This simple monotonically increasing smooth can be extended to a variety of monotonic functions, including decreasing, convex/concave, increasing/decreasing and concave, increasing/ decreasing and convex, the difference between alternative shape constraints being the form of the matrices Σ and D. 
Table 1 details eight possibilities, while Supplementary material, S.2, provides the corresponding derivations.) NOTE: Pya details a set of monotonic candidate shape constraints including monotone increasing/decreasing, convex, concave, and a combination of two or more shape constraints (increasing/decreasing AND concave/convex). OBVIOUSNESS TO COMBINE PYA WITH GUPTA AND CHOI: Pya is analogous art to the present disclosure, Gupta, and Choi as it pertains to methods of machine learning. The disclosure of Pya discloses an additive model utilizing shape constraints, while the disclosure of Gupta pertains to a shape constrained model using a plurality of lattices, and Choi pertains to an LDA model. Having a set of specific shape constraints to select from allows the model to better fit different distributions of data by simply selecting one of the predefined shape constraints. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to select the set of shape constraints from the candidate set of shape constraints disclosed by Pya, to allow for a selection of more specific shape constraints to better fit different data distributions. Claim 15 is an apparatus claim directly corresponding to claim 3, and is therefore rejected using the same reasoning as the 103 rejection for claim 3 taught by Gupta in view of Choi further in view of Pya. Claim(s) 5-10, 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gupta (“Deep Lattice Networks and Partial Monotonic Functions”, 09/19/2017) in view of Choi (KR 20220109258 A, 08/04/2022) further in view of Taylor Darwin Berkeley Berg-Kirkpatrick (hereinafter Taylor) (US 20170352344 A1, 12/07/2017) further in view of Andrew Cotter et al. (hereinafter Cotter) (“Monotonic Calibrated Interpolated Look-Up Tables”, 07/16/2016). 
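Before turning to the claim 5 analysis, the candidate shape constraints Pya catalogs above (monotone increasing/decreasing, convex, concave, and combinations) can be illustrated by checking which of them a function sampled on an equally spaced grid satisfies, using finite differences. This is an illustrative sketch, not Pya's penalized-spline machinery; the helper name and tolerance are hypothetical.

```python
import numpy as np

# Check candidate shape constraints on a function sampled at equally spaced
# points, via first differences (monotonicity) and second differences
# (convexity/concavity). Illustrative only.
def shape_constraints(y, tol=1e-9):
    d1, d2 = np.diff(y), np.diff(y, n=2)
    return {
        "monotone_increasing": bool(np.all(d1 >= -tol)),
        "monotone_decreasing": bool(np.all(d1 <= tol)),
        "convex": bool(np.all(d2 >= -tol)),
        "concave": bool(np.all(d2 <= tol)),
    }

x = np.linspace(0.0, 1.0, 101)
print(shape_constraints(x ** 2))      # increasing and convex on [0, 1]
print(shape_constraints(np.sqrt(x)))  # increasing and concave: a combined constraint
```

The sqrt example corresponds to Pya's combined "increasing and concave" case.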
Regarding claim 5, Gupta teaches; The method of claim 4, wherein generating the one or more lattices includes: ([pg.3 section 2] Here each c_{t,d} is a 1D lattice with K key-value pairs (a ∈ ℝ^K, b ∈ ℝ^K), and the function for each input is linearly interpolated between the two b values corresponding to the input’s surrounding a values. An example is shown on the left in Fig. 1.) NOTE: Each lattice has been generated with a candidate knot set (K key-value pairs, where the ‘a’ value is a knot [key values along a feature where the lattice stores values]). defining, by the training circuitry and based on the selected set of shape constraints, a constraint function for each lattice; ([pg.3 section 2] Each 1D calibration function is equivalent to a sum of weighted-and-shifted Rectified Linear Units (ReLU) .... For monotonic inputs, we can constrain the calibrator functions to be monotonic by constraining the calibrator parameters b ∈ [0, 1]^K to be monotonic, by adding the linear inequality constraints) NOTE: Teaches defining a shape constraint function (a linear inequality function for monotonicity) for each lattice (a calibrator function c_{t,d} is a 1D lattice, taught in the above limitation) based on the selected set of shape constraints (monotonicity, as taught in claim 1 by Gupta). Gupta fails to teach but Choi teaches; selecting, by the training circuitry ([pg.3 4th paragraph] As used herein, the term “unit” or “module” refers to a hardware component such as software, FPGA, or ASIC, and “unit” or “module” performs certain roles.) NOTE: The term ‘unit’ can refer to an ASIC, which is circuitry. ([pg.8 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20.) 
NOTE: The computing device (on which the model is implemented and trained on) contains the network unit (‘unit’ indicates circuitry) and therefore is also considered training circuitry. The network is also capable of transmitting and receiving data, and is therefore capable of making selections. Gupta and Choi fail to teach but Taylor teaches; wherein the candidate knot set is associated with a knot value vector; ([0078] The full vector of all knot heights Ξ and the full set of segment scores Ψ can be parameterized jointly as a function of the full input sequence x: (Ξ, Ψ)=h(θ, x), where h is a non-linear function parameterized by θ that maps the input x to knot heights Ξ and segment scores Ψ.) NOTE: Teaches a knot value vector (a full vector of all knot heights). OBVIOUSNESS TO COMBINE TAYLOR WITH CHOI AND GUPTA: Taylor is analogous art to the present disclosure, Gupta, and Choi as it pertains to methods of implementing machine learning models. The disclosure of Taylor details a method of machine learning utilizing lattices. In [0078] above, Taylor further explains that storing the knots in a vector allows joint parameterization; i.e. instead of predicting each knot height separately, all knot heights are predicted together as a package. This allows the model to learn relationships between knots within the same vector and train the whole shape end to end. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to associate the candidate knot set of Gupta with the knot value vector of Taylor to allow the knots to be jointly parameterized and learned in relation to each other. Gupta, Choi, and Taylor fail to teach but Cotter teaches; and optimizing, by the training circuitry, each knot value [knot value vector taught by Taylor above] within constraints of the defined constraint function for the corresponding lattice to obtain an optimized knot value vector. 
([pg.20 section 7.1] This joint estimation makes the objective non-convex, discussed further in Section 9.3. To simplify estimating the parameters, we treat the number of changepoints Cd for the dth feature as a hyperparameter, and fix the Cd changepoint locations (also called knots) at equally-spaced quantiles of the feature values. The changepoint values are then optimized jointly with the lattice parameters, detailed in Section 9.3.) NOTE: The knot values (changepoints) are optimized with the lattice parameters. ([pg.12 section 4.1] These same pairwise linear inequality constraints can be imposed when learning the parameters theta to ensure a monotonic function is learned. The following result establishes these constraints are sufficient and necessary for a 2^D lattice to be monotonically increasing in the dth feature (the result extends trivially to larger lattices)) NOTE: The knot values are optimized (when learning the parameters) by the training circuitry (using the teaching from claim 1) within the constraints of the defined constraint function for the corresponding lattice (linear inequality constraints) to obtain an optimized knot value vector (taught by Taylor). OBVIOUSNESS TO COMBINE COTTER WITH GUPTA, CHOI, AND TAYLOR: Cotter is analogous art to the present disclosure, Gupta, Choi, and Taylor as it pertains to methods of machine learning. Cotter pertains to machine learning methods that utilize lattices defined by knot values and shape constraints to better model data distributions. In [pg.12 section 4.1] above, Cotter further explains that applying the selected shape constraints while learning/optimizing the lattice parameters (knot values) ensures a monotonic function is learned. 
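The pairwise linear inequality constraints discussed above (b[k] ≤ b[k+1] for a monotonically increasing calibrator) can be illustrated together with a projection of a violating knot value vector back onto the feasible set. The sketch below uses a generic pool-adjacent-violators (PAV) projection under assumed names; it is not the optimization procedure of any cited reference.

```python
import numpy as np

def is_monotone(b):
    """Check the pairwise constraints b[k+1] - b[k] >= 0."""
    return bool(np.all(np.diff(b) >= 0))

def project_monotone(b):
    """Euclidean projection onto {b : b[k] <= b[k+1]} via pool-adjacent-violators."""
    out = []  # each block: [block mean, block size]
    for v in b:
        out.append([float(v), 1])
        # merge adjacent blocks while the ordering constraint is violated
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, s2 = out.pop()
            m1, s1 = out.pop()
            out.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in out])

b = np.array([0.0, 0.5, 0.3, 1.0])  # hypothetical knot values; violates b[1] <= b[2]
b_feasible = project_monotone(b)
```

PAV averages adjacent violating entries, giving the closest monotone vector in the least-squares sense (here the 0.5/0.3 pair collapses to 0.4/0.4).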
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to use the shape constrained optimization process for the lattice knots from the disclosure of Cotter to ensure a monotonic function is learned when optimizing each knot value vector. Regarding claim 6, Gupta fails to teach but Choi teaches; Training circuitry ([pg.3 4th paragraph] As used herein, the term “unit” or “module” refers to a hardware component such as software, FPGA, or ASIC, and “unit” or “module” performs certain roles.) NOTE: The term ‘unit’ can refer to an ASIC, which is circuitry. ([pg.8 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20 .) NOTE: The computing device (on which the model is implemented and trained on) contains the network unit (‘unit’ indicates circuitry) and therefore is also considered training circuitry. The network is also capable of transmitting and receiving data, and is therefore capable of making selections or identifying data. Gupta, Choi, and Taylor fail to teach but Cotter teaches; The method of claim 5, wherein selecting the candidate knot set includes: identifying, ([pg.20 section 7.1] This joint estimation makes the objective non-convex, discussed further in Section 9.3. To simplify estimating the parameters, we treat the number of changepoints Cd for the dth feature as a hyperparameter, and fix the Cd changepoint locations (also called knots) at equally-spaced quantiles of the feature values. The changepoint values are then optimized jointly with the lattice parameters, detailed in Section 9.3.) NOTE: The candidate knot set is selected based on an identified predefined number of quantiles (the number of quantiles is predefined to be the same as the number of knots, Cd). 
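Cotter's knot-selection heuristic as characterized here — treating the number of knots C_d as a hyperparameter and fixing the knot locations at equally spaced quantiles of the feature's values — reduces to a short numpy sketch. The data and the value of C_d below are hypothetical.

```python
import numpy as np

# Illustrative feature column; a skewed distribution shows why quantile-based
# knots differ from equally spaced knots.
rng = np.random.default_rng(1)
feature = rng.exponential(scale=2.0, size=1000)

num_knots = 3                                       # the hyperparameter C_d
quantiles = np.linspace(0, 1, num_knots + 2)[1:-1]  # interior quantiles: 25/50/75%
knots = np.quantile(feature, quantiles)             # candidate knot locations
```

This also matches the note under claim 8 above: one knot lands at the 50% quantile, two knots at the 33%/66% quantiles, and so on, so choosing the count fixes the locations.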
Regarding claim 7, Gupta fails to teach but Choi teaches; Communications hardware ([pg.5 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: The computing device is capable of receiving (which is considered communication) information (health checkup information or DNA data). ([pg.22 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The computing device contains processors and is therefore hardware. Therefore, Choi teaches communications hardware capable of receiving data. Training circuitry ([pg.3 4th paragraph] As used herein, the term “unit” or “module” refers to a hardware component such as software, FPGA, or ASIC, and “unit” or “module” performs certain roles.) NOTE: The term ‘unit’ can refer to an ASIC, which is circuitry. ([pg.8 4th paragraph] According to an embodiment of the present disclosure, the computing device 100 may include the user terminal 10 and the network unit 110 for transmitting and receiving data to and from the external server 20 .) NOTE: The computing device (on which the model is implemented and trained on) contains the network unit (‘unit’ indicates circuitry) and therefore is also considered training circuitry. The network is also capable of transmitting and receiving data, and is therefore capable of making selections. 
Gupta, Choi, and Taylor fail to teach but Cotter teaches; The method of claim 5, wherein selecting the candidate knot set includes: receiving, ([pg.20 section 7.1] This joint estimation makes the objective non-convex, discussed further in Section 9.3. To simplify estimating the parameters, we treat the number of changepoints Cd for the dth feature as a hyperparameter, and fix the Cd changepoint locations (also called knots) at equally-spaced quantiles of the feature values. The changepoint values are then optimized jointly with the lattice parameters, detailed in Section 9.3.) NOTE: The number of changepoints is a hyperparameter, meaning it can be adjusted via received user input. The number of quantiles directly corresponds to the received number of knots. This therefore teaches receiving a number of quantiles to be used for knot selection. Regarding Claim 8 Gupta fails to teach but Choi teaches, Communications hardware ([pg.5 3rd paragraph] The computing device 100 of the present disclosure may receive health checkup information or DNA data from the external server 20, and build a learning data set based on the information. The computing device 100 may generate a classification model that performs a classification related to predicting genetic anomalies of a fetus based on the user's DNA data by learning one or more network functions through the training data set.) NOTE: The computing device is capable of receiving (which is considered communication) information (health checkup information or DNA data). ([pg.22 1st paragraph] A method of generating a learning model for classification and reading of DNA data performed on one or more processors of a computing device.) NOTE: The computing device contains processors and is therefore hardware. Therefore, Choi teaches communications hardware capable of receiving data. 
Gupta, Choi, and Taylor fail to teach but Cotter further teaches; The method of claim 5, wherein selecting the candidate knot set includes: receiving, ([pg.20 section 7.1] This joint estimation makes the objective non-convex, discussed further in Section 9.3. To simplify estimating the parameters, we treat the number of changepoints Cd for the dth feature as a hyperparameter, and fix the Cd changepoint locations (also called knots) at equally-spaced quantiles of the feature values. The changepoint values are then optimized jointly with the lattice parameters, detailed in Section 9.3.) NOTE: The number of knots (changepoints) is a hyperparameter. A hyperparameter is a user-defined parameter. The knots are equally spaced, so selecting the number of knots in this case is the same as selecting the knot locations, since the number of knots determines where they will be located (1 knot: located at the 50% quantile; 2 knots: located at the 33% and 66% quantiles; etc.). This therefore teaches receiving a set of user-specified knot locations, wherein the candidate knot set comprises the set of user-specified knot locations. Regarding claim 9, Gupta teaches; The method of claim 5, wherein optimizing ([pg.6 section 5] We use the ADAM optimizer [16] and batched stochastic gradients to update model parameters.) NOTE: Teaches optimizing utilizing the ADAM stochastic gradient descent (SGD) algorithm. Gupta, Choi, and Taylor fail to teach but Cotter teaches; Optimizing knot values ([pg.20 section 7.1] This joint estimation makes the objective non-convex, discussed further in Section 9.3. To simplify estimating the parameters, we treat the number of changepoints Cd for the dth feature as a hyperparameter, and fix the Cd changepoint locations (also called knots) at equally-spaced quantiles of the feature values. The changepoint values are then optimized jointly with the lattice parameters, detailed in Section 9.3.) NOTE: The knot values (changepoints) are optimized with the lattice parameters. 
Gupta and Choi fail to teach but Taylor teaches; Knot value vector ([0078] The full vector of all knot heights Ξ and the full set of segment scores Ψ can be parameterized jointly as a function of the full input sequence x: (Ξ, Ψ)=h(θ, x), where h is a non-linear function parameterized by θ that maps the input x to knot heights Ξ and segment scores Ψ.) NOTE: Teaches a knot value vector (a full vector of all knot heights). OBVIOUSNESS: There is no explicit recitation of optimizing the knot value vectors using the ADAM SGD algorithm; however, ADAM would be beneficial for optimizing lattice knot-value vectors because it provides adaptive per-parameter step sizes and momentum, which would improve stability across knots since they are updated independently. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to update the knot value vector using the ADAM SGD algorithm to improve stability across knots when optimizing. Regarding claim 10, Gupta teaches; The method of claim 9, further comprising: applying, ([pg.6 section 5] After each gradient update, we project parameters to satisfy their monotonicity) NOTE: Teaches applying a shape projection algorithm at each iteration of the SGD algorithm (after each gradient update, we project parameters to satisfy their monotonicity) to satisfy the constraints of the constraint function. Gupta fails to teach but Choi teaches; applying, by the training circuitry ([Abstract] Disclosed is a method for generating a learning model for classifying and reading DNA data performed by one or more processors of a computing device.) NOTE: The computing device (which contains the network unit training circuitry, and is therefore training circuitry itself) is capable of generating (training, optimizing, etc.) a learning model. 
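The combination the examiner reasons toward for claims 9 and 10 — Adam updates on the knot value vector, with a shape projection applied after each gradient step — can be sketched as follows. The loss, targets, learning rate, and the cumulative-max projection are all illustrative assumptions, not the method of any cited reference.

```python
import numpy as np

def adam_step(b, grad, m, v, t, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: adaptive per-parameter step sizes with momentum."""
    m = b1 * m + (1 - b1) * grad       # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias corrections
    v_hat = v / (1 - b2 ** t)
    return b - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

target = np.array([0.0, 0.3, 0.7, 1.0])  # hypothetical monotone target knot values
b = np.zeros(4)                          # knot value vector to optimize
m = v = np.zeros(4)
for t in range(1, 1001):
    grad = 2 * (b - target)              # gradient of a toy squared-error loss
    b, m, v = adam_step(b, grad, m, v, t)
    b = np.maximum.accumulate(b)         # project after each update: b[k] <= b[k+1]

loss = float(np.sum((b - target) ** 2))
```

The projection after every update mirrors Gupta's "after each gradient update, we project parameters to satisfy their monotonicity," so the optimized vector remains feasible throughout training.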
The training circuitry (computing device) is therefore also capable of applying constraints, such as a shape projection algorithm, during model training.

Claim 17 is an apparatus claim directly corresponding to claim 5 and is therefore rejected using the same reasoning as the § 103 rejection of claim 5 over Gupta in view of Choi, further in view of Taylor, further in view of Cotter. Claim 18 is an apparatus claim directly corresponding to claim 6 and is rejected on the same basis. Claim 19 is an apparatus claim directly corresponding to claim 7 and is rejected on the same basis.

CONCLUSION

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Matthew Alan Cady, whose telephone number is (571) 272-7229. The examiner can normally be reached Monday through Friday, 7:30 am to 5:00 pm ET.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW ALAN CADY/
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145

Prosecution Timeline

Mar 30, 2023
Application Filed
Feb 17, 2026
Non-Final Rejection — §101, §103 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: Favorable
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
