Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
For clarification, and to avoid possible interpretations under 35 U.S.C. 112(f) or possible rejections under 35 U.S.C. § 101 (e.g., software per se), it is advised that in claim 19, “a memory device” be amended to “a memory”, and “a processor device” be amended to “a processor” or “a hardware processor”.
The Examiner encourages Applicant to schedule an interview to discuss the issues raised below, for example the rejections under 35 U.S.C. § 101 and § 103, in order to move the application toward allowance.
Applicant is strongly requested to identify, in the Remarks, the supporting paragraph(s) for each limitation of any amended or new claim(s), so that the Examiner can interpret the claims clearly and definitely.
Priority
Acknowledgment is made of applicant's claim for the benefit of the provisional application filed on 02/09/2022.
Claim Objections
Claim(s) 5, 7-8, 14, 16-17 is/are objected to because of the following informalities.
Claim(s) 5 is/are objected to because of the following informalities: it appears that “the k-1 classifiers” (line 2) needs to read “the k-1 binary classifiers” or something else. Appropriate correction is required. In addition, claim(s) 14 is/are objected to for the same reason.
Claim(s) 7 is/are objected to because of the following informalities: for consistency, simplicity and clarity, it appears that “wherein said identifying step searches” (line 1) needs to read “wherein the identifying searches” or something else. Appropriate correction is required. In addition, claim(s) 16 is/are objected to for the same reason.
Claim(s) 8 is/are objected to because of the following informalities: for consistency, simplicity and clarity, it appears that “wherein said correcting step removes” (line 1) needs to read “wherein the correcting removes” or something else. Appropriate correction is required. In addition, claim(s) 17 is/are objected to for the same reason.
Claim(s) 5, 7-8, 14, 16-17 each recite(s) limitations containing informalities as set forth above, and their dependent claims are objected to at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 1-20 is/are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim(s) 1 recite(s) the limitation “the labeled space” (line 5). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a labeled space”, or something else. For the purposes of examination, “a labeled space” is used. In addition, claim(s) 10, 19 is/are rejected for the same reason.
The term “high” (claim 2, line 3) is a relative term which renders the claim indefinite. The term “high” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In addition, claims 11, 20 is/are rejected for the same reason.
The term “low” (claim 2, line 3) is a relative term which renders the claim indefinite. The term “low” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. In addition, claims 11, 20 is/are rejected for the same reason.
Claim(s) 2 recite(s) the limitation “from a high dimensional space above x dimensions into a low dimensional space below y dimensions, where x and y are integers, and x > y”. However, based on par 44 “We first use neural networks to transform time series segments from high dimensional to low dimensional latent space”, it is not clear why the high dimensional space is “above x dimensions” and the low dimensional space is “below y dimensions” since it renders a gap of 2 dimensions between the “high dimensional space” and the “low dimensional space”. It appears that it may need to read “from an x-dimensional space into a y-dimensional space, where x and y are integers, and x > y”. For the purposes of examination, “from an x-dimensional space into a y-dimensional space, where x and y are integers, and x > y” is used. In addition, claims 11, 20 is/are rejected for the same reason.
Claim(s) 6 recite(s) the limitation “the nominal loss” (line 1). There is insufficient antecedent basis for this limitation in the claim. It is not clear whether it refers to “a nominal loss” as recited in claim 5, or something else. It appears that “The computer-implemented method of claim 1” (claim 6) may need to read “The computer-implemented method of claim 5” or something else. For the purposes of examination, “The computer-implemented method of claim 5” is used. In addition, claim(s) 15 is/are rejected for the same reason.
Claim(s) 1-2, 6, 10-11, 15, 19 each recite(s) limitations that raise issues of indefiniteness as set forth above, and their dependent claims are rejected at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 3-8, 10, 12-17, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao et al. (Rank consistent ordinal regression for neural networks with application to age estimation) in view of Kim et al. (Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression and Semi-Supervised Domain Adaptation).
Regarding claim 1
Cao teaches
encoding time series data with a temporal encoder to obtain latent space representations;
(Cao [fig(s) 2] [sec(s) 1] “Aging can be regarded as a non-stationary process since age progression effects appear differently depending on the person’s age. During childhood, facial aging is primarily associated with changes in the shape of the face, whereas aging during adulthood is defined mainly by changes in skin texture [16,20]. Based on this assumption, age prediction can be modeled using ordinal regression-based approaches [2,3,13,29].” [sec(s) 4.1] “The MORPH-2 dataset [24], containing 55,608 face images, was downloaded from https://www.faceaginggroup.com/morph/ and preprocessed by locating the average eye-position in the respective dataset using facial landmark detection [26] and then aligning each image in the dataset to the average eye position using EyepadAlign function in MLxtend v0.14 [22]. The faces were then re-aligned such that the tip of the nose was located in the center of each image. The age labels used in this study were in the range of 16–70 years” [sec(s) 4.2] “To evaluate the performance of CORAL for age estimation from face images, we chose the ResNet-34 architecture [9], which is a modern CNN architecture that achieves good performance on a variety of image classification tasks [8]. For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN. To implement a ResNet-34 CNN for ordinal regression using the proposed CORAL method, we replaced the last output layer with the corresponding binary tasks (Fig. 2) and refer to this implementation as CORAL-CNN. Similar to CORAL-CNN, we modified the output layer of ResNet-34 to implement the ordinal regression reference approach described in Niu et al. [16]; we refer to this architecture as OR-CNN”; e.g., feature representations based on ResNet-34 read(s) on “latent space representations”.)
(Note: Hereinafter, if a limitation contains bold brackets (i.e., [·]) around claim language, the bracketed claim language has not yet been taught by the current prior art reference but will be taught by another prior art reference below.)
optimizing the temporal encoder using [semi]-supervised learning to distinguish different classes in the labeled space using labeled data, and augment the latent space representations using [un]labeled training data, to obtain [semi]-supervised representations;
(Cao [fig(s) 2] “ResNet-34” [sec(s) 3] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i, . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2. Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + ∑_{k=1}^{K−1} fk(xi)
, (1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that {fk}K−1k=1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters2 but have independent bias units (Fig. 2). … which is the weighted cross-entropy of K − 1 binary classifiers. For rank prediction (Eq. (1)), the binary labels are obtained via
fk(xi) = 1{P̂(y(k)i = 1) > 0.5}
(5)” [sec(s) 4.2] “To evaluate the performance of CORAL for age estimation from face images, we chose the ResNet-34 architecture [9], which is a modern CNN architecture that achieves good performance on a variety of image classification tasks [8]. For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN.”;)
discarding a linear layer after the temporal encoder and fixing the temporal encoder;
(Cao [fig(s) 2] [sec(s) 4.2] “To evaluate the performance of CORAL for age estimation from face images, we chose the ResNet-34 architecture [9], which is a modern CNN architecture that achieves good performance on a variety of image classification tasks [8]. For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN. To implement a ResNet-34 CNN for ordinal regression using the proposed CORAL method, we replaced the last output layer with the corresponding binary tasks (Fig. 2) and refer to this implementation as CORAL-CNN. Similar to CORAL-CNN, we modified the output layer of ResNet-34 to implement the ordinal regression reference approach described in Niu et al. [16]; we refer to this architecture as OR-CNN”; Note that the last output layer of ResNet-34 is a linear layer.)
training k-1 binary classifiers on top of the [semi]-supervised representations to obtain k-1 binary predictions;
(Cao [sec(s) 3] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i, . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2.”;)
identifying and correcting inconsistent ones of the k-1 binary predictions by matching the inconsistent ones to consistent ones of the k-1 binary predictions; and
(Cao [fig(s) 1] “a rank-inconsistent model (left) versus a rank-consistent model where the probabilities decrease consistently (right).” [fig(s) 2] [fig(s) 3] “OR-CNN”, “CORAL-CNN” [sec(s) 1] “This inconsistency problem among the predictions of individual binary classifiers is illustrated in Fig. 1. We propose a new method and theorem for guaranteed classifier consistency that can easily be implemented in various neural network architectures.” [sec(s) 2] “Niu et al. [16] acknowledged the classifier inconsistency as not being ideal and also noted that ensuring the K − 1 binary classifiers are consistent would increase the training complexity substantially [16]. The CORAL method proposed in this paper addresses both these issues with a theoretical guarantee for classifier consistency and without increasing the training complexity.” [sec(s) 3.2] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i , . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2. Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + ∑_{k=1}^{K−1} fk(xi)
(1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that {fk}K−1k=1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters2 but have independent bias units (Fig. 2).”;)
aggregating the k-1 binary predictions to obtain an ordinal prediction.
(Cao [sec(s) 3] “Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + ∑_{k=1}^{K−1} fk(xi)
, (1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that {fk}K−1k=1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters2 but have independent bias units (Fig. 2). … which is the weighted cross-entropy of K − 1 binary classifiers. For rank prediction (Eq. (1)), the binary labels are obtained via
fk(xi) = 1{P̂(y(k)i = 1) > 0.5}
(5)”;)
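For illustration only (not part of the cited references), the extended-binary-label scheme and rank aggregation quoted above from Cao can be sketched as follows; all function names are hypothetical:

```python
# Sketch of the scheme quoted above: a rank label y in {1, ..., K} is
# extended into K-1 binary labels y_k = 1{y > r_k}, and a predicted rank
# index is recovered by aggregating K-1 binary predictions via
# q = 1 + sum_k f_k(x). Names are illustrative only.

def extend_rank_to_binary_labels(y, K):
    """Extend rank y (1..K) into K-1 binary labels y_k = 1{y > k}."""
    return [1 if y > k else 0 for k in range(1, K)]

def aggregate_binary_predictions(preds):
    """Aggregate K-1 binary predictions into a rank index q = 1 + sum f_k."""
    return 1 + sum(preds)

K = 5
assert extend_rank_to_binary_labels(3, K) == [1, 1, 0, 0]
assert aggregate_binary_predictions([1, 1, 0, 0]) == 3
```

Under this sketch, a rank-consistent prediction vector is a run of ones followed by zeros, which is what makes the aggregation well defined.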
However, Cao does not appear to explicitly teach:
optimizing the temporal encoder using [semi]-supervised learning to distinguish different classes in the labeled space using labeled data, and augment the latent space representations using [un]labeled training data, to obtain [semi]-supervised representations;
training k-1 binary classifiers on top of the [semi]-supervised representations to obtain k-1 binary predictions;
(Note: Hereinafter, if a limitation has one or more bold underlines, the underlined claim language is taught by the current prior art reference, while the non-underlined claim language has already been taught by one or more previous prior art references.)
Kim teaches
optimizing the temporal encoder using semi-supervised learning to distinguish different classes in the labeled space using labeled data, and augment the latent space representations using unlabeled training data, to obtain semi-supervised representations;
(Kim [fig(s) 6] “Labeled input”, “Unlabeled input” and “The architecture of the proposed FM-Net in the semi-supervised domain adaptation setting.” [sec(s) IV] “Let x be an instance and yx ∈ C be its class. For COR, binary classifiers, f0, f1, . . . , fK/2−1, are used. Each binary classifier fn is defined as
[Eq. (2) image: definition of the binary classifier fn, not reproduced]
(2) where (n)K denotes the modulo operator returning the remainder after the division of n by K. … Note that, in the linear ordinal regression [42], the classes in a line segment is divided into two parts. Therefore, for K-way classification, K −1 binary classifiers are required. In contrast, in the proposed COR, a circle is halved into two semicircles, as done in [45]. Consequently, only K/2 binary classifiers are needed.” [sec(s) V] “When a source domain for training and a target domain for the FM inference are different, FM-Net may fail to estimate the FMs of test instances in the target domain accurately. In this work, we attempt to improve the generalization performance of FM-Net by adapting it to a new target domain in a semi-supervised manner. More specifically, FM-Net is trained in the semi-supervised domain adaptation setting, in which a sufficient number of labeled data are available in the source domain, a limited number of labeled data are in the target domain, and a large number of unlabeled data are in the target domain”;)
training k-1 binary classifiers on top of the semi-supervised representations to obtain k-1 binary predictions;
(Kim [fig(s) 6] “Labeled input”, “Unlabeled input” and “The architecture of the proposed FM-Net in the semi-supervised domain adaptation setting.” [sec(s) IV] “Let x be an instance and yx ∈ C be its class. For COR, binary classifiers, f0, f1, . . . , fK/2−1, are used. Each binary classifier fn is defined as
[Eq. (2) image: definition of the binary classifier fn, not reproduced]
(2) where (n)K denotes the modulo operator returning the remainder after the division of n by K. … Note that, in the linear ordinal regression [42], the classes in a line segment is divided into two parts. Therefore, for K-way classification, K −1 binary classifiers are required. In contrast, in the proposed COR, a circle is halved into two semicircles, as done in [45]. Consequently, only K/2 binary classifiers are needed.” [sec(s) V] “When a source domain for training and a target domain for the FM inference are different, FM-Net may fail to estimate the FMs of test instances in the target domain accurately. In this work, we attempt to improve the generalization performance of FM-Net by adapting it to a new target domain in a semi-supervised manner. More specifically, FM-Net is trained in the semi-supervised domain adaptation setting, in which a sufficient number of labeled data are available in the source domain, a limited number of labeled data are in the target domain, and a large number of unlabeled data are in the target domain”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cao with the semi-supervised representations of Kim.
One of ordinary skill in the art would have been motivated to combine the references in order to yield remarkable future motion estimation results despite variations in camera viewpoints and capturing environments, and to improve future motion estimation accuracies.
(Kim [sec(s) I] “Experimental results demonstrate that the proposed FM-Net yields remarkable FM estimation results for pedestrian, car, and animal instances despite variations in camera viewpoints and capturing environments, when a sufficient number of labeled training data are provided. Moreover, it is demonstrated that the proposed semi-supervised domain adaptation learning improves FM estimation accuracies, when only a limited number of labeled data for a new domain are available.”)
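For illustration only (not the cited implementations), the semi-supervised setting quoted above from Kim, in which labeled and unlabeled data both contribute to training, can be sketched as a combined objective; all names and the weighting are hypothetical:

```python
# Hypothetical sketch of a semi-supervised objective: a supervised term on
# labeled data plus a weighted unsupervised term (e.g., a consistency
# penalty) on unlabeled data. Illustrative only.

def semi_supervised_loss(labeled_losses, unlabeled_losses, weight=0.5):
    """Combine mean supervised loss with a weighted mean unsupervised loss."""
    sup = sum(labeled_losses) / len(labeled_losses)
    unsup = (sum(unlabeled_losses) / len(unlabeled_losses)
             if unlabeled_losses else 0.0)
    return sup + weight * unsup

assert semi_supervised_loss([1.0, 3.0], [2.0], weight=0.5) == 3.0
```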
Regarding claim 3
The combination of Cao, Kim teaches claim 1.
Cao further teaches
wherein the linear layer is a classifier on the latent space representations.
(Cao [fig(s) 2] [sec(s) 4.1] “The MORPH-2 dataset [24], containing 55,608 face images, was downloaded from https://www.faceaginggroup.com/morph/ and preprocessed by locating the average eye-position in the respective dataset using facial landmark detection [26] and then aligning each image in the dataset to the average eye position using EyepadAlign function in MLxtend v0.14 [22]. The faces were then re-aligned such that the tip of the nose was located in the center of each image. The age labels used in this study were in the range of 16–70 years” [sec(s) 4.2] “To evaluate the performance of CORAL for age estimation from face images, we chose the ResNet-34 architecture [9], which is a modern CNN architecture that achieves good performance on a variety of image classification tasks [8]. For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN. To implement a ResNet-34 CNN for ordinal regression using the proposed CORAL method, we replaced the last output layer with the corresponding binary tasks (Fig. 2) and refer to this implementation as CORAL-CNN. Similar to CORAL-CNN, we modified the output layer of ResNet-34 to implement the ordinal regression reference approach described in Niu et al. [16]; we refer to this architecture as OR-CNN”; e.g., feature representations based on ResNet-34 read(s) on “latent space representations”. Note that the last output layer of ResNet-34 is a linear layer.)
Regarding claim 4
The combination of Cao, Kim teaches claim 1.
Cao further teaches
wherein the temporal encoder and the linear layer are both trainable.
(Cao [sec(s) 3] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i, . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2. Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + ∑_{k=1}^{K−1} fk(xi)
, (1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that {fk}K−1k=1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters2 but have independent bias units (Fig. 2).” [sec(s) 4.2] “To evaluate the performance of CORAL for age estimation from face images, we chose the ResNet-34 architecture [9], which is a modern CNN architecture that achieves good performance on a variety of image classification tasks [8]. For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN. To implement a ResNet-34 CNN for ordinal regression using the proposed CORAL method, we replaced the last output layer with the corresponding binary tasks (Fig. 2) and refer to this implementation as CORAL-CNN. Similar to CORAL-CNN, we modified the output layer of ResNet-34 to implement the ordinal regression reference approach described in Niu et al. [16]; we refer to this architecture as OR-CNN”; Note that the last output layer of ResNet-34 is a linear layer.)
Regarding claim 5
The combination of Cao, Kim teaches claim 1.
Cao further teaches
further comprising training the k-1 classifiers using a nominal loss.
(Cao [sec(s) 3] “For model training, we minimize the loss function
L(W, b) = −∑_{i=1}^{N} ∑_{k=1}^{K−1} λ(k)[log(σ(g(xi, W) + bk))y(k)i + log(1 − σ(g(xi, W) + bk))(1 − y(k)i)]
(4) which is the weighted cross-entropy of K − 1 binary classifiers. For rank prediction (Eq. (1)), the binary labels are obtained via
fk(xi) = 1{P̂(y(k)i = 1) > 0.5}
. (5) In Eq. (4), λ(k) denotes the weight of the loss associated with the kth classifier (assuming λ(k) > 0). In the remainder of the paper, we refer to λ(k) as the importance parameter for task k. Some tasks may be less robust or harder to optimize, which can be considered by choosing a non-uniform task weighting scheme.” [sec(s) 4] “For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN.” [sec(s) 5] “We conducted a series of experiments on three independent face image datasets for age estimation (Section 4.1) to compare the proposed CORAL method (CORAL-CNN) with the ordinal regression approach proposed by Niu et al. [16] (OR-CNN). All implementations were based on the ResNet-34 architecture, as described in Section 4.2. We include the standard ResNet-34 classification network with cross-entropy loss (CE-CNN) as a performance baseline.”;)
Regarding claim 6
The combination of Cao, Kim teaches claim 1.
Cao further teaches
wherein the nominal loss is selected from the group consisting of a softmax cross-entropy loss and a binary cross-entropy loss.
(Cao [sec(s) 3] “For model training, we minimize the loss function
L(W, b) = −∑_{i=1}^{N} ∑_{k=1}^{K−1} λ(k)[log(σ(g(xi, W) + bk))y(k)i + log(1 − σ(g(xi, W) + bk))(1 − y(k)i)]
(4) which is the weighted cross-entropy of K − 1 binary classifiers. For rank prediction (Eq. (1)), the binary labels are obtained via
fk(xi) = 1{P̂(y(k)i = 1) > 0.5}
. (5) In Eq. (4), λ(k) denotes the weight of the loss associated with the kth classifier (assuming λ(k) > 0). In the remainder of the paper, we refer to λ(k) as the importance parameter for task k. Some tasks may be less robust or harder to optimize, which can be considered by choosing a non-uniform task weighting scheme.” [sec(s) 4] “For the remainder of this paper, we refer to the original ResNet-34 CNN with standard cross-entropy loss as CE-CNN.” [sec(s) 5] “We conducted a series of experiments on three independent face image datasets for age estimation (Section 4.1) to compare the proposed CORAL method (CORAL-CNN) with the ordinal regression approach proposed by Niu et al. [16] (OR-CNN). All implementations were based on the ResNet-34 architecture, as described in Section 4.2. We include the standard ResNet-34 classification network with cross-entropy loss (CE-CNN) as a performance baseline.”; For more details on a binary cross-entropy loss, please refer to Ho et al. (The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling) - [sec(s) III])
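For illustration only (not the cited implementations), a weighted binary cross-entropy over K − 1 binary tasks, in the spirit of the loss quoted above, can be sketched as follows; the function name and inputs are hypothetical:

```python
import math

# Hypothetical sketch of a weighted binary cross-entropy over K-1 binary
# tasks: each task k contributes lambda_k * BCE(p_k, y_k), where p_k is the
# predicted probability and y_k the binary label for task k.

def weighted_binary_cross_entropy(probs, labels, weights):
    """probs, labels, weights: per-task lists of length K-1."""
    loss = 0.0
    for p, y, lam in zip(probs, labels, weights):
        loss -= lam * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return loss
```

With uniform weights this reduces to the standard (unweighted) binary cross-entropy summed over tasks.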
Regarding claim 7
The combination of Cao, Kim teaches claim 1.
Cao further teaches
wherein said identifying step searches for sequences of ones having an unexpected zero therein and sequences of zeros having an unexpected one therein.
(Cao [fig(s) 1] “a rank-inconsistent model (left) versus a rank-consistent model where the probabilities decrease consistently (right).” [fig(s) 2] [sec(s) 1] “This inconsistency problem among the predictions of individual binary classifiers is illustrated in Fig. 1. We propose a new method and theorem for guaranteed classifier consistency that can easily be implemented in various neural network architectures.” [sec(s) 2] “Niu et al. [16] acknowledged the classifier inconsistency as not being ideal and also noted that ensuring the K − 1 binary classifiers are consistent would increase the training complexity substantially [16]. The CORAL method proposed in this paper addresses both these issues with a theoretical guarantee for classifier consistency and without increasing the training complexity.” [sec(s) 3.2] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i , . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2. Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + ∑_{k=1}^{K−1} fk(xi)
(1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that {fk}K−1k=1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters2 but have independent bias units (Fig. 2).”;)
Regarding claim 8
The combination of Cao, Kim teaches claim 7.
Cao further teaches
wherein said correcting step removes unexpected zeros and unexpected ones from the sequence of ones and the sequences of zeros, respectively.
(Cao [fig(s) 1] “a rank-inconsistent model (left) versus a rank-consistent model where the probabilities decrease consistently (right).” [fig(s) 2] [fig(s) 3] “OR-CNN”, “CORAL-CNN” [sec(s) 1] “This inconsistency problem among the predictions of individual binary classifiers is illustrated in Fig. 1. We propose a new method and theorem for guaranteed classifier consistency that can easily be implemented in various neural network architectures.” [sec(s) 2] “Niu et al. [16] acknowledged the classifier inconsistency as not being ideal and also noted that ensuring the K − 1 binary classifiers are consistent would increase the training complexity substantially [16]. The CORAL method proposed in this paper addresses both these issues with a theoretical guarantee for classifier consistency and without increasing the training complexity.” [sec(s) 3.2] “Given a training dataset D = {xi, yi}Ni=1, a rank yi is first extended into K − 1 binary labels y(1)i , . . ., y(K−1)i such that y(k)i ∈ {0, 1} indicates whether yi exceeds rank rk, for instance, y(k)i = 1{yi > rk}. The indicator function 1{·} is 1 if the inner condition is true and 0 otherwise. Using the extended binary labels during model training, we train a single CNN with K − 1 binary classifiers in the output layer, which is illustrated in Fig. 2. Based on the binary task responses, the predicted rank label for an input xi is obtained via h(xi) = rq. The rank index1 q is given by
q = 1 + Σ fk(xi), summed over k = 1, . . ., K − 1    [media_image3.png]
(1) where fk(xi) ∈ {0, 1} is the prediction of the kth binary classifier in the output layer. We require that f1, . . ., fK−1 reflect the ordinal information and are rank-monotonic, f1(xi) ≥ f2(xi) ≥ . . . ≥ fK−1(xi), which guarantees consistent predictions. To achieve rank-monotonicity and guarantee binary classifier consistency (Theorem 1), the K − 1 binary tasks share the same weight parameters but have independent bias units (Fig. 2).”;)
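For the record, the label-extension and rank-index rule quoted from Cao above may be sketched as follows. This is an illustrative sketch only; the function names extend_labels and predict_rank, and the choice of thresholds rk = k, are assumptions and do not appear in the reference, which trains K − 1 shared-weight binary classifiers with independent bias units.

```python
# Illustrative sketch of Cao's rank-label extension and rank-index rule.
# Function names and thresholds (r_k = k) are hypothetical.

def extend_labels(y, K):
    """Extend a rank y in {1, ..., K} into K-1 binary labels
    y_k = 1{y > r_k}, with thresholds r_k = k."""
    return [1 if y > k else 0 for k in range(1, K)]

def predict_rank(f):
    """Rank index q = 1 + sum of the rank-monotonic binary responses
    (equation (1)); the predicted rank label is h(x) = r_q."""
    return 1 + sum(f)

labels = extend_labels(3, 5)   # rank 3 of K=5 ranks -> [1, 1, 0, 0]
q = predict_rank(labels)       # q = 3, recovering the original rank
```

Note that a rank-monotonic binary response vector (ones followed by zeros) always maps back to a unique rank under this rule, which is the consistency property Cao's Theorem 1 guarantees.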
Regarding claim 10
The claim is a computer program product claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 12
The claim is a computer program product claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 13
The claim is a computer program product claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 14
The claim is a computer program product claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 15
The claim is a computer program product claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 16
The claim is a computer program product claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 17
The claim is a computer program product claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 19
The claim is a system claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Claim(s) 2, 11, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao et al. (Rank consistent ordinal regression for neural networks with application to age estimation) in view of Kim et al. (Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression and Semi-Supervised Domain Adaptation) further in view of Liu et al. (Deep Learning in Latent Space for Video Prediction and Compression).
Regarding claim 2
The combination of Cao, Kim teaches claim 1.
However, the combination of Cao, Kim does not appear to explicitly teach:
wherein the temporal encoder comprises a Long Short-Term Memory (LSTM) encoding the time series data from a high dimensional space above x dimensions into a low dimensional space below y dimensions, where x and y are integers, and x > y.
Liu teaches
wherein the temporal encoder comprises a Long Short-Term Memory (LSTM) encoding the time series data from a high dimensional space above x dimensions into a low dimensional space below y dimensions, where x and y are integers, and x > y.
(Liu [sec(s) Abs] “We propose a novel DNN based framework that predicts and compresses video sequences in the latent vector space. The proposed method first learns the efficient lower-dimensional latent space representation of each video frame and then performs inter-frame prediction in that latent domain” [sec(s) 1] “• Learning based video prediction in latent domain: We use an convolutional long short-term memory (ConvLSTM) network to predict a compact latent representation of the next frame substituting for motion compensation in conventional codecs. This approach only stores the differences between the predicted and actual representation in low dimensional latent space, resulting in entropy reduction of the residuals. The predictor is adversarially trained against a discriminator which significantly enhances the quality of prediction to bring down the entropy (i.e., density and magnitude of nonzero elements) of residuals.” [sec(s) 3] “To further reduce the video code size, we encode the residual with quantization and entropy coding. A desired compression rate is controlled by the size of latent dimension in the image compression stage as well as the number of quantization levels used in residual encoding.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cao, Kim with the low-dimensional latent-space LSTM encoding of Liu.
One of ordinary skill in the art would have been motivated to combine in order to achieve significant quality improvements and provide reliable and accurate prediction for video compression.
(Liu [sec(s) 4] “In the video compression task, we alleviate this issue and achieve significant quality improvements by saving and transmitting key-frames and residuals. For the majority of normal video frame sequences that have strong temporal correlations, the proposed prediction method provides reliable and accurate prediction for compression.”)
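As background for this limitation, an LSTM reducing a high-dimensional time series to a low-dimensional hidden state may be sketched as follows. This is a minimal numpy illustration only; the dimensions (64 and 8), sequence length, and all variable names are assumptions and are not taken from Liu.

```python
# Minimal sketch: an LSTM cell encoding a high-dimensional (64-d) time
# series into a low-dimensional (8-d) hidden state. Dimensions and
# initialization are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # Standard LSTM cell: gates computed from current input x and
    # previous hidden state h; z stacks the four gate pre-activations.
    n = h.shape[0]
    z = W @ x + U @ h + b                          # shape (4n,)
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c = f * c + i * g                              # cell-state update
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(0)
x_dim, y_dim = 64, 8                               # x > y: high -> low dim
W = rng.normal(0, 0.1, (4 * y_dim, x_dim))
U = rng.normal(0, 0.1, (4 * y_dim, y_dim))
b = np.zeros(4 * y_dim)

h, c = np.zeros(y_dim), np.zeros(y_dim)
for x in rng.normal(size=(20, x_dim)):             # a 20-step, 64-d series
    h, c = lstm_step(x, h, c, W, U, b)
# h is now an 8-dimensional encoding of the full sequence
```

The final hidden state h plays the role of the low-dimensional latent representation on which, per Liu, prediction and residual coding are performed.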
Regarding claim 11
The claim is a computer program product claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 20
The claim is a system claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Claim(s) 9, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Cao et al. (Rank consistent ordinal regression for neural networks with application to age estimation) in view of Kim et al. (Instance-Level Future Motion Estimation in a Single Image Based on Ordinal Regression and Semi-Supervised Domain Adaptation) further in view of Zhang et al. (Non-Uniform Discretization-Based Ordinal Regression for Monocular Depth Estimation of an Indoor Drone).
Regarding claim 9
The combination of Cao, Kim teaches claim 1.
However, the combination of Cao, Kim does not appear to explicitly teach:
further comprising automatically controlling a vehicle system for collision avoidance responsive to the ordinal prediction predicting an impending collision.
Zhang teaches
further comprising automatically controlling a vehicle system for collision avoidance responsive to the ordinal prediction predicting an impending collision.
(Zhang [table(s) 1] “fps 75” [sec(s) 1] “In this paper, NSID is proposed, and is shown to yield better results than those of non-uniform discretization (NUD). In important decision areas, the performance of the ordinal regression algorithm proposed in this paper reaches the performance of the state-of-the-art two-stream regression algorithm [14], and the inference time of NSIDORA is 3.4 times faster than that of the two-stream regression algorithm, which is of great significance for autonomous drone navigation and obstacle avoidance with high security requirements.” [sec(s) 2] “This algorithm is currently the most advanced indoor navigation algorithm. The latter two perception methods obtain better navigation performance in actual indoor environments. However, the prediction results of typical classification algorithms are rough, which is not conducive to fine control, whereas the state-of-art regression model treats the distance equally, resulting in a decrease in the performance of close-range prediction. In order to avoid these problems, based on the strong orderly correlation of the distance value, we turned the autonomous navigation and obstacle avoidance of the UAV into an ordinal regression problem.” [sec(s) 4] “The comparison of the overall performance of the four algorithms is shown in Table 1. It can be seen from Table 1 that the classification performance of NSIDORA in the non-decision area is similar to the NUD-based algorithm. The RMSE (the lower the better) of NSIDORA in the decision area is 33.5% lower than that of the NUD-based ordinal regression, and is 6% higher than that of the state-of-the-art two-stream regression algorithm; however, it is better than that of two-stream regression in the first ten categories, shown in Figure 7 and Table 2. 
Furthermore, on a hardware device with an Intel Xeon E5-2640v4 32GB RAM, the inference fps of NSIDORA, another important indicator of drone indoor depth estimation [35], can reach 75, which is 3.4 times faster than that of the two-stream regression algorithm.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Cao, Kim with the automatic vehicle system of Zhang.
One of ordinary skill in the art would have been motivated to combine in order to achieve an inference speed 3.4 times faster than that of the conventional method, so that the vehicle may avoid obstacles.
(Zhang [sec(s) 5] “The experimental results show that the RMSE of NSIDORA in the decision area is 33.5% lower than that of the NUD-based ordinal regression method. Although the RMSE is higher than that of the state-of-the-art two-stream regression algorithm, the inference speed of NSIDORA is 3.4 times faster than that of two-stream ordinal regression method. Furthermore, the RMSE of our distance decoder in the decision area is 20.7% lower than that of the distance decoder presented in [30].”)
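The claimed control step may be illustrated schematically as follows. This is a hypothetical sketch only; the decision-area bin threshold and the command strings are assumptions and do not appear in Zhang.

```python
# Hypothetical sketch: controlling a vehicle from an ordinal prediction.
# Lower ordinal distance bins mean a closer obstacle; bins inside the
# "decision area" trigger avoidance. Threshold and commands are
# illustrative, not from the reference.

def avoidance_command(predicted_bin, decision_area_max_bin=3):
    if predicted_bin <= decision_area_max_bin:   # impending collision
        return "brake_and_steer"
    return "maintain_course"

cmd_near = avoidance_command(2)   # obstacle in decision area
cmd_far = avoidance_command(7)    # clear path ahead
```

The sketch reflects Zhang's rationale that fine-grained ordinal predictions in the close-range decision area are what enable responsive obstacle avoidance.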
Regarding claim 18
The claim is a computer program product claim corresponding to the method claim 9, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Shi et al. (Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities) teaches rank-consistent ordinal regression that does not require a weight-sharing constraint in a neural network’s fully connected output layer.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Fri 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEHWAN KIM/Examiner, Art Unit 2129
12/8/2025