DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit as a continuation of International Application No. PCT/US21/50798, filed 09/17/2021, which claims priority to U.S. Provisional Application No. 62/706,905, filed 09/17/2020, is acknowledged.
Drawings
The drawings were received on 03/10/2023. These drawings are acceptable.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/10/2023 is being considered by the examiner.
Response to Arguments
Applicant's arguments filed 11/07/2025 have been fully considered.
Regarding the rejection of claims under 35 USC 112(a) and 112(b), the rejections made in the previous office action have been withdrawn.
Regarding the rejection of claims under 35 USC 101, the rejection made in the previous office action has been withdrawn. Per the guidance in MPEP 2106.04(d)(1), the applicant’s specification, filed 03/10/2023, discloses:
“…Prior to deployment, pre-trained ML models for the two ML modules may need to be evaluated and compared, whereas untrained models may need to also be trained, verified, and tested. Evaluating and training the ML modules usually requires at least three corresponding ground truth data sets representing the input (e.g., user images), the output (e.g., measurements), and the intermediate input (e.g., keypoints); where the "ground truth" qualifier is used for output data sets, but also for corresponding input-output data sets comprising an input data set and a corresponding ground truth output data set.
Importantly, while corresponding input-output data sets (i.e., user images and measurements) are readily available through scanners, and while corresponding intermediate-output data sets (i.e., keypoints and measurements) are easily generated artificially, obtaining corresponding input and intermediate data sets is difficult.
Annotation ML modules are usually evaluated and trained using manually determined keypoints, where body segmentation, i.e., estimating a sample human's body underneath the clothing, and body annotation, i.e., drawing keypoints or lines for each body feature for the sample human, are both carried out manually by a human annotator…
Such evaluation and training data for the annotation ML is difficult to obtain. Furthermore, even when available, it is difficult to assess for quality and accuracy.
Therefore, it would be an advancement in the state of the art to provide a system and method for estimating the performance of a pre-trained annotation ML module or for training an untrained annotation ML module without access to the intermediate ground truth data set, using only corresponding intermediate-output and input-output data sets. A related method can also be used to evaluate different human annotators or different human or non-human annotation schemes.
There are other applications in which machine learning modules need to be trained where corresponding ground truth data sets are not necessarily available, complete, or reliable.
It is against this background that the present invention was developed.”
This recitation sets forth an improvement in technology, and amended claim 13 limitations appear to reflect the disclosed improvement. Thus, the claims are deemed eligible under 35 USC 101.
Regarding the rejection of claims under 35 USC 102, the remarks are directed to the amended limitations; see the current office action regarding the claim status. The applicant’s remarks are directed to amended limitations, where a substantial amount of the previous claim language has been struck out or removed from the newly filed claim set. The examiner has not previously examined the current claim set and has updated the office action to address the amended claim limitations.
Claim Interpretation
MPEP 2111 notes:
“During patent examination, the pending claims must be "given their broadest reasonable interpretation consistent with the specification." The Federal Circuit’s en banc decision in Phillips v. AWH Corp., 415 F.3d 1303, 1316, 75 USPQ2d 1321, 1329 (Fed. Cir. 2005) expressly recognized that the USPTO employs the "broadest reasonable interpretation" standard:”
And the examiner notes the following per PG Pub No. 20230316046 of the instant application:
Ground truth keypoint annotation
This is not considered a term of art and is interpreted as any manually determined annotation associated with body image data, for example lines or points associated with an image of a body part, where all lines are considered to have a starting point and an end point. See applicant’s remarks, filed 11/07/2025, pgs. 7-8, which also provide citations giving examples of keypoint annotations as keypoints or a heatmap. There is no citation regarding the claimed term “ground truth keypoint annotation,” so it is not considered a term of art in computer science and machine learning.
In 0004: … Accurately estimating clothing size and fit can be based on body part length and body weight measurements. Such an estimation can be performed with machine learning through a multi-stage process having user images as an input and one or more body or body-part measurements as an output. The annotation of user images is often required as an initial stage in this process, where annotation is the generation of annotation keypoints or annotation lines indicating corresponding body feature measurement locations underneath user clothing for one or more identified body features (e.g., height, size of foot, size of arm, size of torso, etc.). Image annotations may be carried out through one or more annotation ML modules that have been trained on each body feature, such as an annotation deep neural network (DNN)… [0008] Annotation ML modules are usually evaluated and trained using manually determined keypoints, where body segmentation, i.e., estimating a sample human's body underneath the clothing, and body annotation, i.e., drawing keypoints or lines for each body feature for the sample human, are both carried out manually by a human annotator. The annotation ML modules are then trained on the manually annotated images collected and annotated for thousands of sample humans.
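For illustration only, the following is a minimal Python sketch of one way a keypoint annotation under the interpretation above (a point, or a line with a starting point and an end point, tied to a body feature) might be represented; the type and field names are hypothetical and do not appear in the record:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class KeypointAnnotation:
    """One manually determined annotation for a body feature in an image.

    Under the interpretation above, an annotation is either a single
    point or a line, where every line has a starting point and an end
    point (given here in image pixel coordinates).
    """
    body_feature: str                          # e.g., "neck", "left_arm"
    start: Tuple[float, float]                 # (x, y) of the point, or line start
    end: Optional[Tuple[float, float]] = None  # line end; None for a point annotation
```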
body part measurements:
In 0061: Accurately estimating various body-related physical quantities such as body measurements (e.g., height), body part measurements (e.g., arm or foot dimensions), …
Can be interpreted as any measurement related to a physical body.
Ground truth:
0066: Such ground truth evaluation and training data for the annotation ML 214 is time-consuming, costly, and hard to obtain as it requires the manual labor of multiple annotators. Furthermore, annotation accuracy and clarity need to be assessed ahead of any use of the generated corresponding (A, B) data sets for the evaluation or training of annotation ML modules 214. The variation in accuracy and quality emanates from the differences in manual annotator performance, but also from the performance variations among multiple annotation mechanisms used by the annotators (e.g., computer-aided manual annotation, scanned physical image annotation, etc.).
Can be interpreted as any input from an annotator that includes an annotation or label, captured via scanned annotations, computer-aided manual annotation, or any other annotation mechanism.
ground-truth keypoint set
In 0077: …The input images 506 and output body part measurements 508 are hence global (or system) ground-truth input-output data sets spanning the concatenated DNN and regressor (see FIG. 2). A ground-truth keypoint set corresponding to the input body part images 506 (i.e., an intermediate data set) is either unavailable, difficult to obtain, or difficult to assess for quality (i.e., partially or fully unreliable).
Can be interpreted as any data sets that include images and body part measurements.
Claimed module(s) (e.g. annotation-to-measurement machine learning module, a fixed annotation-to-measurement machine learning module, a photo-to-annotation machine learning module):
In 0183: … In general, the method executed to implement the embodiments of the invention, may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer program(s)” or “computer code(s).”
Can be interpreted as any computer instructions used to perform the claimed method(s)/operation(s).
Distance Metric / Batch Distance Metric
In 0024: In yet another embodiment, the first distance metric is a batch distance measure selected from the group consisting of a mean absolute error (MAE), a mean squared error (MSE), a mean squared deviation (MSD), and a mean squared prediction error (MSPE).
Can be interpreted as any batch distance measure, any distance measure, or any mean error measure in addition to the ones provided in the list above.
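For illustration only, a minimal Python sketch of the batch distance measures listed in [0024], assuming equal-length vectors of predicted and ground-truth measurements; the function name and signature are hypothetical:

```python
import numpy as np

def batch_distance(pred: np.ndarray, truth: np.ndarray, kind: str = "mse") -> float:
    """Compute a batch distance measure between predicted and
    ground-truth measurement vectors of equal length."""
    err = pred - truth
    if kind == "mae":  # mean absolute error
        return float(np.mean(np.abs(err)))
    # MSE, MSD, and MSPE are each means of squared errors, deviations,
    # or prediction errors; over the same (prediction, ground-truth)
    # pairs they reduce to the same arithmetic below.
    return float(np.mean(err ** 2))
```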
landmark indicators:
In [0103]: In various embodiments of the present invention, the first ML module (MAB) 104, 214, 634, 644, 654 has a different type of output than the second ML module (MBC) 108, 218, 638, 648, 658. In the example of FIG. 2 and the examples below, the output of the first ML module (MAB) comprises keypoint annotations of one or more body parts under clothing, while the output of the second ML module (MBC) comprises measurements of one or more body parts. Keypoints (i.e., 2D landmark indicators) and measurements (i.e., single real values or vectors of real values) represent distinct types of output. In addition to body-part keypoints and body-part measurements,... Intermediate ML variables without real-world significance, such as intermediate DNN tensors (e.g., feature maps) that are commonly generated through freezing one or more neural network layers during training, are hence excluded.
Can be interpreted as any indicator associated with one or more body parts.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 13 and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Koh et al. (US Pat. No. 10,321,728, hereinafter ‘Koh’) in view of Akbas (US 11074711, hereinafter ‘Bas’).
Regarding independent claim 13 limitations, Koh teaches: a non-transitory storage medium storing program code for generating a predicted body part measurement from a user image by training a photo-to-annotation machine learning (ML) model in a multi-stage setup: the program code executable by a hardware processor, the program code when executed by the processor, causing the processor to: (As depicted in Fig. 1A and Fig. 2 and in 2:24-30: …The present invention relates to methods and systems for extracting full body measurements using 2D user images, taken for example from a mobile device camera. More specifically, in various embodiments, the present invention is a computer-implemented method for generating body size measurements of a human, the computer-implemented method executable by a hardware processor [computer-implemented method for machine learning training for body measurements prediction executable by a hardware processor, comprising, the program code executable by a hardware processor, the program code when executed by the processor, causing the processor to],..; 33:7-34:60: …The cloud computing environment may include one or more cloud computing nodes with which local computing devices used by cloud consumers, such as, for example … The present invention may be implemented using server-based hardware and software. FIG. 18 shows an illustrative hardware architecture diagram 1800 of a server for implementing one embodiment of the present invention…The hardware operates under the control of an operating system 1870, and executes various computer software applications 1860, components, programs, codes, libraries, objects, modules, etc. indicated collectively by reference numerals to perform the methods, processes, and techniques described above… In general, the method executed to implement the embodiments of the invention, may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as "computer program(s)" or "computer code(s)." [the program code executable by a hardware processor, the program code when executed by the processor, causing the processor to:]…)
train an annotation-to-measurement machine learning model using an annotation-to-measurement training dataset comprising a set of keypoint annotations of a first set of one or more body parts and a corresponding set of measurements of the first set of one or more body parts, to generate a trained annotation-to-measurement machine learning model, wherein the trained annotation-to-measurement machine learning model predicts measurements of body parts from keypoint annotations of body parts, wherein each given keypoint annotation comprises a landmark indicator indicating a given body part, and wherein the corresponding set of measurements of the first set of one or more body parts comprises a dimension for each part of a physical body represented within the first set of one or more body parts; (As depicted in Fig. 2, and in 22:22-34: At step 206, actual human measurements for each body feature [wherein each given keypoint annotation comprises a landmark indicator indicating a given body part, and wherein the corresponding set of measurements of the first set of one or more body parts comprises a dimension for each part of a physical body represented within the first set of one or more body parts] (e.g., determined by a tailor, or 1D measurements taken from 3D body scans) [the first set of one or more body parts comprises a dimension for each part of a physical body represented within the first set of one or more body parts] may be received to serve as ground-truth data. The actual human measurements may be used as validation data and used for training the algorithms used by the system. For example, the actual human measurements may be used in minimizing an error function or loss function (mean squared error, likelihood loss, log-loss, hinge loss, etc.) associated with the machine learning algorithms. In one embodiment, the annotation lines [a set of keypoint annotations of a first set of one or more body parts] from step 205 and the ground-truth data from step 206 [using an annotation-to-measurement training dataset comprising a set of keypoint annotations of a first set of one or more body parts and a corresponding set of measurements of the first set of one or more body parts, to generate a trained annotation-to-measurement machine learning model,] are utilized to train the annotation DLN [train an annotation-to-measurement machine learning model using an annotation-to-measurement training dataset comprising a set of keypoint annotations of a first set of one or more body parts and a corresponding set of measurements of the first set of one or more body parts] used in step 107 and the sizing ML step 108 of FIG. 1A)
fix at least one training parameter of the trained annotation-to-measurement machine learning model to generate a fixed-parameter trained annotation-to-measurement machine learning model; (As depicted in Fig. 2:212, the trained models are fixed [fix at least one training parameter of the trained annotation-to-measurement machine learning model to generate a fixed-parameter trained annotation-to-measurement machine learning model] and are output after they are trained, and in 24:1-8: At step 212, the trained segmentation DLN, annotation DLN, and sizing ML module to be used in FIGS. 1A, 1B, and 1C may be output. In particular, the segmentation DLN trained in step 208 is output for use in step 106 in FIG. 1A. Similarly, the one or more annotation DLNs trained in step 210 are output for use in step 107 in FIG. 1A. Finally, the sizing ML module trained in step 210 is output for use in step 108 in FIG. 1A.)
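For illustration only, a minimal sketch of how fixing the training parameters of a trained model is commonly implemented (here with PyTorch-style parameter freezing); this is a generic sketch of the technique, not asserted to be Koh's implementation:

```python
import torch

def fix_parameters(model: torch.nn.Module) -> torch.nn.Module:
    """Freeze every trainable parameter so the trained
    annotation-to-measurement model receives no further updates."""
    for p in model.parameters():
        p.requires_grad = False  # exclude from gradient updates
    model.eval()  # also fix dropout/normalization behavior
    return model
```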
and train the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset comprising a set of training photos of clothed individuals, the training photos of clothed individuals showing one or more body parts under clothing, and a corresponding set of training ground truth measurements of the one or more body parts under clothing, (In 13:8-36: In one embodiment, the segmentation DLN algorithm may be trained with segmentation training data [train the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset comprising a set of training photos of clothed individuals], as described in relation to FIG. 2 below. In some embodiments, the segmentation training data may include thousands of sample humans with manually-segmented body features. In some embodiments, the training data includes medical data, for example from CAT scans, MRI scans, and so forth. In some embodiments, the training data includes data from previous tailor or 3D body measurements that include 3D body scans from 3D body scanners and “ground truth” data [the training photos of clothed individuals showing one or more body parts under clothing, and a corresponding set of training ground truth measurements of the one or more body parts under clothing]. In some embodiments, the 3D body scans may be used to extract approximate front and/or side view photos, in cases where the front and side view photos are not explicitly available. In some embodiments, the ground truth data comprises human tailor-measured data; while in other embodiments, the ground truth data comprises automatically extracted 1D body size measurements from the 3D body scans [a corresponding set of training ground truth measurements of the one or more body parts under clothing] … In yet other embodiments, an organization utilizing the present invention may capture their own front and side photos, along with suitable ground truth data using a human tailor, for training the segmentation DLN. Examiner notes that the segmentation DLN is trained on separate training data that does not impact the training of the other fixed models in their trained state, as depicted in Fig. 2)
wherein each keypoint annotation output of the photo-to-annotation machine learning model is a keypoint annotation input of the annotation-to-measurement machine learning model, (As depicted in Fig. 1A and in 13:37-43: At step 107, an annotation line for each body part that was extracted at step 106 [wherein each keypoint annotation output of the photo-to-annotation machine learning model…] may be drawn using one or more additional deep learning networks (DLNs), for example an annotation DLN […is a keypoint annotation input of the annotation-to-measurement machine learning model]. In one embodiment, there is a separate body feature annotation DLN for each body part. In other embodiments, there is one body feature annotation DLN for the entire body…)
wherein the photo-to-annotation machine learning model is a deep neural network(DNN), (in 12:40-47: At step 106, a body feature, such as a body part of the human (e.g., a neck, an arm, a leg, etc.) may be extracted from the image using a first deep learning network (DLN) known as a segmentation DLN [wherein the photo-to-annotation machine learning model is a deep neural network(DNN)]. In one embodiment, “deep learning” may refer to a class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation modeled after neural networks [wherein the photo-to-annotation machine learning model is a deep neural network(DNN)]…; And in 12:53-67: Before performing this segmentation step on data from a real user, the system may have been trained first, for example, on sample photos of humans posing in different environments in different clothing, for example, with hands at 45 degrees, sometimes known as the “A-pose”, as described in relation to FIG. 2. In some embodiments, any suitable deep learning architecture may be used, such as deep neural networks [wherein the photo-to-annotation machine learning model is a deep neural network(DNN)], deep belief networks, and/or recurrent neural networks. In another embodiment, the deep learning algorithms may learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners….)
wherein the photo-to-annotation machine learning model generates an intermediate set of predicted keypoint annotations of the one or more body parts under clothing based on the set of training photos of clothed individuals, (As depicted in Fig. 1A Fig. 2 and Fig. 10; And in 12:40-43: At step 106, a body feature, such as a body part of the human (e.g., a neck, an arm, a leg, etc.) may be extracted from the image [based on the set of training photos of clothed individuals] using a first deep learning network (DLN) known as a segmentation DLN [wherein the photo-to-annotation machine learning model generates an intermediate set of predicted keypoint annotations of the one or more body parts under clothing based on the set of training photos of clothed individuals]…)
wherein the fixed-parameter trained annotation-to-measurement machine learning model generates an output set of predicted measurements of the one or more body parts under clothing based on the intermediate set of predicted keypoint annotations of the one or more body parts under clothing, (As depicted in Fig. 1A, in 13:37-43: At step 107, an annotation line for each body part that was extracted at step 106 [based on the intermediate set of predicted keypoint annotations of the one or more body parts under clothing] may be drawn using one or more additional deep learning networks (DLNs), for example an annotation DLN [wherein the fixed-parameter trained annotation-to-measurement machine learning model generates an output set of predicted measurements of the one or more body parts under clothing]. In one embodiment, there is a separate body feature annotation DLN for each body part. In other embodiments, there is one body feature annotation DLN for the entire body…; And in 2:19-23: FIG. 2 shows a diagram 200 of an exemplary flow diagram for training the segmentation DLN, the annotation DLN, and the sizing ML, which are utilized in generating body measurements [wherein the fixed-parameter trained annotation-to-measurement machine learning model generates an output set of predicted measurements of the one or more body parts under clothing], in accordance with example embodiments of the present invention [based on the intermediate set of predicted keypoint annotations of the one or more body parts under clothing]…; Examiner notes that the model parameters are fixed prior to training and after training for processing information. Fig.1A refers to trained models from Fig.2 as disclosed in the drawings and fixed parameter of the trained models, in 24:1-8: At step 212, the trained segmentation DLN, annotation DLN, and sizing ML module to be used in FIGS. 1A, 1B, and 1C may be output. In particular, the segmentation DLN trained in step 208 is output for use in step 106 in FIG. 1A [based on the intermediate set of predicted keypoint annotations of the one or more body parts under clothing]. Similarly, the one or more annotation DLNs trained in step 210 are output for use in step 107 in FIG. 1A [wherein the fixed-parameter trained annotation-to-measurement machine learning model generates an output set of predicted measurements of the one or more body parts under clothing based on the intermediate set of predicted keypoint annotations of the one or more body parts under clothing]. Finally, the sizing ML module trained in step 210 is output for use in step 108 in FIG. 1A.)
wherein the training of the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model comprises minimizing a loss function based on a batch distance measure calculated between the output set of predicted measurements of the one or more body parts under clothing and the set of training ground truth measurements of the one or more body parts under clothing, (22:22-34: At step 206, actual human measurements for each body feature (e.g., determined by a tailor, or 1D measurements taken from 3D body scans) may be received to serve as ground-truth data. The actual human measurements may be used as validation data and used for training the algorithms used by the system. For example, the actual human measurements may be used in minimizing an error function [comprises minimizing a loss function based on a batch distance measure calculated between the output set of predicted measurements of the one or more body parts under clothing and the set of training ground truth measurements of the one or more body parts under clothing] or loss function (mean squared error, likelihood loss, log-loss, hinge loss, etc.) associated with the machine learning algorithms [wherein the training of the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model]. In one embodiment, the annotation lines from step 205 and the ground-truth data from step 206 are utilized to train the annotation DLN used in step 107 [while connected to the fixed-parameter trained annotation-to-measurement machine learning model] and the sizing ML step 108 of FIG. 1A. And the machine learning algorithms connected in Fig. 1A and Fig. 10 and in 22:46-57: At step 208, the segmentation DLN may be trained on a body segmentation or body feature extraction. In one embodiment, the segmentation DLN may be trained using manually-annotated human body segmentation obtained from one or more annotators in step 204 [wherein the training of the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model …]. For example, the segmentation DLN may be presented with labeled data (e.g., an image of a user and associated actual body segmentations received from the annotator) and may determine an error function (e.g., from a loss function, as discussed above) [comprises minimizing a loss function based on a batch distance measure calculated between the output set of predicted measurements of the one or more body parts under clothing and the set of training ground truth measurements of the one or more body parts under clothing] based on the results of the segmentation DLN [the output set of predicted measurements of the one or more body parts under clothing] and the actual body segmentation by the annotator [the set of training ground truth measurements of the one or more body parts under clothing]. The segmentation DLN may be trained to reduce the magnitude of the error function [minimizing a loss function].)
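For illustration only, a minimal sketch of training an upstream model while it is connected to a downstream model whose parameters are fixed, minimizing a batch distance loss (MSE here) computed only between predicted and ground-truth measurements; all names are hypothetical, and the sketch is generic rather than drawn from Koh or the application:

```python
import torch

def train_connected(photo_to_annotation: torch.nn.Module,
                    fixed_annotation_to_measurement: torch.nn.Module,
                    loader,  # yields (photos, ground-truth measurements) batches
                    epochs: int = 10) -> None:
    """Train only the upstream model; the measurement-space loss is
    backpropagated through the frozen downstream model."""
    loss_fn = torch.nn.MSELoss()  # stands in for the claimed batch distance measure
    optimizer = torch.optim.Adam(photo_to_annotation.parameters(), lr=1e-4)
    for _ in range(epochs):
        for photos, gt_measurements in loader:
            keypoints = photo_to_annotation(photos)  # intermediate predicted annotations
            measurements = fixed_annotation_to_measurement(keypoints)
            loss = loss_fn(measurements, gt_measurements)
            optimizer.zero_grad()
            loss.backward()   # gradients flow through, but do not update, the fixed model
            optimizer.step()  # only upstream parameters change
```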
wherein a training dataset comprising a set of training ground truth keypoint annotations of one or more body parts under clothing is unavailable for the training of the photo-to-annotation machine learning model, (As depicted in Fig. 2, step 208 is trained without the data available for stage 210:
[Image: reproduction of Koh, FIG. 2 (media_image1.png, greyscale)]
)
and wherein each given training ground truth keypoint annotation comprises a given landmark indicator indicating a corresponding given body part of the one or more body parts under clothing; (As depicted in Fig. 2:206)
receive the user image from a user device, wherein the user image shows a body part of a user under clothing; (As depicted in Fig. 1A:104)
and input the user image into the trained photo-to-annotation machine learning model while the trained photo-to-annotation machine learning model is connected to the fixed-parameter trained annotation-to-measurement machine learning model, to generate the predicted body part measurement of the body part of the user. (As depicted in Fig. 1A claimed trained models for generating claimed predicted measurements in Fig. 1A:126)
Koh teaches the system for training a plurality of machine learning models, as disclosed above, using respective training datasets that are unavailable to one another.
One of ordinary skill in the art would understand that connected models can be trained separately based on their respective training datasets, each unavailable to the other.
Additionally, Bas teaches that connected models can be trained separately based on their respective training data, as depicted in Fig. 1 and in 8:48-11:48: The shared backbone 20 of the illustrative software system (see FIG. 1) serves as a feature extractor for keypoint and person detection subnets [photo-to-annotation machine learning model]. The shared backbone 20 extracts many different features from the image 10 (e.g., vertical edges; horizontal edges; parts of people, such as heads, legs, and other body parts; and compositions of the image) [wherein a training dataset comprising a set of training ground truth keypoint annotations of one or more body parts under clothing is unavailable for the training of … photo-to-annotation machine learning model]. The pixels of the image 10 are first processed using the backbone 20 of the system. As such, the pixels of the image 10 are the input [a set of training ground truth keypoint annotations of one or more body parts under clothing is unavailable for the training of … photo-to-annotation machine learning model] to the backbone 20. The backbone 20 of the system is actually a deep residual network (ResNet—see e.g., ref. [36]) with two Feature Pyramid Networks 22, 24 (FPNs—see e.g., ref. [39]) (one 22 for the keypoint subnet 30, the other 24 for the person detection subnet 40) connected to it, … Advantageously, the pose residual network (PRN) 50 of the pose estimation system described herein is able to disambiguate which keypoint should be assigned to the current person box. In general, the inputs to the pose residual network (PRN) 50 are: (1) keypoint heatmaps from the keypoint subnet, and (2) coordinates of the bounding boxes from the person detection subnet… The training of the keypoint estimation subnet now will be described [train the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset…]. For keypoint training, 480×480 image patches were used, which were centered around the crowd or the main person in the scene. Random rotations between ±40 degrees, random scaling between 0.8-1.2 and vertical flipping with a probability of 0.3 was used during training. The ImageNet (see ref. [49]) pretrained weights for each backbone were transferred before training… The training of the person detection subnet [train the photo-to-annotation machine learning model while connected to the fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset…] now will be described. In the illustrative embodiment, a person detection training strategy was followed, which was similar to that in Lin et al. (ref. [41]). Images containing persons were used, and they were resized such that shorter edge is 800 pixels. In the illustrative embodiment, backbone weights after keypoint training were frozen and not updated during person detection training.…And in 13:37-52: Next, the training of the pose residual network (PRN) will be described. In the illustrative embodiment, during the training of the pose residual network, input and output pairs were cropped and heatmaps were resized according to bounding-box proposals. All crops were resized to a fixed size of 36×56 (height/width=1.56).
The PRN network was trained separately [… while connected to … fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset…] and Adam optimizer (ref. [50]) with a learning rate of 1e-4 was used during training. Since the model was shallow, convergence took approximately 1.5 hours. The model was trained with the person instances which had more than 2 keypoints [… while connected to … fixed-parameter trained annotation-to-measurement machine learning model using a photo-to-measurement training dataset…]. A sort of curriculum learning (ref. [51]) was utilized by sorting annotations based on the number of keypoints and bounding box areas. In each epoch, the model started to learn easy-to-predict instances, and hard examples were given in later stages.)
fix at least one training parameter of the trained annotation-to-measurement machine learning model to generate a fixed-parameter trained annotation-to-measurement machine learning model, (in 13:23-30: The training of the person detection subnet now will be described. In the illustrative embodiment, a person detection training strategy was followed, which was similar to that in Lin et al. (ref. [41]). Images containing persons were used, and they were resized such that shorter edge is 800 pixels. In the illustrative embodiment, backbone weights [at least one training parameter] after keypoint training were frozen [fix at least one training parameter of the trained annotation-to-measurement machine learning model to generate a fixed-parameter trained annotation-to-measurement machine learning model] and not updated during person detection training… )
Bas and Koh are analogous art because both involve developing automatic information processing techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing video motion image data using machine learning algorithms and deeper ResNet architectures as disclosed by Bas with the method for identifying image attributes using machine learning algorithms disclosed by Koh.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Bas and Koh as discussed above; doing so allows for obtaining hierarchical information using residual neural networks to make use of inherent multi-scale representations (Bas, 8:61-66).
Regarding claim 22, the rejection of claim 13 is incorporated and Koh in combination with Bas further teaches the non-transitory storage medium of claim 13, wherein the batch distance measure is selected from the group consisting of a mean absolute error (MAE), a mean squared error (MSE), a mean squared deviation (MSD), and a mean squared prediction error (MSPE). (in 22:22-34: … At step 206, actual human measurements for each body feature (e.g., determined by a tailor, or 1D measurements taken from 3D body scans) may be received to serve as ground-truth data. The actual human measurements may be used as validation data and used for training the algorithms used by the system. For example, the actual human measurements may be used in minimizing an error function or loss function [claimed wherein the batch distance measure is selected from the group consisting of a mean absolute error (MAE), a mean squared error (MSE), a mean squared deviation (MSD), and a mean squared prediction error (MSPE)] (mean squared error, likelihood loss, log-loss, hinge loss, etc.) associated with the machine learning algorithms…)
Regarding claim 23, the rejection of claim 13 is incorporated and Koh in combination with Bas further teaches the non-transitory storage medium of claim 13, further comprising program code to: store the landmark indicator in a three-dimensional array data structure representing the given body part. (in 28:4-46: At step 1306, a human segmentation (e.g., an extraction of the human from a background of the images) may be performed, and a 3D model [further comprising program code to: store the landmark indicator in a three-dimensional array data structure representing the given body part] may be fitted against an extracted human. Moreover, a three-dimensional shape may be estimated using a three-dimensional modeling technique. In one embodiment, the system may utilize deep learning techniques and/or OpenCV to extract the human body, including clothing, from the background. Before performing this step on data from a real user, the system may have been trained first, for example, on sample photos of humans posing in different environments in different clothing, with hands at 45 degrees (“A-pose”)… OpenPose may include a real-time multi-person system to jointly detect human body, hand, facial, and foot keypoints (in total 135 keypoints) on single images. In one embodiment, a keypoint may refer to a part of a person's pose that is estimated, such as the nose, right ear, left knee, right foot, etc [further comprising program code to: store the landmark indicator in a three-dimensional array data structure representing the given body part]. The keypoint contains both a position and a keypoint confidence score. Further aspects of OpenPose functionality include, but not be limited to, 2D real-time multi-person keypoint body estimation. The functionality may further include the ability for the algorithms to be run-time invariant to number of detected people. Another aspect of its functionality may include, but may not be limited to, 3D real-time single-person keypoint detection, including 3D triangulation from multiple single views… At step 1310, body sizing measurements may be determined based on estimated three-dimensional shape, joint position and/or posture [further comprising program code to: store the landmark indicator in a three-dimensional array data structure representing the given body part]. In another embodiment, the system may determine the sizes of the body parts using inputs of height, weight, and/or other parameters (e.g., BMI index, age, gender, etc.). In one embodiment, the system may use, in part, a Virtuoso algorithm, an algorithm providing standard DaVinci models of human body parts and relative sizes of body parts..)
Additionally, Bas teaches store the landmark indicator in a three-dimensional array data structure representing the given body part (As depicted in Fig. 1:38 and 39; and in 9:33-45: Now, the keypoint estimation subnet 30 of the illustrative system will be described with reference to FIG. 4. The blocks C.sub.2 through K.sub.5 are processed by the CNN algorithms. The keypoints are obtained from the backbone features. The keypoint estimation subnet 30 of the illustrative system (see FIG. 4) takes hierarchical CNN features (outputted by the corresponding FPN) and outputs keypoint and segmentation heatmaps 38, 39 [store the landmark indicator in a three-dimensional array data structure representing the given body part]. Keypoint heatmaps 38 represent keypoint locations as Gaussian peaks. Each heatmap layer belongs to a specific keypoint class (nose, wrists, ankles, etc.) and contains an arbitrary number of peaks that pertain to person instances. The keypoint information contains joint locations, facial landmarks [store the landmark indicator in a three-dimensional array data structure representing the given body part], etc. And in 10:12-21: In FIG. 4, the loss function between the K blocks 32 and D blocks 34 is used to train the model by optimizing the network (i.e., to teach the network to predict keypoint heatmaps). Semantic person segmentation masks are predicted in the same way with keypoints. Then, after obtaining the single depth-512 layer feature map 36 in FIG. 4, the keypoint subnet downsamples in the depth dimension to 17 so as to obtain 17 different heatmaps 38, 39 that encode the locations of different body features [the landmark indicator in a three-dimensional array data structure representing the given body part] (e.g., location of the nose, location of the left eye, location of the right eye, etc.).)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Bas and Koh for the same reasons disclosed above.
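For illustration only, a minimal sketch of storing landmark indicators in a three-dimensional array of per-class heatmaps, consistent with the heatmap encoding Bas describes (17 layers, one per keypoint class); the grid dimensions and names are illustrative assumptions:

```python
import numpy as np

NUM_KEYPOINTS, H, W = 17, 56, 36  # illustrative: 17 classes on a 56x36 grid

def landmark_heatmaps(landmarks, sigma: float = 2.0) -> np.ndarray:
    """landmarks: iterable of (class_index, x, y) landmark indicators.
    Returns a 3D array (class x height x width) with a Gaussian peak
    at each landmark location, one heatmap layer per keypoint class."""
    heatmaps = np.zeros((NUM_KEYPOINTS, H, W), dtype=np.float32)
    ys, xs = np.mgrid[0:H, 0:W]
    for k, x, y in landmarks:
        heatmaps[k] += np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return heatmaps
```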
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ulyanov et al. (US 20210150792): teaches creating a differentiable loss by applying a distance transform to the segmentation mask of the user (created during the pre-processing step) and minimize its average value over the border of the rasterized 3D body for each frame; and the apparel fitting system returns the best appropriate clothing size (e.g., L or M) for a particular body, and optionally, brand/clothing id, e.g., identified by a link to a web site, based on the generated r3D avatar from the r3D engine/pipeline. The apparel fitting system is trained on a dataset of bodies with known best clothing size for a particular brand and possibly for a particular item. To train the system, up to 200 or so body measurements extracted from the 3-D body, such as waist circumference, etc., and a machine learning algorithm, such as random forest regressor or a neural network, is trained to predict the right size.
Black et al. (US 10679046): teaches subjective/qualitative and metrically accurate information about the body that can be used to size clothing, create avatars, measure health risks, etc. This may be in the form of a 3D “model” of the body, which can be represented by a mesh, point cloud, voxel grid, or other graphics representation. This model may be parametric such that the body shape is captured by a small number of parameters. Shape, however, can also mean things like standard tailoring measurements on the body, body type, or information related to clothing size.
Novotny et al. (NPL: “C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion”): teaches using images and keypoint annotations for training data set.
Omran et al. (NPL: Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation): teaches the use of loss function based on a distance metric; noting: if the dataset provides solely 2D joint position ground truth, we define a similar loss in terms of 2D distance and rely on error backpropagation through the projection.
Miao et al. (NPL: ClothingNet: Cross-Domain Clothing Retrieval With Feature Fusion and Quadruplet Loss): teaches using function loss expressed as metric learning is to reduce feature distances between the same category of samples as much as possible during the network training, and also to expand the feature distances between different categories of samples as much as possible.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLUWATOSIN ALABI/ Primary Examiner, Art Unit 2129