Prosecution Insights
Last updated: April 19, 2026
Application No. 18/301,582

COMPUTER-READABLE RECORDING MEDIUM STORING GENERATION PROGRAM, GENERATION METHOD, AND INFORMATION PROCESSING DEVICE

Non-Final OA: §101, §103, §112
Filed: Apr 17, 2023
Examiner: LU, HWEI-MIN
Art Unit: 2142
Tech Center: 2100 — Computer Architecture & Software
Assignee: Fujitsu Limited
OA Round: 1 (Non-Final)
Grant Probability: 62% (Moderate)
OA Rounds: 1-2
To Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 134 granted / 217 resolved; +6.8% vs TC avg)
Interview Lift: +39.5% (strong, roughly +40%, comparing resolved cases with and without an interview)
Avg Prosecution: 3y 1m (typical timeline; 37 currently pending)
Total Applications: 254 (career history, across all art units)

Statute-Specific Performance

§101: 11.2% (-28.8% vs TC avg)
§103: 43.8% (+3.8% vs TC avg)
§102: 9.4% (-30.6% vs TC avg)
§112: 33.0% (-7.0% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 217 resolved cases
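The per-statute deltas above are plain differences against the Tech Center average (the black line). Notably, all four displayed deltas imply the same TC average of 40.0% for every statute; the short sketch below recomputes the displayed figures under that inferred baseline (the 40.0% value is an inference from the deltas, not a number stated on this page):

```python
# Recompute the statute-specific deltas shown above.
# Assumption: the Tech Center average is 40.0% for each statute --
# inferred from the displayed deltas (e.g., 11.2% - 40.0% = -28.8%),
# not a figure stated directly on the page.
tc_avg = 40.0
examiner_rates = {"101": 11.2, "103": 43.8, "102": 9.4, "112": 33.0}

deltas = {s: round(rate - tc_avg, 1) for s, rate in examiner_rates.items()}
print(deltas)  # {'101': -28.8, '103': 3.8, '102': -30.6, '112': -7.0}
```
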

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This Office action is responsive to the following communication(s): original application filed on 04/17/2023; said application claims a priority filing date of 11/09/2020. Claims 1-10 are pending. Claims 1 and 9-10 are independent.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 2 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 2 recites the limitation "… training includes acquiring a plurality of pieces of data from each of the plurality of data sets, and training the feature space in which the distance between the pieces of the data included in the same domain is shorter …" in lines 5-, which renders the claim indefinite because "…".

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10 are rejected under 35 U.S.C. 
101 because the claimed invention is directed to an abstract idea without significantly more.

Independent Claims 1 and 9-10

Step 1: Claim 1 is a non-transitory computer-readable recording medium claim, Claim 9 is a process claim, and Claim 10 is a device claim. These claims fall within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) recite(s) additional elements/limitations of "…".

Step 2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional element/limitation of "with data included in each of a plurality of data sets, training a feature space" is well-understood, routine and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"). Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 2

Step 1: Claim 2 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….
Step 2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because (a) the additional elements/limitations of "with data included in each of a plurality of data sets, training a feature space, wherein the plurality of data sets is a plurality of unlabeled data sets that are constituted by unlabeled data and have domains different from each other" and "training the feature space" are well-understood, routine and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"); and (b) the additional element/limitation of "acquiring a plurality of pieces of data from each of the plurality of data sets" is also well-understood, routine and conventional (WURC) activity similar to "receiving or transmitting data over a network" (see MPEP 2106.05(d), "Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)"). Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 3

Step 1: Claim 3 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….
Step 2B: The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional element/limitation of "executing machine learning of a generation model that generates features from input data so as to generate the feature space" is well-understood, routine and conventional (WURC) activity similar to "performing repetitive calculations" (see MPEP 2106.05(d), "Performing repetitive calculations, Flook, 437 U.S. at 594, 198 USPQ2d at 199 (recomputing or readjusting alarm limit values)"). Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 4

Step 1: Claim 4 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 5

Step 1: Claim 5 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.
Claim 6

Step 1: Claim 6 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 7

Step 1: Claim 7 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim 8

Step 1: Claim 8 is a non-transitory computer-readable recording medium claim, which falls within at least one of the four categories of patent-eligible subject matter.

Step 2A Prong 1: The claim(s) further recite(s) "…".

Step 2A Prong 2: This judicial exception is not integrated into a practical application because the claim(s) ….

Step 2B: The claim(s) does/do not further include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, none of the additional limitations, taken either alone or combined, amount to significantly more than the abstract idea.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 
102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Sohn et al. (US 2019/0354801 A1, pub. date: 11/21/2019), hereinafter Sohn, in view of Jawahar et al. (US 2018/0218284 A1, pub. date: 08/02/2018), hereinafter Jawahar.

Independent Claims 1 and 9-10

Sohn discloses a non-transitory computer-readable recording medium storing a generation program for causing a computer to execute a process (Sohn, ¶¶ [0066]-[0068]: a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device; the medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.; ¶¶ [0070]-[0071] with FIG. 
5: the computer system 500 includes at least one processor (CPU) 505 operatively coupled to other components via a system bus 502; a cache 506, a Read Only Memory (ROM) 508, a Random-Access Memory (RAM) 510, an input/output (I/O) adapter 520, a sound adapter 530, a network adapter 590, a user interface adapter 550, and a display adapter 560, are operatively coupled to the system bus 502; a first storage device 522 and a second storage device 529 are operatively coupled to system bus 502 by the I/O adapter 520; the storage devices 522 and 529 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth) comprising: with data included in each of a plurality of data sets, training a feature space in which a distance between pieces of the data included in a same domain is shorter and the distance of the data between different domains is longer; and generating labeled data sets by integrating labeled data included in the trained feature space, among a plurality of pieces of the labeled data (Sohn, ¶¶ [0004]-[0005] and [0013]-[0014]: implementing an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; recursively training a feature transfer network based on a source domain associated with labeled source data and a target domain associated with unlabeled target data, and automatic labeling of target domain data using a clustering method; training the feature transfer network includes training a first domain discriminator and a second domain discriminator, including training an objective function corresponding to the first domain discriminator and an objective function corresponding to the second domain discriminator, and training a feature generator and a feature transformer based on the first and second domain discriminators, including training an objective function corresponding to the feature generator and an objective function 
corresponding to the feature transformer; implementing the feature transfer network and the automatic labeling to perform a facial recognition task; applied to situations where label spaces of source and target domains are disjoint; the feature transfer network can generate an augmented source domain embedding space to allow joint domain adversarial and domain separation training in a unified framework; a series of training objectives can be introduced to train the feature transfer network, which can include feature reconstruction loss, classification loss, domain adversarial loss and domain separation loss; provide a domain adaptation framework for classification and distance metric learning when a source domain has abundant labeled training data and the target domain has abundant unlabeled training data; ¶¶ [0015]-[0023] with FIG. 1: the system 100 can include a feature transfer network (FTN) subsystem 102 and an automatic labeling subsystem 104; a training framework can be achieved by recursively or iteratively training the FTN with respect to the FTN subsystem 102 and automatic labeling of data using the trained FTN with respect to the automatic labeling subsystem 104; the training of the FTN and the automatic labeling can be implemented within a neural network to perform facial recognition; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; the verification task performed can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images shared the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains 110 and 120, belong to the same 
class (it is not known which distribution the two random samples come from a priori); let the source domain 110 be denoted as Xs, the target domain 120 be denoted as XT, and the two random samples be denoted as x and x'; there are three scenarios of constructing a pair: (1) x, x' ∈ Xs; (2) x, x' ∈ XT; and (3) x ∈ Xs, x' ∈ XT; scenarios (1) and (2) can be referred to as intra-domain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the discriminative power of the classifier can be transferred to the target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; the FTN component 130 can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; the FTN component 130 can include a feature generation module and a feature transfer module; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the target domain 120 for automatic labeling; ¶¶ [0025]-[0044] with FIG. 
2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2, a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment; domain adversarial objective functions for domain 
alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x' ∈ Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x' ∈ XT, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2, LD2; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1, LD1; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training protocol of the feature generator f and the feature transformer g; as 
shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and 246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 otherwise), σ refers to the sigmoid function and f(xi)T refers to the 
transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering; ¶¶ [0045]-[0061] with FIG. 3: at block 310, a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data ( e.g., unlabeled examples); the source and target domains can be provided to perform a verification task; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains, belong to the same class (it is not known which distribution the two random samples come from a priori); at block 320, a feature transfer network (FTN) is trained based on the source domain and the target domain; training the FTN can include, at block 322 training a first domain discriminator and a second domain discriminator; the first domain discriminator can be trained to discriminate between source-augmented features and target features, and the second domain discriminator can be trained to discriminate between source features and a mixture of the source-augmented features and the target features; training the FTN can further include, at block 324, training a feature generator and a feature transformer based on the first and second domain discriminators; the feature generator can allow for joint optimization with domain adversarial loss via the first domain discriminator and domain separation loss via the second domain discriminator; feature generator training can include verification loss from both the 
source and the source-augmented features, domain separation loss via the second domain discriminator, and domain adversarial loss via the first domain discriminator; feature transformer training can include verification loss and domain separation loss via the second domain discriminator; the objective function corresponding to the feature generator can be trained based in part on the objective function corresponding to verification loss associated with the feature generator, and the objective function corresponding to the feature transformer can be trained based in part on the objective function corresponding to verification loss associated with the feature transformer; the verification loss can be extended into N-pair distance metric loss for faster convergence and improved performance; training the feature generator and the feature transformer can further include training an objective function corresponding to feature reconstruction loss between features extracted from the feature generator and a reference network pretrained using labeled source data to stabilize the challenging adversarial training; e.g., the objective function corresponding to feature reconstruction loss can be trained based on representations of examples from the source and target domains using the reference network; training the feature generator and the feature transformer can further include training an objective function corresponding to multi-class entropy minimization loss; the objective function corresponding to multi-class entropy minimization loss can use labels (e.g., pseudo-labels) retrieved by clustering (e.g., from block 330 as described in further detail below); e.g., the objective function corresponding to multi-class entropy minimization loss can be trained based on positive examples of respective examples from the target domain; at block 330, automatic labeling of target domain data is trained using a clustering method; automatically labeling the target examples can include 
clustering of the target examples for providing pseudo-labels to automatically discover class structure for the target domain; clustering the target examples can include implementing a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; clustering the target examples can include implementing hierarchical clustering; e.g., clustering the target examples can include implementing a hierarchical DBSCAN (HDBSCAN) method; the training of the feature transfer network and the automatic labeling can be recursive or iterative; more specifically, an output of the training at block 320 can be provided as input for training the automatic labeling at block 330, and an output of the training at block 330 can be provided as input for training the feature transfer network at block 320; at block 340, the feature transfer network and the automatic labeling can be implemented to perform a facial recognition task; introduces a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the system/method 300 of feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can include abundant unlabeled target data; examples of possible target domains in 
this illustrative example can include, but are not limited to, African-American, East-Asian, South-Asian, Hispanic, etc.).

Sohn further discloses an information processing device (Sohn, ¶ [0070] with 500 in FIG. 5: computer system 500) comprising: a memory (Sohn, ¶¶ [0070]-[0071] with FIG. 5: a Read Only Memory (ROM) 508, a Random-Access Memory (RAM) 510, a first storage device 522 and a second storage device 529); and a processor (Sohn, ¶ [0070] with 505 in FIG. 5: one processor (CPU) 505) coupled to the memory and configured to perform the method described above (Sohn, ¶¶ [0070]-[0071]: the computer system 500 includes at least one processor (CPU) 505 operatively coupled to other components via a system bus 502; a cache 506, a Read Only Memory (ROM) 508, a Random-Access Memory (RAM) 510, an input/output (I/O) adapter 520, a sound adapter 530, a network adapter 590, a user interface adapter 550, and a display adapter 560, are operatively coupled to the system bus 502; a first storage device 522 and a second storage device 529 are operatively coupled to system bus 502 by the I/O adapter 520; the storage devices 522 and 529 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth).

Sohn fails to explicitly disclose generating labeled data sets by integrating labeled data included within a predetermined range in the trained feature space. 
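The distance property at the heart of the claimed feature space (pieces of data from the same domain drawn closer together, data from different domains pushed farther apart) can be made concrete with a small sketch. This is an illustrative toy in NumPy, not Sohn's feature transfer network: it learns one free embedding vector per sample with a simple attract/repel update, and every name in it is invented for illustration.

```python
import numpy as np

# Toy illustration of "training a feature space in which a distance between
# pieces of the data included in a same domain is shorter and the distance
# of the data between different domains is longer". NOT Sohn's feature
# transfer network: embeddings are free per-sample vectors and the update
# rule is a bare attract/repel step, chosen only to exhibit the property.

rng = np.random.default_rng(0)
n = 20                                              # samples per domain
dom = np.array([0] * n + [1] * n)                   # domain id per sample
Z = rng.normal(scale=0.1, size=(2 * n, 2))          # initial 2-D embeddings

def train_step(Z, dom, lr=0.05):
    """Pull same-domain embeddings together; push cross-domain ones apart."""
    same = (dom[:, None] == dom[None, :]).astype(float)
    diff = Z[:, None, :] - Z[None, :, :]            # all pairwise differences
    attract = (same[..., None] * diff).sum(axis=1) / same.sum(axis=1)[:, None]
    repel = ((1 - same)[..., None] * diff).sum(axis=1) / (1 - same).sum(axis=1)[:, None]
    return Z - lr * (attract - repel)

for _ in range(80):
    Z = train_step(Z, dom)

# Compare mean same-domain distance against mean cross-domain distance.
dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
same_mask = dom[:, None] == dom[None, :]
intra = dist[same_mask & ~np.eye(2 * n, dtype=bool)].mean()
inter = dist[~same_mask].mean()
# after training, intra (same-domain) ends up well below inter (cross-domain)
```

The attract term contracts each domain around its own centroid while the repel term drives the two centroids apart, so the trained space exhibits exactly the recited short-intra / long-inter distance structure.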
Jawahar teaches a system and a method related to domain adaptation from a source domain for a target domain (Jawahar, ¶ [0001]), including generating labeled data sets by integrating labeled data included within a predetermined range in the trained feature space (Jawahar, ABSTRACT and ¶¶ [0006]-[0008] and [0019]-[0038]: a domain adaptation method for learning transferable feature representations from a source domain for a target domain; receiving real-time input data comprising a plurality of labeled instances of the source domain and a plurality of unlabeled instances of the target domain; learning common representation shared between the source domain and the target domain, based on the plurality of labeled instances of the source domain; labeling one or more unlabeled instances in the plurality of unlabeled instances of the target domain, based on the common representation; determining a target specific representation corresponding to the target domain, based on the one or more labeled instances of the target domain; training a target specific classifier based on the target specific representation and the common representation to perform automatic text classification on the remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain; a "source domain" corresponds to a technical or business field for which a classifier is already trained; a "target domain" refers to a technical or business field for which a classifier is to be trained; a plurality of labeled instances (such as, a plurality of source text segments) from the source domain may be utilized to train the classifier to label/classify a plurality of unlabeled instances in the target domain; examples of the labeled instances may include labeled images, labeled text segments, and/or labeled audio segments associated with the source domain; examples of the plurality of unlabeled instances may include unlabeled images, unlabeled text segments, and/or unlabeled audio 
segments associated with the target domain; a "source specific representation" comprises a plurality of features that is specific to a source domain; e.g., the source specific representation may include an M-dimensional feature vector extracted from a plurality of labeled instances of the source domain; a "common representation" comprises a plurality of features that is shared between a source domain and a target domain; a "target specific representation" comprises a plurality of features that is specific to a target domain; e.g., the target specific representation may include a P-dimensional feature vector extracted from one or more labeled instances of the target domain; a "pseudo-label" corresponds to a label that is determined by a generalized classifier for one or more unlabeled instances of the target domain; ¶¶ [0060]-[0113] and [0115]-[0126] with FIGS. 3-4: extract a plurality of features from the plurality of labeled instances of the source domain; the extracted plurality of features may constitute a first hidden layer of the first neural network; learn the common representation and the source specific representation from the extracted plurality of features; partition the first hidden layer of the first neural network by learning the common representation and the source specific representation; partition the first hidden layer into the common representation and the source specific representation, such that a source classification error of the first neural network is minimized and further ensures that the domain divergence associated with the common representation in the partitioned first hidden layer is minimum; i.e., the learning of the common representation may be based on a minimization of the domain divergence between the source domain and the target domain; the common representation becomes domain independent by attaining minimum domain divergence; determine the plurality of common features that are present in both the plurality of labeled instances 
of the source domain and the plurality of unlabeled instances of the target domain, and have a TF-IDF score greater than the predefined threshold; determine the plurality of source specific features that are present in the plurality of labeled instances of the source domain only and have a TF-IDF score greater than a predefined threshold; for an accurate partitioning of the first hidden layer the domain regression processor may be configured to maintain a trade-off between the source classification error and the domain divergence; label the one or more unlabeled instances in the plurality of unlabeled instances of the target domain based on the learned common representation; use the trained generalized classifier to label the one or more unlabeled instances in the plurality of unlabeled instances to generate pseudo-labeled instances of the target domain; the trained generalized classifier may further determine a confidence score for each prediction of label; the instances that are labeled with a confidence score greater than a predefined threshold are referred to as pseudo-labeled instances of the target domain; label a plurality of unlabeled instances of the source domain, based on the learned source specific representation and the learned common representation; i.e., use the partitioned first hidden layer of the first neural network for labeling the plurality of unlabeled instances of the source domain; the source specific representation and the common representation contribute positively in the classification of the plurality of unlabeled instances of the source domain, such that the source classification error is minimum; after labeling of the unlabeled instances of the source domain, further determine the source specific representation and the common representation, respectively, from the instances that are newly labeled; further, based on the determination of the source specific representation and the common representation, the first hidden layer of the first 
neural network may be updated; the labeled instances of the source domain may correspond to a first output layer of the first neural network; the identification of the plurality of target specific features from the pseudo-labeled instances may be based on a positive contribution of the identified plurality of target specific features to the adaption processor configured to formulate a second neural network based on the pseudo-labeled instances of the target domain and the learned common representation; the pseudo-labeled instances of the target domain may constitute a second input layer of the second neural network and the learned common representation may constitute a second hidden layer of the second neural network; in the first iteration the target specific representation constituting the plurality of target specific features may be determined based on the pseudo-labeled instances of the target domain; add the extracted plurality of target specific features to the second hidden layer; further, re-train the generalized classifier based on the updated second hidden layer (i.e., the learned common representation and the target specific representation determined in the first iteration); thereafter, the re-trained generalized classifier may be further used to generate new pseudo-labeled instances from remaining one or more unlabeled instances of the target domain; for the second iteration, the new pseudo-labeled instances may constitute the second input layer of the second neural network and the plurality of new target specific features extracted from the new pseudo-labeled instances may be used to update the target specific representation; the update of the target specific representation may correspond to an addition of the plurality of new target specific features to the target specific representation; the update of the target specific representation may further correspond to a merging of the one or more target specific features to obtain a single feature based on 
a similarity score between the one or more target specific features; continue the iterative process of determining the target specific representation until the classification performance of the re-trained generalized classifier converges; a target specific classifier is trained based on the determined target specific representation and the learned common representation to perform automatic text classification of the remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain). Sohn and Jawahar are analogous art because they are from the same field of endeavor, a system and a method related to domain adaptation from a source domain for a target domain. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Jawahar to Sohn. Motivation for doing so would be to improve the classification of unlabeled instances of a target domain by leveraging transferable feature representations learned from a labeled source domain (Jawahar, ¶¶ [0001] and [0006]). Claim 2 Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses wherein the plurality of data sets is a plurality of unlabeled data sets that are constituted by unlabeled data and have domains different from each other (Sohn, ¶¶ [0004]-[0005] and [0013]-[0014]: implementing an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; recursively training a feature transfer network based on a source domain associated with labeled source data and a target domain associated with unlabeled target data, and automatic labeling of target domain data using a clustering method; implementing the feature transfer network and the automatic labeling to perform a facial recognition task; applied to situations where label spaces of source and target domains are disjoint; the feature transfer network can generate an augmented source domain embedding space to allow joint domain adversarial and domain separation training in a unified framework; a series of training objectives can 
be introduced to train the feature transfer network, which can include feature reconstruction loss, classification loss, domain adversarial loss and domain separation loss; provide a domain adaptation framework for classification and distance metric learning when a source domain has abundant labeled training data and the target domain has abundant unlabeled training data; ¶¶ [0017] and [0019] with FIG. 1; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; for the target domain 120, only unlabeled training examples are provided; ¶ [0046] with 310 in FIG. 3: a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data (e.g., unlabeled examples); the source and target domains can be obtained for purposes of implementing facial recognition; e.g. 
the source and target domains can correspond to respective ethnicities for facial recognition, where the source domain can correspond to source ethnicity and the target domain can correspond to a target ethnicity; ¶¶ [0060]-[0061]: introduce a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can include abundant unlabeled target data; examples of possible target domains in this illustrative example can include, but are not limited to, African-American, EastAsian, South-Asian, Hispanic, etc.), and the training includes acquiring a plurality of pieces of data from each of the plurality of data sets, and training the feature space in which the distance between the pieces of the data included in the same domain is shorter and the distance of the data between the different domains is longer, among the plurality of the pieces of the data (Sohn, ¶¶ [0025]-[0044] with FIG. 2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 
2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2 , a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2 ); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions ( e.g., source or target); domain adversarial objective functions for domain alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source 
domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x'[Symbol font/0xCE]Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x'[Symbol font/0xCE]Xr, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2 , L D 2 ; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1 , L D 1 ; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training protocol of the feature generator f and the feature transformer g; as shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and 
246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 otherwise), σ refers to the sigmoid function and f(xi)T refers to the transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon 
can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering). Claim 3 Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses wherein the training includes executing machine learning of a generation model that generates features from input data so as to generate the feature space in which the distance between the pieces of the data included in the same domain is shorter and the distance of the data between the different domains is longer, and the generating includes using the trained generation model to generate the features for each of the plurality of the pieces of the labeled data that have domains different from each other, and generating the labeled data sets by integrating the labeled data of which the features are included within the predetermined range, among the features for each of the plurality of the pieces of the labeled data, in the trained feature space (Sohn, ¶¶ [0004]-[0005] and [0013]-[0014]: implementing an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; recursively training a feature transfer network based on a source domain associated with labeled source data and a target domain associated with unlabeled target data, and automatic labeling of target domain data using a clustering method; training the feature transfer network includes training a first domain discriminator and a second domain discriminator, including training an objective function corresponding to the first domain discriminator and an objective function corresponding to the second domain discriminator, and training a feature generator and a feature transformer based on the first and second domain discriminators, including training an 
objective function corresponding to the feature generator and an objective function corresponding to the feature transformer; implementing the feature transfer network and the automatic labeling to perform a facial recognition task; applied to situations where label spaces of source and target domains are disjoint; the feature transfer network can generate an augmented source domain embedding space to allow joint domain adversarial and domain separation training in a unified framework; a series of training objectives can be introduced to train the feature transfer network, which can include feature reconstruction loss, classification loss, domain adversarial loss and domain separation loss; provide a domain adaptation framework for classification and distance metric learning when a source domain has abundant labeled training data and the target domain has abundant unlabeled training data; ¶¶ [0015]-[0023] with FIG. 1: the system 100 can include a feature transfer network (FTN) subsystem 102 and an automatic labeling subsystem 104; a training framework can be achieved by recursively or iteratively training the FTN with respect to the FTN subsystem 102 and automatic labeling of data using the trained FTN with respect to the automatic labeling subsystem 104; the training of the FTN and the automatic labeling can be implemented within a neural network to perform facial recognition; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; the verification task performed can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images shared the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, 
drawn from either of the source and target domains 110 and 120, belong to the same class (it is not known which distribution the two random samples come from a priori); let the source domain 110 be denoted as Xs, the target domain 120 be denoted as XT, and the two random samples be denoted as x and x'; there are three scenarios of constructing a pair: (1) x, x'∈Xs; (2) x, x'∈XT; and (3) x∈Xs, x'∈XT; scenarios (1) and (2) can be referred to as intra-domain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the discriminative power of the classifier can be transferred to the target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; the FTN component 130 can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; the FTN component 130 can include a feature generation module and a feature transfer module; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the 
target domain 120 for automatic labeling; ¶¶ [0025]-[0044] with FIG. 2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2 , a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2 ); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions ( e.g., 
source or target); domain adversarial objective functions for domain alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x'∈Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x'∈XT, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2, L_D2; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1, L_D1; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training 
protocol of the feature generator f and the feature transformer g; as shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and 246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 
otherwise), σ refers to the sigmoid function and f(xi)T refers to the transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering; ¶¶ [0045]-[0061] with FIG. 3: at block 310, a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data ( e.g., unlabeled examples); the source and target domains can be provided to perform a verification task; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains, belong to the same class (it is not known which distribution the two random samples come from a priori); at block 320, a feature transfer network (FTN) is trained based on the source domain and the target domain; training the FTN can include, at block 322 training a first domain discriminator and a second domain discriminator; the first domain discriminator can be trained to discriminate between source-augmented features and target features, and the second domain discriminator can be trained to discriminate between source features and a mixture of the source-augmented features and the target features; training the FTN can further include, at block 324, training a feature generator and a feature transformer based on the first and second domain discriminators; the feature generator can allow for joint optimization with domain adversarial loss via the first domain discriminator and domain separation loss via the second domain discriminator; 
feature generator training can include verification loss from both the source and the source-augmented features, domain separation loss via the second domain discriminator, and domain adversarial loss via the first domain discriminator; feature transformer training can include verification loss and domain separation loss via the second domain discriminator; the objective function corresponding to the feature generator can be trained based in part on the objective function corresponding to verification loss associated with the feature generator, and the objective function corresponding to the feature transformer can be trained based in part on the objective function corresponding to verification loss associated with the feature transformer; the verification loss can be extended into N-pair distance metric loss for faster convergence and improved performance; training the feature generator and the feature transformer can further include training an objective function corresponding to feature reconstruction loss between features extracted from the feature generator and a reference network pretrained using labeled source data to stabilize the challenging adversarial training; e.g., the objective function corresponding to feature reconstruction loss can be trained based on representations of examples from the source and target domains using the reference network; training the feature generator and the feature transformer can further include training an objective function corresponding to multi-class entropy minimization loss; the objective function corresponding to multi-class entropy minimization loss can use labels (e.g., pseudo-labels) retrieved by clustering (e.g., from block 330 as described in further detail below); e.g., the objective function corresponding to multi-class entropy minimization loss can be trained based on positive examples of respective examples from the target domain; at block 330, automatic labeling of target domain data is trained using a 
clustering method; automatically labeling the target examples can include clustering of the target examples for providing pseudo-labels to automatically discover class structure for the target domain; clustering the target examples can include implementing a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; clustering the target examples can include implementing hierarchical clustering; e.g., clustering the target examples can include implementing a hierarchical DBSCAN (HDBSCAN) method; the training of the feature transfer network and the automatic labeling can be recursive or iterative; more specifically, an output of the training at block 320 can be provided as input for training the automatic labeling at block 330, and an output of the training at block 330 can be provided as input for training the feature transfer network at block 320; at block 340, the feature transfer network and the automatic labeling can be implemented to perform a facial recognition task; introduces a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the system/method 300 of feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can 
include abundant unlabeled target data; examples of possible target domains in this illustrative example can include, but are not limited to, African-American, East-Asian, South-Asian, Hispanic, etc.) (Jawahar, ABSTRACT and ¶¶ [0006]-[0008] and [0019]-[0038]: a domain adaptation method for learning transferable feature representations from a source domain for a target domain; receiving real-time input data comprising a plurality of labeled instances of the source domain and a plurality of unlabeled instances of the target domain; learning common representation shared between the source domain and the target domain, based on the plurality of labeled instances of the source domain; labeling one or more unlabeled instances in the plurality of unlabeled instances of the target domain, based on the common representation; determining a target specific representation corresponding to the target domain, based on the one or more labeled instances of the target domain; training a target specific classifier based on the target specific representation and the common representation to perform automatic text classification on remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain; a "source domain" corresponds to a technical or business field for which a classifier is already trained; a "target domain" refers to a technical or business field for which a classifier is to be trained; a plurality of labeled instances (such as a plurality of source text segments) from the source domain may be utilized to train the classifier to label/classify a plurality of unlabeled instances in the target domain; examples of the labeled instances may include labeled images, labeled text segments, and/or labeled audio segments associated with the source domain; examples of the plurality of unlabeled instances may include unlabeled images, unlabeled text segments, and/or unlabeled audio segments associated with the target domain; a "source specific
representation" comprises a plurality of features that is specific to a source domain; e.g., the source specific representation may include an M-dimensional feature vector extracted from a plurality of labeled instances of the source domain; a "common representation" comprises a plurality of features that is shared between a source domain and a target domain; a "target specific representation" comprises a plurality of features that is specific to a target domain; e.g., the target specific representation may include a P-dimensional feature vector extracted from one or more labeled instances of the target domain; a "pseudo-label" corresponds to a label that is determined by a generalized classifier for one or more unlabeled instances of the target domain; ¶¶ [0060]-[0113] and [0115]-[0126] with FIGS. 3-4: extract a plurality of features from the plurality of labeled instances of the source domain; the extracted plurality of features may constitute a first hidden layer of the first neural network; learn the common representation and the source specific representation from the extracted plurality of features; partition the first hidden layer of the first neural network by learning the common representation and the source specific representation; partition the first hidden layer into the common representation and the source specific representation, such that a source classification error of the first neural network is minimized and further ensures that the domain divergence associated with the common representation in the partitioned first hidden layer is minimum; i.e., the learning of the common representation may be based on a minimization of the domain divergence between the source domain and the target domain; the common representation becomes domain independent by attaining minimum domain divergence; determine the plurality of common features that are present in both the plurality of labeled instances of the source domain and the plurality of unlabeled instances of 
the target domain, and have a TF-IDF score greater than the predefined threshold; determine the plurality of source specific features that are present in the plurality of labeled instances of the source domain only and have a TF-IDF score greater than a predefined threshold; for an accurate partitioning of the first hidden layer the domain regression processor may be configured to maintain a trade-off between the source classification error and the domain divergence; label the one or more unlabeled instances in the plurality of unlabeled instances of the target domain based on the learned common representation; use the trained generalized classifier to label the one or more unlabeled instances in the plurality of unlabeled instances to generate pseudo-labeled instances of the target domain; the trained generalized classifier may further determine a confidence score for each prediction of label; the instances that are labeled with a confidence score greater than a predefined threshold are referred to as pseudo-labeled instances of the target domain; label a plurality of unlabeled instances of the source domain, based on the learned source specific representation and the learned common representation; i.e., use the partitioned first hidden layer of the first neural network for labeling the plurality of unlabeled instances of the source domain; the source specific representation and the common representation contribute positively in the classification of the plurality of unlabeled instances of the source domain, such that the source classification error is minimum; after labeling of the unlabeled instances of the source domain, further determine the source specific representation and the common representation, respectively, from the instances that are newly labeled; further, based on the determination of the source specific representation and the common representation, the first hidden layer of the first neural network may be updated; the labeled instances of the 
source domain may correspond to a first output layer of the first neural network; the identification of the plurality of target specific features from the pseudo-labeled instances may be based on a positive contribution of the identified plurality of target specific features to the adaptation processor configured to formulate a second neural network based on the pseudo-labeled instances of the target domain and the learned common representation; the pseudo-labeled instances of the target domain may constitute a second input layer of the second neural network and the learned common representation may constitute a second hidden layer of the second neural network; in the first iteration, the target specific representation constituting the plurality of target specific features may be determined based on the pseudo-labeled instances of the target domain; add the extracted plurality of target specific features to the second hidden layer; further, re-train the generalized classifier based on the updated second hidden layer (i.e., the learned common representation and the target specific representation determined in the first iteration); thereafter, the re-trained generalized classifier may be further used to generate new pseudo-labeled instances from remaining one or more unlabeled instances of the target domain; for the second iteration, the new pseudo-labeled instances may constitute the second input layer of the second neural network and the plurality of new target specific features extracted from the new pseudo-labeled instances may be used to update the target specific representation; the update of the target specific representation may correspond to an addition of the plurality of new target specific features to the target specific representation; the update of the target specific representation may further correspond to a merging of the one or more target specific features to obtain a single feature based on a similarity score between the one or more target specific
features; continue the iterative process of determining the target specific representation until the classification performance of the re-trained generalized classifier converges; a target specific classifier is trained based on the determined target specific representation and the learned common representation to perform automatic text classification of the remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain).

Claim 6
Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses projecting the plurality of the pieces of the labeled data into the trained feature space; and projecting respective pieces of object data of an unlabeled data set that corresponds to a first domain into the trained feature space, wherein the generating includes generating the labeled data sets that correspond to a pseudo-domain of the first domain, by integrating the labeled data located within a predetermined distance from the respective pieces of object data in the trained feature space in which the plurality of the pieces of the labeled data is projected (Sohn, ¶¶ [0015]-[0023] with FIG.
1: the system 100 can include a feature transfer network (FTN) subsystem 102 and an automatic labeling subsystem 104; a training framework can be achieved by recursively or iteratively training the FTN with respect to the FTN subsystem 102 and automatic labeling of data using the trained FTN with respect to the automatic labeling subsystem 104; the training of the FTN and the automatic labeling can be implemented within a neural network to perform facial recognition; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; the verification task performed can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images share the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains 110 and 120, belong to the same class (it is not known which distribution the two random samples come from a priori); let the source domain 110 be denoted as Xs, the target domain 120 be denoted as XT, and the two random samples be denoted as x and x'; there are three scenarios of constructing a pair: (1) x, x'∈Xs; (2) x, x'∈XT; and (3) x∈Xs, x'∈XT; scenarios (1) and (2) can be referred to as intra-domain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the
discriminative power of the classifier can be transferred to the target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; the FTN component 130 can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; the FTN component 130 can include a feature generation module and a feature transfer module; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the target domain 120 for automatic labeling; ¶¶ [0025]-[0044] with FIG. 2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 
2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2, a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment, where the term "domain adversarial" indicates that the transformed source features are trained to be indistinguishable from the target features; domain adversarial objective functions for domain alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source
domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x'∈Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x'∈XT, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2, LD2; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1, LD1; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training protocol of the feature generator f and the feature transformer g; as shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and
246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 otherwise), σ refers to the sigmoid function and f(xi)T refers to the transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon 
can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering; ¶¶ [0045]-[0061] with FIG. 3: at block 310, a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data (e.g., unlabeled examples); the source and target domains can be provided to perform a verification task; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains, belong to the same class (it is not known which distribution the two random samples come from a priori); at block 320, a feature transfer network (FTN) is trained based on the source domain and the target domain; training the FTN can include, at block 322, training a first domain discriminator and a second domain discriminator; the first domain discriminator can be trained to discriminate between source-augmented features and target features, and the second domain discriminator can be trained to discriminate between source features and a mixture of the source-augmented features and the target features; training the FTN can further include, at block 324, training a feature generator and a feature transformer based on the first and second domain discriminators; the feature generator can allow for joint optimization with domain adversarial loss via the first domain discriminator and domain separation loss via the second domain discriminator; feature generator training can include verification loss from both the source and the source-augmented features, domain separation loss via the second domain discriminator, and domain adversarial loss via the first domain discriminator; feature transformer training
can include verification loss and domain separation loss via the second domain discriminator; the objective function corresponding to the feature generator can be trained based in part on the objective function corresponding to verification loss associated with the feature generator, and the objective function corresponding to the feature transformer can be trained based in part on the objective function corresponding to verification loss associated with the feature transformer; the verification loss can be extended into N-pair distance metric loss for faster convergence and improved performance; training the feature generator and the feature transformer can further include training an objective function corresponding to feature reconstruction loss between features extracted from the feature generator and a reference network pretrained using labeled source data to stabilize the challenging adversarial training; e.g., the objective function corresponding to feature reconstruction loss can be trained based on representations of examples from the source and target domains using the reference network; training the feature generator and the feature transformer can further include training an objective function corresponding to multi-class entropy minimization loss; the objective function corresponding to multi-class entropy minimization loss can use labels (e.g., pseudo-labels) retrieved by clustering (e.g., from block 330 as described in further detail below); e.g., the objective function corresponding to multi-class entropy minimization loss can be trained based on positive examples of respective examples from the target domain; at block 330, automatic labeling of target domain data is trained using a clustering method; automatically labeling the target examples can include clustering of the target examples for providing pseudo-labels to automatically discover class structure for the target domain; clustering the target examples can include implementing a 
density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; clustering the target examples can include implementing hierarchical clustering; e.g., clustering the target examples can include implementing a hierarchical DBSCAN (HDBSCAN) method; the training of the feature transfer network and the automatic labeling can be recursive or iterative; more specifically, an output of the training at block 320 can be provided as input for training the automatic labeling at block 330, and an output of the training at block 330 can be provided as input for training the feature transfer network at block 320; at block 340, the feature transfer network and the automatic labeling can be implemented to perform a facial recognition task; introduces a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the system/method 300 of feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can include abundant unlabeled target data; examples of possible target domains in this illustrative example can include, but are not limited to, African-American, East-Asian, South-Asian, Hispanic, etc.)
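As a rough illustration of the pseudo-labeling step cited above (block 330), the sketch below implements a minimal DBSCAN-style density clustering over toy feature vectors in plain NumPy; the function name, parameter values, and toy data are this editor's assumptions, and the sketch stands in for, rather than reproduces, the trained-feature-representation HDBSCAN method Sohn describes:

```python
import numpy as np

def dbscan(X, eps=0.5, min_samples=3):
    """Minimal DBSCAN: returns one cluster label per point; -1 marks noise.
    The cluster ids play the role of pseudo-labels for unlabeled target data."""
    n = len(X)
    labels = np.full(n, -1)
    # pairwise Euclidean distances and eps-neighborhoods (each point is its own neighbor)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        # only an unvisited core point (dense enough neighborhood) seeds a cluster
        if visited[i] or len(neighbors[i]) < min_samples:
            continue
        visited[i] = True
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:          # claim unlabeled points for this cluster
                labels[j] = cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])  # expand through core points only
        cluster += 1
    return labels

# two tight blobs of target features plus one outlier
pts = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1],
                [10, 0]], dtype=float)
pseudo_labels = dbscan(pts, eps=0.5, min_samples=3)
# two clusters of pseudo-labeled points plus one noise point (-1)
```

In the cited framework these cluster assignments would feed the multi-class entropy minimization loss at block 320, with the improved features in turn refining the clustering at block 330.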
(Jawahar, ABSTRACT and ¶¶ [0006]-[0008] and [0019]-[0038]: a domain adaptation method for learning transferable feature representations from a source domain for a target domain; receiving real-time input data comprising a plurality of labeled instances of the source domain and a plurality of unlabeled instances of the target domain; learning common representation shared between the source domain and the target domain, based on the plurality of labeled instances of the source domain; labeling one or more unlabeled instances in the plurality of unlabeled instances of the target domain, based on the common representation; determining a target specific representation corresponding to the target domain, based on the one or more labeled instances of the target domain; training a target specific classifier based on the target specific representation and the common representation to perform automatic text classification on remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain; a "source domain" corresponds to a technical or business field for which a classifier is already trained; a "target domain" refers to a technical or business field for which a classifier is to be trained; a plurality of labeled instances (such as, a plurality of source text segments) from the source domain may be utilized to train the classifier to label/classify a plurality of unlabeled instances in the target domain; examples of the labeled instances may include labeled images, labeled text segments, and/or labeled audio segments associated with the source domain; examples of the plurality of unlabeled instances may include unlabeled images, unlabeled text segments, and/or unlabeled audio segments associated with the target domain; a "source specific representation" comprises a plurality of features that is specific to a source domain; e.g., the source specific representation may include an M-dimensional feature vector extracted from a plurality of 
labeled instances of the source domain; a "common representation" comprises a plurality of features that is shared between a source domain and a target domain; a "target specific representation" comprises a plurality of features that is specific to a target domain; e.g., the target specific representation may include a P-dimensional feature vector extracted from one or more labeled instances of the target domain; a "pseudo-label" corresponds to a label that is determined by a generalized classifier for one or more unlabeled instances of the target domain; ¶¶ [0060]-[0113] and [0115]-[0126] with FIGS. 3-4: extract a plurality of features from the plurality of labeled instances of the source domain; the extracted plurality of features may constitute a first hidden layer of the first neural network; learn the common representation and the source specific representation from the extracted plurality of features; partition the first hidden layer of the first neural network by learning the common representation and the source specific representation; partition the first hidden layer into the common representation and the source specific representation, such that a source classification error of the first neural network is minimized and further ensures that the domain divergence associated with the common representation in the partitioned first hidden layer is minimum; i.e., the learning of the common representation may be based on a minimization of the domain divergence between the source domain and the target domain; the common representation becomes domain independent by attaining minimum domain divergence; determine the plurality of common features that are present in both the plurality of labeled instances of the source domain and the plurality of unlabeled instances of the target domain, and have a TF-IDF score greater than the predefined threshold; determine the plurality of source specific features that are present in the plurality of labeled instances of the 
source domain only and have a TF-IDF score greater than a predefined threshold; for an accurate partitioning of the first hidden layer the domain regression processor may be configured to maintain a trade-off between the source classification error and the domain divergence; label the one or more unlabeled instances in the plurality of unlabeled instances of the target domain based on the learned common representation; use the trained generalized classifier to label the one or more unlabeled instances in the plurality of unlabeled instances to generate pseudo-labeled instances of the target domain; the trained generalized classifier may further determine a confidence score for each prediction of label; the instances that are labeled with a confidence score greater than a predefined threshold are referred to as pseudo-labeled instances of the target domain; label a plurality of unlabeled instances of the source domain, based on the learned source specific representation and the learned common representation; i.e., use the partitioned first hidden layer of the first neural network for labeling the plurality of unlabeled instances of the source domain; the source specific representation and the common representation contribute positively in the classification of the plurality of unlabeled instances of the source domain, such that the source classification error is minimum; after labeling of the unlabeled instances of the source domain, further determine the source specific representation and the common representation, respectively, from the instances that are newly labeled; further, based on the determination of the source specific representation and the common representation, the first hidden layer of the first neural network may be updated; the labeled instances of the source domain may correspond to a first output layer of the first neural network; the identification of the plurality of target specific features from the pseudo-labeled instances may be based on a 
positive contribution of the identified plurality of target specific features to the adaption processor configured to formulate a second neural network based on the pseudo-labeled instances of the target domain and the learned common representation; the pseudo-labeled instances of the target domain may constitute a second input layer of the second neural network and the learned common representation may constitute a second hidden layer of the second neural network; in the first iteration the target specific representation constituting the plurality of target specific features may be determined based on the pseudo-labeled instances of the target domain; add the extracted plurality of target specific features to the second hidden layer; further, re-train the generalized classifier based on the updated second hidden layer (i.e., the learned common representation and the target specific representation determined in the first iteration); thereafter, the re-trained generalized classifier may be further used to generate new pseudo-labeled instances from remaining one or more unlabeled instances of the target domain; for the second iteration, the new pseudo-labeled instances may constitute the second input layer of the second neural network and the plurality of new target specific features extracted from the new pseudo-labeled instances may be used to update the target specific representation; the update of the target specific representation may correspond to an addition of the plurality of new target specific features to the target specific representation; the update of the target specific representation may further correspond to a merging of the one or more target specific features to obtain a single feature based on a similarity score between the one or more target specific features; continue the iterative process of determining the target specific representation till the classification performance of the re-trained generalized classifier converges; a target specific 
classifier is trained based on the determined target specific representation and the learned common representation to perform automatic text classification of the remaining one or more unlabeled instances of the plurality of unlabeled instances of the target domain). Claim 8 Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses selecting the labeled data sets generated based on a first data set, from among a plurality of the labeled data sets generated by using the trained feature space; and executing an analysis related to accuracy of a classification model, by using the first data set and the selected labeled data sets (Sohn, ¶ [0013]: a series of training objectives can be introduced to train the feature transfer network, which can include feature reconstruction loss, classification loss, domain adversarial loss and domain separation loss; ¶¶ [0018]-[0023] and [0025]-[0044] with FIGS. 1-2; perform a verification task which can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images shared the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains 110 and 120, belong to the same class (it is not known which distribution the two random samples come from a priori); scenarios (1) and (2) can be referred to as intradomain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the discriminative power of the classifier can be transferred to the 
target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; to handle both the intra-domain and cross-domain verification scenarios, the FTN subsystem 102 can further include an FTN component 130 which can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the target domain 120 for automatic labeling; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; i.e., target images 170 can be provided to FTN for labeling by target domain classifier in FTN and perform analysis to determine/verify classification loss/accuracy in verification component 140) (Jawahar, ¶¶ [0074]-[0113] and [0120]-[0126] with FIGS. 3-4: segregate the first hidden layer 406 into the source specific representation 406A and the common representation 406B by attaining a minimum source classification error and a minimum domain divergence between the source domain and the target domain. 
The minimum source classification error may depict that if a classifier is trained based on the source specific representation 406A and the common representation 406B, collectively, the error in the classification of an unlabeled text segment of the source domain by the trained classifier is minimum; the minimum domain divergence may depict that if a classifier is trained based on the common representation 406B, the trained classifier may predict the association of an instance with the source domain and the target domain with equal likelihood; the common representation 406B may contribute positively to the domain regression 410 operation for attaining the minimum domain divergence between the source domain and the target domain; the source specific representation 406A may have penalized contribution to the domain regression 410 operation for attaining minimum domain divergence between the source domain and the target domain; the source specific representation 406A and the common representation 406B may further contribute positively to the classification of a plurality of unlabeled text segments of the source domain; use the common representation 406B of the first hidden layer 406 for training a generalized classifier; the formulated first neural network 402 may represent a generalization step for the classification of the plurality of unlabeled text segments of the target domain; formulate the second neural network 412 for the classification of the plurality of unlabeled text segments associated with the target domain; the formulation of the second neural network 412 may represent an adaptation step for the classification of the plurality of unlabeled text segments of the target domain; label one or more unlabeled text segments of the plurality of unlabeled text segments of the target domain by using the trained generalized classifier to generate the pseudo-labeled instances of the target domain; determine the second input layer 414 of the second neural network 412 
based on the pseudo-labeled instances of the target domain; the common representation 406B and the determined target specific representation 416A may constitute the second hidden layer 416 of the second neural network 412; re-train the generalized classifier based on the second hidden layer 416; the re-training of the generalized classifier and the determination of the target specific representation 416A may be an iterative process; in each iteration, determine new target specific features based on the pseudo-labeled instances of the target domain that were labeled in the previous iteration by the re-trained generalized classifier; the target specific representation 416A may be updated in each iteration based on the determined new target specific features in each iteration; terminate the iterative process based on a convergence of the classification performance of the re-trained generalized classifier; e.g., if a difference in the performance of the trained generalized classifier in the classification of the plurality of unlabeled text segments of the target domain in two consecutive iterations exceeds the performance threshold). Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Sohn in view of Jawahar as applied to Claim 1 above, and further in view of Warrier et al. (US 2019/0005115 A1, pub. date: 01/03/2019), hereinafter Warrier. Claim 4 Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses projecting the plurality of the pieces of the labeled data into the trained feature space, wherein the generating includes (Sohn, ¶¶ [0015]-[0023] with FIG. 
1: the system 100 can include a feature transfer network (FTN) subsystem 102 and an automatic labeling subsystem 104; a training framework can be achieved by recursively or iteratively training the FTN with respect to the FTN subsystem 102 and automatic labeling of data using the trained FTN with respect to the automatic labeling subsystem 104; the training of the FTN and the automatic labeling can be implemented within a neural network to perform facial recognition; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; the verification task performed can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images shared the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains 110 and 120, belong to the same class (it is not known which distribution the two random samples come from a priori); let the source domain 110 be denoted as Xs, the target domain 120 be denoted as XT, and the two random samples be denoted as x and x'; there are three scenarios of constructing a pair: (1) x, x'∈Xs; (2) x, x'∈XT; and (3) x∈Xs, x'∈XT; scenarios (1) and (2) can be referred to as intra-domain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the 
discriminative power of the classifier can be transferred to the target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; the FTN component 130 can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; the FTN component 130 can include a feature generation module and a feature transfer module; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the target domain 120 for automatic labeling; ¶¶ [0025]-[0044] with FIG. 2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 
2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2 , a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2 ); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions ( e.g., source or target); domain adversarial objective functions for domain alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source 
domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x'[Symbol font/0xCE]Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x'[Symbol font/0xCE]Xr, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2 , L D 2 ; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1 , L D 1 ; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training protocol of the feature generator f and the feature transformer g; as shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and 
246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 otherwise), σ refers to the sigmoid function and f(xi)T refers to the transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon 
can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering; ¶¶ [0045]-[0061] with FIG. 3: at block 310, a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data ( e.g., unlabeled examples); the source and target domains can be provided to perform a verification task; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains, belong to the same class (it is not known which distribution the two random samples come from a priori); at block 320, a feature transfer network (FTN) is trained based on the source domain and the target domain; training the FTN can include, at block 322 training a first domain discriminator and a second domain discriminator; the first domain discriminator can be trained to discriminate between source-augmented features and target features, and the second domain discriminator can be trained to discriminate between source features and a mixture of the source-augmented features and the target features; training the FTN can further include, at block 324, training a feature generator and a feature transformer based on the first and second domain discriminators; the feature generator can allow for joint optimization with domain adversarial loss via the first domain discriminator and domain separation loss via the second domain discriminator; feature generator training can include verification loss from both the source and the source-augmented features, domain separation loss via the second domain discriminator, and domain adversarial loss via the first domain discriminator; feature transformer training 
can include verification loss and domain separation loss via the second domain discriminator; the objective function corresponding to the feature generator can be trained based in part on the objective function corresponding to verification loss associated with the feature generator, and the objective function corresponding to the feature transformer can be trained based in part on the objective function corresponding to verification loss associated with the feature transformer; the verification loss can be extended into N-pair distance metric loss for faster convergence and improved performance; training the feature generator and the feature transformer can further include training an objective function corresponding to feature reconstruction loss between features extracted from the feature generator and a reference network pretrained using labeled source data to stabilize the challenging adversarial training; e.g., the objective function corresponding to feature reconstruction loss can be trained based on representations of examples from the source and target domains using the reference network; training the feature generator and the feature transformer can further include training an objective function corresponding to multi-class entropy minimization loss; the objective function corresponding to multi-class entropy minimization loss can use labels (e.g., pseudo-labels) retrieved by clustering (e.g., from block 330 as described in further detail below); e.g., the objective function corresponding to multi-class entropy minimization loss can be trained based on positive examples of respective examples from the target domain; at block 330, automatic labeling of target domain data is trained using a clustering method; automatically labeling the target examples can include clustering of the target examples for providing pseudo-labels to automatically discover class structure for the target domain; clustering the target examples can include implementing a 
density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; clustering the target examples can include implementing hierarchical clustering; e.g., clustering the target examples can include implementing a hierarchical DBSCAN (HDBSCAN) method; the training of the feature transfer network and the automatic labeling can be recursive or iterative; more specifically, an output of the training at block 320 can be provided as input for training the automatic labeling at block 330, and an output of the training at block 330 can be provided as input for training the feature transfer network at block 320; at block 340, the feature transfer network and the automatic labeling can be implemented to perform a facial recognition task; introduces a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the system/method 300 of feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can include abundant unlabeled target data; examples of possible target domains in this illustrative example can include, but are not limited to, African-American, EastAsian, South-Asian, Hispanic, etc.). 
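Sohn's automatic labeling (section 250 / block 330) derives pseudo-labels by applying DBSCAN clustering to target-domain feature embeddings. As an illustrative sketch only, not Sohn's implementation (the 2-D embedding coordinates and the eps and min_pts values below are hypothetical), the core DBSCAN rule, that each cluster point's eps-neighborhood must contain at least a minimum number of points, can be expressed as:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: assign each point a cluster id, or -1 for noise.

    A point is a core point when its eps-neighborhood (including itself)
    holds at least min_pts points; clusters grow outward from core points.
    """
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # noise; may later join a cluster as a border point
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:     # noise absorbed as a border point, not expanded
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # core point: expand the cluster
                seeds.extend(j_nbrs)
        cluster += 1
    return labels

# Hypothetical 2-D "embeddings": two dense groups plus one isolated point.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
              (10.0, 0.0)]
pseudo_labels = dbscan(embeddings, eps=0.5, min_pts=2)
# pseudo_labels → [0, 0, 0, 1, 1, 1, -1]; cluster ids become pseudo-labels,
# and the -1 point would remain unlabeled, mirroring the idea that only
# sufficiently dense target examples receive pseudo-labels.
```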
Sohn in view of Jawahar fails to explicitly disclose wherein the generating includes selecting an arbitrary point from the trained feature space in which the plurality of the pieces of the labeled data is projected, and generating the labeled data sets obtained by integrating a predetermined number of the pieces of the labeled data located within a predetermined distance from the arbitrary point. Warrier teaches a system and a method relating to machine learning techniques (Warrier, ¶¶ [0002] and [0007]), wherein the generating includes selecting an arbitrary point from the trained feature space in which the plurality of the pieces of the labeled data is projected, and generating the labeled data sets obtained by integrating a predetermined number of the pieces of the labeled data located within a predetermined distance from the arbitrary point (Warrier, ¶ [0006]: selecting a subset of the data points to create a set of selected data points, the selection being based on each node of the plurality of nodes, whereby if there is only one data point that is a member of a particular node, then the one data point is selected to be a member of the set of selected data points and whereby if there are two or more data points that are a member of the particular node, then proportional number of data points relative to all data points that are members of that particular node are selected to be members of the set of selected data points, for each selected data point of the set of selected data points, determining a predetermined number of other data points of the set of selected data points that are closest in distance to that particular selected data point, the distance being determined based on a metric function between a vector of each data point, grouping the selected data points into a plurality of groups based, at least in part, on the predetermined number of other data points of the set of selected data points that are closest in distance, each group of the plurality of 
groups including a different subset of data points, and providing a list of selected data points and the plurality of groups; ¶¶ [0573]-[0595] with FIG. 40: in step 4002, receive data S; the received data may include data points and features (e.g., dimensions); in step 4004, optionally generate reference space R; in step 4006, generate a map ref( ) from S into R; in step 4008, generate a cover of R based on the resolution received from the user; in step 4010, the analysis module 320 clusters each S(d) based on the metric function(s), filter function(s), and the space S; in step 4012, identify nodes which are associated with a subset of the partition elements of all of the S(d) for generating a network graph; in step 4014, select data points using nodes identified in steps 4002-4012; in step 4016, optionally determine nearest neighbors of the selected data points (from step 4014) to identify labels, groups, and/or features for each of the selected data points; identify nearest neighbors of the selected data points; for each data point of the selected data points, identify a predetermined number of data points (from the selected data points) that are closest to that particular data point; e.g., for each data point of the selected data points, identify "k" data points of the selected data points that are closest to that particular data point; as a result, for each data point of the selected data points, identify those data points of the selected data points that are closest to that data point relative to all other data points of the set of selected data points; identify group(s), outcome(s), or any label(s) for any number of the data points of the set of selected data points based on the group(s), outcome(s), or label(s) of each data point's nearest neighbors). Sohn in view of Jawahar, and Warrier are analogous art because they are from the same field of endeavor, a system and a method relating to machine learning techniques. 
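The nearest-neighbor step Warrier describes in ¶ [0006] and step 4016 (for each selected data point, find a predetermined number "k" of the closest other selected points under a metric function, then derive groups or labels from those neighbors) can be sketched as follows. This is an illustrative reading only; the point coordinates, seed labels, Euclidean metric choice, and k value are hypothetical, not Warrier's data:

```python
import math
from collections import Counter

def nearest_neighbors(points, k):
    """For each point, the indices of the k other points closest to it
    under the Euclidean metric (standing in for Warrier's metric function)."""
    out = []
    for i, p in enumerate(points):
        ranked = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        out.append(ranked[:k])
    return out

def label_by_neighbors(points, seed_labels, k):
    """Propagate group labels: each unlabeled point takes the majority
    label among whichever of its k nearest neighbors carry a label."""
    nbrs = nearest_neighbors(points, k)
    labels = dict(seed_labels)
    for i in range(len(points)):
        if i in labels:
            continue
        votes = [labels[j] for j in nbrs[i] if j in labels]
        if votes:
            labels[i] = Counter(votes).most_common(1)[0][0]
    return labels

# Hypothetical selected data points with two labeled seed points.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
groups = label_by_neighbors(pts, {0: "a", 3: "b"}, k=1)
# Each unlabeled point inherits the label of its single nearest neighbor,
# grouping the three points near the origin as "a" and the far pair as "b".
```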
Also, it is well-known in the art that the key idea of DBSCAN1 used in Sohn is that for each point of a cluster the neighborhood of a given radius has to contain at least a minimum number of points, i.e., the density in the neighborhood has to exceed some threshold, and Warrier's clustering technique is similar to HDBSCAN2 used in Sohn. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Warrier to Sohn in view of Jawahar. Motivation for doing so would be to improve grouping/clustering of data points (Warrier, ¶ [0002]). Claim 5 Sohn in view of Jawahar discloses all the elements as stated in Claim 1 and further discloses projecting the plurality of the pieces of the labeled data into the trained feature space, wherein the generating includes (Sohn, ¶¶ [0015]-[0023] with FIG. 1: the system 100 can include a feature transfer network (FTN) subsystem 102 and an automatic labeling subsystem 104; a training framework can be achieved by recursively or iteratively training the FTN with respect to the FTN subsystem 102 and automatic labeling of data using the trained FTN with respect to the automatic labeling subsystem 104; the training of the FTN and the automatic labeling can be implemented within a neural network to perform facial recognition; the FTN subsystem 102 can include a source domain 110 and a target domain 120; the source domain 110 can be a labeled domain including labeled examples, while the target domain 120 can be an unlabeled domain including unlabeled examples; the verification task performed can include a binary classification task shared across the source and target domains 110 and 120 that takes a pair of images as an input and predicts a label of "1" if the pair of images shared the same identity, and predicts a label of "0" otherwise; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target 
domains 110 and 120, belong to the same class (it is not known which distribution the two random samples come from a priori); let the source domain 110 be denoted as Xs, the target domain 120 be denoted as XT, and the two random samples be denoted as x and x'; there are three scenarios of constructing a pair: (1) x, x'∈Xs; (2) x, x'∈XT; and (3) x∈Xs, x'∈XT; scenarios (1) and (2) can be referred to as intra-domain verifications, while the scenario (3) can be referred to as a cross-domain (or inter-domain) verification; for intra-domain verification scenarios, a source (or target) domain classifier may be needed; for the source domain 110, adequately labeled training examples can be provided to learn a competent classifier; for the target domain 120, only unlabeled training examples are provided; however, the discriminative power of the classifier can be transferred to the target domain 120 by adapting the representation spaces of XT×XT and Xs×Xs; i.e., the same competent classifier from the source domain 110 can be used to verify target domain pairs if the two domains are well-aligned; for the cross-domain verification scenario, it can be assumed that the two samples x and x' cannot be of the same class, which is true for problems such as, e.g., cross-ethnicity facial verification problems; the FTN component 130 can separate target features of the target domain 120 from source features of the source domain 110 while simultaneously aligning the features with an auxiliary domain of transformed source features; the FTN component 130 can include a feature generation module and a feature transfer module; an output of the FTN component 130 is received as input into each of a verification component 140, an entropy minimization component 150 and a domain discriminator 160; in the automatic labeling subsystem 104, target images 170 can be provided as input into the target domain 120 for automatic labeling; 
¶¶ [0025]-[0044] with FIG. 2: an overall training framework by recursively or iteratively training a feature transfer network (FTN) and automatic labeling of data using the trained FTN is illustrated in FIG. 2 to implement an unsupervised cross-domain distance metric adaptation framework with a feature transfer network for enhancing facial recognition; a section 210 corresponding to a training protocol of domain discriminators D1 and D2, a section 230 corresponding to training of a feature generator f and a feature transformer g, and a section 250 corresponding to an automatic labeling protocol; to perform the automatic labeling, section 250 can implement clustering of target examples for providing pseudo-labels; section 250 can implement a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; sections 210 and 230 collectively correspond to an iterative training of the FTN between the discriminators, the feature generator f and the feature transformer g using unlabeled source and target examples; the feature transformer g can allow joint optimization with domain adversarial loss (via D1) and domain separation loss (via D2); the feature generator f, which is represented as f: X→Z, can map Xs and XT to distinguishable representation spaces f(Xs) and f(XT); a domain separation objective function (e.g., loss function), Lsep, can be used to achieve this separation, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); the feature transformer g, which can be represented as g: Z→Z, can transform f(Xs) to g(f(Xs)) for alignment with f(XT); a domain adversarial objective function, Ladv, can be used to achieve the alignment, where the term "domain separation" indicates that the representation space can be separated with respect to domain definitions (e.g., source or target); domain adversarial 
objective functions for domain alignment can be applied between transformed source and target domains by D1 and can apply Lsep to distinguish the source domain from both the target domain and the transformed source domains by D2; verification objective functions can be applied to source pairs f(Xs) and transformed source pairs g(f(Xs)) using classifiers; e.g., the classifiers can include classifiers hf, hg: Z×Z→{0, 1}; during testing, the metric distance between f(x) and f(x') can be compared; the following desired capabilities can be achieved: (a) if x and x' are from different domains, f(x) and f(x') will be far away due to the functionality of the feature generation module; (b) if x, x' ∈ Xs, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hf, and (c) if x, x' ∈ XT, then f(x) and f(x') will be close if they belong to the same class and far away otherwise, due to the discriminative power acquired from optimizing hg; section 210 is shown including sub-sections 212 through 220 for training the domain discriminators D1 and D2; the discriminator D2 is trained to discriminate between source features and the mixture of source-augmented features and target features; the discriminator D1 is trained to discriminate between source-augmented features and target features; sub-sections 212, 214 and 216 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D2, LD2; sub-sections 212 and 214 correspond to the source domain, while sub-section 216 corresponds to the target domain; sub-sections 218 and 220 generate outputs for training an objective function (e.g., loss function) corresponding to the discriminator D1, LD1; sub-section 218 corresponds to the source domain and sub-section 220 corresponds to the target domain; section 230 provides a training protocol of the feature generator f 
and the feature transformer g; as shown, section 230 can include a plurality of sub-sections 232 through 246, where sub-sections 232, 234, 236, 242 and 244 correspond to the source domain and sub-sections 238, 240 and 246 correspond to the target domain; section 230 can train an objective function (e.g., loss function) corresponding to the feature transformer g, Lg, an objective function (e.g., loss function) corresponding to the feature generator f, Lf, an objective function (e.g., loss function) corresponding to feature reconstruction loss between features extracted from the feature generator (f) and the reference network (ref), Lrecon, and an objective function (e.g., loss function) corresponding to multi-class entropy minimization loss, Lentropy; for purposes of training the objective functions in section 230, sub-section 234 can generate a same or similar output as sub-section 212, sub-section 236 can generate a same or similar output as sub-section 214, and sub-section 238 can generate a same or similar output as sub-section 216; Lg can be trained by verification loss and domain separation loss via the discriminator D2; Lvrf refers to an objective function (e.g., loss function) corresponding to verification loss between labeled pairs of images and λ2 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D2; a "hyper-parameter" refers to a regularization coefficient (e.g., non-negative real number) that can be defined by a user; Lf can be trained by verification loss from both source and source-augmented features, domain separation loss via the discriminator D2, and domain adversarial loss via D1; λ1 refers to a hyper-parameter that balances the verification loss and the adversarial loss with respect to D1; yij refers to the ground truth verification label of examples xi and xj (e.g., yij=1 if xi and xj represent a same feature (e.g., same face of a person) and yij=0 otherwise), σ refers to the sigmoid 
function and f(xi)T refers to the transpose of the vector f(xi); Lvrf(f) can be extended into an N-pair distance metric loss for faster convergence and improved performance; Lvrf(g) can be trained in a similar manner; Lrecon can correspond to feature reconstruction loss between features extracted from the feature generator f and a reference network (ref) pretrained using labeled source data to stabilize the challenging adversarial training; Lentropy can use pseudo-labels retrieved by hierarchical clustering; ¶¶ [0045]-[0061] with FIG. 3: at block 310, a source domain and a target domain are obtained; the source domain can be associated with labeled source data (e.g., labeled source examples), while the target domain can be associated with unlabeled target data (e.g., unlabeled examples); the source and target domains can be provided to perform a verification task; the goal of the verification task is to verify whether two random samples, drawn from either of the source and target domains, belong to the same class (it is not known which distribution the two random samples come from a priori); at block 320, a feature transfer network (FTN) is trained based on the source domain and the target domain; training the FTN can include, at block 322, training a first domain discriminator and a second domain discriminator; the first domain discriminator can be trained to discriminate between source-augmented features and target features, and the second domain discriminator can be trained to discriminate between source features and a mixture of the source-augmented features and the target features; training the FTN can further include, at block 324, training a feature generator and a feature transformer based on the first and second domain discriminators; the feature generator can allow for joint optimization with domain adversarial loss via the first domain discriminator and domain separation loss via the second domain discriminator; feature generator training can include 
verification loss from both the source and the source-augmented features, domain separation loss via the second domain discriminator, and domain adversarial loss via the first domain discriminator; feature transformer training can include verification loss and domain separation loss via the second domain discriminator; the objective function corresponding to the feature generator can be trained based in part on the objective function corresponding to verification loss associated with the feature generator, and the objective function corresponding to the feature transformer can be trained based in part on the objective function corresponding to verification loss associated with the feature transformer; the verification loss can be extended into N-pair distance metric loss for faster convergence and improved performance; training the feature generator and the feature transformer can further include training an objective function corresponding to feature reconstruction loss between features extracted from the feature generator and a reference network pretrained using labeled source data to stabilize the challenging adversarial training; e.g., the objective function corresponding to feature reconstruction loss can be trained based on representations of examples from the source and target domains using the reference network; training the feature generator and the feature transformer can further include training an objective function corresponding to multi-class entropy minimization loss; the objective function corresponding to multi-class entropy minimization loss can use labels (e.g., pseudo-labels) retrieved by clustering (e.g., from block 330 as described in further detail below); e.g., the objective function corresponding to multi-class entropy minimization loss can be trained based on positive examples of respective examples from the target domain; at block 330, automatic labeling of target domain data is trained using a clustering method; automatically labeling 
the target examples can include clustering of the target examples for providing pseudo-labels to automatically discover class structure for the target domain; clustering the target examples can include implementing a density-based spatial clustering of applications with noise (DBSCAN) method with trained feature network feature representation; clustering the target examples can include implementing hierarchical clustering; e.g., clustering the target examples can include implementing a hierarchical DBSCAN (HDBSCAN) method; the training of the feature transfer network and the automatic labeling can be recursive or iterative; more specifically, an output of the training at block 320 can be provided as input for training the automatic labeling at block 330, and an output of the training at block 330 can be provided as input for training the feature transfer network at block 320; at block 340, the feature transfer network and the automatic labeling can be implemented to perform a facial recognition task; introduces a source-augmented embedding space via a feature transformer, which allows for a unified learning framework of domain adversarial and domain separation for performing a facial recognition task using labeled source data and unlabeled target data; the recursive or iterative training framework of the system/method 300 of feature transfer network learning and automatic class structure discovery can allow for fast and accurate labeling of unlabeled target data and improved quality of feature representation; globalize a facial analysis system by adapting the facial analysis system to one or more new target domains based on information from one or more source domains; illustratively, in the context of domains associated with ethnicities, a source ethnicity domain (e.g., Caucasian source domain) can include abundant labeled training data, while at least one target ethnicity domain (e.g., non-Caucasian target domain) can include abundant unlabeled target data; 
examples of possible target domains in this illustrative example can include, but are not limited to, African-American, East-Asian, South-Asian, Hispanic, etc.). Warrier teaches a system and a method relating to machine learning techniques (Warrier, ¶¶ [0002] and [0007]), wherein the generating includes selecting a plurality of points that are arbitrary from the trained feature space in which the plurality of the pieces of the labeled data is projected, and generating each of the labeled data sets that correspond to each of the plurality of points, by acquiring and integrating a predetermined number of the pieces of the labeled data located within a predetermined distance from the selected points, for each of the plurality of points (Warrier, ¶ [0006]: selecting a subset of the data points to create a set of selected data points, the selection being based on each node of the plurality of nodes, whereby if there is only one data point that is a member of a particular node, then the one data point is selected to be a member of the set of selected data points and whereby if there are two or more data points that are a member of the particular node, then a proportional number of data points relative to all data points that are members of that particular node are selected to be members of the set of selected data points, for each selected data point of the set of selected data points, determining a predetermined number of other data points of the set of selected data points that are closest in distance to that particular selected data point, the distance being determined based on a metric function between a vector of each data point, grouping the selected data points into a plurality of groups based, at least in part, on the predetermined number of other data points of the set of selected data points that are closest in distance, each group of the plurality of groups including a different subset of data points, and providing a list of selected data points and the plurality 
of groups; ¶¶ [0573]-[0595] with FIG. 40: in step 4002, receive data S; the received data may include data points and features (e.g., dimensions); in step 4004, optionally generate reference space R; in step 4006, generate a map ref( ) from S into R; in step 4008, generate a cover of R based on the resolution received from the user; in step 4010, the analysis module 320 clusters each S(d) based on the metric function(s), filter function(s), and the space S; in step 4012, identify nodes which are associated with a subset of the partition elements of all of the S(d) for generating a network graph; in step 4014, select data points using nodes identified in steps 4002-4012; in step 4016, optionally determine nearest neighbors of the selected data points (from step 4014) to identify labels, groups, and/or features for each of the selected data points; identify nearest neighbors of the selected data points; for each data point of the selected data points, identify a predetermined number of data points (from the selected data points) that are closest to that particular data point; e.g., for each data point of the selected data points, identify "k" data points of the selected data points that are closest to that particular data point; as a result, for each data point of the selected data points, identify those data points of the selected data points that are closest to that data point relative to all other data points of the set of selected data points; identify group(s), outcome(s), or any label(s) for any number of the data points of the set of selected data points based on the group(s), outcome(s), or label(s) of each data point's nearest neighbors). Sohn and Warrier are analogous art because they are from the same field of endeavor, a system and a method relating to machine learning techniques. 
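For context on the density criterion the examiner attributes to DBSCAN, the core test can be sketched in a few lines. This is an illustrative sketch only, not code from any cited reference; the function name, data points, and parameter values below are hypothetical. A point belongs to the core of a cluster only if its neighborhood of radius eps contains at least min_samples points, i.e., the local density exceeds the threshold.

```python
# Illustrative sketch of DBSCAN's core-point (density) criterion.
# All names, data, and parameter values here are hypothetical examples.
import math

def is_core_point(point, data, eps=1.0, min_samples=3):
    """True if at least `min_samples` points of `data` (including `point`
    itself) lie within radius `eps` of `point`."""
    neighbors = [q for q in data if math.dist(point, q) <= eps]
    return len(neighbors) >= min_samples

# A dense group near the origin and one isolated outlier.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0)]
print(is_core_point(points[0], points))   # dense neighborhood -> True
print(is_core_point(points[3], points))   # isolated point -> False
```

HDBSCAN, also cited in Sohn, applies the same density idea hierarchically, extracting clusters across a range of eps values rather than at a single fixed radius.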
Also, it is well-known in the art that the key idea of DBSCAN used in Sohn is that for each point of a cluster the neighborhood of a given radius has to contain at least a minimum number of points, i.e., the density in the neighborhood has to exceed some threshold, and Warrier's clustering technique is similar to HDBSCAN used in Sohn. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teaching of Warrier to Sohn in view of Jawahar. The motivation for doing so would be to improve the grouping/clustering of data points (Warrier, ¶ [0002]). Allowable Subject Matter Claim 7 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: . Claim 7 Sohn, Jawahar, and Warrier disclose all the elements as stated in Claims 1-6 and 8-10. YE et al. (US 2023/0063148 A1, priority date of 04/18/2020) discloses in ABSTRACT and ¶¶ [0007]-[0046], [0082], [0109]-[0111], [0114]-[202] with FIGS. 
1-8 that (1) obtaining to-be-processed data (301), where the to-be-processed data includes unlabeled data from a target domain and labeled data from a source domain; (2) obtaining a plurality of data segments of each dimension of data in the to-be-processed data (302), where the plurality of data segments are not the same; (3) training a transfer model based on the plurality of data segments, to obtain a trained transfer model (303); (4) segment division is performed on the to-be-processed data, to obtain both an overall feature of the to-be-processed data and a local feature hidden between the data segments of the to-be-processed data, which can improve accuracy of training data of the transfer model, and further improve accuracy of the transfer model; (5) when the model is trained based on the features, impact caused by actual data misalignment (that is, data is not aligned) can be eliminated, and accuracy of the learned new model can be improved; (6) alignment may be understood as consistency; e.g., two groups of data are obtained in a same time period; (7) that data is not aligned means that the two groups of data are explicitly consistent, but a deviation actually exists; e.g., although the two groups of data are obtained in a same time period, data change trends of the two groups of data are inconsistent; i.e., start points of data changes are different; (8) the to-be-processed data includes one or more dimensions of to-be-processed data means that the to-be-processed data may include data of one or more parameters, or may be understood as data of one or more dimensions, or data of one or more categories; (9) the plurality of data segments obtained through division may have an intersection set, or may not have an intersection set, wherein having an intersection set may also be understood as data overlapping, and data overlapping may be understood as that different data segments have same data; (10) there is usually no intersection set between data segments of 
different dimensions in the to-be-processed data, and data segments with an intersection set are usually data segments of a same dimension in the to-be-processed data; (11) the to-be-processed data is divided into the plurality of data segments that are not the same, to determine impact of different data segments on a result, or find a data segment that has large impact on a result; (12) the to-be-divided time series data may not be sequentially divided; e.g., the M segments may all include the data at the moment t, which is equivalent to obtaining data segments from different start time points to the current moment; (13) during subsequent feature extraction, the foregoing division manner may be used to find a key time point (namely, a key start time point), for example, a moment at which a trend of data starts to change in a time series data curve; e.g., finding a moment at which a curve trend changes in FIG. 1, finding a moment that has the greatest impact on the current moment, and the like; (14) the labeled data of the source domain and the unlabeled data (for example, the plurality of data segments) of the target domain may be used to train a model applicable to the source domain, a model applicable to the target domain, or a model applicable to both the source domain and the target domain; (15) a start point (before training) of the transfer model may be a pre-trained model, but does not need to be a trained model applicable to the source domain; (16) a difference between the source domain and the target domain can be minimized, so that performance of the trained model in the target domain is also great, so as to transfer the trained model to the target domain; (17) after the feature vector/matrix of each dimension of to-be-processed data is obtained, a dependency (or may be understood as a correlation) between feature vectors/matrices of the to-be-processed data may be obtained by using the feature vectors/matrices, which may be understood as obtaining a 
dependency between different data segments of data of a same dimension; (18) the feature vectors/matrices may also be used to obtain a dependency (or a correlation or an impact weight) between a feature vector/matrix of a dimension of to-be-processed data and a feature vector/matrix of another dimension of to-be-processed data, which may also be understood as obtaining a dependency (or a correlation or an impact weight) between data segments of different dimensions; (19) when the transfer model is trained based on the plurality of data segments, a first structure feature between data segments of a same dimension in the plurality of data segments may be obtained, and then the transfer model is trained based on the first structure feature and another feature (including the overall feature and the local feature) extracted from the data segments; (20) the first structure feature may be understood as a correlation (an association relationship) between data segments of data of a same dimension; (21) the first structure feature may be determined based on a dependency between the data segments of a same dimension; (22) when the transfer model is trained based on the plurality of data segments, a second structure feature between data segments of different dimensions in the plurality of data segments may further be obtained, and then the transfer model is trained based on the second structure feature and another feature (including the overall feature and the local feature) extracted from the data segments; (23) the second structure feature may be understood as a correlation (an association relationship) between data segments of data of different dimensions; (24) the second structure feature may be determined based on a dependency between the data segments of different dimensions; (25) both the first structure feature and the second structure feature are obtained, and both the first structure feature and the second structure feature are used for training; (26) in this manner, 
the local feature of the to-be-processed data can further be fully extracted, which can improve accuracy of the model obtained through training; (27) when the transfer model is trained based on the plurality of data segments, a loss function may be established based on a plurality of data segments in the source domain and a plurality of data segments in the target domain, and the transfer model is obtained through establishing and minimizing the loss function of the source domain and the target domain; and (28) a loss function of a label may further be obtained through combining structure extraction and structure alignment, and using a structure matrix of a sample as an input of a label predictor. KIM et al. (US 2020/0321118 A1) discloses in ABSTRACT and [0002]-[0012] that (1) performing domain adaptation between a source domain and target domain while improving model performance of the target domain by adversarial learning and reducing cost of model construction; (2) performing domain adaptation based on adversarial learning by using neural network that includes a class-specific discriminator; (3) utilizing discriminators that correspond to each of multiple classes included in domain and allow the neural network to learn better representation for performing target task; (4) improving performance of domain adaptation based on adversarial learning by adjusting learning based on inverted label of the discriminators; (5) extracting feature data from multiple data sets; (6) training a first discriminator discriminating a domain for a first class using first feature data extracted from a first data set corresponding to a first class of a first domain among the multiple data sets; (7) training the first discriminator using second feature data extracted from a second data set corresponding to the first class of a second domain among the multiple data sets; (8) training a second discriminator discriminating a domain for a second class using third feature data extracted 
from a third data set that corresponds to a second class of the first domain among the multiple data sets; and (9) training the second discriminator using fourth feature data extracted from a fourth data set that corresponds to the second class of the second domain among the multiple data sets. KIM'118 further discloses in ¶¶ [0039]-[0045] with FIGS. 1-2 that (1) the first data set (12) can be a data set comprised of multiple training samples (e.g. Data1) that belong to the first domain, and the second data set (13) can be a data set comprised of multiple training samples (e.g. Data2) that belong to the second domain different from the first domain; (2) construct neural network that can be utilized with the first domain and second domain using domain adaptation; (3) the first data set (12) that belongs to the first domain can include a data set (12-1) classified as first class and a data set (12-2) classified as second class; (4) the second data set (13) that belongs to the second domain can include a data set (13-1) classified as the first class and a data set (13-2) classified as the second class; (5) the first discriminator (16) can correspond to the first class, and the second discriminator (17) can correspond to the second class; i.e., each discriminator (16, 17) can be a class-specific discriminator; (6) the first discriminator (16) can be trained using the data set (12-1) that corresponds to the first class of the first domain and the data set (13-1) that corresponds to the first class of the second domain; (7) the second discriminator (17) can be trained using the data set (12-2) that corresponds to the second class of the first domain and the data set (13-2) that corresponds to the second class of the second domain; (8) the output layer (15) can be trained to execute target tasks such as classification using all data sets (12, 13) that belong to the first domain and second domain; (9) since the feature extraction layer (14) must extract common features of 
the two domains, it can be trained using all data sets (12, 13) of the first domain and second domain; (10) adversarial learning can be performed between the feature extraction layer (14) and each discriminator (16, 17); i.e., the discriminator (16, 17) can be trained to discriminate domains well, and the feature extraction layer (14) can be trained to not discriminate domains well. KIM'118 also discloses in ¶¶ [0054]-[0091] with FIGS. 3-9 that (1) determine whether the first data set (or third data set) and second data set (or fourth data set) include different data forms; (2) if data forms are different, each data included in the second data set (or fourth data set) can be adjusted (or transformed) to have same input form as the first data set (or third data set); (3) it is decided whether the adjusted data satisfy conditions, wherein the conditions refer to criteria for determining suitability of the adjusted data as sample data for learning; (4) when domain adaptation is performed based on adversarial learning in neural network illustrated in FIG. 5, regardless of class of data sets, the first discriminator (33) will be trained to discriminate domains well and the feature extraction layer (31) will be trained not to discriminate domains; (5) as illustrated in FIG. 6, if one discriminator is used, data sets (41 through 44) that belong to the two domains can be crowded in feature space regardless of class; i.e., as a result of training the feature extraction layer (31) so as to minimize difference among different domains regardless of class, distance between different classes can be reduced to have data sets of different classes (e.g. 41 and 42) mixed in crowded area (46); (6) in this case, since reference line (45) that classifies classes cannot clearly discriminate data sets of different classes (e.g. 
41 and 42), accuracy of the task is lowered; (7) to solve the problem of lowered accuracy of the task, multiple discriminators specialized in each class can be included in neural network, wherein each discriminator can correspond to one class and discriminate domains, but a certain discriminator may correspond to one or more classes; (8) feature data of acquired data set are extracted through feature extraction layer of neural network; (9) the first discriminator and feature extraction layer are trained using feature data that correspond to first class; (10) the first discriminator can refer to the domain discriminator in charge of the first class; (11) adversarial learning can be performed between the feature extraction layer and first discriminator; (12) the feature extraction layer can be trained using errors based on inverted label, wherein the inverted label can refer to a label that inverts the ground truth domain label; (13) more specifically, domain prediction value for feature data that corresponds to the first class can be acquired by the first discriminator; (14) the domain prediction value can refer to probability value of each domain (e.g. 
confidence score of each domain) indicating the domain to which the data set with extracted feature data belongs; (15) errors can be calculated based on difference between the domain prediction value and inverted label, and weight value of the feature extraction layer can be updated by back propagation of the errors; (16) the weight value of the first discriminator is not updated by back propagation of the errors, which is because the first discriminator must be trained to discriminate domains well; (17) the domain prediction value of the first discriminator can be inverted, and errors can be calculated based on difference between the inverted prediction value and ground truth domain label; (18) errors can be calculated between the domain prediction value of the first discriminator and ground truth domain label, and gradient of the calculated errors can be inverted; i.e., the weight value of the feature extraction layer can be updated based on the inverted gradient; (19) second discriminator and feature extraction layer are trained using feature data that correspond to second class, wherein the second discriminator can refer to the domain discriminator in charge of the second class and the feature extraction layer is also trained using feature data that correspond to the second class; (20) adversarial learning can be performed between the feature extraction layer and second discriminator; (21) the output layer is trained, wherein the output layer is a layer trained to execute the target task (that is, task-specific layer) and outputs probability that the input data set belongs to each class (e.g. 
confidence score of each class); (22) errors about the output prediction value from the output layer (that is, difference between the prediction value and ground truth label) can be calculated, and the weight value of the output layer can be updated by back propagation of the errors calculated; (23) here, the weight value of the feature extraction layer can be updated at the same time; (24) learning process associated with the first domain and learning process associated with the second domain can be executed at the same time; (25) the neural network can include a feature extraction layer (51), output layer, first discriminator (53) specific for first class, and second discriminator (54) specific for second class; (26) if domain adaptation based on adversarial learning is performed using a class-specific discriminator, distance between data sets of the same class (61/63 or 62/64) can become closer in feature space, and distance between data sets of different classes (61/62 or 63/64) can become further; (27) this is because difference between domains can be minimized for each class by performing independent adversarial learning on each class using the class-specific discriminator; (28) the neural network can learn optimal representation that executes the task for datasets of different domains with high accuracy; (29) the neural network that executes the task with high accuracy both in the source domain and target domain can be constructed by performing adversarial learning for each class using the class-specific discriminator; (30) therefore, cost of model construction in the target domain can be largely reduced; and (31) improve prediction performance of the neural network in a domain that cannot easily secure data (e.g. DBT domain), and the prediction performance can be improved further if the two domains have high similarity. KIM'118 further teaches in ¶¶ [0094]-[0107] with FIGS. 
10-12 that (1) data sets are acquired and feature data of the acquired data sets are extracted; (2) each discriminator is trained using feature data that correspond to each class; (3) when learning accuracy of each discriminator is greater than threshold value, the feature extraction layer and output layer are trained; (4) adversarial learning can be performed between the feature extraction layer and each of the discriminators; i.e., unlike the discriminators, the feature extraction layer can be trained not to discriminate domains; (5) adversarial learning of the feature extraction layer can be controlled based on learning accuracy of the output layer (that is, accuracy of the task); e.g., (a) if learning accuracy of the output layer is greater than (or is greater than or equal to) the threshold value, adversarial learning can be controlled to be continued (or resumed) on the feature extraction layer; and (b) if learning accuracy of the output layer is below (or less than or equal to) the threshold value, learning of the feature extraction layer can be controlled to stop; (6) this is because low learning accuracy of the output layer indicates closer distance between data sets of different classes in feature space; (7) in this case, to increase the distance between data sets of different classes, adversarial learning can be stopped and learning based on prediction errors of the output layer can be performed on the feature extraction layer; (8) if distance (d1) between data sets of different classes (72, 73) increases and distance (d3) between data sets of the same class (e.g.
72, 74) decreases, performance improvement effect of domain adaptation can be improved further; (9) the first class and second class can be discriminated more clearly as the distance (d1) increases, and discrimination of the first domain and second domain becomes more difficult as the distance (d3) decreases; (10) if adversarial learning using the discriminator is performed to further decrease the distance (d3), there can be a problem in which the distance (d1) also decreases in feature space depending on the case; (11) therefore, it is necessary to monitor the distance (d1) and control overall learning so that the distance (d1) can be increased again if necessary; (12) more specifically, learning accuracy of the output layer (that is, the performance evaluation result) can be used as an indicator to monitor the distance (d1). Low accuracy of the output layer can indicate closer distance (d1); (13) when learning accuracy of the output layer falls below the threshold value, adversarial learning of the feature extraction layer using the discriminator can be stopped; (14) in addition, learning of the output layer can be performed to increase the distance (d1); (15) learning of the output layer can include updating of the weight value of the output layer and feature extraction layer using prediction errors of the output layer; (16) on the contrary, when learning accuracy of the output layer becomes greater than or equal to the threshold value, adversarial learning of the feature extraction layer using the discriminator is resumed to control learning so that the distance (d3) between data sets of the same class would become closer; (17) if learning accuracy of the output layer falls below the threshold value, the importance of the output layer increases and learning of the output layer can be performed by reflecting the increased importance; e.g., learning of the output layer can be performed by amplifying prediction errors of the output layer based on the importance;
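The accuracy-gated control of adversarial learning summarized in items (13)-(16) above can be sketched as follows. This is an illustrative sketch only, not KIM'118's actual implementation: the function names and the threshold value 0.8 are assumptions introduced for illustration.

```python
def gated_step(output_accuracy, threshold=0.8):
    """Select which updates run in one training step.

    Per the gating scheme: the task (output-layer) update always runs;
    the adversarial update of the feature extraction layer against the
    class-specific discriminator runs only while output-layer accuracy
    is at or above the threshold (i.e., while inter-class distance d1
    is judged large enough).
    """
    return {"task": True, "adversarial": output_accuracy >= threshold}


def schedule(accuracy_trace, threshold=0.8):
    """Apply the gate over a sequence of measured task accuracies."""
    return [gated_step(a, threshold)["adversarial"] for a in accuracy_trace]


if __name__ == "__main__":
    # Accuracy dips below the threshold mid-training: adversarial
    # learning pauses for those steps and resumes once accuracy recovers.
    print(schedule([0.90, 0.85, 0.70, 0.75, 0.82, 0.90]))
    # -> [True, True, False, False, True, True]
```

The key design point is that the gate is driven by the task head's accuracy rather than the discriminator's, so a collapse of inter-class distance (d1) automatically suspends the domain-confusion objective.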
(18) in this case, learning accuracy of the output layer can increase again; (19) as illustrated in FIG. 12, distance (d4) between data sets of the same class (72, 74) was decreased and distance (d2) between data sets of different classes (73, 74) was surely increased; and (20) the performance improvement effect of neural network according to domain adaptation can be maximized by controlling learning so that the distance between data sets of the same class is decreased and the distance between data sets of different classes is increased. Watson et al. (US 2020/0082210 A1, pub. date: 03/12/2020) discloses in ABSTRACT and ¶¶ [0003], [0017]-[0018], and [0049]-[0052] with FIGS. 4-5 that (1) at block 420, assign pseudo-labels to unlabeled examples of data using a similarity metric on an embedding space to produce pseudo-labeled examples; (2) assigns weighted pseudo-labels (e.g., as an example of the pseudo-labels of block 520) to the unlabeled examples of the data based on a similarity metric in an embedding space to known, labeled examples of the data; (3) at sub-block 560, train a transfer model using a curriculum learning model; (4) at block 440/580, train a curriculum learning model using the pseudo-labeled examples; (5) at block 460/590, the curriculum learning model trained with the pseudo-labeled examples is employed in a fine-tuning task to enhance classification accuracy of the data; (6) generating and augmenting transfer learning datasets with pseudo-labeled data elements to achieve perfect data labeling; (7) assigning automatically generated pseudo-labels to unlabeled data elements for use in creating transfer models to enhance performance on a target task (e.g., correctly labeling over eight million images); (8) a pseudo-label is a listing of similarities of data elements to different data elements (e.g., a relative label or relative name); (9) pseudo-labels are automatically generated based on a distancing metric applied to unlabeled data elements of the data
set to determine a distance, which itself is used as a pseudo-label of these unlabeled data items; (10) pseudo-labels are not human interpretable, as they can appear as arbitrary alphanumeric codes; and (11) more particularly, the pseudo-labels are data type agnostic, as any vector representation will work. Watson further discloses in ¶¶ [0053]-[0061] with FIG. 6 that (1) at blocks 612, 613, and 614, a user data set is obtained, which comprises unlabeled data (at block 612), labeled data (at block 613), and task-specific data (at block 614); (2) proportionally, the unlabeled data of block 612 is greater in volume than the labeled data of block 613, which is greater in volume than the task-specific data of block 614; (3) at block 616, the user data set is passed through pre-trained models (for forward prediction), wherein forward prediction comprises an image feature extraction where the pre-trained model penultimate layer's outputs are feature vectors, from which each category's average feature vectors is computed as a category feature representation; (4) the category feature representation for both source and target data sets are computed; (5) at block 618, inter-item distances are generated for each unlabeled element utilizing each pre-trained model; (6) in this regard, each pre-trained model generates an individual dissimilarity score (e.g., dissimilarity scores between source and target dataset to measure how they differ from each other, such as 0 for similar and 1 for dissimilar) for each unlabeled element; (7) the pre-trained model can be utilized in accordance with a hierarchical agglomerative clustering; (8) dissimilarity measures, as the inter-item distances, are computed according to equation 1, wherein KL is a divergence between a target P and a source Q, Q represents a theory, model, description, or approximation of P; e.g., DKL(target||source) are computed as dissimilarity scores between source and target dataset to measure how they differ from
each other (e.g., 0: similar, 1: dissimilar); (9) at block 620, pseudo labels are generated and assigned to unlabeled data; e.g., the individual dissimilarity scores are placed in a new vector (formed to hold a plurality of dissimilarity scores for each unlabeled element); (10) the combined new vector and scores therein is the pseudo label for the unlabeled element; (11) a category dictionary c that contains (l, M), where l is the high-level category label and M is a vector in embedding space pointing to the center of the cluster, is created; (12) for an unknown image i, find KLD (as in 1) of the image to each i, and store this as weight w to create a dictionary of pseudo labeled images I→(i, w); (13) pseudo-labels don't have to be nameable, as they are standing in for category-clusters; (14) at blocks 630, 640, and 650, training steps 1, 2, and 3, are performed; (15) at block 630, a new transfer model is trained on pseudo-labeled data; (16) at block 630, the new transfer model is also trained on the labeled data of block 613; (17) at block 650, the new transfer model is fine-tuned on the task specific data of block 614; and (18) at block 630, a final model is outputted; (19) a performance of the final model (such as sorting images or responding to un-seen data samples) has demonstrated greater accuracy on a small data task after augmentation with pseudo labels. KUMAGAI et al.
(US 2022/0230074 A1, filed on 05/17/2019) discloses in ¶¶ [0004]-[0006] and [0028] that (1) "metric learning" is a general term that refers to methods for learning data embedding (low-dimensional vector expression of data) such that similar data pieces are arranged close to each other and different data pieces are arranged away from each other, wherein data embedding that is obtained through metric learning is useful in various tasks in the field of machine learning, such as classification, clustering, and visualization; (2) a domain in which there is a task to be solved will be referred to as a "target domain", and a domain that relates to the target domain will be referred to as a "source domain"; (3) in the above-described case, a domain to which data used in the test belongs is the target domain, and a domain to which data used in the training belongs is the source domain; (4) if a large amount of labeled data of the target domain is available, it is best to train a model using the labeled data of the target domain; (5) however, in many applications, it is difficult to obtain a sufficient amount of labeled data of the target domain, and therefore, a method has been proposed in which, in addition to labeled data of the source domain, unlabeled data of the target domain, which can be collected at a relatively low cost, is used in training to acquire data embedding that is suited to test data even if a data generation distribution differs between the training and the test; and (6) labeled data is data to which training information such as "similar" or "dissimilar" is added. KUMAGAI further discloses in ¶¶ [0030]-[0037] with FIGS. 1-2 that (1) as shown in FIG. 
1, data pieces are arranged apart from each other in a source space X, wherein desired data embedding (see a latent space U) can be acquired with respect to the data in the source space X by learning appropriate mapping f; (2) a predictor that predicts a data embedding space of data that is a prediction target; (3) training data that is used to train the predictor is labeled data and/or unlabeled data of a plurality of source domains; (4) a target domain is a domain in which there is a task to be solved; (5) a source domain refers to a domain that differs from the target domain, but relates to the target domain; (6) a latent domain vector (the center diagram in FIG. 2) that represents a feature of a domain is presumed from a sample set of each domain (the left diagram in FIG. 2), and data embedding that is suited to the domain (the right diagram in FIG. 2) is output based on the latent domain vector and the sample set; (7) the above relationship is learned using data of a plurality of source domains, and therefore data embedding that is suited to the target domain can be immediately output without carrying out learning when a sample set of the target domain is given; (8) the training device 10 trains a predictor that outputs data embedding that is unique to a domain based on a sample set of each domain, by using labeled data and/or unlabeled data of a plurality of source domains that are given in training; and (9) when a sample set of the target domain is given, the prediction device 20 outputs data embedding that is suited to the target domain by referring to the predictor trained by the training device 10. KUMAGAI also discloses in ABSTRACT and ¶¶ [0016]-[0018] and [0038]-[0059] with FIGS.
3-5 that (1) a training data input unit (11) accepts input of labeled data of a source domain and/or unlabeled data of a source domain as training data; (2) a feature extraction unit (12) converts data unique to each source domain of which input has been accepted by the training data input unit (11), to a feature vector; (3) a training unit (13) that trains a predictor (141) that performs data embedding suited to an input domain, in accordance with metric learning by using the feature vector of each source domain; (4) when a set of feature vectors that belong to a domain is input, the first model estimates a latent feature vector that is a latent variable of each feature vector of the input domain and a latent domain vector that indicates information regarding the domain that is information regarding a data set of the input domain; (5) the second model outputs a feature vector of the domain when the domain latent feature vector and the latent domain vector that are estimated by the first model are input; (6) the training unit 13 optimizes parameters of the first model and the second model using input to the first model, output of the first model, and output of the second model; (7) the data input unit 21 accepts input of unlabeled data (sample set) of a target domain that is a prediction target, and outputs the unlabeled data of the target domain to the feature extraction unit 22; (8) the feature extraction unit 22 extracts a feature value of unlabeled data of each target domain of which input has been accepted by the data input unit; (9) the feature extraction unit 22 converts a sample that is a prediction target to a feature vector, wherein the feature value is extracted using the same procedure as that used by the feature extraction unit 12 of the training device 10; (10) the feature extraction unit 22 converts data that is unique to the target domain of which input has been accepted by the data input unit 21, to a feature vector; (11) the prediction unit 23
predicts data embedding from the sample set by using the predictor 141 trained by the training unit 13; (12) the prediction unit 23 performs data embedding that is suited to the target domain based on the feature vector converted by the feature extraction unit 22, by using the predictor 141 trained by the training unit 13; (13) the output unit 24 outputs the result of prediction performed by the prediction unit 23; (14) training data input unit 11 accepts input of labeled data and/or unlabeled data of a plurality of source domains, as training data (step S1); (15) the feature extraction unit 12 converts data of each domain of which input was accepted in step S1, to a feature vector (step S2); (16) then, the training unit 13 trains the predictor 141 for predicting data embedding unique to a domain based on a sample set of each domain (step S3), and stores the trained predictor 141 in the storage unit 14; (17) the data input unit 21 accepts input of unlabeled data (sample set) of a target domain (step S11); (18) the feature extraction unit 22 converts data of each domain of which input was accepted in step S11, to a feature vector (step S12); (19) then, the prediction unit 23 predicts data embedding from the sample set by using the predictor 141 trained by the training device 10 (step S13); and (20) the output unit 24 outputs the result of prediction performed by the prediction unit 23 (step S14). KUMAGAI further teaches in ¶¶ [0086]-[0091] with FIG.
3 that (1) it converts data unique to each source domain among labeled data and/or unlabeled data of the source domain, which is training data, to a feature vector, and trains the predictor 141 that performs data embedding suited to an input domain, in accordance with metric learning by using the feature vector of each source domain; (2) the predictor 141 that predicts data embedding unique to each domain is trained by using information unique to each domain as well; (3) data embedding suited to a target domain can be predicted without necessary information being lost, by using the predictor 141 trained using information unique to each domain as well; (4) the predictor 141 includes the first model and the second model; (5) when a feature vector of a domain is input, the first model estimates a latent feature vector and a latent domain vector with respect to the input domain; (6) the second model outputs a feature vector of the domain when the domain latent feature vector and the latent domain vector that are estimated by the first model are input; (7) owing to these two models, the predictor 141 can use even a domain that only includes unlabeled data, in training; (8) information loss can be prevented by using information unique to each domain as well; (9) furthermore, a domain to which label information is not given can also be used as training data, and therefore highly precise data embedding suited to a target domain can be obtained with respect to actual problems in a wide range; i.e., it is possible to prevent information loss and predict data embedding suited to a target domain regardless of the presence or absence of labels of data in a source domain for training. Sharma et al. (US 2018/0068231 A1, pub.
date: 03/08/2018) discloses in ABSTRACT and ¶¶ [0005]-[0007] that (1) training a target domain classifier to label text segments; (2) identifying a set of common keywords with same label from a set of source keywords and a set of target keywords, wherein each keyword in the set of source keywords and the set of target keywords is associated with a label; (3) training a first classifier, based on the set of common keywords, to label a first set of target text segments, from a plurality of target text segments, wherein each of the labeled first set of target text segments is associated with a first confidence score; (4) training a second classifier based on at least a subset of the labeled first set of target text segments for which the first confidence score exceeds a confidence threshold value; (5) training a third classifier, based on the first classifier and the second classifier, to label a second set of target text segments from the plurality of target text segments, wherein each of the labeled second set of target text segments is associated with a second confidence score and a subset of the labeled second set of target text segments, for which the second confidence score exceeds the confidence threshold value, is utilized for re-training the second classifier; and (6) determining labels of another plurality of target text segments based on the re-trained second classifier that corresponds to the target domain classifier. Sharma further discloses in ¶¶ [0019]-[0041] and [0069]-[0122] with FIGS. 
3A-B and 4A-B that (1) a "source domain" corresponds to a technical or business field for which a classifier is already trained; (2) a "target domain" refers to a technical or business field for which a classifier is to be trained; (3) a "plurality of source/target text segments" corresponds to text content associated with a source/target domain; (4) the plurality of source text segments may be labeled, such that each source text segment is associated with a label in a set of labels; (5) the plurality of target text segments may be unlabeled; (6) the plurality of target text segments may be utilized to train a target domain classifier; (7) a trained target domain classifier may be utilized to label each of the plurality of target text segments; (8) a "plurality of source/target keywords" corresponds to keywords present among a plurality of source/target text segments; (9) a "set of common keywords" comprises keywords with same label that are present in both labeled set of source keywords and labeled set of target keywords; (10) a "label" corresponds to a tag/metadata associated with a text-segment/keyword; (11) a "set of source keywords" encompasses keywords, present among a plurality of source keywords, for which a first significance score exceeds a first significance threshold value; (12) a "set of target keywords" encompasses keywords, present among a plurality of target keywords, for which a second significance score is less than a second significance threshold value; (13) a "first/second set of target text segments" encompasses target text segments, from a plurality of target text segments, labeled by a trained first/third classifier; (14) a "first confidence score" refers to a score that represents the measure of confidence with which a trained first classifier predicts a label for a target text segment in a first set of target text segment; (15) a "first classifier" refers to a mathematical model that may be trained based on a set of common keywords, i.e., 
(the SCP keywords); (16) the first classifier may be utilized to label a first set of target text segments; (17) the first classifier may be utilized to train a third classifier and a target domain classifier; (18) the first classifier may associate a first confidence score with a target text segment when it labels the target text segment; (19) a "second classifier" refers to a mathematical model that may be trained based on a subset of a first set of labeled target text segments and/or a subset of a second set of labeled target text segments; (20) the second classifier may be trained iteratively based on the subset of a second set of labeled target text segments; (21) the final re-trained second classifier may correspond to a trained target domain classifier; (22) the second classifier may be utilized to train a third classifier; (23) a "third classifier" refers to a mathematical model that may be trained based on a weighted combination of a trained first classifier and a trained second classifier; (24) the third classifier may be trained iteratively; (25) the trained third classifier may be utilized to train a target domain classifier; (26) the third classifier may associate a second confidence score with a target text segment when it labels the target text segment; (27) a "second set of target text segments" encompasses target text segments, from a plurality of target text segments, labeled by a trained third classifier; (28) a "second confidence score" refers to a score that represents the measure of confidence with which a trained third classifier predicts a label for a target text segment in a second set of target text segments; (29) a "first significance score" refers to a score that indicates the measure of significance of a keyword for a source domain; (30) a "second significance score" refers to a score that indicates the measure of significance of a keyword for a target domain; (31) a "first/second score" refers to a score that may be indicative of the 
strength of association of a source/target keyword, in a set of source/target keywords, with a label in a set of labels; (32) the plurality of source text segments associated with the source domain, and the plurality of target text segments associated with the target domain, are received; (33) identify the plurality of source/target keywords from the plurality of source/target text segments utilizing one or more text processing techniques; (34) determine the first significance score for each of the plurality of source keywords utilizing one or more statistical techniques; (35) determine the second significance score for each of the plurality of target keywords; (36) determine a correlation value of each target keyword, in the plurality of target keywords, with every other target keyword in the plurality of target keywords, wherein the correlation values thus obtained for each pair of the plurality of target keywords may correspond to the second significance score; (37) utilize the first/second significance score to identify the set of source/target keywords from the plurality of source/target keywords; (38) compare the first/second significance score of each of the plurality of source/target keywords with the first/second significance threshold value; (39) based on the comparison, identify the source/target keywords in the plurality of source/target keywords for which the first/second significance score "exceeds"/"is less than" the first/second significance threshold value; (40) the identified source/target keywords whose first/second significance score is higher/less than the first/second significance threshold value constitute the set of source/target keywords; (41) label each keyword in the set of source keywords and each keyword in the set of target keywords; (42) the label in the set of labels whose first score of the source keyword is highest is assigned to the source keyword; (43) utilize the one or more similarity measures, such as cosine similarity vector, 
to determine the second score for each target keyword in the set of target keywords; (44) utilize the one or more similarity measures between a target keyword in the set of target keywords and the set of pre-defined keywords to determine the second score for the target keyword; (45) identify a pre-defined keyword that has highest similarity value corresponding to the target keyword and then assign the label associated with the identified pre-defined keyword to the target keyword; (46) the set of common keywords with the same label is identified from the set of source keywords and the set of target keywords; (47) the first classifier is trained based on the set of common keywords to label the first set of target text segments from the plurality of target text segments; (48) utilize the trained first classifier for labeling the first set of target text segments from the plurality of target text segments; (49) the trained first classifier may determine the first confidence score based on a distance of the target text segment from the first optimal hyperplane; (50) the subset of labeled first set of target text segments, for which the first confidence score exceeds the confidence threshold value, is identified; (51) the second classifier is trained based on at least the subset of the labeled first set of target text segments; (52) determine the second optimal hyperplane for the second classifier based on the subset of labeled first set of target text segments; (53) extract target keywords from target text segments in the subset of labeled first set of target text segments; (54) assign a label to each of the extracted target keywords based on the label of the corresponding target text segment in the subset of labeled first set of target text segments; (55) the third classifier is trained based on the trained first classifier and the trained second classifier to label the second set of target text segments from the plurality of target text segments; (56) determine a third optimal hyperplane for the third classifier based on the first optimal hyperplane of the trained first classifier and the second optimal hyperplane of the trained second classifier; (57) the weights of the trained first classifier and the trained second classifier utilized to train the third classifier are determined based on an accuracy parameter associated with each of the trained first classifier and the trained second classifier; (58) utilize the trained third classifier for labeling the second set of target text segments from the plurality of target text segments; (59) the trained third classifier may determine the second confidence score based on a distance of the target text segment from the third optimal hyperplane; (60) the subset of the labeled second set of target text segments for which the second confidence score exceeds the confidence threshold value is identified; (61) compare the second confidence score of each target text segment in the labeled second set of target text segments with the confidence threshold value; (62) based on the comparison, identify the labeled target text segments in the labeled second set of target text segments whose second confidence score is higher than the confidence threshold value; (63) the identified labeled target text segments constitute the subset of labeled second set of target text segments; (64) a check is performed to determine whether the count of target text segments in the second set of target text segments in current iteration exceeds the count of target text segments in the second set of target text segments in a previous iteration; and (65) the second classifier is re-trained based on the subset of labeled second set of target text segments when the count of target text segments in the second set of target text segments in the current iteration exceeds the count of target text segments in the second set of target text segments in the previous iteration; otherwise, the labels of the other plurality
of target text segments are determined based on the re-trained second classifier that corresponds to the target domain classifier. KIM et al. (US 2021/0398004 A1, priority date: 06/19/2020) discloses in ABSTRACT and ¶¶ [0007]-[0028] that (1) online Bayesian few-shot learning in which multi-domain-based online learning and few-shot learning are integrated when domains of tasks having data are sequentially given; (2) the estimating of the domain and task based on the context information of all pieces of the input support data may include performing batch sampling based on at least one task in a previous domain and a current domain consecutive to the previous domain, extracting features of the support data corresponding to each of the sampled tasks, performing embedding in consideration of context information of the extracted features, and estimating the domain and the task of the support data based on embedded feature information; (3) the performing of the embedding in consideration of the context information of the extracted features may include (a) setting the extracted feature as an input of a self-attention model composed of multiple layers and acquiring the embedded feature information as an output corresponding to the input; and (b) setting the extracted feature as an input of a bidirectional long short-term memory (BiLSTM) model composed of the multiple layers and acquiring the embedded feature information as the output corresponding to the input; (4) the estimating of the domain and the task of the support data based on the embedded feature information according to the embedding result may include setting the embedding feature information as an input of a multi-layer perceptron model and acquiring the domain and the task of the estimated support data as the output corresponding to the input; (5) a dimension of an output stage for the output may be set to be smaller than a dimension of an input stage for the input; (6) in the adapting of the normalized parameter
of the task execution model to all pieces of the support data, the adaptation of the normalized parameter of the task execution model to all pieces of the support data may be performed based on a probabilistic gradient descent method; and (7) the calculating of the contrast loss based on the acquired logit pair may include determining whether the acquired logit pair is generated as the same data, and calculating the contrast loss based on an error according to the determination result. KIM'004 further discloses in ¶¶ [0037]-[0050] with FIG. 1 that (1) a few-shot learning technology is largely divided into a distance learning-based method and a gradient descent-based method; (2) the distance learning-based few-shot learning method is a method of learning a method of extracting a feature that makes a distance closer when two data categories are the same and makes the distance farther apart when the two data categories are different, and then selecting a category of the latest data in the feature space; (3) the gradient descent-based few-shot learning method is a method of finding initial values that show good performance by updating a small number of new tasks; e.g., model agnostic meta-learning (MAML) is a representative method which has the advantage that it may be used in all models that are trained based on the gradient descent method, unlike other few-shot learning methods; (4) the framework for online Bayesian few-shot learning illustrated in FIG. 1 targets online Bayesian few-shot learning in a kth domain; (5) the framework stores initial parameters of the entire model in a k-1st domain for normalization-based online learning and stores some data of a past domain (1, 2, k-1st domain) for rehearsal-based online learning; and (6) the goal of the online Bayesian few-shot learning is to obtain an optimal parameter πk and ϕ(k) based on a loss function as a reference value. KIM'004 also discloses in ¶¶ [0059]-[0084] with FIG.
3 that (1) the feature extraction unit 105 performs batch sampling based on at least one task in the previous domain and the current domain and then extracts features of all pieces of the support data corresponding to each sampled task; (2) the feature extraction unit 105 may construct a module using a multi-layer convolutional neural network-batch normalization-nonlinear function having strength in image processing, set an image as the input of the module to obtain an output, and then concatenate a label to extract features; (3) the context embedding unit 110 performs embedding in consideration of the context information of the features extracted by the feature extraction unit 105; (4) the context embedding unit 110 may set the extracted feature as an input of a self-attention model composed of multi-layers that considers correlation between inputs and acquire the embedded feature information as an output corresponding to the input; (5) the context embedding unit 110 may set the extracted features as an input of a bidirectional long short-term memory (BiLSTM) model composed of multi-layers and acquire the embedded feature information as the output corresponding to the input; (6) the domain and task estimator 115 estimates domains and tasks of all pieces of the input support data based on the embedded feature information according to the embedding result; (7) the domain and task estimator 115 may set the embedded feature information as an input of a multi-layer perceptron model and acquire the estimated domain and task of the support data as the output corresponding to the input; (8) in this case, a dimension of an output stage for an output of the multi-layer perceptron model may be set to be smaller than that of an input stage for the input; (9) the modulation information acquirer 120 acquires the modulation information of the initial parameter of the task execution model based on the estimated domain and task; (10) the modulation information acquirer 120 may
acquire the modulation information of the initial parameter ok of the task execution model from the knowledge memory 130 using the estimated domain and task directly or through a knowledge controller 125; (11) the knowledge controller 125 sets the estimated domain and task as the input of the BiLSTM model or the multi-layer perceptron model and generates a read_query and a write_query required for accessing the knowledge memory 130 as the output corresponding to the input; (12) the knowledge controller 125 may calculate a weight for a location of the knowledge memory 130 to be accessed with cosine similarity using the read_query and acquire the modulation information of the initial parameter ok of the task execution model by a linear combination with a value stored in the knowledge memory through the weight; (13) the knowledge controller 125 may calculate the weight for the location of the knowledge memory 130 to be written with the cosine similarity using the write_query, delete the value stored in the knowledge memory 130 based on the calculated weight, and add the modulation information of the estimated domain and task, thereby updating the knowledge memory 130; (14) the modulation information acquirer 120 may set the estimated domain and task as the input of a multi-layer perceptron model and then acquire the modulation information of the initial parameter of the task execution model as the output; (15) the modulator 135 modulates the initial parameter of the task execution model based on the modulation information; (16) in this case, the modulator 135 may sum the modulation information directly acquired by the modulation information acquirer 120 and the modulation information acquired from the knowledge memory 130 by the knowledge controller 125 and may modulate the initial parameter; (17) the normalization unit 140 normalizes the parameter of the modulated task execution model based on the summed
modulation information; (18) the task adaptation unit adapts a parameter of the task execution model normalized by the normalization unit 140 to all pieces of the support data; (19) the task executor 150 calculates the task execution loss by performing the task on the input of the query data using the adapted parameter of the task execution model; (20) the task executor 150 may perform the task by applying the Bayesian neural network to the input of the query data, wherein coefficients of the Bayesian neural network are set to a Gaussian distribution whose covariance is a diagonal matrix, and also, the adapted parameter of the task execution model is composed of a covariance and a mean; (21) the task executor 150 samples the coefficients of the neural network from the Gaussian distribution and then applies the Bayesian neural network to the input of the query data, thereby outputting the result; (22) the determination and update unit 155 acquires a logit pair for all pieces of the support data and the input of the query data and calculates the contrast loss based on the acquired logit pair; (23) the determination and update unit 155 may acquire a logit pair for the support data and the input of the query data as the initial parameters of the entire model of the previous domain and the current domain consecutive to the previous domain; (24) the determination and update unit 155 may determine whether or not the acquired logit pair is generated as the same data and calculate the contrast loss based on an error according to the determination result; (25) the error due to the determination corresponds to the contrast loss, and the learning is performed to easily reduce the contrast loss in terms of interdependence information; (26) the determination and update unit 155 calculates a total loss based on the task execution loss and the contrast loss and updates the initial parameters of the entire model based on the total loss; and (27) in this case, the determination and update unit
155 may update the initial parameters of the entire model with a backpropagation algorithm using the total loss as the reference value. However, the closest prior art of record, as discussed above, singly or in combination does not teach or suggest at least the following features "selecting a set of the labeled data sets whose overlapping spaces are equal to or less than a threshold value and whose coverage in the trained feature space is equal to or higher than the threshold value, from among a plurality of the labeled data sets generated by using the trained feature space; and executing an analysis related to accuracy of a classification model, by using the selected set of the labeled data sets" when combined with all other limitations of the claim as a whole.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HWEI-MIN LU, whose telephone number is (313) 446-4913. The examiner can normally be reached Mon-Fri, 9:00 AM - 6:00 PM EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mariela D. Reyes, can be reached at (571) 270-1006. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HWEI-MIN LU/
Primary Examiner, Art Unit 2142

1, 3 See Ester et al., "A density-based algorithm for discovering clusters in large spatial databases with noise", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), published 08/02/1996, Sections 3-4, pages 227-230.
2, 4 See Campello et al., "Density-Based Clustering Based on Hierarchical Density Estimates", Advances in Knowledge Discovery and Data Mining, 17th Pacific-Asia Conference (PAKDD 2013), Gold Coast, Australia, April 14-17, 2013, Proceedings, Part II, Sections 3-5, pages 162-168.
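For context on the distance-learning-based few-shot method KIM'004 describes (learning a feature space in which same-category data sit closer together and different-category data sit farther apart), here is a minimal sketch of a pairwise contrastive loss. This is an editorial illustration only, not code from KIM'004 or the application; the function name and `margin` parameter are our own choices.

```python
import numpy as np

def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Average contrastive loss over all pairs: same-label pairs are
    penalized by squared distance (pulling them together), while
    different-label pairs are penalized only when they fall inside
    the margin (pushing them apart)."""
    n = len(features)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if labels[i] == labels[j]:
                total += d ** 2                      # same category: shrink distance
            else:
                total += max(0.0, margin - d) ** 2   # different: enforce margin
            pairs += 1
    return total / pairs
```

Training an encoder against a loss of this shape is one standard way to obtain a feature space where intra-group distances are shorter than inter-group distances, which is the general property recited in claim 2.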
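Similarly, the cosine-similarity memory access attributed to the knowledge controller 125 (items (12)-(13) of the FIG. 3 discussion) follows a familiar attention-over-memory pattern: score each memory slot by cosine similarity to the read query, normalize the scores into weights, and return a linear combination of the stored rows. A hedged NumPy sketch, with function names that are ours rather than KIM'004's:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax for converting similarities into weights.
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(memory, read_query):
    """Address memory rows by cosine similarity to the query, then return
    the weighted linear combination of the stored rows."""
    sims = memory @ read_query / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(read_query) + 1e-8)
    weights = softmax(sims)   # attention weights over memory slots
    return weights @ memory   # linear combination of stored values
```

A write step, as described in item (13), would compute weights the same way from a write query, then erase and add at the weighted locations.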

Prosecution Timeline

Apr 17, 2023
Application Filed
Jan 20, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602578
LIGHT SOURCE COLOR COORDINATE ESTIMATION SYSTEM AND DEEP LEARNING METHOD THEREOF
2y 5m to grant Granted Apr 14, 2026
Patent 12596954
MACHINE LEARNING FOR MANAGEMENT OF POSITIONING TECHNIQUES AND RADIO FREQUENCY USAGE
2y 5m to grant Granted Apr 07, 2026
Patent 12591770
PREDICTING A STATE OF A COMPUTER-CONTROLLED ENTITY
2y 5m to grant Granted Mar 31, 2026
Patent 12579466
DYNAMIC USER-INTERFACE COMPARISON BETWEEN MACHINE LEARNING OUTPUT AND TRAINING DATA
2y 5m to grant Granted Mar 17, 2026
Patent 12561222
REDUCING BIAS IN MACHINE LEARNING MODELS UTILIZING A FAIRNESS DEVIATION CONSTRAINT AND DECISION MATRIX
2y 5m to grant Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+39.5%)
3y 1m
Median Time to Grant
Low
PTA Risk
Based on 217 resolved cases by this examiner. Grant probability derived from career allow rate.
