Prosecution Insights
Last updated: April 19, 2026
Application No. 18/407,957

METHOD AND ELECTRONIC DEVICE WITH REPRESENTATION LEARNING

Status: Non-Final OA (§103)
Filed: Jan 09, 2024
Examiner: WILLIAMS, REBECCA COLETTE
Art Unit: 2677
Tech Center: 2600 — Communications
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)

Grant Probability: 43% (Moderate)
OA Rounds: 1-2
To Grant: 2y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 43% (grants 43% of resolved cases; 3 granted / 7 resolved; -19.1% vs TC avg)
Interview Lift: +66.7% (strong; resolved cases with interview vs. without)
Typical Timeline: 2y 9m avg prosecution; 25 applications currently pending
Career History: 32 total applications across all art units

Statute-Specific Performance

§101: 12.4% (-27.6% vs TC avg)
§103: 57.9% (+17.9% vs TC avg)
§102: 13.1% (-26.9% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)

Comparisons are to estimated Tech Center averages • Based on career data from 7 resolved cases

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The Information Disclosure Statement filed 01/09/2024 has been considered by the examiner.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Moradiannejad (US 20210312227 A1) in view of Kingetsu (US 20220188707 A1) and Liu (US 20180144241 A1).

With respect to claim 1, Moradiannejad teaches A processor-implemented method (“Embodiments of the present disclosure are further directed to a system for detecting annotation errors.
The system includes a processor” paragraph 0016), comprising: training a neural network through representation learning using, as training data, a plurality of images (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate. The learned one or more features may be associated with one or more metadata tags that may be used to tag the particular captured image that also contains the learned features.” Paragraph 0040), a respective metadata mapped to each of the plurality of images (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate. The learned one or more features may be associated with one or more metadata tags that may be used to tag the particular captured image that also contains the learned features.” Paragraph 0040 and “In some embodiments, the adding of the tag data may be the data preparation engine 100 in response to receipt of the tag data from the image tagging assistant. Once tagged, the tag information may be used by different engines of the system including, for example, the data preparation engine and/or the annotation engine 102 of FIG. 
and a respective temporary classified label of each of the plurality of images (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate. The learned one or more features may be associated with one or more metadata tags that may be used to tag the particular captured image that also contains the learned features.” Paragraph 0040); and correcting label information, for a signal image and for a corresponding temporary classification label in the respective temporary classified labels, to have corrected classification information (“In one embodiment, the first classifier includes a binary classifier configured to classify annotations with the first label or a second label, wherein the first label is indicative of an incorrect annotation, and the second label is indicative of a correct annotation, and wherein the updated first annotation is classified with the second label.” Paragraph 0008), including determining that the corresponding temporary classification label of the image is mislabeled (“In one embodiment, annotations that are predicted to be incorrect (e.g. within a particular confidence level or value) may be collected as rejected labels 202, and provided to the annotation engine 102 of FIG. 1 for correction” paragraph 0052).
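Moradiannejad's reject-and-correct loop, as quoted above, can be pictured with a minimal sketch; the function names, sample layout, and confidence threshold below are hypothetical illustrations, not details from the reference:

```python
# Minimal sketch of a Moradiannejad-style annotation QC pass: a binary
# classifier scores each temporary label, and labels whose "incorrect"
# confidence clears a threshold are collected for re-annotation.
# The confidence model is a stub; all names here are illustrative.

def qc_pass(samples, incorrect_prob, threshold=0.8):
    """Split samples into accepted and rejected (to be re-annotated)."""
    accepted, rejected = [], []
    for sample in samples:
        if incorrect_prob(sample) >= threshold:
            rejected.append(sample)   # analogous to "rejected labels"
        else:
            accepted.append(sample)
    return accepted, rejected

# Each training sample pairs an image with its metadata and a
# temporary (possibly wrong) classification label.
samples = [
    {"image": "img_0", "metadata": {"sensor": "ecg"}, "temp_label": "normal"},
    {"image": "img_1", "metadata": {"sensor": "ecg"}, "temp_label": "abnormal"},
]

# Stub confidence model: pretend img_1's label is predicted incorrect.
probs = {"img_0": 0.1, "img_1": 0.95}
accepted, rejected = qc_pass(samples, lambda s: probs[s["image"]])
print([s["image"] for s in rejected])  # → ['img_1']
```

The point of the sketch is only the control flow: labels predicted incorrect past a confidence threshold are routed back for correction, mirroring the rejected labels passed to the annotation engine in the cited passages.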
Moradiannejad does not explicitly teach using a plurality of signal images, extracting latent features for each of the plurality of signal images using the trained neural network, and generating a feature map representing the plurality of signal images based on respective differences between the extracted latent features; and correcting label information, for a signal image and for a corresponding temporary classification label in the respective temporary classified labels, to have corrected classification information, including determining that the corresponding temporary classification label of the signal image is mislabeled using the generated feature map. Kingetsu teaches generating a feature map representing the plurality of images based on respective differences between the extracted latent features (“A feature space 30 is obtained by visualizing each of the pieces of training data included in the training data set 141a. The horizontal axis of the feature space 30 corresponds to the axis of the first feature value, whereas the vertical axis corresponds to the axis of the second feature value.” Paragraph 0101); including determining that the corresponding temporary classification label of the signal image is mislabeled using the generated feature map (“In contrast, in the reference technology, accuracy degradation of the machine training model 10 is detected in the case where the output results of the inspector models 11A to 11C are different.” paragraph 0055 and “FIG. 4 is a diagram illustrating a basic mechanism of the inspector model. For example, the inspector model is created by training a decision boundary 5 serving as a boundary between the distribution A1 of the training data that belongs to the first class and the distribution B of the training data that belongs to the second class.
In order to detect accuracy degradation of the machine training model 10 with respect to operation data in accordance with elapsed time, a critical area 5a that includes the decision boundary 5 is monitored, and whether or not the number of pieces of operation data included in the critical area 5a is increased (or decreased), and, if the number of pieces of the operation data is increased (or decreased), accuracy degradation is detected.” paragraph 0063). Kingetsu is analogous art in the same field of endeavor as the claimed invention. Kingetsu is directed towards classifying data based on features (“A feature space 30 is obtained by visualizing each of the pieces of training data included in the training data set 141a. The horizontal axis of the feature space 30 corresponds to the axis of the first feature value, whereas the vertical axis corresponds to the axis of the second feature value. Here, for convenience of description, each of the pieces of training data is indicated by using two axes; however, it is assumed that the training data is multidimensional data. For example, the correct answer label associated with the training data indicated by a circle mark is defined as the “first class”, whereas the correct answer label associated with the training data indicated by a triangle mark is defined as the “second class”.” Paragraph 0101). A person of ordinary skill in the art would have found it obvious, before the effective filing date of the claimed invention, to combine the system of Moradiannejad and Kingetsu by utilizing the feature-based similarity mislabeling identification scheme of Kingetsu in combination with the overall metadata-inclusive labeling correction scheme of Moradiannejad, with the expectation that doing so would result in increased accuracy for the machine learning model used by Moradiannejad (see Kingetsu paragraph 0109 “A description will be given here by referring back to FIG. 10.
The detection unit 153 is a processing unit that detects accuracy degradation of the machine training model 50 by operating the inspector model 35. The detection unit 153 inputs each of the pieces of training data included in the training data set 141a to the inspector model 35. If the detection unit 153 inputs the training data to the inspector model 35, the distance (norm) between the decision boundary 31 and the training data on the feature space is output.”). Liu teaches extracting latent features for each of the plurality of signal images using the trained neural network (“Accordingly, one embodiment discloses a method for training a neuron network using a processor in communication with a memory, and the method includes determining features of a signal using the neuron network” paragraph 0006) and labeling signal images based on their features (see figure 1C). Liu is analogous art in the same field of endeavor as the claimed invention. Liu is directed towards improving the accuracy of signal classifications (“Some embodiments of the invention are based on recognition that an active learning using an uncertainty measure of features of input signals and reconstruction of the signals from the features provides less annotation processes with improving the accuracy of classifications of signals.” Paragraph 0005).
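The Kingetsu/Liu combination mapped above can be pictured with a minimal sketch: latent features are extracted for each signal image, a feature map is built from the differences between those features, and a temporary label is flagged as mislabeled when every nearby point in the map carries a different label. The "encoder" below is a stand-in projection rather than a trained network, and all names and parameters are illustrative, not drawn from the references:

```python
import numpy as np

def latent(x):
    # Stand-in for a trained encoder: project each 16-D "signal image"
    # to a 2-D latent feature (first two coordinates).
    return x[..., :2]

def flag_mislabeled(X, labels, k=3):
    """Flag index i when its k nearest latent neighbors all disagree."""
    Z = latent(X)
    flagged = []
    for i in range(len(Z)):
        d = np.linalg.norm(Z - Z[i], axis=1)   # feature-map differences
        nearest = np.argsort(d)[1:k + 1]       # skip the point itself
        if all(labels[j] != labels[i] for j in nearest):
            flagged.append(i)
    return flagged

rng = np.random.default_rng(0)
# Two tight clusters of synthetic signal images; the last point sits in
# cluster B but was given cluster A's temporary label.
A = rng.normal(0.0, 0.1, size=(5, 16))
B = rng.normal(3.0, 0.1, size=(5, 16))
X = np.vstack([A, B])
labels = ["a"] * 5 + ["b"] * 4 + ["a"]
print(flag_mislabeled(X, labels))  # → [9]
```

The disagreement-among-neighbors test is one simple way to operationalize "determining that the label is mislabeled using the feature map"; the cited references use decision-boundary distances rather than this nearest-neighbor rule.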
A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious to combine the system of Moradiannejad, Kingetsu, and Liu by incorporating Liu’s signal-image teachings into the combined system of Moradiannejad and Kingetsu, with the expectation that doing so would result in further improvements within the machine learning algorithm and classification process (“Some embodiments of the invention are based on recognition that an active learning using an uncertainty measure of features of input signals and reconstruction of the signals from the features provides less annotation processes with improving the accuracy of classifications of signals.” Paragraph 0005).

With respect to claim 2, Moradiannejad, Kingetsu and Liu teach the method of claim 1. Liu teaches utilizing signal data as training data (“Accordingly, one embodiment discloses a method for training a neuron network using a processor in communication with a memory, and the method includes determining features of a signal using the neuron network” paragraph 0006) and Moradiannejad further teaches wherein the correcting of the label information comprises updating the corresponding temporary classification label, in the respective temporary classified labels, to be the corrected classification information (“In one embodiment, annotations that are predicted to be incorrect (e.g. within a particular confidence level or value) may be collected as rejected labels 202, and provided to the annotation engine 102 of FIG.
1 for correction” paragraph 0052), and wherein the training of the neural network comprises training the neural network using, as corresponding training data, the plurality of images (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate. The learned one or more features may be associated with one or more metadata tags that may be used to tag the particular captured image that also contains the learned features.” Paragraph 0040), the respective metadata corresponding to the plurality of images (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate. The learned one or more features may be associated with one or more metadata tags that may be used to tag the particular captured image that also contains the learned features.” Paragraph 0040), and updated label information that correspond to the respective temporary classification labels with the updated corresponding temporary classification label (“In some embodiments, the adding of the tag data may be the data preparation engine 100 in response to receipt of the tag data from the image tagging assistant. Once tagged, the tag information may be used by different engines of the system including, for example, the data preparation engine and/or the annotation engine 102 of FIG. 1.” Paragraph 0065 and figure 4).
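The claim-2 flow mapped above (overwrite the mislabeled temporary label with corrected classification information, then train on the updated labels) can be sketched as follows. All names are illustrative, and `retrain` is a stub standing in for another representation-learning pass:

```python
# Sketch of the correct-then-retrain loop: a mislabeled temporary
# label is replaced with corrected classification information, and the
# network is then trained on images, metadata, and the updated labels.

def correct_labels(temp_labels, corrections):
    """Return an updated label list with corrections applied by index."""
    updated = list(temp_labels)
    for idx, fixed in corrections.items():
        updated[idx] = fixed
    return updated

def retrain(images, metadata, labels):
    # Stand-in for another training pass over the corrected data; a
    # real system would fit the neural network here.
    return {"num_samples": len(images), "labels": labels}

images = ["img_0", "img_1", "img_2"]
metadata = [{"src": "ecg"}] * 3
temp_labels = ["normal", "normal", "abnormal"]

updated = correct_labels(temp_labels, {1: "abnormal"})  # index 1 was wrong
model = retrain(images, metadata, updated)
print(updated)  # → ['normal', 'abnormal', 'abnormal']
```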
With respect to claim 3, Moradiannejad, Kingetsu and Liu teach the method of claim 1. Liu teaches signal data labeling (see figure 1C), while Kingetsu further teaches wherein the determining that the corresponding temporary classification label of the image is mislabeled comprises: with the feature map including a target point corresponding to a target image which has been classified as a first label (“FIG. 4 is a diagram illustrating a basic mechanism of the inspector model. For example, the inspector model is created by training a decision boundary 5 serving as a boundary between the distribution A1 of the training data that belongs to the first class and the distribution B of the training data that belongs to the second class. In order to detect accuracy degradation of the machine training model 10 with respect to operation data in accordance with elapsed time, a critical area 5a that includes the decision boundary 5 is monitored, and whether or not the number of pieces of operation data included in the critical area 5a is increased (or decreased), and, if the number of pieces of the operation data is increased (or decreased), accuracy degradation is detected.” paragraph 0063), selecting the target signal image corresponding to the image to determine the mislabeled corresponding temporary classification label based on multiple signal images (“FIG. 28 is a diagram illustrating a process performed by the computing system according to the third embodiment. The computing system according to the third embodiment creates an inspector model by using knowledge distillation similarly to the computing system 100 according to the first embodiment. The decision boundary trained by using the inspector model is defined as a decision boundary 60.
The computing system detects data as an instance that corresponds to the cause of accuracy degradation on the basis of the distance between an instance in the feature space and the decision boundary 60.” Paragraph 0202 and see figure 28), corresponding to a respective first number of different points in the generated feature map within a first proximity to the target point (“For example, in FIG. 28, a certainty factor is different in each of the instances that are included in an operation data set 61. For example, the distance between an instance 61a and the decision boundary 60 is denoted by da. The distance between an instance 61b and the decision boundary 60 is denoted by db. The distance da is smaller than the distance db, so that the instance 61a is more likely to be a cause of accuracy degradation than the instance 61b.” paragraph 0203 and see figure 28), all having been classified as a second label different from the first label (“FIG. 4 is a diagram illustrating a basic mechanism of the inspector model. For example, the inspector model is created by training a decision boundary 5 serving as a boundary between the distribution A1 of the training data that belongs to the first class and the distribution B of the training data that belongs to the second class. In order to detect accuracy degradation of the machine training model 10 with respect to operation data in accordance with elapsed time, a critical area 5a that includes the decision boundary 5 is monitored, and whether or not the number of pieces of operation data included in the critical area 5a is increased (or decreased), and, if the number of pieces of the operation data is increased (or decreased), accuracy degradation is detected.” paragraph 0063).

With respect to claim 4, Moradiannejad, Kingetsu and Liu teach the method of claim 1.
Liu teaches signal data labeling (see figure 1C), while Kingetsu further teaches wherein the determining that the corresponding temporary classification label of the signal image is mislabeled comprises determining the mislabeled corresponding temporary classification label by selecting the signal image from among two signal images, of the plurality of signal images, that have been respectively classified as different labels (“For example, the correct answer label associated with the training data indicated by the cross mark is defined as the “first class”, the correct answer label associated with the training data indicated by the triangle mark is defined as the “second class”, and the correct answer label associated with the training data indicated by the circle mark is defined as the “third class”.” Paragraph 0164) and that are represented as respective points in the feature map having a same position (see figure 26 overlapping shapes), including selecting the signal image that is not classified as a third label (see figure 26 element 41B; note the different shapes present within the boundary that are not triangles), and wherein the third label is a label corresponding to a direction toward the respective points with respect to a boundary line disposed around the same position (see figure 26 element 41B).

With respect to claim 15, Moradiannejad, Kingetsu and Liu render obvious all limitations in consideration of claim 1, being directed to a device that performs the processes of claim 1. Additionally, Moradiannejad teaches a processor (“Embodiments of the present disclosure are directed to a method for detecting annotation errors. A processor receives an image that includes a first annotation, and identifies a first classifier associated with the first annotation. The processor invokes the first classifier to classify the first annotation, where the first annotation is classified with a first label.
The processor transmits a message in response to classifying the first annotation with the first label, where the message is for prompting an update to the first annotation. The processor receives the image with an updated first annotation, and saves the image with the updated first annotation in a data storage device. The image may be for training an artificial intelligence machine for conducting an automated task.” paragraph 0006).

With respect to claim 16, Moradiannejad, Kingetsu and Liu teach the electronic device of claim 15 and render obvious all limitations in consideration of claim 2, being directed to a device that performs the processes of claim 2. Additionally, Moradiannejad teaches a processor (“Embodiments of the present disclosure are directed to a method for detecting annotation errors. A processor receives an image that includes a first annotation, and identifies a first classifier associated with the first annotation. The processor invokes the first classifier to classify the first annotation, where the first annotation is classified with a first label. The processor transmits a message in response to classifying the first annotation with the first label, where the message is for prompting an update to the first annotation. The processor receives the image with an updated first annotation, and saves the image with the updated first annotation in a data storage device. The image may be for training an artificial intelligence machine for conducting an automated task.” paragraph 0006).

With respect to claim 17, Moradiannejad, Kingetsu and Liu teach the electronic device of claim 15 and render obvious all limitations in consideration of claim 3, being directed to a device that performs the processes of claim 3. Additionally, Moradiannejad teaches a processor (“Embodiments of the present disclosure are directed to a method for detecting annotation errors.
A processor receives an image that includes a first annotation, and identifies a first classifier associated with the first annotation. The processor invokes the first classifier to classify the first annotation, where the first annotation is classified with a first label. The processor transmits a message in response to classifying the first annotation with the first label, where the message is for prompting an update to the first annotation. The processor receives the image with an updated first annotation, and saves the image with the updated first annotation in a data storage device. The image may be for training an artificial intelligence machine for conducting an automated task.” paragraph 0006).

Claims 5-6, 14, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Moradiannejad, Kingetsu and Liu as applied to claim 1 above, and further in view of Redmon (US 20190102646 A1).

With respect to claim 5, Moradiannejad, Kingetsu and Liu teach the method of claim 1. Liu teaches signal data labeling (see figure 1C) and further teaches wherein the neural network comprises: an encoder configured to generate extracted data in response to a corresponding signal image of the plurality of images being input to the encoder (“the determining features may be performed by using an encoder neural network. In this case, the encoder neural network can perform feature analysis of given signals.” Paragraph 0025), but does not teach further limitations. Redmon teaches a classification header configured to output a classified label of the corresponding image (“In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102.” Paragraph 0043 and “The image may be annotated 430 based on the selected class.
For example, an indication of this selected class is metadata that may be associated with the image (e.g., as part of a header for the image file or as overlaid text incorporated into the image as it will be displayed.” Paragraph 0070), and a metadata header configured to output metadata mapped to the corresponding image (“In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102.” Paragraph 0043 and “For example, the metadata may include a list of regions (e.g., specified by bounding boxes) that depict an object and respective classes for those objects. In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102.” Paragraph 0043). Redmon is analogous art in the same field of endeavor as the claimed invention. Redmon is directed to image classification (“This disclosure relates to image-based object detection and classification.” Paragraph 0002). A person of ordinary skill in the art, before the effective filing date of the claimed invention, would have found it obvious to combine the system of Moradiannejad, Kingetsu and Liu with Redmon by utilizing the inclusive metadata headers of Redmon within the combined system’s neural network as a way to deliver supplementary information, with the expectation that doing so would result in enabling the system to process large amounts of data more accurately (“to harness the large amount of classification data already available and use them to expand the scope and accuracy of current object detection systems. Some implementations use a hierarchical view of object classification that enables the combination of distinct datasets together for training of convolutional neural networks for object detection and classification.” Paragraph 0025).
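The claim-5 architecture at issue (one encoder producing extracted data, read by a classification header that outputs a label and a metadata header that outputs metadata) can be sketched minimally as below. The weights are random stand-ins rather than a trained network, and all dimensions and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

class TwoHeadNet:
    """One shared encoder feeding a classification head and a metadata head."""

    def __init__(self, in_dim=8, latent_dim=4, n_classes=3, meta_dim=2):
        self.enc = rng.normal(size=(in_dim, latent_dim))
        self.cls_head = rng.normal(size=(latent_dim, n_classes))
        self.meta_head = rng.normal(size=(latent_dim, meta_dim))

    def forward(self, x):
        z = np.tanh(x @ self.enc)                  # encoder: extracted data
        label = int(np.argmax(z @ self.cls_head))  # classification header
        meta = z @ self.meta_head                  # metadata header
        return z, label, meta

net = TwoHeadNet()
z, label, meta = net.forward(rng.normal(size=8))
print(z.shape, meta.shape)  # → (4,) (2,)
```

The design point the claim turns on is that both heads read the same extracted data, so the encoder's representation must support the label output and the metadata output simultaneously.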
With respect to claim 6, Moradiannejad, Kingetsu, Liu, and Redmon teach the method of claim 5. Liu further teaches wherein the neural network further comprises a decoder configured to restore the corresponding signal image using extracted data (“the method includes determining features of a signal using the neuron network; determining an uncertainty measure of the features for classifying the signal; reconstructing the signal from the features using a decoder neuron network to produce a reconstructed signal; comparing the reconstructed signal with the signal to produce a reconstruction error; combining the uncertainty measure with the reconstruction error to produce a rank of the signal for a necessity of a manual labeling; labeling the signal according to the rank to produce the labeled signal; and training the neuron network and the decoder neuron network using the labeled signal.” Paragraph 0006), and wherein the training of the neural network comprises, for each of the plurality of signal images, training the encoder, the decoder, and the restored corresponding signal image, a classification loss with respect to a training classification label (“the method includes determining features of a signal using the neuron network; determining an uncertainty measure of the features for classifying the signal; reconstructing the signal from the features using a decoder neuron network to produce a reconstructed signal; comparing the reconstructed signal with the signal to produce a reconstruction error; combining the uncertainty measure with the reconstruction error to produce a rank of the signal for a necessity of a manual labeling; labeling the signal according to the rank to produce the labeled signal; and training the neuron network and the decoder neuron network using the labeled signal.” Paragraph 0006).
Redmon further teaches training the classification header, and the metadata header based on a representation loss with respect to the corresponding image (see figure 7) and an output label of the classification header with respect to the corresponding image (“In some implementations, the annotating 430 the image includes writing the metadata to a header of a file storing the image.” Paragraph 0070 and “For example, an indication of this selected class is metadata that may be associated with the image (e.g., as part of a header for the image file or as overlaid text incorporated into the image as it will be displayed.” Paragraph 0070), and a metadata loss with respect to a training metadata (“When the neural network encounters an image labelled for detection, error can be backpropagated based on a full loss function for the neural network.” Paragraph 0064) and an output of the metadata header (“In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102. In some implementations, the annotation includes graphical annotation of the image 102 that alters pixel values to overlay images based on the metadata (e.g., drawing identified bounding boxes for object regions on the image).” Paragraph 0043).

With respect to claim 14, Moradiannejad, Kingetsu, and Liu teach the method of claim 1. Moradiannejad further teaches labeling classifications of each of a plurality of images to generate the respective temporary classified labels (“Embodiments of the present disclosure are directed to a method for detecting annotation errors.
A processor receives an image that includes a first annotation” paragraph 0006); and wherein, when a final epoch of the plurality of epochs is determined to be the final epoch that completes the training of the model (see figure 4), one or more final classified abnormal signal images are identified by corresponding final classification outputs in the final epoch (see figure 4) and by a corresponding final performance of the correcting of the label information (see figure 4), with each epoch of the training of the neural network including a corresponding performance of the correcting of the label information (“In act 404, the QC engine 104 determines whether a label analyzed by one of the classifiers 108 is classified as “incorrect.” If the answer is YES, the QC engine 104 transmits, in act 406, a request to the annotation engine 102 for updating the incorrect label. In one embodiment, the update request is sent if the prediction that the label is incorrect, satisfies (e.g. is above) a threshold confidence level.” Paragraph 0059). Liu teaches signal data labeling (see figure 1C) and training an encoder to perform the extraction of the latent features (“the determining features may be performed by using an encoder neural network. In this case, the encoder neural network can perform feature analysis of given signals.” Paragraph 0025). Redmon teaches wherein the training of the neural network includes training a classification header (“In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102.” Paragraph 0043 and “The image may be annotated 430 based on the selected class.
For example, an indication of this selected class is metadata that may be associated with the image (e.g., as part of a header for the image file or as overlaid text incorporated into the image as it will be displayed.” Paragraph 0070) and a metadata header (“In some implementations, the annotation includes writing the metadata to a header of a file storing the image 102. In some implementations, the annotation includes graphical annotation of the image 102 that alters pixel values to overlay images based on the metadata (e.g., drawing identified bounding boxes for object regions on the image).” Paragraph 0043) of the neural network based on image features (“The system 100 includes a convolutional neural network 110 configured to be applied to an image 102 to determine predictions 120 that include localization data 122 indicating regions within the image that are likely to depict objects of interest and classification data 124 that identifies likely classes for the objects detected in the image.
For example, the localization data 122 may include the specification of one or more bounding boxes that are constrained to be centered within a region of the image corresponding to a cell of a feature map for the image, and coordinates of the one or more bounding boxes within the region are predictions of the convolutional neural network 110 included in the localization data 122.” Paragraph 0031), a classification loss (“In some implementations, the annotating 430 the image includes writing the metadata to a header of a file storing the image.” Paragraph 0070 and “For example, an indication of this selected class is metadata that may be associated with the image (e.g., as part of a header for the image file or as overlaid text incorporated into the image as it will be displayed.” Paragraph 0070), and a metadata loss with respect to a training metadata (“When the neural network encounters an image labelled for detection, error can be backpropagated based on a full loss function for the neural network.” Paragraph 0064). Redmon is analogous art in the same field of endeavor as the claimed invention. Redmon is directed to image classification (“This disclosure relates to image-based object detection and classification.” Paragraph 0002).
A person of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to combine the system of Moradiannejad, Kingetsu, and Liu with Redmon by utilizing the inclusive metadata headers of Redmon within the combined system’s neural network as a way to deliver supplementary information, with the expectation that doing so would result in enabling the system to process large amounts of data more accurately (“to harness the large amount of classification data already available and use them to expand the scope and accuracy of current object detection systems. Some implementations use a hierarchical view of object classification that enables the combination of distinct datasets together for training of convolutional neural networks for object detection and classification.” Paragraph 0025). With respect to claim 18, Moradiannejad, Kingetsu, and Liu teach the electronic device of claim 15 and, in view of Redmon, render obvious all claim limitations in consideration of claim 5, due to claim 18 being directed to an electronic device that performs the process of claim 5. Claims 7-10 and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Moradiannejad, Kingetsu, Liu, and Redmon as applied to claim 5 above, and further in view of Yang (CN 112836088 A). With respect to claim 7, Moradiannejad, Kingetsu, Liu, and Redmon teach the method of claim 5. Liu further teaches wherein the neural network further comprises a decoder configured to restore the corresponding signal image using the extracted data (“reconstructing the signal from the features using a decoder neuron network to produce a reconstructed signal” paragraph 0006), and wherein the training of the neural network comprises: performing a first training of the encoder and the decoder (see figure 1C). 
Redmon teaches training a classification header and a metadata header (“When the neural network encounters an image labelled for detection, error can be backpropagated based on a full loss function for the neural network. When the neural network encounters a classification image, backpropagation of loss may be limited to loss from the classification-specific parts of the architecture.” Paragraph 0064). However, Moradiannejad, Kingetsu, Liu, and Redmon do not teach performing a second training using the first trained encoder, the classification header, and the metadata header. Yang teaches performing a second training (“the method further comprises: training the embedded network and the coding network through the second training data set” page 5 paragraph 3 lines 2-3) using the first trained encoder (“training the embedded network through the first training data set, the coding network and the decoding network, the method further comprises: training the embedded network and the coding network through the second training data set” page 5 paragraph 3 lines 1-3), the classification header (“each second training data comprises a second training video and a second training title corresponding to the second training video” page 5 paragraph 3 lines 4-5), and the metadata header (“obtaining a second training title corresponding to the second training video, and generating respectively vector of each word in the second training title, wherein each word feature vector and each image feature vector have the same dimension; splicing the image feature vector of the plurality of second training image frame and the character feature vector of the second training title, to obtain the second training splicing feature vector sequence” page 5 paragraph 3 lines 10-14). Yang is analogous art in the same field of endeavor as the claimed invention. 
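The selective backpropagation Redmon is cited for above (backpropagating a full loss when an image is labelled for detection, and limiting the loss to classification-specific parts otherwise) can be illustrated with a minimal sketch. The function name and the squared-error stand-ins are assumptions for illustration, not code from any cited reference:

```python
import numpy as np

def training_loss(cls_pred, cls_target, loc_pred=None, loc_target=None):
    """Illustrative loss selection for joint training on mixed datasets:
    an image labelled for detection backpropagates the full loss
    (classification + localization), while a classification-only image
    contributes only the classification term."""
    # Squared error is a stand-in for the actual loss terms.
    cls_loss = float(np.mean((cls_pred - cls_target) ** 2))
    if loc_pred is None or loc_target is None:
        # Classification image: limit the loss to classification parts.
        return cls_loss
    loc_loss = float(np.mean((loc_pred - loc_target) ** 2))
    return cls_loss + loc_loss
```

A detection-labelled image thus accumulates both terms, while a classification image leaves the localization parameters untouched during the backward pass.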
Yang is directed towards labeling sequences of images (“The invention relates to the technical field of deep learning, specifically to a method for generating label corresponding to the video, device and medium.” Page 2 Technical Field). A person of ordinary skill in the art before the effective filing date of the claimed invention would have found it obvious to combine the system of Moradiannejad, Kingetsu, Liu, and Redmon with Yang by utilizing Yang’s multi-training strategy in combination with the combined system’s neural network components, with the expectation that doing so would strengthen the combined system’s characterizing (feature detection) capability and, ultimately, its corresponding labeling component (“…but the used label generating model does not consider the depth interaction of different modal features, for example, when encoding the features of different modalities, and without mutual fusion encoding features, but only after the final encoding and decoding the characteristic of the " shallow " fusion, which undoubtedly weakens the characterizing capability of the model.” Page 2 Background paragraph 3 and “In view of the above situation, it is desirable to provide a new method for generating a label” page 2 Contents of the Invention). With respect to claim 8, Moradiannejad, Kingetsu, Liu, Redmon, and Yang teach the method of claim 7. Liu further teaches wherein the performing of the first training comprises: generating first temporary output data by a decoder header of the decoder based on the extracted data (see figure 1C element SS2), calculating a representation loss based on the calculated first temporary output data and the corresponding signal image (see figure 1C element SS4 and reconstruction error), and performing the first training of only the encoder and the decoder based on the calculated representation loss (see figure 1C element SS7). 
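The two-stage procedure mapped in this rejection (a first training of only the encoder and decoder on a representation loss, then a second training that uses the first-trained encoder together with the classification and metadata headers under a combined loss) can be sketched with toy linear layers. All names, shapes, learning rates, and the choice to update only the headers in the second stage are assumptions for illustration, not the applicant's or any reference's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the claimed components (illustrative only).
enc = 0.1 * rng.normal(size=(8, 3))        # encoder: signal image -> extracted data
dec = 0.1 * rng.normal(size=(3, 8))        # decoder: restores the signal image
cls_head = 0.1 * rng.normal(size=(3, 4))   # classification header
meta_head = 0.1 * rng.normal(size=(3, 2))  # metadata header

x = rng.normal(size=(16, 8))       # toy "signal images"
y_cls = rng.normal(size=(16, 4))   # toy classification targets
y_meta = rng.normal(size=(16, 2))  # toy metadata targets

def mse(a, b):
    return float(np.mean((a - b) ** 2))

lr = 0.05

# First training: only the encoder and decoder are updated, based on
# the representation (reconstruction) loss between the decoder output
# and the input, e.g. until the loss falls below a threshold.
repr_before = mse(x @ enc @ dec, x)
for _ in range(300):
    z = x @ enc
    g = 2.0 * (z @ dec - x) / x.size       # d(repr loss)/d(reconstruction)
    enc -= lr * x.T @ (g @ dec.T)
    dec -= lr * z.T @ g
repr_after = mse(x @ enc @ dec, x)

# Second training: the first-trained encoder feeds the classification
# header and the metadata header; the total loss combines the
# representation, classification, and metadata losses.
z = x @ enc                                # features from the first-trained encoder
total_before = repr_after + mse(z @ cls_head, y_cls) + mse(z @ meta_head, y_meta)
for _ in range(300):
    cls_head -= lr * z.T @ (2.0 * (z @ cls_head - y_cls) / y_cls.size)
    meta_head -= lr * z.T @ (2.0 * (z @ meta_head - y_meta) / y_meta.size)
total_after = repr_after + mse(z @ cls_head, y_cls) + mse(z @ meta_head, y_meta)
```

Both stages reduce their respective losses on this toy data; in a real system each linear map would be a deep network and the second stage might also fine-tune the encoder.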
With respect to claim 9, Moradiannejad, Kingetsu, Liu, Redmon, and Yang teach the method of claim 8. Liu further teaches wherein the performing of the first training comprises performing the first training until a desired performance is achieved (“Usually, this procedure is repeated several times until a predetermined preferred performance is achieved or the budget for annotations is empty” paragraph 0023), with the plurality of signal images being used as training data for the corresponding signal image (see figure 1C), while Moradiannejad specifically teaches that the corresponding calculated representation loss decreases to be less than a threshold loss (“The neural network included in the image tagging assistant 110 may receive training images with erroneous annotations to learn one or more features of such training images to be able to predict, within a certain level of confidence, whether a particular captured image contains a pattern that is difficult to annotate.” Paragraph 0040). With respect to claim 10, Moradiannejad, Kingetsu, Liu, Redmon, and Yang teach the method of claim 8. Liu further teaches signal data labeling (see figure 1C). Redmon further teaches the classification header, and the metadata header based on a representation loss with respect to the corresponding image (see figure 7). 
Yang further teaches wherein the performing of the second training comprises: generating, dependent on another corresponding image being provided to the first trained encoder (“training the embedded network and the coding network through the second training data set” page 5 paragraph 3 lines 2-3), second temporary output data by the classification header (“wherein the second training data set comprises a plurality of second training data, each second training data comprises a second training video and a second training title corresponding to the second training video” page 5 paragraph 3 lines 3-5) and third temporary output data by the metadata header (“generating respectively vector of each word in the second training title, wherein each word feature vector and each image feature vector have the same dimension; splicing the image feature vector of the plurality of second training image frame and the character feature vector of the second training title, to obtain the second training splicing feature vector sequence” page 5 paragraph 3 lines 10-14), respectively; generating a classification loss based on the second temporary output data and a previously classified label of the other corresponding signal image (“based on the similarity between the video vector and the corresponding title vector and the similarity between the video vector and the non-corresponding title vector and the similarity between the title vector and the corresponding video vector and the similarity between the title vector and the non-corresponding video vector, calculating the fifth loss function; based on the fifth loss function, training the embedded network and the coding network.” Page 5 (bottom)-6 (top)), and a metadata loss based on the third temporary output data and previously mapped metadata of the other corresponding signal image (“based on the similarity between the video vector and the corresponding title vector and the similarity between the video vector and the non-corresponding title vector and the 
similarity between the title vector and the corresponding video vector and the similarity between the title vector and the non-corresponding video vector, calculating the fifth loss function; based on the fifth loss function, training the embedded network and the coding network.” Page 5 (bottom)-6 (top)); and performing the second training based on a total loss comprising the calculated representation loss, the calculated classification loss, and the calculated metadata loss (“based on the similarity between the video vector and the corresponding title vector and the similarity between the video vector and the non-corresponding title vector and the similarity between the title vector and the corresponding video vector and the similarity between the title vector and the non-corresponding video vector, calculating the fifth loss function; based on the fifth loss function, training the embedded network and the coding network.” Page 5 (bottom)-6 (top)). With respect to claim 19, Moradiannejad, Kingetsu, Liu, and Redmon teach the electronic device of claim 18 and, in view of Yang, render obvious all claim limitations in consideration of claim 7, due to claim 19 being directed to a device that performs the process of claim 7. With respect to claim 20, Moradiannejad, Kingetsu, Liu, Redmon, and Yang teach the electronic device of claim 19 and render obvious all claim limitations in consideration of claim 8, due to claim 20 being directed to a device that performs the process of claim 8. With respect to claim 21, Moradiannejad, Kingetsu, Liu, Redmon, and Yang teach the electronic device of claim 20 and render obvious all claim limitations in consideration of claim 10, due to claim 21 being directed to a device that performs the process of claim 10. Additionally, Moradiannejad teaches a processor (“Embodiments of the present disclosure are directed to a method for detecting annotation errors. 
A processor receives an image that includes a first annotation, and identifies a first classifier associated with the first annotation. The processor invokes the first classifier to classify the first annotation, where the first annotation is classified with a first label. The processor transmits a message in response to classifying the first annotation with the first label, where the message is for prompting an update to the first annotation. The processor receives the image with an updated first annotation, and saves the image with the updated first annotation in a data storage device. The image may be for training an artificial intelligence machine for conducting an automated task.” paragraph 0006). Allowable Subject Matter Claims 11-13 and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Hao (US 20220138935 A1) – discloses an imbalanced dataset classification model Teshima (US 11250295 B2) – discloses cluster based labeling and neural network training and finetuning based on feature similarity. Kim (US 20200394526 A1) – discloses a system utilizing a neural network, encoder, and decoder that detects (classifies) abnormal signal data based on a threshold. Any inquiry concerning this communication or earlier communications from the examiner should be directed to REBECCA C WILLIAMS whose telephone number is (571)272-7074. The examiner can normally be reached M-F 7:30am - 4:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew W Bee can be reached at (571)270-5183. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /REBECCA COLETTE WILLIAMS/Examiner, Art Unit 2677 /ANDREW W BEE/Supervisory Patent Examiner, Art Unit 2677

Prosecution Timeline

Jan 09, 2024
Application Filed
Mar 17, 2026
Non-Final Rejection — §103 (current)


