Embodiments of the disclosure disclose a method and an apparatus for facial beauty prediction, a device and a storage medium. The method includes the steps of: classifying a training set into a plurality of first images with noise labels and a plurality of second images with non-noise labels; performing re-weighting processing on the plurality of second images; training a first model by using a target training set to obtain a second model; labeling image data by using the second model to obtain first data with labels and second data without labels; generating third data with pseudo labels through a classifier by using the second data; training the classifier according to the first data, the second data and the third data; and processing an image to be predicted through a model with a target classifier to obtain a facial beauty prediction result.
CROSS-REFERENCE TO RELATED APPLICATION
This application is a national stage filing under 35 U.S.C. § 371 of international application number PCT/CN2023/095555, filed May 22, 2023, which claims priority to Chinese patent application No. 202310525485.5 filed May 10, 2023. The entire contents of these applications are incorporated herein by reference in their entirety.
FIELD OF THE INVENTION
Embodiments of the disclosure relate to, but are not limited to, the field of image recognition, in particular to a method and an apparatus for facial beauty prediction, a device and a storage medium.
BACKGROUND OF THE INVENTION
Related facial beauty prediction methods usually require a large amount of labeled data for model training. Meanwhile, when samples are labeled, the label quality is influenced by subjective factors of manual or machine data labeling, the labeling tools and techniques, and other factors, thereby introducing label noise. The label noise problem can greatly reduce the accuracy of a model and degrade the facial beauty prediction effect.
SUMMARY OF THE INVENTION
The following is a summary of the subject matter described in detail herein. The summary is not intended to limit the scope of protection of the claims.
The embodiments of the disclosure disclose a method and an apparatus for facial beauty prediction, a device and a storage medium, which can weaken the dependence of a model on noise labels and enhance the utilization effect of unlabeled data.
According to an embodiment of a first aspect of the present disclosure, a facial beauty prediction method may include following steps.
A training set of facial beauty prediction images is classified to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels.
A re-weighting processing is performed on the plurality of the second images.
A target training set is formed by the plurality of the first images and the plurality of the re-weighted second images, and a first model is trained by using the target training set to obtain a second model.
A labeling processing is performed on image data by using the second model to obtain first data with labels and second data without labels.
The second data is trained through a classifier to generate third data with pseudo labels.
The classifier is trained according to the first data, the second data and the third data to obtain a target classifier.
A facial beauty prediction is performed on an image to be predicted through a facial beauty prediction model with the target classifier to obtain a facial beauty prediction result.
In some embodiments of the first aspect of the present disclosure, the classifying a training set of facial beauty prediction images to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels may include following sub-steps.
A probability calculation is performed according to the training set of the facial beauty prediction images to obtain a plurality of probability values each of which indicates whether a corresponding facial beauty prediction image has a noise label.
A joint distribution of the training set is obtained according to the plurality of probability values.
The training set is classified according to the joint distribution to obtain the plurality of first images with noise labels and the plurality of second images with non-noise labels.
In some embodiments of the first aspect of the present disclosure, the classifying the training set according to the joint distribution to obtain the plurality of first images with noise labels and the plurality of second images with non-noise labels may include following sub-steps.
An expected risk of classification is obtained according to the joint distribution of the training set, a real distribution of the training set, a weight of the facial beauty prediction image and a loss function.
The training set is classified according to the expected risk to obtain the plurality of first images with noise labels and the plurality of second images with non-noise labels.
In some embodiments of the first aspect of the present disclosure, the weight of the facial beauty prediction image is determined by the joint distribution of the training set and a noise rate.
In some embodiments of the first aspect of the present disclosure, the noise rate is a minimum value of the joint distribution of the training set within a preset range.
In some embodiments of the first aspect of the present disclosure, when a value of the joint distribution of the training set is not equal to 0, the weight is not negative; when the value of the joint distribution of the training set is equal to 0, the weight is equal to 0.
In some embodiments of the first aspect of the present disclosure, the training the classifier according to the first data, the second data and the third data to obtain a target classifier may include following sub-step.
The classifier is trained according to the first data, the second data and the third data until the classifier converges to the target classifier.
According to an embodiment of a second aspect of the present disclosure, an apparatus for facial beauty prediction may include following units.
A re-weighting unit is configured to classify a training set of facial beauty prediction images to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels, perform a re-weighting processing on the plurality of the second images, form a target training set by the plurality of the first images and the plurality of the re-weighted second images, and train a first model by using the target training set to obtain a second model.
A self-training unit is configured to perform labeling processing on image data by using the second model to obtain first data with labels and second data without labels, train the second data through a classifier to generate third data with pseudo labels, train the classifier according to the first data, the second data and the third data to obtain a target classifier.
A prediction unit is configured to perform facial beauty prediction on an image to be predicted through a facial beauty prediction model with the target classifier to obtain a facial beauty prediction result.
According to an embodiment of a third aspect of the disclosure, an electronic device may include: a memory, a processor and a computer program stored on the memory and executable on the processor. The computer program is executed by the processor to implement the facial beauty prediction method as described above.
According to an embodiment of a fourth aspect of the present disclosure, a computer-readable storage medium stores computer-executable instructions for performing the facial beauty prediction method as described above.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are used to provide a further understanding of the technical schemes of the present disclosure and constitute a part of the specification. They are used in conjunction with the embodiments of the present disclosure to explain the technical schemes thereof, and do not constitute a limitation on the technical schemes of the present disclosure.
FIG. 1 is a flowchart of a facial beauty prediction method according to an embodiment of the present disclosure;
FIG. 2 is a sub-flowchart of step S100;
FIG. 3 is a sub-flowchart of step S130;
FIG. 4 is a structure diagram of a facial beauty prediction apparatus according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In order to make the purposes, technical schemes and advantages of the present disclosure clearer, the present disclosure is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to interpret the disclosure, and shall not be construed to limit the disclosure.
It is noted that while functional module divisions are shown in the diagram of the apparatus and logical sequences are shown in the flowcharts, in some cases, the module divisions of the apparatus may be different, and the steps shown or described may be performed in sequences other than those in the flowcharts. Terms such as “first”, “second” and the like in the description, the claims, or the accompanying drawings are used for distinguishing similar objects and not necessarily for describing a particular sequential or chronological order.
The embodiments of the present disclosure will be further explained with reference to the accompanying drawings.
An embodiment of the disclosure discloses a method for facial beauty prediction.
Referring to FIG. 1, the facial beauty prediction method includes following steps.
In a step of S100, a training set of facial beauty prediction images is classified to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels.
In a step of S200, a re-weighting processing is performed on the plurality of the second images.
In a step of S300, a target training set is formed by the plurality of the first images and the plurality of the re-weighted second images, and a first model is trained by using the target training set to obtain a second model;
In a step of S400, a labeling processing is performed on image data by using the second model to obtain first data with labels and second data without labels;
In a step of S500, the second data is trained through a classifier to generate third data with pseudo labels;
In a step of S600, the classifier is trained according to the first data, the second data and the third data to obtain a target classifier.
In a step of S700, a facial beauty prediction is performed on an image to be predicted through a facial beauty prediction model with the target classifier to obtain a facial beauty prediction result.
Firstly, the training set is obtained. The training set includes a plurality of facial images, which are used as the facial beauty prediction images.
For example, a Large Scale Asian Facial Beauty Database (LSAFBD) may be used to obtain the training set of the facial beauty prediction images.
Referring to FIG. 2, in the step S100, the classifying a training set of facial beauty prediction images to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels, includes, but is not limited to, the following steps.
In a step of S110, a probability calculation is performed according to the training set of the facial beauty prediction images to obtain a plurality of probability values each of which indicates whether a corresponding facial beauty prediction image has a noise label.
In a step of S120, a joint distribution of the training set is obtained according to the plurality of probability values.
In a step of S130, the training set is classified according to the joint distribution to obtain the plurality of the first images with noise labels and the plurality of the second images with non-noise labels.
It should be understood that a joint distribution function is also called a multidimensional distribution function. Taking the two-dimensional case as an example, if (X, Y) is a two-dimensional random variable, and x and y are any real numbers, then the binary function F(x, y)=P({X≤x}∩{Y≤y})=P(X≤x, Y≤y) is called the distribution function of the two-dimensional random variable (X, Y), or the joint distribution function of X and Y.
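As an illustrative, non-limiting example, the joint distribution function F(x, y) above may be estimated empirically from paired samples as the fraction of pairs satisfying both inequalities. The following Python sketch uses hypothetical sample data:

```python
import numpy as np

def empirical_joint_cdf(xs, ys, x, y):
    """Empirical joint distribution F(x, y) = P(X <= x, Y <= y),
    estimated as the fraction of sample pairs with X <= x and Y <= y."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    return float(np.mean((xs <= x) & (ys <= y)))

# Hypothetical paired samples (X, Y).
xs = [0.1, 0.4, 0.6, 0.9]
ys = [0.2, 0.3, 0.8, 0.5]

print(empirical_joint_cdf(xs, ys, 0.5, 0.5))  # → 0.5 (two of four pairs qualify)
```

This estimator converges to the true joint distribution function as the number of sample pairs grows.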
Referring to FIG. 3, in the step S130, the classifying the training set according to the joint distribution to obtain the plurality of first images with noise labels and the plurality of second images with non-noise labels, includes following steps.
In a step S131, an expected risk of classification is obtained according to the joint distribution of the training set, a real distribution of the training set, a weight of the facial beauty prediction image and a loss function.
In a step S132, the training set is classified according to the expected risk to obtain the plurality of first images with noise labels and the plurality of second images with non-noise labels.
For step S131, the expected risk of classification is obtained according to the joint distribution of the training set, the real distribution of the training set, the weight of the facial beauty prediction image and the loss function.
Further, a noise label data classification algorithm is disclosed. Firstly, the expected risk of classification should satisfy the following relational expression:
$$R_{l,D}(f)=R[D,f,l]=\mathbb{E}_{(X,Y)\sim D}\left[l(f(X),Y)\right]=\mathbb{E}_{(X,\hat{Y})\sim D_{\rho}}\left[\frac{P_{D}(X,Y)}{P_{D_{\rho}}(X,\hat{Y})}\,l(f(X),\hat{Y})\right]=R\left[D_{\rho},f,\frac{P_{D}(X,Y)}{P_{D_{\rho}}(X,\hat{Y})}\,l(f(X),\hat{Y})\right]=R\left[D_{\rho},f,\beta(X,\hat{Y})\,l(f(X),\hat{Y})\right]=R_{\beta l,D_{\rho}}(f);$$
where, l denotes the loss function, PD(X,Y) denotes the real distribution of the training set, PDρ(X,Ŷ) denotes the joint distribution of the training set, and β(X,Ŷ) denotes the weight of the facial beauty prediction image.
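As an illustrative, non-limiting example, the reweighted risk above may be approximated empirically by scaling each sample's loss by its weight β(X,Ŷ) and averaging. The loss values and weights in the following Python sketch are hypothetical:

```python
import numpy as np

def reweighted_empirical_risk(losses, betas):
    """Approximate the reweighted risk: the mean of per-sample losses
    l(f(X), Ŷ) computed on noisy labels, each scaled by its importance
    weight β(X, Ŷ)."""
    losses = np.asarray(losses, dtype=float)
    betas = np.asarray(betas, dtype=float)
    return float(np.mean(betas * losses))

# Hypothetical per-sample losses on noisy labels and their weights;
# the likely-noisy sample (large loss) receives a small weight.
losses = [0.2, 1.5, 0.7, 0.1]
betas = [1.0, 0.2, 0.8, 1.0]
print(reweighted_empirical_risk(losses, betas))  # → 0.29
```

Minimizing this weighted average approximates minimizing the expected risk under the clean distribution, even though the losses are computed on noisy labels.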
For example, the weight of the facial beauty prediction image is determined by the joint distribution of the training set and a noise rate, i.e.
$$\beta(X,\hat{Y})=\frac{P_{D_{\rho}}(\hat{Y}\mid X)-\rho_{-\hat{Y}}}{\left(1-\rho_{-\hat{Y}}\right)P_{D_{\rho}}(\hat{Y}\mid X)};$$
where, β(X,Ŷ) is the weight of the facial beauty prediction image.
The noise rate is a minimum value of the joint distribution of the training set within a preset range. The noise rate can be expressed by the following equation:
$$\rho_{-\hat{Y}}=\min_{X\in\chi}P_{D_{\rho}}\left(\hat{Y}\mid X\right).$$
When a value of the joint distribution of the training set is not equal to 0, the weight is not negative; otherwise, when the value of the joint distribution of the training set is equal to 0, the weight is equal to 0.
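As an illustrative, non-limiting example, and assuming the weight takes the reconstructed form β = (p − ρ)/((1 − ρ)·p) with ρ estimated as the minimum posterior over the samples, the noise rate and weights may be sketched in Python as follows (the posterior values are hypothetical):

```python
import numpy as np

def noise_rate_and_weights(posteriors):
    """posteriors[i] approximates PDρ(Ŷ|X_i) for sample i.
    The noise rate ρ is taken as the minimum posterior over the samples,
    and each weight is (p - ρ) / ((1 - ρ) * p). Because ρ is the minimum,
    (p - ρ) >= 0, so the weight is never negative, and the weight is
    defined as 0 where the posterior is 0."""
    p = np.asarray(posteriors, dtype=float)
    rho = float(p.min())  # noise rate: minimum of the posterior over X
    weights = np.where(p > 0,
                       (p - rho) / ((1 - rho) * np.maximum(p, 1e-12)),
                       0.0)
    return rho, weights

rho, w = noise_rate_and_weights([0.9, 0.6, 0.3, 0.3])
print(rho)  # → 0.3
print(w)    # non-negative weights; minimum-posterior samples get weight 0
```

This reflects the property stated above: the weight is non-negative whenever the distribution value is non-zero, and zero otherwise.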
In the step S200, the plurality of the second images with non-noise labels are subjected to the re-weighting processing.
Generally, the re-weighting processing may be configured to increase or decrease the weights of the plurality of second images with non-noise labels or of the plurality of first images with noise labels, or to filter such images out. The weights of the labeled data are adjusted by the re-weighting processing.
Re-weighting means that the training data with noise labels is weighted again, so that correct data obtains more weight, thereby improving the accuracy and robustness of the model. Noise labels are typically generated by manual or automatic labeling and therefore have a negative influence on model performance. The labeled data needs to be detected and re-weighted to reduce this influence on the model. A specific strategy may be selected according to the data set and the model requirements; for example, a probability-modeling-based method such as deep learning may be adopted, so that the influence of noise labels is effectively reduced and the model precision is improved.
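As an illustrative, non-limiting example, the three strategies described above (increasing weights of non-noise samples, decreasing weights of noise samples, or filtering noise samples out) may be sketched in Python as follows; the scaling factors are hypothetical:

```python
import numpy as np

def adjust_weights(weights, is_noisy, up=1.5, down=0.5, filter_noisy=False):
    """Apply one of three re-weighting strategies: up-weight non-noise
    samples, down-weight noise samples, or filter noise samples out
    entirely by setting their weight to 0."""
    w = np.asarray(weights, dtype=float).copy()
    noisy = np.asarray(is_noisy, dtype=bool)
    w[~noisy] *= up  # increase the weight of samples with non-noise labels
    # Either remove noisy samples (weight 0) or merely reduce their weight.
    w[noisy] = 0.0 if filter_noisy else w[noisy] * down
    return w

w = adjust_weights([1.0, 1.0, 1.0], [False, True, False], filter_noisy=True)
print(w)  # → [1.5 0.  1.5]
```

The choice between down-weighting and filtering can then be made per data set, as the text notes, according to the model requirements.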
In addition, the plurality of the first images with noise labels are subjected to the filtering processing.
In the step S300, both the plurality of the first images and the plurality of the re-weighted second images form the target training set, and the first model is trained by using the target training set to obtain the second model.
The model can accurately find the plurality of the first images with noise labels and accurately estimate a joint distribution of the plurality of the first images with noise labels and the plurality of the second images with non-noise labels.
In the step S400, the image data is labeled by using the second model to obtain the first data with labels and the second data without labels.
In the step S500, the second data is trained through the classifier to generate the third data with pseudo labels, where the classifier is obtained by training on the first data with labels.
In the step S600, the training the classifier according to the first data, the second data and the third data to obtain the target classifier includes, but is not limited to, the following step.
The classifier is trained according to the first data, the second data and the third data until the classifier converges to be the target classifier.
According to the above semi-supervised learning, the model is trained and optimized with the unlabeled data. The basic idea is that a basic model is first trained on the labeled data, and the model is then configured to predict the unlabeled data. A part of the unlabeled data with a high confidence coefficient is selected as new labeled data, and the training process is repeated until the model converges or a preset number of iterations is reached. The advantages of the algorithm are that the unlabeled data can be fully utilized and the generalization performance of the model is improved; the algorithm is particularly suitable for applications with little labeled data.
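As an illustrative, non-limiting example, the self-training loop described above may be sketched in Python as follows. A simple nearest-centroid classifier stands in for the generic classifier, and the data points, confidence threshold and round limit are hypothetical:

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Self-training sketch: fit a nearest-centroid classifier on the
    labeled data, pseudo-label the unlabeled points whose confidence
    exceeds the threshold, fold them into the labeled set, and repeat
    until no confident point remains or the round limit is reached."""
    X_lab, y_lab = np.asarray(X_lab, float), np.asarray(y_lab)
    X_unlab = np.asarray(X_unlab, float)
    for _ in range(max_rounds):
        if len(X_unlab) == 0:
            break
        # "Train": one centroid per class.
        classes = np.unique(y_lab)
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        # Confidence: softmax over negative distances to the centroids.
        d = np.linalg.norm(X_unlab[:, None, :] - centroids[None, :, :], axis=2)
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
        conf, idx = p.max(axis=1), p.argmax(axis=1)
        keep = conf >= threshold
        if not keep.any():
            break
        # Add high-confidence pseudo-labeled samples to the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, classes[idx[keep]]])
        X_unlab = X_unlab[~keep]
    return X_lab, y_lab

# Two labeled seeds, three unlabeled points; the ambiguous midpoint
# never reaches the confidence threshold and stays unlabeled.
X2, y2 = self_train([[0.0, 0.0], [4.0, 4.0]], [0, 1],
                    [[0.2, 0.1], [3.9, 4.2], [2.0, 2.0]], threshold=0.8)
```

Note how only confident predictions become new labeled data, matching the description above: uncertain samples are left out rather than propagated as potential noise.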
In the step S700, facial beauty prediction is performed on an image to be predicted through the facial beauty prediction model with the target classifier to obtain a facial beauty prediction result.
The method for facial beauty prediction can utilize noise label re-weighting learning to perform the re-weighting processing on the training set, thereby reducing dependence of the model on the noise labels, and improving a classification performance and a generalization capability of the model. In a model self-training process, samples with uncertain classification, as noise labels, are filtered out or subjected to a label re-weighting process, so that only samples with high confidence coefficient are reserved for model training. A utilization effect of unlabeled data is enhanced. Precision of the model is improved in a self-adaptive manner, and an overfitting problem is avoided.
In addition, when prediction is carried out by using the unlabeled data, a prediction result with high confidence coefficient is used as newly added labeled data and is added into the training data, thereby improving a utilization rate of the samples and the performance of the model. In an iteration process, labeled data are continuously added, and a data labeling quality is gradually improved, so that the model has strong robustness.
In addition, for factors such as label exceptions or noises in the data, the algorithm can automatically adjust the model to ensure the stability and the accuracy of the algorithm. Meanwhile, the algorithm is based on a simple and efficient algorithm framework, and has a relatively high running speed.
An embodiment of the disclosure discloses an apparatus for facial beauty prediction.
Referring to FIG. 4, the facial beauty prediction apparatus includes: a re-weighting unit 10, a self-training unit 20 and a prediction unit 30.
The re-weighting unit 10 is configured to classify a training set of facial beauty prediction images to obtain a plurality of first images with noise labels and a plurality of second images with non-noise labels, perform a re-weighting processing on the plurality of second images, form a target training set by the plurality of first images and the plurality of the re-weighted second images, and train a first model by using the target training set to obtain a second model.
The self-training unit 20 is configured to perform labeling processing on image data by using the second model to obtain first data with labels and second data without labels, train the second data through a classifier to generate third data with pseudo labels, and train the classifier according to the first data, the second data and the third data to obtain a target classifier.
The prediction unit 30 is configured to perform facial beauty prediction on an image to be predicted through a facial beauty prediction model with the target classifier to obtain a facial beauty prediction result.
It can be understood that the apparatus for facial beauty prediction in this embodiment adopts the facial beauty prediction method described above. Each unit of the apparatus corresponds to a step of the method described above, so the apparatus solves the same technical problems as the facial beauty prediction method and has the same beneficial effects.
An embodiment of the present disclosure discloses an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor. The computer program is executed by the processor to implement the method for facial beauty prediction as described above.
The electronic device can be any intelligent terminal including a tablet computer, a vehicle-mounted computer and the like.
The above schemes at least have the following beneficial effects. Noise label re-weighting learning can be utilized to perform a re-weighting processing on a training set, so that dependence of a model on the noise label is weakened, and a classification performance and a generalization capability of the model are improved. In a model self-training process, samples with uncertain classification are taken as noise labels, and a filtering or a label re-weighting processing is carried out, so that only samples with high confidence coefficient are reserved for model training. A utilization effect of unlabeled data is enhanced. Precision of the model is improved in a self-adaptive manner, and an overfitting problem is avoided. In addition, when prediction is carried out by using the unlabeled data, a prediction result with high confidence coefficient is used as newly added labeled data and is added into the training data, so that a utilization rate of the samples is improved, and a performance of the model is improved. In an iteration process, labeled data are continuously added, and a data labeling quality is gradually improved, so that the model has strong robustness. In addition, for factors such as label exceptions or noises in the data, the algorithm can automatically adjust the model, and a stability and an accuracy of the algorithm are ensured. Meanwhile, the algorithm is based on a simple and efficient algorithm framework, and has a relatively high running speed.
Generally, for a hardware structure of the electronic device, the processor may be implemented in the form of universal CPU (Central Processing Unit), microprocessor, ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical schemes disclosed in the embodiments of the present disclosure.
The memory may be implemented in the form of ROM (Read Only Memory), static storage device, dynamic storage device, or RAM (Random Access Memory). The memory can store an operating system and other application programs. When the technical schemes disclosed in the embodiments of the present disclosure are implemented by software or firmware, relevant program codes are stored in the memory and called by the processor to execute the method of the embodiments of the present disclosure.
An input or output interface is configured to realize information input and output.
A communication interface is used for realizing communication interaction between the device and other devices, and can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WiFi®, Bluetooth® and the like).
A bus transfers information between various components of the device, such as the processor, the memory, the input or output interface, and the communication interface. The processor, the memory, the input or output interface and the communication interface are communicatively connected to each other within the device via a bus.
An embodiment of the present disclosure discloses a computer-readable storage medium, which stores computer-executable instructions for performing the method for facial beauty prediction as described above.
It should be recognized that the steps of the method in the embodiments of the present disclosure may be implemented or carried out in computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a specific programmed integrated circuit for this purpose.
Further, operations of the processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as codes (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the methods may be implemented in any type of computing platform operatively connected to a suitable form, including but not limited to personal computer, smartphone, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging devices, or the like. Aspects of the disclosure may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated onto a computing platform, such as a hard disk, an optically read and/or write storage medium, RAM, ROM, etc., so that it is readable by a programmable computer. When the storage medium or device is read by the computer, the computer is configured and operated to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The disclosure described herein includes these and other different types of non-transitory computer-readable storage medium when such medium includes instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The disclosure also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein to transform input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the disclosure, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
While embodiments of the present disclosure have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and variations can be made to the embodiments without departing from the principles and spirit of the disclosure. The scope of the disclosure is defined by the claims and their equivalents.
The above is a detailed description of preferred embodiments of the disclosure, but the disclosure is not limited to these embodiments. Those having ordinary skill in the art can make various equivalent variations or substitutions without departing from the disclosure, and these equivalent variations or substitutions are included in the scope defined by the embodiments.