Deep learning-based multi-attribute evaluation for holistic student assessment in physical education

The Multi-Attribute Evaluation Model (MAEM) for physical education combines different parameters of student development using data-driven tools and deep learning algorithms. The key steps involved in developing and implementing the proposed model are described below.
Problem definition and framework design
The objective is to compute a holistic evaluation score \({H}_{e}\) for each student based on multiple attributes.
$${H}_{e}=\sum_{j=1}^{N}{W}_{j}{S}_{e,j}$$
(1)
where:
- \({H}_{e}\): Holistic evaluation score for student e
- N: Number of attributes (e.g., Physical, Cognitive, Emotional)
- \({W}_{j}\): Weight assigned to the j-th attribute
- \({S}_{e,j}\): Score of the j-th attribute for student e
This equation combines multiple attributes into a single holistic score for each student, where the weight of each attribute reflects its importance in the overall evaluation.
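As a minimal sketch of Eq. (1), assuming illustrative weights and attribute scores that are not taken from the paper, the weighted sum can be computed directly:

```python
import numpy as np

# Hypothetical attribute weights W_j (illustrative only; the paper does not fix these values).
weights = np.array([0.3, 0.25, 0.25, 0.2])   # should sum to 1
scores = np.array([0.8, 0.6, 0.9, 0.7])      # S_{e,j} for one student e

# Eq. (1): holistic evaluation score H_e as the weighted sum of attribute scores.
H_e = np.dot(weights, scores)
print(f"Holistic score H_e = {H_e:.3f}")
```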
Data collection
For multi-source data aggregation, the collected dataset Y can be represented as:
$$Y=\bigcup_{i=1}^{M}{Y}_{i}$$
(2)
where:
- M: Number of sources (e.g., surveys, tests, external assessments)
- \({Y}_{i}\): Data from the i-th source
This equation demonstrates how various data sources are combined into a unified dataset for student evaluation.
If multiple scores are available for the same attribute, aggregated scores are computed as:
$${Q}_{m,k}= \frac{1}{M}\sum_{i=1}^{M}{Q}_{m,k}^{i}$$
(3)
where:
- \({Q}_{m,k}\): Aggregated score of the k-th attribute for student m
- \({Q}_{m,k}^{i}\): Score of the k-th attribute for student m from the i-th source
This aggregation step ensures that scores from various sources are averaged to give a balanced evaluation of each attribute for each student. Figure 1 shows some data collected from different sources.

Data collected from different sources.
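To make Eqs. (2) and (3) concrete, the following sketch unions two sources and averages the available scores per student; the source tables, column names, and values are hypothetical:

```python
import pandas as pd

# Hypothetical per-source score tables Y_i; names and values are illustrative only.
survey = pd.DataFrame({"student": [1, 2], "Physical": [0.7, 0.6]})
test = pd.DataFrame({"student": [1, 2], "Physical": [0.9, 0.5]})

# Eq. (2): union of the M sources into a single dataset Y.
Y = pd.concat([survey, test], ignore_index=True)

# Eq. (3): average the scores available for each student/attribute pair.
Q = Y.groupby("student").mean()
print(Q)
```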
Preprocessing and feature engineering
Handling missing values
For numerical attributes, missing values are filled using the mean:
$${y}_{m,n}=\frac{\sum_{i=1}^{M}{N}_{i,n}\cdot {y}_{i,n}}{\sum_{i=1}^{M}{N}_{i,n}}$$
(4)
where:
- \({y}_{m,n}\): Filled value for the n-th attribute of student m
- \({N}_{i,n}\): Binary mask indicating whether the i-th value is missing (0 if missing, 1 otherwise)
This equation describes how missing values are filled using the mean, ensuring no data is lost during preprocessing. For categorical attributes, missing values are filled using the mode.
Figure 2 shows the number of missing values per attribute, highlighting the proportion of missing data handled during preprocessing and indicating which attributes required the most attention during cleaning. The highest counts of missing values are observed for Age, Motivation, and Student Type. As described above, numerical attributes were imputed with the mean and categorical attributes with the mode, so no records were lost during preprocessing.

Number of missing values.
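A minimal sketch of the imputation step, using invented values and attribute names modeled on Fig. 2; pandas' NaN-skipping mean plays the role of the binary mask \({N}_{i,n}\) in Eq. (4):

```python
import pandas as pd

# Toy frame with gaps; attribute names follow Fig. 2, but the values are invented.
df = pd.DataFrame({
    "Age": [15, None, 16],
    "Motivation": [0.8, 0.6, None],
    "StudentType": ["regular", None, "athlete"],
})

# Eq. (4): numerical gaps take the column mean (NaNs are skipped when averaging,
# which is equivalent to applying the binary mask N_{i,n}).
for col in df.select_dtypes(include="number"):
    df[col] = df[col].fillna(df[col].mean())

# Categorical gaps take the column mode, as described in the text.
for col in df.select_dtypes(exclude="number"):
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```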
Normalization
Numerical features are scaled to the range [0, 1], as presented in Fig. 3:

$${\widehat{y}}_{m,n}=\frac{{y}_{m,n}-\text{min}({y}_{n})}{\text{max}\left({y}_{n}\right)-\text{min}({y}_{n})}$$
(5)
This normalization step ensures that all numerical features are scaled to a uniform range, making them comparable across different attributes.
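A small sketch of Eq. (5) on illustrative raw values:

```python
import numpy as np

def min_max_scale(y: np.ndarray) -> np.ndarray:
    """Eq. (5): scale a numerical feature column to [0, 1]."""
    return (y - y.min()) / (y.max() - y.min())

scores = np.array([42.0, 55.0, 70.0, 63.0])  # illustrative raw values
print(min_max_scale(scores))                 # -> [0.0, ~0.464, 1.0, 0.75]
```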
Feature extraction
New features are derived as combinations of existing attributes, e.g.:
$$\text{Leadership\_Teamwork}=(\text{Leadership}+\text{Teamwork})/2$$
(6)
$$\text{Social\_Creativity}=(\text{Social}+\text{Creativity})/2$$
(7)
These derived features combine existing attributes to capture meaningful relationships between different aspects of student development. Figure 4 shows boxplots of the extracted features Leadership_Teamwork and Social_Creativity, illustrating their spread, medians, and outliers. The Leadership_Teamwork feature has a wider spread, indicating greater variability in scores, while Social_Creativity shows several pronounced outliers, suggesting that some students scored very differently from the group median. These patterns show how the derived features reveal key insights into students' multi-dimensional development.

Boxplot of extracted features.
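Eqs. (6) and (7) reduce to simple column arithmetic; the values below are illustrative:

```python
import pandas as pd

# Illustrative attribute scores; column names follow Eqs. (6)-(7).
df = pd.DataFrame({
    "Leadership": [0.8, 0.4], "Teamwork": [0.6, 0.9],
    "Social": [0.7, 0.5], "Creativity": [0.9, 0.3],
})

# Eqs. (6)-(7): derived features as pairwise means of existing attributes.
df["Leadership_Teamwork"] = (df["Leadership"] + df["Teamwork"]) / 2
df["Social_Creativity"] = (df["Social"] + df["Creativity"]) / 2
print(df)
```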
Development of deep learning models
The deep learning model processes numerical and categorical inputs through a unified architecture.
Input representation
Let \({Y}_{nni}\) denote the normalized numerical inputs and \({Y}_{ohe}\) the one-hot encoded categorical inputs, with \({X}_{nni}\), \({X}_{ohe}\) the corresponding weight matrices and \({a}_{nni}\), \({a}_{ohe}\) the bias vectors.
Model structure
$$L_{nni} = {\text{ReLU}}\left( {X_{nni} Y_{nni} + a_{nni} } \right)$$
(8)
$$L_{ohe} = {\text{ReLU}}\left( {X_{ohe} Y_{ohe} + a_{ohe} } \right)$$
(9)
These equations describe how normalized numerical and one-hot encoded categorical data are processed through ReLU activation functions, allowing the model to learn complex, non-linear relationships between the attributes.
The combined representation is processed through dense layers:
$$L=\text{ReLU}\left({X}_{oa}\left[{L}_{nni};{L}_{ohe}\right]+{a}_{oa}\right)$$
(10)
This equation shows how the outputs from numerical and categorical inputs are combined and passed through dense layers, enabling the model to capture interactions between different data types.
Output
The final output layer predicts the holistic score S:
$$O=\text{Softmax}\left({X}_{oa}L+{a}_{oa}\right)$$
(11)
The softmax function in this equation converts the raw output into a probability distribution, helping the model predict a holistic score that sums up the contributions of all attributes.
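One possible realisation of Eqs. (8) through (11) is a two-branch network; the layer sizes, input dimensions, class count, and framework choice (Keras) below are assumptions for illustration, not specifications from the paper:

```python
from tensorflow.keras import layers, Model

# Assumed dimensions (illustrative only).
NUM_NUMERIC, NUM_ONEHOT, NUM_CLASSES = 8, 12, 5

# Two input branches: normalized numerical data and one-hot encoded categorical data.
num_in = layers.Input(shape=(NUM_NUMERIC,), name="numerical")
cat_in = layers.Input(shape=(NUM_ONEHOT,), name="categorical")

# Eqs. (8)-(9): branch-specific dense layers with ReLU activations.
l_nni = layers.Dense(32, activation="relu")(num_in)
l_ohe = layers.Dense(32, activation="relu")(cat_in)

# Eq. (10): concatenate both representations and process them jointly.
merged = layers.Concatenate()([l_nni, l_ohe])
l = layers.Dense(64, activation="relu")(merged)

# Eq. (11): softmax output over score classes.
out = layers.Dense(NUM_CLASSES, activation="softmax")(l)

model = Model(inputs=[num_in, cat_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```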
Multi-attribute scoring and assessment
Scores are aggregated using weighted sums:
$${H}_{e}=\sum_{j=1}^{N}{W}_{j}{\widehat{S}}_{e,j}$$
(12)
where \({\widehat{S}}_{e,j}\) is the predicted score of the j-th attribute for student e and \({W}_{j}\) is the corresponding attribute weight from Eq. (1).
This aggregation process calculates the final holistic score by weighting and summing the contributions of each attribute.
Feedback mechanism
The gap \({D}_{m,j}\) between the target and actual scores is calculated for each attribute:
$${D}_{m,j}={P}_{j}-{\widehat{S}}_{m,j}$$
(13)
where:
- \({P}_{j}\): Target score for the j-th attribute
- \({\widehat{S}}_{m,j}\): Predicted or actual score for the j-th attribute of student m
This feedback mechanism calculates the difference between the predicted and target scores, providing insights into areas that require improvement.
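A minimal sketch of Eq. (13), with hypothetical target and predicted scores:

```python
import numpy as np

# Hypothetical target profile P_j and predicted scores for one student (illustrative).
targets = np.array([0.9, 0.8, 0.85])    # P_j
predicted = np.array([0.7, 0.82, 0.6])  # S_hat_{m,j}

# Eq. (13): per-attribute gap; positive values flag room for improvement.
gaps = targets - predicted
for j, d in enumerate(gaps):
    print(f"attribute {j}: gap D = {d:+.2f}")
```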
Pilot testing and model validation
The following validation metrics are used to evaluate the model. Mean Absolute Error (E):
$$E=\frac{1}{M}\sum_{m=1}^{M}\left|{A}_{m}-{P}_{m}\right|$$
(14)
Coefficient of Determination (R2):
$${R}^{2}=1-\frac{\sum_{m=1}^{M}{({P}_{m}{-A}_{m})}^{2}}{\sum_{m=1}^{M}{({P}_{m}-\overline{P })}^{2}}$$
(15)
where \({A}_{m}\) is the model's predicted score, \({P}_{m}\) the target score for student m, and \(\overline{P }\) the mean of the target scores.
These equations are used to evaluate how accurately the model predicts the students’ holistic scores, with R2 indicating the proportion of variance explained by the model. Figure 5 illustrates the training and validation loss throughout the model’s training process.

Training and validation loss.
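Eqs. (14) and (15) correspond to standard library metrics; a sketch with invented scores:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Illustrative target scores P_m and model predictions A_m.
P = np.array([0.9, 0.7, 0.8, 0.6])
A = np.array([0.85, 0.72, 0.75, 0.65])

print("MAE (Eq. 14):", mean_absolute_error(P, A))  # mean of |A_m - P_m|
print("R^2 (Eq. 15):", r2_score(P, A))             # variance explained by the model
```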
Monitoring and continuous improvement
Drift detection
Monitor the difference in feature distributions over time using metrics like Kullback–Leibler (KL) divergence:
$$E(O||R)=\sum_{j}O\left(j\right)\text{log}\frac{O\left(j\right)}{R\left(j\right)}$$
(16)
where O and R are the probability distributions of a feature at two points in time. This metric monitors how feature distributions shift over time, indicating potential drift in the data.
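A small sketch of Eq. (16) on hypothetical binned feature distributions; the smoothing constant that avoids division by zero is an implementation detail not taken from the paper:

```python
import numpy as np

def kl_divergence(o: np.ndarray, r: np.ndarray, eps: float = 1e-12) -> float:
    """Eq. (16): KL divergence between two discrete feature distributions."""
    o = o / o.sum()  # normalize to valid probability distributions
    r = r / r.sum()
    return float(np.sum(o * np.log((o + eps) / (r + eps))))

# Hypothetical binned distributions of one feature at two points in time.
time1 = np.array([0.2, 0.5, 0.3])
time2 = np.array([0.1, 0.4, 0.5])
print(f"KL(O || R) = {kl_divergence(time1, time2):.4f}")
```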
Retraining
Retrain the model when performance metrics fall below a threshold:
$$\text{Trigger retraining if } R^{2} < th$$
(17)
This condition ensures that the model remains up to date by triggering retraining when its predictive accuracy drops. Figure 6 demonstrates feature drift detection by comparing the feature distributions at Time 1 and Time 2: the blue bars represent the feature values at Time 1, while the orange bars correspond to the feature values at Time 2, highlighting any shifts in the feature distribution over time.

Feature drift detection.
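A minimal sketch of the retraining trigger in Eq. (17); the threshold value is an assumption, as the paper does not fix th:

```python
from sklearn.metrics import r2_score

R2_THRESHOLD = 0.8  # assumed threshold 'th'; the paper does not specify a value

def needs_retraining(targets, predictions, threshold=R2_THRESHOLD) -> bool:
    """Eq. (17): trigger retraining when R^2 drops below the threshold."""
    return r2_score(targets, predictions) < threshold

# Illustrative check on a recent evaluation batch.
if needs_retraining([0.9, 0.7, 0.8], [0.5, 0.9, 0.4]):
    print("R^2 below threshold - retraining the model.")
```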