This project analyzes the factors influencing students' final exam performance using statistical methods and multiple linear regression modeling. The analysis includes data preprocessing, descriptive statistics, correlation analysis, visualization, outlier detection, model building, and evaluation.
The goal is to identify which academic and behavioral attributes best predict the final exam score (Exam_Score) and to evaluate how accurately these variables can be used for prediction.
- Number of Records: 6,607
- Number of Features: 20
- File Format: CSV
- Target Variable: Exam_Score
- Predictors: Attendance, Hours_Studied, Previous_Scores, Tutoring_Sessions, Access_to_Resources (encoded into dummy variables)
The project includes the following steps:
- Removal of missing values
- One-hot encoding for categorical variables
- Dataset cleaning and feature extraction
- Summary statistics for all selected variables
- Distribution inspection using histograms
- Pearson correlation with the target variable
- Identification of the strongest predictors
- Boxplots
- IQR (Interquartile Range) method
- Decision to retain outliers as valid observations
- Histograms
- Scatter plots with regression lines
- Bar charts for resource access levels
- Residual and QQ-plots for diagnostics
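The preprocessing, correlation, and IQR steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`preprocess`, `correlations_with_target`, `iqr_outlier_mask`), the use of `drop_first=True`, and the conventional 1.5 multiplier for the IQR fences are all assumptions.

```python
import pandas as pd

TARGET = "Exam_Score"
NUMERIC = ["Attendance", "Hours_Studied", "Previous_Scores", "Tutoring_Sessions"]
CATEGORICAL = ["Access_to_Resources"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the selected columns, drop rows with missing values, one-hot encode."""
    selected = df[NUMERIC + CATEGORICAL + [TARGET]].dropna()
    # Assumption: drop_first=True avoids perfectly collinear dummy columns.
    return pd.get_dummies(selected, columns=CATEGORICAL, drop_first=True, dtype=int)


def correlations_with_target(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of each numeric predictor with the target."""
    corr = df[NUMERIC].corrwith(df[TARGET])  # Pearson is the default method
    # Order by absolute strength so the strongest predictors come first.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

With the dataset loaded via `pd.read_csv("StudentPerformanceFactors.csv")`, calling `preprocess(df)` and then `iqr_outlier_mask(clean["Exam_Score"]).sum()` would count the flagged observations without removing them, which matches the decision to retain outliers.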
A multiple linear regression model was built using the selected predictors.
- R²: 0.6824
- RMSE: 2.08
- MAE: 1.12
- Residual diagnostics to verify model assumptions
- Attendance, hours studied, and tutoring sessions were the strongest positive predictors of exam performance.
- Limited access to educational resources significantly decreased exam scores.
- The regression model performed well and explained approximately 68% of the variance in exam scores.
- Outliers were valid and retained, and residual diagnostics confirmed model reliability.
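For readers who want to see how the model and the reported metrics (R², RMSE, MAE) fit together, here is a hedged sketch using scikit-learn. The function name `fit_and_evaluate`, the 80/20 train/test split, and the random seed are assumptions for illustration; the notebook may evaluate the model differently, so this sketch is not expected to reproduce the figures above exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_evaluate(X, y, test_size=0.2, seed=42):
    """Fit a multiple linear regression and score it on a held-out split."""
    # Assumption: a simple random hold-out split; the source does not specify one.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    pred = model.predict(X_test)
    metrics = {
        "r2": r2_score(y_test, pred),
        # RMSE is the square root of the mean squared error.
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": mean_absolute_error(y_test, pred),
    }
    # Residuals feed the residual and QQ-plot diagnostics.
    residuals = y_test - pred
    return model, metrics, residuals
```

Here `X` would hold the predictor columns (including the dummy variables for Access_to_Resources) and `y` the Exam_Score column. The returned residuals can be passed to `scipy.stats.probplot` to draw a QQ-plot.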
- StudentPerformance-final.ipynb: Final analysis and model
- Relationship of variables.ipynb: Full correlation and visual exploration
- StudentPerformanceFactors.csv: Dataset
- README.md: Documentation
- Install required Python libraries:
pip install pandas numpy matplotlib scikit-learn scipy