This project automates the fault detection process in solar photovoltaic (PV) systems using drone-captured thermal images. By leveraging machine learning models, the goal is to identify defective modules based on temperature anomalies captured by infrared cameras.
Solar PV systems are prone to module failures due to operational stresses and installation errors. Traditional inspection methods are time-consuming and labor-intensive. This project addresses these challenges by developing a machine learning pipeline to analyze drone-captured thermal images and identify defective modules efficiently.
The dataset, sourced from Kaggle, consists of:
- Thermal (IR) images: Captured by drones.
- Annotations: Marking the corners and centers of each PV module and labeling them as "defective" or "non-defective."

Two characteristics of the data shaped the approach:
- Imbalanced Data: Few examples of faulty modules compared to non-faulty ones.
- Limited Thermal Data: Addressed using feature extraction techniques.

The features extracted for each module, together with the target label, were:
- Temperature Statistics: Mean, standard deviation, maximum, and range.
- Distribution Metrics: Skewness and kurtosis.
- Fault Detection (target): Labels from annotations indicating module condition.
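The temperature statistics and distribution metrics listed above can be sketched in a few lines of standard-library Python. This is only an illustration: the function name `extract_features`, the sample readings, and the choice of population moments with excess kurtosis are assumptions, not details confirmed by the project.

```python
import math
from statistics import fmean


def extract_features(temps):
    """Summarise one module's temperature readings as a feature dict."""
    n = len(temps)
    mean = fmean(temps)
    # Population central moments (divide by n, not n - 1); an assumption.
    m2 = sum((t - mean) ** 2 for t in temps) / n
    m3 = sum((t - mean) ** 3 for t in temps) / n
    m4 = sum((t - mean) ** 4 for t in temps) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "max": max(temps),
        "range": max(temps) - min(temps),
        # Both shape metrics are undefined for constant input; fall back to 0.
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if std > 0 else 0.0,  # excess kurtosis
    }


# Hypothetical readings in °C; the last value mimics a hot spot.
module_temps = [30.0, 30.0, 30.0, 30.0, 50.0]
features = extract_features(module_temps)
print(features)
```

A single hot pixel raises the maximum, the range, and the skewness at once, which is why these summary features can separate defective from healthy modules even when few thermal samples are available.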
The following models were tested:
- Support Vector Machine (SVM)
- Random Forest Classifier
- Gradient Boosting Classifier
- K-Nearest Neighbors (KNN)
- Logistic Regression
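Because the dataset is imbalanced, accuracy alone would be misleading: a model can score highly simply by favoring the majority class. The F1-Score was therefore chosen as the primary evaluation metric, since it balances precision and recall.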
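To make the choice of metric concrete, the sketch below computes accuracy, precision, recall, and F1 from scratch on a made-up imbalanced label set. The 95/5 class split and both sets of predictions are hypothetical and are not taken from the project's data.

```python
def classification_scores(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = defective)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical test set: 95 healthy modules (0) and 5 defective ones (1).
y_true = [0] * 95 + [1] * 5

# A useless model that never flags a defect still looks accurate.
always_healthy = [0] * 100
baseline = classification_scores(y_true, always_healthy)

# A model that finds 4 of the 5 defects and raises one false alarm.
useful_pred = [1] + [0] * 94 + [1, 1, 1, 1, 0]
useful = classification_scores(y_true, useful_pred)

print(baseline)
print(useful)
```

The two models differ by only a few points of accuracy, while their F1-Scores are far apart, which is the behavior that motivates using F1 here.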
Each model was optimized using GridSearchCV with 5-fold cross-validation. The best parameters for each model were:

Logistic Regression:
- C: 10
- Solver: liblinear
- Class Weight: None
- Penalty: l1

Support Vector Machine (SVM):
- C: 0.01
- Kernel: Linear
- Gamma: Scale
- Class Weight: Balanced
- Degree: 2

K-Nearest Neighbors (KNN):
- Algorithm: auto
- Leaf Size: 20
- n_neighbors: 3
- Weights: uniform

Random Forest Classifier:
- Max Depth: None
- Min Samples Split: 2
- n_estimators: 200

Gradient Boosting Classifier:
- Learning Rate: 0.1
- Loss: Exponential
- Max Depth: 5
- Min Samples Split: 2
- n_estimators: 50
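As a rough illustration of the tuning procedure, the sketch below runs scikit-learn's `GridSearchCV` with 5-fold cross-validation and F1 scoring on an SVM. The data is synthetic (generated with `make_classification` as a stand-in for the real feature table), and the candidate values in `param_grid` are assumptions; the grids actually searched in the project are not documented here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the extracted features: 6 columns, about 10% positives.
X, y = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    n_redundant=0,
    weights=[0.9, 0.1],
    random_state=0,
)

# Hypothetical search space; only the parameter names mirror the list above.
param_grid = {
    "C": [0.01, 0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "class_weight": [None, "balanced"],
}

# cv=5 uses stratified folds for classifiers, keeping the class ratio per fold.
search = GridSearchCV(SVC(gamma="scale"), param_grid, scoring="f1", cv=5)
search.fit(X, y)

print(search.best_params_)   # best combination found on the synthetic data
print(search.best_score_)    # mean cross-validated F1 of that combination
```

The same pattern would apply to the other four classifiers by swapping the estimator and its grid. Scoring with `"f1"` rather than the default accuracy keeps the search aligned with the chosen evaluation metric.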
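The per-fold scores (in %) from the 5-fold cross-validation are listed below. Each consecutive group of four lines (F1-Score, Recall, Precision, Accuracy) belongs to one model, with one value per fold: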

- F1-Score: [66.67, 86.96, 77.78, 78.26, 88.00]
- Recall: [53.33, 83.33, 70.00, 64.29, 84.62]
- Precision: [88.89, 90.91, 87.50, 100.00, 91.67]
- Accuracy: [97.07, 98.90, 98.53, 98.16, 98.90]
- F1-Score: [100.00, 81.48, 84.21, 76.92, 88.00]
- Recall: [100.00, 91.67, 80.00, 71.43, 84.62]
- Precision: [100.00, 73.33, 88.89, 83.33, 91.67]
- Accuracy: [100.00, 98.17, 98.90, 97.79, 98.90]
- F1-Score: [80.00, 76.19, 70.59, 80.00, 91.67]
- Recall: [66.67, 66.67, 60.00, 71.43, 84.62]
- Precision: [100.00, 88.89, 85.71, 90.91, 100.00]
- Accuracy: [98.17, 98.17, 98.16, 98.16, 99.26]
- F1-Score: [63.64, 81.82, 84.21, 78.26, 78.26]
- Recall: [46.67, 75.00, 80.00, 64.29, 69.23]
- Precision: [100.00, 90.00, 88.89, 100.00, 90.00]
- Accuracy: [97.07, 98.53, 98.90, 98.16, 98.16]
- F1-Score: [75.00, 81.82, 66.67, 80.00, 81.82]
- Recall: [60.00, 75.00, 60.00, 71.43, 69.23]
- Precision: [100.00, 90.00, 75.00, 90.91, 100.00]
- Accuracy: [97.80, 98.53, 97.79, 98.16, 98.53]
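To compare the models on a single number, the per-fold F1-Scores can be averaged. The sketch below does this for the five groups of results above, using the values exactly as listed. Because the groups carry no model names here, they are referred to only by their position (1 to 5); the variable names are illustrative.

```python
from statistics import fmean

# Per-fold F1-Scores (%) copied from the five result groups above, in order.
f1_by_group = [
    [66.67, 86.96, 77.78, 78.26, 88.00],
    [100.00, 81.48, 84.21, 76.92, 88.00],
    [80.00, 76.19, 70.59, 80.00, 91.67],
    [63.64, 81.82, 84.21, 78.26, 78.26],
    [75.00, 81.82, 66.67, 80.00, 81.82],
]

# Mean F1 across the five folds for each group, rounded to two decimals.
mean_f1 = [round(fmean(folds), 2) for folds in f1_by_group]

# 1-based position of the group with the highest mean F1.
best_group = max(range(len(mean_f1)), key=mean_f1.__getitem__) + 1

print(mean_f1)
print(best_group)
```

The second group has the highest mean F1-Score, roughly six points ahead of the next best group.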
These results show the Support Vector Machine (SVM) outperforming the other algorithms. It achieved the best performance among all tested models with the following optimal hyperparameters:
- C: 0.01
- Kernel: Linear
- Gamma: Scale
- Class Weight: Balanced
- Degree: 2
Upon testing, the model yielded the following average performance metrics:

- Accuracy: 98.60%
- F1-Score: 86.11%
- Precision: 86.53%
- Recall: 87.56%
- Kaggle dataset "Photovoltaic system thermography" by Marcos Gabriel.
- Infrared thermal imaging for fault detection in solar panels dataset by Technodivesh.
This project was developed by Ouassim Milous as part of the Machine Learning for Robotics II course taught by Professor Luca Oneto at the University of Genoa.

