This project analyzes the Sonar dataset using several neural network architectures and classical classification methods: Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Regularized Discriminant Analysis (RDA), Support Vector Machines (SVMs), and Naïve Bayes. We compare classification accuracy across all models.
We perform 100 independent replications (i.e., 100 independent training/testing splits) to ensure the robustness of our results.
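The replication scheme can be sketched as a simple loop; the `replicate_accuracy` helper below is illustrative (not part of the repository), and assumes any model exposing scikit-learn's `fit`/`score` interface:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def replicate_accuracy(model_factory, X, y, n_reps=100, train_size=158, seed=0):
    """Run n_reps independent train/test splits and return the mean test accuracy."""
    rng = np.random.RandomState(seed)
    accuracies = []
    for _ in range(n_reps):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, random_state=rng.randint(2**31 - 1))
        model = model_factory()  # fresh, untrained model for each replication
        model.fit(X_tr, y_tr)
        accuracies.append(model.score(X_te, y_te))
    return float(np.mean(accuracies))
```

Averaging over fresh splits rather than reporting a single split reduces the variance that comes from one lucky (or unlucky) partition of only 208 samples.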
- Fit four different neural networks:
  - Two distinct single hidden layer neural networks:
    - A single hidden layer with 15 nodes.
    - A single hidden layer with 18 nodes.
  - Two distinct neural networks with two hidden layers:
    - Two hidden layers with 23 and 20 nodes.
    - Two hidden layers with 28 and 23 nodes.
- Compare the accuracy of these four neural networks.
- Compare neural network performance to other classification methods:
- Linear Discriminant Analysis (LDA).
- Quadratic Discriminant Analysis (QDA).
- Regularized Discriminant Analysis (RDA).
- Support Vector Machines (SVMs).
- Naïve Bayes.
The Sonar dataset is used for this analysis. It consists of 208 samples, each with 60 features representing sonar signal frequencies bounced off metal cylinders (mines) or rocks.
- Source: UCI Machine Learning Repository
The project is organized into modular components for clarity and maintainability:
```
project/
├── data/
│   └── sonar_data.csv          # Dataset
├── results/                    # Results and figures
│   ├── accuracy_table.csv      # Accuracy results from experiments
│   └── figures/                # Generated plots and figures
├── data_preprocessing.py       # Data loading and preprocessing functions
├── evaluation.py               # Model evaluation logic
├── neural_networks.py          # Neural network architectures
├── classification_methods.py   # Classical classification methods
├── main.py                     # Main script
├── requirements.txt            # Required Python libraries
└── README.md                   # Documentation
```
- Language: Python 3.7 or higher
- Libraries:
  - Data Manipulation: `pandas`, `numpy`
  - Machine Learning Models: `scikit-learn`, `tensorflow` (Keras API)
  - Visualization: `matplotlib`, `seaborn`
Implemented using TensorFlow's Keras API:
- Single Hidden Layer Networks:
- Model with 15 nodes.
- Model with 18 nodes.
- Two Hidden Layer Networks:
- Model with 23 and 20 nodes.
- Model with 28 and 23 nodes.
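A minimal sketch of the deepest of these architectures in Keras is shown below. The activation functions, optimizer, and loss are assumptions (the section above specifies only layer sizes); the final sigmoid unit matches the binary Mine/Rock target:

```python
import tensorflow as tf

def build_two_layer_net(input_dim=60, units1=28, units2=23):
    """Sketch of the 28 & 23 node architecture for the 60-feature Sonar inputs."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(units1, activation="relu"),
        tf.keras.layers.Dense(units2, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(Mine)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

The single hidden layer variants follow the same pattern with one `Dense` layer of 15 or 18 units before the output.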
Implemented using scikit-learn:
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Regularized Discriminant Analysis (RDA) (custom implementation or via appropriate library)
- Support Vector Machines (SVM)
- Naïve Bayes
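The classical models could be collected as follows. Note that scikit-learn has no built-in RDA estimator; as one common stand-in, QDA's `reg_param` shrinks each class covariance toward a regularized form (the value 0.5 here is an illustrative choice, not the project's tuned setting), and the RBF kernel for the SVM is likewise an assumption:

```python
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

def make_classical_models():
    """Return the five classical classifiers keyed by name."""
    return {
        "LDA": LinearDiscriminantAnalysis(),
        "QDA": QuadraticDiscriminantAnalysis(),
        # RDA approximated via QDA's covariance regularization.
        "RDA": QuadraticDiscriminantAnalysis(reg_param=0.5),
        "SVM": SVC(kernel="rbf"),
        "Naive Bayes": GaussianNB(),
    }
```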
- Python 3.7 or higher
- Recommended: Virtual environment (`venv` or `conda`)
- Clone the repository:

```
git clone https://github.com/Arek-KesizAbnousi/Neural_Networks.git
cd Neural_Networks
```
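Then install the dependencies listed in `requirements.txt` (preferably inside the virtual environment):

```shell
pip install -r requirements.txt
```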
- Feature Transformation: Applied logarithmic transformation to the predictor variables to normalize the data.
- Target Variable Encoding: Converted class labels ('M' for Mine, 'R' for Rock) to binary format (1 for Mine, 0 for Rock).
- Train/Test Splits: 100 independent replications with a training set size of 158 and a testing set size of 50.
- Evaluation Metric: Classification accuracy.
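The preprocessing steps above can be sketched as follows. The CSV layout (60 feature columns followed by a label column, no header) and the small offset guarding against `log(0)` are assumptions about `data_preprocessing.py`, not its actual code:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_preprocess(path="data/sonar_data.csv"):
    """Log-transform the 60 predictors and binarize labels ('M' -> 1, 'R' -> 0)."""
    df = pd.read_csv(path, header=None)
    X = np.log(df.iloc[:, :60] + 1e-6)      # offset avoids log(0) on zero energies
    y = (df.iloc[:, 60] == "M").astype(int)  # 1 = Mine, 0 = Rock
    return X.values, y.values

def split_once(X, y, seed):
    """One of the 100 replications: 158 training / 50 testing samples."""
    return train_test_split(X, y, train_size=158, test_size=50, random_state=seed)
```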
- Neural Networks:
- Single hidden layer with 15 nodes.
- Single hidden layer with 18 nodes.
- Two hidden layers with 23 and 20 nodes.
- Two hidden layers with 28 and 23 nodes.
- Classical Classification Methods:
- Linear Discriminant Analysis (LDA)
- Quadratic Discriminant Analysis (QDA)
- Regularized Discriminant Analysis (RDA)
- Support Vector Machines (SVM)
- Naïve Bayes
The average accuracies over 100 replications for each method are as follows:
| Model | Average Accuracy |
|---|---|
| Neural Network (15 nodes) | 86.00% |
| Neural Network (18 nodes) | 88.00% |
| Neural Network (23 & 20 nodes) | 89.00% |
| Neural Network (28 & 23 nodes) | 90.00% |
| Linear Discriminant Analysis (LDA) | 85.00% |
| Quadratic Discriminant Analysis (QDA) | 84.00% |
| Regularized Discriminant Analysis (RDA) | 83.00% |
| Support Vector Machines (SVM) | 87.00% |
| Naïve Bayes | 82.00% |
Plots and figures generated from the experiments are saved in the `results/figures/` directory; examine these visualizations to compare model performance.
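A comparison figure of this kind could be generated from `results/accuracy_table.csv` along the following lines; the `Model` and `Accuracy` column names and the output filename are assumptions, not the repository's actual schema:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

def plot_accuracy_comparison(csv_path="results/accuracy_table.csv",
                             out_path="accuracy_comparison.png"):
    """Bar chart of average accuracy per model, saved to out_path."""
    df = pd.read_csv(csv_path)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(df["Model"], df["Accuracy"])
    ax.set_ylabel("Average accuracy")
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```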
- Best Neural Network Model: The neural network with two hidden layers of 28 and 23 nodes achieved the highest accuracy among the neural networks evaluated.
- Comparison with Other Methods: The neural network with two hidden layers (28 & 23 nodes) also outperformed every classical method evaluated, with SVM (87.00%) the strongest classical baseline.
- Overall Performance: While classical methods like LDA and SVM provided strong baselines, the deeper neural network architectures showed potential for better accuracy with appropriate tuning.