Machine Learning in Finance
MLP classifier for ETF return direction prediction using Pearson correlation feature selection with 10-fold cross-validation, plus exploratory climate data analysis.
This project replicates and extends a neural network-based financial forecasting framework applied to the iShares MSCI Chile ETF (ECH). Technical indicators serve as candidate features, Pearson correlation selects the most informative subset, and an MLP classifier is evaluated on direction-prediction accuracy with 10-fold stratified cross-validation. A second module analyzes daily weather data for Valencia, Spain using Meteostat.
| Topic | Method |
|---|---|
| ECH ETF return direction prediction | MLP Classifier + Pearson feature selection + 10-fold CV |
| Valencia 2024 daily weather analysis | Meteostat / synthetic data, EDA |
iShares MSCI Chile ETF (ECH), daily OHLCV data from 2009-12-12 to 2020-01-01, sourced via Yahoo Finance with a synthetic fallback if the network is unavailable.
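The download-with-fallback behaviour described above could look roughly like the sketch below. The function names `synthetic_ohlcv` and `load_prices`, the random-walk parameters, and the business-day calendar are all assumptions made for illustration; the notebook's actual fallback may be built differently.

```python
import numpy as np
import pandas as pd

def synthetic_ohlcv(start="2009-12-12", end="2020-01-01", seed=0) -> pd.DataFrame:
    """Hypothetical fallback: a geometric random walk shaped like OHLCV data."""
    rng = np.random.default_rng(seed)
    idx = pd.bdate_range(start, end)  # business days only (assumption)
    n = len(idx)
    close = 50.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, n)))
    open_ = close * (1.0 + rng.normal(0.0, 0.003, n))
    # High/Low are pushed outside the Open/Close range so rows stay consistent.
    high = np.maximum(open_, close) * (1.0 + np.abs(rng.normal(0.0, 0.003, n)))
    low = np.minimum(open_, close) * (1.0 - np.abs(rng.normal(0.0, 0.003, n)))
    volume = rng.integers(100_000, 1_000_000, n)
    return pd.DataFrame(
        {"Open": open_, "High": high, "Low": low, "Close": close, "Volume": volume},
        index=idx,
    )

def load_prices(ticker="ECH", start="2009-12-12", end="2020-01-01") -> pd.DataFrame:
    """Try Yahoo Finance first; fall back to synthetic data on any failure."""
    try:
        import yfinance as yf  # imported lazily so a missing package also falls back
        df = yf.download(ticker, start=start, end=end, progress=False)
        if df.empty:
            raise ValueError("empty download")
        return df  # note: column layout can differ between yfinance versions
    except Exception:
        return synthetic_ohlcv(start, end)

# Offline demonstration using only the synthetic generator.
demo = synthetic_ohlcv()
print(demo.head())
```

Calling `load_prices()` attempts the real download and needs network access; the demonstration above deliberately avoids it.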
Binary target: 1 if today's Open price is higher than the previous day's Open; 0 otherwise. This is a daily direction-prediction task.
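As a minimal sketch of this labelling rule, assuming the opening prices are held in a pandas Series (the helper name `make_direction_target` and the toy prices are illustrative, not taken from the notebook):

```python
import pandas as pd

def make_direction_target(open_prices: pd.Series) -> pd.Series:
    """Return 1 where today's Open exceeds the previous day's Open, else 0."""
    diff = open_prices.diff()
    target = (diff > 0).astype(int)
    # The first row has no previous day, so it is dropped rather than labelled 0.
    return target.iloc[1:]

# Toy prices for illustration only.
opens = pd.Series(
    [10.0, 10.5, 10.2, 10.2, 10.9],
    index=pd.date_range("2019-01-01", periods=5, freq="D"),
)
target = make_direction_target(opens)
print(target.tolist())  # [1, 0, 0, 1]
```

An unchanged Open is labelled 0 here, matching the "higher than" wording of the definition.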
22 candidate features across four categories:
| Category | Indicators |
|---|---|
| Price levels | Open, High, Low, Close, Volume, Adj Close |
| Moving averages | SMA 5/10/20, EMA 5/10/20 |
| Momentum | RSI(14), MACD, MACD Signal, MACD Histogram, ROC(10), Williams %R(14) |
| Volatility | Bollinger Bands (BBL/BBM/BBU/BBB/BBP), ATR(14) |
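A few of the indicators in the table can be reproduced with plain pandas, which may help when checking values. This sketch covers only the moving averages and ROC(10), uses textbook definitions, and is a stand-in for the `pandas_ta` calls the project relies on; the helper name `add_basic_indicators` and the random-walk input are assumptions.

```python
import numpy as np
import pandas as pd

def add_basic_indicators(close: pd.Series) -> pd.DataFrame:
    """Compute SMA/EMA for 5, 10, 20 periods and a 10-period rate of change."""
    out = pd.DataFrame({"Close": close})
    for n in (5, 10, 20):
        out[f"SMA_{n}"] = close.rolling(n).mean()
        # Recursive EMA with smoothing factor 2 / (n + 1).
        out[f"EMA_{n}"] = close.ewm(span=n, adjust=False).mean()
    # ROC(10): percentage change relative to the close 10 periods earlier.
    out["ROC_10"] = 100.0 * (close / close.shift(10) - 1.0)
    # Drop the warm-up rows where the longest window is not yet filled.
    return out.dropna()

# Synthetic closing prices for illustration only.
rng = np.random.default_rng(0)
close = pd.Series(50.0 + rng.normal(0.0, 1.0, 60).cumsum())
features = add_basic_indicators(close)
print(features.tail(3))
```

Library implementations can differ in details such as EMA seeding, so small numerical differences from `pandas_ta` output are possible.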
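For each feature f, the absolute correlation |corr(f, Target)| is computed over the full sample, and the k features with the largest values are selected. This is a filter method: it is computationally cheap and independent of the classifier. Because the ranking uses the full sample, including observations that later serve as validation folds, the selection step is not immune to overfitting, and the reported accuracies may be optimistically biased. Feature counts tested are 5, 10, 15, and 20, plus a baseline using all features.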
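The ranking step can be sketched in a few lines of pandas. The helper name `select_top_k_by_pearson` and the three synthetic columns are assumptions for illustration; the project's own feature matrix would take the place of `X`.

```python
import numpy as np
import pandas as pd

def select_top_k_by_pearson(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Rank columns of X by |Pearson correlation with y| and keep the top k."""
    scores = X.corrwith(y).abs().sort_values(ascending=False)
    return scores.head(k).index.tolist()

# Synthetic data: one strongly related column, one weakly related, one pure noise.
rng = np.random.default_rng(42)
n = 500
y = pd.Series(rng.integers(0, 2, n))
X = pd.DataFrame({
    "strong": y + rng.normal(0.0, 0.5, n),
    "weak": y + rng.normal(0.0, 3.0, n),
    "noise": rng.normal(0.0, 1.0, n),
})
top2 = select_top_k_by_pearson(X, y, k=2)
print(top2)
```

The same function can be applied to the training portion of each fold only, which keeps the selection from seeing held-out observations.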
Single hidden layer MLP with floor((n_features + 2) / 2) neurons, scaling the hidden layer with feature count. ReLU activation, Adam optimizer, adaptive learning rate.
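The sizing rule maps directly onto scikit-learn's `MLPClassifier`. The sketch below is one plausible configuration, not necessarily the notebook's: the helper names, `max_iter`, and `random_state` are assumptions.

```python
from sklearn.neural_network import MLPClassifier

def hidden_units(n_features: int) -> int:
    """Hidden-layer width: floor((n_features + 2) / 2)."""
    return (n_features + 2) // 2

def build_mlp(n_features: int, random_state: int = 0) -> MLPClassifier:
    """Single-hidden-layer MLP with ReLU activation and the Adam solver."""
    return MLPClassifier(
        hidden_layer_sizes=(hidden_units(n_features),),
        activation="relu",
        solver="adam",
        # scikit-learn's learning_rate="adaptive" schedule only takes effect
        # with solver="sgd"; Adam adapts its step sizes itself, so the
        # parameter is left at its default here.
        max_iter=500,               # assumption, not from the source
        random_state=random_state,  # assumption, for reproducibility
    )

for k in (5, 10, 15, 20):
    print(k, "features ->", hidden_units(k), "hidden units")
```

For the tested feature counts 5, 10, 15, and 20 this gives 3, 6, 8, and 11 hidden units respectively.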
Validation: 10-fold stratified cross-validation preserves class balance in each fold. Mean accuracy and standard deviation are reported across folds.
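A minimal sketch of this validation loop with scikit-learn follows. It runs on a synthetic classification problem rather than the ECH data, and the `StandardScaler` step, the shuffling, and the seeds are assumptions added for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and binary target.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, random_state=0
)

# Scaling inside the pipeline is refit on each training fold (assumption:
# the source does not say whether features are scaled).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(6,), max_iter=500, random_state=0),
)

# Stratification keeps the class ratio roughly equal across the 10 folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Wrapping preprocessing in a pipeline means every fold is scored on data the scaler and classifier have not seen during fitting.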
Key finding: Pearson-selected feature subsets typically match or exceed baseline accuracy, which suggests that removing weakly correlated features reduces noise and can improve generalization.
- Mean accuracy vs feature count with plus/minus one standard deviation band
- Accuracy gain (%) over baseline by feature count
- Top 20 features ranked by absolute Pearson correlation
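The gain plotted in the second chart can be computed as below, assuming "gain (%)" means relative change against the all-features baseline (it could equally be reported in percentage points). The accuracy values are hypothetical placeholders chosen for easy arithmetic, not results from this project.

```python
def relative_gain_pct(acc: float, baseline: float) -> float:
    """Relative accuracy gain over the baseline, in percent."""
    return 100.0 * (acc - baseline) / baseline

# Hypothetical placeholder accuracies keyed by feature count (NOT real results).
placeholder_acc = {5: 0.55, 10: 0.54, 15: 0.52, 20: 0.50}
baseline_acc = 0.50  # hypothetical all-features baseline

gains = {
    k: round(relative_gain_pct(acc, baseline_acc), 2)
    for k, acc in placeholder_acc.items()
}
print(gains)
```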
Daily weather observations for Valencia, Spain (39.4699 N, 0.3763 W) for calendar year 2024, sourced from Meteostat with a synthetic fallback if the network is unavailable.
Columns used: tavg (average temperature), tmin and tmax (daily minimum and maximum, which define the daily range), prcp (precipitation), wspd (wind speed), pres (pressure).
Analysis:
- Daily temperature time series with min/max range shading
- Precipitation distribution across rainy days only
- Monthly average temperature bar chart capturing Valencia's Mediterranean seasonal cycle
- Monthly total precipitation illustrating the wet autumn / dry summer pattern
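The aggregations behind these plots can be sketched with pandas group-bys. The data here is synthetic (a sinusoidal temperature cycle plus random rain days) and stands in for the Meteostat frame; only the column names `tavg` and `prcp` come from the description above.

```python
import numpy as np
import pandas as pd

# Synthetic daily data for 2024 (a leap year, so 366 rows).
rng = np.random.default_rng(1)
days = pd.date_range("2024-01-01", "2024-12-31", freq="D")
doy = days.dayofyear.to_numpy()
tavg = 18.0 + 8.0 * np.sin(2 * np.pi * (doy - 110) / 366) + rng.normal(0.0, 1.5, len(days))
# Roughly 15% of days get a gamma-distributed rainfall amount, the rest stay dry.
prcp = np.where(rng.random(len(days)) < 0.15, rng.gamma(1.5, 6.0, len(days)), 0.0)
weather = pd.DataFrame({"tavg": tavg, "prcp": prcp}, index=days)

# Monthly mean temperature and monthly total precipitation.
monthly_tavg = weather["tavg"].groupby(weather.index.month).mean()
monthly_prcp = weather["prcp"].groupby(weather.index.month).sum()

# Precipitation distribution restricted to rainy days.
rainy = weather.loc[weather["prcp"] > 0, "prcp"]

print(monthly_tavg.round(1))
print(monthly_prcp.round(1))
print(rainy.describe())
```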
Summary statistics include annual mean temperature, hottest and coldest day, total rainy days, and max single-day precipitation.
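These statistics reduce to a handful of pandas calls. In the sketch below the four rows are hypothetical placeholder values, and defining the hottest day by `tmax` and the coldest by `tmin` is an assumption (the notebook may use `tavg` instead).

```python
import pandas as pd

# Hypothetical placeholder rows (NOT real observations).
weather = pd.DataFrame(
    {
        "tavg": [11.0, 12.5, 27.0, 19.5],
        "tmin": [6.0, 8.0, 22.0, 15.0],
        "tmax": [16.0, 17.0, 33.0, 24.0],
        "prcp": [0.0, 4.2, 0.0, 31.5],
    },
    index=pd.to_datetime(["2024-01-15", "2024-03-02", "2024-08-10", "2024-10-20"]),
)

summary = {
    "annual_mean_tavg": weather["tavg"].mean(),
    "hottest_day": weather["tmax"].idxmax(),   # date of the highest daily maximum
    "coldest_day": weather["tmin"].idxmin(),   # date of the lowest daily minimum
    "rainy_days": int((weather["prcp"] > 0).sum()),
    "max_daily_prcp": weather["prcp"].max(),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```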
Python 3.x
numpy
pandas
matplotlib
seaborn
scikit-learn
pandas_ta
yfinance (with synthetic fallback)
meteostat (with synthetic fallback)
git clone https://github.com/QuantSingularity/Neural-Network-Financial-Forecasting.git
cd Neural-Network-Financial-Forecasting
pip install numpy pandas matplotlib seaborn scikit-learn pandas_ta yfinance meteostat
jupyter notebook Neural-Network-Financial-Forecasting.ipynb
- Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986). Learning Representations by Back-Propagating Errors. Nature, 323, 533-536.
- Pedregosa, F. et al. (2011). Scikit-learn: Machine Learning in Python. JMLR, 12, 2825-2830.
- Achelis, S.B. (2001). Technical Analysis from A to Z (2nd ed.). McGraw-Hill.