This repository is a project of our second year of engineering study. It took place in 1 month in a Data Engineering course.
Our goal was to create a web application based on the Flask package. This application should scrape data from website and build a dashboard with. Datas need to be stocked on database seen in course.
We decided to make an analysis of cryptocurrency and then fit a model on daily values to predict trend. To build this we use CoinGecko website to collect data from the most used crypto-currencies. We stock them on MongoDB client. After collecting all the datas, we start a Flask server which shows 3 graphics :
-
MarketCap: represent the total value of cryptocurrency in circulation. It is obtained by multiplying cryptocurrency value with the cryptocurrency volume.
-
Volume : represent the number of cryptocurrency in circulation.
-
Value : represent the cost of a cryptocurrency at one given time
The second page (when you clic on "predict") will show you a graph with the trend prediction such as :
-
Blue : first 70% values of cryptocurrency
-
Red : last 30% values of cryptocurrency
-
Green : Predict values bases on SAMIRAX model (This model is the basic interface for ARIMA-type models, including those with exogenous regressors and those with seasonal components)
-
Purple : Predict values bases on AMIRA model (Seasonal AutoRegressive Integrated Moving Average)
To make our application deployable everywhere, we made a docker network with a docker compose to create 3 containers : MongoDB, PyScraping and PyFlask.
##Getting Started
-
Docker is installed and updated.
-
Enough space (1,5 Go).
-
First docker image is base on MongoDb image on docker hub.
-
Second docker image is build on Python image on docker hub. Specially the 3.7 version. To adapt this image :
- we add usefully file on the work directory :
RUN mkdir /app/ && mkdir /app/templates/ WORKDIR /app/ #Add importants files ADD templates/ ./templates/ ADD requirements.txt . ADD CoinGecko.py . ADD server.py .
- we install packages found in requierement.txt by running the following command on the image:
pip install -r requirements.txt
To build our project, we need to create our 2 images with the following code on terminal :
docker-compose up -dThis command will build an run the project directly.
To run the project, it's the same command than 'Installation' :
docker-compose up -dIf there are no errors in the previous step, you can open your web browser and looking for this website on your localhost ip on port 5001.
Some think like that should appear :
To see the trend of our prediction, clic on the button "Predict" at the top right of the screen. Some think like that should appear :
This CoinGecko.py python code executed on our python3.7 image, is our scraping process. We use BeautifulSoup to scrape data on coingecko website. First we name all crypto that we want get data from.
jsonCryptoName = {"btc": "bitcoin", "eth": "ethereum", "bnb": "binance-coin", "ada": "cardano", "sol": "solana"}We build links of depend of crypto between 2 dates:
links = allLinks(jsonCryptoName, [str(datetime.now().date()), "2019-01-28"])We start MongoDb connection :
client = MongoClient("mongo", 27017)
db = client["coingecko"]The main part can start by running the scraping function :
scrapingAll(jsonCryptoName, links)This function find data and put them in the right form in MongoDB
for i in range(len(data) // 4):
coll.insert_one({"date": data_dates[i], "marketcap": data[(i * 4)], "volume": data[(i * 4) + 1],"open": data[(i * 4) + 2],"close": data[(i * 4) + 3]})Once all datas are in the database, the container shuts down.
This Server.py code executed on our python3.7 image, is our Flask server which represents datas.
In the same way, we start by connecting to MongoDB, secondly we run the server Flask :
app = Flask(__name__, template_folder='./templates')
app.run(host='0.0.0.0', port=5001, use_reloader=False)There are 2 main pages :
-
Graphics page
@app.route("/graph/<crypto>", methods=['GET', 'POST'])This page shows 3 graphics in function of cryptocurrency parameter. By default it's Bitcoin.
-
Predict page
@app.route("/graph/<crypto>/predict", methods=['GET', 'POST'])When this page is loaded, 2 models are trained and produce trend. They appear on the graphic.
# 1er model ARMAmodel = SARIMAX(y_train, order=(3, 2, 3)) ARMAmodel = ARMAmodel.fit(disp=0) y_pred_sarimax = ARMAmodel.get_forecast(len(test_set)) y_pred_df_sarimax = y_pred_sarimax.conf_int(alpha=0.05) y_pred_df_sarimax["Predictions"] = ARMAmodel.predict(start=y_pred_df_sarimax.index[0], end=y_pred_df_sarimax.index[-1]) y_pred_df_sarimax.index = test_set["date"] y_pred_out_sarimax = y_pred_df_sarimax["Predictions"] y_pred_out_sarimax.index = range(train_size, len(df)) df["sarimax"] = y_pred_out_sarimax # 2ème model ARIMAmodel = ARIMA(y_train, order=(4, 1, 3)) ARIMAmodel = ARIMAmodel.fit() y_pred_arima = ARIMAmodel.get_forecast(len(test_set["date"])) y_pred_df_arima = y_pred_arima.conf_int(alpha=0.05) y_pred_df_arima["Predictions"] = ARIMAmodel.predict(start=y_pred_df_arima.index[0], end=y_pred_df_arima.index[-1]) y_pred_df_arima.index = test_set["date"] y_pred_out_arima = y_pred_df_arima["Predictions"] y_pred_out_arima.index = range(train_size, len(df)) df["arima"] = y_pred_out_arima
By default the cryptocurrency used is Bitcoin but you can change it in the same way as home page.
One of improvement axis is to build a Cross Validation function to help models.
Another note is to manage to start the Flask container when the CoinGecko one is finished. We did not manage to implement it
Marchadier Julien - julien.marchadier@edu.esiee.fr
Rebuffey Valentin - valentin.rebuffey@edu.esiee.fr
Project Link: https://github.com/juju312000/DataEngineerProject


