One of the most fascinating fields in time series forecasting, and in data science overall, is stock market prediction.
One hurdle I run into every time is creating a decent dataset for what I’m trying to predict.
There are so many financial websites out there (Google Finance, Yahoo Finance, Finviz), yet it is still quite hard to get the data into a pandas DataFrame.
Here are three methods to build a dataset quickly and for free.
If you’re looking to create a crypto (cryptocurrency, e.g. Bitcoin, Ethereum) dataset, check out this post.
I cannot overstate how useful the pandas-datareader library is!
It provides an easy connection to several data sources, such as:
You can read about the different data sources here.
You can install the library with pip.
Colab users: use the --upgrade flag to make sure you’re on the latest version, because older versions contain bugs.
pip install --upgrade pandas-datareader
Import the package and the other utilities:
import pandas_datareader as web
import datetime as dt
Set up the stock and the date range (for example, from 2016 until today):
# start and end date
start_date = dt.datetime(2016,1,1)
end_date = dt.datetime.now()
chosen_stock = 'AMZN'
data_source = 'yahoo'
And pull daily data for this date range:
# Yahoo stock tickers are plain symbols such as 'AMZN'; the 'BTC-USD' style is for crypto pairs
daily_prices = web.DataReader(chosen_stock, data_source, start_date, end_date)
daily_prices.head()
This is what the output should look like:
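Every call above hits a remote server, so while you iterate on a model it can help to cache the result locally. Here is a minimal sketch of that idea; `load_or_fetch` and the file name are hypothetical helpers of my own, not part of pandas-datareader, and it assumes the frame has a date index that survives a CSV round trip.

```python
from pathlib import Path

import pandas as pd


def load_or_fetch(cache_path, fetch_fn):
    """Return the cached frame if the file exists, otherwise fetch and cache it.

    `fetch_fn` is any zero-argument callable that returns a DataFrame
    (hypothetical helper, shown for illustration only).
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # index_col=0 restores the date index; parse_dates turns it back into datetimes
        return pd.read_csv(cache_path, index_col=0, parse_dates=True)
    df = fetch_fn()
    df.to_csv(cache_path)
    return df


# Usage (needs network access, so it is left commented out):
# daily_prices = load_or_fetch(
#     "amzn_daily.csv",
#     lambda: web.DataReader("AMZN", "yahoo", start_date, end_date),
# )
```

Delete the CSV file whenever you want fresh data; the sketch deliberately has no expiry logic.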
The data from pandas-datareader might not include many of the features you want for your model.
You might want to get data such as:
For that you can use yfinance, a library that pulls this kind of data; see its docs for details.
Make sure you follow the legal notice in the repo.
Again, you can install the library simply from pip
pip install --upgrade yfinance
Import the library:
import yfinance as yf
Choose a stock ticker
chosen_stock = "AMZN"
and construct the stock object that will perform the data calls
stock_object = yf.Ticker(chosen_stock)
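Let’s try to pull earnings data:
# note: recent yfinance releases may deprecate .earnings and return None;
# in that case, look at the "Net Income" row of stock_object.income_stmt instead
earnings = stock_object.earnings
quarterly_earnings = stock_object.quarterly_earnings
earnings.head()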
This is an example output
Let’s do the same for financials:
financials = stock_object.financials
quarterly_financials = stock_object.quarterly_financials
quarterly_financials.head()
And the output
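Since the prices are daily and the financials are quarterly, the two have to be aligned before they can live in one dataset. Below is a rough sketch of one way to do it with `pd.merge_asof`. It assumes the financials frame has line items as rows and period-end dates as columns (the layout yfinance returns), and that both indexes are timezone-naive. `attach_quarterly` is a hypothetical helper name. One caveat: period-end dates are earlier than the dates the figures were actually published, so joining on them can leak future information into a model.

```python
import pandas as pd


def attach_quarterly(daily, quarterly):
    """Attach the latest available quarterly figures to each daily row.

    Hypothetical helper: assumes `quarterly` has line items as rows and
    period-end dates as columns, and that both frames use naive dates.
    """
    # Flip to one row per period, and coerce values to numbers
    q = quarterly.T.apply(pd.to_numeric, errors="coerce")
    q.index = pd.DatetimeIndex(q.index).astype("datetime64[ns]")

    d = daily.copy()
    d.index = pd.DatetimeIndex(d.index).astype("datetime64[ns]")

    # For each trading day, take the most recent period that ended on or before it
    return pd.merge_asof(
        d.sort_index(),
        q.sort_index(),
        left_index=True,
        right_index=True,
        direction="backward",
    )


# Usage with the frames pulled above:
# dataset = attach_quarterly(daily_prices, quarterly_financials)
```

Days that fall before the first reported period get NaN in the financial columns, which you can drop or fill depending on the model.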
A different and safer approach is to download “ready-made” datasets. Here are a few:
Kaggle is a great source of data, and it hosts new competitions in the field. Just make sure you comply with the terms.
Another benefit is the competitors’ notebooks, which perform data exploration and predictions on the data.
You can also download the data manually from the Nasdaq website.
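A manually downloaded file still needs a little cleaning before it is model-ready. The sketch below assumes the CSV has the columns `Date, Close/Last, Volume, Open, High, Low`, with `MM/DD/YYYY` dates and prices prefixed by `$`; that is how I understand the Nasdaq export to look, so check your own file and adjust. `load_nasdaq_csv` is a hypothetical helper, and the sample rows are made-up placeholder values.

```python
import io

import pandas as pd


def load_nasdaq_csv(source):
    """Load a historical-data CSV into a date-indexed, numeric DataFrame.

    Hypothetical helper: the column layout and the "$" prefix are assumptions
    about the export format, not something guaranteed by Nasdaq.
    """
    df = pd.read_csv(source)
    df.columns = [c.strip() for c in df.columns]
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    # Every column except Date and Volume is treated as a price
    for col in df.columns:
        if col not in ("Date", "Volume"):
            df[col] = df[col].astype(str).str.replace("$", "", regex=False).astype(float)
    # Sort oldest first so the frame is in chronological order
    return df.set_index("Date").sort_index()


# Placeholder rows, only to show the assumed layout
sample = io.StringIO(
    "Date,Close/Last,Volume,Open,High,Low\n"
    "01/05/2022,$2.50,2000,$2.00,$3.00,$1.50\n"
    "01/04/2022,$1.50,1000,$1.00,$2.00,$0.50\n"
)
prices = load_nasdaq_csv(sample)
print(prices.head())
```

In practice you would pass the path of the downloaded file instead of the in-memory sample.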
There are plenty more tools for obtaining stock data. If you didn’t find what you were looking for, try these:
awesome quant list of tools: by far one of the best lists out there