3 ways you can build a stock market prices dataset (Algo-trading)

One of the most fascinating fields in time series forecasting, and in data science overall, is stock market prediction.

One hurdle I run into every time is creating a decent dataset for what I’m trying to predict.

There are so many financial websites out there (Google Finance, Yahoo Finance, Finviz), yet it is still quite hard to get the data into your Pandas dataframe.

Here are 3 methods to build a dataset quickly and for free.

If you’re looking to create a crypto (cryptocurrency, e.g. Bitcoin or Ethereum) dataset, check out this post.

Method #1 – Pandas Datareader

I cannot exaggerate how useful this library is!

It provides an easy connection to several data sources, such as:

  • Yahoo Finance
  • Google Finance
  • World bank
  • Eurostat
  • More!

You can read about the different data sources here.

Installation

You can install the library simply from pip.

Colab users: use the upgrade flag to make sure you’re using the latest version, because the older ones contain bugs.

pip install --upgrade pandas-datareader

Usage

Import the package and other utilities

import pandas_datareader as web
import datetime as dt

Set up the stock and the date range (for example, from 2016 until today)

# start and end date
start_date = dt.datetime(2016,1,1)
end_date = dt.datetime.now()

chosen_stock = 'AMZN'

data_source = 'yahoo'

And pull daily data for this date range

# Yahoo stock tickers are plain symbols; the '-USD' suffix is the crypto pair format (e.g. 'BTC-USD')
daily_prices = web.DataReader(chosen_stock, data_source, start_date, end_date)
daily_prices.head()

This is what the output should look like:
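Once the download works, it can be handy to cache the result on disk so you don't hit the data source every time you rerun a notebook. Here is a minimal sketch of that idea. The helper name `load_or_download`, the file name, and the `fake_download` stand-in (used here instead of a real `web.DataReader` call so the example runs offline) are all my own assumptions, not part of pandas-datareader.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_download(cache_path, download_fn):
    """Return the cached CSV if it exists; otherwise download and cache it.

    `download_fn` is any zero-argument function returning a DataFrame,
    e.g. lambda: web.DataReader(chosen_stock, data_source, start_date, end_date)
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        # The first CSV column holds the dates, so use it as a parsed index
        return pd.read_csv(cache_path, index_col=0, parse_dates=True)
    prices = download_fn()
    prices.to_csv(cache_path)
    return prices


download_calls = []  # records how many times we "hit the network"


def fake_download():
    # Hypothetical stand-in for the real download; the values are toy numbers
    download_calls.append(1)
    index = pd.date_range("2022-01-03", periods=3, freq="D", name="Date")
    return pd.DataFrame({"Close": [170.4, 167.5, 164.4]}, index=index)


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_file = Path(tmp_dir) / "amzn_daily.csv"
    first = load_or_download(cache_file, fake_download)   # downloads and writes the CSV
    second = load_or_download(cache_file, fake_download)  # reads the CSV, no download
```

In real use you would pass a function that wraps the `web.DataReader` call and delete the CSV whenever you want fresh data.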

Method #2 – yfinance

The data from pandas-datareader might not include enough interesting features for your model.

You might want to get data such as:

  • Earnings events
  • Analyst recommendations (upgrades or downgrades)
  • Options data
  • News

For that you can use yfinance, a library that pulls this kind of data. Here is a link to the docs.

Make sure you follow the legal notice on the repo.

Installation

Again, you can install the library simply from pip

pip install --upgrade yfinance 

Usage

Import the library

import yfinance as yf

Choose a stock ticker

chosen_stock = "AMZN"

Then construct the stock object that will perform the data calls

stock_object = yf.Ticker(chosen_stock)

Let’s try to pull earnings data

earnings = stock_object.earnings
quarterly_earnings = stock_object.quarterly_earnings
earnings.head()

This is an example output

Let’s do the same for financials:

financials = stock_object.financials
quarterly_financials = stock_object.quarterly_financials
quarterly_financials.head()

And the output
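To use this kind of data as model features, you usually need to line it up with the daily prices. One way to do it is an "as-of" join with `pd.merge_asof`, which gives every trading day the most recent report published on or before that day. This is only a sketch on toy data: the function name `attach_latest_report` and the `Revenue` column are hypothetical, and both frames are assumed to be indexed by date. As far as I can tell, the yfinance financials frames come with dates as columns, so you would probably need to transpose them (`.T`) first.

```python
import pandas as pd


def attach_latest_report(daily_prices, reports):
    """Give each daily row the latest report dated on or before that day.

    Both frames must be indexed by date; merge_asof needs sorted keys.
    """
    return pd.merge_asof(
        daily_prices.sort_index(),
        reports.sort_index(),
        left_index=True,
        right_index=True,
        direction="backward",  # only look back in time, to avoid leaking future data
    )


# Toy data (made-up numbers): six daily closes and two report dates
daily = pd.DataFrame(
    {"Close": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]},
    index=pd.date_range("2022-01-01", periods=6, freq="D"),
)
reports = pd.DataFrame(
    {"Revenue": [10.0, 12.0]},
    index=pd.date_range("2022-01-02", periods=2, freq="3D"),  # Jan 2 and Jan 5
)

dataset = attach_latest_report(daily, reports)
```

Rows before the first report get `NaN` in the report columns, so decide whether to drop or fill them before training.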

Method #3 – Download an existing dataset

A different and safer approach is to download “ready-made” datasets. Here are a few sources.

Kaggle

A great source of data is Kaggle, where new competitions in the field keep appearing. Just make sure you comply with the terms.

Another benefit is the competitors' notebooks, which perform data exploration and predictions on the data.

Official sites

You can download the data manually from the Nasdaq website.
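A manually downloaded file still has to be cleaned before it fits into a Pandas dataframe. The sketch below assumes the export is a CSV with a `Date` column in month/day/year format and price columns prefixed with a dollar sign. That layout, the column names in `PRICE_COLUMNS`, and the helper name `load_nasdaq_csv` are assumptions on my side, so check them against the file you actually get.

```python
import io

import pandas as pd

# Assumed price columns of the exported file; adjust to match your download
PRICE_COLUMNS = ["Close/Last", "Open", "High", "Low"]


def load_nasdaq_csv(csv_source):
    """Read the export, parse the dates, and turn '$101.25' strings into floats."""
    prices = pd.read_csv(csv_source)
    prices["Date"] = pd.to_datetime(prices["Date"], format="%m/%d/%Y")
    for column in PRICE_COLUMNS:
        prices[column] = (
            prices[column].astype(str).str.replace("$", "", regex=False).astype(float)
        )
    # Sort so the oldest day comes first
    return prices.set_index("Date").sort_index()


# Made-up sample rows standing in for a real download
sample_csv = io.StringIO(
    "Date,Close/Last,Volume,Open,High,Low\n"
    "01/04/2022,$101.25,2000,$100.50,$102.00,$100.00\n"
    "01/03/2022,$100.50,1000,$99.75,$101.00,$99.50\n"
)

nasdaq_prices = load_nasdaq_csv(sample_csv)
```

With a real file, pass its path instead of the `io.StringIO` object.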

More tools

There are plenty more tools out there for obtaining stock data. If you didn’t find what you were looking for, try these:

  • Awesome list from awesome quant of tools – by far one of the best lists out there
  • GamestonkTerminal – a tool to pull data from several APIs
