This is a collection of available audio datasets, many of them built around sound effects.
They include sounds of everyday actions such as a door opening, footsteps, gunshots, and voices.
The datasets range from environmental sounds to speech and music.
Tip: Check the terms of each dataset before using it.
|AudioSet||A large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos: 2,084,320 YouTube videos covering 527 labels. Human annotators verified the presence of the sounds they heard within YouTube segments; segments were nominated for annotation using YouTube metadata and content-based search. The sound events in the dataset are a subset of the AudioSet ontology. The dataset construction is described in the authors' ICASSP 2017 paper.|
|BBC sound effects||33066 sound effects, each with a natural-language text description. Type: mostly environmental sound. (The license still needs to be checked.)||https://sound-effects.bbcrewind.co.uk/||1 text description per audio|
|AudioCaps||40 000 audio clips of 10 seconds, organized in three splits: a training split, a validation split, and a testing split. Type: environmental sound.||https://audiocaps.github.io/||1 text description per audio|
|Audio Caption Hospital Dataset||3700 audio clips from the “Hospital” scene and around 3600 audio clips from the “Car” scene. Every audio clip is 10 seconds long and is annotated with five captions. Type: environmental sound.||https://zenodo.org/record/4671263#.YgdBAN-ZNPY||5 text descriptions per audio|
|Clotho dataset||Clotho consists of 6974 audio samples, and each audio sample has five captions (a total of 34 870 captions). Audio samples are 15 to 30 s long and captions are 8 to 20 words long. Type: environmental sound.||https://zenodo.org/record/4783391#.YgdAa9-ZNPY||5 text descriptions per audio|
|Audiostock||Royalty-Free Music Library. 436864 audio effects, each with a text description.||https://audiostock.net/se||1 text description per audio|
|ESC-50||2000 environmental audio recordings with 50 classes||https://github.com/karolpiczak/ESC-50||tag|
|UrbanSound8K||8732 labeled sound excerpts (<=4s) of urban sounds from 10 classes||https://urbansounddataset.weebly.com/urbansound8k.html||tag|
|FMA||917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks from 16,341 artists and 14,854 albums, arranged in a hierarchical taxonomy of 161 genres.||https://github.com/mdeff/fma||tag and free-form text such as biographies (not necessarily descriptive)|
|FSD50K||51,197 audio clips of 200 classes||https://annotator.freesound.org/fsd/release/FSD50K/||tag|
|ACAV100M||100M video clips with audio, each 10 seconds long, with automatically generated AudioSet, Kinetics400, and ImageNet labels. The labels are noisy, but the dataset is large.||https://acav100m.azurewebsites.net/explore_classification||categories|
|Free to use sounds||More than 10,000 sounds for $23.||https://www.freetousesounds.com/product/all-in-one-sound-library-bundle/||captions|
|MACS – Multi-Annotator Captioned Soundscapes||Audio captions and corresponding audio tags for 3930 audio files from the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file was annotated by multiple annotators, who provided tags and a one-sentence description of the audio content. The data also includes annotator competence estimated using MACE (Multi-Annotator Competence Estimation).||MACS – Multi-Annotator Captioned Soundscapes (Zenodo)|
|Sonniss Game effects||A sound effect archive. Everything in it is royalty-free and can be used commercially. You can use them on an unlimited number of projects without attribution.||https://sonniss.com/gameaudiogdc#1605031061361-34588c70-73f2|
|WeSoundEffects||Sound effects bundle from 2020||https://wesoundeffects.com/we-sound-effects-bundle-2020/|
|Paramount Motion – Odeon Cinematic Sound Effects||SFX pack for $100||https://www.paramountmotion.com/odeon-sound-effects|
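
To compare the captioned datasets in the table above by how much text they provide, the clip counts and captions-per-clip figures can be multiplied out. The following is a minimal sketch, not part of any dataset's tooling: the `CaptionedDataset` class and `total_captions` helper are hypothetical names introduced here, and the numbers are copied from the table (the Hospital/Car count uses the approximate "around 3600" figure as given).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptionedDataset:
    """Hypothetical record for one captioned dataset from the table."""
    name: str
    clips: int
    captions_per_clip: int

    @property
    def total_captions(self) -> int:
        # Total text descriptions = clips x captions per clip.
        return self.clips * self.captions_per_clip


# Figures copied from the table; 3600 for the "Car" scene is approximate.
CATALOG = [
    CaptionedDataset("BBC sound effects", 33066, 1),
    CaptionedDataset("AudioCaps", 40000, 1),
    CaptionedDataset("Audio Caption Hospital Dataset", 3700 + 3600, 5),
    CaptionedDataset("Clotho", 6974, 5),
    CaptionedDataset("Audiostock", 436864, 1),
]


def total_captions(catalog):
    """Sum the caption counts over every dataset in the catalog."""
    return sum(d.total_captions for d in catalog)


if __name__ == "__main__":
    # List datasets from most to fewest captions.
    for d in sorted(CATALOG, key=lambda d: d.total_captions, reverse=True):
        print(f"{d.name}: {d.total_captions} captions")
    print(f"Total: {total_captions(CATALOG)} captions")
```

For Clotho, this reproduces the 34 870 captions stated in its table entry (6974 samples with five captions each).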