Audio Sound Effects Datasets 2022

This is a collection of currently available sound-effect datasets.

They include sounds of everyday actions such as a door opening, footsteps, gunshots, and voices.

The collection ranges from environmental sounds to speech and music datasets.

Dataset table

Tip: Check the terms of each dataset before using it.
| Name | Description | URL | Text Type |
| --- | --- | --- | --- |
| AudioSet | A large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. Human annotators verified the presence of sounds they heard within YouTube segments; segments were nominated for annotation using YouTube metadata and content-based search. The sound events cover a subset of the AudioSet ontology. The dataset construction is described in the authors' ICASSP 2017 paper. 2,084,320 YouTube videos containing 527 labels. | AudioSet | |
| BBC sound effects | 33,066 sound effects, each with a natural-language text description. Type: mostly environmental sound. (The license still needs to be checked.) | | text description per audio |
| AudioCaps | 40,000 audio clips of 10 seconds each, organized in three splits: a training split, a validation split, and a testing split. Type: environmental sound. | | text description per audio |
| Audio Caption Hospital Dataset | 3,700 audio clips from the “Hospital” scene and around 3,600 audio clips from the “Car” scene. Every audio clip is 10 seconds long and is annotated with five captions. Type: environmental sound. | | text description per audio |
| Clotho dataset | 6,974 audio samples, each with five captions (34,870 captions in total). Audio samples are 15 to 30 s long and captions are 8 to 20 words long. Type: environmental sound. | | text description per audio |
| Audiostock | Royalty-free music library. 436,864 audio effects, each with a text description. | | text description per audio |
| ESC-50 | 2,000 environmental audio recordings in 50 classes. | | |
| UrbanSound8K | 8,732 labeled sound excerpts (<=4 s) of urban sounds from 10 classes. | | |
| FMA | 917 GiB and 343 days of Creative Commons-licensed audio from 106,574 tracks by 16,341 artists on 14,854 albums, arranged in a hierarchical taxonomy of 161 genres. | | free-form text such as biographies (not necessarily descriptive) |
| FSD50K | 51,197 audio clips in 200 classes. | | |
| ACAV100M | 100M video clips with audio, each 10 seconds long, with automatic AudioSet, Kinetics400, and ImageNet labels. Noisy, but large. | | |
| Free to use sounds | 10,000+ for $23 🙂 | | |
| MACS – Multi-Annotator Captioned Soundscapes | Audio captions and corresponding audio tags for 3,930 audio files from the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file was annotated by multiple annotators, who provided tags and a one-sentence description of the audio content. The data also includes annotator competence estimated using MACE (Multi-Annotator Competence Estimation). | MACS – Multi-Annotator Captioned Soundscapes (Zenodo) | |
| Sonniss Game effects | A sound effect archive. Everything in it is royalty-free and can be used commercially, on an unlimited number of projects and without attribution. | | |
| WeSoundEffects | Sound effects bundle from 2020. | | |
| Paramount Motion – Odeon Cinematic Sound Effects | SFX pack for $100. | | |
Credit: based on a table from the amazing LAION community.
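To get a feel for how the dataset sizes in the table compare, the quoted figures can be turned into rough totals. The following is a minimal Python sketch that uses only numbers copied from the table; the helper name `total_hours` is made up for illustration, and treating every AudioSet video as exactly one 10-second clip is an assumption, so the results are estimates rather than official statistics.

```python
# Figures are copied from the dataset table; nothing is downloaded or measured.
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def total_hours(num_clips: int, clip_seconds: float) -> float:
    """Total duration in hours, assuming every clip has the same length."""
    return num_clips * clip_seconds / SECONDS_PER_HOUR


# AudioSet: 2,084,320 videos, assumed here to be one 10-second clip each.
audioset_hours = total_hours(2_084_320, 10)

# AudioCaps: 40,000 clips of 10 seconds each.
audiocaps_hours = total_hours(40_000, 10)

# Clotho: 6,974 audio samples with five captions each.
clotho_captions = 6_974 * 5

# FMA: 343 days of audio spread over 106,574 tracks.
fma_mean_track_seconds = 343 * SECONDS_PER_DAY / 106_574

print(f"AudioSet:  about {audioset_hours:,.0f} hours of audio")
print(f"AudioCaps: about {audiocaps_hours:,.0f} hours of audio")
print(f"Clotho:    {clotho_captions:,} captions")
print(f"FMA:       about {fma_mean_track_seconds / 60:.1f} minutes per track on average")
```

The Clotho product matches the 34,870 captions quoted in the table, which is a quick consistency check on that row. The hour totals are upper-bound estimates, since some clips may be shorter than their nominal length.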


