HEAR 2021 NeurIPS Challenge
Holistic Evaluation of Audio Representations


The aim of this challenge is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. The HEAR 2021 challenge invites you to create an audio embedding that is as holistic as the human ear, i.e., one that performs well across a variety of everyday domains: What approach best generalizes to a wide range of downstream audio tasks without fine-tuning?

HEAR 2021 evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. In the spirit of shared exchange, all participants must submit an audio embedding model following a common API that is general-purpose, open-source, and freely available to use.

PMLR will publish a HEAR special issue, with a submission deadline of February 28th, 2022.



The challenge starts with three diverse and approachable open tasks, but also includes a variety of held-out secret tasks. The three open tasks are: word classification, pitch detection, and sound event detection. Each is relatively simple on its own. Our twist is asking you to solve them all at once.

Teams develop an embedding of arbitrary size to be fed into a generic predictor by our evaluation algorithm. This predictor is shallowly trained for each team and each task.

The HEAR 2021 NeurIPS Challenge saw 28 models from 14 different teams attack 3 open tasks and 13 secret tasks. Here are the results. Recordings of team presentations at NeurIPS 2021 are available for viewing.

Please submit to the upcoming PMLR special issue, even if you did not participate in NeurIPS 2021.


We adopt the principles proposed by Goyal et al. (2019) for evaluating the quality of a learned representation: a good representation should (1) transfer to a wide range of different tasks, and (2) transfer with limited supervision.

Wide Range of Tasks

HEAR 2021 benchmarks span multiple audio domains: speech, environmental sound, and music, with tasks that involve short and long time spans. In addition to well-known baselines, we have endeavored to find low-resource evaluation tasks that particularly benefit humanity.

Evaluation tasks with downstream learning:

  • Scene-based: Classification/multi-classification/tagging of an entire audio clip.
  • Timestamp-based: Sound event detection/transcription.
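To make the two task types concrete, the sketch below shows the array shapes each one implies. It is an illustration only: `toy_timestamp_embeddings` and `toy_scene_embeddings` are hypothetical stand-ins, not the HEAR common API, and the sample rate, hop size, embedding size, and millisecond timestamps are assumptions chosen for the example.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate, for illustration only
HOP_SAMPLES = 800    # assumed spacing between timestamps (50 ms at 16 kHz)
EMBED_DIM = 8        # assumed embedding size; any size could be used


def toy_timestamp_embeddings(audio):
    """Hypothetical timestamp-level embedding (not the official API).

    audio: array of shape (n_sounds, n_samples).
    Returns embeddings of shape (n_sounds, n_timestamps, EMBED_DIM) and
    timestamps in milliseconds of shape (n_sounds, n_timestamps).
    """
    n_sounds, n_samples = audio.shape
    n_frames = n_samples // HOP_SAMPLES
    frames = audio[:, : n_frames * HOP_SAMPLES].reshape(
        n_sounds, n_frames, HOP_SAMPLES
    )
    # Toy features: a fixed random projection of each frame.
    projection = np.random.default_rng(0).standard_normal((HOP_SAMPLES, EMBED_DIM))
    embeddings = frames @ projection
    # One timestamp per frame, placed at the frame center.
    centers_ms = (np.arange(n_frames) + 0.5) * HOP_SAMPLES / SAMPLE_RATE * 1000.0
    timestamps = np.tile(centers_ms, (n_sounds, 1))
    return embeddings, timestamps


def toy_scene_embeddings(audio):
    """Hypothetical scene-level embedding: mean over time, one vector per clip."""
    embeddings, _ = toy_timestamp_embeddings(audio)
    return embeddings.mean(axis=1)


# Two one-second clips of synthetic noise.
audio = np.random.default_rng(1).standard_normal((2, SAMPLE_RATE))
ts_emb, ts = toy_timestamp_embeddings(audio)
scene_emb = toy_scene_embeddings(audio)
print(ts_emb.shape, ts.shape, scene_emb.shape)
```

A scene-based task consumes one vector per clip, while a timestamp-based task consumes one vector per timestamp so that events can be located in time.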

For evaluation tasks that require training, a shallow downstream model will be learned with no fine-tuning of participant models.
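As a rough sketch of what "shallow downstream model with no fine-tuning" can look like, the example below trains a softmax-regression probe on top of a frozen embedding. Everything here is an assumption for illustration: `frozen_embed` is a hypothetical random projection, the data is synthetic, and the actual HEAR predictor and its hyperparameters may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "participant model": a fixed random projection.
PROJECTION = rng.standard_normal((64, 16)) / 8.0


def frozen_embed(audio):
    """Map inputs to embeddings; PROJECTION is never updated downstream."""
    return np.tanh(audio @ PROJECTION)


def train_linear_probe(X, y, n_classes, lr=0.1, epochs=500):
    """Fit softmax regression on fixed embeddings with gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    return (X @ W + b).argmax(axis=1)


# Synthetic two-class data standing in for a labeled downstream task.
n_per_class = 100
audio = np.concatenate([
    rng.standard_normal((n_per_class, 64)) + 1.0,
    rng.standard_normal((n_per_class, 64)) - 1.0,
])
labels = np.repeat([0, 1], n_per_class)

# Hold out every other example for testing.
train_idx = np.arange(0, 2 * n_per_class, 2)
test_idx = np.arange(1, 2 * n_per_class, 2)

projection_before = PROJECTION.copy()
W, b = train_linear_probe(frozen_embed(audio[train_idx]), labels[train_idx], n_classes=2)
test_accuracy = (predict(frozen_embed(audio[test_idx]), W, b) == labels[test_idx]).mean()

# Only the probe (W, b) was trained; the embedding weights are unchanged.
assert np.array_equal(projection_before, PROJECTION)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Because only `W` and `b` are learned, the score reflects how much task-relevant information the embedding already exposes rather than how well the embedding model can be adapted.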

For some kinds of tasks, we will use only pairwise embedding distance (i.e., no learning). For these tasks, we will use the unnormalized L1 distance.
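A minimal sketch of a pairwise L1 distance computation follows, assuming that "unnormalized" means the embeddings are compared as-is, without rescaling them to unit length. The function name `pairwise_l1` and the example vectors are illustrative, not part of the challenge code.

```python
import numpy as np


def pairwise_l1(a, b):
    """L1 distance between every row of a (n, d) and every row of b (m, d).

    Embeddings are used as given, with no length normalization.
    Returns an (n, m) array.
    """
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)


embeddings = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 1.0],
    [4.0, 0.0, 2.0],
])
distances = pairwise_l1(embeddings, embeddings)
print(distances)
# [[0. 2. 5.]
#  [2. 0. 5.]
#  [5. 5. 0.]]
```

Since the distance is unnormalized, the scale of an embedding directly affects its distances, so two models with different output scales are not directly comparable on raw distance values.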

News and Announcements

  • 2022-01-21 -
    • Details about PMLR special issue submissions are now available. Submission deadline is February 28th, 2022.
    • All HEAR 2021 datasets are now available.
    • All HEAR pypi packages have been updated. Please run pip3 install --upgrade heareval hearbaseline hearpreprocess.
    • Recordings of NeurIPS team presentations are now available.
    • Due to a preprocessing bug, NeurIPS 2021 results for the crema-d, gtzan genre, and gtzan music/speech tasks are retracted. Corrected results are forthcoming.
  • 2021-11-29 -
    • We will be presenting live at the NeurIPS conference. See HEAR @ NeurIPS 2021 for more information.
    • Eduardo Fonseca has been added to the steering committee.
  • 2021-09-13 -
    • Our leaderboard is live, and will be updated with secret tasks and early submissions.
    • Open tasks and downstream evaluation code have been released.
  • 2021-08-24 -
    • The final submission deadline is October 31st, 2021.
    • The leaderboard will now be updated on a rolling basis, and you can submit multiple versions.
    • Release of the HEAR Baseline model
    • Release of a validator tool for participants to check that their submissions follow the common API: HEAR Validator
    • Tensor return types of get_timestamp_embeddings are clarified.
  • 2021-07-16 -
    • PMLR will host a HEAR special issue.
    • Some clarifications to the NSynth task are added. See the specific changelog for a detailed list of updates.
  • 2021-06-29 -
    • All three open tasks have been announced.
    • The API has been simplified and clarified. For a detailed set of updates, please see the specific changelog.
    • Google Cloud Platform has generously donated compute resources for our leaderboard. Low-resource participants, please reach out for GPU sponsorship!
    • The first leaderboard is ready for submission. Please submit here.

To stay up-to-date, we will be making announcements in several places:


We are proud to announce that HEAR 2021 was sponsored by Google and that all competition evaluations were performed on Google Cloud Platform.

NeurIPS Challenge Organizing Team

Joseph Turian, Jordie Shier, Bhiksha Raj, Björn W. Schuller, Christian James Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Yonatan Bisk, Gyanendra Das, Humair Raj Khan, Camille Noufi, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, and Zeyu Jin

You can learn more about the committee here.

PMLR Journal Editors

Joseph Turian, Björn W. Schuller, Dorien Herremans, Katrin Kirchhoff, Paola Garcia Perera, and Philippe Esling.

You can learn more about the journal editors here.