Research Overview


Toni Heittola

My research is about getting machines to make sense of the sounds around them, the way humans do. I've spent over two decades interpreting complex acoustic environments and building systems that listen to and understand real-world sound, work that has produced more than 75 peer-reviewed publications along the way.

What I Work On

My primary focus is the automatic analysis of environmental audio content, which covers both analyzing and synthesizing auditory scenes so that machines can interpret complex acoustic environments. Whether it's a bird song in a forest, a siren cutting through urban traffic, or the mood of a bustling café, the goal is the same: machines that understand and respond to the sounds around them. See more information about applications for my research:

Key Research Areas

My current research spans several key areas:

Sound Event Detection (SED): Designing robust models to detect and timestamp sound events in real-world, often noisy, environments. See a detailed introduction to sound event detection in everyday environments.

Acoustic Scene Classification (ASC): Training computers to identify different environments, such as streets, parks, or offices, based on their distinct acoustic characteristics.

Evaluation Frameworks: Developing standardized metrics, protocols, and datasets for benchmarking audio analysis systems.

Dataset Development: Creating annotated audio datasets and evaluation benchmarks for reproducible research, and releasing open datasets.

Real-time Audio Analysis: Implementing real-time systems for on-device and streaming audio recognition.
See details about some related projects:

Academic Contributions

Across my career in environmental audio analysis, this work has produced 75+ peer-reviewed publications, several international patents, and more than 8K citations. I've also co-authored and released over 20 open-access audio datasets, widely used in academic research and machine learning competitions.

For more information about my research, please visit the following pages:

Academic Impact: My research centers on neural networks, deep learning, and audio signal processing.

  • 60+ publications, with over 8K citations (h-index 35).
  • SciVal ranks me 8th worldwide in scholarly output and 20th in field-weighted citation impact for "Neural networks; Deep learning; Audio Signal Processing" (2014–2023).

Open Data and Community Resources:

Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for SED and sound event localization and detection (SELD), and helped establish evaluation setups for ASC:

I also maintain several open-source toolboxes that are widely used in the community:
  • sed_eval (83K pip downloads, 143 GitHub stars)
  • dcase_util (128K pip downloads, 130 GitHub stars)
  • sed_vis (120 GitHub stars)

Service to the Research Community:

  • Peer reviewer for leading journals and conferences, including IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES.
  • Organizational roles within the DCASE Challenge (task coordinator, webmaster) and the DCASE Workshop (proceedings editor, technical chair, publication chair).

Academic Illustrations in DCASE Research:

  • Numerous illustrations for educational and scientific purposes, visually explaining complex concepts such as sound event detection, acoustic scene classification, and audio tagging.
  • Illustrations are provided under a Creative Commons license to enhance accessibility and understanding in the field, benefiting researchers, educators, and students.

Academic biography: See a concise overview of my career in academia, highlighting relevant qualifications, experience, and accomplishments.


Community and Open Science

I’m an active contributor to the DCASE (Detection and Classification of Acoustic Scenes and Events) community and have been since it started, helping shape it into the globally recognized platform it is today. My roles have included:

  • Task coordinator for Acoustic Scene Classification and Sound Event Detection challenge tasks:
    • Designer of evaluation protocols
    • Creator of public datasets for challenge tasks
    • Developer of baseline systems
    • Developer of open-source tools
  • Webmaster for the DCASE community website