Research Overview


Toni Heittola

My research focuses on enabling machines to understand the sounds around them, just as humans do. For over two decades, I have explored how computers can interpret complex acoustic environments, specializing in teaching machines to listen to and make sense of real-world soundscapes. This journey has led to over 75 peer-reviewed publications, reflecting my long-standing commitment to advancing machine listening and auditory scene understanding.

What I Work On

My primary research focus is the automatic analysis of environmental audio content. I have worked for over two decades on both the analysis and synthesis of auditory scenes, with the goal of creating machines that can interpret complex acoustic environments. Whether identifying a bird song in a forest, picking out a siren in urban traffic, or grasping the mood of a bustling cafe, I aim to enable machines to understand and respond to the rich variety of sounds in our surroundings. See more information about applications of my research.

Key Research Areas

My current research spans several key areas:

Sound Event Detection (SED): Designing robust models to detect and timestamp sound events in real-world, often noisy, environments. See a detailed introduction to sound event detection in everyday environments.

Acoustic Scene Classification (ASC): Training computers to identify different environments, such as streets, parks, or offices, based on their distinct acoustic characteristics.

Evaluation Frameworks: Developing standardized metrics, protocols, and datasets for benchmarking audio analysis systems.

Dataset Development: Creating and releasing annotated open audio datasets and evaluation benchmarks for reproducible research.

Real-time Audio Analysis: Implementing real-time systems for on-device and streaming audio recognition.

See details about some related projects.
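To make the sound event detection task above concrete, a detector's final post-processing step typically converts frame-wise activity predictions into timestamped events. The sketch below is purely illustrative and not one of my actual systems; the detection threshold and the 20 ms frame hop are assumed values.

```python
# Convert per-frame activity probabilities for one event class into
# (onset, offset) segments in seconds -- a simplified sketch of the
# post-processing stage of a sound event detection system.

def frames_to_events(probs, threshold=0.5, hop=0.02):
    """probs: per-frame probabilities; hop: frame hop in seconds (assumed 20 ms)."""
    events = []
    onset = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i * hop                      # event starts at this frame
        elif not active and onset is not None:
            events.append((onset, i * hop))      # event ended before this frame
            onset = None
    if onset is not None:                        # event still active at the end
        events.append((onset, len(probs) * hop))
    return events

# Two active regions in a six-frame clip yield two timestamped events.
events = frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7, 0.6])
```

Real systems add smoothing (e.g., median filtering of the activity sequence and minimum-duration constraints) before emitting events, since raw frame-level decisions are noisy.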

Academic Contributions

I have spent my career building the field of environmental audio analysis from the intersection of machine learning, audio signal processing, and computational auditory scene analysis. My work has resulted in over 75 peer-reviewed publications, several international patents, and more than 8K citations. I have also co-authored and released over 20 open-access audio datasets, which have been widely utilized in academic research and machine learning competitions.

For more information about my research, please visit the following pages:

Academic Impact: Over the past decade, my research has focused on neural networks, deep learning, and audio signal processing.

  • I have authored more than 60 publications, which have collectively received over 8K citations with an h-index of 35.
  • According to SciVal, I rank 8th globally in scholarly output and 20th in field-weighted citation impact within the topic area "Neural networks; Deep learning; Audio Signal Processing" (2014-2023).

Open Data and Community Resources:

Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for SED and sound event localization and detection (SELD), and helped establish evaluation setups for ASC.
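For context, segment-based SED evaluation compares reference annotations and system output per fixed-length time segment and event class, then aggregates true positives, false positives, and false negatives into an F-score. The toy sketch below illustrates the idea only; it is not the sed_eval implementation, and the segment indices and labels are made up.

```python
# Toy segment-based evaluation for sound event detection: activity is
# compared per (segment, class) cell, and precision/recall/F-score are
# computed from the resulting TP/FP/FN counts.

def segment_f1(reference, estimated):
    """reference/estimated: sets of (segment_index, event_label) pairs."""
    tp = len(reference & estimated)   # active in both reference and output
    fp = len(estimated - reference)   # system-only activity
    fn = len(reference - estimated)   # missed activity
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ref = {(0, "speech"), (1, "speech"), (1, "car"), (2, "car")}
est = {(0, "speech"), (1, "car"), (2, "car"), (2, "speech")}
# tp=3, fp=1, fn=1 -> precision = recall = 0.75 -> F1 = 0.75
score = segment_f1(ref, est)
```

Segment-based scoring deliberately ignores exact onset/offset placement within a segment, which makes it more forgiving than event-based metrics that require temporal alignment.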

I also maintain several open-source toolboxes that are widely used in the community:
  • sed_eval (83K pip downloads, 143 GitHub stars)
  • dcase_util (128K pip downloads, 130 GitHub stars)
  • sed_vis (120 GitHub stars)

Service to the Research Community:

  • I actively contribute as a peer reviewer for leading journals and conferences (e.g., IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES).
  • I have taken on several organizational roles within the DCASE Challenge, such as task coordinator and webmaster.
  • I have also contributed to the DCASE Workshop in various capacities, including proceedings editor, technical chair, and publication chair.

Academic Illustrations in DCASE Research:

  • Numerous illustrations for educational and scientific purposes, visually explaining complex concepts such as sound event detection, acoustic scene classification, and audio tagging.
  • Illustrations are provided under a Creative Commons license to improve accessibility and understanding in the field, benefiting researchers, educators, and students.

Academic biography: See a concise overview of my career in academia, highlighting relevant qualifications, experience, and accomplishments.


Community and Open Science

I’m an active contributor to the DCASE (Detection and Classification of Acoustic Scenes and Events) community. Since its inception, I’ve played a key role in shaping the initiative into a globally recognized platform. Over the years, my roles have included:

  • Task coordinator for Acoustic Scene Classification and Sound Event Detection challenge tasks, responsible for:
    • Designing evaluation protocols
    • Curating public datasets for challenge tasks
    • Developing baseline systems
    • Releasing open-source tools
  • Webmaster for the DCASE community website