About Me


Researcher in Machine Learning, Audio Signal Processing, and Acoustic Scene Understanding

Toni Heittola
Postdoctoral Research Fellow

Tampere University, Finland

Office
Tampere University
Tampere, Finland
Hervanta Campus / Tietotalo
Room TF419

I am a Postdoctoral Research Fellow at Tampere University, currently affiliated with the Tampere Institute for Advanced Study (Tampere IAS). My research focuses on automatic analysis of environmental audio content, with emphasis on acoustic scene understanding and sound event detection.

My work is at the intersection of Machine Learning, Audio Signal Processing, and Computational Auditory Scene Analysis. I develop methods that enable machines to identify and understand sounds in everyday environments. These technologies are used in smart infrastructure, autonomous devices, environmental monitoring, and human-machine interaction.

Research

My research interests include acoustic scene understanding, sound event detection, and acoustic scene classification. I have co-authored numerous publications on these topics in leading conferences and journals. If you are interested in collaborating on research, engineering solutions, open audio datasets, or anything else related to audio content analysis, have a look at the links on this page or drop me an email.

Key Research Areas

  • Deep Learning for Audio: Applying convolutional and recurrent neural networks to time-frequency representations of audio (a minimal feature-extraction sketch follows this list)
  • Sound Event Detection (SED): Constructing robust models to detect sound events in multisource audio mixtures
  • Acoustic Scene Classification (ASC): Classifying environments (e.g., street, park, office) based on their acoustic signatures
  • Evaluation Frameworks: Designing evaluation metrics, evaluation protocols, and benchmark datasets
  • Dataset Development: Creating and releasing annotated open audio datasets and evaluation benchmarks for reproducible research
  • Real-time Audio Analysis: Implementing efficient systems for on-device and streaming audio recognition
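
As an illustration of the time-frequency representations mentioned in the first bullet, the sketch below computes a log-mel spectrogram with librosa. It is a minimal example with a placeholder file name and illustrative parameter values, not the exact settings used in my systems.

```python
# Minimal log-mel feature extraction sketch (placeholder file name and
# illustrative parameter values).
import librosa
import numpy as np

# Placeholder audio file; any environmental recording works.
y, sr = librosa.load('street_traffic.wav', sr=44100, mono=True)

# Mel-band energies over short frames, converted to a decibel (log) scale.
mel = librosa.feature.melspectrogram(
    y=y, sr=sr,
    n_fft=2048,        # ~46 ms analysis window at 44.1 kHz
    hop_length=1024,   # 50% frame overlap
    n_mels=40          # number of mel bands
)
log_mel = librosa.power_to_db(mel, ref=np.max)

# Shape (n_mels, n_frames): this matrix is what a CNN or RNN classifier consumes.
print(log_mel.shape)
```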

Impact

  • Academic Impact: Over the past decade, my research has focused on neural networks, deep learning, and audio signal processing. I have authored more than 60 publications, which have collectively received over 8K citations. My h-index is 35, and according to SciVal, I am currently ranked 8th worldwide in scholarly output and 20th in field-weighted citation impact in the subject area "Neural networks; Deep learning; Audio Signal Processing" (2014-2023).
  • Open Data and Community Resources: I have co-authored and released over 20 open-access audio datasets, with over 200K downloads. The datasets are widely utilized in research and machine learning competitions as benchmarks for evaluating state-of-the-art systems.
  • Innovation and Patents: My work has led to several international patents, including technologies for privacy-preserving deep audio representations, location-specific sound scene synthesis, and media event suggestion systems.
  • Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for sound event detection (SED) and sound event localization and detection (SELD), and helped establish evaluation setups for acoustic scene classification (ASC). I also maintain several open-source toolboxes that are widely used in the community:
    • sed_eval (83K pip downloads, 143 GitHub stars)
    • dcase_util (128K pip downloads, 130 GitHub stars)
    • sed_vis (120 GitHub stars)
  • Education and Outreach: I have been active in sharing knowledge through tutorials and publications. I co-organized a tutorial at ICASSP 2019 on acoustic scene and event detection, which was the second most attended tutorial at the conference (200 participants). I’ve also contributed chapters to the book Computational Analysis of Sound Scenes and Events.
  • Service to the Research Community: I actively contribute as a peer reviewer for leading journals and conferences, including IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES. In addition to reviewing, I have taken on several organizational roles within the DCASE Challenge, such as task coordinator and webmaster. I have also contributed to the DCASE Workshop in various capacities, including proceedings editor, technical chair, and publication chair.

Community Work

I am an active contributor to the Detection and Classification of Acoustic Scenes and Events (DCASE) research community. I helped kick-start the DCASE community and build it into an internationally recognized forum that hosts annual machine learning evaluation campaigns and workshops. My work has contributed to expanding the community to hundreds of participants worldwide and has facilitated the creation of benchmark datasets and high-impact publications. I have served as a long-time task coordinator for the acoustic scene classification challenge and have also coordinated several tasks related to sound event detection. In these roles, I have developed open datasets, designed evaluation protocols, implemented baseline systems, and maintained the DCASE website. Through this work, I have helped shape the community’s research infrastructure and advance the field of environmental sound analysis.

Work

Over the past two decades, I have focused on machine learning methods for analyzing environmental audio as part of the Audio Research Group at Tampere University. I specialize in sound event detection, acoustic scene classification, and the development of datasets and evaluation frameworks for benchmarking.

Education

During my master’s studies, I explored musical genre classification and other music-related classification tasks. This work later evolved into research on musical instrument classification in multi-source environments. In recent years, my focus has shifted toward topics within Computational Auditory Scene Analysis, particularly sound event detection in complex acoustic settings. My doctoral thesis concentrated on computational analysis of everyday acoustic environments, with a specific emphasis on sound event detection.

Development

Over the years, I’ve had the opportunity to work on a wide range of projects that combine my passion for audio signal analysis and web development. Much of my work has focused on creating tools and systems that support research in sound classification, event detection, and acoustic scene analysis, especially within the DCASE (Detection and Classification of Acoustic Scenes and Events) community.

Machine Learning for Audio

I have developed several open-source tools to support machine learning in audio:

  • Evaluation & Data Tools: Libraries like sed_eval and dcase_util help standardize evaluation and dataset handling (a minimal sed_eval usage sketch follows this list).
  • Visualization: Tools such as sed_vis and js-datatable make it easier to present system outputs and annotations.
  • Tutorials & Examples: I have shared hands-on code from tutorials (e.g., ICASSP 2019), and contributed example systems to the book Computational Analysis of Sound Scenes and Events.
  • DCASE Baselines: I have implemented reference systems for acoustic scene classification and sound event detection in both Python and MATLAB.
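
As a quick illustration of the evaluation tools above, the sketch below follows the documented sed_eval usage pattern for segment-based sound event detection metrics. The file names are placeholders, and the annotations are assumed to be in the tab-separated event-list format that sed_eval reads.

```python
# Segment-based sound event detection metrics with sed_eval (usage sketch;
# 'reference.txt' and 'estimated.txt' are placeholder event-list files).
import sed_eval

reference_event_list = sed_eval.io.load_event_list('reference.txt')
estimated_event_list = sed_eval.io.load_event_list('estimated.txt')

metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=reference_event_list.unique_event_labels,
    time_resolution=1.0  # evaluate in one-second segments
)
metrics.evaluate(
    reference_event_list=reference_event_list,
    estimated_event_list=estimated_event_list
)

# Printing the metrics object produces a formatted report (F-score, error rate, ...).
print(metrics)
```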

Website Development

I also enjoy building websites that support both academic and community-driven initiatives. One of the key projects I’ve developed is the DCASE Community Website, which I built using the Pelican static site generator. To meet the needs of the research community, I created custom plugins that help manage citations, datasets, events, and personnel information.
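
To give a rough idea of how such plugins hook into the site generator, the sketch below follows the standard Pelican plugin pattern: a module exposing register() that connects a handler to one of Pelican's signals. The handler and the YAML file name are hypothetical simplifications, not the actual DCASE site plugins.

```python
# Minimal Pelican plugin sketch: load structured data (here, a hypothetical
# personnel.yaml) and expose it to page templates via the generator context.
import os

import yaml
from pelican import signals


def load_personnel(generator):
    # Read data/personnel.yaml from the content directory, if present.
    path = os.path.join(generator.settings['PATH'], 'data', 'personnel.yaml')
    if os.path.isfile(path):
        with open(path, 'r', encoding='utf-8') as f:
            generator.context['personnel'] = yaml.safe_load(f)


def register():
    # Run after the page generator has finished building its context.
    signals.page_generator_finalized.connect(load_personnel)
```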

In addition to academic platforms, I’ve also worked on recreational projects like bbStat, a basketball statistics platform designed for regional leagues in Finland. This system, built with a custom Joomla component, serves thousands of players and fans each year by providing detailed game data, player stats, and league standings.

Web Utilities & Plugins

To make academic website development more efficient, I’ve created a suite of tools tailored for structured content and research-focused platforms. This includes a collection of Pelican plugins that automate the generation of publication lists, personnel directories, and other content from structured data sources like YAML and BibTeX. I’ve also developed enhancements for managing tables of contents, tracking file modifications, and listing recent articles. To complement these tools, I designed custom Bootstrap-based themes that give academic websites a clean, professional look while remaining easy to maintain and extend.
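
As a simplified illustration of generating publication content from structured data, the sketch below reads a BibTeX file with the bibtexparser package (v1 API) and sorts the entries by year for rendering. The file name is a placeholder, and the real plugins do considerably more (grouping, abstract handling, links to datasets and code).

```python
# Turn a BibTeX file into a simple, year-sorted publication list
# (illustrative sketch; 'publications.bib' is a placeholder file name).
import bibtexparser

with open('publications.bib', encoding='utf-8') as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

# Each entry is a dict of BibTeX fields; sort newest first.
publications = sorted(
    bib_database.entries,
    key=lambda entry: int(entry.get('year', 0)),
    reverse=True
)

for pub in publications:
    print(f"{pub.get('year', 'n.d.')}  {pub.get('author', '')}: {pub.get('title', '')}")
```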