My research focuses on enabling machines to understand the sounds around them, much as humans do. For over two decades, I have worked in environmental audio analysis, teaching computers to listen to and interpret complex real-world acoustic environments. This journey has led to over 75 peer-reviewed publications, reflecting my long-standing commitment to advancing machine listening and auditory scene understanding.
What I Work On
My primary research focus is the automatic analysis of environmental audio content. I have worked for over two decades on both the analysis and synthesis of auditory scenes, with the goal of creating machines that can interpret complex acoustic environments. Whether identifying birdsong in a forest, picking out a siren amid urban traffic, or grasping the mood of a bustling cafe, I aim to enable machines to understand and respond to the rich variety of sounds in our surroundings.
My projects span a broad range of applications, and I have mentored students at various stages of their academic careers on topics including:
- Zero-shot learning
- Representation learning
- Active learning
- Deep learning for sound event detection
- Compensation for loudspeaker distortions
- Audio captioning
- Tire type classification
- Semi-supervised musical instrument recognition
- Guitar chord detection
- Parameter adaptation in nonlinear loudspeaker models
Key Research Areas
My current research spans several key areas:
Deep Learning for Audio: Developing and applying deep learning approaches for audio content analysis.
- Introduction to computational audio content analysis
- Multi-label zero-shot audio classification, conference publication
- Positive and negative sampling strategies for self-supervised learning, conference publication
- Active learning for sound event detection, journal publication
- Sequential information in polyphonic sound event detection, conference publication
Sound Event Detection (SED): Designing robust models to detect and timestamp sound events in real-world, often noisy, environments. See a detailed introduction to sound event detection in everyday environments.
- Introduction to sound event detection in everyday environments
- Sound event detection with soft labels, conference publication
- Sound event detection: a tutorial, magazine publication
- Sound event detection in the DCASE 2017 Challenge, journal publication
- Convolutional recurrent neural networks, journal publication
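A core step in most SED systems is turning frame-wise class probabilities into timestamped events. The sketch below illustrates this common post-processing idea under simple assumptions (a fixed threshold and frame hop); the function name, label, and values are hypothetical, not taken from any specific system of mine.

```python
# Illustrative SED post-processing: binarize frame-wise probabilities
# with a threshold, then merge contiguous active frames into
# (onset, offset, label) events. All names/values here are hypothetical.

def probabilities_to_events(probs, label, threshold=0.5, frame_hop=0.02):
    """Convert per-frame probabilities into timestamped events."""
    events = []
    onset = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i * frame_hop                        # event starts
        elif not active and onset is not None:
            events.append((onset, i * frame_hop, label))  # event ends
            onset = None
    if onset is not None:                                # runs to the end
        events.append((onset, len(probs) * frame_hop, label))
    return events

# Example: a hypothetical "dog_bark" detector active for frames 2-4
events = probabilities_to_events([0.1, 0.2, 0.9, 0.8, 0.7, 0.1], "dog_bark")
```

Real systems typically add smoothing (e.g., median filtering) and per-class thresholds before this step, but the thresholding-and-merging core is the same.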
Acoustic Scene Classification (ASC): Training computers to identify different environments, such as streets, parks, or offices, based on their distinct acoustic characteristics.
- Data-efficient low-complexity ASC, conference publication
- Low-complexity ASC for multi-device audio, conference publication
- Audio-visual scene analysis, conference publication
- Generalization across devices, conference publication
- Closed and open set classification, conference publication
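The idea of classifying scenes by their acoustic characteristics can be sketched with a toy nearest-centroid classifier. This is only an illustration of the concept: modern ASC systems use log-mel spectrogram features and neural networks, and the 2-D features and labels below are made up.

```python
# Toy ASC illustration: summarize each scene class by the mean of its
# training feature vectors, then assign a new recording to the nearest
# centroid. Features here are hypothetical stand-ins for real audio
# features (e.g., log-mel statistics); not a real system.
import math

def centroid(vectors):
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(feature, centroids):
    """Return the scene label whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))

# Hypothetical 2-D features, e.g. (average loudness, spectral brightness)
train = {
    "street": [[0.9, 0.7], [0.8, 0.6]],
    "park":   [[0.2, 0.3], [0.3, 0.2]],
}
centroids = {label: centroid(vecs) for label, vecs in train.items()}
print(classify([0.85, 0.65], centroids))  # → street
```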
Evaluation Frameworks: Developing standardized metrics, protocols, and datasets for benchmarking audio analysis systems.
- Evaluation metrics for sound event detection
- Evaluation toolbox for sound event detection
- Datasets and Evaluation, book chapter
- Metrics for polyphonic sound event detection, journal publication
- Joint measurement of localization and detection of sound events, conference publication
- Benchmarks for DCASE Challenge
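To make the segment-based evaluation idea concrete, the sketch below splits the timeline into fixed-length segments and compares reference and system activity per segment for one class. This is a simplified illustration of the concept, not the sed_eval implementation; the event lists are invented.

```python
# Minimal sketch of segment-based evaluation for polyphonic SED:
# the timeline is divided into fixed segments (e.g., 1 s), activity is
# compared segment by segment, and an F1 score is computed per class.
# Simplified for illustration; not the sed_eval code.

def segment_activity(events, label, n_segments, seg_len=1.0):
    """Mark which fixed-length segments a labeled event overlaps."""
    active = [False] * n_segments
    for onset, offset, event_label in events:
        if event_label != label:
            continue
        for seg in range(n_segments):
            start, end = seg * seg_len, (seg + 1) * seg_len
            if onset < end and offset > start:   # any overlap activates
                active[seg] = True
    return active

def segment_f1(reference, estimated, label, n_segments):
    """F1 over segment-wise activity for one class."""
    ref = segment_activity(reference, label, n_segments)
    est = segment_activity(estimated, label, n_segments)
    tp = sum(r and e for r, e in zip(ref, est))
    fp = sum(e and not r for r, e in zip(ref, est))
    fn = sum(r and not e for r, e in zip(ref, est))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0

reference = [(0.0, 2.5, "speech")]                          # segments 0-2
estimated = [(0.0, 2.0, "speech"), (4.0, 5.0, "speech")]    # one miss, one false alarm
print(segment_f1(reference, estimated, "speech", 5))        # 2/3
```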
Dataset Development: Creating and releasing annotated open audio datasets and evaluation benchmarks for reproducible research.
- Data acquisition: audio collection, annotation, and dataset creation
- Details about the key datasets used in my research
- Open datasets created and maintained by me
- Data-related tools
- DCASE DataList
Real-time Audio Analysis: Implementing real-time systems for on-device and streaming audio recognition.
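At the heart of any streaming analyzer is the re-framing of arbitrarily sized incoming sample chunks into fixed-size, overlapping analysis frames. The sketch below shows this buffering step under toy assumptions (tiny frame and hop sizes, integer samples standing in for audio); it is an illustration, not a specific system of mine.

```python
# Streaming re-framing sketch: accumulate incoming chunks and emit
# hop-spaced, fixed-size analysis frames as an on-device recognizer
# would consume them. Frame/hop sizes and data are illustrative.

class StreamFramer:
    """Re-frame an arbitrary sample stream into overlapping frames."""
    def __init__(self, frame_size=4, hop_size=2):
        self.frame_size = frame_size
        self.hop_size = hop_size
        self.buffer = []

    def push(self, samples):
        """Add new samples; yield every complete frame now available."""
        self.buffer.extend(samples)
        while len(self.buffer) >= self.frame_size:
            yield list(self.buffer[:self.frame_size])
            del self.buffer[:self.hop_size]   # keep the overlap

framer = StreamFramer(frame_size=4, hop_size=2)
frames = []
for chunk in [[1, 2, 3], [4, 5], [6, 7, 8]]:   # chunks arrive irregularly
    frames.extend(framer.push(chunk))
```

With a 4-sample frame and 2-sample hop, the three irregular chunks above yield three 50%-overlapping frames, independent of how the samples arrived.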
Academic Contributions
I have spent my career building the field of environmental audio analysis at the intersection of machine learning, audio signal processing, and computational auditory scene analysis. My work has resulted in over 75 peer-reviewed publications, several international patents, and more than 8K citations. I have also co-authored and released over 20 open-access audio datasets, which are widely used in academic research and machine learning competitions.
For more information about my research, please visit the following pages:
Academic Impact: Over the past decade, my research has focused on neural networks, deep learning, and audio signal processing.
- Within this area, I have authored more than 60 publications, which have collectively received over 8K citations, with an h-index of 35.
- According to SciVal, I rank 8th globally in scholarly output and 20th in field-weighted citation impact within the topic area "Neural networks; Deep learning; Audio Signal Processing" (2014-2023).
Open Data and Community Resources:
- Co-authored and published over 20 open-access audio datasets while working at Tampere University. These datasets have been downloaded more than 200K times and are heavily used in machine learning competitions and academic studies as benchmarks for evaluating state-of-the-art systems.
- Maintainer of the DCASE DataList
Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for SED and sound event localization and detection (SELD), and helped establish evaluation setups for ASC:
- Metrics for polyphonic sound event detection, journal publication
- Joint measurement of localization and detection of sound events, conference publication
- sed_eval (83K pip downloads, 143 GitHub stars)
- dcase_util (128K pip downloads, 130 GitHub stars)
- sed_vis (120 GitHub stars)
Service to the Research Community:
- I actively contribute as a peer reviewer for leading journals and conferences (e.g., IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES).
- I have taken on several organizational roles within the DCASE Challenge, such as task coordinator and webmaster.
- I have also contributed to the DCASE Workshop in various capacities, including proceedings editor, technical chair, and publication chair.
Academic Illustrations in DCASE Research:
- Numerous illustrations for educational and scientific purposes, visually explaining complex concepts such as sound event detection, acoustic scene classification, and audio tagging.
- Illustrations are provided under a Creative Commons license to enhance accessibility and understanding in the field, benefiting researchers, educators, and students.
Academic biography: See a concise overview of my career in academia, highlighting relevant qualifications, experience, and accomplishments.
Community and Open Science
I’m an active contributor to the DCASE (Detection and Classification of Acoustic Scenes and Events) community. Since its inception, I’ve played a key role in shaping the initiative into a globally recognized platform. Over the years, my roles have included:
- Task coordinator for Acoustic Scene Classification and Sound Event Detection challenge tasks, including:
  - Design of evaluation protocols
  - Release of public datasets for challenge tasks
  - Development of baseline systems and open-source tools
- Webmaster for the DCASE community website