My research is about getting machines to make sense of the sounds around them, the way humans do. I have spent over two decades on this: interpreting complex acoustic environments and building systems that listen to and understand real-world sound. That work has produced more than 75 peer-reviewed publications.
What I Work On
My primary focus is the automatic analysis of environmental audio content: both analyzing and synthesizing auditory scenes so that machines can interpret complex acoustic environments. Whether it is a bird song in a forest, a siren cutting through urban traffic, or the mood of a bustling café, the goal is the same: machines that understand and respond to the sounds around them.
My research spans a diverse portfolio of projects with broad-ranging applications.
I have mentored students at various stages of their academic journey on a wide range of topics and applications, including:
- Zero-shot learning
- Representation learning
- Active learning
- Deep learning for sound event detection
- Compensation for loudspeaker distortions
- Audio captioning
- Tire type classification
- Semi-supervised musical instrument recognition
- Guitar chord detection
- Parameter adaptation in nonlinear loudspeaker models
Key Research Areas
My current research spans several key areas:
Deep Learning for Audio: Applying new deep learning approaches to audio content analysis.
- Introduction to computational audio content analysis
- Multi-label zero-shot audio classification, conference publication
- Positive and negative sampling strategies for self-supervised learning, conference publication
- Active learning for sound event detection, journal publication
- Sequential information in polyphonic sound event detection, conference publication
Sound Event Detection (SED): Designing robust models to detect and timestamp sound events in real-world, often noisy, environments.
- Introduction to sound event detection in everyday environments
- Sound event detection with soft labels, conference publication
- Sound event detection: a tutorial, magazine publication
- Sound event detection in the DCASE 2017 Challenge, journal publication
- Convolutional recurrent neural networks, journal publication
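To make "detect and timestamp" concrete, here is a minimal sketch of the final step of a typical SED pipeline: turning frame-wise activity probabilities for one sound class into onset and offset times. It is an illustration only, not taken from any of the publications listed above; the function name `frames_to_events`, the hop length, and the fixed 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float],
    hop_s: float = 0.02,      # assumed time step between frames, in seconds
    threshold: float = 0.5,   # assumed decision threshold
) -> List[Tuple[float, float]]:
    """Convert frame-wise probabilities for one class into (onset, offset) pairs in seconds."""
    events: List[Tuple[float, float]] = []
    onset = None  # index of the frame where the current event started, if any
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i  # event starts at this frame
        elif not active and onset is not None:
            events.append((onset * hop_s, i * hop_s))  # event ended before this frame
            onset = None
    if onset is not None:
        # The event is still active at the end of the signal: close it there.
        events.append((onset * hop_s, len(probs) * hop_s))
    return events


if __name__ == "__main__":
    # Hypothetical network output for a single class, one value per frame.
    print(frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7], hop_s=0.5))
```

Practical systems usually add smoothing (for example median filtering) and class-specific thresholds before this step; the sketch leaves those out to keep the idea visible.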
Acoustic Scene Classification (ASC): Training computers to identify different environments, such as streets, parks, or offices, based on their distinct acoustic characteristics.
- Data-efficient low-complexity ASC, conference publication
- Low-complexity ASC for multi-device audio, conference publication
- Audio-visual scene analysis, conference publication
- Generalization across devices, conference publication
- Closed and open set classification, conference publication
Evaluation Frameworks: Developing standardized metrics, protocols, and datasets for benchmarking audio analysis systems.
- Evaluation metrics for sound event detection
- Evaluation toolbox for sound event detection
- Datasets and Evaluation, book chapter
- Metrics for polyphonic sound event detection, journal publication
- Joint measurement of localization and detection of sound events, conference publication
- Benchmarks for DCASE Challenge
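As a rough illustration of what a polyphonic SED metric measures, the sketch below computes a segment-based F-score and error rate from per-segment sets of active class labels. It follows the commonly used definitions (per segment, substitutions are min(FP, FN), deletions and insertions are the remainders, and the error rate is normalized by the number of active reference events), but it is a simplified stand-in written for this page, not the sed_eval implementation; the function name and input format are assumptions.

```python
from typing import Dict, List, Set


def segment_based_metrics(
    reference: List[Set[str]],
    estimated: List[Set[str]],
) -> Dict[str, float]:
    """Segment-based F1 and error rate; each list item holds the labels active in one segment."""
    if len(reference) != len(estimated):
        raise ValueError("reference and estimated must cover the same segments")

    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0  # total number of active reference events over all segments
    for ref, est in zip(reference, estimated):
        seg_fp = len(est - ref)  # predicted but not in the reference
        seg_fn = len(ref - est)  # in the reference but not predicted
        tp += len(ref & est)
        fp += seg_fp
        fn += seg_fn
        subs += min(seg_fp, seg_fn)       # a miss paired with a false alarm
        dels += max(0, seg_fn - seg_fp)   # misses left after pairing
        ins += max(0, seg_fp - seg_fn)    # false alarms left after pairing
        n_ref += len(ref)

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    return {"f1": f1, "error_rate": error_rate}


if __name__ == "__main__":
    # Hypothetical annotations and system output for four one-second segments.
    reference = [{"speech"}, {"speech", "car"}, set(), {"car"}]
    estimated = [{"speech"}, {"speech"}, {"dog"}, {"dog"}]
    print(segment_based_metrics(reference, estimated))
```

Because several events can be active in the same segment, the error rate can exceed 1.0 when a system produces many insertions, which is one reason it is usually reported alongside the F-score.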
Dataset Development: Creating annotated audio datasets and evaluation benchmarks for reproducible research, and releasing them as open datasets.
- Data acquisition: audio collection, annotation, and dataset creation
- Details about the key datasets used in my research
- Open datasets created and maintained by me
- Data-related tool
- DCASE DataList
Real-time Audio Analysis: Implementing real-time systems for on-device and streaming audio recognition.
Academic Contributions
My career in environmental audio analysis has produced 75+ peer-reviewed publications, several international patents, and more than 8K citations. I have also co-authored and released over 20 open-access audio datasets that are widely used in academic research and machine learning competitions.
For more information about my research, please visit the following pages:
Academic Impact: My research centers on neural networks, deep learning, and audio signal processing.
- 60+ publications, with over 8K citations (h-index 35).
- SciVal ranks me 8th worldwide in scholarly output and 20th in field-weighted citation impact for "Neural networks; Deep learning; Audio Signal Processing" (2014–2023).
Open Data and Community Resources:
- Co-authored and published over 20 open-access audio datasets at Tampere University, downloaded more than 200K times and used as benchmarks across machine learning competitions and academic studies.
- Maintaining DCASE DataList
Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for SED and sound event localization and detection (SELD), and helped establish evaluation setups for ASC:
- Metrics for polyphonic sound event detection, journal publication
- Joint measurement of localization and detection of sound events, conference publication
- sed_eval (83K pip downloads, 143 GitHub stars)
- dcase_util (128K pip downloads, 130 GitHub stars)
- sed_vis (120 GitHub stars)
Service to the Research Community:
- Peer reviewer for leading journals and conferences, including IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES.
- Organizational roles within the DCASE Challenge (task coordinator, webmaster) and the DCASE Workshop (proceedings editor, technical chair, publication chair).
Academic Illustrations in DCASE Research:
- Numerous illustrations for educational and scientific purposes, visually explaining complex concepts such as sound event detection, acoustic scene classification, and audio tagging.
- Illustrations are provided under a Creative Commons license to enhance accessibility and understanding in the field, benefiting researchers, educators, and students.
Academic biography: See a concise overview of my career in academia, highlighting relevant qualifications, experience, and accomplishments.
Community and Open Science
I have been an active contributor to the DCASE (Detection and Classification of Acoustic Scenes and Events) community since it started, helping shape it into the globally recognized platform it is today. My roles have included:
- Task coordinator for Acoustic Scene Classification and Sound Event Detection challenge tasks:
- Designer of evaluation protocols
- Public datasets for challenge tasks
- Developer of baseline systems
- Open-source tools
- Webmaster for the DCASE community website