Research Overview


Toni Heittola

My research focuses on enabling machines to understand the sounds around them, just as humans do. For over two decades, I have explored how computers can interpret complex acoustic environments, specializing in teaching machines to listen to and make sense of real-world soundscapes. This journey has led to over 75 peer-reviewed publications, reflecting my long-standing commitment to advancing machine listening and auditory scene understanding.

What I Work On

My primary research focus is the automatic analysis of environmental audio content. I have worked for over two decades on both the analysis and synthesis of auditory scenes, with the goal of creating machines that can interpret complex acoustic environments. Whether identifying a bird song in a forest, picking out a siren in urban traffic, or grasping the mood of a bustling cafe, I aim to enable machines to understand and respond to the rich variety of sounds in our surroundings. See more information about applications of my research.

Key Research Areas

My current research spans several key areas:

Sound Event Detection (SED): Designing robust models to detect and timestamp sound events in real-world, often noisy, environments. See a detailed introduction to sound event detection in everyday environments.

Acoustic Scene Classification (ASC): Training computers to identify different environments, such as streets, parks, or offices, based on their distinct acoustic characteristics.

Evaluation Frameworks: Developing standardized metrics, protocols, and datasets for benchmarking audio analysis systems.

Dataset Development: Creating and releasing annotated open audio datasets and evaluation benchmarks for reproducible research.

Real-time Audio Analysis: Implementing real-time systems for on-device and streaming audio recognition.

See details about some related projects.
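To make the sound event detection task above concrete, a detector's final post-processing step typically converts frame-wise activity predictions into timestamped events. The sketch below is purely illustrative and not one of my actual systems; the detection threshold and the 20 ms frame hop are assumed values.

```python
# Convert per-frame activity probabilities for one event class into
# (onset, offset) segments in seconds -- a simplified sketch of the
# post-processing stage of a sound event detection system.

def frames_to_events(probs, threshold=0.5, hop=0.02):
    """probs: per-frame probabilities; hop: frame hop in seconds (assumed 20 ms)."""
    events = []
    onset = None
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and onset is None:
            onset = i * hop                      # event starts at this frame
        elif not active and onset is not None:
            events.append((onset, i * hop))      # event ended before this frame
            onset = None
    if onset is not None:                        # event still active at the end
        events.append((onset, len(probs) * hop))
    return events

# Two active regions in a six-frame clip yield two timestamped events.
events = frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7, 0.6])
```

Real systems add smoothing (e.g., median filtering of the activity sequence and minimum-duration constraints) before emitting events, since raw frame-level decisions are noisy.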

Academic Contributions

I have spent my career building the field of environmental audio analysis from the intersection of machine learning, audio signal processing, and computational auditory scene analysis. My work has resulted in over 75 peer-reviewed publications, several international patents, and more than 8K citations. I have also co-authored and released over 20 open-access audio datasets, which have been widely utilized in academic research and machine learning competitions.

For more information about my research, please visit the following pages:

Academic Impact: Over the past decade, my research has focused on neural networks, deep learning, and audio signal processing.

  • I have authored more than 60 publications, which have collectively received over 8K citations with an h-index of 35.
  • According to SciVal, I rank 8th globally in scholarly output and 20th in field-weighted citation impact within the topic area "Neural networks; Deep learning; Audio Signal Processing" (2014-2023).

Open Data and Community Resources:

Evaluation Standards and Tools: I have contributed to defining standard evaluation metrics for SED and sound event localization and detection (SELD), and helped establish evaluation setups for ASC.
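For context, segment-based SED evaluation compares reference annotations and system output per fixed-length time segment and event class, then aggregates true positives, false positives, and false negatives into an F-score. The toy sketch below illustrates the idea only; it is not the sed_eval implementation, and the segment indices and labels are made up.

```python
# Toy segment-based evaluation for sound event detection: activity is
# compared per (segment, class) cell, and precision/recall/F-score are
# computed from the resulting TP/FP/FN counts.

def segment_f1(reference, estimated):
    """reference/estimated: sets of (segment_index, event_label) pairs."""
    tp = len(reference & estimated)   # active in both reference and output
    fp = len(estimated - reference)   # system-only activity
    fn = len(reference - estimated)   # missed activity
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

ref = {(0, "speech"), (1, "speech"), (1, "car"), (2, "car")}
est = {(0, "speech"), (1, "car"), (2, "car"), (2, "speech")}
# tp=3, fp=1, fn=1 -> precision = recall = 0.75 -> F1 = 0.75
score = segment_f1(ref, est)
```

Segment-based scoring deliberately ignores exact onset/offset placement within a segment, which makes it more forgiving than event-based metrics that require temporal alignment.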

I also maintain several open-source toolboxes that are widely used in the community:
  • sed_eval (83K pip downloads, 143 GitHub stars)
  • dcase_util (128K pip downloads, 130 GitHub stars)
  • sed_vis (120 GitHub stars)

Service to the Research Community:

  • I actively contribute as a peer reviewer for leading journals and conferences (e.g., IEEE TASLP, IEEE JSTSP, ICASSP, WASPAA, EUSIPCO, DCASE, ISMIR, and AES).
  • I have taken on several organizational roles within the DCASE Challenge, such as task coordinator and webmaster.
  • I have also contributed to the DCASE Workshop in various capacities, including proceedings editor, technical chair, and publication chair.

Academic Illustrations in DCASE Research:

  • Numerous illustrations for educational and scientific purposes, visually explaining complex concepts such as sound event detection, acoustic scene classification, and audio tagging.
  • Illustrations are provided under a Creative Commons license to improve accessibility and understanding in the field, benefiting researchers, educators, and students.

Academic biography: See a concise overview of my career in academia, highlighting relevant qualifications, experience, and accomplishments.


Community and Open Science

I’m an active contributor to the DCASE (Detection and Classification of Acoustic Scenes and Events) community. Since its inception, I’ve played a key role in shaping the initiative into a globally recognized platform. Over the years, my roles have included:

  • Task coordinator for Acoustic Scene Classification and Sound Event Detection challenge tasks, responsible for:
    • Designing evaluation protocols
    • Curating public datasets for challenge tasks
    • Developing baseline systems
    • Releasing open-source tools
  • Webmaster for the DCASE community website