Guiding Research Journeys


Supervision in Audio, Machine Learning, and Sound Analysis

I am actively involved in guiding students at various stages of their academic journey, from Bachelor's theses to doctoral research. My supervision and advisory involvement focuses on topics related to general audio signal processing, machine learning, and computational sound scene analysis.

I enjoy working closely with students, sharing insights, and learning together as we tackle real-world challenges through research. Supervision is more than just guiding a project; it is about collaboration, discovery, and helping students develop into confident and independent researchers. Feel free to browse through the projects I have supervised. If something sparks your interest, don't hesitate to reach out. I am always happy to discuss potential topics and explore how we might work together.

Research Projects on Timeline

Over the course of my academic career, I have been actively involved in a wide range of research projects that have contributed to the completion of academic theses and degrees at various levels. These projects have spanned diverse topics within the fields of machine learning and audio signal processing, often forming the foundation for Master's and doctoral research.

The timeline below provides an overview of selected projects, highlighting the evolution of research themes and the wide range of topics explored over the years.

```mermaid
---
displayMode: normal # compact
config:
  theme: forest
  gantt:
    topPadding: 50
    leftPadding: 8
    rightPadding: 8
    topAxis: false
    numberSectionStyles: 2
    barHeight: 22
    barGap: 6
    fontSize: 12
    sectionFontSize: 16
    # gridLineStartPadding: 0
---
gantt
    todayMarker off
    dateFormat YYYY-MM-DD
    axisFormat %Y

    section Advisory Involvement in Doctoral Projects
    Zero-Shot Learning : active, 2022-06-01, 2025-09-01
    Representation Learning : active, 2021-01-01, 2024-06-01
    Active Learning : active, 2017-01-01, 2020-10-01
    Deep Learning : active, 2015-01-01, 2019-01-31

    section Supervised Research Projects
    Speaker Distortions : done, 2024-06-01, 2025-05-31
    Captioning : done, 2022-08-01, 2023-09-30
    Real-time Sound Event Detection : done, 2020-01-01, 2020-11-30
    Traffic Monitoring : done, 2018-05-01, 2019-08-31
    Real-time Audio Analysis : done, 2014-02-01, 2014-09-30
    Semi-Supervised Learning : done, 2012-06-01, 2013-10-31
    Guitar Transcription : done, 2011-05-01, 2011-12-31
    Speaker Modeling : done, 2008-04-01, 2010-03-31
```

Project Advisory Involvement

In addition to supervising individual theses, I have served in advisory roles for several doctoral research projects. These collaborations have focused on advancing the state of the art in audio classification, sound event detection, active learning, representation learning, and zero-shot learning.

As a doctoral advisor, my role has included guiding research direction, contributing to methodological development, and supporting students in publishing high-quality scientific work. These projects reflect long-term, in-depth engagements that align closely with my research interests and contribute to the broader academic community.

Doctoral-level projects where I have served in an advisory role:

  • Duygu Dogan – Zero-Shot Audio Classification (2022–2025)
  • Shanshan Wang – Audio-Video Feature Representation Learning (2021–2024)
  • Zhao Shuyang – Active Learning for Sound Event Detection (2017–2020)
  • Emre Cakir – Deep Neural Networks for Sound Event Detection (2015–2019)

Zero-Shot Audio Classification

Duygu Dogan
Doctoral project advisor
2022 — 2025

In recent years, the field of Zero-Shot Audio Classification has emerged as a promising approach for enabling machines to recognize sounds they have never encountered before, without the need for labeled training data. I have had the pleasure of collaborating with Duygu Dogan on a research project aimed at advancing the state of the art in this area. The project focuses on developing models that can generalize to novel sound classes by leveraging external semantic information, such as textual or visual embeddings, rather than relying solely on annotated audio datasets.

  • Use of image-based semantic embeddings to bridge the gap between audio and visual modalities, enabling zero-shot classification through cross-modal knowledge transfer.
  • Introduction of a temporal attention mechanism to enhance the model’s ability to detect and differentiate overlapping sounds in multi-label settings, an essential capability for real-world acoustic environments.

Multi-Label Zero-Shot Audio Classification with Temporal Attention

Abstract

Zero-shot learning models are capable of classifying new classes by transferring knowledge from the seen classes using auxiliary information. While most of the existing zero-shot learning methods focused on single-label classification tasks, the present study introduces a method to perform multi-label zero-shot audio classification. To address the challenge of classifying multi-label sounds while generalizing to unseen classes, we adapt temporal attention. The temporal attention mechanism assigns importance weights to different audio segments based on their acoustic and semantic compatibility, thus enabling the model to capture the varying dominance of different sound classes within an audio sample by focusing on the segments most relevant for each class. This leads to more accurate multi-label zero-shot classification than methods employing temporally aggregated acoustic features without weighting, which treat all audio segments equally. We evaluate our approach on a subset of AudioSet against a zero-shot model using uniformly aggregated acoustic features, a zero-rule baseline, and the proposed method in the supervised scenario. Our results show that temporal attention enhances the zero-shot audio classification performance in multi-label scenario.

Keywords

Adaptation models;Attention mechanisms;Accuracy;Zero-shot learning;Event detection;Conferences;Semantics;Focusing;Acoustics;multi-label zero-shot learning;audio classification;audio tagging;temporal attention

Cites: 2 (see at Google Scholar)

PDF
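The weighting idea behind this paper can be sketched in a few lines of Python. This is only an illustrative toy, not the published model: it assumes segment-level acoustic embeddings and class-level semantic embeddings that already live in a shared space, and uses a plain dot product as the compatibility function.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_scores(segment_emb, class_emb):
    """Class-wise scores from temporally attended segment embeddings.

    segment_emb : (T, D) acoustic embeddings for T audio segments,
                  assumed to be projected into the same space as class_emb.
    class_emb   : (C, D) semantic embeddings of the (possibly unseen) classes.

    Returns (C,) compatibility scores, one per class.
    """
    # Acoustic-semantic compatibility of every segment with every class: (T, C)
    compat = segment_emb @ class_emb.T
    # Attention over time, computed separately for each class: (T, C)
    attn = softmax(compat, axis=0)
    # Class score = attention-weighted average of the segment compatibilities
    return (attn * compat).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    segments = rng.normal(size=(20, 128))   # e.g. 20 short audio segments
    classes = rng.normal(size=(5, 128))     # 5 unseen classes described semantically
    scores = temporal_attention_scores(segments, classes)
    # Multi-label decision: every class above a threshold is considered active
    print(scores > 0.0)
```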

Zero-Shot Audio Classification using Image Embeddings

Abstract

Supervised learning methods can solve the given problem in the presence of a large set of labeled data. However, the acquisition of a dataset covering all the target classes typically requires manual labeling which is expensive and time-consuming. Zero-shot learning models are capable of classifying the unseen concepts by utilizing their semantic information. The present study introduces image embeddings as side information on zero-shot audio classification by using a nonlinear acoustic-semantic projection. We extract the semantic image representations from the Open Images dataset and evaluate the performance of the models on an audio subset of AudioSet using semantic information in different domains; image, audio, and textual. We demonstrate that the image embeddings can be used as semantic information to perform zero-shot audio classification. The experimental results show that the image and textual embeddings display similar performance both individually and together. We additionally calculate the semantic acoustic embeddings from the test samples to provide an upper limit to the performance. The results show that the classification performance is highly sensitive to the semantic relation between test and training classes and textual and image embeddings can reach up to the semantic acoustic embeddings when the seen and unseen classes are semantically similar.

Keywords

zero-shot learning, audio classification, semantic embeddings, image embeddings

Cites: 8 (see at Google Scholar)

PDF
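To make the acoustic-semantic projection idea concrete, here is a minimal, hypothetical sketch in Python. It is not the system from the paper: a small MLP regressor stands in for the nonlinear projection, and random vectors stand in for the image/text class embeddings and the audio features.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-ins: in the paper's setting the semantic vectors would come
# from image or word embeddings of the class labels; here they are placeholders.
n_seen, n_unseen, feat_dim, sem_dim = 6, 3, 64, 32
seen_class_emb = rng.normal(size=(n_seen, sem_dim))
unseen_class_emb = rng.normal(size=(n_unseen, sem_dim))

# Training data: acoustic features of seen-class clips, paired with the
# semantic embedding of their class (the regression target of the projection).
X_train = rng.normal(size=(600, feat_dim))
y_class = rng.integers(0, n_seen, size=600)
Y_train = seen_class_emb[y_class]

# Nonlinear acoustic-to-semantic projection (a small MLP as a stand-in).
proj = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
proj.fit(X_train, Y_train)

def zero_shot_predict(x, class_emb):
    """Project acoustic features and pick the closest class embedding (cosine)."""
    z = proj.predict(x.reshape(1, -1))
    z = z / np.linalg.norm(z)
    c = class_emb / np.linalg.norm(class_emb, axis=1, keepdims=True)
    return int(np.argmax(c @ z.ravel()))

test_clip = rng.normal(size=feat_dim)
print("predicted unseen class index:", zero_shot_predict(test_clip, unseen_class_emb))
```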

Feature Representation Learning Using Audio-Video Data

Shanshan Wang
Doctoral project advisor
2021 — 2024

In collaboration with Shanshan Wang, I have contributed to a research project focused on multimodal representation learning, particularly in the context of audio-visual scene analysis. The project explored how complementary information from audio and video streams can be leveraged to improve the performance of machine learning models in complex urban environments.

  • This work contributes to the growing field of multimodal machine learning, where the integration of audio and visual modalities enables more robust and context-aware systems. The findings have implications for applications such as smart city monitoring, surveillance, and environmental sensing, and have been presented at leading venues including ICASSP and DCASE.
  • A curated dataset of urban acoustic scenes was developed to support audio-visual learning tasks, providing a valuable resource for benchmarking and reproducibility in the field.
  • The collaboration also included a comprehensive analysis of submissions to the DCASE2021 Challenge Task on Audio-Visual Scene Classification, offering insights into effective model architectures and fusion strategies for multimodal learning.
  • The research introduced and evaluated self-supervised learning strategies for audio-video data, with a focus on positive and negative sampling techniques that enhance the quality of learned representations without requiring manual annotations.

Self-supervised Representation Learning on Audio-Video Data

Abstract

Feature representation learning using audio-video data has gained significant attention due to its ability to leverage complementary information from both modalities. Audio and visual signals provide distinct yet correlated perspectives of the same scene, making their joint modeling beneficial for applications such as video understanding, environmental sound analysis, and autonomous perception. By integrating both modalities, models can learn richer and more discriminative feature representations, improving performance in tasks like audio-visual scene classification and cross-modal retrieval. Recently, self-supervised learning (SSL) has emerged as a powerful alternative to supervised approaches, primarily due to its ability to learn meaningful representations without labeled data. SSL exploits intrinsic structures within the data to generate pseudo-labels, enabling models to extract robust and generalizable features from large-scale, unlabeled datasets. This is particularly advantageous in multi-modal learning, where obtaining labeled data is often labor-intensive and costly. This thesis focuses on processing and learning from audio-visual data directly for scene classification, and learning feature representations for other audio classification tasks. The first part of the thesis focuses on a new audio-video dataset of urban scenes that we curated and published, and presents a case study on audio-visual scene analysis. Our findings show that integrating both audio and visual modalities yields significantly better performance than using each modality alone. The work in the second part of the thesis explores self-supervised feature representation learning, focusing on enhancing contrastive learning techniques for multimodal learning. The first direction investigated was spatial alignment between audio and visual modalities, using spatial correspondence as a supervisory signal to improve cross-modal representation learning. Our experiments demonstrate that replacing standard log-mel spectrogram features with the first-order Ambisonics intensity vector significantly improves audio-visual spatial alignment task performance. The second direction investigated was an improved contrastive loss function that strengthens the discriminative power of feature embeddings by introducing an angular margin between positive and negative pairs. Results indicate that applying this loss in both supervised and self-supervised learning settings leads to substantial performance improvements. Finally, the third direction investigated was the sampling techniques in self-supervised learning. We introduced a soft-positive sampling strategy to refine contrastive learning by selecting informative positive pairs while reducing the impact of noisy samples. Experimental results suggest that in self-supervised learning setups with small datasets, features learned through soft-positive sampling outperform those obtained from traditional sampling approaches.

PDF

Positive and Negative Sampling Strategies for Self-Supervised Learning on Audio-Video Data

Abstract

In Self-Supervised Learning (SSL), Audio-Visual Correspondence (AVC) is a popular task to learn deep audio and video features from large unlabeled datasets. The key step in AVC is to randomly sample audio and video clips from the dataset and learn to minimize the feature distance between the positive pairs (corresponding audio-video pair) while maximizing the distance between the negative pairs (non-corresponding audio-video pairs). The learnt features are shown to be effective on various downstream tasks. However, these methods achieve subpar performance when the size of the dataset is rather small. In this paper, we investigate the effect of utilizing class label information in the AVC feature learning task. We modified various positive and negative data sampling techniques of SSL based on class label information to investigate the effect on the feature quality. We propose a new sampling approach which we call soft-positive sampling, where the positive pair for one audio sample is not from the exact corresponding video, but from a video of the same class. Experimental results suggest that when the dataset size is small in SSL setup, features learnt through the soft-positive sampling method significantly outperform those from the traditional SSL sampling approaches. This trend holds in both in-domain and out-of-domain downstream tasks, and even outperforms supervised classification. Finally, experiments show that class label information can easily be obtained using a publicly available classifier network and then can be used to boost the SSL performance without adding extra data annotation burden.

Keywords

Representation learning;Annotations;Conferences;Self-supervised learning;Signal processing;Sampling methods;Market research;self-supervised learning;sampling strategies;soft-positive;audio-video data

Cites: 1 (see at Google Scholar)

PDF
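The sampling idea itself is easy to illustrate. The sketch below is a simplified, hypothetical version: `clips`, `audio`, `video` and `label` are placeholders for pre-extracted clip embeddings and classifier-derived class labels, and the actual pairing and loss used in the paper may differ.

```python
import random
from collections import defaultdict

def sample_pairs(clips, use_soft_positives=True, n_negatives=4, rng=random):
    """Illustrative positive/negative sampling for audio-visual correspondence.

    `clips` is a list of dicts with 'audio', 'video' and 'label' entries.
    Returns (anchor_audio, positive_video, negative_videos) tuples that would
    feed a contrastive loss.
    """
    by_label = defaultdict(list)
    for c in clips:
        by_label[c["label"]].append(c)

    batch = []
    for anchor in clips:
        if use_soft_positives:
            # Soft positive: a video from *some* clip of the same class,
            # not necessarily the exactly corresponding video.
            positive = rng.choice(by_label[anchor["label"]])["video"]
        else:
            # Standard AVC positive: the video of the very same clip.
            positive = anchor["video"]
        # Negatives: videos drawn from clips of other classes.
        others = [c for c in clips if c["label"] != anchor["label"]]
        negatives = [rng.choice(others)["video"] for _ in range(n_negatives)]
        batch.append((anchor["audio"], positive, negatives))
    return batch
```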

Audio-Visual Scene Classification: Analysis of DCASE 2021 Challenge Submissions

Abstract

This paper presents the details of the Audio-Visual Scene Classification task in the DCASE 2021 Challenge (Task 1 Subtask B). The task is concerned with classification using audio and video modalities, using a dataset of synchronized recordings. This task has attracted 43 submissions from 13 different teams around the world. Among all submissions, more than half of the submitted systems have better performance than the baseline. The common techniques among the top systems are the usage of large pretrained models such as ResNet or EfficientNet which are trained for the task-specific problem. Fine-tuning, transfer learning, and data augmentation techniques are also employed to boost the performance. More importantly, multi-modal methods using both audio and video are employed by all the top 5 teams. The best system among all achieved a logloss of 0.195 and accuracy of 93.8%, compared to the baseline system with logloss of 0.662 and accuracy of 77.1%.

Cites: 23 (see at Google Scholar)

PDF

A curated dataset of urban acoustic scenes for audio-visual scene analysis

Abstract

This paper introduces a curated dataset of urban scenes for audio-visual scene analysis which consists of carefully selected and recorded material. The data was recorded in multiple European cities, using the same equipment, in multiple locations for each scene, and is openly available. We also present a case study for audio-visual scene recognition and show that joint modeling of audio and visual modalities brings significant performance gain compared to state of the art uni-modal systems. Our approach obtained an 84.8% accuracy compared to 75.8% for the audio-only and 68.4% for the video-only equivalent systems.

Keywords

Audio-visual data, Scene analysis, Acoustic scene, Pattern recognition, Transfer learning

Cites: 38 (see at Google Scholar)

PDF
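As a minimal illustration of why joint modeling helps, the snippet below sketches a simple late-fusion baseline in Python; it assumes per-clip class probabilities from separately trained audio and video classifiers and is not the system evaluated in the paper.

```python
import numpy as np

def late_fusion(audio_probs, video_probs, w_audio=0.5):
    """Weighted late fusion of per-clip scene probabilities from two modalities.

    audio_probs, video_probs : (n_clips, n_classes) arrays of class probabilities
    produced by separately trained audio and video classifiers (placeholders here).
    """
    fused = w_audio * audio_probs + (1.0 - w_audio) * video_probs
    return fused.argmax(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.dirichlet(np.ones(10), size=8)   # fake audio-branch posteriors
    v = rng.dirichlet(np.ones(10), size=8)   # fake video-branch posteriors
    print(late_fusion(a, v))
```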

Active Learning for Sound Event Classification and Detection

Zhao Shuyang
Doctoral project advisor
2017 — 2020

Between 2017 and 2020, I had the opportunity to serve as an advisor for the doctoral research of Zhao Shuyang, which focused on developing active learning strategies for sound event classification and detection. This collaboration addressed a critical challenge in machine listening: how to efficiently train high-performing models with minimal labeled data, an issue particularly relevant in large-scale real-world audio environments.

  • The research contributed to the growing body of work on data-efficient learning in audio analysis and has influenced subsequent studies in semi-supervised learning, domain adaptation, and interactive machine learning. The proposed methods offer practical value for applications where labeled data is scarce or expensive to obtain, such as urban sound monitoring, wildlife acoustics, and smart city infrastructure.
  • The research introduced clustering-based and committee-based sample selection methods that significantly improved the efficiency of training sound event classifiers by prioritizing the most informative unlabeled samples.
  • A novel active learning framework for polyphonic sound event detection was proposed and validated, demonstrating that model performance could be substantially improved with fewer labeled examples, reducing annotation costs without compromising accuracy.
  • The work also explored heterogeneous data sources for training vocal mode classifiers, highlighting the potential of active learning in scenarios with domain shifts and limited supervision.
  • These methods were evaluated on benchmark datasets and presented at leading conferences, including ICASSP, IWAENC, and WASPAA, culminating in a peer-reviewed journal article in IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Clustering Analysis and Active Learning for Sound Event Detection and Classification

Abstract

The objective of the thesis is to develop techniques that optimize the performances of sound event detection and classification systems at minimal supervision cost. The state-of-the-art sound event detection and classification systems use acoustic models developed using machine learning techniques. The training of acoustic models typically relies on a large amount of labeled audio data. Manually assigning labels to audio data is often the most time-consuming part in a model development process. Unlabeled data is abundant in many practical cases, but the amount of annotations that can be made is limited. Thus, the practical problem is optimizing the accuracies of acoustic models with a limited amount of annotations. In this thesis, we started with the idea of clustering unlabeled audio data. Clustering results can be used to derive propagated labels from a single label assignment; meanwhile, clustering itself does not require labeled data. Based on this idea, an active learning method was proposed and evaluated for sound classification. In the experiments, the proposed active learning method based on k-medoids clustering outperformed reference methods based on random sampling and uncertainty sampling. In order to optimize the sample selection after annotating the k medoids, mismatch-first farthest-traversal was proposed. The active learning performances were further improved according to the experimental results. The active learning method proposed for sound classification was extended to sound event detection. Sound segments were generated based on change point detection within each recording. The sound segments were selected for annotation based on mismatch-first farthest-traversal. During the training of acoustic models, each recording was used as an input of a recurrent convolutional neural network. The training loss was derived from frames corresponding to only annotated segments. In the experiments on a dataset where sound events are rare, the proposed active learning method required annotating only 2% of the training data to achieve similar accuracy, with respect to annotating all the training data. In addition to active learning, we investigated using cluster analysis to group recordings with similar recording conditions. Feature normalization according to cluster statistics was used to bridge the distribution shift due to mismatched recording conditions. The achieved performance clearly outperformed feature normalization based on global statistics and statistics per recording. The proposed active learning methods enable efficient labeling on large-scale audio datasets, potentially saving a large amount of annotation effort in the development of acoustic models. In addition, core ideas behind the proposed methods are generic and they can be extended to other problems such as natural language processing, as is investigated in [8].

Cites: 1 (see at Google Scholar)

PDF

Active Learning for Sound Event Detection

Abstract

This paper proposes an active learning system for sound event detection (SED). It aims at maximizing the accuracy of a learned SED model with limited annotation effort. The proposed system analyzes an initially unlabeled audio dataset, from which it selects sound segments for manual annotation. The candidate segments are generated based on a proposed change point detection approach, and the selection is based on the principle of mismatch-first farthest-traversal. During the training of SED models, recordings are used as training inputs, preserving the long-term context for annotated segments. The proposed system clearly outperforms reference methods in the two datasets used for evaluation (TUT Rare Sound 2017 and TAU Spatial Sound 2019). Training with recordings as context outperforms training with only annotated segments. Mismatch-first farthest-traversal outperforms reference sample selection methods based on random sampling and uncertainty sampling. Remarkably, the required annotation effort can be greatly reduced on the dataset where target sound events are rare: by annotating only 2% of the training data, the achieved SED performance is similar to annotating all the training data.

Cites: 43 (see at Google Scholar)
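The segment selection principle can be sketched compactly. The Python function below is an illustrative approximation, assuming precomputed segment embeddings and predictions from two classifiers; the exact distance measure and bookkeeping in the published system may differ.

```python
import numpy as np

def mismatch_first_farthest_traversal(embeddings, preds_a, preds_b, annotated_idx):
    """Order candidate segments for annotation (illustrative sketch).

    embeddings    : (N, D) segment embeddings
    preds_a/b     : (N,) labels predicted by two different classifiers
    annotated_idx : indices of segments that are already annotated

    Segments where the two classifiers disagree come first; within that group,
    segments farther from the nearest already-annotated segment come earlier.
    """
    annotated = embeddings[list(annotated_idx)]
    # Distance of every segment to its nearest annotated segment
    d = np.linalg.norm(embeddings[:, None, :] - annotated[None, :, :], axis=-1).min(axis=1)
    mismatch = (preds_a != preds_b).astype(int)
    order = sorted(
        (i for i in range(len(embeddings)) if i not in set(annotated_idx)),
        key=lambda i: (-mismatch[i], -d[i]),  # mismatched first, then farthest
    )
    return order
```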

An Active Learning Method Using Clustering and Committee-Based Sample Selection for Sound Event Classification

Abstract

This paper proposes an active learning method to control a labeling process for efficient annotation of acoustic training material, which is used for training sound event classifiers. The proposed method performs K-medoids clustering over an initially unlabeled dataset, and medoids, as local representatives, are presented to an annotator for manual annotation. The annotated label on a medoid propagates to other samples in its cluster for label prediction. After annotating the medoids, the annotation continues to the unexamined sounds with mismatched prediction results from two classifiers, a nearest-neighbor classifier and a model-based classifier, both trained with annotated data. The annotation on the segments with mismatched predictions are ordered by the distance to the nearest annotated sample, farthest first. The evaluation is made on a public environmental sound dataset. The labels obtained through a labeling process controlled by the proposed method are used to train a classifier, using supervised learning. Only 20% of the data needs to be manually annotated with the proposed method, to achieve the accuracy with all the data annotated. In addition, the proposed method clearly outperforms other active learning algorithms proposed for sound event classification through all the experiments, simulating varying fraction of data that is manually labeled.

Keywords

active learning;K-medoids clustering;committee-based sample selection;sound event classification

Cites: 17 (see at Google Scholar)

PDF
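A rough sketch of the clustering-plus-propagation step is given below. Note the simplification: proper k-medoids is approximated here with k-means centroids and their nearest samples, and `annotate` is a stand-in for the human annotator.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def medoid_based_annotation_round(X, k, annotate):
    """One round of cluster-based active learning (illustrative approximation).

    X        : (N, D) features of unlabeled sound segments
    k        : number of clusters / annotation budget for this round
    annotate : callable index -> label, simulating the human annotator
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Sample closest to each centroid acts as a medoid-like representative.
    medoid_idx = pairwise_distances_argmin(km.cluster_centers_, X)
    propagated = np.empty(len(X), dtype=int)
    for cluster, idx in enumerate(medoid_idx):
        label = annotate(idx)                      # manual label for the representative
        propagated[km.labels_ == cluster] = label  # propagate it to the cluster members
    return medoid_idx, propagated
```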

Learning Vocal Mode Classifiers from Heterogeneous Data Sources

Abstract

This paper targets on a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. However, previous studies on sound classification are commonly based on cross-validation using a single dataset, without considering the cases that training and testing data are recorded in mismatched condition. Experiments revealed a big difference between homogeneous recognition scenario and heterogeneous recognition scenario, using a new dataset TUT-vocal-2016. In the homogeneous recognition scenario, the classification accuracy using cross-validation on TUT-vocal-2016 was 95.5%. In heterogeneous recognition scenario, seven existing datasets were used as training material and TUT-vocal-2016 was used for testing, the classification accuracy was only 69.6%. Several feature normalization methods were tested to improve the performance in heterogeneous recognition scenario. The best performance (96.8%) was obtained using the proposed subdataset-wise normalization.

Cites: 1 (see at Google Scholar)

Active Learning for Sound Event Classification by Clustering Unlabeled Data

Abstract

This paper proposes a novel active learning method to save annotation effort when preparing material to train sound event classifiers. K-medoids clustering is performed on unlabeled sound segments, and medoids of clusters are presented to annotators for labeling. The annotated label for a medoid is used to derive predicted labels for other cluster members. The obtained labels are used to build a classifier using supervised training. The accuracy of the resulted classifier is used to evaluate the performance of the proposed method. The evaluation made on a public environmental sound dataset shows that the proposed method outperforms reference methods (random sampling, certainty-based active learning and semi-supervised learning) with all simulated labeling budgets, the number of available labeling responses. Through all the experiments, the proposed method saves 50%-60% labeling budget to achieve the same accuracy, with respect to the best reference method.

Keywords

active learning, sound event classification, K-medoids clustering

Cites: 57 (see at Google Scholar)

Deep Neural Networks for Sound Event Detection

Emre Cakir
Doctoral project advisor
2015 — 2019

I had the opportunity to serve as an advisor for Emre Cakir’s doctoral research, which focused on advancing deep learning methods for sound event detection. The project explored the use of convolutional and recurrent neural networks to model complex acoustic environments, enabling machines to detect and classify overlapping sound events in real-world audio recordings. In this project, Emre worked on developing novel deep learning architectures for machine listening, optimizing training strategies, and evaluating performance on large-scale datasets. The research contributed significantly to the DCASE community and laid the groundwork for several widely cited publications in the field.

  • The CRNN-based approach for sound event detection, introduced in this research, has become a benchmark architecture within the DCASE community and has been widely adopted in both academic and applied contexts. With over 700 citations, the method continues to influence the development of modern machine listening systems. Its impact was further recognized with the IEEE Signal Processing Society Best Paper Award in 2024, underscoring its significance in the field of computational audio analysis.
  • The research demonstrated that multi-label classification frameworks are more effective than combined single-label approaches for modeling overlapping sound events, resulting in significant improvements in detection accuracy.
  • The proposed sound event detection methods were validated on diverse real-world datasets, making this one of the first studies to systematically evaluate performance across a wide range of acoustic scenes and contexts. The approaches achieved state-of-the-art results in polyphonic sound event detection tasks.

Deep Neural Networks for Sound Event Detection

Abstract

The objective of this thesis is to develop novel classification and feature learning techniques for the task of sound event detection (SED) in real-world environments. Throughout their lives, humans experience a consistent learning process on how to assign meanings to sounds. Thanks to this, most of the humans can easily recognize the sound of a thunder, dog bark, door bell, bird singing etc. In this work, we aim to develop systems that can automatically detect the sound events commonly present in our daily lives. Such systems can be utilized in e.g. context-aware devices, acoustic surveillance, bio-acoustical and healthcare monitoring, and smart-home cities. In this thesis, we propose to apply the modern machine learning methods called deep learning for SED. The relationship between the commonly used time-frequency representations for SED (such as mel spectrogram and magnitude spectrogram) and the target sound event labels is highly complex. Deep learning methods such as deep neural networks (DNN) utilize a layered structure of units to extract features from the given sound representation input with increased abstraction at each layer. This increases the network’s capacity to efficiently learn the highly complex relationship between the sound representation and the target sound event labels. We found that the proposed DNN approach performs significantly better than the established classifier techniques for SED such as Gaussian mixture models. In a time-frequency representation of an audio recording, a sound event can often be recognized as a distinct pattern that may exhibit shifts in both dimensions. The intra-class variability of the sound events may cause small shifts in the frequency domain content, and the time domain shift results from the fact that a sound event can occur at any time for a given audio recording. We found that convolutional neural networks (CNN) are useful to learn shift-invariant filters that are essential for robust modeling of sound events. In addition, we show that recurrent neural networks (RNN) are effective in modeling the long-term temporal characteristics of the sound events. Finally, we combine the convolutional and recurrent layers in a single classifier called convolutional recurrent neural networks (CRNN), which emphasizes the benefits of both and provides state-of-the-art results in multiple SED benchmark datasets. Aside from learning the mappings between the time-frequency representations and the sound event labels, we show that deep learning methods can also be utilized to learn a direct mapping between the target labels and a lower level representation such as the magnitude spectrogram or even the raw audio signals. In this thesis, the feature learning capabilities of the deep learning methods and the empirical knowledge on the human auditory perception are proposed to be integrated through the means of layer weight initialization with filterbank coefficients. This results in an optimal, ad-hoc filterbank that is obtained through gradient based optimization of the original coefficients to improve the SED performance.

Cites: 12 (see at Google Scholar)

PDF

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Abstract

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.

Cites: 784 (see at Google Scholar)
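For readers unfamiliar with the architecture, a compact Keras sketch of a CRNN for polyphonic sound event detection is shown below. Layer counts and sizes are placeholders rather than the published configuration; the essential ingredients are convolution and pooling along frequency, a recurrent layer over time, and frame-wise sigmoid outputs so that several events can be active at once.

```python
from tensorflow.keras import layers, models

def build_crnn(n_frames=500, n_mels=40, n_classes=6):
    """CRNN for frame-level, multi-label sound event detection (sketch)."""
    inp = layers.Input(shape=(n_frames, n_mels, 1))         # (time, mel, channel)
    x = inp
    for n_filters in (64, 64, 64):
        x = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(pool_size=(1, 2))(x)         # pool only along frequency
    # Collapse the remaining frequency axis, keep the time axis for the RNN
    x = layers.Reshape((n_frames, -1))(x)
    x = layers.Bidirectional(layers.GRU(64, return_sequences=True))(x)
    # Sigmoid outputs: several events may be active in the same frame
    out = layers.TimeDistributed(layers.Dense(n_classes, activation="sigmoid"))(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = build_crnn()
model.summary()
```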

Domestic Audio Tagging with Convolutional Neural Networks

Abstract

In this paper, the method used in our submission for DCASE2016 challenge task 4 (domestic audio tagging) is described. The use of convolutional neural networks (CNN) to label the audio signals recorded in a domestic (home) environment is investigated. A relative 23.8% improvement over the Gaussian mixture model (GMM) baseline method is observed over the development dataset for the challenge.

Cites: 31 (see at Google Scholar)

PDF

Multi-Label vs. Combined Single-Label Sound Event Detection With Deep Neural Networks

Abstract

In real-life audio scenes, many sound events from different sources are simultaneously active, which makes the automatic sound event detection challenging. In this paper, we compare two different deep learning methods for the detection of environmental sound events: combined single-label classification and multi-label classification. We investigate the accuracy of both methods on the audio with different levels of polyphony. Multi-label classification achieves an overall 62.8% accuracy, whereas combined single-label classification achieves a very close 61.9% accuracy. The latter approach offers more flexibility on real-world applications by gathering the relevant group of sound events in a single classifier with various combinations.

Cites: 65 (see at Google Scholar)

Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks

Abstract

In this paper, the use of multi label neural networks are proposed for detection of temporally overlapping sound events in realistic environments. Real-life sound recordings typically have many overlapping sound events, making it hard to recognize each event with the standard sound event detection methods. Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi label classification in this work. The model is evaluated with recordings from realistic everyday environments and the obtained overall accuracy is 58.9%. The method is compared against a state-of-the-art method using non-negative matrix factorization as a pre-processing stage and hidden Markov models as a classifier. The proposed method improves the accuracy by 19% percentage points overall.

Cites: 390 (see at Google Scholar)

PDF

Supervised Research Projects

Over the years, I have co-supervised a wide range of research projects at both the Bachelor's and Master's levels. These projects have explored diverse topics in audio signal processing, machine learning, and audio content analysis, often contributing to real-world applications and advancing academic knowledge.

Below are some of the most recent Master's thesis projects I have supervised:

Projects

A complete list of supervised projects and related publications is shown below.

Projects: 19 (Master's theses: 8, projects: 11)

2025

Compensation of Loudspeaker Nonlinearities with Deep Neural Networks

Abstract

Loudspeakers generate sound waves from electrical audio signals. They are inherently nonlinear as their performance varies when using small and high amplitude signals. Small loudspeakers or micro speakers, found in consumer electronics, are particularly sensitive to sound degrading nonlinearities or distortion at high playback volumes. This thesis proposes a compensation method based on deep neural networks (DNNs). A DNN based compensation model is trained to pre-compensate the loudspeaker input to reduce distortion. The compensation model learns to modify the input signal of a DNN based nonlinear loudspeaker model such that the nonlinear model output minimizes error to the output of a linear regression loudspeaker model. Both loudspeaker models are trained on monaural data recorded from a laptop micro speaker. The compensation model successfully reduces nonlinearities in simulation. Practical experiments, where compensated audio is played and recorded from the laptop speaker, show that the amount of nonlinearities is decreased. Informal listening of the recordings suggests that the compensation slightly alters some elements of the loudspeaker output sound.

PDF
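The training setup described in the abstract can be sketched as follows. Everything here is a toy stand-in: small dense networks play the roles of the pretrained nonlinear and linear loudspeaker models, random frames play the role of the recorded audio, and only the compensation model is trainable.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

frame_len = 256  # hypothetical frame length of the monaural input

def small_mlp(name):
    return models.Sequential([
        layers.Input(shape=(frame_len,)),
        layers.Dense(512, activation="tanh"),
        layers.Dense(frame_len),
    ], name=name)

# Stand-ins for the pretrained models described in the abstract: a nonlinear
# DNN loudspeaker model and a linear regression loudspeaker model. In practice
# both would already be trained on recordings of the micro speaker.
nonlinear_speaker = small_mlp("nonlinear_speaker_model")
linear_speaker = models.Sequential(
    [layers.Input(shape=(frame_len,)), layers.Dense(frame_len, use_bias=False)],
    name="linear_speaker_model")
nonlinear_speaker.trainable = False
linear_speaker.trainable = False

# Compensation model: learns to pre-distort the input so that the nonlinear
# speaker model reproduces the output of the (distortion-free) linear model.
compensator = small_mlp("compensation_model")

x_in = layers.Input(shape=(frame_len,))
y_nonlinear = nonlinear_speaker(compensator(x_in))
trainer = models.Model(x_in, y_nonlinear)
trainer.compile(optimizer="adam", loss="mse")

# Target = what the idealized linear loudspeaker would output for the same input.
x = tf.random.normal((1024, frame_len))
trainer.fit(x, linear_speaker(x), epochs=2, batch_size=64, verbose=0)
```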

2023

Enhancing Domain-Specific Automated Audio Captioning: a Study on Adaptation Techniques and Transfer Learning

Abstract

Automated audio captioning is a challenging cross-modal task that takes an audio sample as input to analyze it and generate its caption in natural language as output. The existing datasets for audio captioning such as AudioCaps and Clotho encompass a diverse range of domains, with current proposed systems primarily focusing on generic audio captioning. This thesis delves into the adaptation of generic audio captioning systems to domain-specific contexts, simultaneously aiming to enhance generic audio captioning performance. The adaptation of the generic models to specific domains has been explored using two different techniques: complete fine-tuning of neural model layers and layer-wise fine-tuning within transformers. The process involves initial training with a generic captioning setup, followed by adaptation using domain-specific training data. In generic captioning, the process for training starts with training the model on the AudioCaps dataset followed by fine-tuning it using the Clotho dataset. This is accomplished through the utilization of a transformer-based architecture, which integrates a patchout fast spectrogram transformer (PaSST) for audio embeddings and a BART transformer. Word embeddings are generated using a byte-pair encoding (BPE) tokenizer tailored to the training datasets’ unique words, aligning the vocabulary with the generic captioning task. Experimental adaptation mainly focuses on audio clips related to animals and vehicles. The results demonstrate notable improvements in the performance of the generic and domain adaptation systems. Generic captioning has demonstrated an improvement in SPIDEr scores, increasing from 0.291 during fine-tuning to 0.301 with layer-wise fine-tuning. Specifically, we observed a notable increase in SPIDEr scores, from 0.315 to 0.323 for animal-related audio clips and from 0.298 to 0.308 for vehicle-related audio clips.

PDF
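Layer-wise fine-tuning itself is a small mechanism. The sketch below shows the idea on a generic toy Keras model (not PaSST or BART): freeze everything, then unfreeze only the topmost blocks before fine-tuning on the domain-specific data.

```python
from tensorflow.keras import layers, models

def build_toy_captioner(n_blocks=6, d_model=128):
    """Generic stand-in for a pretrained captioning encoder (not PaSST/BART)."""
    inp = layers.Input(shape=(100, d_model))
    x = inp
    for i in range(n_blocks):
        x = layers.Dense(d_model, activation="relu", name=f"block_{i}")(x)
    out = layers.Dense(d_model, name="output_head")(x)
    return models.Model(inp, out)

def set_layerwise_finetuning(model, n_trainable_blocks):
    """Freeze everything, then unfreeze only the last `n_trainable_blocks` blocks.

    Layer-wise fine-tuning adapts the most task-specific (topmost) layers on the
    domain-specific data while keeping the generic lower layers fixed.
    """
    for layer in model.layers:
        layer.trainable = False
    block_layers = [lyr for lyr in model.layers if lyr.name.startswith("block_")]
    for layer in block_layers[-n_trainable_blocks:]:
        layer.trainable = True
    model.get_layer("output_head").trainable = True
    return model

model = build_toy_captioner()
model = set_layerwise_finetuning(model, n_trainable_blocks=2)
# model.compile(...) and model.fit(domain_specific_data, ...) would follow here.
```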

2020

Real-Time Sound Event Detection With Python

Abstract

Python is a popular programming language for rapid research prototyping in various research fields, owing to the massive repository of well-maintained 3rd party packages, built-in capabilities of the language and strong community. This work investigates the feasibility of Python for the task of performing sound event detection (SED) in real-time, which is important in demonstrating project research results to any interested parties or utilising it for practical purposes such as acoustic health care monitoring, e.g. in attempts to reduce the transmission of the COVID-19 disease. The relevant background theory for detecting sound events based on pre-determined sound recordings is first provided, which is followed by an introduction to the basic concepts that enable performing the same in real-time. Then, Python real-time system designs based on two related approaches are proposed and their feasibility is also evaluated with the help of corresponding reference system implementations. The results acquired with the implementations strongly suggest that Python is indeed very feasible for performing real-time SED, even when using a sophisticated model that possesses 3.7M total parameters.

PDF
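A minimal real-time loop of the kind discussed in the thesis might look like the following, using the `sounddevice` package; the energy-threshold `detect_events` function is only a placeholder for a trained SED model.

```python
import queue

import numpy as np
import sounddevice as sd

SR = 16000
BLOCK = 1024          # samples per audio callback
audio_q = queue.Queue()

def audio_callback(indata, frames, time_info, status):
    """Runs in the audio thread: only copy the block and hand it over."""
    audio_q.put(indata[:, 0].copy())

def detect_events(frame):
    """Placeholder for a trained SED model; returns True on high energy."""
    return float(np.sqrt(np.mean(frame ** 2))) > 0.05

with sd.InputStream(samplerate=SR, channels=1, blocksize=BLOCK,
                    callback=audio_callback):
    print("Listening... Ctrl-C to stop")
    try:
        while True:
            block = audio_q.get()          # blocks until a new frame arrives
            if detect_events(block):
                print("event detected")
    except KeyboardInterrupt:
        pass
```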

Sound Based Classification of Studded Tires: Automatic Tire Classification System

Abstract

The use of studded tires causes rutting of asphalt pavements and generates street dust to the environment. The maintenance of paved roads and cleaning of street dust requires resources and causes health risks. These effects are notable especially in spring time when the snow and ice has melted away from road surfaces. In order to predict these phenomena, the number of vehicles using studded tires should be measured continuously. Previously the estimations about the proportions of winter and summer tires have been created based on figures provided by car service companies that offer tire changing services. Occasional hearing based roadside sample surveys have also been made. Unlike the statistics from car service companies, hearing based data collection methods provide location and time specific information about the use of studded tires. Hearing based data collection is a difficult and labour-consuming task and it has not been applied widely. The purpose of this thesis was to find out if an automatic tire classification system could be implemented to collect data about the use of studded tires. A dataset of in-road audio recordings was exploited in the study. The dataset was collected from two measurement sites by using contact microphones under the road pavement. The measuring points were placed next to automatic traffic measurement stations that are used by Finnish Transport Infrastructure Agency in data collection purposes. Digital signal processing and machine learning was applied in the designing of the tire classification system. A passenger car detector was implemented to restrict the classification only for tires of passenger cars and to determine the exact bypass times of detected vehicles. Feature extraction from the audio data was done according to modeling of the human auditory system. Two versions of the tire classifier were designed, one based on support vector machine and the other on multilayer perceptron. The dataset was annotated by labelling the recordings with the information about the vehicle class and the tire type used in the vehicle. The recordings of passenger cars were used in the training and testing of the classifier-models. The split of data into a training set and test set was done according to recording locations, meaning that data from one location was named as the training set while the remaining data from the other location was used as test set. This way the generalization of the system could be verified as the classifier-models could not learn the recording location-specific factors of the test set during the training. A comparison of the two classifier models was made according to the results of the experiments that were carried out with the test set. The results of the experiments prove that automatic and instant tire classification is possible with the proposed methods. Both the passenger car detector and the tire classifier performed well in the experiments by scoring about 95% test accuracy. The differences between the results of the classifier models were small. The results imply that the system is able to generalize its knowledge from one recording environment to another without being explicitly trained to do so. However, due to the small amount of measurement sites used in the experiments, it is impossible to make reliable conclusions about general adaptivity of the system without further research. In order to improve the performance and reliability of the system, more data from new measurement sites should be collected in the follow-up research.

PDF

2019

Environmental sound recognition and prototype game design

Abstract

This project consists of creating a game using environmental sound recognition. The basic idea of the game is an escape room: the player will have to solve a series of enigmas by finding the right sounds to make in order to get out of the room. We will train a machine learning model to recognize the sounds used in the game. The dataset will consist of objects and human-made sounds. The data will be retrieved from existing datasets or created by us in case we lack available resources. The sound recognizer model will be made in Python and the game with Unity.

Clients

Toni Heittola, and Tuomas Virtanen

Synthetic generation of environmental audio learning examples for neural networks

Abstract

Deep neural network methods need to have a wide range of various training examples in order to train a classifier for predicting different classes. Also, big dataset makes the classifier learn about different conditions which results in better generalization. In audio processing, it is often difficult to find a large dataset. Therefore, data needs to be generated synthetically by mixing audio signals from different sources. In this project, we are going to develop a method using Keras data generator class in which environmental audio sounds are generated for binary classification application. While generating the synthetic sounds, some existing variations in acoustic conditions such as signal-to-noise ratio, acoustic conditions in indoor acoustic scenes, general acoustic conditions in outdoor acoustic scenes along with sound shifts in time and pitch should be considered.

Clients

Toni Heittola, and Tuomas Virtanen

2017

Organizing acoustic scene excerpts into 2D map with t-SNE

Abstract

The aim of this project was to develop a Python program that uses t-Distributed Stochastic Neighbor Embedding (t-SNE) to visualize high-dimensional audio scene feature vector data as a 2D map. This can be used to visualize audio scene feature vectors and to see how well the data is separable using the gathered features and the t-SNE method.

Clients

Toni Heittola, and Tuomas Virtanen
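A minimal version of such a map can be produced with scikit-learn and matplotlib, as sketched below; the random placeholder features stand in for the acoustic scene feature vectors used in the project.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder for per-excerpt acoustic feature vectors, e.g. averaged MFCCs.
rng = np.random.default_rng(0)
features = rng.normal(size=(300, 60))
scene_labels = rng.integers(0, 5, size=300)

# Embed the high-dimensional features into two dimensions.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(embedding[:, 0], embedding[:, 1], c=scene_labels, cmap="tab10", s=10)
plt.title("Acoustic scene excerpts embedded with t-SNE")
plt.show()
```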

Acoustic scene classification on Android platform

Abstract

The project implemented a neural network classifier on Android. The classifier used TensorFlow as the backend for managing the classification flow. The classifier was trained to classify auditory scenes from extracted features. The client application was implemented using the Kotlin programming language and requires Android 7.1 to operate.

Clients

Toni Heittola, and Tuomas Virtanen

Animal onomatopoeia game

Abstract

The project is part of the course SGN-81006 Signal Processing Innovation Project, and the topic is given to us by our clients, researcher Toni Heittola and associate professor Tuomas Virtanen, working in the Audio Research Group at Tampere University of Technology, Laboratory of Signal Processing. In the project we gathered a small data set of animal onomatopoeias and trained a simple classifier using that data set. The classifier was then used for controlling a simple game where the player guides animals by imitating the sounds they make.

Clients

Toni Heittola, and Tuomas Virtanen

2014

Real-Time Audio Analysis

Abstract

The application areas of audio analysis have been gaining popularity over the last decades because of their support for numerous industrial products. In conventional approaches, audio analysis algorithms, which are based on pattern recognition, often work in a non-real-time situation. Most non-real-time audio analysis systems are designed for rapid development, readability and maintainability of code, and moreover provide cross-platform functionality, efficient audio data analysis, and low latency. Hence, these requirements allow the same overall development cost and portability while notably improving the performance of a system. Generally, these programs are written using poor programming styles or using programming languages not suitable for real-time applications such as Java. This implies a high difficulty in changing the existing source code according to real-time requirements and hard, tedious work in extending it. This forces research to deal with programming problems instead of speech and audio analysis innovations. In addition to dealing with the prior issues, a real-time audio analysis system also provides a platform to test and research audio analysis algorithms. The purpose of this study is to research APIs that offer a low-latency, high efficiency option for developing a real-time audio analysis system. Basic components of pattern recognition are block framing, windowing, and mel-frequency cepstral coefficients (MFCC). The presented program is implemented in real-time using efficient APIs such as PortAudio and LibXtract. The program uses mel-frequency cepstral coefficients (MFCC) to process the small frame size of an audio signal without loss of audio signal power for improving the performance of the audio analysis system. The audio analysis system can also be used in numerous products; it is not only useful for audio content analysis, audio classification, pattern recognition systems and music information retrieval, but also advantageous from a practical engineering viewpoint for real-time input applications such as automatic sound event detection systems. The system also indicates that it is successfully portable to Linux, Ubuntu, and other major platforms for real-time audio input, which is usually restricted in audio analysis systems based on conventional approaches.

Music Video Analysis Using Signal Processing Tools

Abstract

Visual cuts points in music videos are often aligned with the musical beat, and on the higher level with musical structural change points (e.g. chorus-verse). The idea of this study is to investigate this relation more closely by using automatic video cut point detection and automatic musical structure analysis.

Clients

Toni Heittola, Tuomas Virtanen, Joni Kämäräinen, and Katariina Mahkonen

Real-time sound classification system using Python

Abstract

Python has gained wide popularity in the research community in recent years, and a wide range of pattern recognition related toolboxes is already available for it. The aim of this project was to investigate the possibilities of using Python for acoustic pattern recognition and to develop a system capable of real-time sound classification.

Clients

Toni Heittola, and Tuomas Virtanen

Acoustic context recognition using i-vector

Abstract

The aim of this project was to study i-vector approach for audio context recognition.

Clients

Toni Heittola, and Tuomas Virtanen

2013

Semi-supervised musical instrument recognition

Abstract

The application areas of music information retrieval have been gaining popularity over the last decades. Musical instrument recognition is an example of a specific research topic in the field. In this thesis, semi-supervised learning techniques are explored in the context of musical instrument recognition. The conventional approaches employed for musical instrument recognition rely on annotated data, i.e. example recordings of the target instruments with associated information about the target labels in order to perform training. This implies a highly laborious and tedious work of manually annotating the collected training data. The semi-supervised methods enable incorporating additional unannotated data into training. Such data consists of merely the recordings of the instruments and is therefore significantly easier to acquire. Hence, these methods allow keeping the overall development cost at the same level while notably improving the performance of a system. The implemented musical instrument recognition system utilises the mixture model semi-supervised learning scheme in the form of two EM-based algorithms. Furthermore, upgraded versions, namely, the additional labelled data weighting and class-wise retraining, for the improved performance and convergence criteria in terms of the particular classification scenario are proposed. The evaluation is performed on sets consisting of four and ten instruments and yields the overall average recognition accuracy rates of 95.3 and 68.4%, respectively. These correspond to the absolute gains of 6.1 and 9.7% compared to the initial, purely supervised cases. Additional experiments are conducted in terms of the effects of the proposed modifications, as well as the investigation of the optimal relative labelled dataset size. In general, the obtained performance improvement is quite noteworthy, and future research directions suggest to subsequently investigate the behaviour of the implemented algorithms along with the proposed and further extended approaches.

PDF
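The mixture-model idea behind the thesis can be illustrated with a heavily simplified sketch: one diagonal-covariance Gaussian per class, labeled samples with fixed responsibilities and unlabeled samples with EM-estimated ones. This is not the thesis implementation, just the underlying mechanism.

```python
import numpy as np

def semi_supervised_gaussians(X_lab, y_lab, X_unlab, n_classes, n_iter=20):
    """EM for class-conditional Gaussians using labeled and unlabeled data."""
    X = np.vstack([X_lab, X_unlab])
    resp = np.zeros((len(X), n_classes))
    resp[np.arange(len(X_lab)), y_lab] = 1.0          # fixed for labeled data
    # Initialize class means, variances and priors from the labeled subset
    means = np.array([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
    var = np.array([X_lab[y_lab == c].var(axis=0) + 1e-3 for c in range(n_classes)])
    priors = np.full(n_classes, 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: posterior class responsibilities (updated only for unlabeled data)
        log_lik = -0.5 * (((X[:, None, :] - means) ** 2) / var
                          + np.log(2 * np.pi * var)).sum(axis=2) + np.log(priors)
        post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        resp[len(X_lab):] = post[len(X_lab):]
        # M-step: responsibility-weighted updates of priors, means and variances
        nk = resp.sum(axis=0)
        priors = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        var = (resp.T @ (X ** 2)) / nk[:, None] - means ** 2 + 1e-3
    return means, var, priors
```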

Classification of the Sounds of Footsteps and Person Identification

Abstract

The sound of footsteps contains a wide range of information about the person producing them. Humans are quite often using this information to identify persons in situations without visual contact. For example, they can tell how fast a person is walking, what kind of shoes a person is wearing, how tall a person is, or even the mood of a person. The combination of these features will make the sounds of the footsteps characteristic for certain person. The aim of the project is to study the automatic classification of the sound of footsteps and see how reliably one can do automatic identification of persons based on it.

Clients

Toni Heittola, and Tuomas Virtanen

Organizing a Database of Sound Samples

Abstract

In modern sample-based music production, managing large sample libraries intuitively is a challenging problem. The aim of the project was to study various ways to organize a sample library according to the acoustic properties of its samples.

Clients

Toni Heittola, and Tuomas Virtanen

2012

Automatic Guitar Chord Detection

Abstract

Automatic guitar chord detection is a process that attempts to detect a guitar chord from a piece of audio. Generally, automatic chord detection is considered to be a part of a larger problem termed automatic transcription. Although there has been a lot of research in the field of automatic transcription, having a reliable transcription system is still a distant prospect. Chord detection becomes interesting as chords have comparatively stable structure and they completely describe the occurring harmonies in a piece of music. This thesis presents a novel approach for detecting the correctness of musical chords played by guitar. The approach is based on a pattern matching technique applied to a database of chords and their typical mistakes. Mistakes are the versions of a chord where typical playing errors are made. The transient of a chord is skipped and its spectrum is whitened. A certain region of the whitened spectra is chosen as a feature vector. Cosine distance is computed between the extracted features and the data present in a reference chord database. Finally, the system detects the correctness of a played chord based on a k-Nearest Neighbor (k-NN) classifier. The developed system uses two types of spectral whitening techniques: one is based on Linear Predictive Coding (LPC) and the other is based on Phase Transform-beta (PHAT-beta). The average accuracy shown by the LPC based system is 72% while that of PHAT-beta is 82.5%. The system was also evaluated under different noise conditions.

PDF
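The processing chain (whitening, band selection, cosine-distance k-NN) can be sketched as below. The moving-average whitening is only a rough stand-in for the LPC and PHAT-beta whitening used in the thesis, and the reference database here is random placeholder data.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from sklearn.neighbors import KNeighborsClassifier

def whitened_spectrum_feature(audio, sr=44100, n_fft=8192, band=(60, 2000)):
    """Magnitude spectrum flattened by a smoothed spectral envelope (sketch)."""
    spec = np.abs(np.fft.rfft(audio, n=n_fft))
    # Divide out a moving-average envelope as a crude whitening step.
    envelope = uniform_filter1d(spec, size=101) + 1e-9
    white = spec / envelope
    # Keep only a fixed frequency band as the feature vector.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    feat = white[sel]
    return feat / (np.linalg.norm(feat) + 1e-9)

# Reference database: whitened spectra of correctly played chords and of their
# typical mistakes, labeled 1 (correct) or 0 (mistake). Placeholders below.
rng = np.random.default_rng(0)
ref_feats = [whitened_spectrum_feature(rng.normal(size=44100)) for _ in range(40)]
ref_labels = rng.integers(0, 2, size=40)

knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(np.array(ref_feats), ref_labels)
test_feat = whitened_spectrum_feature(rng.normal(size=44100))
print("chord judged correct:", bool(knn.predict([test_feat])[0]))
```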

Classification of Insects Based on Sound

Abstract

Insect-borne diseases kill a million people and destroy tens of billions of euros worth of crops annually. At the same time, beneficial insects pollinate the majority of crop species, and it has been estimated that approximately one third of all food consumed by humans is directly pollinated by bees alone. If we could inexpensively count and classify insects, we could plan interventions more accurately, thus saving lives in the case of insect vectored disease, and growing more food in the case of insect crop pests. The aim of the project is to classify insects based on the sound they produce while flying.

Clients

Toni Heittola, and Tuomas Virtanen

2010

Parameter Adaptation in Nonlinear Loudspeaker Models

Abstract

A loudspeaker is a device that converts an electric input signal to acoustic output. The most common type of loudspeaker is a moving-coil transducer. The behaviour of a moving-coil transducer can be considered to be linear only when the displacement of the coil-diaphragm assembly is small. When the input signal level rises, nonlinearities start to cause audible distortion. In this thesis we examine a microspeaker, a small loudspeaker used in mobile phones. The electro-mechanical process which converts the electrical signal into sound waves is explained. Based on this, we present a continuous-time, linear model of a loudspeaker mounted in a closed box. The model describes the loudspeaker's small-signal behaviour using only a few parameters. We then consider the main sources of nonlinearities and how to model them. Two major sources of nonlinearities are added to the continuous-time model. Then transformations from continuous-time models to discrete-time models are considered. The nonlinear model is converted to discrete-time while taking into account the properties of the microspeaker. The main purpose of this thesis is to study the performance of an algorithm that finds the parameter values of the nonlinear loudspeaker model. The performance of the algorithm is compared to the performance of an earlier algorithm for the linear loudspeaker model. The parameter values are found and changes in them are tracked using an adaptive signal processing method called system identification. The parameter values are updated using the LMS algorithm. Since the discrete-time mechanical model of the microspeaker is based on a recursive filter, an LMS algorithm for recursive filters is presented. We also review previous research related to parameter identification in linear and nonlinear loudspeaker models. Based on the results from the experiments, the studied algorithm is deemed to be still incomplete. Linear parameters adapt in general quickly whereas the nonlinear parameters adapt too slowly and sometimes erroneously. The difference between the output predicted by the nonlinear loudspeaker model and the actual output of the loudspeaker (prediction error) is too high, meaning the parameters do not adapt to their true values. The model is also prone to instability. The algorithm requires further development regarding adaptation speed and prevention of instability. Other development considering initial parameter values and operation during silent moments should also be conducted in the future.
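For readers unfamiliar with LMS adaptation, the toy example below identifies a simple FIR system with the standard LMS update; the recursive, nonlinear loudspeaker model treated in the thesis is considerably more involved.

```python
import numpy as np

def lms_identify(x, d, n_taps=16, mu=0.01):
    """Plain LMS system identification (FIR case, far simpler than the thesis model).

    x : input signal fed to the unknown system (the loudspeaker)
    d : measured output of the unknown system
    Returns the adapted filter coefficients and the prediction error over time.
    """
    w = np.zeros(n_taps)
    err = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # most recent input samples, newest first
        y = w @ u                           # model's predicted output
        err[n] = d[n] - y                   # prediction error
        w += mu * err[n] * u                # LMS coefficient update
    return w, err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_h = rng.normal(size=16) * 0.3              # "unknown" system
    x = rng.normal(size=20000)
    d = np.convolve(x, true_h, mode="full")[:len(x)]
    w, err = lms_identify(x, d, n_taps=16, mu=0.01)
    print("final error power:", float(np.mean(err[-1000:] ** 2)))
```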