This page provides a curated overview of my research journey, highlighting a diverse portfolio of projects developed over the past two decades.
Introduction
My research covers various fields, including smart cities, healthcare, assistive technologies, and music information retrieval, unified by a central goal: improving how machines perceive and interpret sound in real-world environments. Across these projects, my research places strong emphasis on data collection and curation, the development of robust evaluation methodologies and protocols, practical system deployment, interdisciplinary collaboration, and fostering community engagement within the field of machine listening.
Research Journey
The timeline below outlines the evolution of my research over the past two decades, highlighting key projects and thematic shifts. I started with early work in music information retrieval and have since progressed to international initiatives centered on audio analytics in smart cities and acoustic scene understanding. Each milestone marks a key project or a shift in research focus.
Timeline overview (by research theme):
- Music Information Retrieval: Genre (2002–2003), Instruments (2006–2008)
- Electroacoustics: Loudspeaker Modeling (2004–2006)
- Audio Content Analysis: Groundwork (2009–2013), Assistive (2014–2016), Everyday Environments (2015–2020), Smart Cities (2021–2023), Understanding (2023–2025)
- Acoustic Monitoring: Noise (2013–2015, 2018–2020), Healthcare (2017–2020)
I am also actively involved in guiding students at various stages of their academic journey.
See the page about my advisory involvement.
Acoustic Scene Understanding
This project, funded by a personal grant from the Tampere Institute for Advanced Study, aims to develop a multi-tiered audio analysis framework for acoustic scene understanding. The goal is to enable the automatic extraction of contextual meaning from everyday sounds.
You can find an introduction video to the project below:
Audio Content Analysis in Everyday Environments
Understanding the sounds of everyday environments is central to audio content analysis. In my research, I develop systems that can recognize, categorize, and describe sound events in the busy, real-world environments we all encounter, with applications that aim to improve lives in areas such as smart cities, healthcare, assistive technologies, and environmental monitoring. By combining machine learning, audio signal processing, and large-scale data analysis, I aim to create systems that better understand and respond to the sounds around them, making technology more aware of its environment.
Audio Analytics for Smart Cities
MARVEL (Multimodal Extreme Scale Data Analytics for Smart Cities Environments) was an EU-funded project under Horizon 2020 that brought together 17 partners from 12 different countries. Its main goal was to help smart city authorities make better use of innovative technologies, such as artificial intelligence and high-performance computing. The project combined these technologies into an Edge-Fog-Cloud framework to enhance city management and efficiency.
The core mission of MARVEL was to enable real-time, privacy-aware, and context-sensitive analytics of multimodal audio-visual data, enhancing situational awareness, public safety, urban mobility, and citizen engagement.
I contributed to the project in both project management and research roles, managing Tampere University’s participation as project manager, work package leader, task leader, and standardization manager within the consortium. My research focused on environmental sound classification and tagging, sound event detection, and automatic audio captioning for smart city applications.
In this project, we adapted state-of-the-art academic methods to real-world use cases, including vehicle detection, human action recognition, and hazardous sound detection. By integrating audio analytics into smart city infrastructure, the project demonstrated the value of sound as a critical sensing modality and promoted its broader adoption in urban environments.
Throughout the project, I developed and tested sound recognition systems for classification, tagging, detection, and automatic captioning. I optimized the processing pipelines for real-time analysis, and these systems were deployed as containerized services within the monitoring infrastructure developed during the project. In addition to technical development, I mentored and supervised research assistants and oversaw Master’s thesis work focused on the requirements and implementation of automatic audio captioning within such infrastructure.
- Leadership in project coordination: Served as project manager for Tampere University’s involvement in the MARVEL consortium, contributing both technically and strategically.
- Advanced audio analysis modules: Delivered state-of-the-art audio content analysis components tailored for real-world smart city applications.
- Unified audio analytics component: Developed a versatile component for environmental audio analysis tasks within the MARVEL infrastructure. Application-specific configurations are set at deployment within a Kubernetes environment, with AI models dynamically loaded from a MinIO-based model repository. The component integrates into the MARVEL system via MQTT messaging using RabbitMQ.
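The exact MARVEL interfaces are internal to the project, but the deployment pattern described above can be sketched with off-the-shelf clients. The snippet below is a minimal, hypothetical example, not the actual component: it pulls a model file from a MinIO bucket and publishes an analysis result over MQTT. Endpoint addresses, credentials, bucket and topic names, and the model format are placeholders.

```python
# Minimal sketch of the pattern described above: fetch a model from a
# MinIO-backed repository and publish analysis results over MQTT.
# All endpoints, credentials and names are illustrative placeholders,
# not the actual MARVEL configuration.
import json
import os

from minio import Minio                 # pip install minio
import paho.mqtt.client as mqtt         # pip install paho-mqtt

# Configuration would normally come from the Kubernetes deployment (env vars).
MINIO_ENDPOINT = os.environ.get("MINIO_ENDPOINT", "localhost:9000")
MODEL_BUCKET = os.environ.get("MODEL_BUCKET", "models")
MODEL_OBJECT = os.environ.get("MODEL_OBJECT", "sound-tagger/latest.onnx")
MQTT_BROKER = os.environ.get("MQTT_BROKER", "localhost")
MQTT_TOPIC = os.environ.get("MQTT_TOPIC", "audio/analysis/results")

def fetch_model(local_path="model.onnx"):
    """Download the configured model from the object store."""
    client = Minio(MINIO_ENDPOINT,
                   access_key=os.environ.get("MINIO_ACCESS_KEY", "minioadmin"),
                   secret_key=os.environ.get("MINIO_SECRET_KEY", "minioadmin"),
                   secure=False)
    client.fget_object(MODEL_BUCKET, MODEL_OBJECT, local_path)
    return local_path

def publish_result(result: dict):
    """Send one analysis result to the message bus (e.g. RabbitMQ's MQTT plugin)."""
    client = mqtt.Client()  # paho-mqtt >= 2.0 also expects a CallbackAPIVersion argument
    client.connect(MQTT_BROKER, 1883)
    client.publish(MQTT_TOPIC, json.dumps(result))
    client.disconnect()

if __name__ == "__main__":
    model_path = fetch_model()
    # Inference itself is application specific; here we publish a dummy result.
    publish_result({"model": model_path, "label": "siren", "probability": 0.87})
```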
Siniša Suzić, Irene Martín-Morató, Nikola Simić, Charitha Raghavaraju, Toni Heittola, Vuk Stanojev, and Dragana Bajović. UNS exterior spatial sound events dataset for urban monitoring. In 2024 32nd European Signal Processing Conference (EUSIPCO), 176–180. 2024.
UNS Exterior Spatial Sound Events dataset for urban monitoring
Abstract
This paper presents the UNS-Exterior Spatial Sound Events 2023 (UNS-ESSE2023) dataset, which is targeted for applications related to monitoring urban environments. The dataset comprises spatial recordings collected outdoors in real acoustic environments by playing ambience and target sound samples with eight speakers placed circularly around the microphone array. The target sound events are three anomaly sounds (gunshot, boom (explosion), and shatter), specifically selected as examples of unexpected sound events in the context of monitoring urban spaces. The dataset is evaluated using the sound event detection and localization baseline system from the DCASE2023 challenge. The model was fine-tuned for the dataset to introduce a benchmark for sound event localization and detection in exterior acoustic environments. Comparisons are also made with similar outcomes from the STARS22 dataset, a reference dataset for the SELD task in interior conditions. Results are presented using information about the different levels of signal-to-noise ratio and ambience sound pressure levels, showcasing the complexity of the dataset.
Charitha Raghavaraju. Enhancing domain-specific automated audio captioning: a study on adaptation techniques and transfer learning. Master's thesis, Tampere University, Finland, 2023.
Enhancing Domain-Specific Automated Audio Captioning: a Study on Adaptation Techniques and Transfer Learning
Abstract
Automated audio captioning is a challenging cross-modal task that takes an audio sample as input to analyze it and generate its caption in natural language as output. The existing datasets for audio captioning such as AudioCaps and Clotho encompass a diverse range of domains, with current proposed systems primarily focusing on generic audio captioning. This thesis delves into the adaptation of generic audio captioning systems to domain-specific contexts, simultaneously aiming to enhance generic audio captioning performance. The adaptation of the generic models to specific domains has been explored using two different techniques: complete fine-tuning of neural model layers and layer-wise fine-tuning within transformers. The process involves initial training with a generic captioning setup, followed by adaptation using domain-specific training data. In generic captioning, the process for training starts with training the model on the AudioCaps dataset followed by fine-tuning it using the Clotho dataset. This is accomplished through the utilization of a transformer-based architecture, which integrates a patchout fast spectrogram transformer (PaSST) for audio embeddings and a BART transformer. Word embeddings are generated using a byte-pair encoding (BPE) tokenizer tailored to the training datasets’ unique words, aligning the vocabulary with the generic captioning task. Experimental adaptation mainly focuses on audio clips related to animals and vehicles. The results demonstrate notable improvements in the performance of the generic and domain adaptation systems. Generic captioning has demonstrated an improvement in SPIDEr scores, increasing from 0.291 during fine-tuning to 0.301 with layer-wise fine-tuning. Specifically, we observed a notable increase in SPIDEr scores, from 0.315 to 0.323 for animal-related audio clips and from 0.298 to 0.308 for vehicle-related audio clips.
Alexandros Iosifidis. MARVEL D3.5 - multimodal and privacy-aware audio-visual intelligence – final version. July 2023. URL: https://doi.org/10.5281/zenodo.8147164, doi:10.5281/zenodo.8147164.
MARVEL D3.5 - Multimodal and privacy-aware audio-visual intelligence – final version
Abstract
This document describes methodologies proposed by MARVEL partners during the second reporting period of the project towards the realisation of the Audio, Visual and Multimodal AI Subsystem of the MARVEL architecture. These methodologies complement the methodologies proposed by MARVEL partners during the first reporting period, and include methods for Automated Audio Captioning, Visual Crowd Counting, Visual Anomaly Detection, Audio-Visual Anomaly Detection, Audio-Visual Event Detection, privacy-preserving Audio-Visual Emotion Recognition, as well as methodologies for improving the training of dense regression models for efficient inference on standard and Gigapixel images, and on heavily compressed images. The effectiveness of these methods is compared against recent baselines, towards achieving the AI methodology-related objectives of the MARVEL project.
Irene Martín-Morató, Francesco Paissan, Alberto Ancilotto, Toni Heittola, Annamaria Mesaros, Elisabetta Farella, Alessio Brutti, and Tuomas Virtanen. Low-complexity acoustic scene classification in dcase 2022 challenge. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022). Nancy, France, November 2022. 57 cites
Low-Complexity Acoustic Scene Classification in DCASE 2022 Challenge
Abstract
This paper presents an analysis of the Low-Complexity Acoustic Scene Classification task in DCASE 2022 Challenge. The task was a continuation from the previous years, but the low-complexity requirements were changed to the following: the maximum number of allowed parameters, including the zero-valued ones, was 128 K, with parameters being represented using INT8 numerical format; and the maximum number of multiply-accumulate operations at inference time was 30 million. Despite using the same dataset as the previous year, the audio samples were shortened to 1 second instead of 10 seconds for this year's challenge. The provided baseline system is a convolutional neural network which employs post-training quantization of parameters, resulting in 46.5 K parameters, and 29.23 million multiply-and-accumulate operations (MMACs). Its performance on the evaluation data is 44.2% accuracy and 1.532 log-loss. In comparison, the top system in the challenge obtained an accuracy of 59.6% and a log loss of 1.091, having 121 K parameters and 28 MMACs. The task received 48 submissions from 19 different teams, most of which outperformed the baseline system.
Cites: 57 (see at Google Scholar)
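As an illustration of the complexity limits described in the abstract above, the following sketch counts the parameters of a toy PyTorch CNN and checks them against the 128 K / INT8 budget. The network is not the challenge baseline, and the MAC count would need a separate profiler.

```python
# Illustrative check of the DCASE 2022 Task 1 complexity limits described
# above (<= 128 K parameters stored as INT8, <= 30 million MACs per inference).
# The network below is a toy CNN, not the actual challenge baseline.
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),                          # 10 acoustic scene classes
)

n_params = sum(p.numel() for p in model.parameters())
memory_kib = n_params / 1024                    # 1 byte per parameter in INT8
print(f"{n_params} parameters -> {memory_kib:.1f} KiB at INT8")
assert n_params <= 128_000, "over the 128 K parameter limit"
# The MAC budget (30 MMAC) would be measured with a profiler such as the
# thop or ptflops packages; that step is omitted here.
```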
Christos Dimou. MARVEL - D5.1: marvel minimum viable product. January 2022. URL: https://doi.org/10.5281/zenodo.7543692, doi:10.5281/zenodo.7543692.
MARVEL - D5.1: MARVEL Minimum Viable Product
Abstract
The purpose of this deliverable is to describe all activities related to the design, implementation, and release of the MARVEL Minimum Viable Product (MVP). This document sets the scope and goals of the MVP, as well as discusses the decisions to achieve these goals, by means of definition of specific use case scenarios, selection of MARVEL components, and infrastructure that facilitates the operation and demonstration of the use cases. More specifically, the document details the adjustments of the MARVEL architecture, the specific role of each component within the MVP, integration and deployment processes and tasks, and an extensive demonstration of the selected use cases from the user point of view. Finally, the MVP results are linked to MARVEL’s objectives, as this release paves the way towards the first complete prototype of the MARVEL framework.
Toni Heittola and Tuomas Virtanen. MARVEL - D5.2: technical evaluation and progress against benchmarks – initial version. February 2022. URL: https://doi.org/10.5281/zenodo.7543693, doi:10.5281/zenodo.7543693.
MARVEL - D5.2: Technical evaluation and progress against benchmarks – initial version
Abstract
The purpose of this deliverable is to describe in detail the technical evaluation and progress against benchmarks. The benchmarking strategy was defined in WP1, and this document describes how the benchmarking is implemented for the components in the Minimum Viable Product (MVP) of the MARVEL project. The role of each component in the MVP together with the description of the development status is discussed in detail before the benchmarking process of components is described. This process involves defining the measurement metrics and data, as well as the state-of-the-art baselines, and reporting the measurement results together with the observations about the results. In addition to this, the contribution to MARVEL KPIs per component is described and expected future results are discussed. The final version of this document will be delivered by the end of the project, and it will contain the benchmarking of the full MARVEL framework.
Analysis of Everyday Soundscapes
I worked on the EVERYSOUND project at Tampere University under Tuomas Virtanen, funded by the European Research Council (ERC). My research in this project focused on improving the automatic analysis of everyday sounds in our environments. The project aimed to develop robust methods for sound event detection and acoustic scene classification using machine learning techniques and deep neural networks. We also placed strong emphasis on creating open datasets and standardized evaluation protocols, along with conducting listening experiments to support reproducible research.
A significant outcome of the project was the establishment and growth of the DCASE (Detection and Classification of Acoustic Scenes and Events) research community. Through its annual international evaluation campaigns and workshops, DCASE has established itself as a global benchmark platform for machine listening research. My efforts helped expand the community to hundreds of participants worldwide and facilitated the creation of benchmark datasets and high-impact publications.
- The project significantly advanced the field of machine listening by delivering foundational tools, open datasets, and methodologies that are now widely adopted across academia and industry.
- Established and co-organized the DCASE research community, which has grown into a global hub for acoustic scene and event analysis. Its influence continues through the widespread use of DCASE datasets and the integration of its research outcomes into smart city, surveillance, and assistive technology applications.
- Developed benchmark datasets used by hundreds of researchers worldwide, including the widely adopted TUT/TUNI Acoustic Scenes datasets.
- Published high-impact research, including 4 journal articles, 15 conference papers, and 2 book chapters, with a total of over 5,000 citations. The work was further recognized with the IEEE Signal Processing Society Best Paper Award in 2024, underscoring its significance in computational audio analysis.
- Pioneered evaluation metrics and protocols for polyphonic sound event detection and acoustic scene classification, and released the first open-source evaluation toolbox, sed_eval.
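The sed_eval toolbox mentioned above is openly available (pip install sed_eval). The sketch below shows segment-based evaluation on toy reference and estimated event lists; the file names and events are invented, and the field names follow the toolbox documentation at the time of writing.

```python
# Minimal sketch of segment-based sound event detection evaluation with the
# sed_eval toolbox. Event lists below are toy examples.
import sed_eval
import dcase_util

reference = dcase_util.containers.MetaDataContainer([
    {'file': 'street.wav', 'event_label': 'car',   'event_onset': 0.0, 'event_offset': 2.5},
    {'file': 'street.wav', 'event_label': 'siren', 'event_onset': 1.0, 'event_offset': 4.0},
])
estimated = dcase_util.containers.MetaDataContainer([
    {'file': 'street.wav', 'event_label': 'car',   'event_onset': 0.2, 'event_offset': 2.4},
])

# Segment-based metrics: activity is compared in fixed-length segments
segment_metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=['car', 'siren'],
    time_resolution=1.0,               # 1-second evaluation segments
)
segment_metrics.evaluate(reference_event_list=reference,
                         estimated_event_list=estimated)
print(segment_metrics.results_overall_metrics())   # F-score, error rate, ...
```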
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Acoustic scene classification in dcase 2019 challenge: closed and open set classification and data mismatch setups. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019), 164–168. New York University, NY, USA, Oct 2019. 90 cites
Acoustic Scene Classification in DCASE 2019 challenge: closed and open set classification and data mismatch setups
Abstract
Acoustic Scene Classification is a regular task in the DCASE Challenge, with each edition having it as a task. Throughout the years, modifications to the task have included mostly changing the dataset and increasing its size, but recently also more realistic setups have been introduced. In DCASE 2019 Challenge, the Acoustic Scene Classification task includes three subtasks: Subtask A, a closed-set typical supervised classification where all data is recorded with the same device; Subtask B, a closed-set classification setup with mismatched recording devices between training and evaluation data, and Subtask C, an open-set classification setup in which evaluation data could contain acoustic scenes not encountered in the training. In all subtasks, the provided baseline system was significantly outperformed, with top performance being 85.2% for Subtask A, 75.5% for Subtask B, and 67.4% for Subtask C. This paper presents the outcome of DCASE 2019 Challenge Task 1 in terms of submitted systems performance and analysis.
Keywords
Acoustic Scene Classification, DCASE 2019 Challenge, open set classification
Cites: 90 (see at Google Scholar)
Helen L Bear, Toni Heittola, Annamaria Mesaros, Emmanouil Benetos, and Tuomas Virtanen. City classification from multiple real-world sound scenes. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 11–15. New Paltz, NY, Oct 2019. 15 cites
City classification from multiple real-world sound scenes
Abstract
The majority of sound scene analysis work focuses on one of two clearly defined tasks: acoustic scene classification or sound event detection. Whilst this separation of tasks is useful for problem definition, they inherently ignore some subtleties of the real-world, in particular how humans vary in how they describe a scene. Some will describe the weather and features within it, others will use a holistic descriptor like `park', and others still will use unique identifiers such as cities or names. In this paper, we undertake the task of automatic city classification to ask whether we can recognize a city from a set of sound scenes? In this problem each city has recordings from multiple scenes. We test a series of methods for this novel task and show that a simple convolutional neural network (CNN) can achieve accuracy of 50%. This is less than the acoustic scene classification task baseline in the DCASE 2018 ASC challenge on the same data. A simple adaptation to the class labels of pairing city labels with grouped scenes, accuracy increases to 52%, closer to the simpler scene classification task. Finally we also formulate the problem in a multi-task learning framework and achieve an accuracy of 56%, outperforming the aforementioned approaches.
Keywords
Acoustic scene classification, location identification, city classification, computational sound scene analysis.
Cites: 15 (see at Google Scholar)
Annamaria Mesaros, Sharath Adavanne, Archontis Politis, Toni Heittola, and Tuomas Virtanen. Joint measurement of localization and detection of sound events. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 328–332. New Paltz, NY, Oct 2019. 88 cites
Joint Measurement of Localization and Detection of Sound Events
Abstract
Sound event detection and sound localization or tracking have historically been two separate areas of research. Recent development of sound event detection methods approach also the localization side, but lack a consistent way of measuring the joint performance of the system; instead, they measure the separate abilities for detection and for localization. This paper proposes augmentation of the localization metrics with a condition related to the detection, and conversely, use of location information in calculating the true positives for detection. An extensive evaluation example is provided to illustrate the behavior of such joint metrics. The comparison to the detection only and localization only performance shows that the proposed joint metrics operate in a consistent and logical manner, and characterize adequately both aspects.
Keywords
Sound event detection and localization, performance evaluation
Cites: 88 (see at Google Scholar)
Annamaria Mesaros, Aleksandr Diment, Benjamin Elizalde, Toni Heittola, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. Sound event detection in the DCASE 2017 Challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(6):992–1006, June 2019. doi:10.1109/TASLP.2019.2907016. 172 cites
Sound event detection in the DCASE 2017 Challenge
Abstract
Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data, Task 3 having overlapping events annotated in real-life audio, and Task 4, in which only weakly-labeled data was available for training. In this paper, we present the three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks, and the still widely used mel frequency based representations, with only few approaches standing out as radically different. Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure, to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals for many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.
Keywords
Sound event detection, weak labels, pattern recognition, jackknife estimates, confidence intervals
Cites: 172 (see at Google Scholar)
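As a rough illustration of the jackknife procedure mentioned in the abstract above, the sketch below computes a leave-one-out jackknife standard error and a 95% confidence interval for a mean per-file score. The scores are synthetic, and the exact metric handling in the challenge analysis differs.

```python
# Illustrative leave-one-out jackknife confidence interval for a mean metric,
# in the spirit of the procedure described above. Per-file scores are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
per_file_scores = rng.uniform(0.4, 0.9, size=200)   # e.g. per-file F-scores

n = len(per_file_scores)
# Leave-one-out estimates of the mean metric
loo = np.array([np.delete(per_file_scores, i).mean() for i in range(n)])
jack_mean = loo.mean()
# Jackknife standard error
se = np.sqrt((n - 1) / n * np.sum((loo - jack_mean) ** 2))
# 95% confidence interval using the t-distribution
t = stats.t.ppf(0.975, df=n - 1)
ci = (per_file_scores.mean() - t * se, per_file_scores.mean() + t * se)
print(f"mean={per_file_scores.mean():.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```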
Irene Martin-Morato, Annamaria Mesaros, Toni Heittola, Tuomas Virtanen, Maximo Cobos, and Francesc J. Ferri. Sound event envelope estimation in polyphonic mixtures. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 935–939. May 2019. doi:10.1109/ICASSP.2019.8682858. 24 cites
Sound event envelope estimation in polyphonic mixtures
Abstract
Sound event detection is the task of identifying automatically the presence and temporal boundaries of sound events within an input audio stream. In the last years, deep learning methods have established themselves as the state-of-the-art approach for the task, using binary indicators during training to denote whether an event is active or inactive. However, such binary activity indicators do not fully describe the events, and estimating the envelope of the sounds could provide more precise modeling of their activity. This paper proposes to estimate the amplitude envelopes of target sound event classes in polyphonic mixtures. For training, we use the amplitude envelopes of the target sounds, calculated from mixture signals and, for comparison, from their isolated counterparts. The model is then used to perform envelope estimation and sound event detection. Results show that the envelope estimation allows good modeling of the sounds activity, with detection results comparable to current state-of-the art.
Keywords
Sound event detection, Envelope estimation, Deep Neural Networks
Cites: 24 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), 9–13. November 2018. 518 cites
A multi-device dataset for urban acoustic scene classification
Abstract
This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined for classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities, therefore it has a higher acoustic variability than the previous datasets used for this task, and in addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system consisting of a convolutional neural network and its performance in the subtasks using the recommended cross-validation setup.
Keywords
Acoustic scene classification, DCASE challenge, public datasets, multi-device data
Cites: 518 (see at Google Scholar)
Guangpu Huang, Toni Heittola, and Tuomas Virtanen. Using sequential information in polyphonic sound event detection. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 291–295. September 2018. doi:10.1109/IWAENC.2018.8521367. 9 cites
Using Sequential Information in Polyphonic Sound Event Detection
Abstract
To detect the class, and start and end times of sound events in real world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempted to merge the gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use sequential loss to train neural networks. Our experiments on a corpus of real world recordings show that the N-grams could smooth the spiky output of a state-of-the-art neural network system, and improve both the frame-wise and the event-wise metrics.
Keywords
Polyphonic sound event detection;language modelling;sequential information
Cites: 9 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Acoustic scene classification: an overview of DCASE 2017 challenge entries. In 2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC), 411–415. September 2018. doi:10.1109/IWAENC.2018.8521242. 104 cites
Acoustic Scene Classification: An Overview of DCASE 2017 Challenge Entries
Abstract
We present an overview of the challenge entries for the Acoustic Scene Classification task of DCASE 2017 Challenge. Being the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel-frequency representations. Statistical analysis indicates that the top ranked system performed significantly better than the others, and that combinations of top systems are capable of reaching close to perfect performance on the given data.
Keywords
acoustic scene classification, audio classification, DCASE challenge
Cites: 104 (see at Google Scholar)
A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley. Detection and classification of acoustic scenes and events: outcome of the dcase 2016 challenge. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(2):379–393, Feb 2018. doi:10.1109/TASLP.2017.2778423. 397 cites
Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge
Abstract
Public evaluation campaigns and datasets promote active development in target research areas, allowing direct comparison of algorithms. The second edition of the challenge on detection and classification of acoustic scenes and events (DCASE 2016) has offered such an opportunity for development of the state-of-the-art methods, and succeeded in drawing together a large number of participants from academic and industrial backgrounds. In this paper, we report on the tasks and outcomes of the DCASE 2016 challenge. The challenge comprised four tasks: acoustic scene classification, sound event detection in synthetic audio, sound event detection in real-life audio, and domestic audio tagging. We present each task in detail and analyze the submitted systems in terms of design and performance. We observe the emergence of deep learning as the most popular classification method, replacing the traditional approaches based on Gaussian mixture models and support vector machines. By contrast, feature representations have not changed substantially throughout the years, as mel frequency-based representations predominate in all tasks. The datasets created for and used in DCASE 2016 are publicly available and are a valuable resource for further research.
Keywords
Acoustics;Event detection;Hidden Markov models;Speech;Speech processing;Tagging;Acoustic scene classification;audio datasets;pattern recognition;sound event detection
Cites: 397 (see at Google Scholar)
Toni Heittola, Emre Çakır, and Tuomas Virtanen. The Machine Learning Approach for Analysis of Sound Scenes and Events, pages 13–40. Springer International Publishing, Cham, 2018. 48 cites
The Machine Learning Approach for Analysis of Sound Scenes and Events
Abstract
This chapter explains the basic concepts in computational methods used for analysis of sound scenes and events. Even though the analysis tasks in many applications seem different, the underlying computational methods are typically based on the same principles. We explain the commonalities between analysis tasks such as sound event detection, sound scene classification, or audio tagging. We focus on the machine learning approach, where the sound categories (i.e., classes) to be analyzed are defined in advance. We explain the typical components of an analysis system, including signal pre-processing, feature extraction, and pattern classification. We also present an example system based on multi-label deep neural networks, which has been found to be applicable in many analysis tasks discussed in this book. Finally, we explain the whole processing chain that involves developing computational audio analysis systems.
Cites: 48 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, and Dan Ellis. Datasets and Evaluation, pages 147–179. Springer International Publishing, Cham, 2018. 33 cites
Datasets and Evaluation
Abstract
Developing computational systems requires methods for evaluating their performance to guide development and compare alternate approaches. A reliable evaluation procedure for a classification or recognition system will involve a standard dataset of example input data along with the intended target output, and well-defined metrics to compare the systems' outputs with this ground truth. This chapter examines the important factors in the design and construction of evaluation datasets and goes through the metrics commonly used in system evaluation, comparing their properties. We include a survey of currently available datasets for environmental sound scene and event recognition and conclude with advice for designing evaluation protocols.
Cites: 33 (see at Google Scholar)
Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Emmanuel Vincent, Emmanouil Benetos, and Benjamin Martinez Elizalde. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017). Tampere University of Technology. Laboratory of Signal Processing, 2017. ISBN (Electronic): 978-952-15-4042-4. 10 cites
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017)
Cites: 10 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, Aleksandr Diment, Benjamin Elizalde, Ankit Shah, Emmanuel Vincent, Bhiksha Raj, and Tuomas Virtanen. DCASE 2017 challenge setup: tasks, datasets and baseline system. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017), 85–92. November 2017. 623 cites
DCASE 2017 Challenge Setup: Tasks, Datasets and Baseline System
Abstract
DCASE 2017 Challenge consists of four tasks: acoustic scene classification, detection of rare sound events, sound event detection in real-life audio, and large-scale weakly supervised sound event detection for smart cars. This paper presents the setup of these tasks: task definition, dataset, experimental setup, and baseline system results on the development dataset. The baseline systems for all tasks rely on the same implementation using multilayer perceptron and log mel-energies, but differ in the structure of the output layer and the decision making process, as well as the evaluation of system output using task specific metrics.
Keywords
Sound scene analysis, Acoustic scene classification, Sound event detection, Audio tagging, Rare sound events
Cites: 623 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Assessment of human and machine performance in acoustic scene classification: DCASE 2016 case study. In 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 319–323. IEEE Computer Society, 2017. doi:10.1109/WASPAA.2017.8170047. 32 cites
Assessment of Human and Machine Performance in Acoustic Scene Classification: DCASE 2016 Case Study
Abstract
Human and machine performance in acoustic scene classification is examined through a parallel experiment using TUT Acoustic Scenes 2016 dataset. The machine learning perspective is presented based on the systems submitted for the 2016 challenge on Detection and Classification of Acoustic Scenes and Events. The human performance, assessed through a listening experiment, was found to be significantly lower than machine performance. Test subjects exhibited different behavior throughout the experiment, leading to significant differences in performance between groups of subjects. An expert listener trained for the task obtained similar accuracy to the average of submitted systems, comparable also to previous studies of human abilities in recognizing everyday acoustic scenes.
Cites: 32 (see at Google Scholar)
Emre Cakir, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Convolutional recurrent neural networks for polyphonic sound event detection. Transactions on Audio, Speech and Language Processing: Special issue on Sound Scene and Event Analysis, 25(6):1291–1303, June 2017. doi:10.1109/TASLP.2017.2690575. 784 cites
Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
Abstract
Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNN) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.
Cites: 784 (see at Google Scholar)
Tuomas Virtanen, Annamaria Mesaros, Toni Heittola, Mark D. Plumbley, Peter Foster, Emmanouil Benetos, and Mathieu Lagrange. Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016). Tampere University of Technology. Department of Signal Processing, 2016. ISBN (Electronic): 978-952-15-3807-0. 10 cites
Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016)
Cites: 10 (see at Google Scholar)
Emre Cakir, Toni Heittola, and Tuomas Virtanen. Domestic audio tagging with convolutional neural networks. Technical Report, DCASE2016 Challenge, September 2016. 31 cites
Domestic Audio Tagging with Convolutional Neural Networks
Abstract
In this paper, the method used in our submission for DCASE2016 challenge task 4 (domestic audio tagging) is described. The use of convolutional neural networks (CNN) to label the audio signals recorded in a domestic (home) environment is investigated. A relative 23.8% improvement over the Gaussian mixture model (GMM) baseline method is observed over the development dataset for the challenge.
Cites: 31 (see at Google Scholar)
Sharath Adavanne, Giambattista Parascandolo, Pasi Pertila, Toni Heittola, and Tuomas Virtanen. Sound event detection in multichannel audio using spatial and harmonic features. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), 6–10. September 2016. 149 cites
Sound Event Detection in Multichannel Audio Using Spatial and Harmonic Features
Abstract
In this paper, we propose the use of spatial and harmonic features in combination with long short term memory (LSTM) recurrent neural network (RNN) for automatic sound event detection (SED) task. Real life sound recordings typically have many overlapping sound events, making it hard to recognize with just mono channel audio. Human listeners have been successfully recognizing the mixture of overlapping sound events using pitch cues and exploiting the stereo (multichannel) audio signal available at their ears to spatially localize these events. Traditionally SED systems have only been using mono channel audio, motivated by the human listener we propose to extend them to use multichannel audio. The proposed SED system is compared against the state of the art mono channel method on the development subset of TUT sound events detection 2016 database. The proposed method improves the F-score by 3.75% while reducing the error rate by 6%
Keywords
Sound event detection, multichannel, time difference of arrival, pitch, recurrent neural networks, long short term memory
Cites: 149 (see at Google Scholar)
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. TUT database for acoustic scene classification and sound event detection. In 24th European Signal Processing Conference 2016 (EUSIPCO 2016), 1128–1132. Budapest, Hungary, Aug 2016. doi:10.1109/EUSIPCO.2016.7760424. 801 cites
TUT Database for Acoustic Scene Classification and Sound Event Detection
Abstract
We introduce TUT Acoustic Scenes 2016 database for environmental sound research, consisting of binaural recordings from 15 different acoustic environments. A subset of this database, called TUT Sound Events 2016, contains annotations for individual sound events, specifically created for sound event detection. TUT Sound Events 2016 consists of residential area and home environments, and is manually annotated to mark onset, offset and label of sound events. In this paper we present the recording and annotation procedure, the database content, a recommended cross-validation setup and performance of supervised acoustic scene classification system and event detection baseline system using mel frequency cepstral coefficients and Gaussian mixture models. The database is publicly released to provide support for algorithm development and common ground for comparison of different techniques.
Keywords
audio recording;audio signal processing;Gaussian mixture models;TUT database;acoustic scene classification;binaural recordings;environmental sound research;mel frequency cepstral coefficients;sound event detection;Automobiles;Databases;Europe;Event detection;Mel frequency cepstral coefficient;Signal processing
Cites: 801 (see at Google Scholar)
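For readers unfamiliar with the kind of MFCC and GMM baseline mentioned above, here is a minimal sketch of such a classifier: one Gaussian mixture per acoustic scene, classification by maximum average log-likelihood. It is not the published baseline; file lists, sampling rate, and model sizes are illustrative.

```python
# Minimal sketch of an MFCC + GMM acoustic scene classification baseline.
# File paths and model sizes are placeholders.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=44100, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T   # (frames, n_mfcc)

def train_models(train_files_per_class, n_components=16):
    """Fit one GMM per acoustic scene class on pooled MFCC frames."""
    models = {}
    for scene, files in train_files_per_class.items():
        frames = np.vstack([mfcc_features(f) for f in files])
        models[scene] = GaussianMixture(n_components=n_components,
                                        covariance_type='diag').fit(frames)
    return models

def classify(path, models):
    """Pick the scene whose model gives the highest average log-likelihood."""
    frames = mfcc_features(path)
    return max(models, key=lambda scene: models[scene].score(frames))

# Example usage (paths are illustrative):
# models = train_models({'street': ['street1.wav'], 'home': ['home1.wav']})
# print(classify('test.wav', models))
```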
Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. Metrics for polyphonic sound event detection. Applied Sciences, 6(6):162, 2016. URL: http://www.mdpi.com/2076-3417/6/6/162, doi:10.3390/app6060162. 749 cites
Metrics for Polyphonic Sound Event Detection
Abstract
This paper presents and discusses various metrics proposed for evaluation of polyphonic sound event detection systems used in realistic situations where there are typically multiple sound sources active simultaneously. The system output in this case contains overlapping events, marked as multiple sounds detected as being active at the same time. The polyphonic system output requires a suitable procedure for evaluation against a reference. Metrics from neighboring fields such as speech recognition and speaker diarization can be used, but they need to be partially redefined to deal with the overlapping events. We present a review of the most common metrics in the field and the way they are adapted and interpreted in the polyphonic case. We discuss segment-based and event-based definitions of each metric and explain the consequences of instance-based and class-based averaging using a case study. In parallel, we provide a toolbox containing implementations of presented metrics.
Cites: 749 (see at Google Scholar)
Aleksandr Diment, Emre Cakir, Toni Heittola, and Tuomas Virtanen. Automatic recognition of environmental sound events using all-pole group delay features. In 23rd European Signal Processing Conference 2015 (EUSIPCO 2015). Nice, France, 2015. 17 cites
Automatic recognition of environmental sound events using all-pole group delay features
Abstract
A feature based on the group delay function from all-pole models (APGD) is proposed for environmental sound event recognition. The commonly used spectral features take into account merely the magnitude information, whereas the phase is overlooked due to the complications related to its interpretation. Additional information concealed in the phase is hypothesised to be beneficial for sound event recognition. The APGD is an approach to inferring phase information, which has shown applicability for analysis of speech and music signals and is now studied in environmental audio. The evaluation is performed within a multi-label deep neural network (DNN) framework on a diverse real-life dataset of environmental sounds. It shows performance improvement compared to the baseline log mel-band energy case. In combination with the magnitude-based features, APGD demonstrates further improvement.
Cites: 17 (see at Google Scholar)
Emre Cakir, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Multi-label vs. combined single-label sound event detection with deep neural networks. In 23rd European Signal Processing Conference 2015 (EUSIPCO 2015). Nice, France, 2015. 65 cites
Multi-Label vs. Combined Single-Label Sound Event Detection With Deep Neural Networks
Abstract
In real-life audio scenes, many sound events from different sources are simultaneously active, which makes the automatic sound event detection challenging. In this paper, we compare two different deep learning methods for the detection of environmental sound events: combined single-label classification and multi-label classification. We investigate the accuracy of both methods on the audio with different levels of polyphony. Multi-label classification achieves an overall 62.8% accuracy, whereas combined single-label classification achieves a very close 61.9% accuracy. The latter approach offers more flexibility on real-world applications by gathering the relevant group of sound events in a single classifier with various combinations.
Cites: 65 (see at Google Scholar)
Emre Cakir, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Polyphonic sound event detection using multi label deep neural networks. In The International Joint Conference on Neural Networks 2015 (IJCNN 2015). Cill Airne, Eire, 2015. 390 cites
Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks
Abstract
In this paper, the use of multi label neural networks is proposed for detection of temporally overlapping sound events in realistic environments. Real-life sound recordings typically have many overlapping sound events, making it hard to recognize each event with the standard sound event detection methods. Frame-wise spectral-domain features are used as inputs to train a deep neural network for multi label classification in this work. The model is evaluated with recordings from realistic everyday environments and the obtained overall accuracy is 58.9%. The method is compared against a state-of-the-art method using non-negative matrix factorization as a pre-processing stage and hidden Markov models as a classifier. The proposed method improves the accuracy by 19 percentage points overall.
Cites: 390 (see at Google Scholar)
Real-Time Sound Event Detection for Assistive Technology Applications
The SmartHear project, funded through the TUTL (Business from Research Ideas) program of Tekes (the Finnish Funding Agency for Technology and Innovation), aimed to bridge the gap between cutting-edge audio technology and impactful assistive technologies by exploring real-time sound detection in the homes of individuals with hearing impairments. The project was a joint effort involving Tampere University of Technology, Tampere University, and Aalto University.
The project focused on detecting and classifying critical sound events such as doorbells, alarms, and speech from audio signals captured with a microphone array. Operating within a controlled indoor setting, the system was designed to recognize relevant sounds with high accuracy, forming the foundation for responsive and user-friendly assistive applications.
A functional prototype was developed using a combination of edge computing hardware and custom-trained neural networks optimized for environmental sound classification. The prototype was evaluated through a series of technical tests and user trials, assessing detection accuracy and usability in real-world scenarios. In addition to technical validation, the project gathered qualitative feedback from users with hearing impairments to refine the system’s interface and alert mechanisms. These insights informed the exploration of commercialization pathways, including integration with smart home ecosystems and assistive hearing technologies.
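The prototype itself is not public, but the general shape of such a real-time detection loop can be sketched as follows: capture short audio blocks, compute log-mel features, and score them with a pre-trained classifier. The class list, block length, and the classifier (here a random placeholder) are assumptions for illustration only.

```python
# Sketch of a real-time detection loop of the kind explored in SmartHear.
# The classifier and class list are placeholders; the actual prototype and
# its hardware setup differed.
import numpy as np
import librosa
import sounddevice as sd     # pip install sounddevice

SAMPLE_RATE = 16000
BLOCK_SECONDS = 1.0
CLASSES = ['doorbell', 'alarm', 'speech', 'other']   # illustrative

def log_mel(block, sr=SAMPLE_RATE, n_mels=40):
    mel = librosa.feature.melspectrogram(y=block, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

def classify(features):
    """Placeholder scorer; a trained neural network would go here."""
    scores = np.random.dirichlet(np.ones(len(CLASSES)))
    return dict(zip(CLASSES, scores))

def run(threshold=0.7):
    block_len = int(SAMPLE_RATE * BLOCK_SECONDS)
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1) as stream:
        while True:
            audio, _ = stream.read(block_len)           # (frames, channels)
            scores = classify(log_mel(audio[:, 0]))
            label, score = max(scores.items(), key=lambda kv: kv[1])
            if label != 'other' and score > threshold:
                print(f"alert: {label} ({score:.2f})")  # trigger user alert here

if __name__ == "__main__":
    run()
```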
Laying the Groundwork for Sound Event Detection in Natural Environments
For several years, our research group had the opportunity to explore the frontiers of computational auditory scene analysis through a research collaboration with Nokia in a Tekes (the Finnish Funding Agency for Technology and Innovation) funded industrial research project. What began as a curiosity about how machines could understand everyday sounds evolved into a body of work that helped shape the early foundations of real-world sound event detection.
We were among the first to tackle the challenge of detecting sound events in uncontrolled real-life environments, far from the clean, idealized conditions of laboratory recordings. This led us to develop new methods for context-aware detection, source separation, and overlapping sound event modeling, which became widely cited and helped shape the field in its early days. Along the way, we also explored how to summarize and synthesize environmental audio, creating location-specific audio textures that could represent the unique acoustic signature of a place.
One of the key aspects of this project was leading the creation of several environmental audio datasets. We planned the data collection and annotation processes with a strong emphasis on producing high-quality data as an essential foundation for reproducible research and future innovation in the field. These datasets later became a cornerstone for many of our state-of-the-art studies, extending well beyond the original scope of the project. We also took time to understand how people perceive soundscapes, how they distinguish between similar urban ambiances, and what makes an environment feel familiar or distinct. These human-centered studies helped ground our technical work in real-world experience.
- The collaboration not only yielded a series of impactful publications and patents, but it also played a pivotal role in establishing our research group’s standing within the field. Moreover, it laid a solid foundation for subsequent innovations and long-term research development.
- Pioneering real-world sound event detection: The collaboration was among the first to develop and validate methods for detecting overlapping sound events in uncontrolled, real-world environments. These efforts culminated in highly cited publications (over 1100 citations).
- Context-aware and overlapping event modeling: Through innovations in source separation and context-dependent modeling, we addressed the challenges of overlapping sound events and significantly improved detection accuracy in complex acoustic scenes.
- Audio summarization and synthesis: The research explored novel techniques for audio summarization and location-specific audio texture synthesis, enabling efficient representation and playback of environmental soundscapes.
- Audio data management and dataset creation: Designed and constructed environmental audio datasets, including the planning and execution of data collection and annotation processes.
Method for creating location-specific audio textures
Abstract
An approach is proposed for creating location-specific audio textures for virtual location-exploration services. The presented approach creates audio textures by processing a small amount of audio recorded at a given location, providing a cost-effective way to produce a versatile audio signal that characterizes the location. The resulting texture is non-repetitive and conserves the location-specific characteristics of the audio scene, without the need of collecting large amount of audio from each location. The method consists of two stages: analysis and synthesis. In the analysis stage, the source audio recording is segmented into homogeneous segments. In the synthesis stage, the audio texture is created by randomly drawing segments from the source audio so that the consecutive segments will have timbral similarity near the segment boundaries. Results obtained in listening experiments show that there is no statistically significant difference in the audio quality or location-specificity of audio when the created audio textures are compared to excerpts of the original recordings. Therefore, the proposed audio textures could be utilized in virtual location-exploration services. Examples of source signals and audio textures created from them are available at www.cs.tut.fi/~heittolt/audiotexture.
Cites: 12 (see at Google Scholar)
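A simplified sketch of the two-stage analysis and synthesis idea described in the abstract above: cut the source recording into segments, then concatenate randomly drawn segments whose boundaries are timbrally similar (MFCC distance). Segment length, the boundary window, and the candidate-selection rule are illustrative simplifications of the published method.

```python
# Simplified audio texture sketch: segment a source recording, then draw
# segments at random so that consecutive boundaries are timbrally similar.
# Parameters and the similarity rule are illustrative only.
import numpy as np
import librosa

def segments(y, sr, seg_seconds=1.0):
    """Cut the source recording into consecutive fixed-length segments."""
    hop = int(sr * seg_seconds)
    return [y[i:i + hop] for i in range(0, len(y) - hop + 1, hop)]

def edge_timbre(chunk, sr):
    """Average MFCC vector over a short chunk (segment head or tail)."""
    return librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=13).mean(axis=1)

def synthesize(y, sr, target_seconds=60.0, candidates=5, seed=0):
    rng = np.random.default_rng(seed)
    segs = segments(y, sr)
    edge = int(0.1 * sr)                               # ~100 ms boundary region
    heads = [edge_timbre(s[:edge], sr) for s in segs]
    out = [segs[rng.integers(len(segs))]]
    while sum(len(s) for s in out) < target_seconds * sr:
        prev_tail = edge_timbre(out[-1][-edge:], sr)
        picks = rng.integers(len(segs), size=candidates)
        # among a few random candidates, continue with the segment whose
        # start is timbrally closest to the end of the previous segment
        best = min(picks, key=lambda i: np.linalg.norm(heads[i] - prev_tail))
        out.append(segs[best])
    return np.concatenate(out)

# Example usage (file name is illustrative):
# y, sr = librosa.load('market_square.wav', sr=None, mono=True)
# texture = synthesize(y, sr, target_seconds=120)
```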
Toni Heittola, Annamaria Mesaros, Tuomas Virtanen, and Moncef Gabbouj. Supervised model training for overlapping sound events based on unsupervised source separation. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, 8677–8681. Vancouver, Canada, 2013. IEEE Computer Society. URL: http://dx.doi.org/10.1109/ICASSP.2013.6639360, doi:10.1109/ICASSP.2013.6639360. 96 cites
Supervised Model Training for Overlapping Sound Events Based on Unsupervised Source Separation
Abstract
Sound event detection is addressed in the presence of overlapping sounds. Unsupervised sound source separation into streams is used as a preprocessing step to minimize the interference of overlapping events. This poses a problem in supervised model training, since there is no knowledge about which separated stream contains the targeted sound source. We propose two iterative approaches based on EM algorithm to select the most likely stream to contain the target sound: one by selecting always the most likely stream and another one by gradually eliminating the most unlikely streams from the training. The approaches were evaluated with a database containing recordings from various contexts, against the baseline system trained without applying stream selection. Both proposed approaches were found to give a reasonable increase of 8 percentage units in the detection accuracy.
Keywords
acoustic event detection;acoustic pattern recognition;sound source separation;supervised model training
Cites: 96 (see at Google Scholar)
Toni Heittola, Annamaria Mesaros, Antti Eronen, and Tuomas Virtanen. Context-dependent sound event detection. EURASIP Journal on Audio, Speech and Music Processing, 2013. 305 cites
Context-Dependent Sound Event Detection
Abstract
The work presented in this article studies how the context information can be used in the automatic sound event detection process, and how the detection system can benefit from such information. Humans are using context information to make more accurate predictions about the sound events and ruling out unlikely events given the context. We propose a similar utilization of context information in the automatic sound event detection process. The proposed approach is composed of two stages: automatic context recognition stage and sound event detection stage. Contexts are modeled using Gaussian mixture models and sound events are modeled using three-state left-to-right hidden Markov models. In the first stage, audio context of the tested signal is recognized. Based on the recognized context, a context-specific set of sound event classes is selected for the sound event detection stage. The event detection stage also uses context-dependent acoustic models and count-based event priors. Two alternative event detection approaches are studied. In the first one, a monophonic event sequence is outputted by detecting the most prominent sound event at each time instance using Viterbi decoding. The second approach introduces a new method for producing polyphonic event sequence by detecting multiple overlapping sound events using multiple restricted Viterbi passes. A new metric is introduced to evaluate the sound event detection performance with various level of polyphony. This combines the detection accuracy and coarse time-resolution error into one metric, making the comparison of the performance of detection algorithms simpler. The two-step approach was found to improve the results substantially compared to the context-independent baseline system. In the block-level, the detection accuracy can be almost doubled by using the proposed context-dependent event detection.
Cites: 305 (see at Google Scholar)
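To make the two-stage structure described above concrete, the sketch below gates a placeholder event detector with a GMM-based context recognizer, in the spirit of the published approach. The real system used MFCC features and HMM event models with Viterbi decoding, which are omitted here; features and the context-to-event mapping are invented for illustration.

```python
# Schematic two-stage sketch: recognise the audio context with per-context
# GMMs, then restrict event detection to that context's event classes.
# Feature extraction and event models are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

# Context -> allowed event classes (illustrative subset)
CONTEXT_EVENTS = {
    'street': ['car', 'footsteps', 'siren'],
    'home':   ['dishes', 'speech', 'television'],
}

def train_context_models(features_per_context, n_components=8):
    """Fit one GMM per context on pooled feature frames (frames x dims)."""
    return {ctx: GaussianMixture(n_components=n_components,
                                 covariance_type='diag').fit(X)
            for ctx, X in features_per_context.items()}

def recognise_context(frames, context_models):
    """Pick the context with the highest average log-likelihood."""
    return max(context_models,
               key=lambda ctx: context_models[ctx].score(frames))

def detect_events(frames, context):
    """Placeholder detection step restricted to the recognised context."""
    # A real system would decode context-dependent HMMs here (Viterbi);
    # we simply return the allowed class set to show the gating step.
    return CONTEXT_EVENTS[context]

# Example usage with random features (illustrative only):
rng = np.random.default_rng(0)
context_models = train_context_models({
    'street': rng.normal(size=(500, 20)),
    'home':   rng.normal(loc=1.0, size=(500, 20)),
})
test_frames = rng.normal(size=(100, 20))
ctx = recognise_context(test_frames, context_models)
print(ctx, detect_events(test_frames, ctx))
```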
Dani Korpi, Toni Heittola, Timo Partala, Antti Eronen, Annamaria Mesaros, and Tuomas Virtanen. On the human ability to discriminate audio ambiances from similar locations of an urban environment. Personal and Ubiquitous Computing, 17(4):761–769, 2013. URL: http://dx.doi.org/10.1007/s00779-012-0625-z, doi:10.1007/s00779-012-0625-z. 2 cites
On the human ability to discriminate audio ambiances from similar locations of an urban environment
Abstract
When developing advanced location-based systems augmented with audio ambiances, it would be cost-effective to use a few representative samples from typical environments to describe a larger number of similar locations. The aim of this experiment was to study the human ability to discriminate audio ambiances recorded in similar locations of the same urban environment. A listening experiment consisting of material from three different environments and nine different locations was carried out with nineteen subjects to study the credibility of audio representations for certain environments, which would diminish the need for collecting huge audio databases. The first goal was to study to what degree humans are able to recognize whether the recording has been made in an indicated location or in another similar location, when presented with the name of the place, the location on a map, and the associated audio ambiance. The second goal was to study whether the ability to discriminate audio ambiances from different locations is affected by a visual cue, by presenting additional information in the form of a photograph of the suggested location. The results indicate that audio ambiances from similar urban areas of the same city differ enough that it is not acceptable to use a single recording as an ambiance to represent different yet similar locations. Including an image was found to increase the perceived credibility of all the audio samples in representing a certain location. The results suggest that developers of audio-augmented location-based systems should aim at using audio samples recorded on-site for each location in order to achieve a credible impression.
Keywords
Listening experiment; Location recognition; Audio-visual perception; Audio ambiance
Cites: 2 (see at Google Scholar)
Antti Eronen, Toni Heittola, Annamaria Mesaros, and Tuomas Virtanen. Method and apparatus for providing media event suggestions. 09 2012. URL: http://www.google.com/patents/US20130232412.
Method and apparatus for providing media event suggestions
Abstract
Various methods are described for providing media event suggestions based at least in part on a co-occurrence model. One example method may comprise receiving a selection of at least one media event to include in a media composition. Additionally, the method may comprise determining at least one suggested media event based at least in part on the at least one media events. The method may further comprise causing display of the at least one suggested media event. Similar and related methods, apparatuses, and computer program products are also provided.
Antti Eronen, Miska Hannuksela, Toni Heittola, Annamaria Mesaros, and Tuomas Virtanen. Method and apparatus for generating an audio summary of a location. 09 2012. URL: http://www.google.com/patents/WO2013128064A1. 12 cites
Method and apparatus for generating an audio summary of a location
Abstract
Various methods are described for generating an audio summary representing a location on a place exploration service. One example method may comprise receiving at least one audio file. The method may further comprise dividing the at least one audio file into one or more audio segments. Additionally, the method may comprise determining a representative audio segment for each of the one or more audio segments. The method may further comprise generating an audio summary of the at least one audio file by combining one or more of the representative audio segments of the one or more audio segments. Similar and related methods, apparatuses, and computer program products are also provided.
Cites: 12 (see at Google Scholar)
Sound Event Detection in Multisource Environments Using Source Separation
Abstract
This paper proposes a sound event detection system for natural multisource environments, using a sound source separation front-end. The recognizer aims at detecting sound events from various everyday contexts. The audio is preprocessed using non-negative matrix factorization and separated into four individual signals. Each sound event class is represented by a Hidden Markov Model trained using mel frequency cepstral coefficients extracted from the audio. Each separated signal is used individually for feature extraction and then segmentation and classification of sound events using the Viterbi algorithm. The separation allows detection of a maximum of four overlapping events. The proposed system shows a significant increase in event detection accuracy compared to a system able to output a single sequence of events.
Cites: 168 (see at Google Scholar)
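The separation front-end described above can be sketched with off-the-shelf tools. This is a minimal stand-in, assuming librosa's NMF-based `decompose` in place of the paper's exact factorization, and omitting the HMM/Viterbi recognition stage applied to each stream.

```python
# Hedged sketch of an NMF front-end splitting a recording into four streams
# before per-stream feature extraction and event detection.
import numpy as np
import librosa

def separate_into_streams(y, sr, n_streams=4, n_fft=2048, hop=1024):
    D = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    S = np.abs(D)
    # Non-negative matrix factorisation of the magnitude spectrogram.
    comps, acts = librosa.decompose.decompose(S, n_components=n_streams)
    streams = []
    for k in range(n_streams):
        # Soft mask of component k, applied to the complex spectrogram.
        Sk = np.outer(comps[:, k], acts[k])
        mask = Sk / (comps @ acts + 1e-10)
        streams.append(librosa.istft(mask * D, hop_length=hop))
    return streams  # list of time-domain signals, one per stream

# Per-stream features could then be computed with, e.g.,
# librosa.feature.mfcc(y=stream, sr=sr) for each returned stream.
```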
Annamaria Mesaros, Toni Heittola, and Anssi Klapuri. Latent semantic analysis in sound event detection. In 19th European Signal Processing Conference (EUSIPCO 2011), 1307–1311. 2011. 68 cites
Latent Semantic Analysis in Sound Event Detection
Abstract
This paper presents the use of probabilistic latent semantic analysis (PLSA) for modeling co-occurrence of overlapping sound events in audio recordings from everyday audio environments such as office, street or shop. Co-occurrence of events is represented as the degree of their overlapping in a fixed length segment of polyphonic audio. In the training stage, PLSA is used to learn the relationships between individual events. In detection, the PLSA model continuously adjusts the probabilities of events according to the history of events detected so far. The event probabilities provided by the model are integrated into a sound event detection system that outputs a monophonic sequence of events. The model offers a very good representation of the data, having low perplexity on test recordings. Using PLSA for estimating prior probabilities of events provides an increase of event detection accuracy to 35%, compared to 30% for using uniform priors for the events. There are different levels of performance increase in different audio contexts, with few contexts showing significant improvement.
Keywords
sound event detection, latent semantic analysis
Cites: 68 (see at Google Scholar)
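As a rough illustration of the co-occurrence modeling above, the sketch below fits plain PLSA with EM on an event count matrix (fixed-length segments by event classes). It is a generic stand-in under those assumptions; the integration of the resulting priors into the detection system is not shown.

```python
# Hedged sketch: PLSA fitted with EM on a segments-by-events count matrix.
import numpy as np

def plsa(counts, n_topics=4, n_iter=50, seed=0):
    """counts: (n_segments, n_event_classes) co-occurrence counts."""
    rng = np.random.default_rng(seed)
    n_seg, n_evt = counts.shape
    p_z_d = rng.random((n_seg, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_e_z = rng.random((n_topics, n_evt)); p_e_z /= p_e_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | segment, event).
        post = p_z_d[:, :, None] * p_e_z[None, :, :]            # (seg, z, evt)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        weighted = counts[:, None, :] * post                    # n(d,e) * P(z|d,e)
        # M-step: update P(event|z) and P(z|segment).
        p_e_z = weighted.sum(axis=0)
        p_e_z /= p_e_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_e_z

# Event priors for a segment d can then be formed as p_z_d[d] @ p_e_z.
```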
Toni Heittola, Annamaria Mesaros, Tuomas Virtanen, and Antti Eronen. Sound event detection and context recognition. In Proceedings of Akustiikkapäivät 2011, 51–56. Tampere, Finland, 2011. 3 cites
Sound Event Detection and Context Recognition
Keywords
sound event detection, context recognition
Cites: 3 (see at Google Scholar)
Toni Heittola, Annamaria Mesaros, Antti Eronen, and Tuomas Virtanen. Audio context recognition using audio event histograms. In 18th European Signal Processing Conference (EUSIPCO 2010), 1272–1276. Aalborg, Denmark, 2010. 113 cites
Audio Context Recognition Using Audio Event Histograms
Abstract
This paper presents a method for audio context recognition, meaning classification between everyday environments. The method is based on representing each audio context using a histogram of audio events which are detected using a supervised classifier. In the training stage, each context is modeled with a histogram estimated from annotated training data. In the testing stage, individual sound events are detected in the unknown recording and a histogram of the sound event occurrences is built. Context recognition is performed by computing the cosine distance between this histogram and event histograms of each context from the training database. Term frequency--inverse document frequency weighting is studied for controlling the importance of different events in the histogram distance calculation. An average classification accuracy of 89% is obtained in the recognition between ten everyday contexts. Combining the event based context recognition system with more conventional audio based recognition increases the recognition rate to 92%.
Cites: 113 (see at Google Scholar)
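The histogram-based recognition step described above is simple to express in code. The sketch below assumes the event detection stage has already produced an event list per recording, and that `context_hists` holds TF-IDF-weighted histograms estimated from training data; the exact weighting and normalisation choices of the paper are not reproduced.

```python
# Hedged sketch: event histogram + TF-IDF weighting + cosine-distance
# context recognition.
import numpy as np

def event_histogram(event_list, event_classes):
    h = np.zeros(len(event_classes))
    for e in event_list:                        # e.g. ['car', 'speech', 'car']
        h[event_classes.index(e)] += 1
    return h

def tfidf(histogram, doc_freq, n_docs):
    # doc_freq[i]: number of training contexts in which event i occurs.
    tf = histogram / max(histogram.sum(), 1)
    idf = np.log(n_docs / (1 + doc_freq))
    return tf * idf

def recognize(test_hist, context_hists):
    def cos_dist(a, b):
        return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    # Return the context with the smallest cosine distance to the test histogram.
    return min(context_hists, key=lambda c: cos_dist(test_hist, context_hists[c]))
```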
Annamaria Mesaros, Toni Heittola, Antti Eronen, and Tuomas Virtanen. Acoustic event detection in real-life recordings. In 18th European Signal Processing Conference (EUSIPCO 2010), 1267–1271. Aalborg, Denmark, 2010. 415 cites
Acoustic Event Detection in Real-life Recordings
Abstract
This paper presents a system for acoustic event detection in recordings from real life environments. The events are modeled using a network of hidden Markov models; their size and topology is chosen based on a study of isolated events recognition. We also studied the effect of ambient background noise on event classification performance. On real life recordings, we tested recognition of isolated sound events and event detection. For event detection, the system performs recognition and temporal positioning of a sequence of events. An accuracy of 24% was obtained in classifying isolated sound events into 61 classes. This corresponds to the accuracy of classifying between 61 events when mixed with ambient background noise at 0dB signal-to-noise ratio. In event detection, the system is capable of recognizing almost one third of the events, and the temporal positioning of the events is not correct for 84% of the time.
Cites: 415 (see at Google Scholar)
Acoustic Monitoring
Acoustic monitoring is the process of using sound sensors and signal processing technologies to detect, analyze, and interpret environmental sounds. It enables real-time awareness of events or changes in a space without the need for visual surveillance.
Healthcare Environments
The global elderly population is growing rapidly, and the cost of care is rising. Over the course of four years, I contributed to both pre-commercialization and commercialization initiatives focused on advancing state-of-the-art audio analysis technologies for acoustic monitoring and surveillance applications.
During the pre-commercialization phase (Tuomas Virtanen / EVERYSOUND / ERC), we developed sound event detection systems specifically for healthcare environments. This process involved acquiring and annotating data from real-world scenarios, followed by optimizing algorithms to improve detection accuracy and robustness. The resulting system was successfully piloted in a nursing home setting in collaboration with a service provider, demonstrating its practical viability and impact.
In the subsequent SmartSound project, funded by Business Finland’s TUTL (Business from Research Ideas) program, we pushed the technology toward commercialization. We reached key milestones, including developing a real-time, edge-compatible prototype and successfully piloting it in nursing homes. Our market research helped refine our value proposition and commercialization strategy, with a strong emphasis on privacy that resulted in filing an international patent related to privacy-preserving sound representation.
- Developed, piloted, and demonstrated the practical viability of an AI-based sound event detection system tailored for healthcare environments.
- Successfully conducted real-world deployments in nursing homes during both pre-commercialization and commercialization phases.
- Built a real-time, edge-compatible prototype with privacy-preserving features.
- Filed an international patent application for the privacy-preserving technology.
- Designed and implemented a complete system architecture, including acoustic sensors, backend infrastructure, and user interface.
Tuomas Virtanen, Toni Heittola, Shuyang Zhao, Shayan Gharib, and Konstantinos Drosos. Privacy-preserving sound representation. 10 2023. Pending, US Patent App. 18/025,240. URL: https://patents.google.com/patent/US20230317086A1/en. 1 cite
Privacy-preserving sound representation
Abstract
According to an example embodiment, a method (200) for audio-based monitoring is provided, the method (200) comprising: deriving (202), via usage of a predefined conversion model (M), based on audio data that represents sounds captured in a monitored space, one or more audio features that are descriptive of at least one characteristic of said sounds; identifying (204) respective occurrences of one or more predefined acoustic events in said space based on the one or more audio features; and carrying out (206), in response to identifying an occurrence of at least one of said one or more predefined acoustic events, one or more predefined actions associated with said at least one of said one or more predefined acoustic events, wherein said conversion model (M) is trained to provide said one or more audio features such that they include information that facilitates identification of respective occurrences of said one or more predefined acoustic events while preventing identification of speech characteristics.
Cites: 1 (see at Google Scholar)
Intelligent Noise Monitoring
Environmental noise is a growing concern in modern urban environments, with significant implications for public health, urban planning, and general quality of life. Traditional noise monitoring methods, which rely primarily on manual sound level measurements, often fail to capture the complexity and variability of urban acoustic environments.
These two research projects introduced a novel, intelligent approach to environmental noise monitoring by integrating automatic sound source classification into sensor-based systems. In the projects, we demonstrated how sensor networks equipped with machine learning algorithms can distinguish between various noise sources (such as traffic, construction, and human activity), enabling more nuanced and context-aware noise assessments.
The first project, focusing on acoustic intelligence and surveillance systems, was funded through the TUTL (New Business from Research Ideas) program by Tekes. It was a collaborative effort involving Tampere University of Technology, the VTT Technical Research Centre of Finland, and the University of Eastern Finland. The project focused on intelligent noise monitoring, identifying key application areas and commercialization pathways. A key outcome was a system for monitoring environmental noise, which uses independent acoustic sensors to measure sound levels and a backend server to identify sound sources automatically. It also features a user-friendly web interface designed for the general public, government authorities, and site administrators. The concept was successfully demonstrated in the prototyping phase of the project.
The second project was carried out in collaboration with Tampere University of Technology and VTT Technical Research Centre of Finland. It further advanced the concept by piloting it in three real-world environments: an industrial site, a leisure harbor, and an outdoor shooting range in Finland. The pilot study was conducted in cooperation with officials responsible for managing noise pollution at the selected locations. By engaging with stakeholders and gathering their feedback throughout the pilot study, we were able to refine the system's features.
- Development of robust sound classification algorithms tailored for urban noise monitoring
- Implementation of an active learning pipeline that significantly reduced manual annotation effort across over 1000 hours of audio data (a minimal sketch of the idea follows this list)
- Creation of a web-based user interface for real-time visualization of noise levels and sound sources
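The sketch below illustrates the kind of uncertainty-sampling loop the active learning bullet refers to. It is a generic stand-in, not the project's actual pipeline: the feature matrices, the `annotate` callback, and the logistic-regression classifier are all illustrative assumptions.

```python
# Hedged sketch: least-confidence active learning over a pool of unlabelled clips.
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning(X_labeled, y_labeled, X_pool, annotate, rounds=10, batch=20):
    """annotate(indices) is assumed to return labels for the queried pool clips."""
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X_labeled, y_labeled)
        proba = clf.predict_proba(X_pool)
        # Query the clips the current model is least confident about.
        uncertainty = 1.0 - proba.max(axis=1)
        query = np.argsort(uncertainty)[-batch:]
        X_labeled = np.vstack([X_labeled, X_pool[query]])
        y_labeled = np.concatenate([y_labeled, annotate(query)])
        X_pool = np.delete(X_pool, query, axis=0)
    return clf
```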
Panu Maijala, Toni Heittola, and Tuomas Virtanen. Ympäristömelun mittaaminen käyttäen automaattista lähteiden tunnistusta. In Akustiikkapäivät 2019, 196–206. Oulu, Nov 2019.
Ympäristömelun mittaaminen käyttäen automaattista lähteiden tunnistusta
Abstract
A major challenge in continuous environmental noise measurement has been separating the contributions of different sound sources from the overall sound level prevailing during the measurement. Typically, the measurement is carried out because of a clearly identifiable noise source, and the aim is to remove the influence of other noise sources from the final result. In a short measurement session, the measurer can verify the situation either by listening at the time of measurement or from the recorded audio afterwards. We have implemented a pattern-recognition-based classification algorithm in a sound level meter and tested it at several sites. In this paper, we describe the new sound level meter concept, the trained classifier implemented in it, and the classifier's performance during our pilot study. In addition, we describe the challenges we encountered and our solutions to them.
Panu Maijala, Zhao Shuyang, Toni Heittola, and Tuomas Virtanen. Environmental noise monitoring using source classification in sensors. Applied Acoustics, 129(6):258–267, January 2018. doi:10.1016/j.apacoust.2017.08.006. 147 cites
Environmental noise monitoring using source classification in sensors
Abstract
Environmental noise monitoring systems continuously measure sound levels without assigning these measurements to the different noise sources in the acoustic scene, and are therefore incapable of identifying the main noise source. In this paper, a feasibility study is presented on a new monitoring concept in which an acoustic pattern classification algorithm running in a wireless sensor is used to automatically assign the measured sound level to different noise sources. A supervised noise source classifier is learned from a small amount of manually annotated recordings, and the learned classifier is used to automatically detect the activity of the target noise source in the presence of interfering noise sources. The sensor is based on an inexpensive credit-card-sized single-board computer with a microphone, associated electronics, and wireless connectivity. The measurement results and the noise source information are transferred from the sensors scattered around the measurement site to a cloud service, and a noise portal is used to visualise the measurements to users. The proposed noise monitoring concept was piloted on a rock crushing site. The system ran reliably for over 50 days on site, during which it was able to recognise more than 90% of the noise sources correctly. The pilot study shows that the proposed noise monitoring system can reduce the amount of required human validation of sound level measurements when the target noise source is clearly defined.
Keywords
Environmental noise monitoring, Acoustic pattern classification, Wireless sensor network, Cloud service
Cites: 147 (see at Google Scholar)
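As a rough illustration of the core idea in the paper above, the sketch below classifies each analysis frame into a noise source class and assigns the measured level to that source, so per-source equivalent levels can be reported. It is a simplified stand-in for the sensor software: the `classifier` (assumed to return integer indices into `classes`) and the calibration constant are placeholders, and the wireless/cloud parts are omitted.

```python
# Hedged sketch: assigning measured sound levels to automatically
# classified noise sources, frame by frame.
import numpy as np
import librosa

def per_source_levels(y, sr, classifier, classes, frame_len=1.0, calib_db=0.0):
    n = int(frame_len * sr)
    energy = {c: 0.0 for c in classes}
    frames = {c: 0 for c in classes}
    for start in range(0, len(y) - n + 1, n):
        frame = y[start:start + n]
        feat = librosa.feature.mfcc(y=frame, sr=sr, n_mfcc=20).mean(axis=1)
        label = classes[int(classifier.predict(feat[None, :])[0])]
        energy[label] += float(np.mean(frame ** 2))
        frames[label] += 1
    # Equivalent level per source (relative; calib_db would map it to dB SPL).
    return {c: 10 * np.log10(energy[c] / frames[c]) + calib_db
            for c in classes if frames[c] > 0}
```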
Music Information Retrieval
Understanding the timbre of musical instruments and drums is an important issue for automatic music transcription, music information retrieval, and computational auditory scene analysis. In particular, the worldwide popularization of online music distribution services and portable digital music players has made musical instrument recognition even more important.
Musical Instrument Recognition
The recognition of musical instruments in polyphonic audio is a significant challenge in the field of music information retrieval, primarily due to the overlapping spectral content of simultaneously sounding sources. My research focused on the realistic scenario of multi-instrumental polyphonic audio, where interference between concurrent sounds often limits recognition accuracy. To address this issue, I explored methods that first separate the audio mixture into individual sound sources, thereby reducing interference and enhancing classification performance.
The research was conducted as part of doctoral studies within the Tampere Graduate School in Information Science and Engineering, with the primary objective of improving recognition accuracy in complex musical signals with multiple overlapping sound sources. The work leveraged source-filter models for sound separation, which proved effective in isolating instrument-specific characteristics from polyphonic mixtures. The research received the Best Paper Award at the ISMIR 2009 conference for the paper titled “Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation.” This recognition highlights the significance and innovative nature of the source-filter modeling approach in this field.
- Integrated source-filter modeling techniques into the recognition pipeline, significantly enhancing the separation of overlapping instrument sounds and improving classification performance.
- Award-winning research (the Best Paper Award at ISMIR 2009), and a foundational work that had a lasting influence on musical instrument recognition research in the pre-deep learning era.
Aleksandr Diment, Rajan Padmanabhan, Toni Heittola, and Tuomas Virtanen. Group delay function from all-pole models for musical instrument recognition. In Mitsuko Aramaki, Olivier Derrien, Richard Kronland-Martinet, and Sølvi Ystad, editors, Sound, Music, and Motion, Lecture Notes in Computer Science, pages 606–618. Springer International Publishing, 2014. doi:10.1007/978-3-319-12976-1_37. 7 cites
Group Delay Function from All-Pole Models for Musical Instrument Recognition
Abstract
In this work, the feature based on the group delay function from all-pole models (APGD) is proposed for pitched musical instrument recognition. Conventionally, the spectrum-related features take into account merely the magnitude information, whereas the phase is often overlooked due to the complications related to its interpretation. However, there is often additional information concealed in the phase, which could be beneficial for recognition. The APGD is an elegant approach to inferring phase information which avoids the issues related to interpreting the phase and does not require extensive parameter adjustment. Having shown applicability for speech-related problems, it is now explored for instrument recognition. The evaluation is performed with various instrument sets and shows noteworthy absolute accuracy gains of up to 7% compared to the baseline mel-frequency cepstral coefficients (MFCCs) case. Combined with the MFCCs and with feature selection, APGD demonstrates superiority over the baseline with all the evaluated sets.
Keywords
Musical instrument recognition, music information retrieval, all-pole group delay feature, phase spectrum
Cites: 7 (see at Google Scholar)
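The core of the feature above, group delay computed from an all-pole model, can be sketched in a few lines. This is a simplified illustration under stated assumptions (illustrative frame length and model order), and it does not reproduce the normalisation steps of the published APGD feature.

```python
# Hedged sketch: fit an LPC (all-pole) model to a frame and take the group
# delay of the resulting 1/A(z) filter as a phase-derived feature.
import numpy as np
import librosa
from scipy.signal import group_delay

def all_pole_group_delay(frame, order=16, n_points=256):
    a = librosa.lpc(frame.astype(float), order=order)     # all-pole coefficients A(z)
    w, gd = group_delay((np.array([1.0]), a), w=n_points)  # group delay of 1/A(z)
    return gd                                              # in samples, per frequency

# Example usage (frame-wise features for one file):
# y, sr = librosa.load('note.wav', sr=None)
# frames = librosa.util.frame(y, frame_length=2048, hop_length=1024)
# apgd = np.stack([all_pole_group_delay(frames[:, i]) for i in range(frames.shape[1])])
```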
Aleksandr Diment, Rajan Padmanabhan, Toni Heittola, and Tuomas Virtanen. Modified group delay feature for musical instrument recognition. In 10th International Symposium on Computer Music Multidisciplinary Research (CMMR). Marseille, France, October 2013. 31 cites
Modified Group Delay Feature for Musical Instrument Recognition
Abstract
In this work, the modified group delay feature (MODGDF) is proposed for pitched musical instrument recognition. Conventionally, the spectrum-related features used in instrument recognition take into account merely the magnitude information, whereas the phase is often overlooked due to the complications related to its interpretation. However, there is often additional information concealed in the phase, which could be beneficial for recognition. The MODGDF is a method of incorporating phase information which avoids the issues related to phase unwrapping. Having shown its applicability for speech-related problems, it is now explored for musical instrument recognition. The evaluation is performed on separate note recordings in various instrument sets, and combined with the conventional mel-frequency cepstral coefficients (MFCCs), MODGDF shows noteworthy absolute accuracy gains of up to 5.1% compared to the baseline MFCC case.
Keywords
Musical instrument recognition; music information retrieval; modified group delay feature; phase spectrum
Cites: 31 (see at Google Scholar)
Aleksandr Diment, Toni Heittola, and Tuomas Virtanen. Semi-supervised learning for musical instrument recognition. In 21st European Signal Processing Conference 2013 (EUSIPCO 2013). Marrakech, Morocco, September 2013. 24 cites
Semi-supervised Learning for Musical Instrument Recognition
Abstract
In this work, semi-supervised learning (SSL) techniques are explored in the context of musical instrument recognition. Conventional supervised approaches normally rely on annotated data to train the classifier, which implies performing costly manual annotation of the training data. SSL methods enable utilising additional unannotated data, which is significantly easier to obtain, allowing the overall development cost to be maintained at the same level while notably improving the performance. The implemented classifier incorporates a Gaussian mixture model-based SSL scheme utilising an iterative EM-based algorithm, as well as extensions facilitating simpler convergence criteria. The evaluation is performed on a set of nine instruments while training on a dataset in which the relative size of the labelled data is as little as 15%. It yields a noteworthy absolute performance gain of 16% compared to the performance of the initial supervised models.
Keywords
Music information retrieval; musical instrument recognition; semi-supervised learning
Cites: 24 (see at Google Scholar)
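The GMM-based semi-supervised scheme can be sketched as EM-style self-training. This is a simplified variant under stated assumptions; the labelled-data weighting, class-wise retraining, and convergence-criterion extensions described above are not reproduced.

```python
# Hedged sketch: class-conditional GMMs refined with unlabelled data.
import numpy as np
from sklearn.mixture import GaussianMixture

def semi_supervised_gmms(X_lab, y_lab, X_unlab, classes, n_components=4, n_iter=5):
    models = {c: GaussianMixture(n_components, covariance_type='diag')
                 .fit(X_lab[y_lab == c]) for c in classes}
    priors = {c: float(np.mean(y_lab == c)) for c in classes}
    for _ in range(n_iter):
        # E-step: class posteriors of the unlabelled samples.
        ll = np.column_stack([models[c].score_samples(X_unlab) + np.log(priors[c])
                              for c in classes])
        post = np.exp(ll - ll.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        # M-step (hard-assignment variant): refit each class GMM on the
        # labelled data plus the unlabelled samples assigned to that class.
        assign = np.array(classes)[post.argmax(axis=1)]
        for c in classes:
            X_c = np.vstack([X_lab[y_lab == c], X_unlab[assign == c]])
            models[c].fit(X_c)
    return models
```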
Aleksandr Diment. Semi-supervised musical instrument recognition. Master's thesis, Tampere University of Technology, Finland, 2013.
Semi-supervised musical instrument recognition
Abstract
The application areas of music information retrieval have been gaining popularity over the last decades. Musical instrument recognition is an example of a specific research topic in the field. In this thesis, semi-supervised learning techniques are explored in the context of musical instrument recognition. The conventional approaches employed for musical instrument recognition rely on annotated data, i.e. example recordings of the target instruments with associated information about the target labels in order to perform training. This implies a highly laborious and tedious work of manually annotating the collected training data. The semi-supervised methods enable incorporating additional unannotated data into training. Such data consists of merely the recordings of the instruments and is therefore significantly easier to acquire. Hence, these methods allow keeping the overall development cost at the same level while notably improving the performance of a system. The implemented musical instrument recognition system utilises the mixture model semi-supervised learning scheme in the form of two EM-based algorithms. Furthermore, upgraded versions, namely, the additional labelled data weighting and class-wise retraining, for the improved performance and convergence criteria in terms of the particular classification scenario are proposed. The evaluation is performed on sets consisting of four and ten instruments and yields the overall average recognition accuracy rates of 95.3 and 68.4%, respectively. These correspond to the absolute gains of 6.1 and 9.7% compared to the initial, purely supervised cases. Additional experiments are conducted in terms of the effects of the proposed modifications, as well as the investigation of the optimal relative labelled dataset size. In general, the obtained performance improvement is quite noteworthy, and future research directions suggest to subsequently investigate the behaviour of the implemented algorithms along with the proposed and further extended approaches.
Anssi Klapuri, Tuomas Virtanen, and Toni Heittola. Sound source separation in monaural music signals using excitation-filter model and em algorithm. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), 5510–5513. Dallas, Texas, USA, 2010. 43 cites
Sound source separation in monaural music signals using excitation-filter model and em algorithm
Abstract
This paper proposes a method for separating the signals of individual musical instruments from monaural musical audio. The mixture signal is modeled as a sum of the spectra of individual musical sounds which are further represented as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental frequencies are estimated in advance using a multipitch estimator, whereas the filters are restricted to have smooth frequency responses by modeling them as a sum of elementary functions on Mel-frequency scale. A novel expectation-maximization (EM) algorithm is proposed which jointly learns the filter responses and organizes the excitations (musical notes) to filters (instruments). In simulations, the method achieved over 5 dB SNR improvement compared to the mixture signals when separating two or three musical instruments from each other. A slight further improvement was achieved by utilizing musical properties in the initialization of the algorithm.
Keywords
Sound source separation, excitation-filter model, maximum likelihood estimation, expectation maximization
Cites: 43 (see at Google Scholar)
Toni Heittola, Anssi Klapuri, and Tuomas Virtanen. Musical instrument recognition in polyphonic audio using source-filter model for sound separation. In International Conference on Music Information Retrieval (ISMIR), 327–332. Kobe, Japan, 2009. Best paper award. 198 cites
Musical Instrument Recognition in Polyphonic Audio Using Source-Filter Model for Sound Separation
Abstract
This paper proposes a novel approach to musical instrument recognition in polyphonic audio signals by using a source-filter model and an augmented non-negative matrix factorization algorithm for sound separation. The mixture signal is decomposed into a sum of spectral bases modeled as a product of excitations and filters. The excitations are restricted to harmonic spectra and their fundamental frequencies are estimated in advance using a multipitch estimator, whereas the filters are restricted to have smooth frequency responses by modeling them as a sum of elementary functions on the Mel-frequency scale. The pitch and timbre information are used in organizing individual notes into sound sources. In the recognition, Mel-frequency cepstral coefficients are used to represent the coarse shape of the power spectrum of sound sources and Gaussian mixture models are used to model instrument-conditional densities of the extracted features. The method is evaluated with polyphonic signals, randomly generated from 19 instrument classes. The recognition rate for signals having six note polyphony reaches 59%.
Keywords
Sound source separation, excitation-filter model
Awards: Best paper award
Cites: 198 (see at Google Scholar)
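The back-end classification stage of the approach above (MFCCs modelled with per-instrument GMMs) is easy to sketch. The source-filter NMF separation front-end is assumed to have been applied already, and `train_data` is a hypothetical container; this is an illustration, not the paper's exact setup.

```python
# Hedged sketch: MFCC features with Gaussian mixture models as
# instrument-conditional densities, maximum-likelihood classification.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def train_instrument_models(train_data, sr=44100, n_mfcc=20, n_components=16):
    """train_data: dict mapping instrument name -> list of audio arrays."""
    models = {}
    for inst, signals in train_data.items():
        feats = np.vstack([librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
                           for y in signals])
        models[inst] = GaussianMixture(n_components, covariance_type='diag').fit(feats)
    return models

def classify_source(y, sr, models, n_mfcc=20):
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    scores = {inst: gmm.score(feats) for inst, gmm in models.items()}
    return max(scores, key=scores.get)
```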
Tuomas Virtanen and Toni Heittola. Interpolating hidden markov model and its application to automatic instrument recognition. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), 49–52. Washington, DC, USA, 2009. IEEE Computer Society. doi:10.1109/ICASSP.2009.4959517. 6 cites
Interpolating hidden Markov model and its application to automatic instrument recognition
Abstract
This paper proposes an interpolating extension to hidden Markov models (HMMs), which allows more accurate modeling of natural sound sources. The model is able to produce observations from distributions which are interpolated between discrete HMM states. The model uses Gaussian mixture state emission densities, and the interpolation is implemented by introducing interpolating states in which the mixture weights, means, and variances are interpolated from the discrete HMM state densities. We propose an algorithm extended from the Baum-Welch algorithm for estimating the parameters of the interpolating model. The model was evaluated in an automatic instrument classification task, where it produced systematically better recognition accuracy than a baseline HMM recognition algorithm.
Keywords
Hidden Markov models, acoustic signal processing, musical instruments, pattern classification
Cites: 6 (see at Google Scholar)
Musical Genre Recognition
Recognizing music genres is an important part of Music Information Retrieval (MIR). This task involves classifying audio tracks into distinct musical genre categories. Such a task is inherently challenging due to the subjective and often overlapping nature of genre definitions. In this Master’s thesis, I investigated the automatic classification of music signals by introducing a structured approach that integrates timbral, rhythmic, and pitch-based features.
The main achievement of this work was developing a musical genre recognition system that performed comparably to top methods of its time. Listening experiments showed that even humans reach only about 75% accuracy on short audio samples, highlighting the task's complexity. Additionally, the thesis explored musical instrument detection, and proposed a novel method for drum instrument detection based on subband amplitude envelope periodicity, which achieved 81% accuracy. To support this research, we created a general-purpose music database with audio recordings and their manual annotations.
- Developed a genre recognition system (MFCCs and hidden Markov models) achieving performance comparable to the state of the art of its time (~60% accuracy across six genres); a minimal sketch of the pipeline follows this list
- Conducted listening experiments showing human genre recognition accuracy of ~75% on five-second samples
- Proposed a novel drum detection method based on subband amplitude envelope periodicity
- Explored automatic detection of pitched musical instruments
- Compiled a general-purpose music database with manually annotated recordings to support training and evaluation
- Contributed to the understanding of the role of timbre, rhythm, and pitch in genre classification
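As referenced in the list above, the genre recognition pipeline was built on MFCC features with one HMM per genre. The sketch below is a simplified stand-in, assuming hmmlearn and a hypothetical `train_data` container; the topology and feature details of the original system are not reproduced.

```python
# Hedged sketch: MFCC features, one Gaussian HMM per genre, classification
# by maximum log-likelihood.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

def train_genre_models(train_data, sr=22050, n_mfcc=13, n_states=4):
    """train_data: dict mapping genre name -> list of audio arrays."""
    models = {}
    for genre, signals in train_data.items():
        feats = [librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T for y in signals]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        models[genre] = GaussianHMM(n_components=n_states,
                                    covariance_type='diag', n_iter=20).fit(X, lengths)
    return models

def classify_genre(y, sr, models, n_mfcc=13):
    feats = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T
    return max(models, key=lambda g: models[g].score(feats))
```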
Toni Heittola. Automatic classification of music signals. Master's thesis, Department of Information Technology, Tampere University of Technology, 2004. 31 cites
Automatic Classification of Music Signals
Abstract
Collections of digital music have become increasingly common over the recent years. As the amount of data increases, digital content management is becoming more important. In this thesis, we are studying content-based classification of acoustic musical signals according to their musical genre (e.g., classical, rock) and the instruments used. A listening experiment is conducted to study human abilities to recognise musical genres. This thesis covers a literature review on human musical genre recognition, state-of-the-art musical genre recognition systems, and related fields of research. In addition, a general-purpose music database consisting of recordings and their manual annotations is introduced. The theory behind the used features and classifiers is reviewed and the results from the simulations are presented. The developed musical genre recognition system uses mel-frequency cepstral coefficients to represent the time-varying magnitude spectrum of a music signal. The class-conditional feature densities are modelled with hidden Markov models. Musical instrument detection for a few pitched instruments from music signals is also studied using the same structure. Furthermore, this thesis proposes a method for the detection of drum instruments. The presence of drums is determined based on the periodicity of the amplitude envelopes of the signal at subbands. The conducted listening experiment shows that the recognition of musical genres is not a trivial task even for humans. On the average, humans are able to recognise the correct genre in 75% of cases (given five-second samples). Results also indicate that humans can do rather accurate musical genre recognition without long-term temporal features, such as rhythm. For the developed automatic recognition system, the obtained recognition accuracy for six musical genres was around 60%, which is comparable to the state-of-the-art systems. Detection accuracy of 81% was obtained with the proposed drum instrument detection method.
Cites: 31 (see at Google Scholar)
Toni Heittola and Anssi Klapuri. Locating segments with drums in music signals. In 3rd International Conference on Music Information Retrieval (ISMIR), 271–272. Paris, France, 2002. 26 cites
Locating Segments with Drums in Music Signals
Abstract
A system is described which segments musical signals according to the presence or absence of drum instruments. Two different yet approximately equally accurate approaches were taken to solve the problem. The first is based on periodicity detection in the amplitude envelopes of the signal at subbands. The band-wise periodicity estimates are aggregated into a summary autocorrelation function, the characteristics of which reveal the drums. The other mechanism applies straightforward acoustic pattern recognition with mel-frequency cepstrum coefficients as features and a Gaussian mixture model classifier. The integrated system achieves 88% correct segmentation over a database of 28 hours of music from different musical genres. For both methods, errors occur for borderline cases with soft percussive-like drum accompaniment, or transient-like instrumentation without drums.
Cites: 26 (see at Google Scholar)
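The first, periodicity-based approach above can be sketched as follows. Band edges, the envelope rate, and the decision threshold are illustrative assumptions, the GMM-based second method is omitted, and excerpts of at least a couple of seconds are assumed.

```python
# Hedged sketch: subband amplitude envelopes, band-wise autocorrelation,
# summary autocorrelation, and a simple peak-based drum decision.
import numpy as np
from scipy.signal import butter, sosfilt, hilbert, correlate

def summary_autocorrelation(y, sr, bands=((50, 200), (200, 800), (800, 3200)),
                            env_rate=200, max_lag_s=2.0):
    hop = max(sr // env_rate, 1)
    max_lag = int(max_lag_s * env_rate)
    summary = np.zeros(max_lag)
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype='bandpass', fs=sr, output='sos')
        env = np.abs(hilbert(sosfilt(sos, y)))[::hop]   # subband amplitude envelope
        env = env - env.mean()
        ac = correlate(env, env, mode='full', method='fft')[len(env) - 1:]
        summary += ac[:max_lag] / (ac[0] + 1e-12)       # normalised autocorrelation
    return summary / len(bands)

def has_drums(y, sr, threshold=0.4):
    summary = summary_autocorrelation(y, sr)
    # Skip the first 0.1 s of lags (at the 200 Hz envelope rate); a strong
    # periodic peak beyond that suggests drums.
    return bool(summary[20:].max() > threshold)
```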
Electroacoustics
Electroacoustics is the field that studies the conversion between electrical signals and sound waves, forming the foundation of technologies such as microphones, loudspeakers, and audio playback systems. A central component in this domain is the loudspeaker, which transforms electrical input into acoustic output. The most common type, the moving-coil loudspeaker, behaves linearly only under small signal conditions. At higher input levels, nonlinearities emerge, causing distortion that degrades audio quality.
Real-Time Adaptation of Nonlinear Loudspeaker Model Parameters
This project, conducted in collaboration with Nokia Research Center, focused on the real-time adaptation of loudspeaker model parameters using nonlinear modeling techniques. The goal was to enhance the accuracy and responsiveness of loudspeaker behavior modeling under dynamic audio playback conditions, particularly for microspeakers in mobile devices.
A continuous-time linear model was developed, extended with key nonlinearities, and then converted into a discrete-time form. An adaptive algorithm based on the LMS method was implemented to track parameter changes in real time. After two years of active involvement in the project, I transitioned into a supervisory role, guiding a Master’s thesis that continued the research.
- Developed a compact linear model for microspeakers and extended it with key nonlinearities.
- Improved accuracy of loudspeaker modeling in mobile devices.
- Advanced real-time nonlinear system identification techniques.
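To illustrate the adaptation principle used in this work, the sketch below runs a basic LMS update that tracks model parameters by minimising the error between the measured output and the model's prediction. It is a minimal stand-in: a linear FIR model replaces the nonlinear, recursive loudspeaker model actually used in the project, and the synthetic example data are purely illustrative.

```python
# Hedged sketch: LMS-based system identification of a loudspeaker-like system.
import numpy as np

def lms_identify(x, d, n_taps=32, mu=0.01):
    """x: input voltage signal, d: measured output; returns (weights, error)."""
    w = np.zeros(n_taps)
    e = np.zeros(len(x))
    for n in range(n_taps, len(x)):
        u = x[n - n_taps:n][::-1]     # most recent input samples first
        y = w @ u                     # model prediction
        e[n] = d[n] - y               # prediction error
        w += mu * e[n] * u            # LMS parameter update
    return w, e

# Example with a synthetic 'loudspeaker' impulse response:
# rng = np.random.default_rng(0)
# x = rng.standard_normal(20000)
# h = np.array([0.6, 0.3, -0.2, 0.1])
# d = np.convolve(x, h)[:len(x)]
# w, e = lms_identify(x, d, n_taps=8, mu=0.02)
```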
Juha Matikainen. Parameter adaptation in nonlinear loudspeaker models. Master's thesis, Department of Information Technology, Tampere University of Technology, 2010.
Parameter Adaptation in Nonlinear Loudspeaker Models
Abstract
A loudspeaker is a device that converts an electric input signal into acoustic output. The most common type of loudspeaker is the moving-coil transducer. The behaviour of a moving-coil transducer can be considered linear only when the displacement of the coil-diaphragm assembly is small. When the input signal level rises, nonlinearities start to cause audible distortion. In this thesis we examine the microspeaker, a small loudspeaker used in mobile phones. The electro-mechanical process which converts the electrical signal into sound waves is explained. Based on this, we present a continuous-time, linear model of a loudspeaker mounted in a closed box. The model describes the loudspeaker's small-signal behaviour using only a few parameters. We then consider the main sources of nonlinearities and how to model them. Two major sources of nonlinearities are added to the continuous-time model. Then transformations from continuous-time models to discrete-time models are considered. The nonlinear model is converted to discrete time while taking into account the properties of the microspeaker. The main purpose of this thesis is to study the performance of an algorithm that finds the parameter values of the nonlinear loudspeaker model. The performance of the algorithm is compared to that of an earlier algorithm for the linear loudspeaker model. The parameter values are found and changes in them are tracked using an adaptive signal processing method called system identification. The parameter values are updated using the LMS algorithm. Since the discrete-time mechanical model of the microspeaker is based on a recursive filter, the LMS algorithm for recursive filters is presented. We also review previous research related to parameter identification in linear and nonlinear loudspeaker models. Based on the experimental results, the studied algorithm is deemed to be as yet incomplete. The linear parameters generally adapt quickly, whereas the nonlinear parameters adapt too slowly and sometimes erroneously. The difference between the output predicted by the nonlinear loudspeaker model and the actual output of the loudspeaker (the prediction error) is too high, meaning the parameters do not adapt to their true values. The model is also prone to instability. The algorithm requires further development regarding adaptation speed and prevention of instability. Further work on initial parameter values and operation during silent periods should also be conducted.