Toni Heittola, Annamaria Mesaros, Dani Korpi, Antti Eronen, and Tuomas Virtanen. Method for creating location-specific audio textures. EURASIP Journal on Audio, Speech and Music Processing, 2014.
Method for creating location-specific audio textures
Abstract
An approach is proposed for creating location-specific audio textures for virtual location-exploration services. The presented approach creates audio textures by processing a small amount of audio recorded at a given location, providing a cost-effective way to produce a versatile audio signal that characterizes the location. The resulting texture is non-repetitive and conserves the location-specific characteristics of the audio scene, without the need to collect a large amount of audio from each location. The method consists of two stages: analysis and synthesis. In the analysis stage, the source audio recording is segmented into homogeneous segments. In the synthesis stage, the audio texture is created by randomly drawing segments from the source audio so that consecutive segments have timbral similarity near the segment boundaries. Results obtained in listening experiments show that there is no statistically significant difference in the audio quality or location-specificity of audio when the created audio textures are compared to excerpts of the original recordings. Therefore, the proposed audio textures could be utilized in virtual location-exploration services. Examples of source signals and audio textures created from them are available at www.cs.tut.fi/~heittolt/audiotexture.
Abstract
This work proposes an approach for generating location-specific audio textures by reusing a small amount of audio recorded at the target location. The method offers a cost-effective way to create rich and representative audio content for virtual location-exploration services. It avoids the need to collect extensive audio data from each location, while still producing a non-repetitive texture that preserves the characteristic sound of the environment.
The approach consists of two main stages: analysis and synthesis. During the analysis stage, the source audio is segmented into acoustically homogeneous segments. In the synthesis stage, an audio texture is constructed by randomly selecting segments from the source material, ensuring that consecutive segments exhibit timbral similarity at their boundaries.
Listening experiments revealed no statistically significant differences in perceived audio quality or location-specificity between the synthesized audio textures and excerpts from the original recordings.
Audio Texture
An audio texture refers to a new, unique audio signal generated from a source recording captured at a specific location. The texture aims to represent the acoustic character of that location by preserving its general sound properties and characteristic sound events.
The proposed method for creating audio textures involves two main stages: analysis of the source audio and synthesis of the texture. In the analysis stage, the source audio is segmented and clustered in an unsupervised manner. The goal is to automatically identify acoustically homogeneous segments, ideally corresponding to distinct sound events within the scene.
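The analysis stage described above can be sketched roughly as follows. This is only an illustrative stand-in, not the method used in the paper: it assumes the source audio has already been converted to a matrix of frame-level timbral features (for example MFCCs), places segment boundaries at peaks of a simple novelty curve, and groups segments with a greedy leader clustering. The function names and the parameters window, threshold and radius are hypothetical.

```python
import numpy as np

def segment_by_novelty(features, window=10, threshold=1.0):
    """Split a (n_frames, n_dims) feature matrix into homogeneous segments.

    Assumption for this sketch: a boundary is placed at frame t when the mean
    of the `window` frames before t and the mean of the `window` frames from t
    onward differ by more than `threshold`, and t is a local peak of that
    distance. Returns (start, end) frame index pairs, end exclusive.
    """
    n = len(features)
    novelty = np.zeros(n)
    for t in range(window, n - window + 1):
        before = features[t - window:t].mean(axis=0)
        after = features[t:t + window].mean(axis=0)
        novelty[t] = np.linalg.norm(after - before)

    boundaries = [0]
    for t in range(window, n - window + 1):
        nxt = novelty[min(t + 1, n - 1)]
        is_peak = novelty[t] >= novelty[t - 1] and novelty[t] > nxt
        far_enough = t - boundaries[-1] >= window
        if is_peak and novelty[t] > threshold and far_enough:
            boundaries.append(t)
    boundaries.append(n)
    return list(zip(boundaries[:-1], boundaries[1:]))

def cluster_segments(features, segments, radius=2.0):
    """Greedy leader clustering of segments by their mean feature vector.

    A segment joins the nearest existing cluster if its mean lies within
    `radius` of that cluster's leader; otherwise it starts a new cluster.
    """
    leaders, labels = [], []
    for start, end in segments:
        mean = features[start:end].mean(axis=0)
        dists = [np.linalg.norm(mean - leader) for leader in leaders]
        if dists and min(dists) <= radius:
            labels.append(int(np.argmin(dists)))
        else:
            leaders.append(mean)
            labels.append(len(leaders) - 1)
    return labels

# Toy input: three stationary stretches, the first and last sounding alike.
toy = np.vstack([np.zeros((30, 3)), np.full((30, 3), 5.0), np.zeros((30, 3))])
toy_segments = segment_by_novelty(toy)
toy_labels = cluster_segments(toy, toy_segments)
```

On the toy input, the two similar stretches end up in the same cluster, which is the property the synthesis stage relies on when it looks for segments that can follow each other.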
During the synthesis stage, the audio texture is constructed by shuffling and concatenating these segments. The shuffling process is designed to maintain timbral continuity at segment boundaries, ensuring a natural and coherent listening experience.
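A minimal sketch of such a shuffling step is given below, under stated assumptions rather than as the paper's implementation. It assumes each segment is summarized by one timbral feature vector for its beginning and one for its end, and that the next segment is drawn at random from the few segments whose beginning is closest to the end of the current one. The names synthesize_order and concatenate, and the parameters n_candidates and seed, are hypothetical.

```python
import numpy as np

def synthesize_order(start_feats, end_feats, n_out, n_candidates=3, seed=0):
    """Return a list of n_out segment indices forming the texture.

    start_feats[i] and end_feats[i] are feature vectors summarizing the first
    and last frames of segment i (arrays of shape (n_segments, n_dims)).
    Needs at least two segments to avoid back-to-back repeats.
    """
    rng = np.random.default_rng(seed)
    n = len(start_feats)
    order = [int(rng.integers(n))]  # random starting segment
    while len(order) < n_out:
        current = order[-1]
        # Distance from the end of the current segment to every segment start.
        dists = np.linalg.norm(start_feats - end_feats[current], axis=1)
        dists[current] = np.inf  # do not play the same segment twice in a row
        candidates = np.argsort(dists)[:n_candidates]
        order.append(int(rng.choice(candidates)))
    return order

def concatenate(audio_segments, order):
    """Join the chosen segments' samples into one signal (no cross-fade)."""
    return np.concatenate([audio_segments[i] for i in order])

# Toy example with five segments and one-dimensional "timbre" features.
starts = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
ends = np.array([[1.0], [2.0], [0.0], [11.0], [10.0]])
audio = [np.full(k + 1, float(k)) for k in range(5)]
order = synthesize_order(starts, ends, n_out=20, n_candidates=2, seed=1)
texture = concatenate(audio, order)
```

Drawing from several close candidates instead of always taking the nearest one is what keeps the output from looping through the same sequence; a practical system would likely also smooth the joins, which this sketch leaves out.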
Demonstration
Examples of generated audio textures for various locations are presented. Open the demonstration by clicking the image on the left. The segments used for the synthesis are shown in the lower panel.
Listening Tests
Listening tests were conducted to evaluate the quality of the synthesized audio textures and their suitability for representing the auditory scene of various locations.
Examples of audio samples used in the listening tests are available below.
Pub
Sample 1
Real sample
Synthesized Audio Texture
Sample 2
Real sample
Synthesized Audio Texture
Restaurant
Sample 1
Real sample
Synthesized Audio Texture
Sample 2
Real sample
Synthesized Audio Texture
Street
Sample 1
Real sample
Synthesized Audio Texture
Sample 2
Real sample
Synthesized Audio Texture
Track & Field Stadium
Sample 1
Real sample
Synthesized Audio Texture
Sample 2
Real sample
Synthesized Audio Texture