This glossary brings together key terminology from the field of computational audio content analysis.
Originally developed to provide Finnish translations for essential terms, the glossary aims to support consistent usage within the Finnish research community. To enhance its broader utility, each entry now includes a concise English definition, along with links to relevant Wikipedia and Wiktionary pages. For greater accessibility, selected terms have also been translated into German, Spanish, and French, using Wikipedia as a reference.
Please note that this glossary is a work in progress and is not intended to be exhaustive.
The data file used to generate this glossary is available in the following repository:
If you notice an error or would like to contribute, feel free to submit a pull request to the repository or contact me via email.
Special thanks to
- Tomasz Mąka for the Polish translations
- Irene Martin Morato for the Spanish translations
Related Dictionaries and Glossaries:
- English-Finnish Dictionary for General Audio Signal Processing Terms by Vesa Välimäki
- English-Finnish Dictionary for Statistics and Probability Theory Terms by Petri Koistinen
- English-Finnish Dictionary/Glossary for Language Technology by Kimmo Koskenniemi
- Bank of Finnish Terminology in Arts and Sciences
- Glossary of Statistical Terms by ISI
- Machine Learning Glossary by Google
- Tilastotieteen sanasto by Juha Alho, Elja Arjas, Esa Läärä and Pekka Pere
Terms: 639. Translations: 532, 141, 153, 277, 121. Updated 2025-08-04
A
accuracy
The fraction of system outputs that were predicted correctly
See also: evaluation metric
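
As a minimal sketch of the definition above, accuracy can be computed by comparing predicted labels with reference labels. The function name and the example labels are illustrative assumptions, not part of the glossary.

```python
def accuracy(predicted, reference):
    """Return the fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one example is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical labels: three of the four predictions are correct.
example_accuracy = accuracy(
    ["dog", "car", "dog", "rain"],
    ["dog", "car", "rain", "rain"],
)
```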





acoustic feature
See also: feature



acoustic model
in a speech recognition system, a model learned from acoustic data



acoustic monitoring

acoustic pattern recognition

acoustic scene


acoustic scene analysis

acoustics




activation function
in a neural network, a function that defines the output of a neuron




active learning
additive model

additive noise


agglomerative hierarchical clustering

aggregation





algorithm




aliasing




Amazon Mechanical Turk (AMT)
crowdsourcing marketplace enabling the use of human intelligence to perform tasks
angular distance

annotation
adding metadata to audio



annotation granularity
annotator


anomalous sound detection

application programming interface (API)


area under the curve (AUC)
in binary classification, an evaluation metric that considers all classification thresholds
See also: receiver operating characteristic curve



artificial general intelligence (AGI)
See also: strong artificial intelligence, full artificial intelligence




artificial intelligence (AI)
an ability to have machines act with apparent intelligence





artificial neural network
See also: neural network

assisted living
a housing facility for people with disabilities or for adults who cannot or choose not to live independently



attention

attention mechanism
See also: attention

attribute

audification

audio analysis


audio caption
See also: automated audio captioning, audio captioning

audio captioning
See also: automated audio captioning

audio classification
See also: classification



audio dataset
a collection of audio examples used for system development

audio retrieval

audio signal processing



audio source separation


audio tagging
audio-visual, audiovisual




audiovisual data

auditory
relating to hearing



auditory event
Subjective perception of sound
See also: auditory scene




auditory scene

auditory scene analysis (ASA)
a model proposed by Albert Bregman for the basis of auditory perception


augmented intelligence


auralization



autoencoder
See also: neural autoencoder



automated audio captioning (AAC)
B
backbone network

background noise



backpropagation, backprop
method used in neural networks to compute the gradients of the loss with respect to the weights, as needed by gradient descent



backpropagation algorithm

backward pass

bag of frames
representing frames without taking into account their order
balanced accuracy (BACC)

bandwidth

baseline

baseline system
batch
in neural network training, a set of examples used in one iteration for model training

batch normalization (BN)
a technique for improving the performance and stability of neural networks
See also: deep neural network, neural network, batch

beamforming
technique used in sensor arrays for directional signal reception or transmission



belief network

bias

big data



bigram
binary classification
a type of classification which outputs one of two mutually exclusive classes



binary mask

binaural
related to two ears


bioacoustics
cross-disciplinary science that combines biology and acoustics





block mixing
data augmentation technique
See also: data augmentation
boosting
a machine learning technique which iteratively combines weak classifiers into a classifier with higher accuracy

branch

brute-force search
Systematically going through all possible candidate solutions for the problem
See also: exhaustive search




C
category
a group to which items are assigned based on similarity or defined criteria



cepstrum

class label

classification
identification of the category or categories to which an item belongs




classification accuracy
See also: accuracy

classification error

classification model
See also: model, classification
classification of events, activities and relationships (CLEAR)
evaluation campaign organized in 2006 and 2007
classification threshold
See also: classification
classifier
See also: classification



closed set

closed set classification
See also: open set classification, closed set
closed world problem

cluster
See also: cluster analysis




cluster analysis




clustering
grouping related examples together





cognitive modeling



collaborative learning
See also: federated learning


computational audio content analysis
See also: content analysis

computational auditory scene analysis (CASA)

computational linguistics
an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective

computational modeling

computer audition (CA)
field of study of algorithms and systems for audio understanding by machine

computer science




conditional probability




confidence interval (CI)
interval estimate computed from the statistics of the observed data, that might contain the true value of an unknown population parameter




confusion matrix
an NxN table to summarize classification performance (predicted class versus actual class)
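
A minimal sketch of how such a table can be built, assuming rows index the actual class and columns the predicted class. This orientation, the function name, and the example labels are assumptions chosen for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count (actual, predicted) pairs in an N x N table.

    Rows correspond to the actual class, columns to the predicted class.
    """
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical two-class example.
example_matrix = confusion_matrix(
    actual=["speech", "speech", "music", "music", "music"],
    predicted=["speech", "music", "music", "music", "speech"],
    classes=["speech", "music"],
)
```

The diagonal holds the correctly classified examples, so accuracy is the diagonal sum divided by the total count.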



connectionist temporal classification (CTC)
consistent

constant-Q cepstral coefficients (CQCC)
constant-Q transform (CQT)
content analysis





context


context-aware
context vector

contrastive learning

converged

convolution
mathematical operation of two functions to produce a third function that expresses how the shape of one is modified by the other
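
For discrete signals, the operation can be sketched as a double loop over the two sequences. This is a plain "full" convolution of two finite lists; the function name is an illustrative assumption.

```python
def convolve(signal, kernel):
    """Full discrete convolution of two finite sequences."""
    output = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            # Each product contributes to the output sample at index i + j.
            output[i + j] += s * k
    return output


# A two-tap kernel [1, 1] sums each sample with its predecessor.
example_convolution = convolve([1, 2, 3], [1, 1])
```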





convolution kernel

convolutional neural network (CNN)
a neural network with convolutional layers along with pooling and fully connected layers
See also: neural network, deep neural network



convolutional recurrent neural network (CRNN)
See also: neural network, deep neural network
corpus


cost function
See also: loss function

critical band
cross-attention

cross-entropy




cross-entropy loss
See also: cross-entropy, entropy

cross-validation
a method for estimating how well a system generalizes to new data by reserving a subset of the dataset only for testing
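
One common variant is k-fold cross-validation, where each fold serves as the test subset once. The sketch below only generates index splits; the interleaved fold assignment without shuffling and the function name are assumptions made to keep the example short.

```python
def k_fold_indices(n_examples, k):
    """Return k (train_indices, test_indices) pairs covering all examples."""
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and the number of examples")
    # Assign example i to fold i % k (interleaved, no shuffling).
    folds = [list(range(i, n_examples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, test))
    return splits


example_splits = k_fold_indices(6, 3)
```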





crowdsourcing

D
data

data acquisition



data augmentation
artificially increasing the number of training examples

data mining



data post-processing
See also: data preprocessing

data preprocessing
See also: data post-processing

data structure




data visualization

dataset

decision boundary
learned separating boundary between classes

decision function

decision tree
a learning method using tree-like decision graph



decorrelated

deep belief network (DBN)


deep learning (DL)
machine learning with multi-layer models that gradually learn representations at higher levels of abstraction


deep machine learning (DML)
See also: deep learning
deep neural network (DNN)
neural network containing multiple hidden layers

denoising

detection



detection and classification of acoustic scenes and events (DCASE)
detection error tradeoff (DET)
Plot of the false rejection rate versus false acceptance rate for classification systems

deterministic



diarization error rate (DER)
dimensionality


direction of arrival (DOA)


direction of arrival estimation (DOAE)
See also: direction of arrival
discrete cosine transform (DCT)
Transform to represent data points with a sum of cosine functions
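
A minimal sketch of one common variant, the unnormalized DCT-II, written directly from its defining sum. The choice of variant and the missing scaling factor are assumptions; libraries often apply a normalization.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_t x[t] * cos(pi / N * (t + 0.5) * k)."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


# A constant signal has energy only in the first (k = 0) coefficient.
example_dct = dct_ii([1.0, 1.0, 1.0, 1.0])
```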




discrete Fourier transform (DFT)


discriminant analysis





discriminative learning
modeling the dependence of a target variable y on an observed variable x
See also: generative learning
disentangled representation
disentangled representation learning
dissimilarity
See also: similarity




distance measure

distance metric

divisive hierarchical clustering

domain

domain adaptation
field of machine learning dealing with cases in which a model trained on a source distribution is used on a different target distribution



domain generalization
domain shift
downmixing, down-mixing
mixing audio channels together
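
As a sketch, a multichannel signal can be downmixed to mono by averaging the channels sample by sample. Averaging rather than summing is an assumption made here to keep the output in the input range; the function name is illustrative.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    return [sum(samples) / len(channels) for samples in zip(*channels)]


# Hypothetical stereo signal with three samples per channel.
example_mono = downmix_to_mono([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
```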
downstream task
duration
the length of an audio signal


dynamic range





E
early fusion
features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: feature level fusion
edge AI
Utilizing artificial intelligence in an edge-computing environment.
See also: edge computing

edge computing

embedding model

embeddings
a low-dimensional space into which high-dimensional vectors can be translated
See also: word embedding

empirical



ensemble
ensemble learning
using multiple learning algorithms to obtain better predictive performance than any of the constituent learning algorithms alone


entropy

envelope function

epoch
while training neural networks, one pass over the full training set
See also: deep neural network, neural network


equal error rate (EER)
error rate

Euclidean distance



evaluation metric
event-based metric
See also: evaluation metric

event offset
event onset
everyday environment
everyday listening
the interpretation of the sound in terms of its source
exhaustive search
See also: brute-force search

expectation maximization (EM)
an iterative method to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models




expectation value

experiment

experimental design

F
F-score, f1-score
an evaluation metric that takes into account both precision and recall
See also: evaluation metric
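
A minimal sketch of the F1 variant, the harmonic mean of precision and recall, computed from counts of true positives, false positives, and false negatives. Returning 0.0 when a denominator is zero is an assumption; other conventions exist.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from TP, FP, and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision 0.8, recall 0.5.
example_f1 = f1_score(tp=8, fp=2, fn=8)
```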


false negative (FN)
an example wrongly predicted as negative class




false positive (FP)
an example wrongly predicted as positive class




fast Fourier transform (FFT)





feature
a measurable property of the acoustic signal
See also: acoustic feature



feature engineering
using domain knowledge of the data to manually create suitable features for machine learning
See also: feature learning
feature extraction




feature learning
automatically discovering the representations needed
See also: feature engineering


feature level fusion
features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: early fusion
feature selection




federated learning
machine learning technique to train a model across multiple devices without sharing local data examples
See also: collaborative learning




feed forward network
See also: deep neural network, neural network



feedback

feedforward

feedforward neural network (FNN)
See also: deep neural network, neural network



few-shot learning
See also: one-shot learning
filter





filter bank
an array of band-pass filters



foley
Reproduction of everyday sounds in filmmaking
folksonomy
classification based on users' tags



forward pass

foundation model



frame





frame blocking
See also: frame
frame stacking
free field

frequency domain
See also: time domain





frequency resolution



full artificial intelligence (Full AI)
fully connected layer
See also: deep neural network, neural network

fundamental frequency


fuzzy logic




G
gammatone feature cepstral coefficients (GFCC)
gammatone filter
gated recurrent unit (GRU)
See also: neural network, deep neural network
Gaussian mixture model (GMM)
See also: mixture model


generalisation, generalization
generalize

generative adversarial network (GAN)
technique where a generator generates data candidates and a discriminator evaluates them.
See also: deep neural network, neural network


generative learning
See also: discriminative learning

gradient descent




graph neural network (GNN)

graphics processing unit (GPU)




ground truth
See also: reference label, annotation

H
hand-crafted feature
a feature created manually using domain knowledge of the data
See also: feature engineering
hard-coded

harmonic



head-related transfer function (HRTF)


heterogeneous

heterogeneous dataset

heuristic
a practical and suboptimal solution



hidden layer
in a neural network, a layer between the input layer and the output layer
See also: deep neural network, neural network



hidden Markov model (HMM)





hierarchical classification





hierarchical clustering

histogram




histogram of oriented gradients (HOG)

holdout data
examples which are only used for testing the system's performance
See also: cross-validation
homogeneous

hyperparameter
in machine learning, a variable which is set before the learning process starts
See also: parameter

hyponym

I
i-vector
implementation



impulse response




independent component analysis (ICA)




indexing


inference

information retrieval


information theory




input


input layer
See also: deep neural network, neural network


inter-annotator agreement
a measure of how well human annotators agree on an annotation task

interclass correlation




intermediate statistics
intraclass correlation




intrinsic dimension

inverse fast Fourier transform (IFFT)


J
jackknife estimator

jackknife method

jitter



K
k-fold cross-validation
See also: cross-validation

k-nearest-neighbor (kNN)
See also: nearest neighbor

kernel



knowledge



Kullback–Leibler divergence




L
labeled data

labeled example
an example with audio and assigned category label
labeling


language acquisition




language-based audio retrieval
See also: audio retrieval
language model
See also: large language model, small language model




language-queried audio source separation
See also: audio source separation
large language model (LLM)
See also: language model, small language model

late fusion
Combining outputs from multiple classifiers.
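
One simple way to combine classifier outputs is to average their class probabilities. The sketch below assumes every classifier outputs a probability per class in the same class order; averaging is only one of several possible fusion rules, and the function name is illustrative.

```python
def late_fusion(probabilities_per_classifier):
    """Average class probabilities over classifiers (one list per classifier)."""
    n_classifiers = len(probabilities_per_classifier)
    n_classes = len(probabilities_per_classifier[0])
    return [
        sum(p[c] for p in probabilities_per_classifier) / n_classifiers
        for c in range(n_classes)
    ]


# Two hypothetical classifiers that disagree on a two-class problem.
fused = late_fusion([[0.6, 0.4], [0.2, 0.8]])
# The fused decision is the class with the highest averaged probability.
decision = max(range(len(fused)), key=lambda c: fused[c])
```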
latent variable
layer
in a neural network, a set of neurons
See also: deep neural network, neural network



lazy learner

leaderboard
a board showing the ranking of participants in a competition

leaf node

learning rate
a hyperparameter that controls the size of the learning step (gradient step)
See also: deep neural network, neural network
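
The role of the learning rate can be sketched with plain gradient descent on a one-dimensional function. The quadratic objective, the starting point, and the step count below are assumptions chosen only for illustration.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
minimum = gradient_descent(
    gradient=lambda w: 2 * (w - 3),
    start=0.0,
    learning_rate=0.1,
    steps=100,
)
```

With this objective, a learning rate of 0.1 shrinks the distance to the minimum by a factor of 0.8 per step, while a learning rate above 1.0 would make the iterates diverge.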


leave-one-out cross-validation (LOOCV)

likelihood





likelihood ratio

likelihood ratio test




linear discriminant analysis (LDA)

linear prediction
a mathematical operation to estimate future values as a linear function of previous values




linear prediction cepstral coefficients (LPCC)
linear regression




local binary patterns (LBP)


localization



log-likelihood

logistic regression
statistical model that uses a logistic function to model a binary dependent variable




long short-term memory (LSTM)
See also: deep neural network, neural network


loss

loss function
a function that measures how far a prediction is from its label
See also: cost function




loudness





loudness level


low-complexity model

M
machine learning (ML)
field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" from data, without being explicitly programmed
See also: deep machine learning





machine learning operations (MLOps)

machine listening
field of study of algorithms and systems for audio understanding by machine


machine-to-machine interaction
machine vision



macro-averaging
See also: evaluation metric

magnitude response


majority voting

masked multi-head attention
See also: attention mechanism, multi-head attention

maximum a posteriori estimator (MAP estimator)

maximum likelihood (ML)
See also: maximum likelihood estimator

maximum likelihood estimator (MLE)
See also: maximum likelihood




mean absolute error (MAE)



mean square error (MSE)
See also: root mean square error





mean squared error (MSE)




mel-frequency cepstral coefficients (MFCCs)



mel scale
non-linear perceptual scale of pitches judged by listeners to be equal in distance from one another.
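
Several formulas for converting between hertz and mels are in use. The sketch below assumes one widely used variant, mel = 2595 * log10(1 + f / 700); other references use different constants, so treat the numbers as illustrative.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert hertz to mels using the 2595 * log10(1 + f / 700) variant."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# With this variant, 1000 Hz maps to roughly 1000 mel.
mel_at_1khz = hz_to_mel(1000.0)
```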



mel-scaled spectrogram
meta learning


metadata

micro-averaging
See also: evaluation metric

mini-batch
See also: batch

misclassification




missing label
mixture model
See also: Gaussian mixture model


mixture signal
modal

modality
model
in a machine learning system, a parameter set learned from the training data



model selection
modeling

monaural
related to one ear

monitoring


monoaural
See also: monophonic




monophonic
See also: monoaural

multi-annotator
multi-class classification
classification type where prediction is done between three or more classes


multi-condition training
multi-head attention
See also: attention mechanism

multi-label classification
classification type where multiple class labels may be assigned to each instance
See also: single-label classification
multi-modal

multi-task learning
approach where multiple learning tasks are solved at the same time


multichannel, multiple channel
See also: single-channel


multiclass classification


multilayer perceptron (MLP)
See also: neural network, deep neural network





multimodal

multiple kernel learning (MKL)
machine learning method that uses a predefined set of kernels and learns an optimal combination of these kernels
music information retrieval (MIR)
interdisciplinary science of retrieving information from music


N
naive Bayesian classification


naive listener

narrowband



natural language processing (NLP)




near field


nearest neighbor
See also: k-nearest-neighbor



nearest neighbor classifier
See also: k-nearest-neighbor

neural autoencoder
See also: autoencoder

neural network (NN)
network of (artificial) neurons





neuron
a node in a neural network taking in multiple values and generating a single value as an output
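
A minimal sketch of a single neuron: a weighted sum of the inputs plus a bias, passed through an activation function. The sigmoid activation and the example weights are assumptions; other activation functions are equally common.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights chosen so the weighted sum is exactly zero.
example_output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```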



node

noise




noise suppression


noisy label
non-negative matrix factorization (NMF)



nonlinear, non-linear


normal distribution





normalization
converting values into a standard range of values
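
One common form is min-max normalization, which rescales values linearly to the range [0, 1]. The choice of this variant and the handling of constant input are assumptions made for the sketch.

```python
def min_max_normalize(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        # Constant input carries no range information; map everything to 0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


example_normalized = min_max_normalize([2, 4, 6])
```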



null hypothesis
general statement that there is no relationship between two measured phenomena





O
objective
a metric the algorithm tries to optimize


one-hot encoding
representing categorical variables as binary vectors so that only a single element is set to one
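
A minimal sketch of the encoding, assuming a fixed, ordered list of class labels. The function name and the labels are illustrative.

```python
def one_hot(label, classes):
    """Return a binary vector with a single 1 at the position of the label."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if c == label else 0 for c in classes]


example_vector = one_hot("car", ["dog", "car", "rain"])
```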
one-shot learning
machine learning approach where the aim is to learn from a single training example
ontology
a structure of concepts or entities within a domain which are organized by relationships





open data



open set

open set classification
See also: closed set classification, open set
open world problem

optimization

optimizer
in a neural network, an implementation of a gradient descent algorithm


order of magnitude

outliers
observation points that are distant from other observations





output
See also: input


output layer
last layer of a neural network outputting predictions
See also: deep neural network, neural network


overfitting
a model that models the training data too closely and fails to predict correctly on new data
See also: underfitting





P
paralinguistics

parallel



parameter
in machine learning, a variable which is adjusted during the learning process
See also: hyperparameter

parametrization

parsing

part of speech (POS)
See also: part-of-speech tagging





part-of-speech tagging (POS tagging)
the process of marking up a word in a text as corresponding to a particular part of speech
See also: part of speech




pattern


pattern recognition





perception




perceptron
perceptual
See also: perception


perceptual spread
performance
in machine learning, refers to the goodness of the model's predictions


performance metric

pilot study

pitch




pitch shifting
See also: pitch


polyphonic annotation
pooling
in a neural network, reducing a matrix to a smaller matrix
See also: deep neural network, neural network
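
As a sketch, 2x2 max pooling with non-overlapping blocks keeps the largest value of each block. The block size, the max operation, and the dropping of an odd trailing row or column are assumptions; average pooling and other sizes are also used.

```python
def max_pool_2x2(matrix):
    """Reduce a matrix by taking the maximum over non-overlapping 2x2 blocks."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(
                matrix[r][c], matrix[r][c + 1],
                matrix[r + 1][c], matrix[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


example_pooled = max_pool_2x2([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])
```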

post-pruning

posterior probability




pre-trained model
a model which has been already trained


precision
a measure of how often a prediction is correct when predicting the positive class
See also: recall, F-score, evaluation metric




prediction error


principal component analysis (PCA)
a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components



prior distribution

prior probability, a priori probability
See also: prior distribution




probability





probability measure

prototypical network
pruning


psychoacoustics
the scientific study of sound perception and audiology





Q
quantization, quantizing





R
random effect



random forest (RF)
an ensemble learning method which constructs multiple decision trees at the training stage



random noise



random selection

randomization




randomness




ranking

recall
a measure of how many of the actual positive examples were correctly predicted
See also: precision, F-score, evaluation metric


receptive field

receiver operating characteristic curve (ROC curve)
a curve of true positive rate versus false positive rate at different classification thresholds
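
The curve can be sketched by sweeping the classification threshold over the observed scores and recording one (false positive rate, true positive rate) point per threshold. The area under the curve is then approximated with the trapezoidal rule. Function names and the example scores are illustrative assumptions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points for thresholds at each distinct score.

    labels are 1 for the positive class and 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


example_points = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
example_auc = area_under_curve(example_points)
```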

recognition


record

recurrent neural network (RNN)
a neural network that models sequential interactions through a hidden state or memory
See also: neural network, deep neural network




recursive quantitative analysis
reference data

reference label
See also: ground truth, annotation
regression

regression analysis




regularization
in machine learning, penalizing a model's complexity in order to prevent overfitting




reinforcement learning
machine learning technique focusing on performance, finding a balance between exploration of new knowledge and exploitation of current knowledge




repeatability



replicability

repository


reproducibility




retrieval


reverberation




reverberation time (RT)

robotics




robust classification


robust estimator

robustness




room acoustics

room response

room simulation

root mean square error (RMSE)
See also: mean square error




roughness

S
saliency

salient

sample

sample space

sampling frequency
See also: sampling rate



sampling rate
See also: sampling frequency


scalability




search algorithm




segment-based metric
See also: evaluation metric

segmentation



self-attention
See also: attention mechanism

self-organizing map (SOM)
artificial neural network that is trained using unsupervised learning to produce a low-dimensional discretized representation of the input space





self-supervised learning (SSL)




semantic information

semi-supervised learning
machine learning technique to use small amount of labeled data and large amount of unlabeled data in the learning stage



sensitivity
See also: evaluation metric


sensor


sensor node
See also: sensor


sequential analysis

shallow network
sharpness
short-time Fourier transform (STFT)




signal modeling


signal processing





signal-to-interference ratio (SIR)


signal-to-noise ratio (SNR)




significance

significance level



significance level, level of significance

similarity





similarity matrix

similarity measure
a measure to determine how similar two examples are
See also: similarity
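
Cosine similarity is one frequently used similarity measure for feature vectors. The sketch below is only one possible choice, and the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Parallel vectors give a value close to 1, orthogonal vectors give 0.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```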



single-channel
See also: multichannel
single-label classification
classification type where a single class label is assigned to each instance
See also: binary classification, multi-label classification
sinusoidal modeling
situational awareness (SA)
the perception of environmental elements and events with respect to time or space, the comprehension of their meaning, and the projection of their future status



small language model (SLM)
See also: language model, large language model

smoothing




softmax

sonification




sound event



sound event detection (SED)
See also: sound event


sound event instance
See also: sound event

sound event localization and detection (SELD)

sound pressure





sound pressure level (SPL)
See also: sound pressure


sound quality



sound scene synthesis

sound source

sound source distance estimation
soundscape


source distance estimation (SDE)

source proximity


source separation
See also: audio source separation
sparse matrix
matrix whose elements are predominantly zero





sparsity
number of zero elements in a matrix divided by the total number of elements
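
A minimal sketch of this ratio for a matrix stored as a list of rows. The function name and the example matrix are illustrative.

```python
def sparsity(matrix):
    """Number of zero elements divided by the total number of elements."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one element")
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Four of the six elements are zero.
example_sparsity = sparsity([[0, 0, 3], [0, 5, 0]])
```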
speaker diarisation
Process of splitting an audio signal into segments according to the speakers

specificity
See also: evaluation metric

spectral analysis

spectral centroid


spectral clustering
grouping related examples together using the eigenvalues of the similarity matrix


spectral envelope


spectral flatness
spectral flux


spectral moments
spectral roll-off

spectral slope

spectrogram





spectrum





speech analysis


speech and audio analysis

speech enhancement
improvement of speech quality by using various algorithms

speech processing


speech recognition





speech segmentation



speech separation
spoken language understanding (SLU)
Extraction of the meaning out of the speech by using automatic speech recognition and natural language understanding.

standard deviation


statistic

statistical

statistical model





statistical significance




statistics

stochastic

stochastic gradient descent (SGD)



stochastic model

stopping rule

stratification


stratified sampling




stride
in convolution or pooling, the delta on horizontal or vertical dimension of the next input slice
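
In one dimension, the stride is the offset between the starts of consecutive input slices. The sketch below extracts such slices from a sequence; the function name and the example values are illustrative assumptions.

```python
def sliding_windows(sequence, size, stride):
    """Return slices of the given size whose starts are stride elements apart."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [
        sequence[start:start + size]
        for start in range(0, len(sequence) - size + 1, stride)
    ]


# Window size 3 with stride 2: neighbouring windows overlap by one element.
example_windows = sliding_windows([0, 1, 2, 3, 4, 5, 6], size=3, stride=2)
```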
strong annotation
See also: annotation, weak annotation
strong artificial intelligence (Strong AI)
See also: artificial general intelligence, full artificial intelligence

strong label
See also: strong annotation, weak label, weak annotation
study

subband power distribution (SPD)
subsampling layer

supervised learning
learning method which learns from labeled examples
See also: unsupervised learning





support vector machine (SVM)



survey

system


system development

T
t-distributed stochastic neighbor embedding (t-SNE)

tag


taxonomy
a classification in a hierarchical system




temporal integration

tensor




test example

test set
subset of data used to test the system, disjoint from the training set
See also: training set, validation set

testing data
See also: test set
textual label
texture

threshold value

timbre




time-dependent

time domain




time domain envelope


time-frequency distribution

time-frequency representation


time stretching
changing the duration of an audio signal without affecting its pitch
See also: pitch shifting
timestamp




tolerance

training
a process of determining the optimal parameters of the model


training data
See also: training set


training example

training set
subset of data used to train the system, disjoint from the test set
See also: test set, validation set

training sample

transfer function




transfer learning
a research problem focusing on storing knowledge gained while solving one problem and applying it to a different problem



transformation

transformer
deep learning model that uses a self-attention mechanism to solve sequence-to-sequence tasks.



transient

transition probability

travelling salesman problem




trigram
true negative (TN)
an example correctly predicted as negative class


true negative rate (TNR)

true positive (TP)
an example correctly predicted as positive class


true positive rate (TPR)

U
unbalanced data

unbiased

uncertainty



uncorrelated

underfitting
a model with low predictive ability because it neither models the training data well nor generalizes to new data
See also: overfitting

uniform distribution

unimodal

unlabeled data

unsupervised learning
machine learning technique to learn from unlabeled data
See also: supervised learning




V
validation


validation data

validation example

validation set
subset of data used to adjust hyperparameters, disjoint from the training set and test set
See also: training set, test set

variability

variable

variance

visualisation, visualization

vocalization, vocalisation
W
waveform


wavelets



weak annotation
See also: annotation
weak artificial intelligence (Weak AI)



weak label
See also: weak annotation, strong label, strong annotation
weak supervision
learning approach where noisy, limited or imprecise data is used to supervise the labelling process of larger training data to be used in supervised learning setting
See also: supervised learning, unsupervised learning
weakly labeled
See also: annotation, weak annotation
wideband


wildlife monitoring

windowing
See also: windowing function


windowing function
function that is zero-valued outside of some chosen interval
See also: windowing
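
The Hann window is a common example of such a function. The sketch below evaluates its symmetric form over a frame of a given length and applies it to a frame by elementwise multiplication. The choice of window and the function names are assumptions.

```python
import math


def hann_window(length):
    """Symmetric Hann window: 0.5 - 0.5 * cos(2 * pi * n / (length - 1))."""
    if length < 2:
        raise ValueError("length must be at least 2")
    return [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [sample * weight for sample, weight in zip(frame, window)]


example_window = hann_window(5)
example_frame = apply_window([1.0, 1.0, 1.0, 1.0, 1.0], example_window)
```

The window tapers the frame towards zero at both ends, which reduces the discontinuities at frame boundaries before spectral analysis.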




word embedding
mapping a word or phrase from the vocabulary to a vector of real numbers
See also: embeddings

word error rate (WER)

word sense disambiguation
identifying which sense of a word is used in a sentence, when the word has multiple meanings




WordNet

Z
zero crossing rate (ZCR)

zero-shot learning (ZSL)
