Benocci, R., Potenza, A., Zambon, G., Afify, A., Roman, H. (2023). Data Augmentation to Improve the Soundscape Ranking Index Prediction. WSEAS TRANSACTIONS ON ENVIRONMENT AND DEVELOPMENT, 19, 891-902 [10.37394/232015.2023.19.85].
Data Augmentation to Improve the Soundscape Ranking Index Prediction
Benocci R.; Potenza A.; Zambon G.; Afify A.; Roman H. E.
2023
Abstract
Predicting the sound quality of an environment is an important task, especially in urban parks, where the coexistence of anthropogenic and biophonic sources produces complex sound patterns. To this end, we define an index, the soundscape ranking index (SRI), which assigns a positive weight to natural sounds (biophony) and a negative one to anthropogenic sounds. We implement a numerical strategy to optimize the weight values by training two machine learning algorithms, the random forest (RF) and the perceptron (PPN), on an augmented data-set. Because only a relatively small fraction of the recorded sounds was labelled, we employed Monte Carlo simulations to mimic the distribution of the original data-set while keeping the original balance among the classes. The results show an increase in classification performance. We discuss the issues that require special care when the augmented data are based on too small an original data-set.
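The abstract does not specify how the Monte Carlo simulations were set up, so the following is only a minimal sketch of one way such a class-balanced augmentation could look. It assumes that each feature can be modelled as an independent Gaussian within each class, fitted to the original data. The function name `augment_balanced`, the two-feature toy data and the labels `"good"` and `"poor"` are hypothetical and are not taken from the paper.

```python
import random
import statistics
from collections import defaultdict


def augment_balanced(samples, labels, factor, seed=0):
    """Draw `factor` synthetic samples per original sample, class by class.

    Assumption (not from the source): within each class, every feature is
    sampled from an independent Gaussian fitted to the original data.
    """
    rng = random.Random(seed)

    # Group the original feature vectors by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    new_samples, new_labels = [], []
    for y, rows in by_class.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [statistics.fmean(c) for c in columns]
        # Population std is defined even for a single row (it is then 0.0).
        stds = [statistics.pstdev(c) for c in columns]
        # Scaling every class by the same factor keeps the class proportions.
        for _ in range(factor * len(rows)):
            new_samples.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            new_labels.append(y)
    return new_samples, new_labels


# Hypothetical toy data: two features per recording, unbalanced 3:1.
X = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.1], [0.2, 0.9]]
y = ["good", "good", "good", "poor"]
X_aug, y_aug = augment_balanced(X, y, factor=10)
```

In this sketch the class with a single original recording yields synthetic samples with zero spread, which illustrates the caveat raised in the abstract: when the original data-set is too small, the augmented data mostly replicate it rather than add information.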