Masiero, E., Bursic, S., Trianni, V., Vizzari, G., & Ognibene, D. (2024). In Search of Compositional Multi-Task Deep Architectures for Information Theoretic Field Exploration. In 20th IEEE International Conference on Automation Science and Engineering, CASE 2024 (pp. 612–617). IEEE Computer Society. doi:10.1109/CASE59546.2024.10711675
In Search of Compositional Multi-Task Deep Architectures for Information Theoretic Field Exploration
Masiero E.; Bursic S.; Trianni V.; Vizzari G.; Ognibene D.
2024
Abstract
Active vision is critical for navigating complex, unstructured environments such as agricultural fields, where occlusions, diverse scales, and unknown elements can obscure task-relevant information. This paper investigates the use of deep learning architectures to estimate information gain and expected loss in continuous, multidimensional observation spaces from sequential camera inputs of small environment segments. In such environments, local estimates can be composed on the fly to predict the contribution of successive viewpoints, guiding active exploration strategies to efficiently cover the entire area. We compared multi-task architectures with various prediction heads for state estimation, information gain, expected loss, and best-view prediction from observation sequences. Our results show that entropy-minimizing and loss-maximizing strategies outperform random sampling, with accuracy improvements of up to 11.9%. However, training multiple model heads simultaneously presented challenges, with convergence issues and training instability depending on the formulation of the optimization problem. Future work will explore adaptive multi-task training strategies, the impact of dataset size, and whole-environment mapping. Our findings demonstrate the potential of deep learning for optimal sampling in complex environments, highlighting the integration of uncertainty-estimation models with active vision systems as a promising direction for enhancing decision-making in real-world applications.
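The entropy-minimizing view-selection strategy described in the abstract can be sketched as a greedy next-best-view loop over a belief map of the field. This is an illustrative reconstruction, not the paper's implementation: the Bernoulli per-cell beliefs, the `sensor_acc` noise parameter, and the candidate `view_masks` are all assumptions standing in for the learned information-gain estimates.

```python
import numpy as np

def entropy(p):
    """Bernoulli entropy (bits) of per-cell beliefs, numerically safe at 0 and 1."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def expected_info_gain(beliefs, view_mask, sensor_acc=0.9):
    """Information gain of observing the cells a viewpoint covers: current
    entropy minus expected posterior entropy under a noisy-sensor model."""
    p = beliefs[view_mask]
    prior_h = entropy(p).sum()
    # Marginalize over the two possible observations of each covered cell.
    q_pos = p * sensor_acc + (1 - p) * (1 - sensor_acc)   # P(observe "1")
    post_pos = p * sensor_acc / q_pos                      # P(occupied | "1")
    post_neg = p * (1 - sensor_acc) / (1 - q_pos)          # P(occupied | "0")
    post_h = (q_pos * entropy(post_pos) + (1 - q_pos) * entropy(post_neg)).sum()
    return prior_h - post_h

def next_best_view(beliefs, view_masks):
    """Greedy entropy-minimizing choice among candidate viewpoints."""
    gains = [expected_info_gain(beliefs, m) for m in view_masks]
    return int(np.argmax(gains))
```

In the paper's setting, a learned multi-task head replaces the analytic `expected_info_gain` above, so the per-view contributions predicted from local observations can be composed on the fly without maintaining an explicit belief map.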