Cecconello, T., Puerari, L., Vizzari, G. (2022). Unsupervised Data Pattern Discovery on the Cloud. Paper presented at: 2021 International Conference of the Italian Association for Artificial Intelligence, AIxIA 2021 DP, Milano.
Unsupervised Data Pattern Discovery on the Cloud
Cecconello T. (first author); Puerari L.; Vizzari G. (last author)
2022
Abstract
Scientific research produces data describing phenomena that are not yet studied or well understood. The volume and rate of data generation can be overwhelming; in any case, tools for computer-assisted analysis of scientific data can support systematic forms of data-driven analysis. Machine learning can serve as an instrument within an overall workflow that includes domain experts and computer scientists. The machine learning approaches adopted need to be unsupervised, using only the input data as a teacher. We propose a two-step workflow: (i) obtaining a compact representation of the elements of the dataset by means of representation learning techniques, shifting the analysis from cumbersome representations to compact vectors in a latent space, and (ii) clustering the points associated with instances to suggest patterns to the domain experts, who then evaluate their potential meaning within the domain. The paper presents the rationale of the approach in a cloud-based setting, along with first experiments on an image dataset from the literature.
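The two-step workflow can be illustrated with a minimal sketch. The abstract does not name the specific representation learning or clustering algorithms, so this example makes its own assumptions: PCA (computed by power iteration) stands in for step (i), and plain Lloyd's k-means stands in for step (ii). The function names `encode` and `kmeans`, the synthetic two-blob data, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every row."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def top_component(rows, iters=100, seed=0):
    """Leading principal direction of centered rows, via power iteration."""
    dim = len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # Multiply v by X^T X without forming the covariance matrix.
        scores = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def encode(rows, n_latent=2):
    """Step (i): map each instance to a compact latent vector.

    Assumption: PCA is used here only as a simple stand-in for a
    representation learning technique.
    """
    residual = center(rows)
    latent = [[] for _ in rows]
    for _ in range(n_latent):
        v = top_component(residual)
        scores = [sum(r[j] * v[j] for j in range(len(v))) for r in residual]
        for z, s in zip(latent, scores):
            z.append(s)
        # Deflate: remove the component that was just extracted.
        residual = [[r[j] - s * v[j] for j in range(len(v))]
                    for r, s in zip(residual, scores)]
    return latent


def kmeans(points, k, iters=50, seed=0):
    """Step (ii): Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical data: two well-separated groups of 8-dimensional instances.
rng = random.Random(42)
blob_a = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(20)]
blob_b = [[rng.gauss(5.0, 0.1) for _ in range(8)] for _ in range(20)]
data = blob_a + blob_b

latent = encode(data, n_latent=2)   # compact vectors in a latent space
labels = kmeans(latent, k=2)        # candidate patterns for expert review
print(labels)
```

In this sketch the cluster labels are only suggestions: as the abstract describes, the domain experts are the ones who judge whether a cluster corresponds to something meaningful.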