Domenico Talia

Chair of the ICT Center of Università della Calabria

Professor of Computer and Electronic Engineering,
University of Calabria

Designing Scalable Big Data Analysis on Clouds

The sheer volume of data stored in digital repositories calls for advanced data analysis tools and services running on scalable computing architectures to extract useful information from big data sources. Cloud computing systems offer effective support for both the computational and the data storage needs of big data mining and parallel machine learning applications. Complex data analysis tasks involve data- and compute-intensive algorithms that require large, efficient storage facilities together with high-performance processors to produce results in an acceptable time. In this talk we introduce the topic and the main research issues in the area of cloud-based data analysis. We discuss how to make data mining services scalable and present the Data Mining Cloud Framework and Nubytics, two systems designed for developing and executing distributed data analytics applications as workflows of services. In these frameworks, data sets, analysis tools, data mining algorithms and knowledge models are implemented as single services that can be combined through a visual programming interface and executed on clouds. We describe the main features of the programming interfaces and discuss the performance evaluation of big data analysis applications.