Platform as a service integration for scientific computing using DIRAC

Author: Víctor Fernández Albor
Supervisors: Tomás Fernández Pena
Type: Doctoral thesis
Defense date: 17/07/2015
Place of defense: Universidade de Santiago de Compostela
Doctorate: International doctorate
Abstract: The volume of data generated by scientific activities is continuously increasing, pushing researchers to demand more computing power every day. As a consequence, there is a clear need for High Performance Computing (HPC) services that facilitate the use of computing infrastructure distributed across different geographical and administrative locations. These services should provide researchers with seamless and secure access to a wide range of resources, taking advantage of the fact that universities and research centers are nowadays interconnected by high-speed networks. University resources range from small teaching computer rooms to medium-sized computer clusters used by research groups. Grid and Cloud technologies are well suited to exploiting this heterogeneous spectrum of computing resources, potentially providing very large computing power and the possibility of relocating resources between different geographical sites. The objective of this thesis is to adapt the DIRAC distributed computing software, originally developed for the CERN LHCb experiment, so that several user communities can use it with cloud and big data technologies. This environment provides access to centralized software repositories that supply the software needed to run scientific computer simulations in clouds in a scalable way, using both dedicated and non-dedicated resources. The platform must therefore be tested for scientific computing. This work begins with a study of the requirements and then proceeds to the basic integration process. Our solution is then optimized for the scientific software used in clouds by tuning the virtualized environments. For this purpose, a statistical study must be performed under conditions as close as possible to the production environments, in order to identify and create the appropriate infrastructure and avoid loss of performance in the resources.
The next step is to use virtualization technologies, adapting the new architecture, to build systems that enable the distributed execution of big data jobs requiring large amounts of data.