Postdoc: Convergence of HPC And BigData Infrastructures: Case Study on Large Graph Drawing Algorithms

University of Bordeaux

Bordeaux, France

Description of the project, activities and work context

Networks (weighted graphs) are now widely used for data analysis. Their popularity stems from the fact that graph visualization makes it easier to analyze existing relationships between entities (persons, web pages, ...). These relationships can be computed automatically from the data, for instance by computing a correlation matrix between entities and then thresholding it. They can also evolve over time, giving rise to dynamic or timestamped graphs. By abstracting away the type of data and focusing only on relationships, graph visualization offers a remarkable tool for the analysis of heterogeneous and dynamic Big Data.
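As an illustration of the correlation-and-threshold construction mentioned above, here is a minimal Python sketch. It is not taken from the project: the function names `pearson` and `correlation_graph`, the toy observations, the use of the absolute value of the correlation, and the threshold value are all assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant series: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def correlation_graph(series, threshold):
    """Return weighted edges (u, v, w) for pairs with |correlation| >= threshold."""
    edges = []
    for u, v in combinations(sorted(series), 2):
        w = pearson(series[u], series[v])
        if abs(w) >= threshold:
            edges.append((u, v, w))
    return edges


# Hypothetical toy data: one observation series per entity.
observations = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],    # proportional to "a"
    "c": [1.0, -1.0, 1.0, -1.0],  # only weakly related to the others
}
edges = correlation_graph(observations, threshold=0.8)
```

With this toy data only the pair ("a", "b") passes the threshold. Note that the all-pairs loop is itself quadratic in the number of entities, so at the scales discussed here the graph construction would also need a distributed implementation.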

In graph-based analysis methods, a first step is to lay out (or draw) the graph. This step algorithmically assigns a position to entities and relationships. In the general case, the algorithmic complexity of the methods used is of the order of O(n^3), so they cannot be used on real data sets. For instance, in territorial surveillance tools for internal security, the graphs often have hundreds of thousands of entities and millions of relationships. Recent work has shown that by combining "Barnes and Hut" or "fast multipole" (FMM) resolution methods with a multi-scale decomposition of graphs, it is possible to reduce this complexity to O(n log(n)) while maintaining the properties of the original algorithm. Algorithms with O(n log(n)) complexity can be run efficiently on Big Data infrastructures. Therefore, it is theoretically possible to create methods that efficiently lay out graphs on these architectures.
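To give an idea of how a "Barnes and Hut" approximation cuts the cost of the repulsive-force step in a force-directed layout, here is a minimal, single-machine Python sketch. It is an illustration under stated assumptions, not the project's algorithm and not the ScalFMM API: the `Cell` class, the 1/d repulsion law, the opening parameter `theta`, and the random test points are all hypothetical choices. Distant groups of nodes are replaced by their centre of mass, so each force query visits far fewer cells than a direct sum over all nodes.

```python
import math
import random


class Cell:
    """Square quadtree cell that tracks how many points it holds and their coordinate sums."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0
        self.sum_x = self.sum_y = 0.0
        self.points = []      # only used while the cell is a leaf
        self.children = None

    def insert(self, p):
        self.count += 1
        self.sum_x += p[0]
        self.sum_y += p[1]
        if self.children is not None:
            self._child(p).insert(p)
            return
        self.points.append(p)
        # Split a crowded leaf; the size guard stops runaway recursion on duplicates.
        if len(self.points) > 1 and self.half > 1e-9:
            h = self.half / 2
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child(q).insert(q)

    def _child(self, p):
        # Matches the child order built in insert(): x side first, then y side.
        return self.children[2 * (p[0] >= self.cx) + (p[1] >= self.cy)]


def pair_force(p, q, weight=1.0):
    """Repulsion on p from a mass at q, with magnitude weight / distance (assumed law)."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    d2 = dx * dx + dy * dy
    if d2 == 0:
        return 0.0, 0.0   # p itself or an exact duplicate
    return weight * dx / d2, weight * dy / d2


def repulsion(p, cell, theta=0.5):
    """Approximate total repulsion on p from all points stored under cell."""
    if cell.count == 0:
        return 0.0, 0.0
    if cell.children is None:
        forces = [pair_force(p, q) for q in cell.points]
    else:
        com = (cell.sum_x / cell.count, cell.sum_y / cell.count)
        d = math.hypot(p[0] - com[0], p[1] - com[1])
        if 2 * cell.half < theta * d:
            # Cell is small relative to its distance: use its centre of mass.
            return pair_force(p, com, cell.count)
        forces = [repulsion(p, c, theta) for c in cell.children]
    return sum(f[0] for f in forces), sum(f[1] for f in forces)


def direct_repulsion(p, points):
    """Reference O(n) sum for one point (O(n^2) over the whole graph)."""
    forces = [pair_force(p, q) for q in points]
    return sum(f[0] for f in forces), sum(f[1] for f in forces)


random.seed(0)
points = [(random.random(), random.random()) for _ in range(500)]
root = Cell(0.5, 0.5, 0.5)
for p in points:
    root.insert(p)

approx = repulsion(points[0], root, theta=0.5)
exact = direct_repulsion(points[0], points)
rel_error = math.hypot(approx[0] - exact[0], approx[1] - exact[1]) / math.hypot(*exact)
```

Setting `theta` to 0 disables the approximation and reproduces the direct sum, while larger values trade accuracy for fewer visited cells. An FMM goes further by also grouping the target points and using higher-order expansions, which this sketch does not attempt.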

At the moment, only heuristics exist to lay out graphs on Big Data infrastructures, and no method guarantees the same results as the original algorithms. Using the resolution methods developed for HPC infrastructures on Big Data infrastructures remains an unresolved problem. Successfully adapting these HPC resolution methods will make it possible to devise graph layout algorithms that can be integrated into Big Data analysis ecosystems. This type of algorithm and study finds its place in areas such as the observation and monitoring of social networks.

  • The first objective of the post-doctoral fellowship will be to write an HPC version of the O(n log(n)) multi-scale mapping algorithm based on an FMM. To do this, the candidate will implement the multi-scale algorithm on top of the FMM HPC library ScalFMM developed in the HiePACS team. This algorithm will be evaluated on our HPC infrastructure PLAFRIM and will lead to a report presenting this first distributed parallel implementation of the graph drawing algorithm. The data will then be transferred to the Big Data oriented infrastructure LSD for visualization. This first objective is mostly based on distributed algorithms and HPC;
  • The second objective of the post-doctoral fellowship will be to port ScalFMM so that it runs on the Big Data infrastructure LSD. This is mostly about making HPC and Big Data infrastructures converge, and it will allow us to compare traditional HPC software (StarPU, MPI) with Big Data software (Spark, YARN) on a Big Data infrastructure. This comparison will result in a second report;
  • The last objective will be to run the O(n log(n)) distributed multi-scale parallel graph drawing algorithm, developed for the first objective, directly on the Big Data oriented machine LSD. This will take advantage of the convergence effort achieved for the second objective. In the end, we will have a powerful graph drawing algorithm directly usable with the existing visualization on the LSD infrastructure. Having all the algorithms running on the Big Data infrastructure will avoid the costly transfer of a significant amount of data between an HPC machine and a Big Data machine. This work will lead to a third report.

© EuroJobsites 2019

EuroJobsites is a UK registered company number: 4694396 VAT number: GB 880 9055 04

Registered address: EuroJobsites Ltd, Unit 8, Kingsmill Business Park, Kingston Upon Thames, London, KT1 3GZ, United Kingdom
