Serverless Computing for Data Analytics

Software engineer offer

 

Background

The recently funded H2020 CloudButton project aims to democratize big data by greatly simplifying its programming model with the help of serverless technologies. The core idea is to tap into stateless functions to enable radically simpler, more user-friendly data processing systems. Average users of the cloud do not want to spend hours understanding complex analytics stacks (e.g., Spark, Yarn, or Ignite), or struggle with the choice of instance types, cluster sizes, etc. What they want is a simple interface to execute their optimized, single-machine code in parallel. CloudButton is the technological response to this emerging need. To demonstrate impact, the project targets two strategic settings with large data volumes and diverse analytics requirements: bioinformatics (genomics, metabolomics) and geospatial data (LiDAR, satellite).

 


Objectives

The main objective of this position is to demonstrate the usability of the CloudButton software stack for mining large data sets. These data sets will be of three types: benchmarks (such as the AMPLab big data benchmark), open data sets (e.g., Common Crawl), and data available from the CloudButton partners (such as the library of images from EMBL). For each of these use cases, the engineer will implement demonstrators that highlight the ability of the software stack developed in CloudButton to extract meaningful information from large data volumes using a serverless infrastructure. A basic example is computing the PageRank distribution over Common Crawl, a publicly available data set of web pages. These implementations will be compared against the state of the art, e.g., Spark and Hadoop, as well as the reference architectures from AWSlabs.

 


Start date

As soon as possible, for a duration of 12 months. Applications are being accepted now, and the position will remain open until filled.

 


To Apply

Required skills and background:

  • MSc in Computer Science
  • Background in distributed systems, databases, or algorithms
  • Knowledge of object-oriented programming, shell scripting, and cloud computing
  • Strong development and experimentation skills

 

Please provide:

  • a full curriculum vitæ
  • a cover letter stating your motivation and fit for this position

 


Contact

Pierre Sutra