OSCARS Postdoctoral Fellowship: Tracking and Sharing Data Provenance with RO-Crate in Lab Integrated Data (LabID-PROV)

EMBL - European Molecular Biology Laboratory

Heidelberg, Germany

Closing date: 15 December 2024
Contract duration: 20 Months
Grading: Postdoc/Stipend
Reference number: HD02740

EMBL is Europe’s life sciences laboratory – an intergovernmental organisation with more than 110 independent research groups and service teams covering the spectrum of molecular biology. It operates across six sites in Heidelberg (headquarters), Barcelona, Cambridge, Grenoble, Hamburg and Rome. Our mission is to perform basic research in molecular biology; train scientists, students and visitors at all levels; offer vital services to scientists in the public and private sectors within the member states; develop new instruments and methods; and engage actively in technology transfer.

This postdoctoral position is funded by OSCARS, a Horizon Europe project that fosters the uptake of Open Science in Europe by consolidating the achievements of world-class European RIs in the ESFRI roadmap and beyond into lasting interdisciplinary FAIR data services and working practices across scientific disciplines and communities (https://oscars-project.eu/about-oscars). The successful candidate will be the key actor of the LabID-PROV project (https://www.oscars-project.eu/projects/labid-prov-tracking-and-sharing-data-provenance-ro-crate-lab-integrated-data).

Lab Integrated Data (LabID, https://gbcs.embl.de/labid) is a web-based integrated platform for research data management featuring sample and dataset management, an inventory management system and an electronic lab notebook. It is designed to help individual scientists, research groups and core facilities better manage, annotate and share their experiments, assays, samples and datasets according to FAIR principles. In LabID, processed data can already be stored and connected to its primary data, associated assays (e.g. sequencing, light or electron microscopy) and original samples. However, accurate modelling of both the Workflow (WF) and the Workflow Run (an object modelling the execution of a WF and gathering its input and output datasets as well as the execution metadata, e.g. parameters, configurations…) is currently lacking. In this project, we propose to extend the LabID data model to include these concepts, offering a unified application to manage derived data provenance independently of analysis procedure and platform, and providing a concrete solution to ensure the traceability of derived data. To this end, we will use and integrate with several EOSC-Life resources, e.g. WorkflowHub, RO-Crate, Galaxy, Zenodo and the Workflow Run RO-Crate profiles, to streamline derived data import and export.

You will work together with the LabID development team (in particular to implement features on the server side and in the user interface) and with different actors of the Data Science Centre (https://www.embl.org/about/info/data-science-centre/).

Your role

  • Implement new features in the LabID Python command line interface (CLI) and in the server back end (Python Django) to enable (1) the import of workflow run results (e.g. Galaxy, Nextflow, Snakemake, CWL) and (2) the export of stored derived data as RO-Crate objects following the Workflow and Workflow Run RO-Crate profiles;
  • Implement use cases using both omics and imaging data, reflecting real-world scenarios, to demonstrate the integration of the new LabID capabilities into the existing Open Science landscape (submission/retrieval of WF to/from WorkflowHub, publication of RO-Crate objects to Zenodo…);
  • Generate online LabID tutorials demonstrating best practices in WF development and the FAIR dissemination of associated derived data with their provenance;
  • Organize a LabID-PROV workshop at the end of the project;
  • Join the relevant communities (e.g. RO-Crate, Nextflow, Snakemake, WorkflowHub, Galaxy, Zenodo) to (1) ensure the compatibility of the implemented solutions and (2) work in collaboration with the communities on missing features if needed;
  • Join relevant events (e.g. workshops, conferences) to promote and disseminate LabID-PROV;
  • Write final reports.

You have

  • PhD in e.g. Bioinformatics, Computational Biology or similar disciplines;
  • Strong command of programming with Python and REST APIs;
  • Experience with Git and good software development practices (Unit Testing, CI/CD) in general;
  • Experience and strong interest in Scientific Data Management, FAIR concepts and Open Science in general;
  • Experience in omics and/or image data analysis using workflow management systems such as Galaxy, Snakemake, Nextflow or CWL;
  • Experience in writing workflows with e.g. Galaxy, Snakemake, Nextflow or CWL;
  • Strong interest in Open Source projects;
  • Ability to work in a team;
  • Good written and oral communication skills in English.

You might also have

  • Experience with Django Development;
  • Experience with web front end development with Vue.js;
  • Experience in submitting data to public repositories (e.g. EBI).

Why join us

We are Europe’s research laboratory for the life sciences – an intergovernmental organisation performing scientific research in disciplines including molecular biology, physics, chemistry and computer science. We are an international, innovative and interdisciplinary laboratory.

This is a great opportunity to develop expertise at the interface of data analysis and translational epigenetics, and to develop computational methods to advance our understanding of life. You will be part of a very interactive, collaborative and intercultural team.

EMBL is an inclusive, equal opportunity employer offering attractive conditions and benefits appropriate to an international research organisation with a very collegial and family-friendly working environment. We are committed to diversity and equality, and encourage women and underrepresented groups to apply. Competitive salary and social security benefits, financial support for relocation, a relaxed culture, professional development, an on-site nursery, canteen and other staff facilities make EMBL a great place to work.

Don’t meet every single requirement? We are dedicated to building a diverse, inclusive and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply nevertheless.

What else you need to know

  • International applicants: We recruit internationally and successful candidates are offered visa exemptions. Read more on our page for international applicants;
  • Diversity and inclusion: We strongly believe that inclusive and diverse teams benefit from higher levels of innovation and creative thought. We encourage applications from women, LGBTQ+ individuals and people of all nationalities;
  • EMBL is a signatory of DORA. Find out how we implement best practices in research assessment in our recruitment processes here;
  • Job location: This role is based in Heidelberg, Germany;
  • How to apply: To apply please submit a cover letter and a CV through our online system.
