This lesson is still being designed and assembled (Pre-Alpha version)

Version Control and Open Science Practices

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How are version control systems relevant for biomedical research?

  • How to maintain history of contributions and contributors?

  • How to apply open science practices to work transparently and collaborate openly?

Objectives
  • Describe the importance of version control systems

  • Nudge the use of GitHub/GitLab for open collaboration

  • Share open science practices for transparent and ethical research

Version Control, Open Science and Identifiers to Cite Research Objects

[Add a biological context or case study]

Maintaining History through Version Control

Contrast in project history management. On the left: choosing between ambiguously named files. On the right: picking between successive versions (from V1 to V6).

Version control allows you to track history and go back to different versions as needed. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807

Practices and recommendations described in this lesson are applicable to all areas of biological research. What is slightly different in computational projects is that every object required to carry out the research exists in digital form: the research workflow, data, software, analysis process and resulting outcomes, as well as the way researchers involved in the project communicate with each other. This means that research objects can be organised and maintained without losing their provenance or the knowledge of how each object is connected to the others in the context of your project.

Versioning Every Research Object

Managing changes or revisions to any type of information in a file or project is called versioning. Version Control Systems (VCS) are platforms and technical tools that record all changes made to a file or research object over time. A VCS allows all collaborators to track history, review changes, give appropriate credit to all contributors, find and fix errors when they appear, and revert to earlier versions.
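The ideas above (recording every change, crediting contributors and reverting) can be illustrated with a toy sketch. This is not how Git or any real VCS is implemented; the class names, method names and example values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Version:
    """One recorded state of a file, with who changed it and why."""
    number: int
    author: str
    message: str
    content: str


@dataclass
class VersionedFile:
    """Toy in-memory history of a single file (illustration only)."""
    history: List[Version] = field(default_factory=list)

    def commit(self, content: str, author: str, message: str) -> Version:
        # Every change is appended as a new version; nothing is overwritten.
        version = Version(len(self.history) + 1, author, message, content)
        self.history.append(version)
        return version

    def current(self) -> str:
        # The latest version is the current state of the file.
        return self.history[-1].content

    def revert_to(self, number: int, author: str) -> Version:
        # Reverting records the old content as a NEW version,
        # so the history of what happened stays complete.
        old = self.history[number - 1]
        return self.commit(old.content, author, f"Revert to version {number}")

    def contributors(self) -> List[str]:
        # Credit: everyone who authored at least one version.
        return sorted({v.author for v in self.history})


# Hypothetical usage with made-up contributors and content.
protocol = VersionedFile()
protocol.commit("Incubate for 10 min", "alice", "First draft of protocol")
protocol.commit("Incubate for 100 min", "bob", "Update incubation time")
protocol.revert_to(1, "alice")

for v in protocol.history:
    print(v.number, v.author, v.message)
print("Current content:", protocol.current())
print("Contributors:", protocol.contributors())
```

Note that the revert adds a third entry rather than deleting the second one, which is the property that lets a team review what changed, when and by whom.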

Different VCS can be used through web browser-based applications (such as Google Docs for documents) or, more flexibly for code and all kinds of data, through command-line tools (such as Git) and their integrations into graphical user interfaces (such as the Visual Studio Code editor, git-gui and GitKraken). The practice of versioning is particularly important because it allows non-linear or branched development of different parts of the project, for example when testing a new feature, debugging an error, or reusing code from one project on different data by different contributors.

GitLab, GitHub and Bitbucket are online platforms that host version-controlled projects and allow multiple collaborators to participate. Different members can download a copy of the online repository (the most recent version), make changes by adding their contributions locally on their computer, and push the changes to GitLab/GitHub/Bitbucket (a new version!), allowing others to build on the new development.
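The download, change and push cycle described above maps onto a handful of standard Git commands. The sketch below only builds and prints those commands rather than running them; the function name, repository URL and commit message are hypothetical placeholders, and a real project may use a different workflow (for example branches and merge requests).

```python
import shlex
from typing import List


def contribution_commands(repo_url: str, message: str) -> List[List[str]]:
    """Return the Git commands for one clone, edit and push cycle."""
    return [
        # Download a copy of the online repository (most recent version).
        ["git", "clone", repo_url],
        # The remaining commands are run inside the cloned directory,
        # after editing files locally.
        ["git", "add", "--all"],            # stage the local changes
        ["git", "commit", "-m", message],   # record them as a new version
        ["git", "push"],                    # publish the new version online
    ]


# Hypothetical repository URL and commit message, for illustration only.
commands = contribution_commands(
    "https://example.org/lab/project.git",
    "Add quality-control step to analysis",
)
for command in commands:
    print(shlex.join(command))
```

Printing the commands keeps the sketch safe to run anywhere; to execute them for real, each list could be passed to `subprocess.run` from within the appropriate directory.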

Read "All you need to know about Git, GitHub & GitLab" on Towards Data Science and version control in The Turing Way for more details on workflows, the technical details of using Git, and versioning large datasets.

Apply Open Science Best Practices

Open Science invites all researchers to share their work, data and research components openly so that others can read, reuse, reproduce, build upon and share them. Open source practices are widely promoted, particularly in computational research and software development projects. Unfortunately, making research components open doesn’t always mean that they can be easily discovered by everyone, that they can be reproduced and built upon by others, or that everyone will know how to use them. Applying open and inclusive principles to open science and reproducible research requires time, intention, resources and collaboration, which can be overwhelming for many (see Ten arguments against Open Science that you can win). However, by normalising the use of research best practices on a day-to-day basis, you can ensure that everyone has a chance to build habits around opening their work to others in the team, asking for regular feedback, getting credit for their work and enjoying the process of collaboration.

Open doesn’t mean sharing everything, but making it ‘as open as possible and as closed as necessary’. Your research can still be reproducible without all parts necessarily being open. Research projects that use sensitive data should be more careful and follow research data management plans closely (discussed in the next chapter).

Important Reasons for Practicing Openness

Open Science in Research

  • Maintains transparency
  • Allows others to attribute your work fairly
  • Stops others from reinventing the wheel
  • Invites collaborators from all around the world
  • Makes your work easy to release and to be cited by others

Image shows a person having an internal debate about open versus closed research. Open means new opportunities and inclusivity, but closed may be required to protect sensitive data, or may be wrongly assumed to be necessary to secure funding for novel work.

Open versus Closed Research. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807

Research Objects can be Released with Digital Object Identifiers (DOI)

DOIs are unique, persistent alphanumeric identifiers that provide a permanent web address for different research objects, so that they can be cited by you and other researchers. Each pre-print and publication is published with a DOI, but independently of the paper, different research objects can be published online at any stage of your research on servers that offer DOIs. Some of these servers are Zenodo, FigShare, Dryad (for data), Open Grants (for grant proposals) and Open Science Framework (OSF) (for different components of an open research project). DOIs allow you to show connections between different parts of your research as well as cite different objects from your work independently.
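As a small illustration of the "permanent web address" idea, a DOI can be turned into a link by prefixing it with the DOI resolver address. The sketch below uses the Zenodo DOI cited in this lesson; the function name is hypothetical, and the pattern is a commonly used approximation of DOI syntax, not an official validator.

```python
import re

# Approximate shape of a DOI: "10.", a registrant code, "/", then a suffix.
# This is an assumption for illustration and will not cover every valid DOI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI given as a bare DOI, 'doi:' or URL."""
    doi = doi.strip()
    # Remove a leading resolver address or "doi:" label if present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("10.5281/zenodo.3332807"))
# prints: https://doi.org/10.5281/zenodo.3332807
```

Because the identifier, not the hosting location, is what gets cited, the link keeps working even if the object moves to a different server.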

When working on GitHub, for instance, you can connect the project repository with Zenodo to get a DOI for your repository. The Citation File Format then lets you provide citation metadata for software or datasets in plain-text files that are easy to read by both humans and machines. Read the Making Research Objects Citable chapter in The Turing Way Guide to Communication.
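To give a feel for what such a citation file contains, here is a minimal sketch that assembles the text of a CITATION.cff file using keys from the Citation File Format 1.2.0 schema (cff-version, message, title, authors, version, date-released, doi). The function name and all example values (title, author, version, date) are hypothetical placeholders; the sketch does not escape special characters and is not a substitute for a proper validator.

```python
from typing import List, Optional, Tuple


def build_citation_cff(
    title: str,
    authors: List[Tuple[str, str]],
    version: str,
    date_released: str,
    doi: Optional[str] = None,
) -> str:
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family name, given name) pairs and
    `date_released` is expected in YYYY-MM-DD form.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this work, please cite it using the metadata below."',
        f'title: "{title}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    lines.append(f'version: "{version}"')
    lines.append(f'date-released: "{date_released}"')
    if doi is not None:
        lines.append(f'doi: "{doi}"')
    return "\n".join(lines) + "\n"


# Hypothetical project metadata, for illustration only.
cff_text = build_citation_cff(
    title="Example Analysis Pipeline",
    authors=[("Doe", "Jane")],
    version="0.1.0",
    date_released="2024-01-31",
)
print(cff_text)
```

The resulting text would be saved as a file named CITATION.cff at the top level of the repository, where people and tools can find it.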

Every Little Step Counts towards Openness

Open Science can mean different things in different contexts: open data, open source code, open access publication, open scholarship, open hardware, open education, open notebook, citizen science and inclusive research. Expert open science practitioners might consider applying a combination of open science practices and make decisions in their work to maintain different kinds of openness. However, for the new starters in your team, open science can be as simple as ensuring that:

Image shows a woman slowly gaining trust and confidence in opening up her research project and benefitting from open collaboration

Small steps towards open science. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807

Encourage taking small steps towards openness as a responsibility towards research integrity in your team. There are many community-driven resources, guidance and opportunities that provide structured support for learning about open science. For instance, The Turing Way chapter on Open Research and FOSTER Open Science provide an introduction to help researchers understand what open science is and why it is something you should care about. Another hands-on opportunity is provided by Open Life Science, a 16-week training and mentoring programme for anyone in research who is interested in applying open science practices systematically in their research projects.

Conclusion

[What gaps have we filled in this section - add biological context]

Resources and References for Technical Details

Key Points

  • Version-controlled repositories help record different contributions and contributor information openly.

  • Open Science is an umbrella term that involves different practices for research in the context of different research objects.

  • Online Persistent Identifiers or Digital Object Identifiers are useful for releasing and citing different versions of research objects.