Caleb Kibet is a bioinformatician and Mozilla Fellow advocating for open science. He is based in Nairobi, Kenya.


A focus on research data management

Open Science has gained popularity among researchers in many regions of the world. But scepticism abounds among African researchers, and research remains generally closed in Africa. Why is this? As with any new technology or process, there is resistance — especially when it seems imposed and foreign, and with no clear benefits to "us." As much as we would want to say science is an equalising factor, I am not blind to reality — African researchers fear negative consequences of open science, and rightfully so. There is a feeling that there is a lot to lose, and many barriers exist that hinder the adoption of open science practices.

Let's take the push for open access, for example. Open access offers the most benefit in a resource-constrained environment, but it also comes at a prohibitive cost to authors. For a student with little or no research funding, publishing open access is just a dream: they cannot afford the article processing charge. For researchers with limited funding, investing in data management infrastructure is a luxury, and the cost of preparing data for sharing and archiving is prohibitive. For these individuals, incentives to package data for reuse are non-existent. So I ask, what does open science offer in a resource-constrained setting? And how can all these barriers be eliminated?

Some of the questions and realities above are unpopular, and some may even be contested. But hear me out. The earlier we accept some of these realities, the better we can ease the adoption of open science practices in all settings.

Let us use data management to highlight some of the issues. Data management is indispensable. First, faithful stewardship of research data enhances their value by making them easy to reuse, which is how science makes strides forward: other researchers can validate results and build upon available data. Second, it helps researchers avoid duplication, so they don't end up collecting data that are already available. Third, proper data management prevents data loss and safeguards the accuracy, reliability and integrity of your data.

But data management is expensive. It requires expertise, infrastructure, and investment in data cleaning, organisation, processing, and archiving. Data management is not a single individual's affair, and for an individual, it's too costly to achieve. How then can resource-constrained researchers be encouraged and supported to better manage their data, mostly for their own benefit, but also for that of the scientific community? We all agree that in the end, data sharing is for the benefit of others, which means we should all come together (the community, funders, open science advocates, and institutions) to support the process.

The main barriers to proper data management are a lack of awareness, a lack of know-how, and a lack of resources. The data management workflow starts with the presence of a policy or a framework under which researchers can work. Institutions formulate strategies, and individuals can then plug in and implement them in their work. But when the majority of institutions do not have research data management policies, the researcher has nothing to guide their process.

Further, most institutions do not even treat research data management as a priority area. And where it is a priority, many still do not know where to start. Most of the available templates or policies come from well-established institutions that can provide infrastructure support and guide researchers through the whole data life-cycle. Meanwhile, in a majority of African academic and research institutions, data is still managed at the project level, mostly by the researchers or students themselves.

Efficient research data management requires a policy, a data management plan, metadata standards, and infrastructure for data storage and archiving. For successful research data management, the researchers must be trained and supported through the whole life-cycle. There is a need for a framework that not only stipulates what to do but also helps with how: a training curriculum, guidelines on data storage, and examples of implementation. Ultimately, data management should be guided by the discipline, especially with regard to the data management plan and storage infrastructure.

As a bioinformatician with a keen interest in genomics, I will use genomic data management as an example in my framework. I believe the framework is adaptable to other disciplines. Limiting the framework to genomic data and associated metadata gives it focus and a defined scope within which it can operate and be realised.

There are unique challenges in storing genomic data. For example, it is not feasible or practical to store intermediate data, but a pipeline that generates the data through the whole workflow would ensure complete reproducibility of research. Also, servers already exist that store published data, since depositing data is a publication requirement. Archiving, therefore, is not the challenge. The challenge is how to manage and share the data while research is ongoing. Here the focus is on sharing data internally within the institution and providing the metadata externally, as a foundation for data sharing.
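To make the idea of keeping data internal while providing metadata externally a little more concrete, here is a minimal sketch in Python. It assumes a metadata record is a small JSON document describing a data file by name, size and checksum. The function names and field names are hypothetical and not drawn from any metadata standard; a real framework would follow the standard chosen for the discipline.

```python
import hashlib
import json
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata_record(path, description, contact):
    """Describe a data file without exposing its contents.

    The field names are illustrative assumptions, not a metadata standard.
    """
    return {
        "file_name": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        "description": description,
        "contact": contact,
    }


if __name__ == "__main__":
    # Stand-in for a real sequence file kept on institutional storage.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
        tmp.write(">sample_1\nACGTACGT\n")
    record = build_metadata_record(
        tmp.name, "Toy example sequence", "data-steward@example.org"
    )
    # Only this record would be shared externally; the file itself stays internal.
    print(json.dumps(record, indent=2))
    os.remove(tmp.name)
```

The checksum lets an outside researcher who later receives the file confirm it is the one the record describes, while the data themselves remain within the institution until they are ready to be shared.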

These are my thoughts on research data management and the adoption of open science practices. My vision is a framework that supports the whole research data life-cycle to ensure data FAIRness. But the question remains: what does open science mean for resource-constrained settings?

“Your granary will never be filled by your neighbour.” We need to have uncomfortable conversations and ask candid questions about open science: Why should we care about it? Why should we adopt it? What does it mean for us? Then, we must develop a framework that works for Africa.

The benefits of open science abound and are appreciated and well-motivated, but the pathway towards open science is not. The whole framework involves awareness, training and then practice. Practice requires a clear structure, one designed for access and ease of adoption. We need a framework that reduces the barriers to entry and provides a clear pathway to implementation.