Gathering data for research is tedious, time-consuming and expensive. Yet sometimes the data a researcher needs already exists, locked behind restrictions or legal protections.
This makes it difficult for researchers to produce timely, insightful studies: unless they pay for access to the scientific literature, they must repeat work that has already been done.
So Carl Malamud, an American technologist and public-domain advocate, in collaboration with bioinformatician Andrew Lynn at Jawaharlal Nehru University in New Delhi, plans to mine a vast collection of scientific research papers to build a repository that researchers can crawl for text and data to extract insights.
The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis.
Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text.
(Image credit: Smita Sharma/Nature)