by Chris McGivern & SF Team, 1 April 2019
Dr Peter Murray-Rust is the pioneer of ContentMine, a Shuttleworth-funded project aiming to extract - or 'mine' - research data from scientific, technical and medical literature, and put it in the public domain for societal good. We caught up with Peter to reflect on his two-year Fellowship and ongoing connection to the Shuttleworth community…
Peter is a Doctor of Philosophy, Reader Emeritus in Molecular Informatics at the University of Cambridge and Senior Research Fellow Emeritus of Churchill College, Cambridge.
He has enjoyed a long and successful career as a chemist with positions at Stirling and Nottingham universities as lecturer and professor and has also worked for Glaxo Group Research in drug discovery. His specific interest is in semantic chemistry - structuring written texts, so they are readable by computers, and automating the analysis of scientific publications.
“My concern is that not all scientific information produced is processable by machines,” explains Peter. “The tragedy of the print era is that we still think of information as something always printed on paper and displayed on our screens.
“That’s OK if you are a sighted human who knows what you are talking about. But if you are blind, it’s a hell of a lot more complicated.
“My big dream is that machines can read, extract and give context to all scientific literature. I’m building a chemical semantic web - a Google for science, if you like - and I think it will revolutionise the way we use knowledge.
“People should be producing information that everyone can consume and use. Lots of organisations do this as it’s a legal requirement, but the scholarly publishing industry doesn’t care.”
Peter’s ire is well-directed. From a financial perspective, the scholarly publishing industry is in rude health. It is worth around $25 billion, and the vast majority of that value is split between a few powerful and influential private companies. Some of them own a more significant share of the market than Apple has in the technology sector.
But while company balance sheets continue to grow, this is not an industry any reasonable person would call healthy. These secretive monopolies hoard human knowledge. The vast majority of research is publicly funded, yet these companies invest big money to protect and increase their own value by hiding potentially life-saving research behind expensive and exclusive paywalls. It is a power imbalance that ultimately stunts scientific progress.
The cost to society is high, and the Liberian Ebola outbreak of 2014-15 is a horrifying example. Research published in a 1982 paper stated the country should be considered part of the Ebola endemic zone and underlined the strong likelihood of an outbreak in the future. But because it was hidden behind a paywall, nobody knew about the coming crisis. Eleven thousand people lost their lives.
“The Liberian government picked up on this article after the outbreak,” explains Peter. “They were, quite rightly, outraged. We spend a huge amount on publicly-funded research, yet there is a suggestion 85% of medical research is wasted because it’s not validated, reused or replicated.
“That’s a first-order problem of $150 billion-odd going down the drain. But what’s worse is the opportunity cost for the human race. We’re not doing what we should be, and it causes terrible injustice. The Ebola outbreak is just one example.
“This is what I am fighting against. A few others see it happening, but academics are part of the problem, not the solution. The answer has to come from the outside. So, I’ve taken on the challenge of alerting people to what’s happening and trying to come up with ideas to help the world change it. One of those ideas is ContentMine.”
ContentMine is the next step towards making Peter’s vision of contextually collecting and processing scientific information a reality. He has worked tirelessly to create the foundations for this dream over his long career. This infrastructure includes co-creating Chemical Markup Language, an alternative to Word for chemists, and a repository and browser for housing and displaying chemical information. Peter also established the Blue Obelisk group - a collective of volunteer chemists from around the world working together openly to make chemical data intelligible - during the early-to-mid-2000s.
However, it was Peter’s passion for open data and science that would eventually lead to the conception of ContentMine. While at Cambridge, Peter met future Shuttleworth Fellow Rufus Pollock, and the two discussed open data and science at great length. Peter was invited to take a position on the board of Knowledge Forge, which would later become the Open Knowledge Foundation (OKF).
“OKF became the heartbeat of everything I did from that moment,” he says. “There were a lot of projects to do, mailing lists to get involved in, and it took me to meetings all over the world.
“It was at one of these meetings that someone told me the Shuttleworth Foundation might like what I was doing. I read a little about it and saw it was very good on the open stuff, so I put a proposal together.
“I made a video with the help of Jenny Molloy, who is a current Shuttleworth Fellow. She’s incredibly kind and modest and would never tell you she helped me. We put it in, and the Foundation liked it, but it didn’t get in. Looking back, I have no regrets because it really wasn’t ready.
“I didn’t think about it again until I saw Karien Bezuidenhout at a meeting in Geneva in 2013. We got talking, and she suggested I apply again. This time, I knew exactly what I wanted to do.”
ContentMine was a first-of-its-kind idea with intriguing potential. Science allows us to make sense of the world around us today, but it’s also a foundation for all we will learn tomorrow. Mining and extracting scientific facts from scholarly literature and presenting information in context means more people peering behind the paywall and making better, faster and more efficient use of new knowledge for the benefit of society.
Peter’s background as a passionate advocate for openness spoke for itself. The timing seemed right, too, with the ‘Hargreaves’ reform of UK copyright law removing barriers to text and data mining for non-commercial research. Peter’s new proposal - complete with a now-customary presentation of soft toys explaining his ideas - won him a place on the Fellowship in 2014. The goal was to establish ContentMine and begin the liberation of 100 million facts from scholarly articles with free and open source software.
The first stage of Peter’s Fellowship focussed on building software. There were already existing alternatives, but they were highly complex and academic rather than distributable. As his goal was to open up scientific literature for everyone - from scholar to citizen - Peter felt the need to build something more accessible. It was a significant task.
“I had collaborators come and build the system with us,” says Peter of the early days in the Fellowship. “We decided to build from scratch and wrote something called Quick Scrape to scrape scientific literature - we got off to a good start.
“But it soon became obvious that we would have a decision to make: Do we build a service on our site and serve it to people? Or do we create software we can distribute to those who are using it on their laptops? It was a major problem all through the Fellowship.”
By the end of Peter’s two-year Fellowship, some good things were coming together. The software made it easy for anyone to mine data and give it context. Users could type in a search term to reveal an index of related papers, before linking the results to Wikipedia. And the tools could be used to extract data from sources outside of scholarly literature, such as public filings. Peter had also built a team of committed young scientists and developers with a shared vision of open science, and a few years down the line, this would lead to ContentMine starting its own fellowship programme.
A few development problems stunted progress and eventually prompted Peter to take over the software work himself. He happily admits this spread him thin, but the primary issue was persuading enough of an audience to use the software at scale. When your work is so far ahead of the rest, early adopters are hard to come by. Especially in a field as traditional as scholarly research.
Despite the new copyright legislation, academics felt it a risk to embrace the new freedoms offered by ContentMine. Peter and his team were incredibly careful to avoid copyright issues and only published open papers while using uncopyrightable facts and snippets from closed journals. Yet there was - and still is - a wariness of treading on the toes of the powerful and often-litigious giants of the industry.
“The resistance to the idea also meant I felt the need to spend a lot of time working on the political side of things,” he explains. “I felt it was important to push the advocacy thing - perhaps, to the exclusion of the tech side - and the second year became all about that. But we just couldn’t generate the market. It’s getting better now, but it’s still tough, to an extent.”
Our investment in Peter is still paying off to this day. His work has led to increased access to scholarly publications for researchers, and his continued engagement with young, talented individuals has resulted in a new wave of scientists committed to open practices. ContentMine’s fellowship programme is a perfect example.
“In general, our fellows have loved it,” smiles Peter. “It’s a group of six and the youngest was fifteen and from a Dutch school. These are the young people that will take these ideas forward.”
Despite constant pressure and pushback from the scholarly publishing industry, Peter is as loud as ever in championing open access publishing. And he is still relentlessly committed to solving this issue he rightly sees as an injustice.
“I’m trying to produce ways of helping the world to change it,” he explains. “Industry funds academia, and academia pumps the knowledge industry wants back into the system. It’s a vicious circle where everyone is in it for their interests.
“I’m a great believer that technology is a servant of liberation,” he continues. “And I write liberation software, dedicated to liberty and justice. There’s no point in doing ContentMine if you aren’t fighting for freedom. Otherwise, you end up being a worthy but unmotivated nonprofit that does service stuff, or ends up being bought by some megacorp.”
But Peter isn’t just fighting for freedom. With concerns over the increasingly worrying state of the environment, he has recently sketched out a new idea to help the fight for survival. The plan is to create an Open Knowledge Base for publications on Climate Change to provide researchers, institutions and citizens with useful analytic tools and technology to give deeper insights and more accurate, timely research.
“We are under a huge threat; scientists know this,” he explains. “They know how to extrapolate. Today we know curlew numbers have dropped by 50% in the last ten years. You only have to plot ten years ahead, and that means no curlews.
“My vision now is that we can still capture the whole world’s knowledge and use it for the betterment of the planet…and all the organisms that live in it.”
Peter - or PMR as he is affectionately known - is an active, engaged and well-loved member of the Shuttleworth community. His generosity of spirit and wealth of knowledge from years on the front lines of open activism is an inspiration and help to many.
“There are very few communities in the world that operate like this,” says Peter. “If you look at what we have now - over fifty of us, including the fellows, alumni and the team - that community is very strong and very precious. I trust everybody.
“You can also see that most fellows always keep going in some way. It’s not just a nice two or three years and then move on. Most will continue, if not in the precise detail of what they were doing, but certainly in the same spirit. And there are no failures amongst the fellows. Not one. For some, their time hadn’t come. I’m a good example - my time came after my Fellowship.
“I’ve seen the Foundation change over the last four years by necessity. The world has thrown up larger and more pressing challenges, and it does a brilliant job in identifying the key points to attack before it becomes critical. Take fact-checking: I didn’t understand the criticality of Peter Cunliffe-Jones’s work [Africa Check], but we all know about it now. The Foundation saw it ahead of the need.
“The fellowship has totally changed my life and is one of the biggest things in my life. It’s my guiding light and gives me an environment that fuels me and I can pay back into. The one and only frustration I have is that more people should know what it does.”
Connect with Peter [Discover ContentMine](http://contentmine.org/)