AI reveals new scientific knowledge hidden in old research papers

Can machine learning predict future scientific discoveries, or uncover those we’ve missed?

Sep 20, 2019

Scientific research is no friend to the casual reader. Those with the brainpower required to split the atom and uncover dark matter are rarely as gifted with pen and paper. In short, be glad Nobel-winning scientists don’t write paperbacks.

The upshot is a trove of hard-to-digest research material dating back to the first time someone said: “hang on, this is great, let me grab a quill/pen/rock…”

With the odd diligent exception, scientific research tends to sit in dark, dusty folders. That was until July this year, when a study published in the journal Nature revealed that science may have a new ally.

The new report highlights a fascinating project involving an open-source machine-learning algorithm called Word2Vec. In the project, scientists fed the program the abstracts of scientific papers before letting it loose on archives of scientific data.

The results were striking for several reasons. First, the AI had no trouble chewing through wordy scientific reports. Second, within a very short period, it was able to mine the data for discoveries that had previously gone unnoticed.

In a later study, the team gave the program historical scientific papers, from which it was able to make predictions ahead of their time.

With the gift of hindsight, the team confirmed the program was able to link discoveries that didn’t come to light until much later. In other words, the AI collated facts in such a way that it predicted breakthroughs which, in real terms, took science years to make.

Words and numbers

Before making its findings, the team gave Word2Vec access to scientific abstracts. Later, they allowed it to run amok through a wealth of scientific papers at the Lawrence Berkeley National Laboratory.

Among the breakthroughs were potential, previously unseen discoveries in the field of thermoelectric materials. This occurred despite Word2Vec having no prior knowledge of thermoelectrics.

According to one of the researchers, Anubhav Jain, the AI works by using word association and mathematics. This unique combination allows it to connect logical dots between different scientific studies.

To prime the algorithm, the team fed it over 3.3 million sources. This strict diet allowed the AI to develop a 500,000-word vocabulary.

Putting that in perspective, a 2013 study by the University of Reading estimated the average UK adult vocabulary at approximately 10,000 words.
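
As a rough illustration of what building a vocabulary from text involves, here is a minimal Python sketch that counts the distinct words in a couple of sentences. The sample sentences and the function name build_vocabulary are invented for this example, and the simple word-splitting rule is an assumption; the researchers' own text processing is not described here and was certainly more involved.

```python
import re
from collections import Counter


def build_vocabulary(texts, min_count=1):
    """Count how often each word appears and keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and split it into runs of letters and digits.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical sample texts, standing in for millions of real abstracts.
SAMPLE_TEXTS = [
    "Thermoelectric materials convert heat into electricity.",
    "New materials convert waste heat efficiently.",
]

vocab = build_vocabulary(SAMPLE_TEXTS)
print(len(vocab), "distinct words")
```

Raising min_count drops words that appear only rarely, which is one common way to keep a vocabulary manageable as the pile of text grows.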

From baby to DNA

The system works by applying what it knows about language and making connections between similar words and groupings. It does this by applying basic mathematical equations to word groupings and then looking for patterns.

Take the words baby and DNA, for example. Word2Vec assigns each word a set of numbers (a vector), and then looks at the other words that connect to it.

It’s a more straightforward idea than it sounds. Common words that relate to the word baby might be small, young, and offspring. Taking these words into account, you can see how easy it would be for the AI to link the word baby to DNA.
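
To make the idea concrete, here is a minimal Python sketch of how words turned into numbers can be compared. The three-number vectors are made up purely for illustration (a real Word2Vec model learns much longer vectors from the text it reads), and the helper names cosine_similarity and most_similar are our own. Cosine similarity is a standard way to measure how closely two vectors point in the same direction.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
TOY_VECTORS = {
    "baby": [0.9, 0.8, 0.1],
    "offspring": [0.8, 0.9, 0.2],
    "dna": [0.6, 0.7, 0.5],
    "rock": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Rank every other word by similarity to the given word, closest first."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for other, score in most_similar("baby", TOY_VECTORS):
    print(f"{other}: {score:.2f}")
```

With these toy numbers, offspring sits closest to baby, DNA comes next, and rock trails far behind. That is the kind of ranking a trained model produces from its learned vectors.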

The more data fed into the AI, the more connections it can make. Because the system is always learning, its vocabulary continues to grow. What’s most fascinating about the platform is how the algorithm makes connections we humans have missed. It is this relentless, ceaseless energy that makes the Word2Vec platform perfect for this kind of research.

Alongside thermoelectric materials, Word2Vec also managed to understand and link elements on the periodic table. It also gained a basic grasp of the structures of molecules.

The potential of the AI is obvious, and because it’s autonomous, it requires no guidance or supervision. The team is hopeful that, given the opportunity, Word2Vec will uncover more lost connections. The hope is that one day the process will give us a more in-depth insight into our world.

According to the lead researcher Vahe Tshitoyan, the system could even sniff out forgotten cures, treatments and medicines from medical records and data.
