Why Meta’s large language model does not work for researchers

When Alan Turing proposed the Turing Test in 1950, he framed it as a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human. Turing proposed that a computer can be said to possess artificial intelligence (AI) if it can produce human-like responses to questions.

Thanks to large language models, we’re now at the point where computers can write text on just about any subject we give them — and for the most part, it’s very convincing and human-like.

Ask it to write about "Why does Elon Musk like to knit?" and the output is arguably as good as what any human could write:

Some possible reasons why Elon Musk might enjoy knitting could include the fact that it is a relaxing and meditative activity that can help to clear one's mind, and it also allows for a great deal of creativity and self-expression.
Additionally, knitting can be a very social activity, and Elon Musk may enjoy the opportunity to chat and connect with other knitters.

[Source: OpenAI Playground using text-davinci-002 model]

Summarizing complex text

Examples like this are fun, but the bigger value proposition of using large language models is less about writing wacky prose and more about the summarization of complex text. These use cases are exciting across industries. For instance, AI can distill information about potential prospects for sales intelligence purposes, or it can summarize investment documents in finance.

However, what I’m particularly interested in is summarizing scientific papers for researchers.

The problem is the sheer volume of research being published around the world. More than 4.2 million academic research papers were published in 2020. Even within specialized fields, hundreds of papers are published every year. How can a researcher keep on top of it all while pursuing their own research? A paper's abstract only hints at the detail within.

When Meta recently open-sourced its language model, OPT-175B, it sounded promising for academic researchers. It's said to offer better performance than OpenAI's GPT-3 while requiring just 15% of the compute resources used to train GPT-3.

Putting models to the test

So how does Meta's model perform on scientific research?

We tested it first on a technical press release, and this was the summary we received:

Forward Looking Statements, which speak only as of the date of this press release. Artelo undertakes no obligation to publicly update any forward-looking statement, whether as a result of new information, future events or otherwise.
Investor Relations Contact:
Westwicke Partners on behalf of Artelo Biosciences
Mike Piccinino, CFA

It's not a great start. The model focuses on the investor legalese at the end of the press release, treating it as the most important information readers need to know. You might argue that it is important, but it's not what we're looking for. The investor contact isn't even correct; it was fabricated by the model!

Next, we tried it on a paper from the Journal of Electronic Materials:

Journal of ELECTRONIC MATERIALS, Vol. 38, No. 7, 2009 DOI: 10.1007/s11664-009-0701-y (cid:1) 2009 TMS Special Issue Paper thermally. The samples were then pulverized and analyzed by XRD, TGA, and XPS. The XRD patterns showed that CaCo2O4+d crystallizes as a calcium-ferrite-type structure, which consists of a corner- and edge- shared CoO6 octahedron network including one-dimensional double chains. The CaCo2O4+d phase has a finite density of states at the Fermi level. The XPS results showed that CaCo2O4+d has a composition of CaCo2O4+d = (Ca2CoO3)0.62CoO2. The XPS results also showed that CaCo2O4+d has

[Source: Meta OPT-175B]

At first glance, it appears to contain important information, but it also includes clearly superfluous material, such as the paper's bibliographic details (journal, volume, DOI), that doesn't belong in a summary. I'd more accurately describe the result as paraphrasing part of the text rather than summarizing all aspects of it.

Small-scale models outperform

So how does a smaller language model compare? Drawing on our experience in the field, we designed our Iris.ai IPG model to have just 25 million parameters, as opposed to the 175 billion in Meta's and OpenAI's models, but trained it on millions of domain-specific scientific articles. Our research has shown that this model performs very differently on the same paper:

Metallic temperature dependence of the seebeck coefficient s with a large thermoelectric power (s=151lv/kat387k) has a finite density of states at the fermi level. In this paper, we report the crystal structure and physical properties of caco2o4+d. We find a new compound caco2o4+d, which exhibits a large thermoelectric power, even though it has a finite density of states at the fermi level. Motivated by the simple guiding principle mentioned previously, we searched for new phases thermoelectric properties related as a thermoelectric material applicable to high-temperature use.

[Source: Iris.ai IPG]

You can see the sentence structure is slightly more simplistic than that of a large language model, but the information is much more relevant. What's more, the computational cost to generate that summary is less than $0.23. Doing the same on OPT-175B would cost about $180.
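To make the cost gap concrete, here is a small worked calculation using only the two per-summary figures quoted above. The $1,000 budget is a hypothetical value chosen purely for illustration. Because $0.23 is stated as an upper bound, the resulting ratio is, if anything, an underestimate, assuming the $180 estimate for OPT-175B holds.

```python
# Per-summary costs quoted in the article, in US dollars.
IPG_COST_USD = 0.23        # quoted as an upper bound ("less than $0.23")
OPT_175B_COST_USD = 180.0  # quoted as an approximation ("about $180")


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per summary."""
    return expensive / cheap


def summaries_within_budget(budget_usd: float, cost_per_summary: float) -> int:
    """Whole number of summaries a fixed budget pays for."""
    return int(budget_usd // cost_per_summary)


ratio = cost_ratio(OPT_175B_COST_USD, IPG_COST_USD)
print(f"OPT-175B costs roughly {ratio:.0f}x more per summary")  # prints 783x

BUDGET_USD = 1_000.0  # hypothetical budget, for illustration only
for name, cost in (("Iris.ai IPG", IPG_COST_USD), ("OPT-175B", OPT_175B_COST_USD)):
    count = summaries_within_budget(BUDGET_USD, cost)
    print(f"{name}: {count:,} summaries for ${BUDGET_USD:,.0f}")
```

With these numbers, a $1,000 budget covers at least 4,347 summaries from the small model but only 5 from OPT-175B.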

The container ships of AI models

You'd assume that large language models backed by enormous computational power, such as OPT-175B, would be able to process the same information faster and to a higher quality. But where the model falls down is domain-specific knowledge. It doesn't understand the structure of a research paper, it doesn't know what information is important, and it doesn't understand chemical formulas. It's not the model's fault; it simply hasn't been trained on this information.

The solution, therefore, is to just train the GPT model on materials papers, right?

To some extent, yes. If we can train a GPT model on materials papers, it will do a good job of summarizing them, but large language models are, by their nature, large. They are the proverbial container ships of AI models: it's very difficult to change their direction. This means that steering the model with reinforcement learning requires hundreds of thousands of materials papers. And this is a problem, because that volume of papers simply doesn't exist. Yes, data can be synthesized (as it often is in AI), but this reduces the quality of the outputs, since GPT's strength comes from the variety of data it's trained on.

Revolutionizing the ‘how’

This is why smaller language models work better for domain-specific tasks like this one. Natural language processing (NLP) has been around for years, and although GPT models have hit the headlines, the sophistication of smaller NLP models is improving all the time.

After all, a model with 175 billion parameters is always going to be difficult to handle, but a model with 30 to 40 million parameters is much more maneuverable for domain-specific text. The additional benefit is that it uses less computational power, so it costs a lot less to run, too.
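A back-of-envelope calculation shows why parameter count drives running cost. This sketch rests on two assumptions that are not from the article: the weights are stored as 16-bit floats (2 bytes per parameter), and each model is a dense transformer whose forward pass costs roughly 2 FLOPs per parameter per token, a common rule of thumb. The article does not describe IPG's architecture, and the "40M-param model" entry is a hypothetical stand-in for the upper end of the 30 to 40 million range mentioned above.

```python
# Back-of-envelope comparison of model size and per-token compute.
# Assumptions (not from the article): 16-bit weights (2 bytes each), and a
# dense transformer forward pass costing ~2 FLOPs per parameter per token.

BYTES_PER_PARAM = 2          # assumed fp16/bf16 weights
FLOPS_PER_PARAM_TOKEN = 2.0  # rule of thumb for a dense forward pass


def weight_memory_gb(num_params: int) -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM / 1e9


def forward_flops_per_token(num_params: int) -> float:
    """Approximate forward-pass FLOPs per token, ignoring attention over the context."""
    return FLOPS_PER_PARAM_TOKEN * num_params


# Parameter counts: the first two come from the article; the third is hypothetical.
MODELS = {
    "OPT-175B": 175_000_000_000,
    "Iris.ai IPG": 25_000_000,
    "40M-param model": 40_000_000,
}

large_params = MODELS["OPT-175B"]
print(f"OPT-175B weights: {weight_memory_gb(large_params):.0f} GB")
for name, params in MODELS.items():
    if name == "OPT-175B":
        continue  # skip comparing the large model with itself
    gap = forward_flops_per_token(large_params) / forward_flops_per_token(params)
    print(f"{name}: {weight_memory_gb(params):.2f} GB of weights, "
          f"~{gap:,.0f}x fewer FLOPs per token than OPT-175B")
```

Under these assumptions, OPT-175B needs about 350 GB just for its weights, against roughly 0.05 GB for a 25-million-parameter model, and the per-token compute gap is about 7,000x. Activations, caches, and batching overheads are ignored, so real-world figures will differ.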

From a scientific research point of view, which is what interests me most, AI is going to expand what researchers can achieve, both in academia and in industry. The current pace of publishing produces an unmanageable volume of research, which drains academics' time and companies' resources.

The way we designed Iris.ai’s IPG model reflects my belief that certain models provide the opportunity not just to revolutionize what we study or how quickly we study it, but also how we approach different disciplines of scientific research as a whole. They give talented minds significantly more time and resources to collaborate and generate value.

This potential for every researcher to harness the world’s research drives me forward.

Victor Botev is the CTO at Iris AI.


Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.