Technical progress or simple AI alchemy?

Patrick Krauss, a professor at the Friedrich-Alexander-University of Erlangen-Nuremberg (FAU), shared the paper “Large Language Models are Zero-Shot Reasoners” on Twitter. The paper claims that simple prompts increase the accuracy of GPT-3.

According to the paper, chain-of-thought (CoT) prompting, a technique for eliciting complex multi-step reasoning through step-by-step example answers, has achieved state-of-the-art performance in arithmetic and symbolic reasoning. “We create large black boxes and test them with more or less meaningless sentences in order to increase their accuracy. Where is the scientific rigor? It’s AI alchemy! What about explainable AI?” said Patrick.

With 58 LLM papers published in 2022 alone and the global NLP market expected to reach $35.1 billion by 2026, LLM research is one of the fastest-growing fields.

Chain-of-thought prompting

The idea was proposed in the paper “Chain of Thought Prompting Elicits Reasoning in Large Language Models”. Researchers from the Google Brain team used chain-of-thought prompting, in which the model is shown a coherent series of intermediate reasoning steps leading to the final answer of a problem, to improve the reasoning ability of large language models. They demonstrated that sufficiently large language models can generate chains of thought when demonstrations of chain-of-thought reasoning are provided as exemplars in few-shot prompting.


To test their hypothesis, the researchers used three transformer-based language models: GPT-3 (Generative Pre-trained Transformer), PaLM (Pathways Language Model) and LaMDA (Language Model for Dialogue Applications). They evaluated chain-of-thought prompting on these language models across several benchmarks. Chain-of-thought prompting outperformed standard prompting across different annotators and different exemplars.
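To make the difference between standard and chain-of-thought prompting concrete, here is a minimal Python sketch that only builds the prompt text and does not call any model. The `Exemplar` class, the helper function names and the tennis-ball exemplar are illustrative assumptions written in the style of the paper's arithmetic examples, not code from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One worked example shown to the model (illustrative structure, not from the paper)."""
    question: str
    reasoning: str  # the intermediate steps that CoT prompting writes out
    answer: str


def build_standard_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Few-shot prompt whose exemplars give only the final answer."""
    shots = [f"Q: {ex.question}\nA: The answer is {ex.answer}." for ex in exemplars]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


def build_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Few-shot prompt whose exemplars spell out the reasoning before the answer."""
    shots = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


# Hypothetical exemplar, written in the style of the paper's arithmetic examples.
EXAMPLES = [
    Exemplar(
        question=("Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
                  "Each can has 3 tennis balls. How many tennis balls does he have now?"),
        reasoning=("Roger started with 5 balls. 2 cans of 3 tennis balls each is "
                   "6 tennis balls. 5 + 6 = 11."),
        answer="11",
    )
]

if __name__ == "__main__":
    new_question = "A baker has 4 trays of 6 muffins and sells 9. How many are left?"
    print(build_cot_prompt(EXAMPLES, new_question))
```

The two prompts differ only in the reasoning text inside each exemplar; the model is then expected to imitate that step-by-step format when it answers the new question.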

Zero-shot CoT

Researchers from the University of Tokyo and the Google Brain team have extended chain-of-thought prompting by introducing Zero-shot-CoT, which needs no worked examples in the prompt. According to the paper, LLMs become decent zero-shot reasoners with a simple prompt.


The authors demonstrated their results by comparing Zero-shot-CoT against baselines on two arithmetic reasoning benchmarks, MultiArith and GSM8K.
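Zero-shot-CoT, as described in the paper, works in two passes: the question is followed by the trigger phrase “Let's think step by step.” so the model writes out its reasoning, and a second prompt then asks for the final answer. The Python sketch below assumes a generic `complete(prompt) -> str` callable standing in for whatever LLM API is available; `fake_model` is a hypothetical offline stub, and the answer-trigger wording and number parsing are simplifications modeled on the arithmetic setting.

```python
import re
from typing import Callable

REASONING_TRIGGER = "Let's think step by step."
# Answer-extraction wording modeled on the arithmetic setting (an assumption here).
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def zero_shot_cot(question: str, complete: Callable[[str], str]) -> str:
    """Two-stage zero-shot chain-of-thought prompting with no worked examples."""
    # Stage 1: append the trigger phrase so the model writes out its reasoning.
    reasoning_prompt = f"Q: {question}\nA: {REASONING_TRIGGER}"
    reasoning = complete(reasoning_prompt)
    # Stage 2: feed the reasoning back and ask only for the final answer.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{ANSWER_TRIGGER}"
    raw_answer = complete(answer_prompt)
    # Keep the first number in the reply, since arithmetic benchmarks expect a number.
    match = re.search(r"-?\d+(?:\.\d+)?", raw_answer)
    return match.group(0) if match else raw_answer.strip()


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    if prompt.endswith(ANSWER_TRIGGER):
        return " 11."
    return "Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11."


print(zero_shot_cot("Roger has 5 tennis balls and buys 2 cans of 3. How many now?", fake_model))
```

Because no exemplars are needed, the same function works for any question; only the final number-parsing step is specific to numeric benchmarks such as MultiArith and GSM8K.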

Alchemy of AI

Patrick’s tweet sparked a heated debate. “This is an empirical result, which enriches our understanding of these black boxes. Empiricism is a standard and well-established approach in science, and I find it surprising that it’s new to you,” said Twitter user @Dambski. @Dambski further argued that the discussion depends on how one defines understanding: “Anything that increases our chances of correctly predicting the model’s behavior for a given input increases our understanding of that system, whether or not it can be explained.”

Rolan Szabo, a Romanian machine learning consultant, offered another perspective: “From a theoretical point of view, I understand the disappointment. But from a pragmatic point of view, GitHub Copilot is already writing the boring code for me today, although I don’t quite understand how it does it.”

Many supported Patrick’s view. Piotr Turek, Head of Engineering at OLX Group, said: “Frankly, calling it engineering is offensive to engineers. It’s the alchemy of chaos.”

Soma Dhavala, Senior Researcher at Wadhwani AI, said: “Even when we thought we had solved a problem, we had either made it someone else’s problem, or the problem resurfaced in a different avatar. Case in point: with DL, we don’t need feature engineering; that was the claim. Well, yes, but we have to do architecture engineering.”

Guillermo R Simari, Emeritus Professor of Logic for Computer Science and Artificial Intelligence, said: “I wouldn’t be entirely against this approach. My concern is: what will we have learned about the thought process at the end? Will I better understand the human mechanism? Or do I just have something ‘working’? Whatever that means…” Patrick Krauss replied that this was exactly his point.

The discussion took a turn when Andreas K Maier, a professor at Friedrich-Alexander-University Erlangen-Nuremberg (FAU), asked if such large language models were available for public access so that one could actually observe what is happening in latent space during inference.

Patrick replied that the unavailability of LLMs was exactly the problem. “One problem, of course, is that some of these models are only available through APIs. Without access to the actual system, this could turn into something like AI psychology,” Andreas added. At the time of writing, Meta AI’s Open Pretrained Transformer (OPT-175B) was the largest open-access LLM.
