Despite billions of dollars invested in ever-larger models, problems with AI hallucinations persist. So what's going wrong, and is there a better path forward? Professor Artur d'Avila Garcez of City St George's, University of London, explores.

If you've used ChatGPT or similar AI tools, you've probably experienced both amazement and frustration. One moment it's writing elegant code or summarising complex documents, the next it's citing research papers that don't exist or getting simple logic puzzles wrong. Three years after ChatGPT's release, the hallucinations haven't gone away, and the release of GPT-5 has added to the growing evidence that scaling up isn't delivering the required reliability.

The problem with ‘scale is all you need’

Current AI (particularly large language models, or LLMs) works by predicting the next word in a sequence, having been trained on vast amounts of data to learn statistical patterns about which words tend to follow others. This approach has produced impressive results, but it has fundamental limitations.
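To make the idea concrete, here is a toy sketch of mine (nothing like a real LLM, which uses neural networks over vast corpora): a 'predictor' built from nothing but word-pair counts in a tiny text.

```python
from collections import Counter, defaultdict

# Toy illustration: predict the next word purely from bigram statistics.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the statistically most likely follower, or None if unseen.
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))   # "cat": the most frequent continuation
print(predict_next("dog"))   # None: no pattern for an unseen input
```

The second call shows the core weakness the article describes: when the input falls outside the learned patterns, the statistics offer nothing to fall back on.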

Think of it as learning a language by listening to conversations without ever understanding the rules of grammar or the meaning of words. You might become good at producing plausible sentences, but you'd struggle to handle situations you haven't encountered before.

This is why LLMs can write a convincing paragraph about quantum physics but fail at simple arithmetic problems if the numbers are different from their training data. Small changes in input can produce diverging results due to the inevitable accumulation of errors in neural network calculations. A small change in how a problem is worded can affect performance dramatically.

The AI industry's solution has been to throw more resources at the problem, but this approach has serious obstacles: it requires unimaginable amounts of data, brings very high energy costs and raises pressing questions of copyright violation. And it's not clear it's working anymore. Improvements are plateauing despite escalating costs.

The missing piece: formal reasoning

To understand the alternative, consider how humans think. We do pattern-matching from experience, but we also know how to turn those patterns into abstract rules and concepts. When we learn that ‘if A is larger than B, and B is larger than C, then A must be larger than C’, we can apply this rule to any objects labelled A, B and C (block towers, numbers, planets…), even ones we've never seen before. We've learned an abstract rule, not merely interpolated between training examples.
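In code, an explicit transitivity rule might be sketched like this (an illustrative example, not taken from any particular system): once 'larger than' is a rule over symbols, it applies to any objects at all.

```python
# Toy illustration: transitivity as an explicit symbolic rule.
def larger_than(facts, a, c):
    # a > c if stated directly, or via some b with a > b and b > c.
    if (a, c) in facts:
        return True
    return any((a, b) in facts and larger_than(facts, b, c)
               for b in {y for (_, y) in facts})

blocks = {("A", "B"), ("B", "C")}                    # block towers...
print(larger_than(blocks, "A", "C"))                 # True

planets = {("Jupiter", "Earth"), ("Earth", "Mars")}  # ...or planets
print(larger_than(planets, "Jupiter", "Mars"))       # True
```

The same rule works unchanged on blocks and planets because it operates on the abstract structure, not on any particular training examples.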

Traditional AI systems from the 1980s used explicit rules and logic — so-called ‘symbolic AI.’ These systems were reliable and explainable but inflexible and required extensive manual programming. Modern neural networks are the opposite: flexible and good at learning from data, but unreliable and inscrutable. What if we could combine both approaches?

What is neurosymbolic AI?

Neurosymbolic AI offers a different approach based on the ‘neurosymbolic cycle’: a continuous loop of learning from data, extracting symbolic knowledge from trained networks, reasoning about what's been learned and consolidating that knowledge back into more compact networks. 


Here's how it works in practice. Knowledge is extracted from a partially trained network by using it as an oracle to produce a simpler network or by deriving logical rules from its behaviour. These extracted rules can be examined, verified, corrected if needed and reused across different problems.
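A minimal sketch of the oracle idea, with the function and thresholds purely illustrative: probe a black-box model on a range of inputs and summarise its behaviour as a human-readable rule.

```python
# Hedged sketch: treat a trained network as a black-box oracle and
# derive a simple threshold rule describing its behaviour.
def oracle(x):
    # Stand-in for a trained network's decision on input x.
    return 1 if 0.62 * x - 0.31 > 0 else 0

# Probe the oracle and find the smallest input it labels positive.
probes = [i / 100 for i in range(101)]
boundary = min(x for x in probes if oracle(x) == 1)
rule = f"IF x >= {boundary} THEN positive"

print(rule)  # a rule that can be inspected, verified and reused
```

Real extraction methods are far more sophisticated, but the payoff is the same: the rule, unlike the network's weights, can be examined and corrected by a person.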

Consider the simple example of teaching AI the rule of transitivity, as with the objects A, B and C above. A neurosymbolic system that learns ‘larger than’ in, say, computer vision can share that knowledge across domains, from towers of blocks to arithmetic magnitudes, because the general rule structure is the same. Once the system extracts the logical rule as a description of its behaviour, the rule applies correctly to any three magnitudes, regardless of whether they appeared in the training data. This is extrapolation, not just interpolation.

How neurosymbolic AI solves problems

The neurosymbolic approach addresses several critical problems: 

  • Reliability: reasoning using symbolic rules extracted from a trained network applies consistently across any number of steps, avoiding the accumulation of errors seen in continuous neural network calculations. This means more trustworthy AI systems that don't suddenly fail when asked slightly different questions.
  • Efficiency: instead of requiring ever more training data and parameters, successful combination of data and knowledge enables network compression rather than scaling up. Knowledge reuse across tasks requires less data over time. The recipe is ‘learn a little, reason a little, repeat’ — the opposite of training massive models from scratch.
  • Explainability: symbolic descriptions enable intervention with what-if questions. Given a rule extracted from a network, domain experts can ask: what if this condition was false? What minimal change would alter the outcome? This transparency is crucial for high-stakes applications like medical diagnosis or financial decisions.
  • Safety: when AI systems can act autonomously, guardrails are needed. Neurosymbolic AI can achieve safety via verifiable descriptions of network modules, imposing requirements specified in logic rather than hoping that post-hoc testing or human feedback will catch every bug.
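To make the what-if intervention from the explainability point concrete, here is a toy sketch (the rule and condition names are hypothetical): with a rule in symbolic form, an expert can toggle a condition and re-evaluate the outcome.

```python
# Hypothetical extracted rule: IF fever AND cough THEN flag for review.
rule_conditions = ["fever", "cough"]

def evaluate(case):
    # The rule fires only when every condition holds.
    return all(case.get(c, False) for c in rule_conditions)

patient = {"fever": True, "cough": True}
print(evaluate(patient))                    # True: flagged

# What if 'cough' were false? A minimal change that alters the outcome.
counterfactual = dict(patient, cough=False)
print(evaluate(counterfactual))             # False: the outcome flips
```

This kind of direct intervention is exactly what is impossible on raw network weights, and why symbolic descriptions matter for high-stakes decisions.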

The path forward

We're already seeing hints of the neurosymbolic approach in Google's AlphaGeometry system combining neural networks with discrete search and in DeepSeek’s use of model compression. Amazon’s latest formal methods approach promises to combine LLMs and interactive theorem-proving using neurosymbolic AI. But these are just first steps.

The fundamental insight is that different types of problems require different types of computation. Neural networks excel at pattern recognition and learning from messy data. Symbolic systems excel at logical reasoning, handling edge cases and working with limited data. Rather than treating these as competing paradigms, neurosymbolic AI treats them as complementary tools in a unified framework. 

The accumulation of errors in continuous learning systems needs to be controlled through symbolic manipulation. The combination of the two approaches reconciles what computer scientist Leslie Valiant described as the statistical nature of learning and the logical nature of reasoning. That’s what’s at the centre of neurosymbolic AI research.

Conclusion

The current AI plateau doesn't mean we've reached the limits of AI. It means we've reached the limits of a particular approach. The scale-based approach to AI has failed to achieve artificial general intelligence despite vast financial investment. Now, we need systems that can learn from fewer examples, reason reliably, handle novelty and reuse knowledge across domains.

Neurosymbolic AI offers a principled path forward. With a better understanding of AI capabilities, systems that leverage both learning and reasoning can address the challenges of data efficiency, reliability and trust. Rather than simply making neural networks bigger, we need to combine the flexibility of learning with the reliability of logical reasoning.

The third wave of AI will be neurosymbolic. It won't be built solely on scale, but on sound integration of learning and reasoning afforded by neurosymbolic AI theory. For developers, project managers and technical professionals, understanding this shift will be crucial as we move towards AI systems with decentralised agents that are collectively impressive, but also controllable and trustworthy.