Professor Neil Gordon MBCS, Chair of the BCS ICT Ethics Specialist Group, explores the implications of Anthropic’s new philosophy that LLMs must understand the moral reasoning behind their behaviour.
Summary:
- Anthropic say LLMs must genuinely understand the principles behind their decisions, rather than simply following rules, in order to be truly ethical
- Anthropic envisions ‘broadly’ safe and ethical LLMs which use context to make decisions, arguing this will make models safer and more resilient — but still imperfect
- It is important to be aware that by using LLMs we may be ‘importing’ embedded ethical frameworks that do not align with our societal or regulatory expectations
- LLMs must be evaluated carefully, taking into account the transparency, accountability and commercial goals of their creators
- Truly ethical AI can only be achieved by taking a continuous, active role in its development
Anthropic’s recent publication of Claude’s new constitution marks a notable shift in how AI companies are attempting to embed ethical reasoning into large language models.
Rather than relying solely on rule-based guardrails, Anthropic say that models must actually understand the principles behind their behaviour: ‘AI models like Claude need to understand why we want them to behave in certain ways. If we want models to exercise good judgment across a wide range of novel situations, they need to be able to generalise — to apply broad principles rather than mechanically following specific rules'.
This is more than a technical refinement. It signals a shift from compliance to comprehension. If taken seriously, it could reshape how we evaluate, select and trust AI systems in professional, educational and public sector contexts.
Why this matters: the limits of rule-following machines
The idea that AI should generalise ethical principles rather than follow rigid rules echoes longstanding concerns in both philosophy and science fiction. A 2021 Reith Lecture, Beneficial AI and a Future for Humans, highlighted this, alongside the danger that we cannot completely specify an objective, in a way a machine would understand, without risking unintended consequences. For example, asking an AI system to deacidify the oceans could lead to a solution which achieves that goal but kills humanity.
These anxieties are not new. Isaac Asimov’s I, Robot stories explored these issues back in the 1940s, where even seemingly clear rules such as ‘A robot may not injure a human being…’ can interact and conflict, being interpreted in unexpected ways that lead to unanticipated outcomes. Anthropic’s approach implicitly acknowledges this: a model that merely follows rules is brittle. A model that understands why those rules exist may be more resilient — but also potentially still unpredictable.
‘Broadly safe’ and ‘broadly ethical’: why the language matters
One striking feature of Anthropic’s framing is its emphasis on being ‘broadly safe’ and ‘broadly ethical’. This phrasing is doing a lot of heavy lifting. It acknowledges that safety and ethics are not absolutes, and that no AI system can be perfectly safe or universally ethical. But it also raises questions about thresholds, accountability and whose definition of ‘ethical’ is being used.
The tech sector has been here before. Google’s once famous ‘don’t be evil’ motto was quietly demoted within its code of conduct in 2018 (https://tinyurl.com/yc3kyaac). Corporate ethics statements are typically aspirational, potentially performative, and occasionally abandoned when inconvenient. So, when companies describe their AI systems as ‘broadly ethical’, we should ask:
• Broadly according to whom?
• Broadly according to what standard?
• Broadly for whose benefit?
The geopolitical dimension: whose ethics are we importing?
In my inaugural lecture last year, The Future of Education and Humanity in the Age of AI: What’s the Point of Us? (https://tinyurl.com/2k8n7csd), I raised a concern that is becoming increasingly urgent for the UK: our growing dependence on AI platforms developed, trained and governed in other countries.
These systems inevitably reflect the cultural norms, political priorities and commercial incentives of their creators and locations. Whether it is Anthropic in the US, Hangzhou DeepSeek in China, or OpenAI with its complex governance structure, each model potentially carries embedded assumptions about:
• individual rights
• acceptable risk
• the balance between safety and innovation
• the role of the state versus the market
When UK institutions adopt these systems wholesale, we are not just adopting technology — we are importing ethical frameworks that may not align with our own societal values or regulatory expectations.
Can we trust commercial organisations to deliver ethical AI?
This brings us to the central question for any developer, ethics committee or professional body: how far can we trust commercial organisations to provide ethical AI platforms? Commercial AI development is shaped by:
- the company’s own values and aims
- shareholder expectations
- competitive pressure
- the need to scale rapidly
- the desire to avoid regulation
- national strategic interests
Even when companies articulate noble intentions, their incentives may not align with the public good, especially from a global perspective. Constitutions, safety layers and ethical guidelines are valuable — but they are typically self-authored, self-policed and subject to change without notice.
Towards a framework for choosing ethical AI
Professionals, educators and public sector organisations increasingly need a way to evaluate the ethical posture of AI systems. This should involve:
- Transparency — does the company publish its safety principles, training data policies and governance structures?
- Accountability — is there an independent oversight mechanism, and can users challenge or audit decisions made by the model?
- Cultural and geopolitical alignment — do the model’s values align with the user’s regulatory norms and societal expectations, or is there a risk of importing foreign political or ethical assumptions?
- Commercial context — how does the company make money? Do its incentives support long-term safety, or short-term growth?
- Adaptability and local control — can users customise or constrain the model to reflect local societal ethical perspectives? Is there a pathway for public sector or academic oversight?
This is not about finding a ‘perfectly ethical’ AI, but about making informed, context sensitive choices.
The role of professional ethics committees
Ethics committees or ethicists have a crucial role to play in shaping how organisations navigate this landscape. They can scrutinise the claims made by AI companies, develop specific guidance and encourage procurement processes that prioritise ethical alignment. Furthermore, they can support staff in understanding the risks and opportunities of different platforms. They can help shift the conversation from ‘which AI is the most powerful?’ to ‘which AI is the most trustworthy for our context?’
Conclusion: a call for ethical agency
Anthropic’s constitutional approach is a promising development. It signals a move toward more principled, reflective AI systems. But it also highlights the limits of relying on corporate self-regulation and imported ethical frameworks.
Moreover, as individuals, whether in private use or in our professional capacity, we may not have a choice over which platform we use. Anthropic’s 84-page constitution provides a lot of detail, though much of it is still potentially context dependent. Asking Claude itself ‘Can you explain “Claude’s Constitution” then?’ yields answers such as: ‘I should clarify - I don't have a formal document called “Claude’s Constitution” that I can point you to’ and that ‘There's no silver bullet in AI safety. Even well-intentioned AI systems can cause harm through misuse, overreliance, or failures to understand context.
The question isn't whether AI can cause harm — it can — but how we build robust safeguards, maintain human oversight, and cultivate a culture of responsible deployment.’
If we want AI that is not only ‘broadly safe’ but meaningfully aligned with our values, we must take an active role in evaluating, selecting and shaping the systems we use. Ethical AI is not something we can buy off the shelf. It is something we must scrutinise and continually renegotiate in order to help shape the future development of the AI systems we intend to use.
Take it further
Interested in this and similar topics? Explore BCS' books and courses:
- Innovating ethically to drive business change
- BCS Foundation Certificate in the Ethical Build of AI
- Artificial Intelligence Foundation Pathway
- The Psychology of AI Decision Making: Unpacking the ethics, biases, and responsibilities of AI
- Getting Started with ChatGPT and AI Chatbots: An introduction to generative AI tools