Meletius Igbokwe MBCS, Cloud and Infrastructure Security Manager at Lima, explores the hidden exchange behind free AI tools, where convenience comes at the cost of your information — and why understanding that balance is critical.
There's a peculiar moment the first time you use a free artificial intelligence tool. You type a question, hesitate for half a second, then press enter. The response arrives with astonishing speed: clear, confident, more polished than you expected. It feels like finding money on the pavement. Except you haven't found anything. You've made a trade.
In September 2025, one major AI provider quietly changed its data retention policy. Conversations that were once deleted after 30 days would now, unless users opted out, be stored for five years. Every question you asked. Every draft you refined. Every strategic idea you explored. Five years of material available for analysis and model improvement. The company explained that longer retention helps build better AI systems. That's true. What most people don't realise is the scale, permanence and reach of this captured information.
Millions contributing without knowing
Surveys in 2024 showed that 78% of organisations had adopted AI in some capacity. That means millions of people across education, healthcare, technology, finance and the public sector now rely on these tools daily: teachers preparing lesson plans; analysts exploring datasets; engineers sketching ideas; solicitors drafting correspondence.
Most don't know where their information goes, who can access it or how long it stays there. They think they're simply using a tool, but they're also building it: every interaction contributes to the system's improvement. That contribution is invisible not because people are careless, but because the trade is rarely made explicit.
How these systems actually learn
When you submit information to a free AI tool, you're not interacting with traditional software. You're querying a large language model (LLM): a statistical engine trained on billions of words to predict patterns and generate coherent text.
Think of it like reading thousands of mystery novels until you instinctively know the detective will find a crucial clue around page 200. An LLM does something similar, except it's read billions of documents, and its 'instinct' is pure mathematics. The model doesn't store all those documents. Instead, it extracts patterns and structures, encoding them into vast networks called parameters. Modern models contain hundreds of billions of these parameters.
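To make that concrete, here is a deliberately tiny sketch in Python: a bigram model that simply counts which word tends to follow which. It is an illustration only, vastly simpler than a real LLM, but it shows the principle that training extracts statistics from text rather than storing the text itself.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for billions of documents
corpus = "the detective found a clue the detective solved the case".split()

# Count which word follows which: the 'parameters' of this tiny model
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict(word):
    """Return the most frequently observed word after `word`."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict("the"))        # 'detective', the strongest pattern in the corpus
print(dict(follows["the"]))  # {'detective': 2, 'case': 1}
```

Notice that the original sentences are gone; only the counts remain. Scale those counts up to hundreds of billions of parameters and you have, in spirit, what an LLM retains from its training data.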
Here's the crucial bit: these models don't stop learning once deployed. Consumer versions continue improving through new interactions.
Your question about debugging code or interpreting a policy document becomes part of future refinements, shaping how the model responds to similar queries. Whether this happens depends entirely on which version you're using. Free tools usually collect everything unless you explicitly disable training. Enterprise versions typically don't.
The sharp divide between consumer and enterprise tools
The gap between consumer and enterprise AI tools is massive. Enterprise contracts, particularly in the UK and Europe, contain strict commitments around data handling. Many business AI systems run inside virtual private cloud environments: isolated, organisation-controlled spaces where data doesn't leave the company's boundaries. Encryption keys are controlled internally. Data is deleted on schedules you set yourself. Model providers contractually agree not to use the information for training.
Free consumer tools operate very differently. They retain information for years, use it for training and may expose it to broad internal processing pipelines. The contrast isn't subtle. It's fundamental. Consumers get free AI in exchange for helping improve the technology. Enterprises pay substantial fees for privacy guarantees. The AI provider generates revenue both ways: from consumer data improving their models, and from businesses paying to avoid that collection.
When good intentions cause serious leaks
Real incidents illustrate this better than hypotheticals. In April 2023, engineers at a major semiconductor manufacturer used a free AI tool to help debug complex code. Over three separate incidents within 20 days, they unintentionally uploaded proprietary source code, testing protocols, and sensitive engineering notes. Because the tool's default settings permitted training on user content, the data became part of the provider's broader dataset — impossible to retrieve, impossible to delete.
This wasn't a malicious breach. No attacker was involved. It was simply a misunderstanding of how the system handled information. Yet the result was the same: valuable intellectual property entered a system designed to share knowledge. The company responded by developing internal AI tools, but the information had already leaked.
In March 2023, a bug in another popular AI tool's caching system exposed conversation titles and payment details belonging to 1.2% of premium subscribers during a nine-hour window. Although the window was brief, thousands of users could have been affected. Even temporary glitches can reveal sensitive fragments.
How enterprises actually protect their data
Enterprise systems operate under clearer, stricter controls. Data remains within organisational boundaries. Identity systems enforce access. Encryption keys are managed internally. Providers commit legally to not using customer information for training or cross-organisational analysis. These measures aren't flawless, but they offer far stronger protection than free tools.
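As a way of reading such commitments, the checklist below sketches, in Python, the kinds of guarantees worth confirming before confidential work touches a tool. The field names and the bar applied are hypothetical and purely illustrative; they do not correspond to any real provider's API or standard contract wording.

```python
from dataclasses import dataclass

@dataclass
class AIDataHandlingTerms:
    """Hypothetical summary of a provider's data-handling commitments."""
    stays_in_org_boundary: bool    # e.g. deployed in a virtual private cloud
    customer_managed_keys: bool    # encryption keys controlled internally
    customer_set_retention: bool   # deletion runs on a schedule you define
    used_for_training: bool        # provider may train models on your content

def suitable_for_confidential_work(terms: AIDataHandlingTerms) -> bool:
    """One possible minimum bar for internal or proprietary information."""
    return (
        terms.stays_in_org_boundary
        and terms.customer_managed_keys
        and terms.customer_set_retention
        and not terms.used_for_training
    )

# A typical free consumer tool, as described above, fails every test
free_tool = AIDataHandlingTerms(False, False, False, True)
print(suitable_for_confidential_work(free_tool))  # False
```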
The rise of small language models (SLMs) strengthens this protection. Not long ago, running high quality AI required enormous models hosted on massive cloud infrastructure. In 2022, good performance needed models with 540 billion parameters.
By 2024, comparable performance was possible with just 3.8 billion parameters. That's a 142-fold reduction. Organisations can now deploy capable AI tools on local servers or even powerful laptops. For industries handling sensitive data, this changes everything: processing stays entirely within the organisation's control.
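For illustration, the sketch below shows what running such a model locally can look like. It assumes the Hugging Face transformers library and the open-weights Phi-3-mini model (roughly 3.8 billion parameters); neither is named in this article, and any suitably small open model would do. Once the weights are downloaded, prompts and outputs never leave the machine.

```python
from transformers import pipeline

# Load a small open-weights model onto local hardware (GPU if available)
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",
)

# The prompt is processed entirely on this machine; nothing is sent
# to an external provider and nothing is retained for training.
prompt = "Summarise the main risks of pasting internal documents into free AI tools."
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])
```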
The settings most people never touch
For personal use, many free AI tools now include options to limit data retention, but these settings are rarely enabled by default. They're often buried within menus under labels like 'data controls', 'training preferences' or 'privacy settings'.
The difference these settings make is dramatic. Allowing training can mean data retained for years. Disabling it usually reduces retention to around 30 days. Some platforms now offer temporary chat modes that delete content immediately after use, which provides safer handling of sensitive queries.
One clear rule for workplace use
For any work-related task involving internal, confidential or proprietary information, there's one clear rule: don't use free consumer AI tools.
The convenience isn't worth the uncertainty. If your organisation hasn't provided sanctioned tools, they likely have strong reasons. The absence of an approved system is itself a signal about risk tolerance, data governance concerns or regulatory considerations.
The regulatory pressure building
European and UK data protection frameworks add another layer of complexity. GDPR principles like storage limitation, purpose limitation and data minimisation clash directly with systems that retain and repurpose large volumes of text.
Regulators, including the UK Information Commissioner's Office, have raised questions about whether major AI providers meet these obligations. Several providers have faced temporary restrictions or formal inquiries about transparency and retention practices. These regulatory discussions will shape the next generation of AI governance.
Why society can't keep up
AI adoption is moving faster than anything we have seen before. Previous technologies spread slowly enough for society to develop norms alongside them. We learned email etiquette through years of collective experience. We learned how public social media really was through lived mistakes. With AI there has been no such runway: hundreds of millions of users arrived within two years.
Behaviour hasn't kept pace with capability. People who would never send confidential documents through unsecured channels often paste those same documents into AI systems without hesitation, believing the tool to be private or temporary. In practice, the privacy risk is often greater, not smaller.
Three questions for every interaction
Using AI safely doesn't require advanced technical knowledge. It requires answering three simple questions every time you interact with a system:
- What am I sharing?
- Where does it go?
- How long does it stay there?
These questions create clarity. Treat free AI tools like busy public spaces: you wouldn't discuss sensitive business in a crowded café or debug confidential code at a library computer. Free AI tools deserve the same caution.
Choosing consciously instead of accidentally
This isn't an argument against using AI. These systems are powerful, useful, often genuinely impressive. They make work faster, reduce friction, and expand what individuals can achieve. But every free tool involves an exchange.
The invisible transaction only becomes problematic when it goes unseen. Once you recognise that exchange, you can decide whether each interaction makes sense. Sometimes it will. Sometimes it won't. What matters is that the choice is deliberate rather than accidental.