Exploring identity through big data

Louise Bennett FBCS, Chair of BCS Security, looks at the opportunities and dangers of one of the implications of big data: identity discovery through data aggregation.

Think for a moment about all the data that you have given to organisations when you signed up for a subscription or purchased a ticket. Add to that your loyalty card data and what you have posted to social networks, your browsing history and email, your medical and education records.

Then add in your bank records, things friends and others have posted about you, memberships and even CVs posted to job sites.

Are you happy about people joining all this data together into an aggregated view of your life and mining it? If they do so, what are the implications for privacy and will it benefit you or ‘them’ more?

There are many commercial models on the internet. Some services are free or below cost because there is value in the data that customers give up when they use those sites or services.

The quid pro quo is usually targeted advertising. As Viviane Reding of the European Commission said on 22 Jan 2012, ‘Personal data is the currency of today’s digital market’. It is widely said that if you are not paying the full cost of a service you are a product, not a customer.

Most young people either do not think about this or they accept it, and it can be a win-win situation. You can apparently get something for nothing, or almost nothing, if you pay for it with your identity attributes.

Do you need to get offline?

However, you may not want your identity attributes to be used and privacy may really matter to you. If that is the case, do you need to get offline and lose out on some deals you might be offered? What does big data mean for your privacy? Can you retain online privacy or is identity discovery through the aggregation of your personal data attributes inevitable?

Personal information disseminates over time into many different areas and once published on the internet it is improbable that it can ever all be deleted. There are also powerful commercial tools available to mine information about an individual or organisation. The next time you use a social media site or search engine consider what adverts or suggestions are made to you. They will often be tied to your habits.

For this reason, many people will want to use different identities for different activities on the internet to frustrate potential data aggregation. Many of us will feel there has been an invasion of our privacy if, out of the blue, a connection we deliberately withheld is made about us.

For example, you may wonder: ‘How on earth did the organisation my husband has just bought something from know my mobile phone number? We did not give it to them and it is in another name. So how could they text my smart phone to tell me his purchase will be delivered to our home tomorrow?’
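One plausible way such a connection could be made is by joining two unrelated datasets on an attribute they share, such as a home address. The sketch below is illustrative only: the records, the field names and the `normalise` and `link_by_address` helpers are all invented for this example, and real aggregation tools are far more sophisticated.

```python
from collections import defaultdict

# Hypothetical records held by two unrelated sources; every value is invented.
retailer_orders = [
    {"customer": "Mr A. Smith", "address": "1 High Street, Anytown", "item": "kettle"},
]
broker_contacts = [
    {"name": "Ms B. Jones", "address": "1 high street,  anytown", "mobile": "07000 000001"},
    {"name": "Mr C. Brown", "address": "9 Low Road, Othertown", "mobile": "07000 000002"},
]


def normalise(address: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(address.lower().split())


def link_by_address(orders, contacts):
    """Return (order, contact) pairs that share a normalised address."""
    by_address = defaultdict(list)
    for contact in contacts:
        by_address[normalise(contact["address"])].append(contact)
    return [
        (order, contact)
        for order in orders
        for contact in by_address.get(normalise(order["address"]), [])
    ]


links = link_by_address(retailer_orders, broker_contacts)
for order, contact in links:
    print(f"{order['customer']} order ({order['item']}) linked to "
          f"{contact['name']}, {contact['mobile']}")
```

Neither dataset on its own connects the purchase to the second person's phone number. The link appears only once the two are joined, which is why using a different name does little to frustrate aggregation when another attribute is shared.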

Increasing regulation

Concerns about data aggregation and data mining on the internet are likely to increase rather than decrease in the coming years. There is also likely to be pressure for regulation because of the potential privacy implications. One example of this is the proposed new EU Regulation on Data Protection. This includes a section on ‘the right to be forgotten’. However, if the Regulation ever gets agreed (which is unlikely with about 4,000 amendments tabled and a 2014 deadline before the EU elections), the right to be forgotten is one thing that will probably be removed.

Such a right is certainly technically challenging, if not impossible, in the internet age. The best privacy activists can hope for is a right to relative obscurity. The online world increasingly uses a network of attributes to determine identity. If these attributes are just matched for a one-off identity check, that is one thing; if they are stored and aggregated in big databases, it raises more concerns.

When we think about privacy, particularly in relation to commercialisation of the internet, government surveillance and data collection, it is revealing to consider the outrage at Edward Snowden’s revelations about elected governments engaging in lawful espionage compared to the absence of concern that businesses (accountable only to their shareholders) have all this data in the first place.

Many individuals object to identity discovery through data aggregation, whether by governments or business. This is especially true where it is used to find out about a person’s preferences and life, using data that the individual regards as sensitive, personal data. It is of even more concern when it is used for cyber-stalking and cyber-bullying, or transfers into the real world as stalking or other criminal activities.

This in turn can lead to people feeling it is legitimate to withhold information about themselves or provide incorrect information in responding to requests they feel are unjustified (e.g. mandatory fields on their age, ethnicity or religion being requested before they receive their goods or services).

This is especially important where identity discovery is looking for attributes that are not actually identity attributes, but give information about a person’s preferences or life choices (such as sexuality or membership of organisations).

Attributes of identity

The ‘attributes’ aspect of identity is key to the responsible use of big data. Everything is context dependent. We rarely engage completely online. Often the trust context is developed offline (through our friends or trusted brands) and carried through to the online experience.

It is vital to determine what attributes are required in a particular interaction and how trustworthy attributes can be conveyed in a manner that maximises the benefits of the availability of those attributes, while minimising the disbenefits of revealing more attributes than are strictly needed. This requires detailed analysis and not broad generalisations.
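As a rough illustration of revealing no more than an interaction needs, a service that only has to know whether a customer is an adult could be handed a yes/no answer rather than a date of birth. The sketch below assumes a simple in-memory profile; the field names, the `DERIVED` table and the `disclose` helper are hypothetical, and it deliberately ignores how such a claim would be made trustworthy (for example, by having a trusted party vouch for it).

```python
from datetime import date

# Hypothetical profile; the field names and values are invented.
profile = {
    "name": "A. Example",
    "date_of_birth": date(1990, 5, 17),
    "address": "1 High Street, Anytown",
}


def age_on(born: date, today: date) -> int:
    """Whole years elapsed between born and today."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


# Each entry answers one question; only "over_18" avoids exposing a raw field.
DERIVED = {
    "over_18": lambda p, today: age_on(p["date_of_birth"], today) >= 18,
    "delivery_address": lambda p, today: p["address"],
}


def disclose(profile, requested, today):
    """Return only the requested attributes; unknown names raise KeyError."""
    return {name: DERIVED[name](profile, today) for name in requested}


# An age-restricted purchase needs one attribute, not the whole profile.
claims = disclose(profile, ["over_18"], date(2014, 1, 1))
print(claims)
```

The point of the design is that the decision about which attributes a context requires is made explicitly, per request, rather than by handing over the whole record and trusting the recipient to ignore what it does not need.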

While technology solutions may exist, the social and economic aspects of implementation are very complex. They are also very personal and will change for an individual over time, even in identical contexts. What was a playful prank at school could have implications when applying for jobs years later if it can be linked to your identity.

The opportunities

The pace of innovation in online commerce and delivery of government services is accelerating. By making everything digital, exploiting the power of big data and the ubiquity of mobile communications, there are huge opportunities to improve productivity, enhance the value to individuals and manage risks effectively.

While the potential upsides are great, the downsides are also stark. The downsides lie mainly in the potential loss of privacy (both real and perceived) and the erosion of trust, if those online cannot provide evidence of their trustworthiness in the context of the transactions they wish to make.

Context and demonstrable trustworthiness are key to the use of personal data attributes in the online world. They are blended with our experiences in the offline world. The success of ‘bricks and clicks’ commercial models is testament to this.

Those who mine big data need to think very hard about how they monetise personal data attributes. They need to be transparent about what they are doing and provide evidence that they are trustworthy if they are to handle our attributes in an acceptable manner, and be successful in an online world.