Friday, January 11, 2008

DataPortability - bad name for a real problem

It's about you

The recurring theme in social web data is that the data concerned is about an individual (i.e. you). Anything that describes (a part of) you, your actions, your preferences or you name it, qualifies as part of your 'profile'. It doesn't matter whether you typed it in yourself or it was generated by some system. (As long as something pertains to you or represents you, it should be under your control. I'm taking an extremely user- or identity-centric approach here.)

Currently, almost none of the data describing you is under your control. See The Profile Problem, of which the Scoble vs. Facebook case is an example (not the latest, methinks).

Let me be clear: I totally agree with the problems that DataPortability tries to solve. It's only the name of the project that is misleading, to say the least.

The name Data Portability (and some of the things I've read on their website and forum) gives me the impression that they want to be able to extract 'user data' from, and import it into, various existing social-web applications, just like Scoble did. That's a nice problem to solve, but it all depends on the level of abstraction at which the standards are formulated.

Let's do a thought experiment. Imagine a less-than-world-wide web where pre-existing publication platforms each host content in their own proprietary way. At some point the need for inter-operability (linking) and re-usability of this content becomes apparent.

Luckily for us, and thanks to Tim Berners-Lee, we don't have this situation on the current world-wide (webpage) web. We're not porting documents from one web-server implementation to another, and we neither need nor want standards to do so. Instead we have the HTML standard for the document format, the HTTP protocol for access and creation, and the URL for addressing.

These are the key components. On the social web, however, we do have the situation from the thought experiment.

The problem, then, isn't that we can't port or migrate data (we really shouldn't want to go that way). The problem lies deeper: there are no standards for representing, addressing and manipulating this type of content.

Just as those standards made the world-wide web so successful, I sincerely hope that the DataPorters will come up with an addressing standard for identities and for individual attributes. OpenID already covers part of the solution.

Once identities and specific attributes are uniquely addressable, the data at these addresses should be in a standard document format: just as HTML documents reside at URLs, 'attribute documents' should reside at attribute URLs. I would like to have an addressable document, in a standard document format, for each of the attributes that make up my self-representation. (Unlike web pages, these attribute documents shouldn't be world-readable by default. The documents themselves should include disclosure settings, and part of the access protocol should be the decision whether or not to disclose, depending on the 'requesting party'.)
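To make the idea of an addressable attribute document with built-in disclosure settings a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `AttributeDocument` class, the URL layout (the identity URL followed by `/attributes/<name>`) and the rule that only the owner plus an explicit list of parties may read the value are illustrative assumptions, not an existing standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AttributeDocument:
    """One attribute of a person's self-representation (hypothetical format)."""
    identity: str    # identity URL of the owner, e.g. an OpenID-style URL
    name: str        # attribute name, chosen by the owner
    value: str
    # Disclosure settings are part of the document itself:
    # the parties (identity URLs) allowed to read the value.
    disclose_to: set[str] = field(default_factory=set)

    @property
    def url(self) -> str:
        # Assumed addressing scheme: <identity URL>/attributes/<name>
        return f"{self.identity.rstrip('/')}/attributes/{self.name}"

    def read(self, requester: str) -> str | None:
        """Access decision: return the value only if the requester may see it."""
        if requester == self.identity or requester in self.disclose_to:
            return self.value
        return None


doc = AttributeDocument(
    identity="https://alice.example.org/",
    name="favorite-music",
    value="jazz",
    disclose_to={"https://friend.example.net/"},
)
print(doc.url)                                  # https://alice.example.org/attributes/favorite-music
print(doc.read("https://friend.example.net/"))  # jazz
print(doc.read("https://ads.example.com/"))     # None
```

The point of keeping `disclose_to` inside the document is that the disclosure settings travel with the data, whichever provider happens to host it.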

It follows that we also need a protocol for sending messages to create, retrieve, update, remove and share (disclose) these attribute documents, each identified by its unique address. HTTP does this for viewable documents; we should have something similar for identity-centric attribute documents.
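A rough sketch of such a protocol, assuming a single in-memory store owned by one identity, could look like the Python below. The verb names mirror the five operations above; the `Message` and `AttributeStore` types, the rule that only the owner may send modifying verbs, and the HTTP-style status codes are assumptions made for illustration, not part of any existing protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Verb(Enum):
    CREATE = "CREATE"
    RETRIEVE = "RETRIEVE"
    UPDATE = "UPDATE"
    REMOVE = "REMOVE"
    SHARE = "SHARE"


@dataclass
class Message:
    verb: Verb
    url: str                    # unique address of the attribute document
    requester: str              # identity URL of the requesting party
    value: str | None = None    # payload for CREATE / UPDATE
    party: str | None = None    # party to disclose to, for SHARE


@dataclass
class AttributeStore:
    """Hypothetical attribute host holding the documents of one owner."""
    owner: str
    values: dict[str, str] = field(default_factory=dict)
    readers: dict[str, set[str]] = field(default_factory=dict)

    def handle(self, msg: Message) -> tuple[int, str | None]:
        """Process one message and return an HTTP-like (status, body) pair."""
        if msg.verb is Verb.RETRIEVE:
            if msg.url not in self.values:
                return 404, None
            # The disclosure decision depends on the requesting party.
            if msg.requester == self.owner or msg.requester in self.readers[msg.url]:
                return 200, self.values[msg.url]
            return 403, None

        # Every other verb changes the data, so only the owner may send it.
        if msg.requester != self.owner:
            return 403, None
        if msg.verb is Verb.CREATE:
            if msg.url in self.values:
                return 409, None
            self.values[msg.url] = msg.value or ""
            self.readers[msg.url] = set()
            return 201, None
        if msg.url not in self.values:
            return 404, None
        if msg.verb is Verb.UPDATE:
            self.values[msg.url] = msg.value or ""
        elif msg.verb is Verb.REMOVE:
            del self.values[msg.url]
            del self.readers[msg.url]
        elif msg.verb is Verb.SHARE:
            if msg.party is None:
                return 400, None
            self.readers[msg.url].add(msg.party)
        return 200, None


# Usage: Alice stores her city and later discloses it to Bob.
alice = "https://alice.example.org/"
bob = "https://bob.example.net/"
city = alice + "attributes/city"
store = AttributeStore(alice)
print(store.handle(Message(Verb.CREATE, city, alice, value="Amsterdam")))  # (201, None)
print(store.handle(Message(Verb.RETRIEVE, city, bob)))                     # (403, None)
print(store.handle(Message(Verb.SHARE, city, alice, party=bob)))           # (200, None)
print(store.handle(Message(Verb.RETRIEVE, city, bob)))                     # (200, 'Amsterdam')
```

Note how SHARE is just another message on the same address: disclosure becomes part of the access protocol rather than a setting buried in one site's user interface.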

Once the standards are there at the right level of abstraction (the attribute level imho), the rest is easy. You could implement a Facebook in no time, and it would be inter-operable with any other such system (no need to shovel data to and fro).


It seems to me that trying to make current proprietary social-web applications inter-operable after the fact, by devising a standard at the application level, is a waste of effort. It's too bad that existing applications were built without such standards, but that still leaves us needing low-level standards for the identity-attribute domain. Once we have those, data portability is a non-issue.

Thursday, January 10, 2008

The Profile Problem

Profiles are used extensively on the web and in the enterprise. With the rise of 'the social web', the number of profiles you can use to describe (parts of) yourself has greatly increased.

On the web, each site or service needs to know something about you. This is your profile (for that site or service). In an enterprise setting, a company keeps a profile of each of its employees or customers (i.e. you) in its identity management system.

Seen from the perspective of personal freedom, current practices pose a few problems, which are discussed next.

Problem 1: You don't have control over the data that describes you.

This is the most important problem. Apart from the specific elements you are allowed to store (see Problem 2), the data you entered is stored on a system beyond your control. It might be difficult to make changes, or to correct errors. You might be dependent on the service provider or others to have your data changed or deleted. You don't control the security that surrounds your data. You can't choose the systems on which your data is kept. You're not allowed to re-use your own self-representation.

Problem 2: Profiles dictate what you can and cannot tell about yourself.

Most online services and enterprise identity systems come with a prescribed set of properties (like your name, favorite music etc.) that you need to populate with values concerning yourself. The profile dictates what is required and what is optional. There is often no room for extra information.

Problem 3: Lack of sharing control.

You often don't control who sees the profile, and, where sharing is relevant, you can't control in enough detail which parts are disclosed.

On the web, you are stuck with the options the site or service offers you for sharing with other users (as is common in many online community services). Can you share all, nothing, or parts of your profile? Which parts?

Problem 4: Duplication of effort.

This is not really a problem related to individual freedom, but it is worth mentioning anyway.

Each service has its own profile page for you to fill out. This is a duplication of effort for you, since you have to maintain the same set of properties over and over again at each site. It is also a duplication of effort on the part of the service providers, who each build and maintain a profile infrastructure, its user interfaces and the data it holds.

Admittedly, the enterprise identity management system solves the duplication-of-effort problem with an enterprise-wide profile management system, so duplication is mainly seen on the web, not inside the enterprise. However, the first two problems remain. Do you have any more influence over your profile (say, as a customer or employee) now that it rests in an enterprise identity system?

What needs to be done?

Nowadays, with the proliferation of online services and social media, a solution to these problems is needed more than ever. The challenge lies in identifying what an acceptable solution looks like.

We should take a step back from technical issues, and investigate the essence of self-representation, since a profile for an on-line service is just that. Taking the person being represented as the starting point, issues of personal freedom, privacy and control become relevant.

In any case, to prevent the problems identified above, the proposed solution should minimally have the following characteristics:

It should work with arbitrary attribute collections. It should allow for fine-grained disclosure (sharing) options under the user's control. The data should be stored with a provider and technology of the user's choice.

Starting from there, one can see the need for standards that allow an individual to manage the definition, modification, querying and disclosure of personal attributes.
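As an illustration of the first two characteristics listed above, the following Python sketch lets a person define any attribute they like and lets a site query for the attributes it wants, receiving only those disclosed to it. All names (`Profile`, `define`, `disclose`, `query`) are hypothetical, and storage with a provider of choice is not modelled: the sketch simply keeps everything in memory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """A person-defined set of attributes; no schema is imposed (hypothetical)."""
    owner: str
    # attribute name -> (value, set of parties allowed to see it)
    attributes: dict[str, tuple[str, set[str]]] = field(default_factory=dict)

    def define(self, name: str, value: str, audience: set[str] | None = None) -> None:
        # Any attribute name is accepted: the person decides what to describe.
        self.attributes[name] = (value, set(audience or ()))

    def disclose(self, name: str, party: str) -> None:
        # Fine-grained sharing: grant one party access to one attribute.
        _, audience = self.attributes[name]
        audience.add(party)

    def query(self, party: str, wanted: list[str]) -> dict[str, str]:
        """Answer a site's request with only the attributes disclosed to it."""
        result = {}
        for name in wanted:
            if name in self.attributes:
                value, audience = self.attributes[name]
                if party in audience:
                    result[name] = value
        return result


p = Profile("https://alice.example.org/")
p.define("name", "Alice", {"https://shop.example.com/"})
p.define("favorite-music", "jazz")
p.define("dietary-preference", "vegetarian")  # not on any site's fixed form
p.disclose("favorite-music", "https://radio.example.com/")
print(p.query("https://shop.example.com/", ["name", "favorite-music"]))   # {'name': 'Alice'}
print(p.query("https://radio.example.com/", ["name", "favorite-music"]))  # {'favorite-music': 'jazz'}
```

Here the site only states what it would like to know; what it actually gets is decided per attribute by the person, which is the reverse of a site-defined profile form.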

Monday, January 7, 2008

The Rise of Networked Individualism

The Social Affordances of the Internet for Networked Individualism offers a very interesting analysis of the shift from a group-centered society towards a person-to-person one. See especially the chapter The Rise of Networked Individualism.

Attention Profiling: APML

Attention Profiling: APML Beginner's Guide - Robin Good's Latest News

The APML standard proposes a unified format for capturing a person's interests. The main idea is to keep the user in control of his data (good). It focuses on certain types of data, mainly those that describe a user's preferences and interests. One of the goals is to make recommendations and filtering easier and to prevent information overload.

I would say the user-in-control approach is very good, but why limit it to certain kinds of data? Do we really need a standard format for each domain or application? I would rather create a standard at a more general level. It should be generic enough to let people describe themselves however they want (see The Profile Problem), and to express detailed disclosure policies. I will elaborate in another article.

Data portability

Recently I came across the data portability concept. Finally, it seems that the idea is gaining hold that you should be the owner of the data that describes you, not Facebook or any other organization whose tools you use.

This article covers some of the issues: Are You Paying Attention?: Top 3 Privacy issues for Data Portability on Social Networks

Now to argue one step further, I think you should not only be the owner of the data describing you, but also the designer.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License.