Follow Jeff Sayre on Twitter

A Flock of Twitters: Decentralized Semantic Microblogging


In my last article, Flocking To the Stream, I ended with this thought about the growing issue of social-networking fatigue:

…as the number of streams continue to increase and as the flow rate of each stream picks up, people will grow tired of having to subscribe to, having to join yet-another-stream phenomenon (YASP). Does the Web truly need additional stream providers each with their own data silos? Is there a user-centric solution to this rapidly growing, overflowing-stream issue that puts YASP to rest once and for all?

This article answers these two questions in great detail but the succinct preview version is as follows:

  1. The Web does not need additional stream providers, each of which exerts significant control over a vast number of individuals and requires its users to create a separate new user account (a new digital identity)
  2. The Web does not need additional closed data islands (data silos)
  3. The Web does need a means with which each individual can create, maintain, and control their own identity, efficiently and effectively manage stream conversations, and therefore not be beholden to a few, large data-silo stream providers
  4. The only way to accomplish point three is through the emergence of a distributed, decentralized, Open Source microblogging ecosystem that leverages the power of the Semantic Web

Table of Contents

Since some of the following may be too generic for more advanced readers, I’m providing this Table of Contents to help readers navigate to the parts that interest them most. The first four sections are a general review of the problem and solution. The rest of the article provides my detailed thoughts on this issue.

Although you should feel free to skip ahead, doing so might result in missing a crucial connection.

A Web of Damned Streams

From a user’s perspective, one of the issues with YASP is that their Web identity is strewn throughout the Web with some of their thoughts clumped in one data silo while others are deposited in another data silo. This makes it very difficult for each user to manage all their streams and associated relationships.

What happens when a new, exciting stream comes along? Users have to weigh the potential benefits of membership against the likely pain and inconvenience of creating a new identity, building a new network, and managing yet another stream.

Social networks benefit from what is called user lock-in: the very real fact that, most things being equal between social networks, a user will likely stick with a social network because it takes too much work to move data from one site to another. So, instead of moving their data and possibly closing their account, a user will simply open another account at a competing social network.

Of course, this version of lock-in assumes that social networks allow their members’ data to be moved or copied from one network to a competing network. In reality, the vast majority of social networks do not even allow their members free access to, and control over, their personal data.

The issue facing most Web 2.0 users is that they have a multitude of accounts, each with its own username and password, each associated with a specific web service, and each located in a separate, independent repository—the proverbial walled garden of disparate user data, the omnipresent data silo.

Although most of the large social networks do expose a portion of their users’ data via proprietary APIs, they do not run an open network. They guard their data closely, assuming ownership of all their users’ personal streams. It is easy to understand why this is the case. A social network’s competitive advantage is their users’ data.

The current Web is dominated by the Web-2.0 social networking meme. It is not a healthy, vibrant Web. In fact, the current Web is becoming filled with damned streams, silos whose data barely trickles out and is not openly accessible to the rest of the Web. Google Buzz, Facebook, and Twitter could almost be considered alternate Webs, their members’ data mostly disconnected from the greater Web.

From a user’s standpoint, it is even worse. Most of these fortresses have rules and regulations that make it difficult for users to freely access and use their data elsewhere. Two years ago, Robert Scoble found out this shocking fact when he tried to move his social graph from Facebook to another service.

What’s the result of all these damned data silos? The promise of the Social Web is hindered. Later I’ll discuss the difference between the Social Web and social networks.

A Flock of Twitters

Instead of people becoming more dependent on highly centralized, proprietary microblogging services like Twitter, FriendFeed, Google Buzz, and Facebook, what if users could embed microblogging capabilities into their personal websites?

I don’t mean simply tying their Twitter, Facebook, and other social media streams into their website via behind-the-scenes, proprietary APIs, which they can already do. I mean actually hosting their own microblogging platform, becoming their own microblogging provider.

People should be able to subscribe directly to your microblog, to you and not to one of your myriad profiles on someone’s data silo. The way it currently works is that a user interested in what you have to say not only has to join Twitter (or Facebook, or Google Buzz, etc), but they must also subscribe to your stream on that particular service.

But what if a user who was interested in what you had to say could simply subscribe to your microblog, in essence subscribe to you? What if they could pull microblogging content from your site that originated directly on your site? What if there were a flock of Twitters and not just a single, centralized Twitter?

Why Decentralized?

Whereas a flock of Twitters may seem like an interesting concept, you may wonder if there actually is a benefit to creating a decentralized, distributed microblogging platform.

Part of the original vision of the Internet was to create a distributed communications network that did not have a central point of failure. The Web added a layer that allowed anyone, in theory, the opportunity to operate their own communications platform or channel (called a website).

But today’s Web-2.0 data-siloed social networks have created a handful of massive points of communication failure in the daily lives of hundreds of millions of people.

As an example, over the past two months, Twitter has experienced increasing unreliability. In fact, on January 20, 2010, Twitter was down for 90 minutes causing an uproar in the community. Whereas this might have been a fluke, or possibly have been related to their growth rate, the cause does not really matter. What does matter is that millions of people felt lost without their connection to their network.

This illustrates another fact of Web-2.0 life—that the promise of a Web where everyone had their own communications channel has been usurped. Although most people naively believe they do have their own communications channel by having a Twitter, Facebook, or LinkedIn account, in reality they are beholden to a few Web behemoths to offer them communication services.

By creating a truly decentralized and distributed microblogging platform, users can once again regain control over their Web experience and create their own communications channels. They will benefit from increased data control, data accessibility, data usability, and data security.

A final benefit to decentralized microblogging: data portability is no longer an issue when you own, host, manage, and control your own data store, at least with regard to your microblog activity. You do not have to port the data into a new silo because your data is always right where it should be: in your own silo. Your data is kept by you, managed by you, and controlled by you. You may periodically have to move your database to a new server or another web hosting firm, but that is not an issue of data portability.

Even with decentralized microblogging, there will still be data silos. The silos will just be micro silos (or solo silos) where all the data contained within each silo represents one entity and is controlled by that one entity. It is the perfect entity-to-silo ratio.

A final point. There is a theoretical limit to the number of microblog installs. It is the extant human population. Actually, it is more than that if you make allowances for the fact that businesses, governmental entities, and clubs could host and manage their own microblogs. A user, after all, does not have to be an individual person. A user can be a business.

Why Semantic?

Offering users the ability to operate their own microblogging platform is an enticing thought. But a decentralized, distributed microblogging system does not guarantee that data will be readily available and open throughout the Web.

Instead of having a few, very large closed data silos, a Web of microblogs would in essence be millions of very small closed data silos.

Why is being open important?

One of the promises of the Web in its early conception was to create a network where disparate data sources were interconnected in such a way that integration and interoperability issues went away. To accomplish that goal, data needs to be exposed.

Exposing data creates an entirely new realm of beneficial possibilities. Instead of websites being searched for matching keywords and phrases, the underlying data can be directly queried.

So, how do we open up all the micro silos? By leveraging the power of the Semantic Web.

This article will not go into a deep explanation of the Semantic Web. However, you can think about it in this broad way. Web browsers navigate hypertext; Semantic Web applications navigate hyperdata—data that is encoded with semantic markup and interconnected to other semantically-coded data in other locations. So, whereas hypertext is text linking to other text (documents), hyperdata is data linking to other data. (See 1 & 2 below)

Semantic Web applications are built using a stack of W3C-standardized technologies, in particular the Resource Description Framework (RDF) and the Web Ontology Language (OWL). The Semantic Web technology stack is particularly important, as it provides a standardized way of encoding data without the need for a central controlling authority.

When data is semantically tagged, with the underlying metadata modeled using RDF and URIs, machines can “see and understand” the content. By this, I am not referring to some type of artificial intelligence (AI) engine that can infer meaning from data.

Instead, data that has been encoded with semantic markup (semantic metadata) becomes structured in such a way that the author’s intended meaning is unambiguous. This is accomplished by using various ontologies (vocabularies) to tag the upper-level data with sufficient, relevant metadata that structure and meaning are added to the human-readable data.
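To make the hypertext/hyperdata distinction a little more concrete, here is a minimal sketch in plain Python. It assumes a toy, in-memory "Web" in which each document URI maps to a list of (subject, predicate, object) statements. The URIs, the WEB dictionary, and the helper functions are hypothetical stand-ins for real RDF documents and a real RDF library; only the foaf:name and foaf:knows property names come from the FOAF vocabulary.

```python
# A toy "Web of Data": each document URI maps to the statements it publishes.
# All URIs and helper names here are hypothetical, for illustration only.
WEB = {
    "https://alice.example/profile": [
        ("https://alice.example/profile#me", "foaf:name", "Alice"),
        ("https://alice.example/profile#me", "foaf:knows",
         "https://bob.example/profile#me"),
    ],
    "https://bob.example/profile": [
        ("https://bob.example/profile#me", "foaf:name", "Bob"),
    ],
}


def statements_about(subject):
    """Look up the subject's document and return the statements about it."""
    document_uri = subject.split("#")[0]
    return [t for t in WEB.get(document_uri, []) if t[0] == subject]


def follow(subject, predicate):
    """Return the objects linked from `subject` via `predicate`."""
    return [o for s, p, o in statements_about(subject) if p == predicate]


def names_of_acquaintances(person):
    """Hop from one site's data to another's: data linking to data."""
    names = []
    for other in follow(person, "foaf:knows"):
        names.extend(follow(other, "foaf:name"))
    return names


print(names_of_acquaintances("https://alice.example/profile#me"))  # ['Bob']
```

The point of the sketch is that the query never searches text for keywords. It follows a typed link from data on one site to data on another, which is the sense in which hyperdata is data linking to other data.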

Once data is opened up to discovery by being semantically marked up, the Web becomes a truly interconnected network.

For more information on the Semantic Web, you can start here:

  1. Henry Story’s excellent presentation Building Secure Open & Distributed Social Networks
  2. For a more detailed explanation of hyperdata, read Nova’s article, The Semantic Web, Collective Intelligence and Hyperdata
  3. For more information on the Semantic Web (definitions, RDF, and development tools), visit this link
  4. For a brief history of the Semantic Web, read James Hendler’s article, What is the Semantic Web really all about?

Since it is difficult to succinctly and accurately describe the Semantic Web in layman’s terms, I encourage you to read other sources and become well versed in the Semantic Web–its concepts, underlying technologies, and how you can participate in it.

Evolving Nova’s Stream Concept

Before I get too far into the specifics, I need to present a new interpretation of what Nova Spivack calls the Stream.

One of the powers of Nova’s Stream concept–at least in my opinion–is that it evokes the imagery of a flowing body of water. As I began gathering my thoughts for this article, it became apparent that his Stream metaphor could be expanded, could be evolved in a way that sets the table for a more meaningful discussion about decentralized semantic microblogging.

Nova describes the Stream as follows:

Just as the Web is formed of sites, pages and links, the Stream is formed of streams.

Streams are rapidly changing sequences of information around a topic. They may be microblogs, hashtags, feeds, multimedia services, or even data streams via APIs.

In my extension to his concept, I diverge somewhat from his original definition of the Stream. Instead of viewing each stream as an information flow around a particular topic, I’ve reimagined the stream as the flow of ideas from a given individual. A Stream is thus a monologue that contributes to a greater conversation.

A Drop of An Idea

In keeping with the metaphor of a flowing body of water, I envisioned a water-cycle like flow from a single idea to an ocean of open discussion. Therefore, I call my model of a decentralized microblogging ecosystem the Meta-Hydrological Model.

With that concept in mind, you can think of a single idea posted by a user as a drop. Just as a user of Twitter adds to a conversation by posting a tweet, and a user of FriendFeed or Facebook makes what is generically called a micropost, a user in this new conversation ecosystem posts a drop. So a drop is equal to a tweet is equal to a micropost.

Here is a simplified, graphical representation of the Meta-Hydrological Model (also called the Meta-Flow for short).

The aggregation of all of a given user’s Drops is that user’s Stream. Viewed in this way, if a Stream is what a single user produces, then the River is the confluence of disparate users’ Streams. I’ll describe this in more detail later.

Within each user’s Stream, ideas might coalesce into specific topics. I call these Channels (Stream Channels). Channels are Drops that are grouped under a specific topic to form substream categories.

The final part of the Meta-Hydrological Model is what I call the MicroBlogOcean (MBO). The MBO is the sum total of all microblogging activity in the global conversation ecosystem. It is all the conversations, represented by all the Rivers.
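As a rough sketch of how these terms relate, the model can be written down as a few small Python data structures. The class and function names below are my own hypothetical choices for illustration, and the MBO is simplified here to "every Drop in every Stream" rather than a real distributed service.

```python
from dataclasses import dataclass, field


@dataclass
class Drop:
    author: str          # hypothetical identifier for the posting user
    text: str            # the human-readable idea
    channel: str = ""    # optional topic; empty means "no Channel"


@dataclass
class Stream:
    owner: str
    drops: list = field(default_factory=list)

    def post(self, text, channel=""):
        """Add one Drop to this user's Stream."""
        drop = Drop(self.owner, text, channel)
        self.drops.append(drop)
        return drop

    def channel(self, name):
        """A Channel is the subset of the Stream's Drops under one topic."""
        return [d for d in self.drops if d.channel == name]


def river(subscribed_streams):
    """A River is the confluence of the Streams one user subscribes to."""
    return [d for s in subscribed_streams for d in s.drops]


def microblog_ocean(all_streams):
    """Simplification: the MBO as the sum total of all Drops everywhere."""
    return [d for s in all_streams for d in s.drops]


alice, bob = Stream("alice"), Stream("bob")
alice.post("RDF is just triples", channel="semweb")
alice.post("Coffee time")
bob.post("New BuddyPress release", channel="buddypress")

my_river = river([alice, bob])
print(len(my_river))  # 3
```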

Below is a natural, visual representation of this model as seen from space.

Satellite image of the Amazon River delta from NASA's Landsat GeoCover Program

Channeling Your Stream, Seining Your River

In our hydrological metaphor, a River is the confluence of disparate users’ Streams. But it is not a passive mixing of user ideas. Instead, each user has their own unique River, a River that they assemble, that they control. In particular, a River is the aggregation of all the Streams to which a given user is subscribed. It is similar to your following list on Twitter.

With Twitter, however, there is no practical way to filter the streams of those whom you follow. You subscribe to their entire stream of consciousness. Wouldn’t it be great if you could decide what thoughts, what information you let other users send flowing down your River? Wouldn’t you like the option to grab just the content in which you are truly interested?

Whereas users could of course choose to subscribe to your entire treasure trove of thoughts, by organizing your content into Channels, you provide a means whereby your subscribers can filter out what they do not care to see. They would have the option to subscribe just to your substream(s) and not your entire Stream.

Why is this important?

Well, as an example, for absolutely every person I currently follow on Twitter, I don’t care who just booted whom out as the mayor of whateverville. I don’t want that drivel polluting my pleasant paddle down my River. It adds zero value to my day and provides little if any entertainment.

I also rarely need to know (nor care to know) whenever someone has just stopped by a Starbucks, or is eating at this and such restaurant 1000 miles away, or is on a treadmill listening to Kid Rock on their fancy Zune. It’s also the case for many people whom I follow that I’m not actually interested in all the serious topics about which they micropost. In effect, I actually subscribe to them only for a small subset of their shared knowledge.

Now, to be perfectly fair, I bet some of my followers would be very glad to filter out my microposts on the Semantic Web, whereas others would be happy to stop seeing my microposts about WordPress or BuddyPress. It may also be the case that no one cares at all to see any of my general thoughts that I occasionally let float down their River. I think my subscribers, my followers, should have the right to filter out what they consider to be MY drivel.

By providing a mechanism for channeling thoughts into topics, our new microblogging client would provide a better user experience. The utility of user Channels could be further improved by offering public and private Channels. A Public Channel would be visible to all and open to subscription. A Private Channel would only be available to those users who are granted access via their WebID (more on the concept of using the WebID later).
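Here is one way the Channel filtering and the public/private distinction could fit together, sketched in Python. Everything in it is an assumption made for illustration: the Channel class, the visible_drops function, and the example WebID URIs are hypothetical, and a real client would verify a WebID cryptographically rather than compare strings.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    public: bool = True
    allowed_webids: set = field(default_factory=set)  # used only when private

    def readable_by(self, webid):
        """Public Channels are open to all; private ones need a grant."""
        return self.public or webid in self.allowed_webids


def visible_drops(drops, channels, subscriber_webid, wanted_channels=None):
    """Filter (channel_name, text) pairs for one subscriber.

    `wanted_channels=None` means "subscribe to the entire Stream".
    """
    result = []
    for channel_name, text in drops:
        channel = channels[channel_name]
        if not channel.readable_by(subscriber_webid):
            continue  # private Channel, subscriber not granted access
        if wanted_channels is not None and channel_name not in wanted_channels:
            continue  # subscriber filtered this topic out
        result.append(text)
    return result


channels = {
    "semweb": Channel("semweb"),
    "checkins": Channel("checkins"),
    "family": Channel("family", public=False,
                      allowed_webids={"https://carol.example/#me"}),
}
drops = [
    ("semweb", "Thoughts on hyperdata"),
    ("checkins", "At a coffee shop"),
    ("family", "Holiday photos are up"),
]
reader = "https://dave.example/#me"
print(visible_drops(drops, channels, reader, wanted_channels={"semweb"}))
```

In this sketch the publisher decides who may read a Channel, while the subscriber decides which of the readable Channels they actually want, so both sides keep control over their half of the relationship.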

The MicroBlogOcean

As mentioned above, the totality of all microblogging activity is called the MicroBlogOcean (MBO). In this global conversation ecosystem, Drops are constantly being pushed to and pulled from the MBO cloud.

To provide and manage the myriad MBO services, a new type of SaaS model needs to be created. I call this software-based service a Confluence Hub. A confluence is the point where two or more bodies of water meet. Therefore, a Confluence Hub is the place where Drops sent by various users meet up, are processed, and wait for further action.

Notice User has only subscribed to User 3's Channel.

This is how the process works. A user’s client sends each Drop to the closest Confluence Hub, where an amalgamator combines that user’s Drops for transmittal to all of that user’s subscribers. The Drops are organized by Channel, if any, and cached. If a Confluence Hub (CH) is down, the Drop is automatically rerouted to the next closest hub.
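The push side of this process, including the rerouting when a hub is down, might look something like the following Python sketch. The ConfluenceHub class, the numeric distance field used to mean "closest", and the push_drop function are all hypothetical; how closeness would really be measured is left open.

```python
from dataclasses import dataclass, field


@dataclass
class ConfluenceHub:
    name: str
    distance: float              # hypothetical "closeness" metric
    up: bool = True
    cache: dict = field(default_factory=dict)  # channel -> list of Drops

    def accept(self, drop_text, channel=""):
        """Cache the Drop, organized by Channel ("" means no Channel)."""
        self.cache.setdefault(channel, []).append(drop_text)


def push_drop(hubs, drop_text, channel=""):
    """Send a Drop to the closest hub that is up; reroute past dead hubs."""
    for hub in sorted(hubs, key=lambda h: h.distance):
        if hub.up:
            hub.accept(drop_text, channel)
            return hub.name
    raise RuntimeError("no Confluence Hub is reachable")


hubs = [
    ConfluenceHub("hub-far", distance=120.0),
    ConfluenceHub("hub-near", distance=10.0, up=False),
    ConfluenceHub("hub-mid", distance=45.0),
]
print(push_drop(hubs, "Hello, MBO", channel="general"))  # hub-mid
```

Because the nearest hub is marked as down, the Drop lands on the next closest one, which is the rerouting behavior described above.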

An aggregate is a collection of items that are gathered together from different sources. The role of the client-side aggregator, then, is to poll (that is, to query) the primary Confluence Hub Server (CHS) of each user Stream to which a user is subscribed, pulling the resultant dataset into their River at a predefined, regular interval.

Only the content the User wants gets through.

So, whereas a user’s Drop is pushed to the closest, active Confluence Hub, the Drops of each user that they follow are pulled into their River from the MBO cloud.
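The pull side can be sketched in the same spirit. The function below represents a single polling pass; a real client would run it at its predefined interval. The data shapes (a dictionary standing in for each author's primary CHS, integer Drop IDs, and a last_seen bookmark per author) are assumptions of mine, not part of any specification.

```python
def poll_once(hub_caches, subscriptions, last_seen):
    """Pull new Drops for each subscribed Stream into the River.

    hub_caches:    {author: [(drop_id, channel, text), ...]}, a stand-in for
                   querying that author's primary Confluence Hub Server.
    subscriptions: {author: set of channels, or None for the whole Stream}.
    last_seen:     {author: highest drop_id already pulled}; updated in place.
    """
    river = []
    for author, channels in subscriptions.items():
        seen = last_seen.get(author, 0)
        new = [d for d in hub_caches.get(author, []) if d[0] > seen]
        for drop_id, channel, text in sorted(new):
            # Keep only the Channels this subscriber asked for.
            if channels is None or channel in channels:
                river.append((author, text))
        if new:
            last_seen[author] = max(d[0] for d in new)
    return river


hub_caches = {
    "user3": [(1, "semweb", "RDF basics"), (2, "checkins", "At the gym")],
    "user4": [(1, "", "Good morning")],
}
subscriptions = {"user3": {"semweb"}, "user4": None}
last_seen = {}
print(poll_once(hub_caches, subscriptions, last_seen))
print(poll_once(hub_caches, subscriptions, last_seen))  # [] (nothing new)
```

The first pass pulls one Drop from user3's subscribed Channel and the whole of user4's Stream. The second pass returns an empty list because the bookmark records what has already flowed into the River.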

Using our hydrological-based metaphor, Drops are created and stored on each owner’s site. This means any Drops that are de facto responses to someone else’s Drop are contained within disparate sites across the Web. Whereas the user’s client would cache all incoming Drops (in their River) and the application might even have an option to save a discussion to disk, the original Drop remains located in the owner’s Stream.

The Meta-Flow concept is not a perfect analogy to a natural hydrological flow. Whereas Drops do travel to Confluence Hubs, copies of those Drops are pulled into each subscribing-user’s client to form their unique River. The MicroBlogOcean therefore contains multiple references to the same original Drop and the Rivers actually flow out of the MBO rather than into it.

Although I personally believe this hydrological-based metaphor does a sufficient job of breaking down and describing the component parts of the overall decentralized microblogging ecosystem, for purposes of user understandability, the terms may need to be replaced with a more generic, globally-recognized nomenclature. Although, what is more globally recognizable than the water cycle?

Social Web Versus Social Network

When talking about the Semantic Web, it’s important to differentiate between social networks and the Social Web. These terms are not synonyms. In fact, the Social Web is not even the sum of all social networks.

Why is this the case?

Today’s social networks are nothing more than the famous walled gardens of the Web—as was previously discussed.

With their closely-guarded data silos, social networks are not full participants in the Web; they are not participants in the interconnected data ecosystem. So, unlike an ecological web (think of a food web), the Web-based Internet is not so much an intact web as it is a land of social network islands that punctuate an ocean of truly connected websites.

The Social Web, on the other hand, is a fully functioning and healthy ecosystem where all data are globally connected. In my view, the only way to bring to fruition the promise of the Social Web is to embrace the Semantic Web.

The term Social Semantic Web is often used to differentiate between the current social-network based Web and a truly connected Web of Data. Since I believe that the Social Web requires the Semantic Web, I view the two terms as synonyms.

What might a truly connected Social Web look like?

I use this image as a graphical representation of what an open, fully linked, global Social Web would look like (see the caption for the actual description of the image). Imagine that each end point is a user creating their Drops that freely flow down their Stream, into their River, finally ending up in the MBO cloud. Each node, the point where multiple Streams converge, would be a Confluence Hub Server.

This image is a tracing of all the Internet traffic circa late 2006. It is licensed under a Creative Commons License (by-nc-sa/1.0) and created by

Where would the big social networks appear on this graph?

Twitter would be a single point in this image with a few tenuous tendrils extending out representing the limited access that Twitter allows to their data silos via their proprietary APIs. There would be no lines representing conversations between users as the totality of conversation all occurs within the walled-off Twitter space.

The same holds true for Facebook, Google Buzz, FriendFeed, LinkedIn, and many of the other social networks. The lines connecting these services would be nothing more than gossamer strands representing the brute-force pushing of limited duplicate content between these data silos.

You might be thinking that conversations regularly occur between users of these platforms. For instance, I can choose to show my latest tweets on Facebook or LinkedIn, I can choose to display my latest Facebook or LinkedIn status updates on Twitter, and so forth. But these are not conversations. They are just snapshots of conversations that are occurring within other data silos.

Anatomy of a Drop

A Drop contains more than just the visible content, more than just the human-readable layer. A Drop is a packet composed of several layers, each providing additional metadata that makes the management and discovery of data more feasible.

Content Layer: that part of the Drop that is actually intended to be seen by humans; also referred to as the droplet

Metadata Condensate: when the Drop is being assembled, different metadata layers are aggregated together, which are then deposited into a super-metadata layer. This layer encodes all the supporting data that makes extensibility, management, delivery, and discovery of the user’s Drop possible.

The Metadata Condensate layer is composed of five sub layers:

Rich-media Layer: pointers to associated audio, video, or picture files

Semantic Layer: the machine-readable, semantically-marked up metadata

Rights Layer: the granted usage rights for the Drop

Using the proposed Protocol for Implementing Open Access Data as a model, Drops, Channels, and even entire Streams could be marked with usage rights

Security Layer: WebID to tie the Drop to a specific user; whether the Drop is public or private

Stream Management Layer: unique Drop ID; time stamp; GIS metadata (location-based tagging for mobile microblogging); Channel tag for grouping Drop content (allows filtering by other users); whether Drop is to be broadcast to all, a specific user group, or to one specific user; Drop broadcast delay; Drop time decay (a finite lifespan for Drop if desired); client metadata (whether Drop was sent via Web client, desktop client, via a CHS service, etc.)
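To show how these layers might nest inside a single packet, here is a small Python sketch. The field names, the JSON serialization, and the assemble_drop and is_expired helpers are hypothetical choices of mine; the article does not prescribe a wire format, and only a few of the Stream Management fields are modeled.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timedelta, timezone


@dataclass
class MetadataCondensate:
    rich_media: list = field(default_factory=list)    # pointers to media files
    semantic: dict = field(default_factory=dict)      # machine-readable markup
    rights: str = ""                                  # granted usage rights
    security: dict = field(default_factory=dict)      # WebID, public/private
    stream_management: dict = field(default_factory=dict)


@dataclass
class DropPacket:
    droplet: str                                      # the Content Layer
    condensate: MetadataCondensate


def assemble_drop(text, webid, channel="", public=True, lifespan=None, now=None):
    """Assemble a Drop; `lifespan` (a timedelta) gives optional time decay."""
    now = now or datetime.now(timezone.utc)
    management = {
        "drop_id": str(uuid.uuid4()),
        "timestamp": now.isoformat(),
        "channel": channel,
        "expires": (now + lifespan).isoformat() if lifespan else None,
    }
    condensate = MetadataCondensate(
        security={"webid": webid, "public": public},
        stream_management=management,
    )
    return DropPacket(droplet=text, condensate=condensate)


def is_expired(drop, now):
    """True once a Drop with a finite lifespan has decayed."""
    expires = drop.condensate.stream_management["expires"]
    return expires is not None and now >= datetime.fromisoformat(expires)


created = datetime(2010, 3, 1, 12, 0, tzinfo=timezone.utc)
drop = assemble_drop("Heading to the conference", "https://alice.example/#me",
                     channel="travel", lifespan=timedelta(hours=1), now=created)
print(json.dumps(asdict(drop), indent=2))
print(is_expired(drop, created + timedelta(hours=2)))  # True
```

Keeping the droplet separate from the condensate means a client can render the human-readable part without parsing any metadata, while hubs and query services can work from the metadata alone.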

Semantifying the Drop addresses several key issues that hinder current microblogging platforms. First, providing a mechanism by which machine-readable metadata can be effectively and efficiently associated with Drops unlocks each micro data silo, opening it up for outside services to query. Second, organizing, grouping, and classifying Drops into Channels allows for meaningful filtering of users’ content. Third, by using a FOAF+SSL backed WebID, privacy and identity management across the MBO become possible.

Whereas users can still add tags (via micro and nanoformats) when composing each Drop, and maybe even some basic HTML markup like the “a” link tag, the real benefit accrues from the automatic encoding of semantic metadata into the Drop.

Additional ontological encoding could occur on each Drop via a Semantic Interface Options box on the Drop composition panel.

It’s important to note that although each individual user will have the right to determine how much of their microblogging content is shareable across the Web and even with whom it can be shared, in concept, if a user is wishing to participate in the global microblogging community, it is assumed that they will wish others to see what they have to say.

This is just an initial concept of the structure of a Drop. It may be that one or more of the Metadata Condensate layers (or parts of a given layer) should be included under the Semantic Layer.

Some Technical Thoughts

This article is primarily a presentation of an initial concept. The technical details obviously need to be fleshed out. But I have ideas toward that end which I’ll present here in no particular order of importance.

User and Stream Management

How do users log in to their Streams? How do users subscribe to another person’s Stream?

By using FOAF+SSL, the microblogging ecosystem would authenticate and authorize users based on their WebIDs.

So, as an example, a single user (authenticated via their WebID and FOAF+SSL) of type foaf:Person will subscribe to, will follow the Streams of many users of type foaf:Person.

NOTE: At this point some readers may be asking why OpenID, a well-known SSO, is not being suggested. The reason is that OpenID has some important limitations. But the use of FOAF+SSL does have a big limitation at this time (thanks to nathan for pointing this out). Many smartphones do not support SSL certificates. One possible solution is to use FOAF+OpenID. But, all things being equal, a WebID backed by FOAF+SSL is more powerful, easier to use, and takes advantage of the FOAF Semantic Web ontology.

Fault Tolerance and Redundancy

Redundant distribution and replication to geo-disparate Confluence Hub Servers could provide additional fault tolerance for those stream providers who want to ensure that their subscribers are guaranteed access to their Streams at all times. This would be very useful in crisis situations; the real-time nature of microblogging has proven extremely beneficial during several recent natural disasters.

Platform Ecosystem

My model of a decentralized semantic microblogging ecosystem (the Meta-Hydrological Model) requires three basic software components:

Personal Stream Server (PSS): the client software that a user uses to create their Stream and manage their River.

Community Stream Server (CSS): for those users who do not want to manage their own self-hosted solution, a community-based, public Stream provider is necessary. Such providers could offer the service for free or for a fee. The important issue here is that all users with an account at a Community Stream Server would be the owners of all their data, deciding how the data is used and exposed. If they wished to move their data (their Stream identity) to another server, they could easily do so. Community Stream Servers would be configured so that users could brand their identity, using their own domain names.

Confluence Hub Server (CHS): this has been discussed in more detail above. In addition to the aforementioned duties, each CHS would also be responsible for co-aggregating the realtime view of the MicroBlogOcean.

Unlike the small, fixed set of root name servers in the Domain Name System, the number of Confluence Hubs would not be limited by any authority. Anyone who meets a set of minimum requirements (hardware, software, and bandwidth) could host a CHS. Although anyone could download the CHS platform software, only those whose setup meets the minimum requirements would be able to initiate an active CHS service.

Client-Server Software Architecture

The software architecture of client and server, as well as the UI/UX, is beyond the scope of this article. I do, however, have a few top-level suggestions:

  1. The software stack must be built entirely on open-source technologies
  2. Use of a graph database backend (or a similar NoSQL DB), which is better suited to modeling the graph-like nature of social networks. For more details on this comment, see my Powering Startups to Become Smartups series.
  3. Possibly the use of a language that allows for coding of a Web-based interface as well as desktop client software (Java, Python, or Ruby to name a few). One of the drivers of growth and success for Twitter has been the development of 3rd-party desktop clients. It may make sense to offer an initial version of such a client along with the Web-based interface.

These are just kernels of an idea about possible architectural considerations.

Possible Extensions to FOAF and SIOC Ontologies

As the FOAF specification states, “FOAF documents describe the characteristics and relationships amongst friends of friends, and their friends, and the stories they tell.” In the world of social networking–especially decentralized microblogging–the concept of friend can be very nebulous.

This is why microblogging services like Twitter and Google Buzz use the term follower, and FriendFeed (owned by Facebook) uses the term subscriber. It is a one-way relationship that does not have implicit reciprocity.

In other words, just because I follow you does not imply that you follow me, that you plan on following me, or that you will ever follow me. In fact, in practically all cases, users with large followings do not know and are not even aware of the vast majority of their followers.

The FOAF concepts of “friend” and “know” are often not in tight alignment with the realities of the newer social networks. A better classification of these relationships needs to be created.

A new FOAF class of foaf:Following may be all that is needed to rectify this type mismatch. A list of all the people that a given user is following could easily be compiled by querying the system for all unique foaf:Following relationships. This list could be further broken down by unique social networks by extending the query to include property foaf:account. It would be equally simple to determine all of the people who are following a given user.

Addendum: Thanks to comments below from John Breslin and Alexandre Passant who pointed out the SIOC specification does have the sioc:follows property. So, using foaf:Person with sioc:follows could properly classify a following relationship.
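Taking the addendum's suggestion at face value, the two queries described above (who a user follows, and who follows a user) can be sketched over a handful of statements in plain Python. The WebIDs are hypothetical, the triples are simple tuples rather than real RDF, and a production system would run the equivalent query against an RDF store.

```python
# Statements as (subject, predicate, object); the WebIDs are hypothetical.
TRIPLES = [
    ("https://alice.example/#me", "rdf:type", "foaf:Person"),
    ("https://bob.example/#me", "rdf:type", "foaf:Person"),
    ("https://carol.example/#me", "rdf:type", "foaf:Person"),
    ("https://alice.example/#me", "sioc:follows", "https://bob.example/#me"),
    ("https://carol.example/#me", "sioc:follows", "https://bob.example/#me"),
    ("https://bob.example/#me", "sioc:follows", "https://carol.example/#me"),
]


def following(webid, triples=TRIPLES):
    """Everyone `webid` follows (the relationship is one-way)."""
    return sorted({o for s, p, o in triples
                   if s == webid and p == "sioc:follows"})


def followers(webid, triples=TRIPLES):
    """Everyone who follows `webid`."""
    return sorted({s for s, p, o in triples
                   if o == webid and p == "sioc:follows"})


print(following("https://alice.example/#me"))
print(followers("https://bob.example/#me"))
print(followers("https://alice.example/#me"))  # [] (no implicit reciprocity)
```

Alice follows Bob, yet nobody follows Alice, which is exactly the lack of implicit reciprocity that the "friend" and "knows" concepts fail to capture.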

How should users of a globally decentralized semantic microblogging platform be classified?

Each user would be identified via their WebID and not their sioc:User type—which is utilized only for marking up the various accounts a user has throughout the Web of social networks.

Although the SIOC Core Ontology is designed for easy extensibility, the emergence of decentralized microblogging may necessitate an addition to the core classes, as the current classes do not fully capture the uniqueness of such a system.

Whereas discussions within traditional blogs and forums occur on the same site (within the same data silo), discussions on a decentralized microblogging cloud are not confined to one site. The discussions occur across the cloud, across the Social Semantic Web. This then becomes an issue of classifying relationships within the Social Web and not between disparate social networks and their data silos.

Some Early Players in This Space

There are a few early players in the decentralized microblogging platform space and at least one in the open source centralized blogging arena. It is important to note that only one of the players below is working on a decentralized semantic microblogging implementation.

  • SMOB: self-described as an open, distributed Semantic MicroBlogging framework
  • 6d: self-described as a decentralized social network. This is not a true microblogging platform, but I thought it should be included for reference.
  • onesocialweb: an open-source application created by the Vodafone Group described as a free, open, decentralized microblogging platform
  • StatusNet: the open source, centralized microblogging platform that powers
  • ADDED March 16, 2010: GNU social: A project of the Free Software Foundation to create a “decentralized social network that you can install on your own server.”
  • ADDED July 2, 2010: Diaspora. I’ve included this early project as it has made quite a stir across the InterWebs, raising over $200k via Kickstarter. I’ve contacted them several times to discuss how Semantic Web technologies could improve their platform but have not heard a tweet back.

Which of these is the right solution?

While all of these are encouraging entrants in the space, SMOB shows the most promise at this time as it is the only platform that is working on bringing about the Social Web through decentralized semantic microblogging.


It’s time to return to the original concept of the Web-based Internet—an interconnected, decentralized and distributed, open and independent cacophony of individuals who control their own Webspace, operate their own communication channel, and freely communicate with others without having to worry about a central point of failure.

The only way to build a truly open and decentralized global microblogging network is by leveraging the power of the Semantic Web. Doing so will help usher in the reality of the Social Web.

Decentralizing and individualizing Stream creation and management will help ensure that the MicroBlogOcean does not have a central point of failure and does not require a central-controlling authority. With a properly semantified and structured Stream, even efficient and effective privacy and identity management become feasible.

This article is just one drop in the bucket (yep, I had to say it). It is a first version of an evolving concept. As people provide constructive feedback and the idea gets debated, I’ll openly evolve this concept to better reflect the realities of the emerging Social Web and the technologies that will help bring it to fruition.

Additional Background Information: Read my short post about Facebook and privacy issues: Privacy in the Facebook Era

My Related Articles About the Social Web

  1. It’s Time for Blogging to Evolve
  2. Flowing Your Identity Through the Social Web
  3. My four-part series, Web 3.0: Powering Startups to Become Smartups
  4. How the Death of Net Neutrality Effects You
  5. Goodbye Google Old Friend: It’s time for the Open-Source Internet
  6. Thinking Outside the Privacy Box
  7. Regaining Control of Privacy and Identity: It’s up to Each Individual

FOLLOW UP (March 16, 2010): A number of people have asked me via Twitter how to follow developments on this topic. Unfortunately, Twitter is not well suited for “following” ideas since there is no way to create groups. The real-time Web is not about building groups that can upload documents, create lists, and have an easily-searchable history. So, we have created the Semantic Micro Blogging group. You can sign up there and participate in the discussions.

FOLLOW UP (March 20, 2010): There are a number of additional services that a distributed, decentralized semantic microblogging platform could perform. One such service would be to replace the current closed-siloed, location-based, check-in services like Foursquare, Gowalla, Brightkite, etcetera. Currently, these services are in competition for users’ time. They also pose some issues as discussed in my brief post here and this article, Check-In Fatigue. Or, Why I’m Rooting For An All-Out Location War. If users could use their own microblogging space to not only post their Drops but also to post location-based check-ins, it would allow for the filtering, the channeling, of that data so that their subscribers could opt out of having such check-ins float down their River.
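
To make the opt-out idea a little more concrete, here is a small Python sketch of channel-based filtering. The Drop and Subscription classes, the channel labels, and the river function are hypothetical names chosen for illustration, not part of any existing platform. The sketch only assumes that each Drop carries a channel label and that each subscriber keeps a set of muted channels per author.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Drop:
    author: str   # WebID of the poster (made-up example values below)
    channel: str  # hypothetical channel label, e.g. "status" or "check-in"
    text: str


@dataclass
class Subscription:
    author: str                                       # WebID being followed
    muted_channels: set = field(default_factory=set)  # channels to skip


def river(drops, subscriptions):
    """Return Drops from followed authors, skipping any muted channels."""
    by_author = {sub.author: sub for sub in subscriptions}
    visible = []
    for drop in drops:
        sub = by_author.get(drop.author)
        if sub is not None and drop.channel not in sub.muted_channels:
            visible.append(drop)
    return visible


drops = [
    Drop("https://alice.example/#me", "status", "Reading about FOAF+SSL"),
    Drop("https://alice.example/#me", "check-in", "At the coffee shop"),
    Drop("https://dave.example/#me", "status", "Hello, world"),
]

# This subscriber follows Alice but has opted out of her check-ins.
subscriptions = [Subscription("https://alice.example/#me", {"check-in"})]

for drop in river(drops, subscriptions):
    print(drop.channel, "|", drop.text)
```

In this sketch the filtering happens on the subscriber's side, so the author can post check-ins freely while each subscriber decides which channels reach their River.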

FOLLOW UP (September 15, 2010): The Diaspora project open sourced its initial source code today. You can read more about it here. Although they have come a long way, at this time Diaspora still falls short of being a fully-realized Social Web platform. Perhaps they, or the community building up around it, will semantify the codebase in the future.

Article Comments

  1. Erik Bigelow says:

    Hey, thanks for the mention of 6d, and great article! We’ve just finished a couple more features and we’re getting close to a beta version. Shoot me an email if you’d like and I’ll give you a demonstration of it pretty soon. We’re gonna be at SXSW so we’re really pushing to have a mostly complete beta by then.

  2. John Breslin says:

    Great article Jeff – I need to re-read when flu wears off, some very nice analogies in here and thanks also for the SMOB reference.

    I hope to comment more when better.


  3. billy cripe says:

    Thanks for the in depth article as well as a great trip into the practicalities of semweb applicability!
    However, I worry anytime something starts with, “we don’t need more…” While I’m not a microblogging provider and agree with the points about webbing the aggregations and sem-describing to form critical novel linkages, I think that we’ve shown ourselves to be poor predictors of what we need more or less of.
    What if Facebook had said that when myspace was in ascendency? What if google had said that in the alta-vista and dogpile days?

    • Jeff Sayre says:


      Thanks for your comment.

      The point is that the Web–as a manifestation of human activity, thought, and interaction–does not need a few big players deciding how issues such as data ownership, privacy, identity, and communication should be offered up and controlled. The Web should provide a means, a mechanism, whereby individuals can decide what they want and need, and how best to accomplish that. One way to offer that is by the concept I outline in this article.

      As I state in my article, Privacy in the Facebook Era:

      There are compelling reasons why opening up personal data to the world is desirable. But it should not be up to governments or corporations to make that choice on behalf of their citizens and users. In a free society, it should be the citizens who drive the push toward more open data, not a few elite power players who force the issue.

  4. John Breslin says:

    Hi Jeff –

    I like the move from a model that subscribes to users to one that allows you to subscribe to topics from the people you follow. It’d be cool to see the extension of the pictures you have that show those drops being filtered based on the channels identified in the Stream Management Layer.

    A short note: sioc has a follows property that links User(Accounts) to other User(Accounts); we also have the MicroblogPost and Microblog classes but they by no means fulfill all the requirements in your Anatomy of a Drop section – normally, we’ve modelled based on existing systems, but I think this could require a new semantic model on which a new decentralised microblogging platform could be built.

    Great article – it deserves more coverage!


    • Jeff Sayre says:


      That’s an interesting idea to create a graphic showing Drops being filtered based on Channels. I’ll see what I can do!

      I’ve responded in more detail to both you and Alex below.

  5. Jeff Sayre says:

    John & Alex-

    Thanks for your comments. I appreciate your input and additional insights.

    I can’t believe I failed to see the sioc:follows property in the SIOC specifications. The sioct:Microblog and sioct:MicroblogPost classes are great additions to the Types module.

    I do still wonder whether the FOAF specs need to be extended. The foaf:knows property is not sufficient in capturing the nuanced relationships in microblogging networks. Although, using foaf:Person with sioc:follows I guess does properly classify the relationship. I will amend the section above to include this thought.

    Regarding the Anatomy of a Drop section, it is a first attempt at creating a sufficiently viable Drop architecture. Although the concept of decentralized semantic microblogging is rather simple, there are numerous details that need to be worked out. One of those is how each Drop (micropost) should be encoded.

    I would love to develop this idea into a more concrete, workable platform. If the Social Semantic Web is to be fully realized, I believe something along these lines is vitally needed. I believe your SMOB platform is a fantastic start.

  6. […] found a real good article from Jeff Sayre about decentralized Semantic Microblogging. He describes a concept where you are the owner of your data and the microblogging service. I like […]

  7. Great article!

    the project is using XMPP and total decentralization of the data sources to cope with this kind of infrastructure. Some of these thoughts remind me of the scenarios where individuals or systems own their data and then provide a processing and access format infrastructure for others, have blogged a bit more on that under



    • Jeff Sayre says:


      Thanks for the information and article link. Your article is very interesting! As I mentioned above, I think one of the key pieces of technology that could help bring this vision to fruition is the use of a graph DB for part of the system. This is obviously a place where Neo4j could be leveraged to provide a much-needed, user-centric solution.

  8. […] Since we’re all unlikely to agree on the same tool to migrate to, that’s where decentralized semantic microblogging comes in, although that’s even further away than a replacement for twitter at present. The […]

  9. What do you mean by saying StatusNet is centralized? It’s decentralized using OpenMicroBlogging and OStatus, isn’t it?

    I have my account on, but can still follow all norwegians who reside on, and they can follow me back.

    • Jeff Sayre says:

      Yes, does implement the OMB protocols and the newly-released OStatus protocol stack, but it is not decentralized per se. It gets installed on a particular server (or server cluster) and users must join that community via registering a new account on that instance. So, a given user could join many different communities. As such, even though the data is open (by virtue of its mandatory Creative Commons licensing) and shareable across the network (because of OMB and OStatus) it still adds to the data silo issue with respect to user identity.

      However, might be a good example of what I call a Community Stream Server in my article above. If users could also create their own solo instances of on their website–an instance that would only serve their posts and would not allow for other user signups–then that might be the start of the system I detail above. Of course, the Semantic Web stack is another, very important piece that needs to be added.

  10. James Tizard says:

    Jeff – A thoughtful and inspiring piece. I wrote a response with some thoughts on the mobile side of the equation…

  11. Hi Jeff,
    what a wonderful article .)
    I’m an early adopter of SMOB since the first version and now testing the new one. ( on Dagoneye on SMOB

    It’s time to return to the original concept of the Web-based Internet—an interconnected, decentralized and distributed, open and independent cacophony of individuals who control their own Webspace, operate their own communication channel, and freely communicate with others without having to worry about a central point of failure.

    Reading it, I naturally think of Project VRM.
    Have you ever seen some Project VRM stuff? ( it’s the natural evolution of the Cluetrain Manifesto ) – Project VRM

    There is a natural combination with the tools based on semweb technologies, and particularly talking about SMOB, and the implementation of the VRM vision.
    It’s time to make something more explicit, probably.
    And to make more connections between scenarios of usage of semweb stuff in the VRM Market vision.

    -> VRM with FOAF + OpenID
    -> VRM sketchnotes

    • Jeff Sayre says:

      Thanks for the comment and compliment.

      I am aware of Project VRM. I am a firm believer in pushing the boundaries of the current thinking about the Internet’s, and the Web’s, potential and probable future.

      In order for some semblance of my vision to come to fruition, I believe that there needs to be a shift in the formulaic Web-2.0 meme. What it means to create Social Web platforms will have to be remapped to actually encompass the Web, and not myopically remain focused on the internal networks of each social networking startup.

      In my view, there is a distinct difference between creating social network platforms and Social Web platforms. The first does not preclude closed data silos. The second does not allow closed data islands.

  12. Hanns says:

    Hi Jeff, what is your opinion to the diaspora ( project on

  13. Jeff Sayre says:


    I emailed the diaspora guys with a few suggestions based on my vision above. In brief, the diaspora project needs to embrace Semantic Web technologies, with a particular focus on FOAF+SSL backed WebIDs. I do hope to have some sort of a dialog with them about this in the near future.

  14. […] Sayre wrote an article called A Flock of Twitters: Decentralized Semantic Microblogging that I highly recommend for an in-depth look at the reasons for establishing an open source […]

  15. Declan says:


    I like the article and your extension of the stream metaphor. It aligns nicely with work we are doing in this area at the Centre for Next Generation Localisation (CNGL) in Ireland. I have a question though: People are obviously happy (or not ticked off enough to move) with the social contracts they have with the likes of Facebook, Twitter, Foursquare, etc. They are willing to let providers silo their data in exchange for easy to use and access applications (the daily new account rates of these providers attest to this). So my question is, can we actually make the technology you describe transparent, user friendly and free enough so that it will be attractive to the masses (without the need to install or manage stream servers or pay for third party provisions)?


  16. Gorka Julio says:

    Wow. Very interesting work. I’m wonder how can I help you in this way. I create nanoformats ( thinking in this things. Do you have any plan?

    • Jeff Sayre says:

      I’m working with a few people on implementing the foundation of this idea, but we have a ways to go.

  17. […] By the way: Jeff Sayre presents a very interesting idea for solving the semantics problem. He calls it decentralized semantic microblogging. […]

  18. Martin says:

    Great post! Expect the default behavior to improve as browsers compete to provide the best experience!!
