
Web 3.0 Smartups: the Social Web and the Web of Data

By Jeff Sayre

<Smartups Series Part 2 of 5>

In the first installment of my Web 3.0 series, Powering Startups to Become Smartups, I presented a general overview of the Web’s evolving paradigm. I made the argument that today’s Web-based startups needed to step outside the current Web-2.0 box and think like a Web-3.0 company. By leveraging the power of Web 3.0, a common-place startup could transform itself into a smartup.

In this second installment, I’m going to talk about what most people think of when they hear the term Web 3.0: the Semantic Web, or Web of Data. In the process, I hope to correct some common misconceptions about what the Semantic Web is and what it is not.

You need to think outside the factory that makes all the boxes

For those of you fluent in Semantic Web technologies, this article may seem simple. But I think it will still provide you with some useful ammunition for convincing hesitant parties to embrace the Web of Data. For those who hunger for more after reading this, I have provided a listing of additional resources at the end of this article—from general introductions to the Web of Data to detailed implementation guides.

A Tangled Web We Weave

Before I go into more detail, it is imperative that we define a few terms and concepts. The Web of Data goes by a number of names, with two competing movements marketing their vision for its implementation. Whereas I often use the term Semantic Web, I do so in the broadest sense of the term, as a synonym for the term Web of Data. From outside the technologists’ debates, I believe that the concept of a Web of Data might be the most apt description of the foundation of Web 3.0. Why is that the case?

As a keen observer of nature all of my life, and as a trained scientist, I instantly understood the broader concepts of the Web of Data (a.k.a. Linked Data, the Semantic Web). The ecological web and the Web of Data are similar in theory. The Web of Data is humanity’s meta-food web, where homogeneous participants (i.e. humans; a single species) with heterogeneous needs all produce and consume data in an interconnected, thriving, vibrant, and interdependent information ecosystem.

The unit of trade, the least common denominator, in Web 1.0 was the file or document. With Web 3.0, the Web of Data, the units of trade are data, the information that is contained within files or within databases. In the Web of Files, the links between documents are hyperlinks. In the Web of Data, the links, the threads that join the data, are URIs. This creates what Nova Spivack and Henry Story call hyperdata.

As I state in my article, A Flock of Twitters:

Web browsers navigate hypertext; Semantic Web applications navigate hyperdata—data that is encoded with semantic markup and interconnected to other semantically-coded data in other locations. So, whereas hypertext is text linking to other text (documents), hyperdata is data linking to other data.

Thus a key piece of Web 3.0 is the concept of the linking of data throughout and across the Web.

A final term that needs to be discussed before proceeding is the word semantics.

The Semantics of Semantics

When talking about the Semantic Web, there is a common misconception regarding the definition of “semantics” (yes, that is ironic). In a nutshell, semantic means meaning. With regard to the Semantic Web, it refers to the meaning of data and the relationships between data.

So it is not surprising to me when I see blog posts, forum threads, or even Twitter conversations where people seem to think that the Semantic Web is simply about tagging their posts, using microformats, or adding micro or nanosyntax to their tweets. As a WordPress / BuddyPress developer, when conversations about the Semantic Web come up, it is not uncommon for fellow developers to make the same mistake.

There is another common misconception having to do with the word semantics as it is applied to the Web. When we speak of semantic technologies and Semantic Web technologies, we are referring to two different sets of technologies with different purposes. When speaking of Semantic Web technologies, we are not referring to AI (artificial intelligence) functions for natural language processing. Whereas some semantic technologies can be complementary, Semantic Web technologies deal with modeling relationships between data to help machines understand and discover connections.

When people talk about tagging of blog or social network content (within Twitter or Facebook, for instance), they are talking about what is traditionally referred to as folksonomies. This is different from semantically tagging the upper-level data with an underlying layer of metadata.

Tagging allows for user classification of content. This type of content can be described as metadata to be seen by people. While a powerful concept, it has its drawbacks. For instance, as these two classic examples demonstrate, user-generated classification is often ambiguous.

When a user tags some content “Apple”, to what are they referring? Is it the fruit called apple, is it the company Apple, Inc., is it a tag for a picture of an apple tree, or is it someone’s nickname? There is no way to clearly determine the underlying meaning of that tag.

If a user refers to the person “Bill Gates” in a post, do they mean Bill Gates, the founder of Microsoft, Inc., or do they mean one of the many other people on Earth with the same name?

Most platforms that allow user-generated tagging do not filter tags or check for potentially redundant classifiers. For instance, it is not uncommon to see the tags “internet”, “web”, “interweb”, or even “Intertubes” used to refer to the same object. Although this redundancy might make user searches easier, it can lead to confusion. Of course, we all know that the Web is only part of the Internet and that the term interweb is a tongue-in-cheek reference to those who do not know the difference.

With regards to Web 3.0, semantic tagging (also called markup or encoding) is not the same thing as user-generated tagging. Tags, while useful, do not provide sufficient metadata. They do not indicate the relationship between the tag and the object it references. Semantic tagging generates metadata to be seen by machines, not people.

In Web 3.0, both types of tagging will continue: the Web-2.0 practice of people tagging posts, pictures, and documents for the benefit of other people; and the semantic tagging of upper-level data with an underlying layer of metadata.

Microformats: I will not go into the debate about whether microformats can usher in the Web of Linked Data; suffice it to say I believe they cannot. It is important to know that what the Microformats community calls POSH (Plain Old Semantic HTML) is not the same thing as semantic markup, which uses RDF classes and properties to model relationships between data. In short, microformats are primarily for people, not machines. They do not facilitate machine understanding. In order for the Web of Data to self-organize, the interconnections must be autodiscoverable, not pieced together by people. This is a prime example of the difference between hyperlinks and hyperdata.

But user-generated content is meant for people to see. Machines have a difficult time “seeing and understanding” the human-readable content. This results in the need for complex search algorithms just to squeeze out relatively useful search results. Furthermore, any associations between disparate datasets almost always have to be made by a human being.

Semantic tagging, on the other hand, helps to structure the upper-level data, via an encoded metadata layer, thereby making it machine-readable, machine-processable, and machine-interpretable. This makes data more easily searchable and queryable, facilitating the autodiscovery of connections between data.

Why is semantic encoding beneficial? In the examples above, proper semantic encoding would provide a clear definition of what the user was referring to when they wrote Apple or Bill Gates. The meaning would not be ambiguous. So, what disambiguates the relationship between the word and the meaning?

The Vocabulary of the Web of Data

As I discussed above, semantic tagging (marking up) of upper-level data via an encoded metadata layer creates an additional layer of data for machine consumption. But what does this mean and how does it make data that is machine-readable, machine-processable, machine-interpretable; how does it facilitate data discovery?

In the Semantic Web, data are typically marked up using a stack of W3C technologies, in particular the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF is a machine-processable language that represents information about data (or other resources). OWL is a set of languages that offer vocabularies (ontologies) for representing the unambiguous relationships between data.

The W3C stack provides a standardized way of encoding data without the need for a central controlling authority or proprietary software. This means that semantic markup is abstracted from a reliance on a particular database technology and users have the flexibility to expand or define new vocabularies.

Through the use of globally unique names in the form of uniform resource identifiers (URIs), RDF triples are created to represent relationships between the subject and the object. By using differing ontologies, different relationships between data can be described.

In its simplest form, an RDF triple takes the form of subject, predicate, object. Each component of the triple contains a URI. The subject and object contain URIs that locate (point to) them while the predicate usually uses a URI to describe the relationship. This relationship is defined in the classes and properties of a given ontology.

Triples are stored in a database, in various formats, in what are appropriately called triple stores. Data that are semantically encoded via RDF triples can be discovered via SPARQL—the query language for RDF. Semantic Web-powered sites expose their data to the rest of the Web via what is called a SPARQL endpoint.
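To give a feel for what querying a triple store involves, here is a minimal sketch in plain Python. It is not a real triple store or SPARQL engine: the `store` set, the `match` helper, and every example.org URI are assumptions made up for illustration. Only the rdf:type URI is a standard one.

```python
# A toy in-memory "triple store": a set of (subject, predicate, object) tuples.
# All example.org URIs are hypothetical placeholders.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard rdf:type

store = {
    (EX + "alice", RDF_TYPE, EX + "Person"),
    (EX + "bob", RDF_TYPE, EX + "Person"),
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "acme", RDF_TYPE, EX + "Company"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the sorted triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Roughly the question a SPARQL query such as
#   SELECT ?s WHERE { ?s a <http://example.org/Person> }
# would ask of a real endpoint.
people = [s for (s, _, _) in match(store, predicate=RDF_TYPE, obj=EX + "Person")]
print(people)
```

A real SPARQL endpoint does far more (joins across several patterns, filters, remote access over HTTP), but the core idea of matching triple patterns with variables is the same.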

So, returning to the earlier tagging example, we would indicate with the following simple triple that we were talking about Apple, Inc., and not apple the fruit or Apple, a pet pangolin:

Subject: Apple

Predicate: is a type of

Object: Company

Of course the above triple is not expressed in RDF form. The proper form would contain appropriate URIs for each component.
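As a rough illustration of what the URI-based form could look like, the sketch below writes the same statement as a line of N-Triples, one of the RDF serializations. The two example.org URIs and the `to_ntriples` helper are invented for this example; only the rdf:type predicate URI is a real, standard identifier.

```python
# The standard rdf:type property, used to say "is a type of".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, invented for this illustration only.
APPLE_INC = "http://example.org/company/Apple"
COMPANY = "http://example.org/vocab#Company"


def to_ntriples(subject, predicate, obj):
    """Serialize one all-URI triple as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples(APPLE_INC, RDF_TYPE, COMPANY)
print(line)
```

Because the subject is now a URI rather than the bare word Apple, a machine reading this statement has no fruit, tree, or nickname to confuse it with.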

Note: It’s important to mention that there are other triple-based data models besides subject, predicate, object—the entity-attribute-value (EAV) triple, for instance, or the node-edge-node object triple in a graph database.

Relationships can be better defined and further refined by using additional ontologies (vocabularies). New ontologies can easily be created to provide a new set of classes and properties with which to describe relationships.
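The Bill Gates example from earlier can be sketched the same way. The snippet below assumes the FOAF vocabulary (its namespace and its Person class and name property are real), while the two people URIs and the `subjects_named` helper are hypothetical. Note that the name itself is stored as a plain text value rather than a URI.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/people/"  # hypothetical base URI for this sketch

# Two different people who share a name but have distinct URIs.
triples = {
    (EX + "bill-gates-1", RDF_TYPE, FOAF + "Person"),
    (EX + "bill-gates-1", FOAF + "name", "Bill Gates"),
    (EX + "bill-gates-2", RDF_TYPE, FOAF + "Person"),
    (EX + "bill-gates-2", FOAF + "name", "Bill Gates"),
}


def subjects_named(data, name):
    """Return the sorted URIs of every subject carrying the given foaf:name."""
    return sorted(s for (s, p, o) in data if p == FOAF + "name" and o == name)


matches = subjects_named(triples, "Bill Gates")
print(matches)
```

The name alone matches both people, which is exactly the ambiguity of a plain tag; the URIs are what keep the two apart.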

Here are a few popular ontologies, some specifically important to the Social Web:

You might have heard about RDFa and may be wondering about the differences between RDF and RDFa. As this is not an in-depth, technical discussion of Semantic Web technologies, I’ve glossed over much of the specifics. In brief, RDF (which comes in various serializations) is for machine consumption only whereas RDFa allows machine-readable data to be combined with human-readable data via the HTML format.
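To show roughly how RDFa mixes the two audiences, here is a small sketch that builds an HTML fragment in Python. The `about` and `typeof` attributes are part of RDFa; the `rdfa_span` helper and the example.org URIs are assumptions made up for this illustration.

```python
from html import escape


def rdfa_span(subject_uri, type_uri, label):
    """Wrap human-readable text in a span that also carries an RDF type statement."""
    return (
        f'<span about="{escape(subject_uri)}" typeof="{escape(type_uri)}">'
        f"{escape(label)}</span>"
    )


# Hypothetical URIs, invented for this illustration only.
fragment = rdfa_span(
    "http://example.org/company/Apple",
    "http://example.org/vocab#Company",
    "Apple",
)
print(fragment)
```

A person reading the page sees only the word Apple, while an RDFa-aware parser can read the attributes and recover the statement that this resource is a company.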

The Social Web: An Emergent Property of the Web of Data

When data are semantically encoded, with the proper technologies for their discovery put in place, they become exposed to the rest of the Web. This opens up the Web, creating a true information ecosystem.

Web 3.0 is thus about creating a Web of Data that is interconnected and open, whereas Web 2.0 is about creating network services that attract users to store more data on the Web but keep that data cloistered in closed silos. Usability, discoverability, portability, and user-control are an afterthought (and usually a not-at-all thought) to Web-2.0 boxes. To Web-3.0 smartups, these issues are integral to their service.

I’ve already discussed at length the differences between Social Networks and the Social Web. I will not rehash those details here. Suffice it to say that there is a big difference between these two concepts and their underpinning and differentiating technologies.

A keen smartup is a lean startup that wisely embraces the Social Web and the Web 3.0 paradigm.

The Social Web is an emergent property of the Web of Data. It is a logical outcome of the Web’s increasing social connectivity and the semantification of data to make it machine understandable, discoverable, and open.

Users are growing tired of having to reenter pieces of their social graph into each new Web-2.0-style social network that comes along. They’re also beginning to realize that they have few tools to effectively manage their identity across the Web. Those smartups that innovate with these user concerns in mind will profit the most from the Web-3.0 paradigm.

When startups think like smartups, they design their data architecture and utilize a data infrastructure that allows for the opening of their data to the rest of the Web. They focus on providing users a mechanism for controlling and managing their IdentitySpace via WebIDs and Web-based access control lists—both of these made possible in part by Semantic Web technologies.

As more datasets become semantically linked and open to machine autodiscovery, a critical mass will build, resulting in a true Web of Data. The ultimate actualization of this concept is sometimes referred to as the Giant Global Graph. This is the stage where a Global Meta-Database Management System (GMDBMS) emerges, where data stored in disparate locations can be globally queried and integrated into a federated database.
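A very small sketch can hint at why shared URIs make this kind of integration possible. The two "sites", their triples, and the helper functions below are all hypothetical; the point is only that datasets published separately can be merged and queried as one graph when they use the same URI for the same thing.

```python
EX = "http://example.org/"  # hypothetical base URI for this sketch

# Two datasets published independently by two imaginary sites.
site_a = {(EX + "alice", EX + "worksFor", EX + "acme")}
site_b = {(EX + "acme", EX + "locatedIn", EX + "Lisbon")}

# Because both use the same URI for "acme", merging is a plain set union.
merged = site_a | site_b


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples with this subject and predicate."""
    return sorted(o for (s, p, o) in triples if s == subject and p == predicate)


def employer_locations(triples, person):
    """Follow worksFor, then locatedIn, through whatever graph is given."""
    return sorted(
        place
        for employer in objects(triples, person, EX + "worksFor")
        for place in objects(triples, employer, EX + "locatedIn")
    )


print(employer_locations(merged, EX + "alice"))
print(employer_locations(site_a, EX + "alice"))
```

Neither site alone can answer where Alice's employer is located; the merged graph can, without either site having known about the other in advance.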

Now that you’re smartening up to the benefits and power of the Web of Data, the next step to explore is the Web 3.0 dataspace. This Friday, we venture into the technological challenges of data storage and management in the Social Web.

Semantic Web Resources

Since it is difficult to succinctly and accurately describe the Semantic Web in layman’s terms, I encourage you to read other sources and become well versed in the Semantic Web—its concepts, underlying technologies, and the ways in which your smartup can participate.

Here are a few additional resources that will help you become better versed in Web 3.0 and Semantic Web issues.

Books

Videos

Web Articles

_________________________________________________
</Smartups Series Part 2 of 5>

Continue on to Part 3—Web 3.0 Smartups: Moving Beyond the Relational Database

Other Smartup Series Installments

Part 1 — Web 3.0: Powering Startups to Become Smartups

Part 3 — Web 3.0 Smartups: Moving Beyond the Relational Database

Part 4 — Web 3.0 Smartups: the New Web Business Space

Part 5 — Building the Social Web: the Layers of the Smartup Stack

Article Comments

  1. zazi says:

    “RDF is for XML” – RDF/XML is just one serialisation of RDF graphs as RDFa is another one, or?

    • Jeff Sayre says:

      Thanks for the suggestion. Yes, that is another, perhaps more accurate way of describing the differences.

      As I mentioned, this is not a technical discussion of the SemWeb technologies. So, I’ve obviously left out great detail, choosing to present just enough to help those who are not fluent to begin building a knowledge base.

      There could be a whole series about RDF, its serializations, and the many ways in which it can be used to describe, unambiguously, the relations between data.

  2. iricelino says:

    Dear Jeff,
    Congratulations! I’ve been working with Semantic Web technologies for the last 8 years and this post is by far the _clearest_ attempt to explain the Semantic Web to non-expert people.

    I’m going to send the link to this page several times, every time somebody asks me what the Semantic Web is or every time somebody misunderstands what can be considered (or not) part of the Semantic Web.

    I have just two notes about the content of your post.

    (1) About RDF/RDFa you say: “In brief, RDF is for XML and RDFa is an HTML-friendly format.”.
    I understand that you were trying to say it briefly, but I believe this wording can be misleading, since most people mistake the RDF data model with the RDF/XML syntax.
    I’d suggest something like: “RDF is for machines while RDFa is to combine HTML human-readable pages with some machine-readable (RDF) data.”

    (2) In the section “The Semantics of Semantics” you don’t cite another very common misunderstanding: Semantic Web is different from Semantic Technologies. Most people think that the Semantic Web is about using natural language processing on the Web (or even on documents which are not on the Web) and they don’t understand that (most) Semantic Technologies are usually complementary to the Semantic Web. I believe that this clarification could be worth a couple of lines.

    Just my 2 cents,
    Irene

    • Jeff Sayre says:

      Irene –

      Thank you for the compliment and taking the time to share your suggestions. I appreciate your insights and feedback, as distilling down the Semantic Web into clear and easy-to-understand language is quite difficult.

      I have altered the RDF/RDFa text per your comment and added a brief section on semantic technologies.

  3. Great post. It certainly must’ve taken you quite a while to write.
    I love “The Semantic of Semantics” expression :)

    Regards
    – Analyn

    • Jeff Sayre says:

      Analyn –

      Thanks! Yes, I like that expression as well. It is amusing and odd that the definition of semantics is misunderstood by some.

  4. Warren says:

    Great article for those without deep knowledge in the world of semantics. I’ll look forward to the next section, as we just launched InfiniteGraph (graph database). I guess you’ll get into that area some eh? I’ve spoken to a number of people in the semantic application space and it seems like they’re starting to see users whose systems can get quite large, but the triple-store databases struggle with scale (I don’t know why, truthfully). We’re just a generic graph db, so I don’t get into these conversations too deeply, but it’s a place we’d like to go and I’m sure your next article will be very helpful. Thanks!

  5. Sareh says:

    Dear Jeff,
    I started studying the Web of Data a few months ago. I’m interested in link discovery, especially sameAs links. Could you explain more about this field or suggest some good resources?
    What are the current algorithms for discovering the same entities in different datasets?
    For example, there are 10,000 owl:sameAs links between DBpedia and the New York Times dataset!
    I want to know which algorithm was used to discover them.
    Surely it must be an automated record linkage algorithm!
    I’d like to know more about it :)
    Thanks a lot
