Web 3.0 Smartups: the Social Web and the Web of Data
By Jeff Sayre
<Smartups Series Part 2 of 5>
In the first installment of my Web 3.0 series, Powering Startups to Become Smartups, I presented a general overview of the Web’s evolving paradigm. I made the argument that today’s Web-based startups needed to step outside the current Web-2.0 box and think like a Web-3.0 company. By leveraging the power of Web 3.0, a commonplace startup could transform itself into a smartup.
In this second installment, I’m going to talk about what most people think of when they hear the term Web 3.0: the Semantic Web, or Web of Data. In the process, I hope to correct some common misconceptions about what the Semantic Web is and what it is not.
For those of you fluent in Semantic Web technologies, this article may seem simple. But I think it will still provide you with some useful ammunition for convincing hesitant parties to embrace the Web of Data. For those who hunger for more after reading this, I have provided a listing of additional resources at the end of this article—from general introductions to the Web of Data to detailed implementation guides.
A Tangled Web We Weave
Before I go into more detail, it is imperative that we define a few terms and concepts. The Web of Data goes by a number of names, with two competing movements marketing their vision for its implementation. Whereas I often use the term Semantic Web, I do so in the broadest sense, as a synonym for the term Web of Data. Outside the technologists’ debates, I believe that the concept of a Web of Data might be the most apt description of the foundation of Web 3.0. Why is that the case?
As a keen observer of nature all of my life, and as a trained scientist, I instantly understood the broader concepts of the Web of Data (a.k.a. Linked Data, the Semantic Web). The ecological web and the Web of Data are similar in concept. The Web of Data is humanity’s meta-food web, where homogeneous participants (i.e., humans, a single species) with varying, heterogeneous needs all produce and consume data in an interconnected, thriving, vibrant, and interdependent information ecosystem.
The unit of trade, the least common denominator, in Web 1.0 was the file or document. With Web 3.0, the Web of Data, the units of trade are data, the information that is contained within files or within databases. In the Web of Files, the links between documents are hyperlinks. In the Web of Data, the links, the threads that join the data, are URIs. This creates what Nova Spivack and Henry Story call hyperdata.
As I state in my article, A Flock of Twitters:
Web browsers navigate hypertext; Semantic Web applications navigate hyperdata—data that is encoded with semantic markup and interconnected to other semantically-coded data in other locations. So, whereas hypertext is text linking to other text (documents), hyperdata is data linking to other data.
Thus a key piece of Web 3.0 is the concept of the linking of data throughout and across the Web.
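To make the hypertext/hyperdata distinction concrete, here is a minimal Python sketch using only the standard library. Two hypothetical datasets, hosted at made-up example.org and example.net URIs, describe a person and some places. Because the object of one statement is itself a URI, a program can follow that link into the other dataset, much as a browser follows a hyperlink. The dictionary that stands in for "the Web" and every URI in it are assumptions for illustration. A real client would dereference each URI over HTTP.

```python
from collections import deque

# Hypothetical stand-in for the Web: each URI maps to the (predicate, object)
# statements published about it. All URIs here are made up for illustration.
WEB = {
    "http://example.org/people/alice": [
        ("http://example.org/vocab/name", "Alice"),
        ("http://example.org/vocab/livesIn", "http://example.net/places/paris"),
    ],
    "http://example.net/places/paris": [
        ("http://example.org/vocab/name", "Paris"),
        ("http://example.org/vocab/country", "http://example.net/places/france"),
    ],
    "http://example.net/places/france": [
        ("http://example.org/vocab/name", "France"),
    ],
}


def is_link(value: str) -> bool:
    """Crude test: http(s) strings are links to other data; anything else is a literal."""
    return value.startswith(("http://", "https://"))


def follow_hyperdata(start: str) -> list[str]:
    """Breadth-first walk from `start`, following URI-valued objects across datasets."""
    seen, queue, visited = {start}, deque([start]), []
    while queue:
        uri = queue.popleft()
        visited.append(uri)
        for _predicate, obj in WEB.get(uri, []):
            if is_link(obj) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return visited


print(follow_hyperdata("http://example.org/people/alice"))
```

Starting from Alice’s record, the walk reaches Paris and then France, even though those facts live in a different dataset. Literal values such as "Alice" are read but never followed.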
A final term that needs to be discussed before proceeding is the word semantics.
The Semantics of Semantics
When talking about the Semantic Web, there is a common misconception regarding the definition of “semantics” (yes, that is ironic). In a nutshell, semantics is about meaning. With regard to the Semantic Web, it refers to the meaning of data and the relationships between data.
It is not surprising, then, to see blog posts, forum threads, or even Twitter conversations where people seem to think that the Semantic Web is simply about tagging their posts, using microformats, or adding micro- or nanosyntax to their tweets. As a WordPress / BuddyPress developer, I find that when conversations about the Semantic Web come up, fellow developers often make the same mistake.
When people talk about tagging blog or social network content (within Twitter or Facebook, for instance), they are talking about what are traditionally called folksonomies. This is different from semantically tagging the upper-level data with an underlying layer of metadata.
Tagging lets users classify content. The resulting tags can be described as metadata meant to be seen by people. While a powerful concept, tagging has its drawbacks. For instance, as these two classic examples demonstrate, user-generated classification is often ambiguous.
When a user tags some content “Apple”, to what are they referring? Is it the fruit called apple, is it the company Apple, Inc., is it a tag for a picture of an apple tree, or is it someone’s nickname? There is no way to clearly determine the underlying meaning of that tag.
If a user refers to the person “Bill Gates” in a post, do they mean Bill Gates, the co-founder of Microsoft, or do they mean one of the many other people on Earth with the same name?
Most platforms that allow user-generated tagging do not filter tags or check for potentially redundant classifiers. For instance, it is not uncommon to see the tags “internet”, “web”, “interweb”, or even “Intertubes” used to refer to the same object. Although this redundancy might make content easier to find through user searches, it can also lead to confusion. Of course, we all know that the Web is only part of the Internet and that the term interweb is a tongue-in-cheek reference to those who do not know the difference.
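Both the ambiguity problem and the redundancy problem are easy to reproduce. In this minimal Python sketch, a hypothetical lookup table records the concepts that a free-text tag might denote. The DBpedia-style URIs and the example.org URI are there purely for illustration. The table itself is an assumption: it holds knowledge that a folksonomy never records. That is why a machine looking only at the tag strings cannot tell that "Apple" is ambiguous, or that "internet" and "Intertubes" are redundant.

```python
# Hypothetical mapping from free-text tags to the concepts they might denote.
# A folksonomy stores only the tag strings; these URIs are illustrative only.
CANDIDATE_MEANINGS = {
    "apple": {
        "http://dbpedia.org/resource/Apple",       # the fruit
        "http://dbpedia.org/resource/Apple_Inc.",  # the company
        "http://example.org/pets/apple",           # someone's pet
    },
    "internet": {"http://dbpedia.org/resource/Internet"},
    "intertubes": {"http://dbpedia.org/resource/Internet"},
}


def meanings(tag: str) -> set[str]:
    """Every concept a tag might denote (case-insensitive, as many tag clouds are)."""
    return CANDIDATE_MEANINGS.get(tag.strip().lower(), set())


def is_ambiguous(tag: str) -> bool:
    """A tag is ambiguous when more than one concept fits it."""
    return len(meanings(tag)) > 1


def concepts_covered(tags: list[str]) -> set[str]:
    """Collapse redundant tags: which distinct concepts do these strings refer to?"""
    covered: set[str] = set()
    for tag in tags:
        covered |= meanings(tag)
    return covered


print(is_ambiguous("Apple"))                               # True
print(len(concepts_covered(["internet", "Intertubes"])))  # 1
```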
With regard to Web 3.0, semantic tagging (also called markup or encoding) is not the same thing as user-generated tagging. Tags, while useful, do not provide sufficient metadata. They do not indicate the relationship between the tag and the object it references. Semantic tagging generates metadata to be seen by machines, not people.
In Web 3.0, both types of tagging will continue: the Web-2.0 practice of people tagging posts, pictures, and documents for the benefit of other people; and the semantic tagging of upper-level data with an underlying layer of metadata.
But user-generated content is meant for people to see. Machines have a difficult time “seeing and understanding” human-readable content. This results in the need for complex search algorithms just to squeeze out relatively useful search results. Furthermore, any associations between disparate datasets almost always have to be made by a human being.
Semantic tagging, on the other hand, helps to structure the upper-level data via an encoded metadata layer, thereby making it machine-readable, machine-processable, and machine-interpretable. This makes data more easily searchable and queryable, facilitating the autodiscovery of connections between data.
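The difference shows up as soon as you try to run a query. This Python sketch keeps the two layers side by side for three hypothetical posts. The first layer is the human-readable tags. The second is a metadata layer that links each post to a concept URI through the Dublin Core dcterms:subject property. The concept URIs are illustrative DBpedia-style identifiers, not a requirement.

```python
DC_SUBJECT = "http://purl.org/dc/terms/subject"      # Dublin Core "subject" term
APPLE_INC = "http://dbpedia.org/resource/Apple_Inc."  # illustrative concept URI
APPLE_FRUIT = "http://dbpedia.org/resource/Apple"     # illustrative concept URI

# Human-facing layer: titles and free-text tags (hypothetical posts).
POSTS = {
    "http://example.org/posts/1": {"title": "New laptop announced", "tags": ["Apple"]},
    "http://example.org/posts/2": {"title": "Best pie recipe", "tags": ["apple"]},
    "http://example.org/posts/3": {"title": "Quarterly earnings", "tags": ["AAPL"]},
}

# Machine-facing layer: one (subject, predicate, object) annotation per post.
ANNOTATIONS = [
    ("http://example.org/posts/1", DC_SUBJECT, APPLE_INC),
    ("http://example.org/posts/2", DC_SUBJECT, APPLE_FRUIT),
    ("http://example.org/posts/3", DC_SUBJECT, APPLE_INC),
]


def find_by_tag(text: str) -> list[str]:
    """String matching on user tags: the only option when no metadata layer exists."""
    return sorted(
        post for post, data in POSTS.items()
        if any(tag.lower() == text.lower() for tag in data["tags"])
    )


def find_by_subject(concept: str) -> list[str]:
    """Match on the machine-readable annotation instead of the tag text."""
    return sorted(s for s, p, o in ANNOTATIONS if p == DC_SUBJECT and o == concept)


print(find_by_tag("apple"))        # posts 1 and 2: company and fruit mixed, post 3 missed
print(find_by_subject(APPLE_INC))  # posts 1 and 3: only the company
```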
Why is semantic encoding beneficial? In the examples above, proper semantic encoding would provide a clear definition of what the user was referring to when they wrote Apple or Bill Gates. The meaning would not be ambiguous. So, what disambiguates the relationship between the word and the meaning?
The Vocabulary of the Web of Data
As I discussed above, semantic tagging (marking up) of upper-level data via an encoded metadata layer creates an additional layer of data for machine consumption. But what does this mean? How does it make data machine-readable, machine-processable, and machine-interpretable, and how does it facilitate data discovery?
In the Semantic Web, data are typically marked up using a stack of W3C standards, in particular the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF is a machine-processable data model for making statements about data (or other resources). OWL is a family of languages for defining vocabularies (ontologies) that describe classes of things and the unambiguous relationships between them.
The W3C stack provides a standardized way of encoding data without the need for a central controlling authority or proprietary software. This means that semantic markup does not depend on any particular database technology, and users have the flexibility to extend existing vocabularies or define new ones.
Through the use of globally unique names in the form of Uniform Resource Identifiers (URIs), RDF triples represent relationships between a subject and an object. By using different ontologies, different relationships between data can be described.
An RDF triple takes the form subject, predicate, object. In the simplest case, each component is a URI: the subject and object URIs identify the things being related, while the predicate URI names the relationship between them. (The object may instead be a literal value, such as a name or a number.) This relationship is defined in the classes and properties of a given ontology.
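As a rough illustration, the Python sketch below models a triple with three frozen dataclasses. It is a simplification rather than a full RDF implementation: it leaves out blank nodes, datatypes, and language tags. The example.org identifiers are hypothetical. The FOAF namespace and its knows and name properties are real.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    """A globally unique name for a thing or a relationship."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain data value, such as a name or a number, stored as a string."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # the object may be another resource or a literal value

    def __post_init__(self) -> None:
        # In this simplified model, subjects and predicates must always be URIs.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")


FOAF = "http://xmlns.com/foaf/0.1/"                # namespace of the FOAF vocabulary
alice = URI("http://example.org/people/alice#me")  # hypothetical person
bob = URI("http://example.org/people/bob#me")      # hypothetical person

knows = Triple(alice, URI(FOAF + "knows"), bob)             # links two resources
name = Triple(alice, URI(FOAF + "name"), Literal("Alice"))  # attaches a value
```

Keeping URI and Literal as separate types makes the structural rule explicit. A link to another resource and a plain value can both sit in the object position, but only URIs may appear as subject or predicate.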
Triples are stored in databases, in various formats, that are appropriately called triple stores. Data that are semantically encoded as RDF triples can be queried and discovered with SPARQL, the query language for RDF. Semantic Web-powered sites often expose their data to the rest of the Web via what is called a SPARQL endpoint.
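To show what a triple store and a query do at the most basic level, here is a toy Python version. Pattern matching with wildcards is the core idea behind SPARQL’s basic graph patterns, but this is only a sketch. It does not parse SPARQL, and the SPARQL_QUERY string is included for comparison only and is never executed. All people and URIs other than the FOAF namespace are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "http://example.org/people/alice#me"  # hypothetical identifiers
BOB = "http://example.org/people/bob#me"
CAROL = "http://example.org/people/carol#me"

# What a SPARQL endpoint would accept for the same question (not executed here).
SPARQL_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  <http://example.org/people/alice#me> foaf:knows ?friend .
  ?friend foaf:name ?name .
}
"""


class TripleStore:
    """A toy in-memory triple store; real stores add indexes, persistence, and SPARQL."""

    def __init__(self) -> None:
        self._triples: set[tuple[str, str, str]] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None) -> list[tuple[str, str, str]]:
        """Return triples fitting one pattern; None plays the role of a SPARQL variable."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
        )


def friend_names(store: TripleStore, person: str) -> list[str]:
    """Join two patterns by hand, mirroring the two lines inside SPARQL_QUERY."""
    names = []
    for _, _, friend in store.match(person, FOAF + "knows"):
        for _, _, name in store.match(friend, FOAF + "name"):
            names.append(name)
    return sorted(names)


store = TripleStore()
store.add(ALICE, FOAF + "knows", BOB)
store.add(ALICE, FOAF + "knows", CAROL)
store.add(BOB, FOAF + "name", "Bob")
store.add(CAROL, FOAF + "name", "Carol")
print(friend_names(store, ALICE))  # ['Bob', 'Carol']
```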
So, returning to the Apple example, we could indicate with the following simple triple that we were talking about Apple, Inc., and not the fruit or someone’s pet pangolin named Apple:
Subject: Apple
Predicate: is a type of
Object: company
Of course the above triple is not expressed in RDF form. The proper form would contain appropriate URIs for each component.
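For the curious, here is one way the Apple example might look as actual RDF. A small Python function writes the triple out in the N-Triples serialization. The predicate "is a type of" becomes rdf:type, RDF’s standard property for stating that a resource is an instance of a class. The DBpedia URIs for Apple, Inc. and for the class Company were chosen for illustration; any agreed-upon URIs would work.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # rdf:type
APPLE_INC = "http://dbpedia.org/resource/Apple_Inc."            # illustrative subject URI
COMPANY = "http://dbpedia.org/ontology/Company"                  # illustrative class URI


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize one all-URI triple as a single N-Triples line."""
    for uri in (subject, predicate, obj):
        # Spaces, angle brackets, and a few other characters are not allowed inside <...>.
        if any(ch in uri for ch in ' <>"{}|^`\\'):
            raise ValueError(f"not a valid IRI for N-Triples: {uri!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


print(to_ntriples(APPLE_INC, RDF_TYPE, COMPANY))
```

The printed line puts the three URIs in angle brackets, separates them with spaces, and ends the statement with a period.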
Note: There are other triple-based data models besides subject, predicate, object, for instance the entity-attribute-value (EAV) triple, or the node-edge-node structure of a graph database.
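These models can carry the same facts. This short Python sketch converts a list of subject-predicate-object triples into EAV rows and into a node-edge adjacency list, and converts the EAV rows back again. Prefixed names such as foaf:knows stand in for full URIs, and ex: is a hypothetical namespace.

```python
# Prefixed names are shorthand for full URIs; "ex:" is a hypothetical namespace.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
]


def to_eav(triples):
    """Entity-attribute-value: one row per fact, as in a single relational table."""
    return [{"entity": s, "attribute": p, "value": o} for s, p, o in triples]


def eav_to_triples(rows):
    """Reverse the mapping: every EAV row is already a triple in disguise."""
    return [(row["entity"], row["attribute"], row["value"]) for row in rows]


def to_graph(triples):
    """Node-edge-node: map each node to its outgoing edges, labelled by predicate.

    A graph database would more often store a literal such as "Alice" as a node
    property; it is kept as an edge target here for simplicity.
    """
    graph = {}
    for s, p, o in triples:
        graph.setdefault(s, []).append((p, o))
    return graph


print(to_eav(TRIPLES)[0])  # {'entity': 'ex:alice', 'attribute': 'foaf:knows', 'value': 'ex:bob'}
print(to_graph(TRIPLES))   # {'ex:alice': [('foaf:knows', 'ex:bob'), ('foaf:name', 'Alice')]}
```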
Relationships can be better defined and further refined by using additional ontologies (vocabularies). New ontologies can easily be created to provide a new set of classes and properties with which to describe relationships.
Here are a few popular ontologies, some of them especially important to the Social Web:
- Dublin Core
- FOAF (Friend of a Friend)
- SIOC (Semantically-Interlinked Online Communities)
- SKOS (Simple Knowledge Organization System)
- GoodRelations (Web Ontology for E-Commerce)
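Every term in these vocabularies is a URI under its own namespace, so one description can freely mix them without name collisions. The Python sketch below describes a hypothetical blog post and its author using terms from RDF, SIOC, Dublin Core, and FOAF. The namespaces and terms are real; the post and person URIs are made up. The sketch then reports which vocabularies the description draws on.

```python
# Real vocabulary namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
VOCABULARY_NAMES = {RDF: "RDF", DCTERMS: "Dublin Core", FOAF: "FOAF", SIOC: "SIOC"}

POST = "http://example.org/blog/web-of-data"   # hypothetical post
AUTHOR = "http://example.org/people/alice#me"  # hypothetical author

DESCRIPTION = [
    (POST, RDF + "type", SIOC + "Post"),           # SIOC: the resource is an online post
    (POST, DCTERMS + "title", "The Web of Data"),  # Dublin Core: its title (a literal)
    (POST, FOAF + "maker", AUTHOR),                # FOAF: who made it
    (AUTHOR, RDF + "type", FOAF + "Person"),       # FOAF: the maker is a person
    (AUTHOR, FOAF + "name", "Alice"),              # FOAF: the person's name (a literal)
]


def vocabulary_of(term: str) -> str:
    """Name the vocabulary whose namespace a URI falls under."""
    for namespace, name in VOCABULARY_NAMES.items():
        if term.startswith(namespace):
            return name
    return "unknown"


def vocabularies_used(triples) -> list[str]:
    """Vocabularies supplying the predicates, plus the classes named via rdf:type."""
    terms = [p for _, p, _ in triples]
    terms += [o for _, p, o in triples if p == RDF + "type"]
    return sorted({vocabulary_of(term) for term in terms})


print(vocabularies_used(DESCRIPTION))  # ['Dublin Core', 'FOAF', 'RDF', 'SIOC']
```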
You might have heard about RDFa and may be wondering about the differences between RDF and RDFa. As this is not an in-depth, technical discussion of Semantic Web technologies, I’ve glossed over many of the specifics. In brief, RDF is a data model whose serializations (RDF/XML, Turtle, and N-Triples, for example) are meant for machine consumption, whereas RDFa embeds RDF statements in HTML attributes, so that machine-readable data can sit alongside human-readable content in the same page.
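To show what combining machine-readable and human-readable data looks like in practice, this Python sketch renders a person as an HTML fragment that carries RDFa 1.1 attributes (vocab, resource, typeof, property). A browser displays the name and the link. An RDFa processor should extract roughly three statements from the same markup: the resource is a foaf:Person, it has a foaf:name, and it has a foaf:homepage. The function and the example values are hypothetical.

```python
from html import escape

FOAF = "http://xmlns.com/foaf/0.1/"


def person_as_rdfa(uri: str, name: str, homepage: str) -> str:
    """Render one FOAF Person as HTML whose attributes carry RDFa 1.1 statements."""
    return (
        f'<div vocab="{FOAF}" resource="{escape(uri)}" typeof="Person">\n'
        f'  <span property="name">{escape(name)}</span>\n'
        f'  <a property="homepage" href="{escape(homepage)}">{escape(homepage)}</a>\n'
        "</div>"
    )


print(person_as_rdfa("https://alice.example/#me", "Alice", "https://alice.example/"))
```

Passing values through html.escape keeps user-supplied names such as "Tom & Jerry" from breaking either the visible page or the embedded statements.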
The Social Web: An Emergent Property of the Web of Data
When data are semantically encoded and the technologies for discovering them are in place, those data become exposed to the rest of the Web. This opens up the Web, creating a true information ecosystem.
Web 3.0 is thus about creating a Web of Data that is interconnected and open, whereas Web 2.0 is about creating network services that attract users to store more data on the Web but keep that data cloistered in closed silos. Usability, discoverability, portability, and user control are an afterthought (and usually a not-at-all thought) for Web-2.0 boxes. For Web-3.0 smartups, these issues are integral to their service.
I’ve already discussed at length the differences between Social Networks and the Social Web. I will not rehash those details here. Suffice it to say that there is a big difference between these two concepts and their underpinning and differentiating technologies.
A keen smartup is a lean startup that wisely embraces the Social Web and the Web 3.0 paradigm.
The Social Web is an emergent property of the Web of Data. It is a logical outcome of the Web’s increasing social connectivity and of the semantification of data to make it machine-understandable, discoverable, and open.
Users are growing tired of having to reenter pieces of their social graph into each new Web-2.0-style social network that comes along. They’re also beginning to realize that they have few tools to effectively manage their identity across the Web. Those smartups that innovate with these user concerns in mind will profit the most from the Web-3.0 paradigm.
When startups think like smartups, they design their data architecture and choose a data infrastructure that opens their data to the rest of the Web. They focus on giving users a mechanism for controlling and managing their IdentitySpace via WebIDs and Web-based access control lists, both made possible in part by Semantic Web technologies.
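Here is a heavily simplified sketch of how a WebID and a Web-based access control list can fit together. A WebID is a URI that names a person (for example https://bob.example/profile#me), and authorization rules can themselves be stored as triples. The property names below come from the W3C Web Access Control vocabulary (acl:accessTo, acl:agent, acl:mode). The rules, people, and resources are hypothetical. The check ignores authentication, groups, the acl:Authorization type declaration, and the acl:Append and acl:Control modes.

```python
ACL = "http://www.w3.org/ns/auth/acl#"
PHOTOS = "https://alice.example/photos/"    # hypothetical resource
ALICE = "https://alice.example/profile#me"  # hypothetical WebIDs
BOB = "https://bob.example/profile#me"

# Two authorization rules; each is a cluster of triples sharing a local rule ID (hypothetical).
RULES = [
    ("#owner", ACL + "accessTo", PHOTOS),
    ("#owner", ACL + "agent", ALICE),
    ("#owner", ACL + "mode", ACL + "Read"),
    ("#owner", ACL + "mode", ACL + "Write"),
    ("#friends", ACL + "accessTo", PHOTOS),
    ("#friends", ACL + "agent", BOB),
    ("#friends", ACL + "mode", ACL + "Read"),
]


def is_allowed(webid: str, resource: str, mode: str) -> bool:
    """True if any single rule grants `webid` the given mode ("Read", "Write") on `resource`."""
    by_rule: dict[str, set[tuple[str, str]]] = {}
    for rule, predicate, obj in RULES:
        by_rule.setdefault(rule, set()).add((predicate, obj))
    needed = {(ACL + "accessTo", resource), (ACL + "agent", webid), (ACL + "mode", ACL + mode)}
    return any(needed <= statements for statements in by_rule.values())


print(is_allowed(BOB, PHOTOS, "Read"))     # True
print(is_allowed(BOB, PHOTOS, "Write"))    # False
print(is_allowed(ALICE, PHOTOS, "Write"))  # True
```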
As more datasets become semantically linked and open to machine autodiscovery, a critical mass will build, resulting in a true Web of Data. The ultimate actualization of this concept is sometimes referred to as the Giant Global Graph. This is the stage where a Global Meta-Database Management System (GMDBMS) emerges, where data stored in disparate locations can be globally queried and integrated into a federated database.
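The idea of querying across disparate locations can be sketched in a few lines. Below, two hypothetical sites publish triples about the same WebIDs. Neither site alone can answer "which photos show people Alice knows?", but their union can. Copying everything into one set is the crudest possible stand-in for a federated database. Real systems, such as SPARQL 1.1 Federated Query with its SERVICE keyword, send sub-queries to remote endpoints instead.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ALICE = "https://alice.example/profile#me"  # hypothetical WebIDs
BOB = "https://bob.example/profile#me"
DAVE = "https://dave.example/profile#me"

# Two independently published (hypothetical) datasets that happen to share URIs.
SOCIAL_SITE = {
    (ALICE, FOAF + "knows", BOB),
}
PHOTO_SITE = {
    ("https://photos.example/p/42", FOAF + "depicts", BOB),
    ("https://photos.example/p/43", FOAF + "depicts", DAVE),
}


def merge(*datasets: set) -> set:
    """Naive federation: take the union of every dataset's triples."""
    merged = set()
    for dataset in datasets:
        merged |= dataset
    return merged


def photos_of_friends(graph: set, person: str) -> list[str]:
    """Photos depicting anyone `person` knows; needs facts from both sites."""
    friends = {o for s, p, o in graph if s == person and p == FOAF + "knows"}
    return sorted(s for s, p, o in graph if p == FOAF + "depicts" and o in friends)


print(photos_of_friends(SOCIAL_SITE, ALICE))                     # []
print(photos_of_friends(PHOTO_SITE, ALICE))                      # []
print(photos_of_friends(merge(SOCIAL_SITE, PHOTO_SITE), ALICE))  # ['https://photos.example/p/42']
```

The join works only because both sites use the same URI for Bob. Shared identifiers are what let independently published data be integrated without any prior coordination.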
Now that you’re smartening up to the benefits and power of the Web of Data, the next step to explore is the Web 3.0 dataspace. This Friday, we venture into the technological challenges of data storage and management in the Social Web.
Semantic Web Resources
Since it is difficult to succinctly and accurately describe the Semantic Web in layman’s terms, I encourage you to read other sources and become well versed in the Semantic Web—its concepts, underlying technologies, and the ways in which your smartup can participate.
Here are a few additional resources that will help you dig deeper into Web 3.0 and Semantic Web issues.
- The Social Semantic Web
- Pull: The Power of the Semantic Web to Transform Your Business
- Programming the Semantic Web
- Web 3.0, a film by Kate Ray (a highly recommended introduction to the Web of Data)
- The Future Internet Video: Service Web 3.0 (an excellent animated explanation)
- TED Talk: Tim Berners-Lee on the next Web
- Making Sense of the Semantic Web
- RDFa Basics
- Looking ahead to Linked Data on the Web
- W3C Semantic Web FAQ
- What is the Semantic Web really all about?
- Understanding Linked Data via EAV Model based Structured Descriptions
</Smartups Series Part 2 of 5>
Continue on to Part 3—Web 3.0 Smartups: Moving Beyond the Relational Database