Subverting the Open Web: Schema.org’s Scheme to Control Structured Data
By Jeff Sayre
When the initial news about Schema.org hit the Twitterverse two weeks ago, a few people asked for my opinion. Being the responsive, diligent social-media maven that I am–who has close to zero free nanoseconds–I took a pathetically cursory look at Google’s announcement and at the Schema.org website and quickly tweeted back this less-than-thoughtful response.
Over the next few hours it became clear that some people in the Semantic Web and Open Web Standards world had some initial misgivings about the Schema.org initiative. Although what I had tweeted was accurate–I am a big proponent of structured data on the Web and I believe efforts to make it more mainstream are necessary and historically have usually been worthwhile–I obviously had not done my homework and perhaps had replied too hastily.
Since much has already been said and written about why Schema.org is or is not good for the Web, I will not rehash those debates (although I have linked to a few resources below that are in line with my views). Instead, I want to focus on the higher-level concepts of the Open Web and Open Web Standards.
Open Web and Open Web Standards
First, it is important to see the differences between these two concepts. To help understand the differences, let’s look at an example. The WordPress blogging platform is an Open Source project. As such, it can squarely be placed in the Open Web camp. Its codebase is freely available for anyone to see, utilize, adapt, and expand upon. The project is also open and very supportive of new people who wish to pitch in and help evolve the platform.
However, neither the WordPress project nor its core codebase can be classified as fitting into the Open Web Standards community. Whereas WordPress utilizes a number of Open Web Standards in its products, the project itself does not create standards for the Web.
Who then creates standards? On the Web, standards are not promulgated by for-profit corporations (e.g., Google, Microsoft, and Yahoo!). They are also not promulgated by open source projects. Instead, they are promulgated through standards bodies, like the World Wide Web Consortium (W3C). Whereas it is true that much of the W3C member base is made up of people who represent corporations, there is a big difference between representatives from corporations participating on standards committees and corporations getting together to push their own set of standards.
In my view, this latter point is the big issue with Schema.org. It is an attempt by three large companies, who each have significant influence on the Web, to promulgate their joint vision of how structured data on the Web should be modeled. It is not an effort by a recognized standards body whose focus is clearly to further the Open Web.
How Open is Schema.org?
Just by reading the linked-to announcement by Google in the first paragraph of this article, it is clear that the joint venture has decided to bypass much of the standards work hashed out by various Semantic Web working groups and that they are making a move to control how machine-readable data is structured—at least within their search-engine world. In particular, this paragraph makes a strong statement:
That’s why we’ve come together with other search engines to support a common set of schemas…With schema.org, site owners can improve how their sites appear in search results not only on Google, but on Bing, Yahoo! and potentially other search engines as well in the future
Further on down Google’s announcement, you’ll find this point that should raise concern:
One caveat to watch out for: while it’s OK to use the new schema.org markup or continue to use existing microformats or RDFa markup, you should avoid mixing the formats together on the same web page, as this can confuse our parsers
How should one interpret this language? To me, it sounds like this is what they are saying:
This is how Google, Bing, and Yahoo! prefer and recommend structured data be represented on the Web. In fact, if you use Schema.org’s structured-data format, your content will get preferential treatment (or at least experience unique benefits) in our search results. However, do not mix markup formats as it will confuse our parsers and hurt your page rankings in our search-engine algorithms. It is best if you simply stick with Schema.org’s markup format.
With the power and influence these three search behemoths exert on the success of Web properties, webmasters, designers, and developers might be foolish to ignore this new initiative. More importantly, if they choose to ignore Schema.org, for instance by using some other format to represent machine-readable data, it could be to the detriment of their clients and projects.
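To make the caveat about mixing formats concrete, here is a minimal sketch in Python (standard library only) of how one might check which structured-data syntaxes appear on a page. The names FormatDetector and detect_formats are hypothetical, and the attribute and class names it looks for are a small illustrative subset of each syntax that I chose for this example. It is not a description of how Google's, Bing's, or Yahoo!'s parsers actually work.

```python
from html.parser import HTMLParser

# Illustrative subsets only (assumptions for this sketch, not complete lists).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"typeof", "property", "vocab"}
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview", "hentry"}


class FormatDetector(HTMLParser):
    """Records which structured-data syntaxes show up in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("microdata")
        if names & RDFA_ATTRS:
            self.found.add("rdfa")
        for name, value in attrs:
            if name == "class" and value:
                if set(value.split()) & MICROFORMAT_CLASSES:
                    self.found.add("microformats")


def detect_formats(html):
    """Return the set of syntaxes detected in the given HTML string."""
    detector = FormatDetector()
    detector.feed(html)
    detector.close()
    return detector.found


page = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span class="vcard"><span property="name">Jane</span></span>'
    "</div>"
)
formats = detect_formats(page)
if len(formats) > 1:
    print("Mixed formats on one page:", sorted(formats))
```

A page that returns more than one syntax is the kind of page the announcement warns about. Because the check is heuristic, it can misfire: for instance, a class named vcard used purely for styling would still be counted as a microformat.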
Note: If you read the official documentation on the Schema.org site and the Terms and Conditions section, it becomes clear why observant developers have additional concerns with the trio’s power play.
By creating a completely new set of markup types, they are in essence subverting the Open Web, as only those people associated with Schema.org (select employees of Google, Microsoft, and Yahoo!) are able to alter existing data model types and add new ones. Thus, although they claim that their goal is to “continue making the open web richer and more useful,” Schema.org’s schema is not truly open. How does pushing a currently closed data schema support the Open Web?
The Best Way to Support the Open Web is by Adopting Existing Standards
With the launch of Schema.org, Google, Microsoft, and Yahoo! are thumbing their collective noses at not only the Open Web but also the Open Web Standards community. There have been literally tens of thousands of hours over the past decade volunteered by hundreds of people across the globe to develop a myriad of Open Web Standards—standards that have helped Google, Microsoft, and Yahoo! become successful at what they do. With respect to this issue, prodigious efforts and solid progress have been made at developing Open Web Standards for structured data.
If Google, Microsoft, and Yahoo! truly wish to make the “open web richer and more useful,” they should adopt, support, promote, and help evolve existing Open Web Standards for the representation of machine-readable data. That would be in keeping with the true spirit of the Open Web.