
Subverting the Open Web: Schema.org’s Scheme to Control Structured Data

By Jeff Sayre

When the initial news about Schema.org hit the Twitterverse two weeks ago, a few people asked for my opinion. Being the responsive, diligent, social-media maven that I am–who has close to zero free nanoseconds–I took a pathetically cursory look at Google’s announcement and at the Schema.org website and quickly tweeted back a less-than-thoughtful response.

Over the next few hours it became clear that some people in the Semantic Web and Open Web Standards world had some initial misgivings about the Schema.org initiative. Although what I had tweeted was accurate–I am a big proponent of structured data on the Web and I believe efforts to make it more mainstream are necessary and historically have usually been worthwhile–I obviously had not done my homework and perhaps had replied too hastily.

Since much has already been said and written about why Schema.org is or is not good for the Web, I will not be rehashing those debates (although I have linked to a few resources below that are in line with my views). Instead, I want to focus on the higher-level concepts of the Open Web and Open Web Standards.

Open Web and Open Web Standards

First, it is important to see the differences between these two concepts. To help understand the differences, let’s look at an example. The WordPress blogging platform is an Open Source project. As such, it can squarely be placed in the Open Web camp. Its codebase is freely available for anyone to see, utilize, adapt, and expand upon. The project is also open and very supportive of new people who wish to pitch in and help evolve the platform.

However, neither the WordPress project nor its core codebase can be classified as fitting into the Open Web Standards community. Whereas WordPress utilizes a number of Open Web Standards in its products, the project itself does not create standards for the Web.

Who then creates standards? On the Web, standards are not promulgated by for-profit corporations (e.g., Google, Microsoft, and Yahoo!). They are also not promulgated by open source projects. Instead, they are promulgated through standards bodies, like the World Wide Web Consortium (W3C). Whereas it is true that much of the W3C member base consists of people who are representing corporations, there is a big difference between representatives from corporations participating on standards committees and corporations getting together to push their own set of standards.

In my view, this latter point is the big issue with Schema.org. It is an attempt by three large companies, who each have significant influence on the Web, to promulgate their joint vision of how structured data on the Web should be modeled. It is not an effort by a recognized standards body whose focus is clearly to further the Open Web.

How Open is Schema.org?

Just by reading the linked-to announcement by Google in the first paragraph of this article, it is clear that the joint venture has decided to bypass much of the standards work hashed out by various Semantic Web working groups and that they are making a move to control how machine-readable data is structured—at least within their search-engine world. In particular, this paragraph makes a strong statement:

That’s why we’ve come together with other search engines to support a common set of schemas…With schema.org, site owners can improve how their sites appear in search results not only on Google, but on Bing, Yahoo! and potentially other search engines as well in the future

Further on down Google’s announcement, you’ll find this point that should raise concern:

One caveat to watch out for: while it’s OK to use the new schema.org markup or continue to use existing microformats or RDFa markup, you should avoid mixing the formats together on the same web page, as this can confuse our parsers

How should one interpret this language? To me, it sounds like this is what they are saying:

This is how Google, Bing, and Yahoo! prefer and recommend structured data be represented on the Web. In fact, if you use Schema.org’s structured-data format, your content will get preferential treatment (or at least experience unique benefits) in our search results. However, do not mix markup formats as it will confuse our parsers and hurt your page rankings in our search-engine algorithms. It is best if you simply stick with Schema.org’s markup format.

With the power and influence these three search behemoths exert on the success of Web properties, webmasters, designers, and developers might be foolish to ignore this new initiative. More importantly, if they choose to ignore schema.org, for instance by using some other format with which to represent machine-readable data, it could be to the detriment of their clients and projects.
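To make the "do not mix formats" caveat concrete, here is a rough sketch of how a developer might check which structured-data syntaxes a page already uses. It relies only on Python's standard-library HTML parser. The attribute and class-name sets are my own simplification for illustration, not an exhaustive list drawn from any specification, and the function name detect_formats is hypothetical. It says nothing about how the search engines' own parsers actually behave.

```python
from html.parser import HTMLParser

# Attribute names that hint at each syntax. These sets are assumptions made
# for this sketch; they are deliberately small and not exhaustive.
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"typeof", "property", "vocab", "about"}
# A few well-known microformat root class names (hCard, hCalendar, hReview).
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview"}


class FormatDetector(HTMLParser):
    """Collects the structured-data syntaxes seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.formats = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for
        # valueless attributes such as a bare "itemscope".
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.formats.add("microdata")
        if names & RDFA_ATTRS:
            self.formats.add("rdfa")
        for name, value in attrs:
            if name == "class" and value:
                if set(value.split()) & MICROFORMAT_CLASSES:
                    self.formats.add("microformats")


def detect_formats(html):
    """Return the set of syntaxes found in the given HTML string."""
    detector = FormatDetector()
    detector.feed(html)
    detector.close()
    return detector.formats


# Hypothetical page fragment that mixes microdata with an hCard microformat.
MIXED_PAGE = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Jane</span></div>'
    '<div class="vcard"><span class="fn">Jane</span></div>'
)

if __name__ == "__main__":
    found = detect_formats(MIXED_PAGE)
    if len(found) > 1:
        print("Page mixes formats:", ", ".join(sorted(found)))
```

A page for which this reports more than one syntax is the kind of page the announcement warns about. Whether such a page is actually penalized is something only the search engines can say.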

Note: If you read the official documentation on the Schema.org site and the Terms and Conditions section, it becomes clear why observant developers have additional concerns with the trio’s power play.

By creating a completely new set of markup types, they are in essence subverting the Open Web, as only those people associated with Schema.org (select employees of Google, Microsoft, and Yahoo!) are able to alter existing data model types and add new ones. Thus, although they claim that their goal is to “continue making the open web richer and more useful,” Schema.org’s schema is not truly open. How does pushing a currently closed data schema support the Open Web?

The Best Way to Support the Open Web is by Adopting Existing Standards

With the launch of Schema.org, Google, Microsoft, and Yahoo! are thumbing their collective noses at not only the Open Web but also the Open Web Standards community. There have been literally tens of thousands of hours over the past decade volunteered by hundreds of people across the globe to develop a myriad of Open Web Standards—standards that have helped Google, Microsoft, and Yahoo! become successful at what they do. With respect to this issue, prodigious efforts and solid progress have been made at developing Open Web Standards for structured data.

If Google, Microsoft, and Yahoo! truly wish to make the “open web richer and more useful,” they should adopt, support, promote, and help evolve existing Open Web Standards for the representation of machine-readable data. That would be in keeping with the true spirit of the Open Web.

Outside Resources

The False Choice of Schema.org

TummelVision 67: Tantek Çelik explains open web standards for poets

Article Comments

  1. Hugo - hnla says:

    there is a big difference between representatives from corporations participating on standards committees and corporations getting together to push their own set of standards.

    While that may be true, historically the W3C working groups have been crowded out by the large organisations. In many respects they occupy seats on those groups to ensure that they can bring as much influence as possible to bear on what is eventually ratified or, as many believed, to run interference where things might not be going the way they liked; to wit, MS on the CSS3 group, where they were largely responsible for holding back CSS for years.

    Also, and playing devil’s advocate here, to some degree Standards are initially envisioned and promoted by large organisations. The W3C doesn’t create standards; it provides the forum where the Standard may be debated and agreed on, and it falls to organisations to have the initial ideas.

    However, personally I have always harangued anyone who would listen about the danger of monolithic corporations on the net such as Google, about their real intentions, and about my dismay at the manner in which they are slavishly followed by many.

    At the end of the day it will perhaps fall to developers not to use these new data model types and to try to send a message that these Standards must remain open and not be subverted by the likes of Google and Microsoft. They belong to the W3C and its working groups to debate and ratify. This all brings to mind the situation with HTML5 and the odd notion that you have two (competing?) bodies working on it yet one leader, but that’s another story.

    • Jeff Sayre says:

      What you state is true. The W3C membership is primarily composed of member companies–corporations that pay rather high fees to enable select employees to participate in the various activities of the organization. Corporations see value in their membership as it allows them to effect change in current standards and shape developing standards to suit their for-profit agendas.

      In fact, without being “sponsored” by a corporation, there is no way for an individual to join the W3C. An individual “joins” via being accepted as an invited expert. This is the capacity in which I serve on two W3C Incubator Groups. But even as an invited expert, I technically am not a W3C member.

      As with any body composed of representatives from outside organizations, each with their own ulterior motives, the standards process of a W3C group is riddled with politics. Even so, the process usually results in a more impartial outcome as a given group’s activity is truly open to all. Outside guests can attend any group’s discussions via IRC, offering their input and even contributing to working drafts.

      Although Web standards-making bodies are far from perfect, they are the closest thing we have to a forum for open discussion. This is in stark contrast to what Google, Bing, and Yahoo! have done in setting up Schema.org. Their process is not open and they cannot be considered a Web standards body.

  2. […] June 15 – Jeff Sayre – Subverting the Open Web: Schema.org’s Scheme to Control Structured Data […]
