The future of microformats and semantic technologies
By Alfie • Jan 12th, 2009 • Category: Features
discipline+technology ÷ current state = reality
Google has found over 1 trillion unique URLs (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html). That’s a lot of text! But what does that text tell us? A human can read the information with ease, but for a machine it is nearly impossible to actually understand the text.
If we look at the following statement:
Fruit flies like bananas.
You probably see nothing wrong with it, but a computer can’t make heads or tails of what it says. A computer might parse “Fruit” as the noun and “flies” as the verb, as if fruit flew the way bananas do, rather than reading “fruit flies” as insects that are fond of bananas. While a computer is good at tasks like spell checking, it hasn’t managed to understand the meaning behind the text.
This is where microformats (http://microformats.org/) step in, adding an additional layer of semantics to your HTML. They encode very common idioms using existing HTML constructs and existing, established formats. Microformats are one of the easiest ways to learn how to paint on additional meaning so that computers can begin to understand your text.
Sites all over the web have embraced microformats and have encoded their HTML with more meaning. This has several interesting side effects. For one, by giving more meaning to the HTML, it becomes conceivable that companies such as Google, Yahoo!, Technorati and others spidering the web can index not only the text but also the information behind it. This could allow for even smarter customized searches.
Two of the more popular microformats are hCard (http://microformats.org/wiki/hcard) and hCalendar (http://microformats.org/wiki/hcalendar). hCard is modeled on the vCard specification and is used to describe people, places and organizations. vCard is the most common way to share contact details between address books. hCalendar is modeled on the iCalendar format, which is used to represent events and is the most common interchange format between calendar applications.
With an agreed-upon format for describing people in HTML, you could save time filling out forms. Instead of constantly re-typing your name and address, you could simply enter a URL and the service would fetch the page, parse it for your name and information and populate the form automatically.
Events work in a similar fashion. As you mark up your HTML with event data and services crawl the web, it becomes possible to find local events through search engines instead of a single service like upcoming.org or meetup.com. Enter your postal code and some search terms, and you get back all the related information in your area.
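As a rough illustration of what marking up event data can look like, here is a minimal hCalendar sketch. The class names (vevent, summary, dtstart, location) come from the hCalendar specification; the event itself, its date and its venue are invented for this example.

```html
<!-- Hypothetical event: the name, date and venue are made up for illustration -->
<div class="vevent">
  <!-- summary: the human-readable name of the event -->
  <span class="summary">Web Standards Meetup</span> on
  <!-- dtstart: the machine-readable ISO 8601 start date sits in the title attribute -->
  <abbr class="dtstart" title="2009-02-14">14 February 2009</abbr> at
  <!-- location: where the event takes place -->
  <span class="location">Example Hall, Springfield</span>
</div>
```

A person reading the page sees an ordinary sentence, while a crawler that knows hCalendar can pick out the event name, start date and location.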
This is all part of the bigger picture of moving from a web of individual websites that connect you to secret, proprietary back-end databases, to an inversion where the web itself is the database. As Tom Coates argues, it becomes a web of data (Link?). Sites that encode their HTML using microformats could become data sources for other applications.
discipline+technology+speculation/reflection ÷ likely hurdles=future 1 (body)
Alan Turing described the famous Turing Test in his 1950 paper “Computing Machinery and Intelligence”, and in the nearly 60 years since, not a single computer program has been able to fully pass itself off as human and parse written conversational language. With the explosion of blogs, the daily minutiae of the world are online. What hope is there for a computer to be able to parse those trillions of pages and understand what is truly being said?
This is where microformats come in to help bootstrap solutions to some of these issues. Although the field of AI continues to improve, we still don’t have a natural language parser as effective as a human brain. If microformats and similar technologies can help fill the gaps, then the machine can begin to understand what it is parsing and return better results.
The future of microformats is bright: they make it so simple to encode your data that there is no reason not to. By tackling very common facets of the web, such as people, places and events, microformats have helped to break the chicken-and-egg problem: “Why should I mark up my data if no one else is?” or “I’m not going to mark up my data if there are no tools to extract it”.
Luckily the menagerie of tools is copious and being extended every day. Firefox has the Operator toolbar (http://labs.mozilla.com/2006/12/introducing-operator/), which can detect and act on any microformatted information found in the page. It is called Operator because the paradigm is that of the old-school telephone operators, who would patch one call to another, constantly switching physical cables to complete the connection. This plug-in does the same sort of thing, moving data from the HTML to another service. With one click you can extract an organization’s business card and import it into your address book, or you can plot the address on a map. Acting like an operator, the browser begins to re-route the bits, allowing you to re-mix and mash them up easily. Microsoft have also shown interest and are developing some tools to be integrated into IE8, as well as releasing Oomph (http://visitmix.com/Lab/Oomph).
The most interesting applications for microformats come on small-screen and non-traditional computing devices. On a mobile phone, re-keying someone’s contact information is not fun and is prone to data-entry errors. With one click, copying information into the address book or adding an event to your calendar is a snap. We are beginning to see some forays into this arena with Mosembro (http://lexandera.com/mosembro/), a browser for the Android platform capable of detecting and extracting microformats natively.
discipline+technology+speculation/reflection ÷ ideal situation=future 2 (extro)
The best part about adding microformats to your data is not all the reasons you can think of, but all the reasons you can’t. If you give meaning to your text, then others will use it in ways you never expected, creating and adding more value to the data you produce. This is really where the future is: sharing knowledge and building tools to make the world a better place through the open exchange of information.
Microformats are bootstrapping these capabilities to allow for this open movement of data. O’Reilly Media has been a big proponent of the idea “Create more value than you capture”, and microformats allow you to do this. You are creating value in your data by giving it meaning, as well as creating value when you add more meaning to other people’s data. When you add tags to pictures on Flickr, when you comment on people’s blogs, when you bookmark a link in del.icio.us, you are continually adding value to the site and to the web. Others can act on that information in new ways that you never predicted.
future1 ÷ future 2 + action pathways= activities (action plan)
Microformats are not an all-or-nothing scenario: in the HTML you just mark up the text you are comfortable with. Microformats are not hidden away in some database or some other flat file on the server that you forget about; they are in plain view through a browser window. Things that are out of sight are also out of mind, and microformats attempt to keep the data fresh by keeping it as transparent as possible.
On my own website, there are plenty of instances of the string “Brian Suda”. To a computer that is just a bunch of characters, but a human will recognize that string as a name, which represents a person. With the hCard microformat it is possible to make that jumble of letters explicitly a person.
<div><a href="http://suda.co.uk">Brian Suda</a></div>
First, we need to add some values to the class attribute to define some types. We add class="vcard" to the outer <div>. This tells a parser that the data inside the <div> should be examined for further semantic information about the hCard.
<div class="vcard"><a href="http://suda.co.uk">Brian Suda</a></div>
Then we need to add a class attribute to the <a> element with the value "fn", which stands for “formatted name”. Declaring this embiggens the string from just characters to the name of a person.
<div class="vcard"><a class="fn" href="http://suda.co.uk">Brian Suda</a></div>
That is the most basic hCard you can create. The jumble of characters has now been converted into a person object that search engines and others can understand and dutifully act on.
If you have your own website, take a moment to mark up your name in a similar fashion and create your own hCard. If you want to go further and mark up additional text besides the FN, you can learn more on the hCard specification page (http://microformats.org/wiki/hcard).
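To give a feel for what marking up more than a name might look like, here is a slightly fuller hCard sketch. The class names (url, org, adr, locality, country-name, email, tel) are properties defined by the hCard specification; the person and every value are placeholders invented for this example.

```html
<!-- Hypothetical person: every value below is a placeholder -->
<div class="vcard">
  <!-- One element can carry several properties: here the formatted name and the URL -->
  <a class="fn url" href="http://example.com/">Jane Example</a>
  <!-- org: the organization the person belongs to -->
  <span class="org">Example Company</span>
  <!-- adr: a structured address built from sub-properties -->
  <div class="adr">
    <span class="locality">Springfield</span>,
    <span class="country-name">Exampleland</span>
  </div>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="tel">+1 555 0100</span>
</div>
```

You can add these properties one at a time; a card with only some of them is still a valid hCard as long as the name is present.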
Sites like GetSatisfaction.com make use of microformats when you sign up (http://getsatisfaction.com/people/new). Instead of having to re-type your information, you point to the URL of your hCard and the system will use the data found on that page to begin the account creation process.
Microformats are good for more than saving you a few minutes re-typing your name. Dopplr and other sites are using hCards to help you import your friend list. Instead of trying to re-find all your contacts over and over again with every new social network you join, you can point Dopplr to an existing URL with hCards. It will parse those values and search its internal database to find and recommend matches. Much of this and other code has been open-sourced.
As you develop or re-develop web applications, this is a feature that you can easily build into your system to begin accessing the web of data, using it to your advantage and streamlining processes for your customers.
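As a minimal sketch of the idea, and not how GetSatisfaction or Dopplr actually implement it, the page below reads an hCard and uses it to pre-fill a form. The hCard values and the form field ids (signup-name, signup-url) are hypothetical, and the card lives on the same page to keep the example self-contained. A real service would more likely fetch and parse the remote page on the server.

```html
<!-- Hypothetical hCard: the values are placeholders -->
<div class="vcard">
  <a class="fn url" href="http://example.com/">Jane Example</a>
</div>

<!-- Hypothetical sign-up form: the field ids are made up for this sketch -->
<form>
  <input id="signup-name" type="text">
  <input id="signup-url" type="text">
</form>

<script>
  // Find the first hCard on the page, then the properties inside it.
  var card = document.querySelector(".vcard");
  var fn = card ? card.querySelector(".fn") : null;
  var url = card ? card.querySelector(".url") : null;

  if (fn) {
    // The formatted name is the text content of the "fn" element.
    document.getElementById("signup-name").value = fn.textContent;
  }
  if (url) {
    // For an <a> element, the URL value comes from its href attribute.
    document.getElementById("signup-url").value = url.getAttribute("href");
  }
</script>
```

The script only looks inside the element marked vcard, which mirrors the rule that the outer class scopes where a parser searches for hCard properties.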
We are representing more and more of our lives online. We are known by our usernames and our websites. One of the goals we should all strive for is to have an online representation of ourselves that is semantic enough to be read easily by both machines and humans. Microformats are the first step towards this. Let’s all begin to invert those proprietary databases and create an open web of data.
—————–
Brian Suda is a master informatician working to make the web a better place little by little every day. Since discovering the Internet in the mid-90s, Brian Suda has spent a good portion of each day connected to it. His own little patch of Internet is http://suda.co.uk, where many of his past projects and crazy ideas can be found.

Alfie is a web and mobile troublemaker.