The XFN microformat and a spider in Python

XFN is one of the more widely used microformats. It stands for XHTML Friends Network and is used to mark up links to the pages of your contacts and friends. You can also use XFN to mark links to your own pages (e.g. to link all your profiles together).

Doing that is quite easy: you simply add the HTML attribute rel="me" to the link tag, e.g.

<a href="http://flickr.com/photos/mrtopf" rel="me">my flickr pages</a>

If I have this on my profile page, it says that this flickr URL is also one of my pages. That way a spider can collect all the information about me. In my case I prepared a page on my site which can serve as a starting point for such a spider, linking to many of my profiles.
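
To make this concrete, here is a minimal sketch of how a spider could pull the rel="me" links out of a page. It is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, so it is not the code from my scripts; the names RelLinkParser and find_me_links are made up for this example.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Collect (href, rel values) pairs from <a> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, set of rel values)

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rel = attrs.get("rel")
        if href and rel:
            # rel holds a space-separated list of values
            self.links.append((href, set(rel.lower().split())))


def find_me_links(html):
    """Return the hrefs of all links marked with rel="me"."""
    parser = RelLinkParser()
    parser.feed(html)
    return [href for href, rels in parser.links if "me" in rels]


html = '<a href="http://flickr.com/photos/mrtopf" rel="me">my flickr pages</a>'
print(find_me_links(html))  # prints ['http://flickr.com/photos/mrtopf']
```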

But what if somebody else links with rel="me" to one of my pages?

In this case it helps to check for symmetrical relationships: the page I link to should also link back to my page in the same way.
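
A rough sketch of such a check, assuming the rel="me" links have already been collected into a dictionary that maps each page to the set of pages it claims. The function name confirmed_profiles and the sample data (including the impostor URL) are invented for illustration.

```python
def confirmed_profiles(start, me_links):
    """Return the pages that `start` claims via rel="me" and that link back.

    `me_links` maps a page URL to the set of URLs it links to with rel="me".
    """
    claimed = me_links.get(start, set())
    return {url for url in claimed if start in me_links.get(url, set())}


# Hypothetical data: the impostor page claims my Twitter profile,
# but the Twitter profile only links back to my connect page.
me_links = {
    "http://mrtopf.de/connect": {"http://twitter.com/mrtopf"},
    "http://twitter.com/mrtopf": {"http://mrtopf.de/connect"},
    "http://impostor.example.org": {"http://twitter.com/mrtopf"},
}

print(confirmed_profiles("http://mrtopf.de/connect", me_links))
print(confirmed_profiles("http://impostor.example.org", me_links))
```

The first call returns the Twitter URL because the link is symmetrical, the second returns an empty set because nothing links back to the impostor page.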

What about different URLs pointing to the same page?

URLs for the same page can be written differently, even if the difference is just omitting the “www.” or adding a trailing slash. This can be a problem for identifying your profiles. One way to avoid it is to canonicalize the link addresses, as Google’s Social Graph API does.
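
As an illustration, a very simple canonicalization could look like the sketch below. The rules (lowercase the host, drop a leading "www.", drop trailing slashes and fragments) are my own assumptions and certainly not what Google does internally; dropping "www." in particular is not safe for every site.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url):
    """Normalize a URL so that trivial spelling variants compare equal."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]  # assumption: www.example.org == example.org
    path = path.rstrip("/")  # "/photos/mrtopf/" and "/photos/mrtopf" match
    return urlunsplit((scheme.lower(), netloc, path, query, ""))


print(canonicalize("http://www.mrtopf.de/"))  # prints http://mrtopf.de
print(canonicalize("http://mrtopf.de"))       # prints http://mrtopf.de
```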

For websites the problem may still be to find out which links belong together, as this always means spidering the web, which can be hard if you don’t have a link page like I have. Google is of course in a better position here: they have all pages indexed already, so finding those links is more or less a query against their own database. That’s what Google’s Social Graph API can be used for.

It’s also about friends

XFN is not only about your web sites but also about who your friends are. The same mechanism is used, but instead of “me” other values are used, and some of them can be combined, e.g.

<a href="http://tanya.example.org" rel="friend met colleague">...

This means that the person you link to is your friend, that you have met in real life, and that they are a colleague. A more general way of simply linking to contacts is the “contact” value. This is what e.g. Twitter does in its contact list. Here is an example with the hCard microformat mixed in:

<span class="vcard">
  <a href="http://twitter.com/Scobleizer" class="url" rel="contact" title="Scobleizer"><img alt="Scobleizer" class="photo fn" id="profile-image" src="...." height="24" width="24"></a>
</span>

In this case rel="contact" is the XFN part; the class attributes (vcard, url, photo, fn) belong to the hCard microformat.
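
Since a rel attribute can carry several values at once, a parser has to split it up and decide which values are XFN at all. The following sketch sorts the values into the categories of the XFN 1.1 profile; the function name classify_rel is made up for this example.

```python
# The relationship values defined by XFN 1.1, grouped by category.
XFN_VALUES = {
    "friendship": {"contact", "acquaintance", "friend"},
    "physical": {"met"},
    "professional": {"co-worker", "colleague"},
    "geographical": {"co-resident", "neighbor"},
    "family": {"child", "parent", "sibling", "spouse", "kin"},
    "romantic": {"muse", "crush", "date", "sweetheart"},
    "identity": {"me"},
}


def classify_rel(rel):
    """Map a rel attribute string to the XFN values it contains, by category."""
    values = set(rel.lower().split())
    return {
        category: sorted(values & known)
        for category, known in XFN_VALUES.items()
        if values & known
    }


print(classify_rel("friend met colleague"))
```

Values which are not part of XFN (for example "nofollow") are simply ignored, so a link with rel="nofollow contact" would only yield the friendship value "contact".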

Of course you need to crawl the web here to really retrieve all your contacts and also to find out which friend links actually point to the same person. Thus you probably also need to follow all the me-links for every friend. Otherwise you cannot know that

http://flickr.com/photos/mrtopf

and

http://twitter.com/mrtopf

are the same person unless there is a link graph of me-links which connects them (e.g. by starting from my connect page).
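
Once the me-links have been collected, deciding whether two URLs belong to the same person is a question of whether they are connected in that graph. Here is a small sketch using a breadth-first search; the name same_person and the sample data are hypothetical, and following links in both directions is a simplifying assumption (a stricter variant would only accept symmetrical links).

```python
from collections import defaultdict, deque


def same_person(url_a, url_b, me_links):
    """True if url_a and url_b are connected through rel="me" links.

    `me_links` maps a page URL to the URLs it links to with rel="me".
    """
    # Build an undirected graph: a link counts in both directions.
    neighbours = defaultdict(set)
    for page, targets in me_links.items():
        for target in targets:
            neighbours[page].add(target)
            neighbours[target].add(page)

    # Breadth-first search from url_a until url_b is reached.
    seen, queue = {url_a}, deque([url_a])
    while queue:
        current = queue.popleft()
        if current == url_b:
            return True
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


me_links = {
    "http://mrtopf.de/connect": {
        "http://flickr.com/photos/mrtopf",
        "http://twitter.com/mrtopf",
    },
}
print(same_person("http://flickr.com/photos/mrtopf",
                  "http://twitter.com/mrtopf", me_links))  # prints True
```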

Thus using XFN is probably only useful if you want to write spiders. It also might mean that you cannot display information in real time, as spidering might take some time. Alternatively you can of course use the Google API.

An XFN Spider in Python (actually 2)

A while back I created a Python script which retrieves rel=me links starting from one page. It does no canonicalization and only outputs a list, but it should be easy to extend. There is the script I wrote initially and a second one which does the same job but uses Linden Lab’s eventlet library to retrieve links in a non-blocking, more concurrent way.

You can find both scripts at the Google Code project I created:

http://code.google.com/p/pydataportability/
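
Independent of those scripts, the basic crawl loop can be sketched roughly like this. This is not the code from the project: it is a Python 3 illustration that uses only the standard library (html.parser and urllib instead of BeautifulSoup and eventlet), fetches pages one after another, and all names in it (MeLinkParser, fetch_page, crawl_me_links, the limit parameter) are made up.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class MeLinkParser(HTMLParser):
    """Collect the href of every <a>/<link> tag that carries rel="me"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag in ("a", "link") and attrs.get("href") and "me" in rels:
            self.hrefs.append(attrs["href"])


def fetch_page(url):
    """Download a page and return its content as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_me_links(start, fetch=fetch_page, limit=100):
    """Follow rel="me" links breadth-first and return the URLs found."""
    found, seen, queue = [start], {start}, deque([start])
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # page not reachable: skip it and keep crawling
        parser = MeLinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            target = urljoin(url, href)  # resolve relative links
            if target in seen:
                continue
            if len(found) >= limit:
                return found
            seen.add(target)
            found.append(target)
            queue.append(target)
    return found
```

Calling crawl_me_links("http://mrtopf.de/connect") would then walk the me-links starting from my connect page. The fetch parameter is there so that the download function can be swapped out, e.g. for a non-blocking one or for a fake in tests.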

Both scripts need the HTML parser BeautifulSoup to be installed, and the eventlet version obviously also needs eventlet (I used the SVN version).

Both scripts are called with a starting URL. Optionally you can add -v if you want verbose output on what they do.

Here is an example output:

./xfn2_eventlet.py http://mrtopf.de/connect
44 profiles found
http://mrtopf.de/connect
http://mrtopf.de/blog
http://taotakashi.wordpress.com
http://www.slexchange.com/modules.php?name=Marketplace&MerchantID=13238
http://dev.comlounge.net
https://www.xing.com/profile/Christian_Scholz4
http://www.linkedin.com/in/mrtopf
http://mrtopf.tv
http://comlounge.tv
http://mrtopfde.blip.tv
http://taotakashi.blip.tv
http://flickr.com/photos/mrtopf
http://flickr.com/people/mrtopf
http://mrtopf.de
http://flickr.com/people/mrtopf/contacts
http://flickr.com/photos/taotakashi
http://flickr.com/people/taotakashi
http://flickr.com/people/taotakashi/contacts
http://www.facebook.com/profile.php?id=652229223
http://twitter.com/mrtopf
http://twitter.com/mrtopf/friends
http://pownce.com/mrtopf
http://pownce.com/mrtopf/friends
http://del.icio.us/mrtopf
http://www.facebook.com/profile.php?id=Tao Takashi
http://technorati.com/people/technorati/mrtopf
http://technorati.com/blogs/mrtopf.de%2Fblog
http://technorati.com/blogs/mrtopf.de%2Fpodcast
http://mrtopf.de/podcast
http://technorati.com/blogs/dev.comlounge.net
http://technorati.com/blogs/comlounge.net
http://comlounge.net
http://technorati.com/blogs/mrtopf.blogspot.com
http://mrtopf.blogspot.com
http://technorati.com/blogs/mrtopf.tv%2Fvlog
http://mrtopf.tv/vlog
http://technorati.com/blogs/comlounge.tv%2Fblog
http://comlounge.tv/blog
http://technorati.com/blogs/twitter.com%2Fmrtopf
http://mrtopf.jaiku.com
http://www.last.fm/user/mrtopf
http://www.last.fm/user/mrtopf/friends
http://upcoming.yahoo.com/user/28980
http://www.slideshare.net/mrtopf

It is also interesting to see how the list keeps growing although I haven’t added more services myself. It means that more and more services (like flickr) are adding those links themselves.


7 Responses to “The XFN microformat and a spider in Python”

  1. Ian Kallen Says:

    I’ll check that Google Code URL again, looking forward to seeing what you’re posting there. I’m also interested in seeing FOAF, XFN, hCard and OpenID link tags used for identity federation.
    -Ian

  2. Microformats in Plone — mrtopf.de Says:

    [...] posted yesterday already about the XFN microformat. But there are more, here a short [...]

  3. Guido Stevens Says:

    Nice, Christian. This code just /asks/ to be extended into a generic rel=”*” parser that stores the social graph it discovers into a zc.relationship index… I hope to find time to play around with this stuff soon.

  4. Reinout van Rees Says:

    On saturday I put a new front page online for my http://vanrees.org site. In the lower right part I list all the flickr-linkedin-twitter type links. With xfn rel-tags.

    So that’s only a listing of my own pages. I did deliberately use my frontpage for that, as it makes it easier to get bi-directional links. On linkedin, I just link to my front page and the front page itself links back. No separate contact page.

  5. Christian Scholz Says:

    @Ian that’s definitely also interesting and I just posted about possible hCard support in Plone. It might also be nice to have a general microformats parser in Python

    @Guido yep, it makes sense to extend them and if you want to go for it, feel free to do so. I’d suggest though that we model the graph as objects then, e.g.

    graph = SocialGraph(url="http://mrtopf.de/connect")
    friends = graph.friends
    paul = friends['paul'] # paul might be the nickname defined in hCard
    pauls_profiles = paul.profiles

    and so on. Maybe support for symmetrical relationships could also be added that way to filter out just these.

    I also wanted to make them an egg btw and was wondering if the eventlet support can be made optional in only one file.

    @Reinout definitely makes sense. The idea behind my page was though to collect all those links in one page and as they are quite a few not to pollute the blog with it. But theoretically the spider should follow the homepage link to the connect page (must be marked as rel=me) and then find the links there. Of course it’s not directly symmetrical but only indirect. I wonder if that makes sense ;-)

    Then again my homepage is only a picture now anyway so why not add links and contact information to them.

  6. Guido Stevens Says:

    For an extensive reflection on the use of two-way links to prevent relationship injection, see:
    this transcyberia blog post

  7. Recent Links Tagged With "eventlet" - JabberTags Says:

    [...] public links >> eventlet links for 2007-11-27 Saved by dbucciar on Thu 06-11-2008 The XFN microformat and a spider in Python Saved by spritemoney on Thu 06-11-2008 links for 2008-02-03 Saved by suewolff on Mon 27-10-2008 [...]
