by Christian Scholz on April 21, 2008
The last few days I have been working on creating the basic parts of a DataPortability library for Python. My goal is to create a library which supports the standards we propose in the DataPortability Project, to make Python-based implementations easier. Of course it might also be useful without this aim: you might simply need a library for handling microformats or one of the other standards.
Right now the library is very bare-bones, supporting basic parsing of the hCard microformat, the XFN microformat and XRDS Simple. Depending on which parser you use, you get back VCard, XFNRelationships or Service objects.
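To give a feel for what "parsing XFN" means, here is a minimal sketch that uses only the Python standard library. It is not the pydataportability code or its API: the names XFNLinkCollector and find_xfn are made up for this illustration, and the result is a plain list of tuples rather than XFNRelationships objects.

```python
from html.parser import HTMLParser

# The relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNLinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, [xfn values]) from <a rel="..."> tags."""

    def __init__(self):
        super().__init__()
        self.relationships = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel is a space-separated list; keep only the values XFN defines.
        rel_values = (attributes.get("rel") or "").lower().split()
        xfn = sorted(value for value in rel_values if value in XFN_VALUES)
        if xfn and attributes.get("href"):
            self.relationships.append((attributes["href"], xfn))


def find_xfn(html):
    """Return all XFN relationships found in an HTML string."""
    collector = XFNLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.relationships


sample = (
    '<p><a href="http://example.org/alice" rel="friend met nofollow">Alice</a> '
    '<a href="/about">About</a></p>'
)
print(find_xfn(sample))
```

Non-XFN rel values such as nofollow are simply ignored, and links without any XFN value are skipped.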
Please note that all this is still very raw and very far from being complete. See this as a development version; things might still change a bit. Documentation will also be added shortly.
Here is a list of eggs available:
The parser class handling the main work. It contains two HTML parsers right now (BeautifulSoup and ElementTree) and aims to be as extensible as possible. I use the Zope Component Architecture (ZCA) to make this happen. Note that this does not mean it’s a Zope application: the ZCA is a small but very useful part of Zope that allows me to register components in a decentralized way. I will add examples later when I write more documentation.
The hCard parser, which registers with the base package. If this is imported it will automatically try to find hCards in HTML documents.
The XFN parser, which registers with the base package. Like the hCard parser, it will automatically try to find XFN relationships inside an HTML document.
This library parses XRDS Simple formatted files using ElementTree and returns Service objects.
This egg is mostly useful in an example buildout and creates some example scripts in its bin/ directory (see example below).
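As an illustration of the XRDS Simple part, here is a rough sketch of reading Service elements with ElementTree from the standard library. It is not the egg's actual code: the Service class below is a hypothetical stand-in with assumed field names, the function name parse_xrds is made up, and the sample document uses example.org placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Namespace of XRD elements as used by XRDS-Simple documents.
XRD_NS = "{xri://$xrd*($v*2.0)}"


@dataclass
class Service:
    """Hypothetical stand-in; the real Service class may look different."""
    types: List[str] = field(default_factory=list)
    uris: List[str] = field(default_factory=list)
    priority: Optional[int] = None


def parse_xrds(xml_text):
    """Return one Service per <Service> element in the document."""
    root = ET.fromstring(xml_text)
    services = []
    for node in root.iter(XRD_NS + "Service"):
        # findall() only looks at direct children of this Service element.
        types = [t.text.strip() for t in node.findall(XRD_NS + "Type") if t.text]
        uris = [u.text.strip() for u in node.findall(XRD_NS + "URI") if u.text]
        priority = node.get("priority")
        services.append(
            Service(types, uris, int(priority) if priority is not None else None)
        )
    return services


SAMPLE = """<XRDS xmlns="xri://$xrds">
  <XRD xmlns="xri://$xrd*($v*2.0)" version="2.0">
    <Type>xri://$xrds*simple</Type>
    <Service priority="10">
      <Type>http://example.org/service-type</Type>
      <URI>http://example.org/endpoint</URI>
    </Service>
  </XRD>
</XRDS>"""

for service in parse_xrds(SAMPLE):
    print(service)
```

The sketch ignores details such as per-URI priorities and media types; it only shows the general shape of turning XML elements into objects.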
To show how it works I created an example buildout. If you don’t know what a buildout is, don’t worry, you don’t need to know for the example. Basically it allows you to create a sandbox and install various eggs in it, which is very useful for developing in a group or deploying software; check out this screencast for a demo. In case you don’t know what Python eggs are, please read my tutorial on them.
So here is what you do to get the examples working:
- Download the examples package.
- Extract it using
tar zxf pydataportability_examples-0.1.tgz
- Change to the directory using
- Run the bootstrap.py file:
python bootstrap.py
(this will download the necessary software for the buildout to work)
- Run the generated buildout script:
bin/buildout
(this will download all the pydataportability eggs and install the example scripts in bin/)
Now you have 3 example scripts in the bin/ directory. To run them just call them, e.g.:
bin/xrds
bin/mf_etree
bin/mf_bsoup
The first one parses an XRDS file and returns the found objects, the others parse the contents of my twitter page (which contains microformats) and return the found microformat objects. One uses the ElementTree parser, the other the BeautifulSoup parser.
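For readers who wonder what finding an hCard with ElementTree roughly involves, here is a small sketch. It is not what the example scripts or the hCard egg actually do: the function names are made up, the result is a plain dict instead of a VCard object, only the fn and url properties are handled, and the input must be well-formed XHTML.

```python
import xml.etree.ElementTree as ET


def has_class(element, name):
    """True if the element's class attribute contains the given class name."""
    return name in (element.get("class") or "").split()


def text_of(element):
    """All text inside an element, including text of nested elements."""
    return "".join(element.itertext()).strip()


def find_hcards(xhtml):
    """Return a dict with 'fn' and 'url' (when present) for each class="vcard" element."""
    root = ET.fromstring(xhtml)
    cards = []
    for element in root.iter():
        if not has_class(element, "vcard"):
            continue
        card = {}
        # Look at the vcard element itself and everything below it.
        for child in element.iter():
            if has_class(child, "fn") and "fn" not in card:
                card["fn"] = text_of(child)
            if has_class(child, "url") and "url" not in card:
                card["url"] = child.get("href") or text_of(child)
        cards.append(card)
    return cards


SAMPLE = """<div>
  <div class="vcard">
    <a class="url fn" href="http://example.org/">Jane <span>Doe</span></a>
  </div>
</div>"""

print(find_hcards(SAMPLE))
```

Real-world pages are often not well-formed, which is where a more forgiving parser such as BeautifulSoup helps.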
Where is the source?
The simplest way to get the source is probably to check out the source buildout and build it (you should be somewhat familiar with buildout though). Just do this:
svn co https://pydataportability.googlecode.com/svn/buildouts/microformats pydataportability
cd pydataportability
python bootstrap.py
bin/buildout
This will create a development sandbox. You will find the same examples in bin/ and you will find the source eggs in src/.
The source can also be found in SVN; just look at the Google Code page for it. Everything starting with pydataportability. is part of it.
Also make sure to read the README in the example buildout.
More documentation will be added later (as soon as Google re-enables wiki editing). There is also a Google Group for discussing pydataportability.