Atomic pickles

Python pickles are great! They allow you to serialize a whole lot of stuff: primitives, lists, dicts, and even classes. If you’re writing Python they are very handy, but persisting them robustly requires some thought. Let’s say your application maintains some state. When your app fires up, you would like to read that state and resume operation. One way of doing this is to pickle your stateful objects to a known file: state.pickle, for example.

import pickle
with open("state.pickle", "wb") as f:
    pickle.dump(appStateObject, f)

Clean. Simple. There’s just one problem. Every time the file gets opened, it is truncated and the dump starts serializing the object to disk. If there is a problem during serialization, you end up with a foobared pickle. You could serialize the whole object to memory first, then write it out. But if your application fails (or is terminated) while writing to disk… you still have a foobared pickle. So what to do? Write it to another file.

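import os
import pickle
import tempfile

filename = "state.pickle"
tfile = None
try:
    # Write to a temporary file in the same directory as the target.
    with tempfile.NamedTemporaryFile(dir=os.path.dirname(filename), delete=False) as tfile:
        pickle.dump(appStateObject, tfile)
    # Only replace the real pickle once the dump has finished.
    os.rename(tfile.name, filename)
finally:
    # Clean up the temporary file if the rename never happened.
    if tfile and os.path.exists(tfile.name):
        os.remove(tfile.name)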

The trick is to save the pickle in the same directory as the final file but under another filename. If things go south, any existing pickle won’t be foobared. If the file gets written out without errors, os.rename() will overwrite the existing pickle (if it exists) in an atomic manner on POSIX systems. The same caution applies to any file you open in a write mode that truncates it. Never assume that you will leave the file in a usable state if things go wrong.
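If you do this in more than one place, the pattern can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the name atomic_pickle_dump is made up for illustration, and it swaps in os.replace() (Python 3.3+) for os.rename(), because os.rename() raises an error on Windows when the destination already exists. It also flushes and fsyncs the temporary file before the swap, which is an extra precaution the snippet above does not take.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, filename):
    """Pickle obj to filename without ever exposing a half-written file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the target so the final swap
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, filename)  # swap in the new pickle
    finally:
        # After a successful replace the temp path no longer exists; this
        # only cleans up when something failed part-way.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Calling atomic_pickle_dump(appStateObject, "state.pickle") would then stand in for the try/finally block above. If pickling fails, the previous state.pickle is left untouched and the temporary file is removed.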

You could resort to storing your object in a database that has robust transaction support. That would be a lot of baggage to store a simple pickle.

Value Priced Microcontroller Development Kit

Building your own microcontroller projects can be fun and rewarding. One thing that’s always bothered me is that development kits are far too often overpriced. Microcontrollers often cost less than $5 but kits tend to hover between $50 and $100. Texas Instruments has released a development kit that’s not only fully featured but incredibly affordable: MSP-EXP430G2.

MSP430 Development Kit

For a limited time the kit is only $4.30 and includes a USB flash emulation module (to program, debug or monitor), a 20-pin DIP socket, 2 microcontrollers, a mini USB cable, and integrated LEDs and switches. Code Composer Studio is available for free but includes a 16k object code limit. Code Composer Studio doesn’t run on Linux yet, but Linux support is scheduled for version 5.0, with MSP430 scheduled for 5.1. In the meantime, I’ll run it under Windows. Get hacking!

Normalizing an Object Database

Object databases like ZODB provide a very natural and efficient way to persist objects. It’s very easy to get started defining objects, attributes, methods and relationships. In the time it takes most DBAs to create UML diagrams, a competent software engineer can complete a substantial portion of application logic using an object database. Inevitably, there will be a requirement to generate some reports from the database. All the time saved using the object database can easily be consumed trying to reuse that database to generate reports. The ad hoc nature of the object database becomes a burden when trying to generate normalized data across dissimilar objects.

Consider the UML diagram above. Reporting engines like Jaspersoft provide mechanisms to use custom data sources like object databases. With Jaspersoft, you can use a custom Java bean as a data source. The logic to extract data then becomes embedded in your custom code. As the number of entity types in your application increases, the number of custom data sources that adapt the reporting engine to your database also increases. Moreover, you can’t use the query tools in the reporting engine to help you build your queries because your data source is not a traditional data warehouse/mart schema. Ideally, the production database should be exported to a reporting database that is more suitable for reporting. Note that not all the entities in the diagram are useful for reporting. For example, it may not be useful to export “Clothing” or “Server” objects. These entities only contain common attributes found in subordinate entities. Customizing our output for reporting would therefore be very useful.

Most business users are familiar with SQL but not very familiar with object databases. Furthermore, reporting tools are usually centered around relational databases. It stands to reason that it may be worthwhile to export an object model to a relational database. One approach to doing this is to use an Entity-Attribute-Value model. To export our data model to an EAV model we walk the object graph and emit entity, attribute, and relationship data to the relational database. The EAV model, however, is very tedious to use as a reporting database. Every attribute is modeled as a row in the database. To generate a traditional database table where attributes are columns requires the use of a PIVOT transform or a JOIN for every attribute. Designing an ETL process for doing this is straightforward.
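To make the row-versus-column distinction concrete, here is a small Python sketch of what a pivot does to EAV data. The sample rows, the attribute names and the pivot() function are all made up for illustration; a real reporting database would do this with SQL rather than in application code.

```python
from collections import defaultdict

# Hypothetical EAV rows: (entity_id, attribute, value)
eav_rows = [
    (1, "type", "Shirt"),
    (1, "size", "M"),
    (1, "color", "blue"),
    (2, "type", "Shirt"),
    (2, "size", "L"),
]


def pivot(rows):
    """Turn one-row-per-attribute data into one-record-per-entity."""
    records = defaultdict(dict)
    columns = []
    for entity_id, attribute, value in rows:
        records[entity_id][attribute] = value
        if attribute not in columns:
            columns.append(attribute)
    # Missing attributes become None so every record has the same columns.
    table = [
        {"id": entity_id, **{c: attrs.get(c) for c in columns}}
        for entity_id, attrs in sorted(records.items())
    ]
    return columns, table


columns, table = pivot(eav_rows)
```

Each entity ends up as a single record with one key per attribute, which is the shape most reporting tools expect. Entity 2 has no "color" row in the sample data, so its color comes out as None.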

Object databases are ostensibly extensible; it is very likely that applications built with object databases are extensible as well. This is certainly the case with CRM tools; these applications define new objects, relationships and attributes that are specific to the organization they are being deployed in. How do these objects get exported to your reporting database? I mentioned earlier that an ETL process can get this data into your reporting database. Using this method, however, forces you to create an export procedure for every reportable entity that’s added to your system. If adding entities to the object database is easy, exporting those entities should be just as easy (or automatic!).

Exporting Entities

Exporting the object database can be done in three steps. First, export the objects and their properties to a staging table which is an EAV model. Then, enumerate all the entities and create schemas that encapsulate all the properties present for each entity type. Finally, fill the generated database tables with the data in the EAV model. The generated schema is not a handcrafted masterpiece but it will satisfy most reporting requirements.
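As a rough illustration of those three steps, here is a sketch using Python’s built-in sqlite3 module. It makes several assumptions that are not part of the approach described above: the Shirt and WebServer classes are hypothetical stand-ins for real persistent objects, attributes are read with vars(), every value is stored as text, and relationships between objects are ignored.

```python
import sqlite3


class Shirt:  # hypothetical entity classes, for illustration only
    def __init__(self, size, color):
        self.size, self.color = size, color


class WebServer:
    def __init__(self, hostname):
        self.hostname = hostname


def export(objects, conn):
    # Step 1: stage every object as EAV rows.
    conn.execute(
        "CREATE TABLE eav (entity_type TEXT, entity_id INTEGER, "
        "attribute TEXT, value TEXT)"
    )
    for entity_id, obj in enumerate(objects, start=1):
        for attribute, value in vars(obj).items():
            conn.execute(
                "INSERT INTO eav VALUES (?, ?, ?, ?)",
                (type(obj).__name__, entity_id, attribute, str(value)),
            )

    # Step 2: one table per entity type, one column per attribute seen.
    # Table and column names come straight from class and attribute
    # names, so this assumes they are trusted identifiers.
    schemas = {}
    for etype, attribute in conn.execute(
        "SELECT DISTINCT entity_type, attribute FROM eav "
        "ORDER BY entity_type, attribute"
    ):
        schemas.setdefault(etype, []).append(attribute)
    for etype, columns in schemas.items():
        column_sql = ", ".join('"%s" TEXT' % c for c in columns)
        conn.execute(
            'CREATE TABLE "%s" (id INTEGER PRIMARY KEY, %s)' % (etype, column_sql)
        )

    # Step 3: fill each generated table from the staging data.
    for etype, columns in schemas.items():
        ids = [row[0] for row in conn.execute(
            "SELECT DISTINCT entity_id FROM eav WHERE entity_type = ?", (etype,)
        )]
        placeholders = ", ".join("?" for _ in columns)
        for entity_id in ids:
            attrs = dict(conn.execute(
                "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
            ))
            conn.execute(
                'INSERT INTO "%s" VALUES (?, %s)' % (etype, placeholders),
                [entity_id] + [attrs.get(c) for c in columns],
            )
    return schemas
```

Calling export() with a list of objects and a connection from sqlite3.connect(":memory:") leaves one table per class, which a reporting tool can then query with ordinary SQL. New entity types get a table automatically, with no per-entity export procedure.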

Next week: some actual code.

Building Zenoss 3.0.x from source

I’ve read some recent posts about building Zenoss from source. Here are the steps I go through for building Zenoss 3.0.x from source code. I do most of my development using Ubuntu; lately, I’ve been using Ubuntu 10.04. I use the server version (because I develop using a virtual machine) but any flavor of 10.04 should work.

Dependencies:

The first order of business is to install the dependencies that are needed to build Zenoss. You can find some guidance for this in a text file in the svn tree. Section 3.4 of that file shows the Ubuntu packages required for building Zenoss. The dependencies can be installed with the following command.


sudo apt-get install subversion swig swig1.3 libmysqlclient-dev build-essential autoconf libreadline-dev libssl-dev unzip zip libreadline5 libssl0.9.8 ssh libsnmp-base libsnmp15

I’ve created a meta-package that has all these dependencies. This .deb meta-package can be installed with dpkg or via apt-get by adding the repo to your sources.list.

Next, create a zenoss user. I chose a very creative name for my zenoss user: zenoss. Create an installation directory and give this user ownership of that directory.


sudo adduser zenoss
sudo mkdir /usr/local/zenoss
sudo chown zenoss:zenoss /usr/local/zenoss

This user also requires some environment variables to be set. These variables can be set in the zenoss user’s .bashrc file.


export ZENHOME=/usr/local/zenoss
export PYTHONPATH=$ZENHOME/lib/python
export PATH=$ZENHOME/bin:$PATH
export INSTANCE_HOME=$ZENHOME

Log in as the zenoss user and check out the source code for zenoss.


sudo su - zenoss
svn checkout http://dev.zenoss.com/svn/branches/zenoss-3.0.x/inst \
          zenossinst_3.0.x
cd zenossinst_3.0.x

This will check out the installation source into a directory named zenossinst_3.0.x. Next, build zenoss. Notice that SVNTAG is exported. During the source build the installer checks out additional source code to $ZENHOME using this SVNTAG. If it’s not set, the installer defaults to getting program files from trunk (instead of the branch). The build harness is a bit brittle. If the build fails you must do a `make clean` before you try rebuilding. If there are files checked out in $ZENHOME, the build will assume that a zenoss install is already present and attempt an upgrade. This will make the installer behave strangely because there may not be a fully functioning zenoss install present. As a personal practice, I always remove the files in $ZENHOME.


export SVNTAG=branches/zenoss-3.0.x
make clean
rm -Rf "${ZENHOME:?}"/*
./install.sh

The SVNTAG is the relative path (from svn root) of the branch you are interested in building. SVNTAG defaults to ‘trunk’. This can be a real problem if the installer for 3.0.x was checked out but SVNTAG was not set: the installer will check out other files from ‘trunk’ instead of the zenoss-3.0.x branch. The build harness also has problems with environment variables like MYSQL_PWD; the initial part of the installer works correctly but later fails because some subsequent scripts don’t get this setting passed down to them.
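Since these mistakes are easy to make and only show up well into the build, a quick check before running install.sh can save some time. The script below is a hypothetical helper of my own, not part of the Zenoss build harness; it only inspects the ZENHOME and SVNTAG variables described above and reports anything that looks off.

```python
import os


def preflight(env, expected_tag="branches/zenoss-3.0.x"):
    """Return a list of problems that would likely derail the build."""
    problems = []
    zenhome = env.get("ZENHOME")
    if not zenhome:
        problems.append("ZENHOME is not set")
    elif os.path.isdir(zenhome) and os.listdir(zenhome):
        # Leftover files make the installer attempt an upgrade.
        problems.append("%s is not empty; the build may attempt an upgrade" % zenhome)
    tag = env.get("SVNTAG")
    if not tag:
        problems.append("SVNTAG is not set; the installer would use trunk")
    elif tag != expected_tag:
        problems.append("SVNTAG is %r, expected %r" % (tag, expected_tag))
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("WARNING: " + problem)
```

Run it as the zenoss user from the same shell you will build in, so it sees the same environment that install.sh will.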

Zenoss builds its own version of python & rrd (and its dependencies). The build takes about 30 minutes on my box. You will need a MySQL instance (which the build will prompt you for).