Census Data Used for DIC Experiments

Since there has been a lot of demand for it, here is the census data I used for the DIC experiments published in SIGMOD '97. There are two files: pumsaxdc.bin (the actual data) and items.names (the descriptions of the items). The .bin file is a concatenation of baskets. Each basket is a sequence of short ints (be careful with byte order; I believe they are Intel, i.e. little-endian) terminated by a 0.
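For anyone who wants a starting point, here is a minimal Python sketch of reading the basket format described above. The function name parse_baskets is made up for illustration, and the code assumes the short ints are unsigned 16-bit values; little-endian is the default only because the byte order is believed to be Intel, so pass ">" if the numbers come out looking wrong.

```python
import struct


def parse_baskets(data: bytes, byteorder: str = "<"):
    """Split raw bytes into baskets of item numbers.

    Assumptions (not confirmed by the data description):
    - each item is an unsigned 16-bit integer ("H" in struct notation)
    - byteorder "<" is little-endian (Intel), ">" is big-endian
    A value of 0 ends the current basket.
    """
    count = len(data) // 2  # number of complete 16-bit values
    values = struct.unpack(f"{byteorder}{count}H", data[: count * 2])

    baskets = []
    current = []
    for value in values:
        if value == 0:
            # terminator: close the current basket and start a new one
            baskets.append(current)
            current = []
        else:
            current.append(value)
    if current:
        # keep a trailing basket that has no terminating 0
        baskets.append(current)
    return baskets
```

To try it on the real file, something like parse_baskets(open("pumsaxdc.bin", "rb").read()) should work if the assumptions hold. A quick sanity check is that no item number exceeds the number of lines in the names file.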

To get the name of an item, go to the line with that item number in the fields file (the count starts from 1, I believe).
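The lookup by line number can be sketched as follows. The function name load_item_names is made up for illustration, and the code assumes one name per line with the first line belonging to item 1; if the count turns out to start from 0, change the start value.

```python
def load_item_names(lines, start=1):
    """Map item number -> name.

    Assumes line k of the names file (counting from `start`, default 1)
    holds the name of item k. `lines` can be an open text file or any
    iterable of strings.
    """
    return {
        number: line.rstrip("\n")
        for number, line in enumerate(lines, start=start)
    }
```

With a names file open as f, names = load_item_names(f) gives a dictionary, and names[42] is then the name on line 42. Which file to open here (items.names seems the likely one) is also an assumption.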

The name of the item is a (name, value) pair. To get a description, take a look at the my.fields file. Note that some numerical parameters have been bucketized into other values. Hope this helps, and let me know if there are problems. Note that I haven't touched this in a while and I may have forgotten something important.

Notes: lots of disclaimers apply to this data. Also, sorry but I haven't been able to make the larger Arizona data set available.

Cheers,
--sergey


sergey@cs.stanford.edu
Last modified: Fri Oct 24 13:47:47 PDT 1997