 -*- Text -*-

This directory contains the experimental framework used to measure and
test XDFS and Xdelta.  Located in the "tests" directory, one level up,
are several programs that share a common mechanism for traversing a
tree of RCS files.

The framework configures different executions of these programs using
inputs in the "config" directory.  Each program accepts as arguments
some number of file names, which are opened and read to determine part
of the configuration.  To run these programs you will need to modify
some of the paths contained within files in the "config" directory.
The config files containing system-specific paths are:

  config-output    A directory where output is written; it must
                   already exist.

  data-*           Files prefixed by "data-" define the input data
                   set, which should be a subdirectory tree of RCS
                   files.

  input-paths      This contains paths for the RCS, diff, and null
                   executables.

  output-*         These files name directories where constructed
                   archives are built and where Berkeley DB logs
                   are placed.

If you want to reproduce my experiments you must apply the patch
contained in the "tools" directory to the RCS sources.  This patch
causes RCS to fsync() its output following a checkin.
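
For readers unfamiliar with the effect of such a change, the minimal
Python sketch below shows the general pattern of forcing output to
disk after a write.  It illustrates fsync() only; it is not the patch
in the "tools" directory, and the function name "write_durably" and
the file name used are hypothetical.

```python
import os
import tempfile


def write_durably(path, data):
    """Write bytes to path and ask the OS to commit them to disk."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's user-space buffer to the OS
        os.fsync(f.fileno())   # ask the OS to flush the file to storage


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name, chosen only for illustration.
        target = os.path.join(tmp, "example,v")
        write_durably(target, b"checkin contents\n")
        print(os.path.getsize(target))
```

Without a call of this kind, a write may return while the data is
still held only in the operating system's cache.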

The "scripts" directory contains a number of Python, Bourne shell,
and Gnuplot scripts used to run experiments and analyze experimental
data.  The "test.all" script has the sequence of experiments I ran
commented out and a few samples in their place.  Edit the "test.all"
script so that it knows the location of the "config" and "test"
directories.  Then, once you have set the config paths properly, you
should be able to execute "test.all".  It uses the input data set
defined in "config/data-samples" and runs one experiment for each of
XDFS, RCS, Xdelta, and diff.

If you successfully run "test.all", the output directory (named in
"config/config-output") should contain four new directories named
according to the experiment that was run (where "samples" is replaced
by the basename of your input data set directory):

  ex=at,id=samples,mth=RCS-tree
  ex=at,id=samples,mth=XDFS-f
  ex=dt,id=samples,md=Date,mth=Diff
  ex=dt,id=samples,md=Date,mth=Xdelta

Each of these directories contains the raw data needed to generate
the graphs and tables in my report.
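
The directory names above are comma-separated "key=value" fields, so
they are easy to take apart when post-processing results.  The sketch
below is one way to do that in Python.  It assumes that neither keys
nor values contain "," or "=", and it treats the keys as opaque
strings, since their meaning is not spelled out here.  The function
name "parse_experiment_name" is hypothetical and is not part of the
"scripts" directory.

```python
def parse_experiment_name(name):
    """Split 'key=value,key=value' into a dict of its fields."""
    fields = {}
    for part in name.split(","):
        # partition() splits on the first "=" only.
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


if __name__ == "__main__":
    names = [
        "ex=at,id=samples,mth=RCS-tree",
        "ex=at,id=samples,mth=XDFS-f",
        "ex=dt,id=samples,md=Date,mth=Diff",
        "ex=dt,id=samples,md=Date,mth=Xdelta",
    ]
    for name in names:
        print(parse_experiment_name(name))
```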

The "http" directory contains two scripts used to collect and
transform the data into RCS files.  It does not contain the data set I
used for the experiments in section 6.3 of my report.  To use these
scripts, you should create one subdirectory per web site you wish to
collect data from and enter its name into the file "dir-list".  Place
a file in each directory named "url" with the URL to fetch.  Then
place a call to "runwget" in your crontab and allow it to collect data
for some period of time.  Once you have collected enough data, the
"torcs" script will create an RCS archive for each directory.
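
The layout described above can be prepared by hand.  As a minimal
sketch, the Python below creates it for a set of sites.  It assumes
that "dir-list" holds one directory name per line and that each "url"
file holds a single URL on one line; neither format is confirmed
here, so check "runwget" before relying on it.  The function name
"prepare_sites" and the example site name and URL are hypothetical.

```python
import os
import tempfile


def prepare_sites(root, sites):
    """Create one subdirectory per site, each with a 'url' file,
    and list the subdirectory names in 'dir-list'.

    sites maps a directory name to the URL to fetch.
    """
    os.makedirs(root, exist_ok=True)
    for name, url in sites.items():
        site_dir = os.path.join(root, name)
        os.makedirs(site_dir, exist_ok=True)
        with open(os.path.join(site_dir, "url"), "w") as f:
            f.write(url + "\n")
    # Assumed format: one directory name per line.
    with open(os.path.join(root, "dir-list"), "w") as f:
        for name in sites:
            f.write(name + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prepare_sites(tmp, {"example": "http://www.example.com/"})
        print(sorted(os.listdir(tmp)))
```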

The test applications are:

  Archivetest is used to compare XDFS and RCS.  It has several
  methods, described in the report: FS-none, XDFS-none, XDFS-r,
  XDFS-f, RCS-tree, and RCS-linear.  The config files named
  "cluster-max-*" allow the cluster size for XDFS-r to be set.  The
  source buffer can also be configured, for example, using
  "src-buf-max-files-14" and "src-buf-min-size-256".

  Deltatest is used to compare Xdelta and diff.  It has a "mode"
  option that can be set to "Orig" or "Date" to control whether RCS
  ancestry is used or not.

  Extracttest verifies the correct operation of the
  xdfs_extract_delta() and xdfs_apply_delta() functions.

  Proxytest demonstrates how to use xdfs_extract_delta() and
  xdfs_insert_delta() to transport versions between two XDFS
  archives, simulating a delta-compression proxy configuration.

  Synctest is used to measure the compressibility of the HTTP data
  sets.
