Optimizing a repository for HTTP transfer
To reduce the number of files that must be transferred over the network, the optimize --http command packs a repository into two tarballs, basic.tar.gz and patches.tar.gz, with the following content:
basic.tar.gz
- _darcs/hashed_inventory
- _darcs/meta-filelist-pristine
- _darcs/meta-filelist-inventories
- _darcs/meta-*
- _darcs/hashed.pristine/*
- _darcs/inventories/*
The meta-filelist-* files contain directory listings for the hashed.pristine and inventories directories, in reverse order with respect to the tarball itself. During a get, the files in these listings are downloaded through the cache in parallel with the tarball.
In general, meta-* files contain additional files and information that could extend the tarballs' functionality in some way. They are expected to be small, so that their negative effect on performance is minimal.
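One plausible reading of the reverse ordering, sketched below in Python: if the tarball delivers files front to back while a second downloader walks the listing, the two meet in the middle and rarely fetch the same file twice. This is an interpretation for illustration only, not taken from the darcs source; simulate_parallel_get is a hypothetical name, and the model assumes both sides deliver one file per step.

```python
def simulate_parallel_get(tar_order):
    """Return a dict mapping each file to the side that delivered it first.

    tar_order is the order of files inside the tarball. The listing is
    assumed to be the same names in reverse order, as described above.
    """
    listing = list(reversed(tar_order))
    source = {}
    for tar_name, listed_name in zip(tar_order, listing):
        # The tarball side yields its next file ...
        source.setdefault(tar_name, "tarball")
        # ... while the listing side fetches from the other end.
        source.setdefault(listed_name, "listing")
        if len(source) == len(tar_order):
            break  # every file has arrived; both sides can stop
    return source


if __name__ == "__main__":
    for name, side in simulate_parallel_get(["a", "b", "c", "d"]).items():
        print(name, side)
```

Under this model each side ends up delivering about half of the files.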
patches.tar.gz
- _darcs/patches/*
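The layout of the two tarballs can be sketched in Python as follows. This is a minimal illustration, not the darcs implementation: build_packs is a hypothetical name, the one-name-per-line format of the listing files is an assumption, other meta-* files are omitted, and the output location _darcs/packs/ is taken from the benchmark notes on this page.

```python
import os
import tarfile

# Directory -> listing-file suffix, using the names given in the text above.
LISTED_DIRS = {"hashed.pristine": "pristine", "inventories": "inventories"}


def sorted_entries(darcs_dir, subdir):
    """Return the file names in darcs_dir/subdir in a fixed (sorted) order."""
    path = os.path.join(darcs_dir, subdir)
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def build_packs(repo_root):
    """Write basic.tar.gz and patches.tar.gz; return a dict of their paths."""
    darcs_dir = os.path.join(repo_root, "_darcs")
    packs_dir = os.path.join(darcs_dir, "packs")
    os.makedirs(packs_dir, exist_ok=True)

    listings, content = [], []
    for subdir, suffix in LISTED_DIRS.items():
        names = sorted_entries(darcs_dir, subdir)
        listing = "meta-filelist-" + suffix
        # Assumed format: one name per line, reversed relative to the tarball.
        with open(os.path.join(darcs_dir, listing), "w") as handle:
            handle.writelines(name + "\n" for name in reversed(names))
        listings.append("_darcs/" + listing)
        content.extend("_darcs/%s/%s" % (subdir, name) for name in names)

    # Small metadata first, bulk content last, matching the order listed above.
    basic = ["_darcs/hashed_inventory"] + listings + content
    patches = ["_darcs/patches/" + name
               for name in sorted_entries(darcs_dir, "patches")]

    outputs = {}
    for tar_name, members in (("basic.tar.gz", basic),
                              ("patches.tar.gz", patches)):
        tar_path = os.path.join(packs_dir, tar_name)
        with tarfile.open(tar_path, "w:gz") as tar:
            for member in members:
                tar.add(os.path.join(repo_root, member), arcname=member)
        outputs[tar_name] = tar_path
    return outputs
```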
Getting an optimized repository
- Download and unpack basic.tar.gz. Result: a lazy repository as of the time optimize --http was run.
- Pull from the parent repository. Result: a lazy repository that is up to date.
- Download and unpack patches.tar.gz. Result: a full repository.
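The three steps can be sketched in Python as below. This is an outline under stated assumptions, not how darcs implements get: get_optimized, fetch and pull are hypothetical names, with fetch(name) returning the bytes of a tarball and pull(dest) standing in for pulling newer patches from the parent repository.

```python
import io
import tarfile


def unpack_tarball(data, dest):
    """Unpack gzip-compressed tarball bytes into the directory dest."""
    # A real client should validate member paths before extracting.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        tar.extractall(dest)


def get_optimized(fetch, pull, dest):
    """Run the three steps; return the repository state after each one."""
    states = []

    # Step 1: the basic tarball gives a lazy repository as of pack time.
    unpack_tarball(fetch("basic.tar.gz"), dest)
    states.append("lazy, as of pack time")

    # Step 2: pulling brings in patches recorded after the packs were made.
    pull(dest)
    states.append("lazy, up to date")

    # Step 3: the patches tarball supplies the remaining patch files.
    unpack_tarball(fetch("patches.tar.gz"), dest)
    states.append("full")
    return states
```

Because the repository is already usable after the first step, a lazy get can stop before the third.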
Benchmarks
How does “optimize --http” improve the user experience?
- Jérémie’s repo (~900 patches): from 10s (get --no-packs) to 1s (get)
- http://darcs.net/ (~9300 patches): “darcs optimize --http” takes 14s to run. _darcs grows from 54 MBytes to 64 MBytes (_darcs/packs/ itself is 11 MBytes). Complete get: from 37 minutes to 2 minutes; lazy get: from 27 seconds to 7 seconds.
- screened + 12 patches:

get     packs    no-packs
lazy    30s      1m30s
full    2m30s    31m
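To compare the figures above as speedup factors, a small Python sketch follows. The timings are copied from this section; to_seconds and speedups are hypothetical helper names, and the parser accepts both “1m30” and “1m30s”.

```python
import re


def to_seconds(text):
    """Convert a duration such as '30s', '1m30s' or '31m' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s?)?", text)
    if not text or match is None:
        raise ValueError("unrecognised duration: %r" % text)
    minutes, seconds = match.groups()
    return 60 * int(minutes or 0) + int(seconds or 0)


# (with packs, without packs), copied from the benchmark figures above.
TIMINGS = {
    "darcs.net lazy": ("7s", "27s"),
    "darcs.net full": ("2m", "37m"),
    "screened lazy": ("30s", "1m30s"),
    "screened full": ("2m30s", "31m"),
}


def speedups(timings):
    """Return the ratio (time without packs) / (time with packs) per label."""
    return {label: to_seconds(slow) / to_seconds(fast)
            for label, (fast, slow) in timings.items()}


if __name__ == "__main__":
    for label, factor in speedups(TIMINGS).items():
        print("%-15s %.1fx" % (label, factor))
```

With these numbers, lazy gets come out roughly 3x to 4x faster and full gets roughly 12x to 18x faster.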
