Internals/Repository

This page describes the structure of a darcs repository. Yes, this _darcs thingy that appears after you do a “darcs initialize”! In this page, we describe repositories without referring to Darcs code. You may want to start by reading the Model page, to get a more global view of Darcs repositories.

This is a work in progress, so I will leave a lot of TODOs everywhere.

You can look into gzipped files with zless. Almost everything in _darcs is gzipped.
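If you would rather poke at these files from a script than with zless, here is a minimal sketch. It assumes that a file under _darcs is either a plain gzip stream or plain text; the helper name read_repo_file is made up for this page and is not part of darcs.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_repo_file(path):
    """Return the raw bytes of a file under _darcs, gunzipping it if needed.

    Hypothetical helper: it only looks at the gzip magic number.
    """
    data = Path(path).read_bytes()
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data  # e.g. a plain text file such as hashed_inventory
```

Checking the magic number rather than the file name means the same helper works for both the gzipped files and the plain text ones.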

_darcs after an initialization

This is what we have after darcs init:

_darcs/
|-- format
|-- hashed_inventory
|-- patches
|-- prefs
|   |-- binaries
|   |-- boring
|   `-- motd
`-- pristine.hashed
    `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
  • format contains the two lines hashed and darcs-2. This file is read by darcs before attempting to read from or write to the repository. hashed refers to the repository format, and darcs-2 refers to the patch format. For more information, see the Darcs 2 description page and http://article.gmane.org/gmane.comp.version-control.darcs.devel/5393
  • hashed_inventory is a plain text file describing the last recorded state of the repository.
  • patches is a directory containing gzipped files, each one containing a named patch. This directory is initially empty.
  • prefs is a directory of plain text files that contain various options.
  • pristine.hashed contains gzipped files, each one containing either the contents of a directory or the contents of a file. The contents of all directories and files of the last recorded version of the repository are present. In the current case, the file e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is present to describe the current empty root directory of the repository.
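A side note on that file name: e3b0c442... happens to be the SHA-256 digest of empty input. This suggests that files in pristine.hashed are named after the SHA-256 of their uncompressed content, the listing of an empty directory being empty. Take the naming rule as an assumption drawn from this one example; the function name pristine_name is made up for illustration.

```python
import hashlib

# File name seen in pristine.hashed right after "darcs init".
EMPTY_ROOT = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def pristine_name(content: bytes) -> str:
    """Assumed naming rule: hex SHA-256 of the uncompressed content."""
    return hashlib.sha256(content).hexdigest()


# The empty root directory has an empty listing, hence the hash of b"".
print(pristine_name(b"") == EMPTY_ROOT)  # True
```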

After adding one patch

Let’s start preparing a patch:

$ echo "file content" > somefile
$ darcs add somefile

We now have some extra files in _darcs:

_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- patches
|   |-- pending
|   `-- pending.tentative
|-- prefs
|   |-- binaries
|   |-- boring
|   `-- motd
|-- pristine.hashed
|   `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
|-- tentative_hashed_inventory
`-- tentative_pristine
  • index : extra optimization file added by darcs since 2.3.1
  • index_invalid : same
  • patches/pending : the patch being built. It now contains addfile ./somefile
  • patches/pending.tentative : todo
  • tentative_hashed_inventory
  • tentative_pristine

Now record:

$ darcs record -a -m"my first patch"

What we have now in _darcs:

_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
|   `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
|   |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
|   |-- pending
|   `-- pending.tentative
|-- prefs
|   |-- binaries
|   |-- boring
|   `-- motd
|-- pristine.hashed
|   |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
|   |-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
|   `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
`-- tentative_pristine

New and changed files:

  • hashed_inventory: its content has changed, more on that later.
  • inventories directory: contains files of the same kind as hashed_inventory, more on that later.
  • patches/0000000172-...: gzipped patch.

The content of this patch file is:

[my first patch
Guillaume <me@mail.com>**20101016142609
 Ignore-this: 9af21412b424aef171164f2b98bc9d10
] addfile ./somefile
hunk ./somefile 1
+file content

So it is really a darcs patch, with its metadata (name, author name, timestamp, and an extra hash to be sure no confusion can be made), and its data: an addfile, and one hunk consisting of a one-line addition to the file somefile.
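To make that layout concrete, here is a small sketch that splits a patch like the one above into its metadata and its body. It is written against this single example only: the regular expression and the function name parse_patch are assumptions of this page, and other patches may well use header variants that it does not handle.

```python
import re
from datetime import datetime

HEADER = re.compile(
    r"\[(?P<name>[^\n]*)\n"       # patch name, right after the opening bracket
    r"(?P<author>[^\n]*?)\*\*"    # author, up to the '**' separator
    r"(?P<date>\d{14})\n"         # timestamp written as YYYYMMDDhhmmss
    r"(?P<log>(?: [^\n]*\n)*)"    # extra lines, each indented by one space
    r"\] ?"                       # closing bracket
)


def parse_patch(text):
    """Split an uncompressed patch into metadata fields and body."""
    m = HEADER.match(text)
    if m is None:
        raise ValueError("not a patch header of the expected shape")
    info = m.groupdict()
    info["date"] = datetime.strptime(info["date"], "%Y%m%d%H%M%S")
    info["log"] = [line[1:] for line in info["log"].splitlines()]
    info["body"] = text[m.end():]  # addfile, hunks, and so on
    return info
```

On the patch shown above, the log list holds the single Ignore-this line and the body starts with addfile ./somefile.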

pristine.hashed has two more files. 694b... contains:

file content

And 83bf... contains:

file:
somefile
694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df

This last file is in fact the description of the last recorded state of the repository: a root directory with the file somefile, whose contents are given in the file 694b.... This is how darcs gets the contents of files when doing a darcs get. But wait, how did I know that this file 83bf... was the description of the root directory of the last recorded state? Well, I know it because hashed_inventory now contains:

pristine:83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
[my first patch
Guillaume <me@mail.com>**20101016142609
 Ignore-this: 9af21412b424aef171164f2b98bc9d10
] 
hash: 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b

So, this hashed_inventory file describes the current recorded state of the repository, and its first line gives the file name of the current root. That means darcs get has all the information it needs to retrieve files by looking at this hashed_inventory file first.
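The lookup just described can be sketched in a few lines. This is not how darcs get is implemented; it only follows the files shown above. It assumes that a directory listing is a sequence of three-line entries (kind, name, hash) as in 83bf..., and that subdirectories are marked with a directory: line, which is not shown on this page. All function names are made up for illustration.

```python
import gzip
from pathlib import Path


def read_maybe_gzipped(path):
    """Read a file, gunzipping it when it starts with the gzip magic number."""
    data = Path(path).read_bytes()
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def pristine_root(darcs_dir):
    """Return the root hash given on the first line of hashed_inventory."""
    raw = read_maybe_gzipped(Path(darcs_dir) / "hashed_inventory")
    first = raw.split(b"\n", 1)[0]
    if not first.startswith(b"pristine:"):
        raise ValueError("no pristine line found")
    return first[len(b"pristine:"):].decode("ascii")


def list_directory(darcs_dir, dir_hash):
    """Yield (kind, name, hash) for each entry of a directory listing."""
    raw = read_maybe_gzipped(Path(darcs_dir) / "pristine.hashed" / dir_hash)
    lines = raw.decode("utf-8").splitlines()
    for i in range(0, len(lines) - 2, 3):
        yield lines[i].rstrip(":"), lines[i + 1], lines[i + 2]


def checkout(darcs_dir, dir_hash, target):
    """Recreate the recorded tree below target."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    for kind, name, entry_hash in list_directory(darcs_dir, dir_hash):
        if kind == "directory":  # assumed marker, not shown on this page
            checkout(darcs_dir, entry_hash, target / name)
        else:
            blob = Path(darcs_dir) / "pristine.hashed" / entry_hash
            (target / name).write_bytes(read_maybe_gzipped(blob))
```

Calling checkout("_darcs", pristine_root("_darcs"), "out") on the repository above would be expected to produce out/somefile with the recorded content.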

Now one remark. Why do we keep the file pristine.hashed/e3b0... if we no longer need it? That is because darcs wants to be fast and does not delete pristine files over time. A “tidying record” that is as fast as the current record is something we could think of implementing. If you run darcs optimize in that directory, _darcs now contains:

_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
|   `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
|   |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
|   |-- pending
|   `-- pending.tentative
|-- prefs
|   |-- binaries
|   |-- boring
|   `-- motd
|-- pristine.hashed
|   |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
|   `-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
`-- tentative_pristine

So we got rid of that e3b0... file that is no longer useful. Over time your darcs repositories may grow in size because of this pristine.hashed directory that accumulates files. Run “darcs optimize” if you are in desperate need of disk space (the effect is dramatic if you have big files, like binary files, in your repository). See also the GrowingPristineProblem.
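The idea of an unused pristine file can be sketched as a reachability check: anything that cannot be reached from the root named in hashed_inventory is a candidate for removal. This is only an illustration of the idea, not a description of what darcs optimize does internally. The store dictionary (hash to uncompressed text) and the directory: marker are assumptions of this sketch.

```python
def reachable(store, root_hash):
    """Return the hashes reachable from root_hash.

    store maps a hash to the uncompressed text of that pristine file.
    Directory listings are assumed to be three-line entries: kind, name, hash.
    """
    seen = set()
    todo = [(root_hash, True)]  # (hash, is_directory)
    while todo:
        item = todo.pop()
        if item in seen:
            continue
        seen.add(item)
        entry_hash, is_dir = item
        if is_dir:
            lines = store[entry_hash].splitlines()
            for i in range(0, len(lines) - 2, 3):
                todo.append((lines[i + 2], lines[i] == "directory:"))
    return {entry_hash for entry_hash, _ in seen}


def unused(store, root_hash):
    """Hashes present in the store but not reachable from the root."""
    return set(store) - reachable(store, root_hash)
```

With the three pristine files listed before the optimize run and 83bf... as the root, unused would return only e3b0..., which matches the file that disappeared.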

hashed_inventory, inventory

An inventory is a file that describes the state of a repository by listing patches. It may start with the hash of another inventory, so that inventory files never get too big.

hashed_inventory is the inventory of the current state of the repository. The subdirectory inventories stores other inventories useful for the history of the repository.

_darcs/inventories/ contains gzipped context files. An inventory can start with the hash of the other inventory file it relies upon. Let us take a repository that already has many patches, and look at one of its inventory files:

Starting with inventory:
0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc
[TAG 1.5.3
Guillaume <me@mail.com>**20100513150110
 Ignore-this: 4d602c25b18ca30228400f8800e27253
] 
hash: 0000005948-e154869978642799facaca2180634f353d45df6e7478244f4fb16ea831ec612c
[switch to GHC 6.12 Prelude, fix warnings and take sme advice from hlint
Guillaume <me@mail.com>**20100604121359
 Ignore-this: 7286831df91ffb8974deeb6a67527fa0
] 

...

If we look at the file inventories/0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc, we find:

Starting with inventory:
0000005042-37894faa0a3f90fcba049147fdb28490d53b1a27b5763feff3a940906a8e0823
[TAG 1.5.2
Guillaume <me@mail.com>**20091110191538
 Ignore-this: 7af98721b507b5b53d95688aeee45eff
] 
hash: 0000003430-515b0a6e2c0fd55f0fb7fdf85b59387ee78a7c97306b56cd5767e0afedc62303
[comment no longer relevant
Guillaume <me@mail.com>**20100217132511
 Ignore-this: e854183117a8d980ccab7efdf5a66a3d
] 
hash: 0000000232-c7d79d1acf8a1847869c73e7852937b91d65a179f91e3d5b0581a354f6596cfe
[defer more to getMods
Guillaume <me@mail.com>**20100217173918
 Ignore-this: f6e2633492d31565723729e787a62dd2
] 
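The chaining shown in the two listings above can be followed mechanically. The sketch below collects the patch hashes of a whole history, oldest first, by walking from the newest inventory back through each “Starting with inventory:” header. It assumes that every hash line starts with “hash: ” at the beginning of a line and that no patch metadata line does; the load callback and both function names are hypothetical.

```python
def parse_inventory(text):
    """Return (parent_hash_or_None, [patch hashes]) for one inventory."""
    parent = None
    hashes = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Starting with inventory:" and i + 1 < len(lines):
            parent = lines[i + 1].strip()
        elif line.startswith("hash: "):
            hashes.append(line[len("hash: "):].strip())
    return parent, hashes


def all_patch_hashes(load, head_text):
    """Walk the inventory chain and return every patch hash, oldest first.

    load(name) must return the uncompressed text of inventories/<name>.
    """
    result = []
    text = head_text
    while text is not None:
        parent, hashes = parse_inventory(text)
        result = hashes + result  # the parent inventory holds older patches
        text = load(parent) if parent else None
    return result
```

Passing the reader as a callback keeps the walk independent of where the inventories come from, be it a local directory or a remote repository.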

TODO: what is the logic behind inventory file segmenting?

Note that inventory files contain the metadata of patches but not their contents. There is a hash for that, and the hash is used as a file name in _darcs/patches/ to store the metadata again, plus the patch content.

Why is there patch metadata in inventory files, while it is also in the _darcs/patches/ files? This is for lazy repositories. In a lazy repository you don’t download the patch files, but you do have the inventory files. So at least you can do darcs changes without having to download extra files. However, if you do darcs changes -v, this downloads all patches. By the way, this is a way to “complete” your repository into a full one.
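To see how lazy a repository currently is, one can compare the hashes named in the inventories with the files actually present in _darcs/patches/. A minimal sketch, assuming the list of patch hashes has already been collected from the hash: lines of the inventories; the function name missing_patches is made up for this page.

```python
from pathlib import Path


def missing_patches(darcs_dir, patch_hashes):
    """Return the hashes that have no file in <darcs_dir>/patches."""
    patches = Path(darcs_dir) / "patches"
    return [h for h in patch_hashes if not (patches / h).is_file()]
```

An empty result would mean that every patch named in the given list is available locally.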