Internals/Repository
This page describes the structure of a darcs repository. Yes, this _darcs thingy that appears after you do a “darcs initialize”! In this page, we describe repositories without referring to Darcs code. You may want to start by reading the Model page, to get a more global view of Darcs repositories.
This is work in progress, so I will put a lot of todo everywhere.
You can look into gzipped files with zless. Almost everything in _darcs is gzipped.
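If zless is not at hand, the same inspection can be done from a script. The following is a minimal sketch, not part of darcs: the helper name read_darcs_file is made up for this page, and it assumes the files are either gzip-compressed or plain text, which it tells apart by the two-byte gzip magic number.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_darcs_file(path):
    """Return the text of a file under _darcs, gunzipping it if needed.

    Hypothetical helper for illustration; darcs itself is not involved.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Patch files may contain arbitrary bytes, so never fail on decoding.
    return raw.decode("utf-8", errors="replace")
```

For example, read_darcs_file("_darcs/hashed_inventory") returns the inventory text whether or not that particular file happens to be compressed.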
_darcs after an initialization
This is what we have after darcs init:
_darcs/
|-- format
|-- hashed_inventory
|-- patches
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
`-- pristine.hashed
`-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
format contains the two lines hashed and darcs-2. This file is read by darcs before attempting to read from or write to the repository. hashed refers to the repository format, and darcs-2 refers to the patch format. For more information, see the Darcs 2 description page and http://article.gmane.org/gmane.comp.version-control.darcs.devel/5393
hashed_inventory is a plain text file describing the last recorded state of the repository.
patches is a directory containing gzipped files, each one containing a named patch. This directory is initially empty.
prefs contains plain text files that hold various options.
pristine.hashed contains gzipped files, each one containing either the contents of a directory or the contents of a file. The contents of all directories and files of the last recorded version of the repository are present. Here, the file e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is present to describe the empty root directory of the repository.
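The name e3b0c442... is not arbitrary: it is exactly the SHA-256 digest of an empty byte string. This suggests that files in pristine.hashed are named after a hash of their uncompressed content, although that generalisation is an observation and not something this page has established. The check below only verifies the empty case; sha256_hex is a helper name made up for the example.

```python
import hashlib

# Name of the only file in pristine.hashed after "darcs init" (from the listing above).
EMPTY_ROOT = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


# An empty root directory has an empty listing, i.e. zero bytes of content.
print(sha256_hex(b"") == EMPTY_ROOT)  # True
```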
After adding one patch
Let’s start preparing a patch:
$ echo "file content" > somefile
$ darcs add somefile
We have the extra files in _darcs:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- patches
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
|-- tentative_hashed_inventory
`-- tentative_pristine
index: extra optimization file added by darcs since 2.3.1
index_invalid: same
patches/pending: the patch being built. It now contains addfile ./somefile
patches/pending.tentative: todo
tentative_hashed_inventory
tentative_pristine
Now record:
$ darcs record -a -m"my first patch"
What we have now in _darcs:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
| `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
| |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
| |-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
| `-- e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
`-- tentative_pristine
New files:
hashed_inventory: its content has changed, more on that later.
inventories directory: contains files of the same kind as hashed_inventory, more on that later.
patches/0000000172-...: gzipped patch.
The content of this last patch is:
[my first patch
Guillaume <me@mail.com>**20101016142609
Ignore-this: 9af21412b424aef171164f2b98bc9d10
] addfile ./somefile
hunk ./somefile 1
+file content
So it is really a darcs patch, with its metadata (name, author name, timestamp, and an extra hash to make sure no confusion is possible), and its data: an addfile, and one hunk consisting of a one-line addition to the file somefile.
pristine.hashed has two more files. 694b... contains:
file content
And 83bf... contains:
file:
somefile
694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
This last file is in fact the description of the last recorded state of the repository: a root directory with the file somefile, whose contents are given in the file 694b.... This is how darcs gets the contents of files when doing a darcs get. But wait, how did I know that this file 83bf... was the description of the base directory of the last recorded state? I know it because hashed_inventory now contains:
pristine:83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
[my first patch
Guillaume <me@mail.com>**20101016142609
Ignore-this: 9af21412b424aef171164f2b98bc9d10
]
hash: 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
So, this ``hashed_inventory`` file describes the current recorded state of the
repository and its first line gives the file name of the current root. That means
``darcs get`` has all the information to retrieve files by looking at this
``hashed_inventory`` file first.
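To make the layout of that file concrete, here is a rough parser for the text shown above. It is a sketch written for this page, not the darcs reader: the function name and the dictionary keys are invented, the optional "Starting with inventory:" header is the one described in the inventory section of this page, and the code assumes that no line inside a patch name starts with "]".

```python
def parse_inventory(text):
    """Split inventory text into (pristine_hash, parent_inventory, patches).

    Illustrative only; field names are not darcs terminology.
    """
    pristine = None
    parent = None
    patches = []
    lines = [ln.strip() for ln in text.splitlines()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("pristine:"):
            pristine = line[len("pristine:"):].strip()
        elif line == "Starting with inventory:":
            # The hash of the previous inventory is on the next line.
            i += 1
            parent = lines[i] if i < len(lines) else None
        elif line.startswith("["):
            # Collect everything up to the closing "]" as patch metadata.
            meta = [line[1:]]
            i += 1
            while i < len(lines) and not lines[i].startswith("]"):
                meta.append(lines[i])
                i += 1
            author, _, date = meta[1].partition("**") if len(meta) > 1 else ("", "", "")
            patch = {"name": meta[0], "author": author, "date": date, "hash": None}
            # The "hash:" line, when present, directly follows the "]".
            if i + 1 < len(lines) and lines[i + 1].startswith("hash:"):
                i += 1
                patch["hash"] = lines[i][len("hash:"):].strip()
            patches.append(patch)
        i += 1
    return pristine, parent, patches
```

Applied to the hashed_inventory shown above, this returns the pristine hash 83bf..., no parent inventory, and a single patch whose hash is the file name found under _darcs/patches/.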
Now one remark. Why do we keep the file pristine.hashed/e3b0... if we no longer need it? Because darcs wants to be fast and does not delete pristine files as it goes. A “tidying record” that cleans up and is still as fast as the current record is something we could think of implementing. If you run darcs optimize in that directory, _darcs now contains:
_darcs/
|-- format
|-- hashed_inventory
|-- index
|-- index_invalid
|-- inventories
| `-- 0000000205-0332fe4dd444b6b9f94ba71ea1ce3b6fa7cb564e5d4b9f6c0fc7044073ee08db
|-- patches
| |-- 0000000172-de1342a0b690a33830231c0929ce6b63fa23315c47f6a1d6552a34f744aeaa9b
| |-- pending
| `-- pending.tentative
|-- prefs
| |-- binaries
| |-- boring
| `-- motd
|-- pristine.hashed
| |-- 694b27f021c4861b3373cd5ddbc42695c056d0a4297d2d85e2dae040a84e61df
| `-- 83bf551b64dc5f0e5684e1e42268c4ec56df209a4604cd7e936c169c3fa47603
`-- tentative_pristine
So we got rid of that e3b0... file that is no longer useful. Over time your darcs repositories may grow in size because of this pristine.hashed directory that accumulates files. Run “darcs optimize” if you are in desperate need of disk space (the effect is dramatic if you have big files, like binary files, in your repository). See also the GrowingPristineProblem.
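One way to see which pristine files have become useless is to start from the root listing named in hashed_inventory and collect every hash reachable from it; anything else in pristine.hashed is a leftover. The sketch below does that under several assumptions: listings are made of three-line entries as in the 83bf... file above, sub-directories appear with a "directory:" tag (only "file:" is shown on this page), and the function names are invented. It is not how darcs optimize is implemented.

```python
import gzip
from pathlib import Path


def load_pristine(pristine_dir, name):
    """Read one pristine file as text, gunzipping it if it is compressed."""
    raw = (Path(pristine_dir) / name).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def parse_listing(text):
    """Parse a directory listing into (kind, name, hash) triples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    for i in range(0, len(lines) - 2, 3):
        kind, name, digest = lines[i:i + 3]
        entries.append((kind.rstrip(":"), name, digest))
    return entries


def reachable_hashes(pristine_dir, root_hash):
    """Hashes of all files and directories reachable from the root listing."""
    seen = {root_hash}
    todo = [root_hash]  # directory listings that still have to be read
    while todo:
        listing = load_pristine(pristine_dir, todo.pop())
        for kind, _name, digest in parse_listing(listing):
            if digest not in seen:
                seen.add(digest)
                if kind == "directory":  # assumed tag for sub-directories
                    todo.append(digest)
    return seen


def unreferenced(pristine_dir, root_hash):
    """Names in pristine_dir that the current root no longer refers to."""
    present = {p.name for p in Path(pristine_dir).iterdir() if p.is_file()}
    return sorted(present - reachable_hashes(pristine_dir, root_hash))
```

Under these assumptions, calling unreferenced on the pristine.hashed directory as it was before darcs optimize, with the root hash 83bf..., would be expected to report only the e3b0... file.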
hashed_inventory, inventory
An inventory is a file that describes the state of a repository by listing patches. It may start with the hash of another inventory, so that inventory files never get too big.
hashed_inventory is the inventory of the current state of the repository. The subdirectory inventories stores other inventories useful for the history of the repository.
_darcs/inventories/ contains gzipped context files. Each inventory starts with the hash of the other inventory file it relies upon. Take a repository that already has many patches, and look at one of its inventory files:
Starting with inventory:
0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc
[TAG 1.5.3
Guillaume <me@mail.com>**20100513150110
Ignore-this: 4d602c25b18ca30228400f8800e27253
]
hash: 0000005948-e154869978642799facaca2180634f353d45df6e7478244f4fb16ea831ec612c
[switch to GHC 6.12 Prelude, fix warnings and take sme advice from hlint
Guillaume <me@mail.com>**20100604121359
Ignore-this: 7286831df91ffb8974deeb6a67527fa0
]
...
If we look at the file inventories/0000009036-9cbf750ff34fa7b3940af47b7c95ec812d2e536f5feada8d0e89ed530cecddcc, we find:
Starting with inventory:
0000005042-37894faa0a3f90fcba049147fdb28490d53b1a27b5763feff3a940906a8e0823
[TAG 1.5.2
Guillaume <me@mail.com>**20091110191538
Ignore-this: 7af98721b507b5b53d95688aeee45eff
]
hash: 0000003430-515b0a6e2c0fd55f0fb7fdf85b59387ee78a7c97306b56cd5767e0afedc62303
[comment no longer relevant
Guillaume <me@mail.com>**20100217132511
Ignore-this: e854183117a8d980ccab7efdf5a66a3d
]
hash: 0000000232-c7d79d1acf8a1847869c73e7852937b91d65a179f91e3d5b0581a354f6596cfe
[defer more to getMods
Guillaume <me@mail.com>**20100217173918
Ignore-this: f6e2633492d31565723729e787a62dd2
]
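The chaining shown in these two listings can be followed mechanically: read the header of one inventory, open the file it names under _darcs/inventories/, and repeat until an inventory has no header. The following sketch does just that. The function names are invented for this page, and stopping silently when a parent file is missing is a choice made for the example, not darcs behaviour.

```python
import gzip
from pathlib import Path


def read_text(path):
    """Read a file as text, gunzipping it if it is compressed."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def parent_inventory(text):
    """Return the hash after 'Starting with inventory:', or None if absent."""
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "Starting with inventory:":
            return lines[i + 1].strip()
        if line.lstrip().startswith("["):
            break  # patch entries have begun, so there is no header
    return None


def inventory_chain(darcs_dir):
    """List the inventory file names reached from hashed_inventory, newest first."""
    darcs_dir = Path(darcs_dir)
    chain = []
    seen = set()
    text = read_text(darcs_dir / "hashed_inventory")
    while True:
        parent = parent_inventory(text)
        if parent is None or parent in seen:  # end of chain, or a loop
            break
        chain.append(parent)
        seen.add(parent)
        path = darcs_dir / "inventories" / parent
        if not path.exists():
            break  # the older inventory is not available locally
        text = read_text(path)
    return chain
```

Running inventory_chain("_darcs") in a repository with a long history gives a quick idea of how many segments its inventory has been split into.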
TODO what is the logic behind inventory file segmenting ?
Note that inventory files contain the metadata of patches but not their contents. There is a hash for that, and the hash is used as a file name in _darcs/patches/, where the metadata is stored again together with the patch content.
Why is there patch metadata in inventory files, when it is also in the _darcs/patches/ files? This is for lazy repositories. In a lazy repository you don’t download the patch files, but you do have the inventory files. So at least you can do darcs changes without having to download extra files. However, if you do darcs changes -v, this downloads all patches. By the way, this is a way to “complete” your repository into a full one.
