Best Practices for Darcs
This page collects ways of using darcs, both for criticism and for possible inclusion in the darcs manual as "best practices".
What repositories are useful?
Here's a thread that discusses some of these issues. Recommendations include keeping copies of your repositories under ~/public_html and on a laptop, as well as in your working directory, so that they are always available, and having one project directory with a separate darcs repository under it for each area of work.
When should you branch?
In darcs, each branch exists in its own repository.
You can create a branch anytime you want independent lines of development; on one branch you might work on a bug fix, on another you might work on a feature. Or you might implement different features on different branches, or you might even try implementing the same feature in different ways on different branches. You might want a separate branch for maintenance updates to the "1.0" series, while you work on the "2.0" series elsewhere.
Once you've created a branch, changes are merged back and forth by use of darcs pull and darcs push.
This mail about using darcs in a similar way to CVS might also be helpful.
How to create a branch?
In darcs, every repository that shares common ancestry of patches with another can be considered a branch. Darcs commands don't talk about branches specifically, but simply about repositories and patches. So the short answer is "use the same commands you would for creating a new repository and moving patches back and forth to it."
One feature of branches in other source control systems is the ability to save disk space by not fully duplicating the source between the trunk and its branches. Darcs can help here by hard-linking files between branches that live on the same file system: when you create a new repository on the same file system, darcs get creates the hard links for you. Otherwise, you can run darcs optimize --relink --sibling /repo/dir after the fact. Note that if both branches are pushed to two repositories on another machine, the hard links are not recreated there. See the documentation for those commands for more details.
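As a minimal sketch of the two commands just mentioned, the helpers below wrap darcs get and darcs optimize --relink --sibling. The function names and the directory names (trunk, bugfix-123) are hypothetical, and the option spelling follows this page; newer darcs releases may spell it differently.

```shell
# Sketch only: function and directory names are made up for illustration.

# Create a branch by getting an existing repository into a new directory.
# On the same file system, darcs get hard-links files where it can.
make_branch() {
    trunk=$1
    branch=$2
    darcs get "$trunk" "$branch"
}

# Re-share storage after the fact between a branch and a sibling repository.
# Pass the sibling as an absolute path, since the command runs inside the branch.
relink_branch() {
    branch=$1
    sibling=$2
    (cd "$branch" && darcs optimize --relink --sibling "$sibling")
}

# Example:
#   make_branch trunk bugfix-123
#   relink_branch bugfix-123 "$PWD/trunk"
```

After that, patches move between trunk and bugfix-123 with the usual darcs pull and darcs push.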
Besides this, darcs does not do any kind of branch management. Since a repository is exactly one branch, it is not possible for a single "server" repository published at a given URL to contain multiple branches. Discovery of the various branches in a published project must be done by other means.
How should you copy working directories between machines?
If you've recorded all your changes, just use darcs push/pull.
If you have unrecorded changes, you can record them, then push/pull, and then unrecord them on both machines after the synchronisation.
Some people use unison to propagate unrecorded changes between machines. If you do that, make sure that you never copy any of the files under _darcs. You should also make sure that Darcs itself is not accessing the repositories at the same time (Unison is oblivious of the locks that Darcs uses). rsync or cp -a could also be used, with similar caveats. (See, however, the page about unison for some more hints about using Unison with Darcs.)
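For the rsync variant, a rough sketch might look like the following. The function name and the example paths are hypothetical; the point is only that the top-level _darcs directory is excluded from the copy, and the caveat about not running darcs at the same time still applies.

```shell
# Sketch only: copy a working tree to another location without touching _darcs.
# The leading slash anchors the pattern at the top of the copied tree, and the
# trailing slash restricts it to directories.
sync_working_tree() {
    src=$1    # e.g. /home/me/project/      (note the trailing slash)
    dest=$2   # e.g. laptop:/home/me/project/
    rsync -a --exclude=/_darcs/ "$src" "$dest"
}

# Example:
#   sync_working_tree /home/me/project/ laptop:/home/me/project/
```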
What is the best way to name patches?
The author has seen two approaches that make a lot of sense in this area. Remember, however, that this is a highly subjective matter.
The first approach is to follow an existing coding convention from another organization or project. For example, in an IRC discussion, "twb" demonstrated how he uses a variation of the GNU changelog conventions for his patch names: see ftp://twb.ath.cx/words/darcs-kludges.html#grouping-single-file-projects for the full details. In summary:
- No single patch alters more than one file.
- A patch name can take one of two forms:
| Pattern | Typically used for... | Example |
| --- | --- | --- |
| path: comment | file-wide changes | main.c: renamed file_io typedef to FILE_IO for consistency |
| path (function [, function [...]] ): comment | changes that affect one or more specific functions or methods | main.c (initializeScreen): Changed SDL_HWSURFACE to SDL_SWSURFACE |
A second approach, which is quite similar, prefixes each patch name with a support ticket identifier (e.g., a bug ID number from Bugzilla). This is convenient because you can isolate the changes relevant to a specific bug with a regex match, for example using darcs changes -p. The author discovered this technique while perusing the darcs mailing lists; in particular, see http://lists.osuosl.org/pipermail/darcs-users/2005-March/006113.html
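Both conventions lend themselves to simple pattern matching. The sketch below is one possible reading of them, not an official check: valid_patch_name tests a name against the two changelog-style forms, and changes_for_ticket lists patches whose names start with a ticket prefix. The function names and the bug1234 prefix format are hypothetical.

```shell
# Sketch only: succeed if a patch name looks like "path: comment" or
# "path (function[, function...]): comment".
valid_patch_name() {
    printf '%s\n' "$1" | grep -Eq '^[^ :()]+( \([^()]+\))?: .+$'
}

# List patches whose names begin with a ticket identifier.
# -p takes a regular expression matched against patch names.
changes_for_ticket() {
    darcs changes -p "^$1"
}

# Example:
#   valid_patch_name "main.c (initializeScreen): Changed SDL_HWSURFACE to SDL_SWSURFACE" && echo ok
#   changes_for_ticket bug1234
```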
How should you handle version numbering and releasing?
Tag your repository with the version number of your release. Use whatever numbering scheme you feel comfortable with; darcs stores version numbers as plain-text strings anyway.
Run "darcs dist" to create a source tarball.
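Put together, a release could be sketched as below. The function name and the version string are hypothetical; run it from inside the repository.

```shell
# Sketch only: tag the current state, then build a source tarball from it.
release() {
    version=$1
    darcs tag "$version" && darcs dist
}

# Example:
#   release 1.0.1
```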
If you forget what the last tag name was, you can use the following recipe to find out: darcs changes | grep tagged | head -1. If you wish to see all the tagged versions, you can leave off the head command. If you want to see the dates associated with each tag as well, you can use grep -B 1.
You can also obtain the results by processing the XML output. A relatively simple example of this approach is: darcs changes --xml-output | sed -n 's,<name>TAG \(.*\)</name>,\1,p'
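A slightly stricter variant of that sed expression also strips the indentation around the name element, so only the bare tag name is printed. This is a sketch that assumes, as the one-liner above does, that each name element sits on a line of its own; the function name is hypothetical.

```shell
# Sketch only: read `darcs changes --xml-output` on stdin and print one tag
# name per line, newest first (the order darcs changes reports them in).
tags_from_xml() {
    sed -n 's,^[[:space:]]*<name>TAG \(.*\)</name>[[:space:]]*$,\1,p'
}

# Example:
#   darcs changes --xml-output | tags_from_xml | head -n 1
```

Tag names containing characters such as & will still appear in their XML-escaped form.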
Another way to list all tags, with date and author, is: darcs changes -t . The period is a regexp that matches all tags. You could for example list all rc tags with darcs changes -t rc. The very useful command darcs changes --from-tag . shows the last tag and all "extra" patches not tagged by it.
As of darcs 1.0.9 there is also the darcs query tags command, which lists all tags.
What do you do when you've got lots of patches and releases in a repository?
Add a checkpoint every now and then (darcs optimize --checkpoint) -- this will allow you to do --partial gets, and not worry about old versions most of the time. But if for some reason you need to worry about old versions, you can get a full copy of the repository and pull and push patches between the partial and the full repository.
If you want to support older versions in a limited fashion (security updates, bug fixes), create a branch for each old version.
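One way to start such a branch, assuming the old release was tagged, is to get the repository as it stood at that tag. The function name and the repository and tag names below are hypothetical.

```shell
# Sketch only: create a maintenance branch containing the patches up to a tag.
# The tag argument is matched as a regular expression.
maintenance_branch() {
    repo=$1
    tag=$2
    dest=$3
    darcs get --tag "$tag" "$repo" "$dest"
}

# Example:
#   maintenance_branch myproject 1.0 myproject-1.0-maint
```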
Avoiding Trouble
Following these suggestions may help to minimize frustrations with darcs.
No stupid find tricks (or upgrade to darcs 2)
Watch out for operations which work recursively on your directory. For example, something like this will definitely cause inconsistencies in darcs:
for i in `find . -name '*.lhs'`; do sed -e 's/^import Foo/import Bar.Foo/' $i > $i.2; mv $i.2 $i; done
In darcs 1, the problem was that this command would also affect the files in _darcs/pristine. Darcs 2 makes this unlikely if you use either the hashed format (which is compatible with darcs 1) or the new darcs 2 format.
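Whatever the darcs version, it is easy to keep a recursive edit out of _darcs altogether by pruning that directory in find. The following is a sketch with hypothetical function names; it assumes file names contain no newlines.

```shell
# Sketch only: print regular files matching a name pattern, skipping anything
# under a directory called _darcs.
find_outside_darcs() {
    root=$1
    pattern=$2
    find "$root" -name _darcs -prune -o -type f -name "$pattern" -print
}

# Apply a sed script to each file found, going through a temporary file.
rewrite_outside_darcs() {
    root=$1
    pattern=$2
    script=$3
    find_outside_darcs "$root" "$pattern" | while IFS= read -r f; do
        sed -e "$script" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example:
#   rewrite_outside_darcs . '*.lhs' 's/^import Foo/import Bar.Foo/'
```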
Avoid very large files
As of December 2005, darcs has trouble dealing with very large patches (likely because of the exponential-time merge issues). If you have very large files that you absolutely want to store in the repository (corpora, etc.), it may be useful to gzip them first. I managed to reduce an 8M text file to 1M this way, for example.
Avoid conflicts (and upgrade to darcs 2)
In darcs 1, large conflicts and complex conflicts can cause darcs to use an exponential amount of CPU power to solve the problem, giving the appearance that darcs is "spinning" or "hanging". Darcs 2 mostly avoids the problem in practice, but to benefit you should use repositories in the new darcs 2 format. See ConflictsFAQ for details.
Avoid external merges when pulling unresolved conflicts
Avoid using --external-merge when pulling patches containing unresolved conflicts from a repository since information may be lost in this scenario. The existing conflicts in the remote repository will not be passed to the external merge tool since doing this at the same time as passing conflicts with the local repository is either impossible or too complicated. Instead the version of the file from the remote repository will be passed to the external merge tool in its pre-conflict state.
Don't change patches that have left your working repo
Once a patch has left your working repo, it could cause confusion if you then unrecord or amend-record that patch. Instead, create a new patch to resolve the issue.
Setting up a secure repo on the Internet
Say you have a server on the Internet, and you want to put a darcs repo there for several people to contribute to. If they all have accounts on the machine and can SSH to it, then no problem. But what if they don't have accounts on the machine? This recipe is for you!
See: Workflows and RepoViaSSH
Recovering from conflicts
If you discover a conflict between patches, it is not immediately obvious to a beginner how to retrieve the situation. Let's say you made some local changes that you forgot to record before pulling from another repo, and that you would like to resolve the conflict by discarding your unrecorded local changes. Here is the way to do it:
- darcs revert # removes conflict-markers in the files
- darcs unpull # unpull enough patches till before your 'pull'
- darcs pull # re-pull all the patches again
Recovering from darcs hanging
On some occasions darcs has been known to 'spin' and take a tremendously long time to complete a task (hours!). Although rare, it's useful to know how to recover from such a situation.
If you can't wait for darcs to finish, you should be able to safely use Control-C to cancel the operation and try another approach. Unless you are doing one of the un- commands at the time, you shouldn't be able to mess up your repo. You may need to clean up a lock file named _darcs/lock (documented elsewhere on the wiki), and run darcs check and darcs repair to check and repair any inconsistencies that may have developed.
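The check-then-repair step can be sketched as a small helper. It assumes that darcs check exits with a non-zero status when it finds a problem, and the function name is hypothetical. Make sure no darcs process is still running in the repository before using it.

```shell
# Sketch only: check a repository after an interrupted operation and repair
# it only if the check fails.
check_and_repair() {
    repo=$1
    (
        cd "$repo" || exit 1
        # Assumption: darcs check exits non-zero on inconsistencies.
        darcs check || darcs repair
    )
}

# Example:
#   check_and_repair ~/src/myproject
```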
See: ConflictMisery
Automated Testing
If you have some automatic tests for your code, you can make darcs run the test suite each time a darcs record command is issued.
darcs setpref test 'make test' # darcs setpref test 'the command to run when testing'
If the test suite fails, the patch will not be recorded. If it passes, it will.
