One-way Interoperability with CVS
I (Emílio Lopes) recently started to use Darcs to track a relatively large CVS project, the GNU Emacs development sources. The source tree is about 90 MB as of August 2005.
I decided to have (at least) two copies of the repository: a pristine source tree and a "private" source tree, where I make my changes to the software. The pristine source tree is shared between CVS and Darcs, while my private source tree is a Darcs-only project.
The pristine source tree
The pristine source tree is shared between CVS and Darcs and acts as a unidirectional gateway. It's unidirectional because I don't have write-access to the CVS project. The Darcs project tree was initialized and imported as usual starting from a clean CVS project tree.
No real work is done on this tree, except for
cvs -q update -dP
followed by
darcs record --all --patch-name="Update from CVS $(date '+%F %T')"
I didn't bother to use more sophisticated tools to convert CVS's non-atomic changes into self-contained patches. I might investigate this possibility later.
Since CVS can't handle moves or renames, I occasionally have to take care of these and issue the appropriate darcs mv command myself. If you find this too bothersome, you can add the option "--look-for-adds" to the record command above.
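The update cycle described above could be wrapped in a small script. What follows is only a sketch under stated assumptions: the function names run and sync_from_cvs and the DRY_RUN variable are made up for illustration and are not part of CVS or Darcs. The two commands themselves are the ones shown above.

```shell
#!/bin/sh
# Sketch: refresh the pristine tree from CVS and record the result in Darcs.
# Set DRY_RUN=1 to print the commands instead of running them.
# All names here (run, sync_from_cvs, DRY_RUN) are hypothetical.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sync_from_cvs() {
    # $1: path to the pristine tree (defaults to the current directory)
    tree=${1:-.}
    stamp=$(date '+%F %T')
    # Run in a subshell so the caller's working directory is untouched;
    # stop at the first failing step.
    ( cd "$tree" &&
      run cvs -q update -dP &&
      run darcs record --all --patch-name="Update from CVS $stamp" )
}
```

Running it once with DRY_RUN=1 shows what would be executed without touching either repository.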
The private source tree
All local changes to the software are done in the private source tree. It was created from the pristine source tree with a get command. I work on the private source tree as usual. Regularly I issue a darcs pull command to get the newest upstream changes from the pristine source tree.
As I don't have write-access to the CVS project I submit my changes as diff(1) patches to the upstream developers. For this I use the darcs diff command.
Converting CVS' '.cvsignore' to Darcs' 'boring'
Because I actually build the software on this tree, I had to instruct Darcs to ignore the build products specific to this project. I accomplished this by translating CVS' .cvsignore files to a Darcs boring file. Note that there exists one .cvsignore per project subdirectory. The actual conversion was done by the following program:
#! /bin/sh
# -*- Scheme -*-
exec scsh -o srfi-1 -o srfi-13 -o let-opt -e main -s "$0" "$@"
!#
;;; cvsignore2darcsboring --- convert .cvsignore files to Darcs' boring format
;; Copyright (C) 2005 Emílio C. Lopes
;; Author: Emílio C. Lopes <eclig@gmx.net>
;; Created: Mon Aug 15 10:55:56 CEST 2005
;; Version: 0.1
;; This program is free software; you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation; either version 2, or (at your option)
;; any later version.
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; If you have not received a copy of the GNU General Public License
;; along with this software, it can be obtained from the GNU Project's
;; World Wide Web server (http://www.gnu.org/copyleft/gpl.html), from
;; its FTP server (ftp://ftp.gnu.org/pub/gnu/GPL), by sending an
;; electronic mail to this program's author or by writing to the
;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.
;; If you find this program useful please consider making a donation to
;; the Free Software Foundation. See http://www.fsf.org/help/donate.html
;; (USA), http://www.fsfeurope.org/help/donate.en.html (Europe) or
;; http://fsf.org.in (India) for details on how to accomplish this.
;; In some countries your donation is tax-deductible.
;;; Commentary:
;; This program converts a set of CVS' .cvsignore files into one
;; (hopefully) equivalent Darcs ignore file. To use it change to the
;; directory containing the root of your CVS project and start this
;; program there. It will search recursively for all .cvsignore files
;; in the directory tree and write the corresponding Darcs boring file
;; to the standard output. You can then append the output of this
;; program to Darcs' default boring file (_darcs/prefs/boring).
;; No files are actually created or deleted by this program.
;; To run this program you need a copy of Scsh, the Scheme Shell.
;; It is freely available from http://www.scsh.net. This program was
;; written for Scsh version 0.6.6.
;;; Code:
(define (main prog+args)
  (process-dir (cwd) '()))
(define (process-dir dir path-list)
  ;; Print the boring patterns for DIR's .cvsignore (if any), then
  ;; recurse into its subdirectories.  PATH-LIST holds the path
  ;; components of DIR relative to the project root.
  (with-cwd dir
    (let ((cvsignore ".cvsignore"))
      (if (file-exists? cvsignore)
          (for-each writeln (cvsignore->darcsboring cvsignore (path-list->file-name path-list)))))
    (for-each (lambda (subdir)
                (process-dir subdir (append path-list (list subdir))))
              (subdirs "."))))
(define (cvsignore->darcsboring cvsignore prefix)
  ;; A .cvsignore line may hold several whitespace-separated globs.
  (cvs-globs->darcs-regexps
   (append-map (field-splitter) (file-as-string-list cvsignore))
   prefix))
(define (cvs-globs->darcs-regexps cvs-globs . maybe-prefix)
  (let-optionals maybe-prefix ((prefix ""))
    (map (lambda (glob)
           (string-append "^" (maybe-add-slash prefix) (glob->regexp glob) "$"))
         cvs-globs)))
(define (glob->regexp glob-pattern)
  ;; Literal dots are escaped first so that the dots introduced by
  ;; the wildcard translations are left alone.  A glob "?" matches
  ;; exactly one character, hence "." rather than ".?".
  (fold (lambda (from/to string)
          (replace-regexp-in-string (rx ,(car from/to)) (caddr from/to) string))
        glob-pattern
        '(("." -> "\\.") ("?" -> ".") ("*" -> ".*"))))
(define (maybe-add-slash str)
  (if (or (string-null? str)
          (string-suffix? "/" str))
      str
      (string-append str "/")))
(define (replace-regexp-in-string regexp replacement string)
  ;; Replace all occurrences of REGEXP with REPLACEMENT in STRING.
  ;; If REPLACEMENT is a procedure, it is applied to the match
  ;; structure for the given match and should return a string to be
  ;; used in the result.
  ;; Example:
  ;; (replace-regexp-in-string (rx "foo") "bar" "foo foobar abc foofoo")
  ;; => "bar barbar abc barbar"
  (regexp-substitute/global #f regexp string 'pre replacement 'post))
(define (file-as-string-list file)
  ;; Return contents of FILE as a list of lines.
  (call-with-input-file file
    (lambda (port)
      (port->string-list port))))
(define (subdirs dir)
  ;; Return the subdirectories of DIR, skipping version-control metadata.
  (with-cwd dir
    (filter (lambda (subdir)
              (and (file-directory? subdir)
                   (not (member subdir '("cvs" "CVS" "_darcs")))))
            (directory-files dir))))
(define (writeln . args)
  (for-each display args)
  (newline))
;;; cvsignore2darcsboring ends here
I also had to manually add the pattern \.a$, used for library archives, which CVS ignores by default.
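For readers without Scsh at hand, the core of the translation (one glob plus the directory it came from, turned into one anchored regexp) can be sketched in plain shell with sed. This is an illustration only, not a replacement for the program above: the function name glob_to_boring and its two arguments are invented here, `?` is mapped to `.` (exactly one character, as in shell-style globs), and, like the Scheme program, it does not escape regexp metacharacters in the directory prefix.

```shell
#!/bin/sh
# Sketch: turn one .cvsignore glob into an anchored regexp for Darcs' boring file.
# The function name and its arguments are illustrative, not part of Darcs or CVS.

glob_to_boring() {
    # $1: directory of the .cvsignore relative to the project root ("" for the root)
    # $2: one glob pattern taken from that .cvsignore
    prefix=$1
    case $prefix in
        ''|*/) ;;                 # empty, or already ends in a slash
        *) prefix=$prefix/ ;;
    esac
    # Escape literal dots first, then translate the glob wildcards, so the
    # dots introduced by the wildcard translation are not escaped again.
    regexp=$(printf '%s\n' "$2" | sed -e 's/\./\\./g' -e 's/?/./g' -e 's/\*/.*/g')
    printf '^%s%s$\n' "$prefix" "$regexp"
}
```

For example, glob_to_boring src '*.o' prints ^src/.*\.o$, which could then be appended to _darcs/prefs/boring.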
Performance
I started using Darcs for this project with version 1.0.3 and performance is acceptable. It is certainly slower than GNU Arch (tla), which I previously used for this task, but Darcs' interface is much more convenient, especially the diff command in conjunction with commuting patches.
One specific command which at that time was inexplicably slow was
darcs diff <file>
since it is supposed to do nothing more than
diff _darcs/current/<file> <file>
This problem seems to have been solved as of version 1.0.5.
Two-Way Interoperability with CVS
Curiously enough, I (Stephen Turnbull) recently started to use Darcs to manage a relatively large CVS project, the XEmacs development sources. The source tree is about 70 MB as of October 2005 (there's actually another 20 MB of Unicode data as well, but this is copied verbatim from unicode.org).
My motivation was that there are now at least three major, more or less asynchronous branches extant, and, as is well known, CVS is a disaster for managing such a situation. We also have a couple of prolific developers who drop "megapatches" on the project, which is admittedly partly an organizational problem, but does have a technical dimension as well. In any case, we'd prefer to give those developers a process that works for them if it can be done without making life impossible for others.
For realism, let's call the three branches "stable", "devel", and "xft". Now, "stable" and "devel" forked at the last release, and there is relatively little cross-fertilization now. What little there is is quite local; ordinary patches work well enough. I haven't had an opportunity to use darcs for that yet. So in practice there are two groups of branches: the singleton "stable", and the group of "devel" and "xft" (a relatively recent fork from "devel"). Let's consider them as separate cases.
One CVS branch plus Darcs
We assume that Darcs branches will be created often; that's the nature of Darcs: you create branches when you want to work on destabilizing tasks. As Emílio points out, you need (at least) two copies of the repository: an "integration" source tree and one or more "private" source trees, where you make your changes to the software. The integration tree is shared between CVS and Darcs, while each private tree is a Darcs-only project. An integration tree is physically exactly what Emílio called a pristine tree, but I use "integration" because the tree can become "dirty" in the sense that while you are preparing to commit to upstream, the tree is non-trivially out of sync. That is, patches may be committed to upstream while you are preparing to create and commit a submission in the integration tree, creating a conflict you must manage yourself, since neither Darcs nor CVS can resolve it without your help. This problem doesn't occur in the one-way "gateway" tree setup.
For communication among multiple private trees, you just use push and pull as usual. No special discipline is required here, except that in general you're better off pulling your own changes into a private tree from another private tree rather than the integration tree, as the integration tree is likely to diverge more from any private tree than two private trees do from each other. On the other hand, there's no reason to pull upstream changes from the integration tree if you've already merged them to a private tree you need to pull from anyway. (Note: That's the theory. In practice I've found that you may get better results with one tree than another due to different patch ordering, but I can't be more specific than that at this point. If you suspect that this might be true for a given merge, just make a branch and test it out!)
In managing the integration tree, you want to cvs update frequently, and immediately use darcs record to tease out "clean changesets". I recommend avoiding the --all flag; it doesn't take that much more time to use the interactive facilities, and your darcs history will be much cleaner and more informative. Emílio says that no development work should be done in the pristine (integration) tree, and I agree with that. Of course you must resolve any Darcs conflicts before committing to CVS. If you find that you cannot cvs commit without a cvs update, use darcs unpull or darcs rollback to bring the integration tree back in sync with the upstream CVS repository.
Emílio builds in the tree. With CVS I built in a subdirectory called +build or some variant, but I'm now considering building in a sibling tree, rather than deal with both .cvsignore and the boring file.
Tools like cvs2darcs and Tailor can help you automate this process. See Software That Works With Darcs on the FrontPage for more information.
Two CVS branches plus Darcs
This is the real point! I'm not at a point where I'm willing to forcefully advocate converting the XEmacs project entirely to Darcs management. None of the tools available can preserve our history (due to some unwise fiddling with the CVS repository, and quite probably CVS bugs, too---our history goes back to December 1996). And there is a lot of pressure to convert to Subversion, which probably can be done with less fuss than converting to a modern system like Darcs. But Subversion is not much better than CVS at handling complex, intertwined branches. I'm hoping that Darcs will provide a local solution to source code configuration, leaving CVS (or Subversion) as a distribution mechanism.
With two or more CVS branches, things become rather more complicated. I'm only at the very beginning of this experiment, so I expect to learn a lot in the next few weeks. But I've already started to codify the process in my own mind, so I think it's worth setting it down here.
In order to use Darcs to manage merges between two (or more) CVS branches, you will need an integration tree for each CVS branch. However, this gets tricky because of the Darcs model of conflicts. Specifically, suppose that some patch was developed on the "devel" branch, and merged to the "xft" branch in the CVS repository. Then
cd devel
cvs update
darcs record -a -m "Today's devel fixes"
cd ../xft
cvs update
darcs record -a -m "Today's xft fixes"
will eventually result in a nasty conflict in Darcs when you do darcs pull ../devel in "xft", because Darcs considers any independent changes to the same location to be a conflict, even if they are identical. This applies in spades to ChangeLogs and other files that normally grow by prepending or appending. See The ChangeLog Problem below.
Fortunately, CVS does not! CVS, since it doesn't track changes, assumes that identical textual changes are the same logical change. That means that
cd devel
cvs update
darcs record -a -m "Today's devel fixes"
cd ../xft
darcs pull ../devel
cvs update
"works" because CVS gives the "file already contains changes" message and happily updates the metadata without causing a conflict.
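The text-based view of changes can be illustrated with a plain three-way merge. The sketch below uses GNU diff3 as a stand-in for that kind of textual merge; this is an assumption made for illustration, and the script invokes neither CVS nor Darcs. The function name and the file contents are made up. Both "branches" make the same edit to the same line, and because only the resulting text is compared, the merge is clean.

```shell
#!/bin/sh
# Sketch: identical edits on two sides merge cleanly in a purely textual
# three-way merge, because only the resulting text is compared.
# Assumes GNU diff3; the function name and file contents are hypothetical.

identical_edit_merges_cleanly() {
    tmp=$(mktemp -d) || return 2
    printf 'line 1\nline 2\nline 3\n' > "$tmp/base"
    # Both "branches" apply the same fix to line 2, independently.
    printf 'line 1\nline 2 (fixed)\nline 3\n' > "$tmp/devel"
    printf 'line 1\nline 2 (fixed)\nline 3\n' > "$tmp/xft"
    # -m writes the merged file; -E brackets only genuinely overlapping,
    # differing changes.  Exit status: 0 = clean, 1 = conflicts, 2 = trouble.
    diff3 -m -E "$tmp/xft" "$tmp/base" "$tmp/devel"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Calling identical_edit_merges_cleanly prints the merged text and returns 0; no conflict markers appear.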
In my limited experience, it turns out that this process generalizes to multiple private Darcs branches (of course!), to bidirectional synchronization where the branches are closely related enough that bug fixes (a prime example) flow in both directions, and to three or more integration branches. However, the discipline of using an interactive record to merge upstream changes really becomes important here. First, you are aware of the changes, and can recognize a common change that appears in the upstream logs. Second, just as CVS requires an update before a commit, you should discipline yourself to do an interactive darcs pull from related branches to pick up any "cherry-picking" that has occurred upstream, too, before doing a CVS update.
I think I understand the logic of why this has worked in practice; I'm pretty confident that it should work for common cases.
However, this process can't possibly generalize to multiple developers privately synchronizing branches in this way. If you run into that situation, you should hold out for a common "advanced" solution among the developers who are active on more than one branch simultaneously.
I don't know offhand of software to support this workflow, but I will be looking for it. Please update this section if you know of some.
The ChangeLog Problem
If you're working with Darcs, you should avoid physical ChangeLogs (or any similar mechanism, such as a history comment in each source file) if at all possible. The problem is that there will necessarily (by policy!) be concurrent insertions at the same place, and every single one is a conflict. Furthermore, my experience so far indicates that since resolving the conflict is a local change, unless you pull the resolution patch into the other branch, the resolution patch will conflict with further logs on the other branch when you pull them. On the other hand, we already know that pulling the resolution patch into the other branch will cause a conflict, since there's nothing to resolve there. Pulling both the conflicting patch and the resolution patch also seems to cause a conflict, as pulling the conflicting patch triggers a conflict before the resolution is realized.
Worse yet, the repeated conflicts generate nested merger patches, which take Darcs huge amounts of time to analyze.
Now, I don't know if my analysis of the conflict process is 100% correct, but I've certainly observed deeply nested mergers after only a few patches. This is a situation worth avoiding at all costs!
So far the following process seems to work: ensure a one-way patch flow by only adding logs to the integration branch. In other branches, use Darcs to keep the history, and only create a physical log when code is added to the integration branch, whether by cvs update from upstream, or by pulling a patch from a task branch. Then you can safely pull the ChangeLog from the integration branch to the task branches at the same time as you pull the code.
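The root of the problem, two branches inserting different text at the same place, can be reproduced outside Darcs. As a rough illustration only (it uses GNU diff3, not Darcs, so it says nothing about nested mergers or how Darcs represents the conflict), the sketch below lets two branches each prepend a different entry to the same ChangeLog. The function name, dates, and entries are made up.

```shell
#!/bin/sh
# Sketch: two branches each prepend a different ChangeLog entry; a textual
# three-way merge cannot decide their order and reports a conflict.
# Assumes GNU diff3; the function name and file contents are hypothetical.

changelog_prepend_conflicts() {
    tmp=$(mktemp -d) || return 2
    printf '2005-10-01  Old entry\n' > "$tmp/base"
    printf '2005-10-03  Fix on devel\n2005-10-01  Old entry\n' > "$tmp/devel"
    printf '2005-10-03  Fix on xft\n2005-10-01  Old entry\n' > "$tmp/xft"
    # Exit status: 0 = clean merge, 1 = conflicts, 2 = trouble.
    diff3 -m -E "$tmp/xft" "$tmp/base" "$tmp/devel"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Here the function returns 1 and its output contains conflict markers around the two new entries. With a one-way flow, where only the integration branch ever adds entries, this situation does not arise.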
The comments about the ChangeLog Problem apply to Darcs v. 1.0.4pre4 and earlier versions, and have not been updated since that version was current.
Related Pages
Switching from CVS from the official documentation.
CVS-style development with darcs, a thoughtful post to the darcs users list.
David Roundy gives his thoughts on interoperability with CVS (Oct, 2003)
