Introduction
While working at Evolution Robotics (now part of iRobot), I was in charge of migrating some of their svn repositories to git. One in particular was really complicated and forced my colleague and me to learn a lot about the internals of the migration tool, svn and git. This repository had over 12000 commits, 100 branches and tags, and many subprojects inside that we needed to split out. All together it made a mess, with irregular branching rules, commits that modified lots of files, etc.
My initial migration attempt used git svn, but it didn’t migrate tags and branches correctly and was incredibly slow to work with. Then my boss told me about svn-all-fast-export, so I gave it a try and fell in love with it.
svn-all-fast-export is a tool developed by the KDE team that lets you migrate svn repositories to git; you can get it from here.
This tool parses a rules file and applies it to a local copy of the repository made with svnsync, not to a regular svn checkout, because it needs repository information that a checkout doesn’t have. You can make this copy over the svn protocol, so no special permissions are needed.
svn sync copy
I won’t explain these commands in much detail; just know that they work and that you can use them to create a local svnsync copy of a repository.
# These steps need to be run only once.
# Replace ${SVN_COPY} with the folder where you want to store the copy.
# Replace ${SVN_URL} with the URL you want to clone from.
$ svnadmin create ${SVN_COPY}
$ cat << 'EOF' > ${SVN_COPY}/hooks/pre-revprop-change
#!/bin/sh
USER="$3"
if [ "$USER" = "svnsync" ]; then exit 0; fi
echo "Only svnsync user can change revprops" >&2
exit 1
EOF
$ chmod +x ${SVN_COPY}/hooks/pre-revprop-change
$ svnsync init \
    --username svnsync \
    file://`pwd`/${SVN_COPY} \
    ${SVN_URL}
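svn invokes pre-revprop-change with five positional arguments (repository path, revision, user name, property name and action), which is why the hook reads the user from $3. A quick sanity check before running svnsync is to call the hook logic by hand with an allowed and a rejected user. The bash sketch below assumes you do this on a throwaway copy of the hook in a temporary directory, not inside ${SVN_COPY}; hook_allows and the user alice are made-up names for illustration:

```shell
#!/usr/bin/env bash
# Sanity check for the pre-revprop-change hook logic.
# Writes a throwaway copy of the hook to a temp dir and calls it the way svn
# would: hook REPOS-PATH REVISION USER PROPNAME ACTION.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
hook="$tmp/pre-revprop-change"

cat << 'EOF' > "$hook"
#!/bin/sh
USER="$3"
if [ "$USER" = "svnsync" ]; then exit 0; fi
echo "Only svnsync user can change revprops" >&2
exit 1
EOF
chmod +x "$hook"

# hook_allows USER (hypothetical helper)
# Returns 0 when the hook would let USER change the svn:log revision property.
hook_allows() {
    "$hook" "$tmp/repo" 1 "$1" svn:log M 2>/dev/null
}

hook_allows svnsync && echo "svnsync: allowed"
hook_allows alice   || echo "alice: rejected"
```

If svnsync is rejected here, the hook will also block svnsync init and svnsync sync, so it’s cheaper to find out now.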
Then, to synchronize, run the following (the first time right after the init steps above, and again whenever you want to pick up new commits):
# Replace ${SVN_COPY} with the name of the directory where you are
# storing the svn copy.
$ svnsync sync file://`pwd`/${SVN_COPY}
Writing rules
The quality of your migration depends a lot on how much you can automate. After all, the task is tedious and repetitive, and you don’t want to screw it up, especially when you have thousands of commits or are splitting one repository into several smaller ones.
There’s one problem though: the documentation for svn-all-fast-export isn’t very explicit and is somewhat outdated, so you will have to read the code to understand what’s going on or what is allowed in the rules file. If you want to dig into the code, look at src/ruleparser.cpp.
Simple example
Let’s start with the minimal config we can have.
create repository project1
end repository

match /project1/trunk/
  repository project1
  branch master
end match

match /
end match
Let’s explain these rules before going any further.
create repository project1
end repository
This rule creates a git repository named project1. Note that if a folder named project1 already exists and is a git repository, it will not be overwritten, BUT if you are not careful you may end up with many disconnected git objects. Most of the time you shouldn’t worry about this, as svn-all-fast-export is smart enough to resume accordingly, but be warned.
match /project1/trunk/
  repository project1
  branch master
end match
This rule maps any path inside {SVN_REPOSITORY}/project1/trunk/ to the master branch of the project1 repository. Make sure you understand what this means before going on: it’s a very important concept, and it’s rather easy to forget about it while you’re writing rules.
NOTE: Don’t forget the trailing slash after your path, otherwise the tool will crash or complain, and you will be pulling your hair out over a stupid missing slash (true story, it happened to me a lot).
It’s worth mentioning that match patterns can use regular expressions and all their features, like grouping and substitution.
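Since a missing slash is so easy to overlook, it can pay off to lint the rules file before each run. The bash sketch below is a hypothetical helper (check_match_slashes is my own name, not part of svn-all-fast-export) that prints every match rule whose pattern does not end with a slash. It’s only a heuristic: a pattern that intentionally targets something other than a directory will be reported too.

```shell
#!/usr/bin/env bash
# check_match_slashes FILE
# Hypothetical lint helper: prints "line N: PATTERN" for every "match" rule
# whose pattern does not end with "/". Heuristic only.
check_match_slashes() {
    local file=$1 n=0 keyword pattern rest
    # "|| [ -n ... ]" also processes a last line without a trailing newline.
    while read -r keyword pattern rest || [ -n "$keyword" ]; do
        n=$((n + 1))
        if [ "$keyword" = "match" ] && [ "${pattern: -1}" != "/" ]; then
            printf 'line %d: %s\n' "$n" "$pattern"
        fi
    done < "$file"
}

# Demo on a throwaway rules file with the slash missing on line 1.
rules=$(mktemp)
printf '%s\n' 'match /project1/trunk' '  repository project1' \
    '  branch master' 'end match' 'match /' 'end match' > "$rules"
check_match_slashes "$rules"   # prints: line 1: /project1/trunk
rm -f "$rules"
```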
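To make the matching concrete, here is a rough bash model of what I understand happens for a single path: the pattern has to match at the beginning of the path, the matched prefix is dropped, and the remainder becomes the file’s path inside the branch. Capture groups, like the one in /([^/]+)/trunk/ below, are what substitutions in the repository or branch lines refer back to. map_path and group_of are hypothetical helpers, and bash’s extended regular expressions only approximate the Qt regular expressions the tool actually uses.

```shell
#!/usr/bin/env bash
# Rough model of how a match rule maps an svn path.
# Assumption: the pattern must match at the start of the path.

# map_path PATTERN SVN_PATH
# Prints what follows the matched prefix, i.e. the path inside the git branch.
# Returns 1 if the pattern does not match.
map_path() {
    local pattern="^$1" path=$2
    if [[ $path =~ $pattern ]]; then
        printf '%s\n' "${path:${#BASH_REMATCH[0]}}"
    else
        return 1
    fi
}

# group_of PATTERN SVN_PATH
# Prints the first capture group of the match.
group_of() {
    local pattern="^$1" path=$2
    [[ $path =~ $pattern ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

map_path '/project1/trunk/' '/project1/trunk/src/main.c'   # src/main.c
map_path '/project1/trunk/' '/project1/branches/foo/a.c' || echo "no match"
group_of '/([^/]+)/trunk/' '/project2/trunk/README'       # project2
```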
Repository creation
The first thing you need to do is tell svn-all-fast-export which repositories it will use; these rules are executed before the first commit is processed. I’m not sure whether they need to be the first thing in the config file, but it’s nice to put them there.
The rule for creation is:
create repository [PATH]
  <description [TEXT]>
end repository
[PATH] is the path where the repository will be stored. A nice undocumented feature is that the repository name can contain slashes, meaning you can create a directory structure for your git repositories if you want, so in our rules we ended up with something like:
common/
    repo1.git
    repo2.git
specific/
    repo1.git
    repo2.git
somethingelse/
    repo1.git
    repo2.git
It’s worth mentioning that somethingelse/repo1.git is not the same as common/repo1.git. Also, since the results are bare repositories, it’s better to add .git to the path name.
<description [TEXT]>
is an optional line that sets the repository description (the description file inside the bare repository). Note that this only affects the repository svn-all-fast-export creates: if you will later push these bare repositories to your central repositories, the description will not be carried over (at least gitolite doesn’t honor it).
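When you have many repositories, writing these blocks by hand gets repetitive. A small generator like the bash sketch below can print one create repository block per path, with an optional description line. emit_create_rule is a hypothetical helper, and the group and repository names are placeholders mirroring the layout above:

```shell
#!/usr/bin/env bash
# emit_create_rule PATH [DESCRIPTION]
# Hypothetical generator: prints one "create repository" block for PATH,
# adding a "description" line only when DESCRIPTION is given.
emit_create_rule() {
    local path=$1 desc=${2-}
    printf 'create repository %s\n' "$path"
    if [ -n "$desc" ]; then
        printf '    description %s\n' "$desc"
    fi
    printf 'end repository\n\n'
}

# Placeholder layout: every group gets the same two repositories.
for group in common specific somethingelse; do
    for repo in repo1 repo2; do
        emit_create_rule "$group/$repo.git" "$repo ($group)"
    done
done
```

You can redirect the output to its own file and keep the hand-written match rules elsewhere, which pairs well with splitting rules across files.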
Helpers
svn-all-fast-export has many helpers, most of them undocumented.
Variables
Variables are easy to create, and they’re a nice feature that we used when splitting repositories. For example, you can have one variable that holds the names of all your repositories, so you don’t have to type them over and over.
declare VARIABLE="content"
Example usage:
create repository something/project1.git
end repository
create repository something/project2.git
end repository

declare PROJECTS=(project1|project2)

match /trunk/${PROJECTS}/
  repository something/\1.git
  branch master
end match
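As I understand it, the declared value is substituted textually, so the match pattern becomes /trunk/(project1|project2)/ and (project1|project2) is the first capture group, which the repository line refers back to. A bash model of that resolution, assuming textual substitution and approximating the tool’s Qt regexes with bash ones (resolve_repo is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Model of how the PROJECTS variable and the capture group resolve.
# Assumption: ${PROJECTS} is replaced textually before the regex is built.
PROJECTS='(project1|project2)'
pattern="^/trunk/${PROJECTS}/"

# resolve_repo SVN_PATH
# Prints the target repository for SVN_PATH, or returns 1 when no rule matches.
resolve_repo() {
    [[ $1 =~ $pattern ]] || return 1
    printf 'something/%s.git\n' "${BASH_REMATCH[1]}"
}

resolve_repo /trunk/project2/src/a.c           # something/project2.git
resolve_repo /trunk/project3/README || echo "no rule matches"
```

Because the alternatives live in a single group, adding a project means touching only the declare line and its create repository block.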
Importing rules
In complex projects it’s very easy to end up with rule files that are thousands of lines long. If you are used to reading code like I am, you may also get upset when you see a file with thousands or tens of thousands of lines, and what you do to avoid that is split the code into several files and import them. So why not do the same with the migration rules? Luckily, there’s a way to do it.
import [FILE]
Where [FILE] points to the file you want to load. Files are loaded relative to the rules file’s folder, so non-absolute references resolve against the same directory as the calling file.
import will open [FILE] and insert its content in place of the import line. The same file can be imported multiple times.
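Since import is a plain textual inclusion, you can preview what the tool will see by expanding imports yourself. The bash sketch below (expand_imports is a hypothetical helper, not part of svn-all-fast-export) assumes import takes a single path without spaces and resolves relative paths against the directory of the file that contains the import line. It has no cycle detection, so a rules file that directly or indirectly imports itself will make it recurse without end.

```shell
#!/usr/bin/env bash
# expand_imports FILE
# Hypothetical preview helper: prints FILE with every "import OTHER" line
# replaced by the recursively expanded contents of OTHER.
# No cycle detection.
expand_imports() {
    local file=$1 dir line target
    local re='^[[:space:]]*import[[:space:]]+([^[:space:]]+)[[:space:]]*$'
    dir=$(dirname -- "$file")
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $re ]]; then
            target=${BASH_REMATCH[1]}
            # Absolute paths are used as-is; relative ones follow the caller.
            [[ $target == /* ]] || target="$dir/$target"
            expand_imports "$target"
        else
            printf '%s\n' "$line"
        fi
    done < "$file"
}

# Demo: main.rules imports sub/a.rules, which imports b.rules from sub/.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf '%s\n' 'import sub/a.rules' 'match /' 'end match' > "$demo/main.rules"
printf '%s\n' 'import b.rules' '# from a.rules' > "$demo/sub/a.rules"
printf '%s\n' '# from b.rules' > "$demo/sub/b.rules"
expand_imports "$demo/main.rules"
rm -rf "$demo"
```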