Migrating large svn repositories into git

Introduction

While working at Evolution Robotics (now part of iRobot) I was in charge of migrating some of their svn repositories into git. One in particular was really complicated and forced my colleague and me to learn a lot about the internals of the migration tool, svn and git. This repository had over 12000 commits, 100 branches and tags, and many subprojects inside that we needed to split. All together it was a mess, with irregular branching rules, commits that modified lots of files, etc.

My initial migration attempt used git over svn, but it didn’t migrate tags and branches correctly, and it was incredibly slow to work with. Then my boss told me about svn-all-fast-export, so I gave it a try and fell in love with it.

svn-all-fast-export is a tool developed by the KDE team that lets you migrate svn repositories into git. You can get this tool from here.

This tool parses a set of rules and applies them to an svn sync copy, not a regular svn copy, because it needs some repository information that is not available in a regular svn copy. You can make the svn sync copy over the svn protocol, so no special permissions are needed.

svn sync copy

I will not explain much about these steps. I’ll just tell you that they work, and that you can use them to create an svn sync copy of a repository.

# these steps need to be run only once
# replace ${SVN_COPY} with the folder where you want to store the copy.
# replace ${SVN_URL} with the url you want to clone from.

$ svnadmin create "${SVN_COPY}"

$ cat << 'EOF' > "${SVN_COPY}/hooks/pre-revprop-change"
#!/bin/sh

# the third argument passed to the hook is the user changing the property
USER="$3"

if [ "$USER" = "svnsync" ]; then exit 0; fi

echo "Only svnsync user can change revprops" >&2
exit 1
EOF

$ chmod +x "${SVN_COPY}/hooks/pre-revprop-change"

$ svnsync init \
        --username svnsync \
        "file://$(pwd)/${SVN_COPY}" \
        "${SVN_URL}"
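
Before pointing svnsync at the mirror, it can help to check the hook by hand. The sketch below assumes the hook reads the user name from its third argument, which is the position Subversion uses when it calls pre-revprop-change (repository path, revision, user, property name, action). The temporary file, the check_hook name and the sample arguments are made up for illustration.

```shell
#!/bin/sh
# Write a throwaway copy of the hook so the real mirror is not touched.
hook=$(mktemp)
cat << 'EOF' > "$hook"
#!/bin/sh
USER="$3"
if [ "$USER" = "svnsync" ]; then exit 0; fi
echo "Only svnsync user can change revprops" >&2
exit 1
EOF

# Hypothetical helper: run the hook the way Subversion would for a given
# user and report whether the property change would be allowed.
check_hook() {
    if sh "$hook" /path/to/repo 0 "$1" svn:log M 2>/dev/null; then
        echo "$1: allowed"
    else
        echo "$1: rejected"
    fi
}

check_hook svnsync
check_hook alice
```

If the hook accepted every user or rejected svnsync, the user name is probably not being read from the third argument.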

Then run this each time you want to synchronize (including the first time, right after executing the previous lines):

# replace ${SVN_COPY} with the name of the directory where you are
# storing the svn clone

$ svnsync sync "file://$(pwd)/${SVN_COPY}"
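
Since the synchronization is repeated many times, it may be worth wrapping it in a small script. This is only a sketch: mirror_url and sync_mirror are hypothetical names, and it assumes svnsync and svnlook are on your PATH and that the mirror was already created with the steps above.

```shell
#!/bin/sh
# Hypothetical helper: turn a relative or absolute directory into a file:// URL.
mirror_url() {
    # cd + pwd resolves the directory to an absolute path
    printf 'file://%s\n' "$(cd "$1" && pwd)"
}

# Hypothetical wrapper: synchronize the mirror, then print its newest revision.
sync_mirror() {
    svnsync sync "$(mirror_url "$1")" &&
        svnlook youngest "$1"
}

# Usage (needs the mirror created in the previous steps):
#   sync_mirror "${SVN_COPY}"
```

Resolving the path with cd and pwd means the script keeps working when ${SVN_COPY} is already an absolute path, which the file://`pwd`/ form does not handle.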

Writing rules

The quality of your migration depends a lot on how much you can automate. After all, the task is tedious and repetitive, and you don’t want to screw it up, especially when you have thousands of commits or are splitting one repository into several smaller ones.

There’s one problem though: the documentation for svn-all-fast-export isn’t very explicit and is somewhat outdated, so you will have to read the code to understand what’s going on or what is allowed in the rules file. If you want to play around with the code, you should look into src/ruleparser.cpp.

Simple example

Let’s start with the minimal config we can have.

create repository project1
end repository

match /project1/trunk/
   repository project1
   branch master
end match

match /
end match

Let’s explain a few things before we go any further.

create repository project1
end repository

This rule creates a git repository named project1. Note that if the folder project1 already exists and is a git repository, it will not be overwritten, BUT if you are not careful you may end up with many disconnected git objects. Most of the time you shouldn’t worry about this, as svn-all-fast-export is smart enough to resume accordingly, but be warned.

match /project1/trunk/
   repository project1
   branch master
end match

This rule maps any path inside {SVN_REPOSITORY}/project1/trunk/* to the repository project1, on branch master. Make sure you understand what this means before going on. It’s a very important concept, and it’s rather easy to forget about it while you’re writing rules.

NOTE: Don’t forget the trailing slash after your path, otherwise the tool will crash or complain, and you will pull your hair out over a stupid missing slash (true story, it happened to me a lot).
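
A quick check before each run can catch the missing slash. The function below is a hypothetical lint, not part of svn-all-fast-export: it prints every match line whose pattern does not end in a slash. It assumes one match per line, as in the examples here, and it will also flag patterns that end differently on purpose, so treat its output as a hint.

```shell
#!/bin/sh
# Hypothetical lint: list "match" lines whose pattern does not end in "/".
# $1 is the rules file. Output is "line-number:line", empty if all is fine.
check_trailing_slash() {
    grep -nE '^[[:space:]]*match[[:space:]]+.*[^/[:space:]][[:space:]]*$' "$1"
}

# Small demonstration with a throwaway rules file.
rules=$(mktemp)
printf '%s\n' \
    'match /project1/trunk' \
    '   repository project1' \
    '   branch master' \
    'end match' \
    'match /' \
    'end match' > "$rules"

check_trailing_slash "$rules"
```

On the demonstration file this reports the first line, which is the one missing its slash. The catch-all match / and the end match lines are left alone.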

It’s worth mentioning that matches can make use of regular expressions and all their features, like grouping and substitution.
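
To try the minimal configuration end to end, something along these lines should work. It is a sketch under assumptions: the function names and the project1.rules file name are made up, and run_migration expects svn-all-fast-export to be installed and the second argument to be the local path (not a URL) of the svn sync copy.

```shell
#!/bin/sh
# Write the minimal rules from this section into the given file.
write_minimal_rules() {
    cat << 'EOF' > "$1"
create repository project1
end repository

match /project1/trunk/
   repository project1
   branch master
end match

match /
end match
EOF
}

# Hypothetical wrapper: run the tool with a rules file against the mirror.
# $1 is the rules file, $2 the local path of the svn sync copy.
run_migration() {
    svn-all-fast-export --rules "$1" "$2"
}

# Usage (project1.rules is a made-up file name):
#   write_minimal_rules project1.rules
#   run_migration project1.rules "${SVN_COPY}"
```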

Repository creation

The first thing you need to do is tell svn-all-fast-export which repositories it will use. These rules are executed before the first commit is processed. I’m not sure whether they need to be the first thing in the config file, but it’s nice to put them there.

The rule for creation is:

create repository [PATH]
<description [TEXT]>
end repository

[PATH] is the path where the repository is going to be stored. A nice undocumented feature is that the repository name can contain slashes, meaning you can create a directory structure for your git repositories if you want. In our rules we ended up with something like:

common/
    repo1.git
    repo2.git
specific/
    repo1.git
    repo2.git
somethingelse/
    repo1.git
    repo2.git

It’s worth mentioning that somethingelse/repo1.git is not the same as common/repo1.git. Also, as the results you get are bare repositories, it’s better to add .git to the path name.
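
With many repositories, typing every create block by hand gets tedious. One option is to generate them. The sketch below is an illustration only: emit_create_rules is a hypothetical helper and the repository paths are the ones from the layout above.

```shell
#!/bin/sh
# Hypothetical helper: print one "create repository" block per argument.
emit_create_rules() {
    for repo in "$@"; do
        printf 'create repository %s\nend repository\n\n' "$repo"
    done
}

# Generate the blocks for part of the layout shown above.
emit_create_rules \
    common/repo1.git \
    common/repo2.git \
    specific/repo1.git \
    specific/repo2.git
```

The output can be redirected into a file and pasted at the top of your rules.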

 <description [TEXT]>

is an optional rule that lets you set the repository description (file git/description). If you will later be pushing these bare repositories into your central repositories, then you should make use of this optional feature, otherwise it will not be honored (at least gitolite doesn’t do it).

Helpers

svn-all-fast-export has many helpers, most of them undocumented.

Variables

Variables are easy to create, and they are a nice feature that we used when splitting repositories. For example, you can have one variable that holds the names of all your repositories, so you don’t have to type them over and over.

declare VARIABLE="content"

Example usage:

create repository something/project1.git
end repository

create repository something/project2.git
end repository

declare PROJECTS=(project1|project2)

match /trunk/${PROJECTS}/
     repository something/\1.git
     branch master
end match
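
To get a feeling for what this rule does with a given svn path, you can emulate it outside the tool. This is only an approximation: sed stands in for the regular expression engine of svn-all-fast-export, the pattern is assumed to be anchored at the start of the path, and repo_for_path is a hypothetical helper.

```shell
#!/bin/sh
# Value of the variable, as in the rules above.
PROJECTS='(project1|project2)'

# Hypothetical helper: print the repository a path would land in, or nothing
# when the path does not match. The first capture group is the project name.
repo_for_path() {
    printf '%s\n' "$1" | sed -nE "s#^/trunk/${PROJECTS}/.*#something/\\1.git#p"
}

repo_for_path /trunk/project1/src/main.c
repo_for_path /trunk/project2/README
repo_for_path /trunk/other/README
```

The first two paths map to something/project1.git and something/project2.git, while the third prints nothing because other is not listed in the variable.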

Importing rules

In complex projects it is very easy to end up with rules files that are thousands of lines long. If you are used to reading code like I am, you probably get upset when you see files with thousands or tens of thousands of lines, and what you do to avoid them is split the code into several files and then import them. So why not do the same with the migration rules? Luckily for us, there’s a way to do it.

import [FILE]

Where [FILE] points to the file you want to load. Files are loaded from the rules file’s folder, so non-absolute references resolve against the directory of the calling file.

import opens [FILE] and copies its content in place of the line that says import. The same file can be imported multiple times.

Matching rules
