How to migrate a Drupal 7 site to Markdown files

We decided to migrate our old 7sabores blog from Drupal 7 to Markdown files, to be consumed by GatsbyJS.

The first question was: is there a contributed module that does this easily?

The answer was no. Even in D7, finding a module with this behaviour is tricky. The only solution was a combination of PHP scripts, the D7 bootstrap, and a Markdown library.

Let’s get started. First, download the scripts from here. Place the user-migrate.php and node-migrate.php scripts in your D7 root, then install the dependencies.

Regarding dependencies, we need a library to convert the HTML body of each node into Markdown. The package league/html-to-markdown (https://github.com/thephpleague/html-to-markdown) does exactly that.

The following command installs the package and updates composer.json:

composer require 'league/html-to-markdown'

Let’s break down each major component required to migrate our Drupal 7 data to Markdown files. In the following example, we export only users and nodes from Drupal.

Users

The following script generates an author.yaml file from the Drupal users and writes it to a predefined directory.

<?php

//Loading Drupal Bootstrap
define('DRUPAL_ROOT', getcwd());
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

//Configure Directories
$destination_dir = '../static';
$user_dir = $destination_dir . '/user/';
$file_name = 'author.yaml';
if (!file_exists($destination_dir)) {
    mkdir($destination_dir, 0777, true);
}

$uids = db_select('users', 'u')
    ->fields('u', array('uid'))
    ->orderBy('u.created', 'ASC')
    ->execute()
    ->fetchCol();

$users = user_load_multiple($uids);

if (!file_exists($user_dir)) {
    mkdir($user_dir, 0777, true);
} else {
    delete_directory($user_dir);
    mkdir($user_dir, 0777, true);
}

$text = '';
$user_count = 0;
foreach ($users as $user) {
    // The anonymous user (uid 0) has an empty name and is skipped.
    if ($user->name) {
        $username = clean_text($user->name);
        echo "- $username\n";

        $text .= "- id: $username\n";

        $name = '';
        if (!empty($user->field_nombre['und'][0]['value'])) {
            $name = $user->field_nombre['und'][0]['value'];
        }
        if (!empty($user->field_apellido['und'][0]['value'])) {
            // Append the last name, separated by a space when a first name exists.
            $name = trim($name . ' ' . $user->field_apellido['und'][0]['value']);
        }
        if ($name) {
            $name = clean_text($name);
            $text .= "  name: $name\n";
        }

        if (!empty($user->field_pais['und'][0]['value'])) {
            $country = clean_text($user->field_pais['und'][0]['value']);
            $text .= "  country: $country\n";
        }

        if (!empty($user->field_estado_provincia['und'][0]['value'])) {
            $province = clean_text($user->field_estado_provincia['und'][0]['value']);
            $text .= "  province: $province\n";
        }

        $user_count++;
    }
}

write_file($user_dir . $file_name, $text);
echo "\n$user_count users were exported.\n";

/*
 * Recursively delete a directory and everything inside it.
 */
function delete_directory($dir)
{
    if (!file_exists($dir)) {
        return true;
    }
    if (!is_dir($dir) || is_link($dir)) {
        return unlink($dir);
    }
    foreach (scandir($dir) as $item) {
        if ($item == '.' || $item == '..') {
            continue;
        }
        if (!delete_directory($dir . "/" . $item)) {
            // Retry once after relaxing the permissions.
            chmod($dir . "/" . $item, 0777);
            if (!delete_directory($dir . "/" . $item)) return false;
        }
    }
    return rmdir($dir);
}

/*
 * Write $text to $file_name, replacing any existing file.
 */
function write_file($file_name, $text)
{
    $my_file = fopen($file_name, "w") or die("Unable to create file!");
    fwrite($my_file, $text);
    fclose($my_file);
}

/*
 * Wrap strings of two or more words in single quotes.
 */
function clean_text($text)
{
    if (count(explode(' ', $text)) >= 2) {
        $text = "'$text'";
    }

    return $text;
}

Customizing the script

The script is tied to the user fields defined on our site (field_nombre, field_apellido, field_pais, field_estado_provincia). Before running it, review the user fields in your Drupal site and adjust the script so that the field names match.

The user script does not use the Markdown converter. It is only required by the node script.

As for the exported directory structure, an author.yaml file is created inside a directory called “user”. The destination directories and the filename are configurable in the first lines of the script.
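One thing worth checking when adapting the script is how values are quoted. clean_text only adds quotes when a value contains a space, so a value with an apostrophe, or a single word such as true, may not come out as the YAML string you expect. The following is a minimal sketch of a stricter helper. The name yaml_quote and the list of reserved words are our own assumptions, not part of the original scripts.

```php
<?php

/**
 * Hypothetical replacement for clean_text(): returns a YAML-safe scalar.
 * Simple values are returned untouched; anything else is wrapped in single
 * quotes, with embedded single quotes doubled (the YAML escape for them).
 */
function yaml_quote($text)
{
    // Collapse line breaks so the value stays on one line.
    $text = trim(str_replace(array("\r", "\n"), ' ', (string) $text));

    // Words and numbers that YAML parsers may not read as plain strings.
    $reserved = array('true', 'false', 'null', 'yes', 'no', 'on', 'off', '~');
    $is_plain = preg_match('/^[A-Za-z0-9_][A-Za-z0-9_.\/-]*$/', $text)
        && !is_numeric($text)
        && !in_array(strtolower($text), $reserved, true);

    if ($is_plain) {
        return $text;
    }

    return "'" . str_replace("'", "''", $text) . "'";
}
```

With this sketch, admin stays unquoted, John Doe becomes 'John Doe', and a value with an apostrophe such as it's becomes 'it''s'. Values with accented characters are quoted as well, which is harmless in YAML.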

Running the user export script

Using PHP CLI you can execute the script:

php user-migrate.php

Output (static/user/author.yaml):

- id: admin
  name: 'Admin Admin'
  country: US
- id: johndoe
  name: 'John Doe'
  country: CR
  province: Heredia

Nodes

This script is a bit more complex than the user script. The content type is passed as an argument. If the content type has no custom fields handled by the script, only the common node fields are exported, e.g. title, date, author, path, and summary/body.

<?php

//Loading Drupal Bootstrap
define('DRUPAL_ROOT', getcwd());
$_SERVER['REMOTE_ADDR'] = '127.0.0.1';
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$destination_dir = '../static';
if (!file_exists($destination_dir)) {
    mkdir($destination_dir, 0777, true);
}

//HTML Converter autoload
requireAutoloader();
$converter = new League\HTMLToMarkdown\HtmlConverter();

//Args
if (!isset($argv[1])) {
    echo "error: A content type is required.\n";
    exit(1);
}

$content_type = $argv[1];

//Query
$nids = db_select('node', 'n')
    ->fields('n', array('nid'))
    ->fields('n', array('type'))
    ->condition('n.status', 1)
    ->condition('n.type', $content_type)
    ->orderBy('n.created', 'ASC')
    ->execute()
    ->fetchCol();
$nodes = node_load_multiple($nids);

$content_type_dir = $destination_dir . '/' . $content_type . '/';
if (!file_exists($content_type_dir)) {
    mkdir($content_type_dir, 0777, true);
} else {
    delete_directory($content_type_dir);
    mkdir($content_type_dir, 0777, true);
}

$count_nodes = 0;
if ($nodes) {
    foreach ($nodes as $node) {
        $title = clean_text($node->title);
        echo "- $title\n";

        //Front matter starts here
        $text = "---\n";
        $text .= "title: $title\n";

        $date = date('Y-m-d', $node->created);
        $text .= "date: $date\n";

        $user_author = user_load($node->uid);
        $author = clean_text($user_author->name);
        $text .= "author: $author\n";

        $path = drupal_get_path_alias('node/' . $node->nid);
        $text .= "path: $path\n";

        //Blog custom fields
        if ($content_type == 'blog') {
            //Term load example
            if (!empty($node->field_tema['und'])) {
                $text .= "topics:\n";
                foreach ($node->field_tema['und'] as $topic) {
                    $taxonomy = taxonomy_term_load($topic['tid']);
                    $taxonomy_name = clean_text($taxonomy->name);
                    $text .= "  - $taxonomy_name\n";
                }
            }

            //Getting a file uri
            if (!empty($node->field_cover['und'][0]['uri'])) {
                $uri = clean_media($node->field_cover['und'][0]['uri']);
                $text .= "cover: $uri\n";
            }
        }

        //Lesson custom fields
        if ($content_type == 'lesson') {
            if (!empty($node->field_vimeo_free['und'][0]['vimeo'])) {
                $vimeo = $node->field_vimeo_free['und'][0]['vimeo'];
                $text .= "vimeo: $vimeo\n";
            }

            if (!empty($node->field_tema['und'])) {
                $text .= "topics:\n";
                foreach ($node->field_tema['und'] as $topic) {
                    $taxonomy = taxonomy_term_load($topic['tid']);
                    $taxonomy_name = clean_text($taxonomy->name);
                    $text .= "  - $taxonomy_name\n";
                }
            }
        }

        if (!empty($node->body['und'][0]['summary'])) {
            $summary = clean_media(strip_tags($converter->convert($node->body['und'][0]['summary'])));
            $text .= "summary: $summary\n";
        }

        //Front matter ends here; the Markdown body follows
        $text .= "---\n\n";
        if (!empty($node->body['und'][0]['value'])) {
            $body = clean_media(strip_tags($converter->convert($node->body['und'][0]['value'])));
            $text .= "$body\n";
        }

        $filename = file_name($path);
        write_file($content_type_dir . $filename, $text);
        $count_nodes++;
    }

    echo "\n$count_nodes nodes ($content_type) were exported.\n";
}

/*
 * Loading league/html-to-markdown library.
 */
function requireAutoloader()
{
    $autoloadPaths = array(
        __DIR__ . '/vendor/autoload.php',
        __DIR__ . '/../../../autoload.php',
    );
    foreach ($autoloadPaths as $path) {
        if (file_exists($path)) {
            require_once $path;
            break;
        }
    }
}

/*
 * Recursively delete a directory and everything inside it.
 */
function delete_directory($dir)
{
    if (!file_exists($dir)) {
        return true;
    }
    if (!is_dir($dir) || is_link($dir)) {
        return unlink($dir);
    }
    foreach (scandir($dir) as $item) {
        if ($item == '.' || $item == '..') {
            continue;
        }
        if (!delete_directory($dir . "/" . $item)) {
            // Retry once after relaxing the permissions.
            chmod($dir . "/" . $item, 0777);
            if (!delete_directory($dir . "/" . $item)) return false;
        }
    }
    return rmdir($dir);
}

/*
 * Write $text to $file_name, replacing any existing file.
 */
function write_file($file_name, $text)
{
    $my_file = fopen($file_name, "w") or die("Unable to create file!");
    fwrite($my_file, $text);
    fclose($my_file);
}

/*
 * Return the file name.
 */
function file_name($path)
{
    $explode_path = explode('/', $path);
    return $explode_path[1] . '.md';
}

/*
 * Convert media url linked to asset folder.
 */
function clean_media($str)
{
    $asset_dir = '../assets/';

    $str = str_replace('public://', $asset_dir, $str);
    $str = str_replace('/sites/default/files/styles/large/public/', $asset_dir, $str);
    $str = str_replace('sites/default/files/styles/medium/public/', $asset_dir, $str);

    return $str;
}

/*
 * Wrap strings of two or more words in single quotes.
 */
function clean_text($text)
{
    if (count(explode(' ', $text)) >= 2) {
        $text = "'$text'";
    }

    return $text;
}

Customizing the script

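As with the user script, the node script is tied to our own content types: the blog and lesson blocks read fields specific to our site (field_tema, field_cover, field_vimeo_free). Adjust those blocks, or add new ones, to match the fields of your content types.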
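When adding fields for another content type, it can help to collect the values in an array and render the front matter in one place, instead of concatenating strings field by field. This is only a sketch: build_front_matter is a hypothetical helper that is not part of the original scripts, and it assumes the values were already quoted where needed (for example with clean_text).

```php
<?php

/**
 * Hypothetical helper: renders an associative array as YAML front matter.
 * Scalars become "key: value" lines; arrays become a YAML list under the key.
 * Empty values are skipped, mirroring the checks in the node script.
 */
function build_front_matter(array $fields)
{
    $text = "---\n";
    foreach ($fields as $key => $value) {
        if ($value === null || $value === '' || $value === array()) {
            continue;
        }
        if (is_array($value)) {
            $text .= "$key:\n";
            foreach ($value as $item) {
                $text .= "  - $item\n";
            }
        } else {
            $text .= "$key: $value\n";
        }
    }

    // Closing delimiter plus a blank line before the Markdown body.
    return $text . "---\n\n";
}

// Example with made-up values; the empty cover is left out of the result.
$front_matter = build_front_matter(array(
    'title' => "'Hello world'",
    'date' => '2013-11-06',
    'topics' => array('PHP', 'Drupal'),
    'cover' => '',
));
```

Inside the loop of node-migrate.php, the array would be filled from the node fields and the result assigned to $text before the body is appended.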

Running the script

php node-migrate.php blog

Output (static/blog/abrir-carpeta-sublime-text-consola.md):

---
title: 'Como abrir una carpeta con Sublime Text desde consola en mac'
date: 2013-11-06
author: johndoe
path: blog/abrir-carpeta-sublime-text-consola
topics:
  - Programación
---

Assets

Regarding images, we need to copy the /sites/default/files folder into the new static directory under the name assets. The destination directory and the name of the assets directory are configurable. The function clean_media is in charge of converting the default Drupal URLs to the new ones in the assets directory.
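The copy can be done by hand, or with a few lines of PHP in the same style as the other scripts. A minimal sketch follows. The function copy_directory is a hypothetical helper, and the paths in the commented usage line assume the script is run from the D7 root with the same ../static destination as the migrate scripts.

```php
<?php

/**
 * Hypothetical helper: recursively copies $source into $destination.
 * Returns the number of files copied.
 */
function copy_directory($source, $destination)
{
    if (!is_dir($destination)) {
        mkdir($destination, 0777, true);
    }

    $copied = 0;
    foreach (scandir($source) as $item) {
        if ($item == '.' || $item == '..') {
            continue;
        }
        $from = $source . '/' . $item;
        $to = $destination . '/' . $item;
        if (is_dir($from)) {
            // Descend into subdirectories and add their file count.
            $copied += copy_directory($from, $to);
        } elseif (copy($from, $to)) {
            $copied++;
        }
    }

    return $copied;
}

// Assumed usage, run from the D7 root:
// echo copy_directory('sites/default/files', '../static/assets') . " files copied.\n";
```

This copies everything, including the generated image styles. Because clean_media rewrites style URLs to point at the root of the assets directory, the styles subfolder could probably be skipped, but that depends on how your content references its images.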

Results

The exported directory structure contains one Markdown file (*.md) per node, named after the node path, with the files grouped in one directory per content type.
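Note that file_name takes the second segment of the path, so it assumes aliases of the form blog/my-post. If your site has aliases with a single segment or with more than two, a more defensive variant may be useful. The sketch below is one possible approach; safe_file_name and the untitled fallback are our own assumptions, not part of the original scripts.

```php
<?php

/**
 * Hypothetical variant of file_name(): uses the last path segment, so
 * "blog/my-post", "my-post" and "blog/2013/my-post" all yield "my-post.md".
 */
function safe_file_name($path)
{
    // Split the path and drop empty segments (leading or trailing slashes).
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    $slug = $segments ? end($segments) : '';

    // Keep only characters that are safe in file names.
    $slug = trim(preg_replace('/[^A-Za-z0-9_-]+/', '-', $slug), '-');

    return ($slug !== '' ? $slug : 'untitled') . '.md';
}
```

Two nodes whose aliases end in the same segment would still overwrite each other inside the same content type directory, so check your aliases for duplicates before relying on this.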

This example only supports nodes and users from Drupal. If you need to export permissions, the list of modules, the taxonomies themselves, or anything else, you must modify the scripts or create new ones.

Now you are ready to start a new Gatsby project using the content from a D7 site.

Enjoy your coding!