Cloning an SVN Repository into Git with all Tags and Branches

git-svn is awesome, but I recently realized that it does not automatically create all tags and branches in Git that are present in SVN. It really just pulls them all into remote branches and leaves them there. This is the full process I used to carry all branches and tags over from an SVN repository into a Git repository.

Install Git SVN (Ubuntu)

sudo apt-get install git-svn;

Pull the SVN Repository into Git

If the SVN repo uses the standard trunk, branches, and tags layout, run this command against the root URL above those directories:

git svn clone -s http://somesvnrepo.com/somesvnrepo/ --username=user@example.com destinationdirname;
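Once the clone finishes, you can confirm that everything really is sitting in remote branches rather than real Git tags and branches. The exact ref names vary a bit by git-svn version, but something like:

git branch -r; # lists trunk, branches, and tags/* as remote branches
git tag; # empty at this point - no real Git tags exist yet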

Script to create tags

This will retain tag messages from SVN.

#!/bin/sh
git for-each-ref --format="%(refname:short) %(objectname)" refs/remotes/tags |
while read tag ref;
do
    tag=`echo $tag | sed "s|tags/||g"`;
    comment="$(git log -1 --format=format:%B $ref)";
    git tag -a $tag -m "$comment" $ref;
    git branch -r -d "tags/$tag"
done;

Script to create branches (after running tags script)

#!/bin/sh
git for-each-ref --format="%(refname:short) %(objectname)" refs/remotes |
while read branch ref;
do
    git branch $branch $ref;
done;
git branch -d trunk;

Push to Git Repo

git remote add origin https://somegitrepo.com/somegitrepo;
git push -u origin --all;
git push -u origin --tags;

Setting up Multiple Redis Instances on a Single Magento Server

I had a bit of trouble figuring out a scalable way to set up multiple Redis instances on a single Magento server. There are a few posts (1, 2, 3) that were helpful, but included too much manual setup for my purposes. They all also required starting/stopping all redis instances individually.

Note: I have not found a good way to make this work with Upstart. If your Redis installation is using Upstart and you would like to use this method, just move your /etc/init/redis-server.conf file to /etc/init/redis-server.conf.bak. When I figure out a good solution for Upstart, I’ll update this post.

I am using Ubuntu 12.04 LTS, but most of this applies to any distro. This setup shares a default Redis config among all servers. Each new server only requires a server-specific configuration file with four required settings. All Redis instances may be started at once with

service redis-server start

or each server may be started individually with

service redis-server start server1

This works with all service commands.
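For example, once the server configs described below exist:

service redis-server status server1;
service redis-server restart server2;
service redis-server stop; # stops every instance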

Install Redis and PHP Redis Client

apt-get update;
apt-get install -y php-pear php5-dev make redis-server;
pecl install redis;
echo 'extension=redis.so' > /etc/php5/conf.d/redis.ini;
service php5-fpm restart; #If you are running PHP-FPM

Set Up Your Init Script

cd /etc/init.d;
mv redis-server redis-server.bak;
touch redis-server;
chmod 0755 redis-server;
nano redis-server;

Paste these contents:

#!/bin/bash
### BEGIN INIT INFO
# Provides:             redis-server
# Required-Start:       $syslog $remote_fs
# Required-Stop:        $syslog $remote_fs
# Should-Start:         $local_fs
# Should-Stop:          $local_fs
# Default-Start:        2 3 4 5
# Default-Stop:         0 1 6
# Short-Description:    redis-server - Persistent key-value db
# Description:          redis-server - Persistent key-value db
### END INIT INFO

if [ -n "$2" ]
then
NAME=$2

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/bin/redis-server
DAEMON_ARGS=/etc/redis/servers/$NAME.conf
DESC=redis-server

RUNDIR=/var/run/redis
PIDFILE=$RUNDIR/$NAME.pid

test -x $DAEMON || exit 0

set -e

case "$1" in
  start)
        echo -n "Starting $DESC: "
        mkdir -p $RUNDIR
        touch $PIDFILE
        chown redis:redis $RUNDIR $PIDFILE
        chmod 755 $RUNDIR
        if start-stop-daemon --start --quiet --umask 007 --pidfile $PIDFILE --chuid redis:redis --exec $DAEMON -- $DAEMON_ARGS
        then
                echo "$NAME."
        else
                echo "failed"
        fi
        ;;
  stop)
        echo -n "Stopping $DESC: "
        if start-stop-daemon --stop --retry forever/QUIT/1 --quiet --oknodo --pidfile $PIDFILE --exec $DAEMON
        then
                echo "$NAME."
        else
                echo "failed"
        fi
        rm -f $PIDFILE
        ;;

  restart|force-reload)
        ${0} stop $2
        ${0} start $2
        ;;

  status)
        echo -n "$DESC is "
        if start-stop-daemon --stop --quiet --signal 0 --pidfile ${PIDFILE} --exec ${DAEMON}
        then
                echo "running"
        else
                echo "not running"
                exit 1
        fi
        ;;

  *)
        echo "Usage: /etc/init.d/$NAME {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac

else

FILES=/etc/redis/servers/*
for f in $FILES
do
    SERVERNAME=$(basename "$f" .conf)
    /etc/init.d/redis-server "$1" "$SERVERNAME"
done

fi

exit 0

Set Up Your Config

Create a server config from the default config file

cd /etc/redis;
mkdir servers;
cp redis.conf servers/server1.conf;
cd servers;
nano server1.conf;

Strip out all non-server-specific configuration from server1.conf. The remaining settings should include pidfile, port, logfile, and dbfilename. The pidfile, logfile, and dbfilename values should all match the server config file name, “server1”. The resulting file should look something like this:

# Redis server config
include /etc/redis/redis.conf

# When running daemonized, Redis writes a pid file in /var/run/redis.pid by
# default. You can specify a custom pid file location here.
pidfile /var/run/redis/server1.pid

# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 6379

# Specify the log file name. Also 'stdout' can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile /var/log/redis/server1.log

# The filename where to dump the DB
dbfilename dump-server1.rdb

Restart Redis instances:

service redis-server restart;

Additional Servers

To create new servers, simply copy the server1.conf file to server2.conf and update all relevant strings to your new server name (in this case, server2). If your server name is unique enough, you can just run the file through sed. You’ll also need to manually update the port to one that isn’t already in use.

cd /etc/redis/servers;
cp server1.conf server2.conf;
sed -i s/server1/server2/g server2.conf;
grep -hR ^port . | sort; # Determine the highest used port number
nano server2.conf; #Update the port number

Resulting file:

# Redis server config
include /etc/redis/redis.conf

# When running daemonized, Redis writes a pid file in /var/run/redis.pid by
# default. You can specify a custom pid file location here.
pidfile /var/run/redis/server2.pid

# Accept connections on the specified port, default is 6379.
# If port 0 is specified Redis will not listen on a TCP socket.
port 6380

# Specify the log file name. Also 'stdout' can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile /var/log/redis/server2.log

# The filename where to dump the DB
dbfilename dump-server2.rdb

Start the new Redis instance(s):

service redis-server start;
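If these instances are going to back Magento’s cache and sessions, the wiring in app/etc/local.xml might look roughly like this. This is only a sketch: it assumes the Cm_Cache_Backend_Redis and Cm_RedisSession modules are installed (setting those up is outside the scope of this post) and that the ports match the server1/server2 configs above.

<config>
    <global>
        <!-- Cache backend on the first Redis instance (server1) -->
        <cache>
            <backend>Cm_Cache_Backend_Redis</backend>
            <backend_options>
                <server>127.0.0.1</server>
                <port>6379</port>
                <database>0</database>
            </backend_options>
        </cache>
        <!-- Sessions on the second Redis instance (server2) -->
        <session_save>db</session_save>
        <redis_session>
            <host>127.0.0.1</host>
            <port>6380</port>
            <db>0</db>
        </redis_session>
    </global>
</config>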

Please let me know if you see any issues with this or have any suggestions. This seems to work for me very well so far.

Magento Customer and Visitor Logging

Related to the issue in my previous post about log cleaning, I would like to take some time to discuss customer and visitor logging in a little more detail.

The log_url_info table can bloat your database quickly. It logs every single unique visited URL. I have commonly seen this grow to 5GB or more, even when storing logs for only a short time period. As a general rule, database size has a direct impact on performance.
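If you want to see how large the log tables have grown on your own installation, a query like this against information_schema will show you (replace magento with your database name; row counts are approximate for InnoDB):

SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb,
       table_rows
FROM information_schema.tables
WHERE table_schema = 'magento'
  AND table_name LIKE 'log\_%'
ORDER BY (data_length + index_length) DESC;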

So what is this data really used for and is there a safe way to reduce it without affecting site functionality?

It seems Magento’s primary purpose for this data is analytics. In my opinion, there are much more robust platforms for this, like Google Analytics or Omniture, which should be used instead. If anyone sees analytics value in Magento’s data over other platforms, please share your experience. Also note that if you are using a proxy cache like Squid or Varnish, a frontend analytics platform like Google Analytics will track data more accurately, since requests are not always sent to the server; Varnish will serve a page directly, the Magento app will not run, and the customer visit will not be logged.

These logs also drive the “Online Customers” page in the admin. When you load the “Online Customers” page, the log_visitor table is read and the log_visitor_online table is populated with the users who have visited within the threshold you have set; by default, 15 minutes.

/**
 * Mage_Log_Model_Resource_Visitor_Online 
 * 
 * This is the method that prepares log_visitor_online for viewing 
 * on the "Online Customers" page
 */
public function prepare(Mage_Log_Model_Visitor_Online $object)
{
    // Check if log_visitor_online was recently updated. If so, exit
    if (($object->getUpdateFrequency() + 
        $object->getPrepareAt()) > time()) {
        return $this;
    }
    
    //...
    // Delete existing data from log_visitor_online
    $writeAdapter->delete($this->getMainTable());
    
    //...
    // Build online visitors based on online interval    
    $lastDate = Mage::getModel('core/date')->gmtTimestamp() 
        - $object->getOnlineInterval() * 60;

    $select = $readAdapter->select()
        ->from(
            $this->getTable('log/visitor'),
            array('visitor_id', 'first_visit_at', 
                  'last_visit_at', 'last_url_id'))
        ->where('last_visit_at >= ?', 
            $readAdapter->formatDate($lastDate));

    // Add additional visitor data    
    // ...
    
    // Save the prepared date
    $object->setPrepareAt();
    
    return $this;
}

The time that this was calculated gets stored in cache and prevents the data from being calculated again until a specified amount of time has passed. This time threshold defaults to 60 seconds and is dictated by the “log/visitor/online_update_frequency” config setting. Clearing the cache resets this and causes log_visitor_online to rebuild. Oddly, Magento does not make this setting visible in the admin via a system.xml file. You can edit it in a config file, in the database directly, or by adding your own system.xml definition.

/**
 * Mage_Log_Model_Visitor_Online
 * 
 * The time log_visitor_online was built is stored in cache.
 */
public function getPrepareAt()
{
    return Mage::app()->loadCache('log_visitor_online_prepare_at');
}

public function setPrepareAt($time = null)
{
    if (is_null($time)) {
        $time = time();
    }
    Mage::app()->saveCache($time, 'log_visitor_online_prepare_at');
    return $this;
}
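If you just want to change the frequency without adding a system.xml definition, one option is to set the value directly in core_config_data. A rough sketch (the path comes from the setting mentioned above, the value is in seconds, update the row instead if it already exists, and flush the config cache afterwards):

INSERT INTO core_config_data (scope, scope_id, path, value)
VALUES ('default', 0, 'log/visitor/online_update_frequency', '300');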

I have not found any other direct uses of this data within Magento. My review of this has determined that it is safe to disable this entirely, if you are ok with losing the “Online Customers” functionality. However, I don’t recommend this unless you have achieved similar functionality in another analytics platform; seeing online customers is pretty useful. Should you want to do so, here is a good post on how to disable customer and visitor logging. This might even result in a performance improvement since Magento is no longer writing to the log tables on every request.

Recommendation

My recommendation is to reduce the number of days logs are saved to one. This keeps the “Online Customers” list functional, while keeping the table sizes relatively small. You can do this at System > Configuration > Advanced > System > Log Cleaning. If you have a high traffic site, you should also apply the patch mentioned in my previous post that speeds up log cleaning; otherwise, it can become slow, fail to run, and the tables will begin to grow larger every day.

(Screenshot: Log Cleaning configuration settings)
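If you would rather clean the logs immediately from the command line instead of waiting for the cron job, Magento ships a shell script for this. Roughly (run from the Magento root; run the script with no arguments to see the usage help for your version):

php -f shell/log.php -- status; # show current log table sizes
php -f shell/log.php -- clean --days 1; # clean entries older than one day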

If you are reading this, you probably already have huge log tables. You can feel relatively safe about nuking this data, unless you have specific modifications built around it. You were only planning on keeping it for 180 days anyway. Always do your own research, though, and always test before you do anything in Production.

How to clear this data:

TRUNCATE log_customer;
TRUNCATE log_quote;
TRUNCATE log_summary;
TRUNCATE log_url;
TRUNCATE log_url_info;
TRUNCATE log_visitor;
TRUNCATE log_visitor_info;
TRUNCATE log_visitor_online;

Magento Log Cleaning Blocking Other Cron Jobs

We had a 3am cron job that synchronized inventory, and it started failing a couple of weeks ago. We only recognized the issue this past week. I determined the cause and the solution and wanted to share it to help others in the community.

Every time the Magento cron runs, it checks whether a cron job is already running and exits quietly if so (oddly, it only behaves this way if shell execution is enabled in PHP; the PHP code path does not follow the same logic). This means that a long running cron job blocks scheduling and execution of other cron jobs. I will write a more complete post on Magento cron job behavior in general later.

Log cleaning of customer records gets dramatically slower as the amount of data increases. At some point, unless MySQL timeout settings are extremely high, this causes the script to fail with a timeout, so the logs never get cleaned and just keep growing.

This is the problem query:

SELECT `log_customer_main`.`log_id`
FROM `log_customer` AS `log_customer_main`
LEFT JOIN `log_customer`
    ON log_customer_main.customer_id = log_customer.customer_id
    AND log_customer_main.log_id < log_customer.log_id
WHERE (log_customer.customer_id IS NULL)
    AND (log_customer_main.log_id < 553985)

It comes from line 147 of Mage_Log_Model_Resource_Log in the _cleanCustomers() method.

The purpose of this query seems to be to get the latest log id for each customer. It does this by left joining each row to other rows with the same customer_id and a higher log_id, then keeping only the rows with no such match (the NULL results), i.e. the highest log_id per customer.

A much simpler way to accomplish this is with a GROUP BY clause and a MAX() expression.

SELECT MAX(log_id) as log_id
FROM log_customer
GROUP BY customer_id

In my testing, this always produces the same results. The difference is that the time for the latter query stays relatively constant regardless of the amount of data, while the time for the former grows rapidly as the data grows, since the self join effectively compares every row against every other row for the same customer.

For 90 records, the first query took 18 seconds in my local environment. The second query took 0.4 seconds. In our production environment, the first query won’t even run because the timeout is set to less than four hours. The second query executes in less than 0.4 seconds – with 320,000 records.

The query looks like this in Magento code:

$select = $readAdapter->select()
    ->from($this->getTable('log/customer'), array('log_id' => new Zend_Db_Expr('MAX(log_id)')))
    ->group('customer_id')
    ->where('log_id < ?', $lastLogId + 1);

If anyone sees a reason why the first query must be used over the second, please let me know. They both seem to accomplish the same goal and produce the same results every time for me.

Here is a quick sample file on how to fix this:
My_Module_Model_Resource_Log
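The sample file itself isn’t reproduced here, but the general approach is a resource model rewrite that overrides _cleanCustomers() to build its select with the MAX()/GROUP BY approach shown above. A minimal sketch of the config.xml portion, assuming a module named My_Module and the standard resource model rewrite pattern (verify the alias names against Mage_Log’s own config.xml):

<config>
    <global>
        <models>
            <log_resource>
                <rewrite>
                    <log>My_Module_Model_Resource_Log</log>
                </rewrite>
            </log_resource>
        </models>
    </global>
</config>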

View Magento’s sorted module tree

Here is an easy way to view Magento’s module tree after dependency sorting has been applied.

# File: app/code/core/Mage/Core/Model/Config.php
# Method: _loadDeclaredModules()
# Line: around 830
        foreach ($moduleDepends as $moduleProp) {
            $node = $unsortedConfig->getNode('modules/'.$moduleProp['module']);
            $sortedConfig->getNode('modules')->appendChild($node);
        }

        // Add these lines to output the sorted module list as an array
        $modules = array_keys((array)$sortedConfig->getNode('modules')->children());
        print_r($modules);
        die;

        $this->extend($sortedConfig);

Magento Routing: Using the same frontname for admin and frontend routes

I recently noticed an issue with a module, Devinc_Dailydeal, where one of its pages was redirecting to the same page under the base URL of the admin store. For example, I would visit http://www.myfrontend.com/mymodule and get redirected to https://www.myadmin.com/mymodule.

I looked into the module’s config.xml file to check the defined routes. I noticed that there was a route defined under the “frontend” node as well as one under the “admin” node, both with the frontname “dailydeal”.

# File: app/code/community/Devinc/Dailydeal/etc/config.xml
<?xml version="1.0"?>
<config>
    ...
    <frontend>
        <routers>
            <dailydeal>
                <use>standard</use>
                <args>
                    <module>Devinc_Dailydeal</module>
                    <frontName>dailydeal</frontName>
                </args>
            </dailydeal>
        </routers>
    </frontend>
    ...
    <admin>
        <routers>
            <dailydeal>
                <use>admin</use>
                <args>
                    <module>Devinc_Dailydeal</module>
                    <frontName>dailydeal</frontName>
                </args>
            </dailydeal>
        </routers>
    </admin>
    ...
</config>

At first glance, this seemed ok since they are using separate routers. Closer inspection revealed that the Admin router will always be matched first. Routers are processed in a stack on every request. The default routers are Admin, Standard, Cms, then Default. (For more info on Magento’s routers, check Alan Storm’s blog post). This means that the Admin router runs on every page, not just pages starting with “admin”. The Admin router runs first and hits a match first on “dailydeal”. It does not know that “dailydeal” has also been specified as a frontname for the Standard router. It just knows that it has found a match and proceeds to route it.

While the Admin router is routing the request, it checks if the URL should be secure. This checks against the Admin store’s settings, not the frontend store. If the Admin is set to use secure pages and the admin secure base URL is https and is different from the current URL, a redirect will be issued. This is correct behavior but can cause a lot of confusion.

I looked into a number of other third party modules we have used and noticed that a significant number of them use the same frontname for the Standard router and the Admin router. This means that under this set of circumstances, these will all break. In all likelihood, these modules were never tested in a multi-store setup with SSL implemented and never will be.

This behavior only occurs if:

The frontend base URL is different from the admin base URL (if not, it will just redirect to https, which probably won’t cause any issues other than possibly broken SSL)
The admin is set to use secure URLs
The secure URL for the admin is actually secure (starts with https)

No redirect is issued if the admin is not set to use secure URLs, even if the base URL is different. This seems like a logic error to me, but we’ll leave that be.

The Fix:

Beware, there is a lot of work to be done here and a lot of updates made to third party code, which is sub-optimal. Only do this if your site meets the aforementioned conditions and you are seeing this issue.

There is no simple solution to this. You cannot just change the frontname of the admin route. You must also change the route name. This is because Magento expects both route frontnames and route names to be unique across all routers. Specifically, Mage_Core_Model_Url::getUrl() eventually calls a method on the front controller which retrieves the router from the route name, which must be unique or there will be conflicts.

# File: app/code/core/Mage/Core/Controller/Varien/Front.php
public function getRouterByRoute($routeName)
{
    // empty route supplied - return base url
    if (empty($routeName)) {
        $router = $this->getRouter('standard');
    } elseif ($this->getRouter('admin')->getFrontNameByRoute($routeName)) {
        // try standard router url assembly
        $router = $this->getRouter('admin');
    } elseif ($this->getRouter('standard')->getFrontNameByRoute($routeName)) {
        // try standard router url assembly
        $router = $this->getRouter('standard');
    } elseif ($router = $this->getRouter($routeName)) {
        // try custom router url assembly
    } else {
        // get default router url
        $router = $this->getRouter('default');
    }

    return $router;
}

Here, if the Admin router and the Standard router both have a “dailydeal” route defined, the Admin router will always win, even on frontend pages. Could this be any more convoluted, Magento?

Once you change the route name, you will also have to update the layout handles in the adminhtml layout file to match, since they are prefixed with the route name. If you are going to do all of this, why not just fix it correctly…

So here’s how to fix it.

Replace the admin route with an injection of your module into the existing adminhtml route.

# File: app/code/community/Devinc/Dailydeal/etc/config.xml
<admin>
    <routers>
        <dailydeal>
            <use>admin</use>
            <args>
                <module>Devinc_Dailydeal</module>
                <frontName>dailydeal</frontName>
            </args>
        </dailydeal>
    </routers>
</admin>

Becomes:

<admin>
    <routers>
        <adminhtml>
            <args>
                <modules>
                    <Devinc_Dailydeal_Adminhtml before="Mage_Adminhtml">Devinc_Dailydeal_Adminhtml</Devinc_Dailydeal_Adminhtml>
                </modules>
            </args>
        </adminhtml>
    </routers>
</admin>

Update the adminhtml menu actions:

# File: app/code/community/Devinc/Dailydeal/etc/config.xml
<adminhtml>
    <menu>
        <dailydeal module="dailydeal">
             <title>Daily Deal</title>
             <sort_order>71</sort_order>
             <children>
                 <add module="dailydeal">
                     <title>Add Deal</title>
                     <sort_order>0</sort_order>
                     <action>dailydeal/adminhtml_dailydeal/new/</action>
                </add>
                ...
            </children>
        </dailydeal>
    </menu>
</adminhtml>

Becomes:

<adminhtml>
    <menu>
        <dailydeal module="dailydeal">
            <title>Daily Deal</title>
            <sort_order>71</sort_order>
            <children>
                <add module="dailydeal">
                    <title>Add Deal</title>
                    <sort_order>0</sort_order>
                    <action>adminhtml/dailydeal/new/</action>
                </add>
                ...
            </children>
        </dailydeal>
    </menu>
</adminhtml>

Replace the adminhtml layout handles:

# File: app/design/frontend/default/default/layout/dailydeal.xml
<dailydeal_adminhtml_dailydeal_index>
    <reference name="content">
        <block type="dailydeal/adminhtml_dailydeal" name="dailydeal" />
    </reference>
</dailydeal_adminhtml_dailydeal_index>

Becomes:

<adminhtml_dailydeal_index>
    <reference name="content">
        <block type="dailydeal/adminhtml_dailydeal" name="dailydeal" />
    </reference>
</adminhtml_dailydeal_index>

When working in the admin, the URL will now be https://www.myadmin.com/admin/dailydeal.

This could mean that you need to make changes elsewhere if there are hardcoded URLs anywhere. I noticed that I had to hard set a form action in one of the modules I was working with.
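For example, instead of hardcoding the old “dailydeal/...” path, a form action can be generated against the adminhtml route. An illustrative snippet (the controller and action names here are placeholders):

// In an adminhtml block or controller, build the URL against the
// injected adminhtml route rather than hardcoding the old path
$formAction = $this->getUrl('adminhtml/dailydeal/save');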

What can be learned from this?

When writing a module, do not use the same frontname for the Standard and Admin routers. In fact, don’t even define your own admin route. All URLs in the admin should start with “/admin” (or whatever the admin frontname is configured as). This makes it clear and consistent to users that they are still in the admin.

Instead, inject controllers into the existing “adminhtml” route like this:

<?xml version="1.0"?>
<config>
    ...
    <admin>
        <routers>
            <adminhtml>
                <args>
                    <modules>
                        <MyNamespace_MyModule_Adminhtml before="Mage_Adminhtml">MyNamespace_MyModule_Adminhtml</MyNamespace_MyModule_Adminhtml>
                    </modules>
                </args>
            </adminhtml>
        </routers>
    </admin>
    ...
</config>

Then, create your admin controllers at MyNamespace/MyModule/controllers/Adminhtml.

The only caveat with doing this is that you must ensure you don’t create any naming conflicts with other admin controllers in core or other third party code. Use a specific and unique controller class name to avoid this.
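For reference, here is a minimal sketch of what such a controller might look like. The class, file, and action names are illustrative only:

<?php
// File: app/code/local/MyNamespace/MyModule/controllers/Adminhtml/MymoduleController.php
// Pick a controller name unlikely to collide with Mage_Adminhtml or other
// modules injected into the same route.
class MyNamespace_MyModule_Adminhtml_MymoduleController extends Mage_Adminhtml_Controller_Action
{
    public function indexAction()
    {
        // Reachable at /admin/mymodule (plus the admin URL key)
        $this->loadLayout();
        $this->renderLayout();
    }
}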

Move Recent Commits to a New Feature Branch

So you’ve made a few commits to the development branch and now realize that you need to make a feature branch for this and revert your commits to development.

Here’s how:
* This assumes you are currently working within the development branch and may have uncommitted work

# Stash your current edits if you have uncommitted work
git stash;

# Make a new branch from the latest commit on development
git branch new_feature;

# Find the commit id just before your first commit to
# the development branch
git log;

It will look something like this:

commit a0b446145776970738952a4687a2c91cecd12b5a
Author: My Name <my.name@amplificommerce.com>
Date:   Fri Oct 21 08:47:58 2011 -0700

    Commit message 1

commit d0b84614377c900758952ac687acc91cecd12b5d
Author: Kirk Madera <kirk.madera@amplificommerce.com>
Date:   Fri Oct 21 08:47:58 2011 -0700

    Commit message 2

Pick the first commit that comes before your commits.

# Reset the development branch to this commit. Make sure
# you use the correct commit id
git reset --hard d0b84614377c900758952ac687acc91cecd12b5d;

# Checkout your new branch
git checkout new_feature;

# Apply your stash. This will add your uncommitted edits
# back in
git stash apply stash@{0};

# Clear your stash
git stash clear;

# Push your branch to the main repo to allow for
# collaboration & testing
git push origin new_feature;