Magento Customer and Visitor Logging

Related to the issue in my previous post about log cleaning, I would like to take some time to discuss customer and visitor logging in a little more detail.

The log_url_info table can bloat your database quickly. It logs every unique URL visited. I have commonly seen it grow to 5GB or more, even when logs are kept for only a short period. As a general rule, the larger the database, the slower its queries and maintenance tasks become, so this has a direct impact on performance.
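
To see how much space the log tables are actually using on your own install, a query along these lines against information_schema can help. This is a sketch that assumes MySQL, no table prefix, and a schema named magento (a placeholder; substitute your own). Row counts reported for InnoDB tables are estimates.

```sql
-- Sketch: list Magento log tables, largest first.
-- 'magento' is a placeholder schema name; adjust it, and the
-- LIKE pattern, if your install uses a table prefix.
SELECT table_name,
       table_rows,  -- approximate for InnoDB
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = 'magento'
  AND table_name LIKE 'log\_%'
ORDER BY (data_length + index_length) DESC;
```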

So what is this data really used for and is there a safe way to reduce it without affecting site functionality?

It seems Magento’s primary purpose for this data is analytics. In my opinion, dedicated platforms such as Google Analytics or Omniture are much more robust and should be used instead. If anyone sees analytics value in Magento’s data over other platforms, please share your experience. Also note that if you are using a proxy cache like Squid or Varnish, a frontend analytics platform like Google Analytics will track data more accurately, because requests do not always reach the server: when Varnish serves a page directly, the Magento app does not run and the customer visit is not logged.

These logs also drive the “Online Customers” page in the admin. When you load the “Online Customers” page, the log_visitor table is read and the log_visitor_online table is populated with the users who have visited within the threshold you have set; by default, 15 minutes.

/**
 * Mage_Log_Model_Resource_Visitor_Online 
 * 
 * This is the method that prepares log_visitor_online for viewing 
 * on the "Online Customers" page
 */
public function prepare(Mage_Log_Model_Visitor_Online $object)
{
    // Check if log_visitor_online was recently updated. If so, exit
    if (($object->getUpdateFrequency() + 
        $object->getPrepareAt()) > time()) {
        return $this;
    }
    
    //...
    // Delete existing data from log_visitor_online
    $writeAdapter->delete($this->getMainTable());
    
    //...
    // Build online visitors based on online interval    
    $lastDate = Mage::getModel('core/date')->gmtTimestamp() 
        - $object->getOnlineInterval() * 60;

    $select = $readAdapter->select()
        ->from(
            $this->getTable('log/visitor'),
            array('visitor_id', 'first_visit_at', 
                  'last_visit_at', 'last_url_id'))
        ->where('last_visit_at >= ?', 
            $readAdapter->formatDate($lastDate));

    // Add additional visitor data    
    // ...
    
    // Record the time log_visitor_online was rebuilt
    $object->setPrepareAt();
    
    return $this;
}
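
In plain SQL, the select that prepare() builds is roughly equivalent to the query below. This is only an illustration, assuming the default 15 minute online interval and no table prefix; it does not reproduce the steps elided from the method above.

```sql
-- Sketch: visitors seen within the last 15 minutes.
-- Magento computes the cutoff from a GMT timestamp, so
-- UTC_TIMESTAMP() is used here rather than NOW().
SELECT visitor_id, first_visit_at, last_visit_at, last_url_id
FROM log_visitor
WHERE last_visit_at >= UTC_TIMESTAMP() - INTERVAL 15 MINUTE;
```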

The time of this calculation is stored in cache, which prevents the data from being recalculated until a set amount of time has passed. This threshold defaults to 60 seconds and is controlled by the “log/visitor/online_update_frequency” config setting. Clearing the cache resets the stored time and causes log_visitor_online to be rebuilt the next time the page is loaded. Oddly, Magento does not expose this setting in the admin via a system.xml file. You can change it in a config file, directly in the database, or by adding your own system.xml definition.
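
If you go the database route, something like the following should work. Treat it as a sketch: it assumes no table prefix and the default scope, and the value of 300 seconds is just an example. You will need to flush the config cache afterwards for the new value to be picked up.

```sql
-- Sketch: set the rebuild threshold (in seconds) at the default scope.
-- 300 is an arbitrary example value, not a recommendation.
INSERT INTO core_config_data (scope, scope_id, path, value)
VALUES ('default', 0, 'log/visitor/online_update_frequency', '300')
ON DUPLICATE KEY UPDATE value = VALUES(value);
```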

/**
 * Mage_Log_Model_Visitor_Online
 * 
 * The time log_visitor_online was built is stored in cache.
 */
public function getPrepareAt()
{
    return Mage::app()->loadCache('log_visitor_online_prepare_at');
}

public function setPrepareAt($time = null)
{
    if (is_null($time)) {
        $time = time();
    }
    Mage::app()->saveCache($time, 'log_visitor_online_prepare_at');
    return $this;
}

I have not found any other direct uses of this data within Magento. Based on my review, it is safe to disable this logging entirely if you are OK with losing the “Online Customers” functionality. However, I don’t recommend doing so unless you have achieved similar functionality in another analytics platform; seeing online customers is pretty useful. Should you want to do so, here is a good post on how to disable customer and visitor logging. Disabling it might even improve performance, since Magento would no longer write to the log tables on every request.

Recommendation

My recommendation is to reduce the number of days logs are saved to one. This keeps the “Online Customers” list functional while keeping the table sizes relatively small. You can do this at System > Configuration > Advanced > System > Log Cleaning. If you have a high traffic site, you should also apply the patch mentioned in my previous post that speeds up log cleaning; otherwise, the cleaning job can become slow or fail to run, and the tables will grow larger every day.
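
If you would rather make this change outside the admin, the same settings can be written to core_config_data. This is a sketch under a few assumptions: no table prefix, the default scope, and that the config paths below match your Magento 1 version (they are the ones I believe the Log Cleaning section uses, so verify them against your install first). Flush the config cache afterwards.

```sql
-- Sketch: keep logs for one day and make sure log cleaning is enabled.
-- The two paths are assumed to correspond to "Save Log, Days" and
-- "Enable Log Cleaning"; confirm them before running this.
INSERT INTO core_config_data (scope, scope_id, path, value)
VALUES ('default', 0, 'system/log/clean_after_day', '1'),
       ('default', 0, 'system/log/enabled', '1')
ON DUPLICATE KEY UPDATE value = VALUES(value);
```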

[Screenshot: Log Cleaning configuration]

If you are reading this, you probably already have huge log tables. You can feel relatively safe about nuking this data, unless you have specific modifications that depend on it. With the default setting, it would only have been kept for 180 days anyway. Always do your own research, though, and always test before you do anything in production.

How to clear this data:

TRUNCATE log_customer;
TRUNCATE log_quote;
TRUNCATE log_summary;
TRUNCATE log_url;
TRUNCATE log_url_info;
TRUNCATE log_visitor;
TRUNCATE log_visitor_info;
TRUNCATE log_visitor_online;