Hey Stephen Wood: varnish

Saturday, August 10, 2013

Varnish collector plugin for OpenTSDB

Have you checked out OpenTSDB yet? It's pretty nifty.

OpenTSDB is a time-series database built on top of the venerable hbase. It allows you to aggregate and crunch many thousands of time-series metrics and digest them into useful statistics and graphs.

But the best part is the tagging system that allows you to build dynamic and useful graphs on the fly. With every metric you send you simply attach arbitrary tags "datacenter=ec2 cluster=production05 branch=master". Later on you can bring these up to compare minute differences between systems.

This kind of monitoring blows "enterprise" solutions like Zabbix and Nagios out of the water. There's no way you could fit this kind of data into either rrdtool or whatever the heck Zabbix uses to store it (MYSQL??!?!). It's also an "agentless" solution, which makes it well suited for the cloud.

Tcollector

Now you can get realtime metrics on how your varnish web accelerator is doing. I wrote a tcollector plugin to slurp counters from varnishstat and send them to TSDB.

There's a pull request up to merge the collector into the tcollector repo, but in the meantime you can find the varnish collector script here.

The Code

#!/usr/bin/python

"""Send varnishstat counters to TSDB"""

import re
import subprocess
import sys
import json
import time

from collectors.lib import utils

interval = 15 # seconds

# Prefixes here will be prepended to each metric name before being sent
metric_prefix = ['varnishstat']

# Add any additional tags you would to include into this array as strings
#
# tags = ['production=false', 'cloud=amazon']
tags = []

# By default varnishstat returns about 300 metrics and not all of them are
# very useful.
#
# If you would like to collect all of the counters simply set vstats to "all"
#
# vstats = 'all'

# Some useful default values to send
vstats = [
  'client_conn',
  'client_drop',
  'client_req',
  'cache_hit',
  'cache_hitpass',
  'cache_miss'
]

def main():
  utils.drop_privileges()

  while True:
    try:
      if vstats == "all":
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-j"],
          stdout=subprocess.PIPE,
        )
      else:
        fields = ",".join(vstats)
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-f" + fields, "-j"],
          stdout=subprocess.PIPE,
        )
    except OSError, (errno, msg):
      # Die and signal to tcollector not to run this script.
      sys.stderr.write("Error: %s" % msg)
      sys.exit(13)

    metrics = ""
    for line in stats.stdout.readlines():
      metrics += line
    metrics = json.loads(metrics)

    # We'll use the timestamp provided by varnishstat for our metrics
    pattern ='%Y-%m-%dT%H:%M:%S' 
    timestamp = int(time.mktime(time.strptime(metrics['timestamp'], pattern)))
    for k, v in metrics.iteritems():
      if k != 'timestamp':
        # Prepend any provided prefixes to each metric name
        metric_name = ".".join(metric_prefix) + "." + k
        print "%s %d %s %s" % \
          (metric_name, timestamp, v['value'], ",".join(tags))

    sys.stdout.flush()
    time.sleep(interval)

if __name__ == "__main__":
  sys.exit(main())

Sunday, May 26, 2013

varnish and varnishncsa init (upstart) script

Here is a simple upstart script for the varnish and varnishncsa daemon. As of 3.0.3 upstart scripts aren't included in the standard debian package.

Nothing too fancy here. I'm going to be referencing these upstart scripts when I talk about plugging in logstash to varnishnicsa in another post. The real beauty is you get to hook varnishncsa into starting and stopping with varnish starts and stops.

varnish init (upstart) script:

# varnish - HTTP accelerator   
#   
     
description  "load balancer for Python api workers"   
     
start on (local-filesystems   
   and net-device-up IFACE!=lo)   
stop on runlevel[!2345]   
     
respawn   
     
# These are from the default init.d script   
limit nofile 131072 131072   
limit memlock 82000 82000   
     
exec "/usr/sbin/varnishd -F -a :80 -T localhost:6082 \   
   -f /etc/varnish/default.vcl -S /etc/varnish/secret \   
   -u varnish -g varnish -s malloc,256m"

varnishncsa init (upstart) script:

# varnishncsa - Logging daemon for varnish  
#  
   
description   "logging daemon for varnish"  
   
env LOG_OPTIONS='%h %l %u %t "%r" %s %b %{Varnish:time_firstbyte}x %{Varnish:handling}x'  
start on started varnish  
stop on stopped varnish  
   
respawn  
   
exec /usr/bin/varnishncsa -a -F "$LOG_OPTIONS" -w /var/log/varnish/varnishncsa.log

Remember that when you replace the init.d script with the upstart version you effectively deprecate the settings in /etc/defaults/varnish*, so you'll need to incorporate any of those configuration settings into these scripts.

You can test these settings by running:

 $ service varnish start  
   varnish start/running, process 13562

And behold! It started both varnish and the varnishncsa logging daemon:

$ ps aux | grep varnish   
  root  13617 8.0 2.0 117236 84684 ?  SLs 19:13 0:00 /usr/sbin/varnishd -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -u varnish -g varnish -s malloc,256m   
  root  13618 0.0 0.0 93840 684 ?  Ss 19:13 0:00 /usr/bin/varnishncsa -a -F %h %l %u %t "%r" %s %b %{Varnish:time_firstbyte}x %{Varnish:handling}x -w /var/log/varnish/varnishncsa.log   
  varnish 13628 0.0 0.0 270936 1400 ?  Sl 19:13 0:00 /usr/sbin/varnishd -F -a :80 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -u varnish -g varnish -s malloc,256m

If it looks funny to you that there are 2 varnishd programs running, don't forget that varnish intentionally spawns a child.

I hope that helps.