
Saturday, August 10, 2013

Varnish collector plugin for OpenTSDB



Have you checked out OpenTSDB yet? It's pretty nifty. 

OpenTSDB is a time-series database built on top of the venerable HBase. It lets you collect many thousands of time-series metrics, aggregate them, and digest them into useful statistics and graphs.

But the best part is the tagging system, which lets you build dynamic and useful graphs on the fly. With every metric you send, you simply attach arbitrary key=value tags such as "datacenter=ec2 cluster=production05 branch=master". Later on you can filter and group by these tags to compare minute differences between systems.
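To make the tagging concrete, here is a minimal sketch of how one tagged data point could be rendered in the line format that tcollector collectors print to stdout (`metric timestamp value tag=value ...`). The helper name `format_datapoint`, the metric name, and the numbers are made up for illustration.

```python
def format_datapoint(metric, timestamp, value, tags):
    """Render one data point as 'metric timestamp value k1=v1 k2=v2'.

    `tags` is a dict; pairs are sorted so the output is stable.
    """
    tag_str = " ".join("%s=%s" % (k, v) for k, v in sorted(tags.items()))
    # rstrip() drops the trailing space when no tags are given
    return ("%s %d %s %s" % (metric, timestamp, value, tag_str)).rstrip()


line = format_datapoint(
    "varnishstat.cache_hit",  # illustrative metric name
    1376136000,               # epoch seconds
    42,
    {"datacenter": "ec2", "cluster": "production05", "branch": "master"},
)
print(line)
```

Note that the tags are separated by spaces, not commas; each tag becomes a dimension you can filter or group on later.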

This kind of monitoring blows "enterprise" solutions like Zabbix and Nagios out of the water. There's no way you could fit this kind of data into either rrdtool or whatever the heck Zabbix uses to store it (MySQL?!). It's also an "agentless" solution, which makes it well suited for the cloud.

Tcollector

Now you can get realtime metrics on how your Varnish web accelerator is doing. I wrote a tcollector plugin that slurps counters from varnishstat and sends them to OpenTSDB.

There's a pull request up to merge the collector into the tcollector repo, but in the meantime you can find the varnish collector script here.
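As a rough sketch of what "slurping counters" involves, the snippet below parses a hand-written sample in the shape the collector expects from `varnishstat -1 -j`: a top-level `timestamp` plus one object per counter with a `value` key. The sample values, the extra `flag` and `description` fields, and the helper name `extract_counters` are assumptions for illustration, not captured output.

```python
import json

# Hand-written sample; counter values are invented.
SAMPLE = """
{
  "timestamp": "2013-08-10T12:00:00",
  "client_conn": {"value": 1200, "flag": "a", "description": "Client connections accepted"},
  "cache_hit": {"value": 950, "flag": "a", "description": "Cache hits"}
}
"""


def extract_counters(raw):
    """Return (timestamp_string, {counter_name: value}) from varnishstat JSON."""
    data = json.loads(raw)
    counters = dict(
        (name, entry["value"])
        for name, entry in data.items()
        if name != "timestamp"  # the timestamp is not a counter
    )
    return data["timestamp"], counters


stamp, counters = extract_counters(SAMPLE)
print(stamp, sorted(counters.items()))
```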

The Code



#!/usr/bin/python

"""Send varnishstat counters to TSDB"""

import json
import subprocess
import sys
import time

from collectors.lib import utils

interval = 15 # seconds

# Prefixes here will be prepended to each metric name before being sent
metric_prefix = ['varnishstat']

# Add any additional tags you would like to include to this list as
# 'key=value' strings
#
# tags = ['production=false', 'cloud=amazon']
tags = []

# By default varnishstat returns about 300 metrics and not all of them are
# very useful.
#
# If you would like to collect all of the counters simply set vstats to "all"
#
# vstats = 'all'

# Some useful default values to send
vstats = [
  'client_conn',
  'client_drop',
  'client_req',
  'cache_hit',
  'cache_hitpass',
  'cache_miss'
]

def main():
  utils.drop_privileges()

  while True:
    try:
      if vstats == "all":
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-j"],
          stdout=subprocess.PIPE,
        )
      else:
        fields = ",".join(vstats)
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-f" + fields, "-j"],
          stdout=subprocess.PIPE,
        )
    except OSError as e:
      # Die and signal to tcollector not to run this script.
      sys.stderr.write("Error: %s\n" % e.strerror)
      sys.exit(13)

    # Read all of varnishstat's output and wait for the process to exit
    metrics = json.loads(stats.communicate()[0].decode("utf-8"))

    # We'll use the timestamp provided by varnishstat for our metrics
    pattern = '%Y-%m-%dT%H:%M:%S'
    timestamp = int(time.mktime(time.strptime(metrics['timestamp'], pattern)))
    for k, v in metrics.items():
      if k != 'timestamp':
        # Prepend any provided prefixes to each metric name
        metric_name = ".".join(metric_prefix) + "." + k
        # tcollector expects tags as space-separated key=value pairs
        print("%s %d %s %s" %
              (metric_name, timestamp, v['value'], " ".join(tags)))

    sys.stdout.flush()
    time.sleep(interval)

if __name__ == "__main__":
  sys.exit(main())
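The timestamp handling in the script can be exercised on its own. This is a small sketch, assuming varnishstat reports local time in the `%Y-%m-%dT%H:%M:%S` format the collector parses; the helper name `to_epoch` and the sample timestamp are illustrative.

```python
import time

PATTERN = "%Y-%m-%dT%H:%M:%S"


def to_epoch(stamp):
    """Parse a varnishstat timestamp as local time and return epoch seconds."""
    return int(time.mktime(time.strptime(stamp, PATTERN)))


epoch = to_epoch("2013-08-10T12:00:00")
# Converting back through localtime should reproduce the original string.
print(epoch, time.strftime(PATTERN, time.localtime(epoch)))
```

Because `time.mktime` interprets the parsed time in the host's local timezone, the epoch value differs from host to host for the same string. That is fine here, since the collector runs on the same machine as varnishstat.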