Saturday, October 12, 2013

The Deployment Manifesto

This has nothing to do with the word devops. Instead, this is a discussion about the responsibilities where ops teams and developer teams overlap. This is about cooperation between teams to create a better product.

The 10 Requirements


The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.


1. Configuration management MUST NOT be coupled to an external service, such as EC2, Openstack, Foreman, or anything else.

2. Devops SHOULD provide a self-service framework for the automatic creation and destruction of hosts from the ground up.

3. Devops SHOULD work with engineering teams to come up with a continuous deployment strategy that doesn't involve the destruction and creation of fresh operating systems.

4. All code required for deployments MUST be maintained in a centralized source repository. 

5. Deployments MUST use immutable snapshots -- such as a git tag -- from source code (see the sketch after this list).

6. Hosts being provisioned MUST get their configurations from source control and MUST NOT rely on resources from an individual user or an engineer’s local computer.

7. Developers MUST provide a way to test code before it is deployed.

8. Devops MUST have an automated and tested rollback plan with every deployment.

9. Devops SHOULD provide feedback and planning support for hardware, infrastructure, and software dependencies necessary to run applications.

10. Devops MUST monitor all deployments and have clear, identified benchmarks for success or failure.
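To make requirement 5 concrete, a deploy flow built around immutable git tags might look roughly like this (the version number and repository URL are made up for illustration):

# Cut an immutable, annotated release tag and publish it
git tag -a v1.4.2 -m "release 1.4.2"
git push origin v1.4.2

# On the build or deploy host, check out exactly that snapshot
git clone --branch v1.4.2 --depth 1 git@github.com:example/app.git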

Wednesday, October 2, 2013

Logstash 1.2 and Kibana 3 auto-install for Ubuntu 12.04

Logstash and Kibana auto-install

Last time I brought you the auto-install script for logstash and kibana 2. I'm updating the install now to work with logstash 1.2 and Kibana 3.

You can read through the change logs if you're really curious about what's different between the two. The biggest change is the facelift in the Kibana project, which looks quite nice.

This auto-install will automatically install everything you need to start using logstash and kibana on ubuntu 12.04. Simply clone the project and run:

$ ./bootstrap

If you notice any issues please feel free to comment here or to head over to the github page and open up an issue.

About the project

My goal was to make the install as easy and automated as possible. You may want to adjust the settings (like the amount of memory logstash is allowed to use), but overall the defaults are general enough to be good for most folks.

I've opted to use nginx to host Kibana, but it's not hard to swap that out with apache if that's your thing.

A note about Kibana

You may notice that you're unable to connect your Kibana instance to elasticsearch. The frontend is now client-side javascript, so it needs the fully qualified domain name of your elasticsearch instance: your browser makes the requests to ES directly, not the server running Kibana. Don't get confused. If you can't connect, check the following (there's a quick test after the list):
  1. Verify the FQDN of your elasticsearch host is in config.js. You cannot use localhost here or your browser will make the request to localhost.
  2. Verify your browser is allowed to request information from ES.
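A quick sanity check you can run from the machine your browser is on (swap es.example.com for the FQDN you put in config.js):

# If this doesn't return elasticsearch's JSON banner, your browser won't be able to reach it either
curl http://es.example.com:9200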




Thursday, September 26, 2013

Install gnu grep on mac osx

One thing that bugs me about the bsd grep that comes with mac osx is that it doesn't offer you the -P flag. The -P flag will let you use the pcre engine in your regex matches, which is 100x more awesome than using the posix regex of egrep.
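As a taste of what -P buys you, here's a pcre-only trick (\K) that egrep has no answer for. The GNU build installs as ggrep, which is what the steps below set up:

# Print just the number after "error=" -- \K is a pcre feature, so bsd grep can't do this
echo "error=42 warning=7" | ggrep -oP 'error=\K\d+'
# prints: 42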

Not only that, but did you know gnu grep is significantly faster than the default bsd version? Go ahead and read Why is Gnu Grep So fast? where the author of gnu grep himself chimes in to explain why. Lots of nerdy tidbits in there.

You can install gnu grep on your mac very easily using brew. If you don't have brew you can install it following these instructions.

Installing gnu grep:


# Enable dupe and install
brew tap homebrew/dupes
brew install homebrew/dupes/grep

# Install the perl compatible regular expression library
brew install pcre

# Add a symlink to a place in $PATH (needs sudo; adjust the version to match what brew installed)
sudo ln -s /usr/local/Cellar/grep/2.14/bin/ggrep /usr/bin/ggrep
# Add an alias
alias grep="ggrep"

# Verify you got it!
grep --version

grep (GNU grep) 2.14
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

# To make the alias stick after reboots, add it to your ~/.bash_profile
echo 'alias grep="ggrep"' >> ~/.bash_profile

All done!

Thursday, September 5, 2013

Best place to find EC2 instance type and pricing info

I have to give a shout-out to www.ec2instances.info.

This site makes it quick and easy to shop around the various EC2 instance types while comparing their guts and costs.

Tuesday, September 3, 2013

Quickly get public IP of ec2 instance

Here are two ways you can get the public IP or DNS of an EC2 instance:

Ask a remote webserver

This is my preferred method because it's easy to remember. Simply run

curl eth0.me

And you'll be greeted with the IP address of the host that contacted the server.

If curl isn't installed, you can also use wget:

wget -q eth0.me -O-

Full disclosure, I run this website. My goal is to keep it simple and fast. No html, newlines, etc. Just an IPv4 address.

Using EC2's metadata server
If you're on an EC2 instance you can also get this information by curling the metadata server accessible to each ec2 instance.

Public IPv4 address:

curl http://169.254.169.254/latest/meta-data/public-ipv4

Public hostname:

curl http://169.254.169.254/latest/meta-data/public-hostname
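If you're scripting against the metadata server, both values are easy to stash in variables, e.g.:

# Grab the public IP and hostname for use later in a bootstrap script
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)
PUBLIC_HOSTNAME=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
echo "$PUBLIC_HOSTNAME ($PUBLIC_IP)"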

Final thoughts

The benefit of using EC2's metadata server is that you can also get the public hostname, and it's guaranteed to work behind strange NAT or proxy rules. The drawback is that it's harder to remember.

Running curl against eth0.me is fast and easy to remember, but keep in mind it will give you the IP that reaches the webserver and not necessarily the IP of the host making the request!

Saturday, August 10, 2013

Varnish collector plugin for OpenTSDB



Have you checked out OpenTSDB yet? It's pretty nifty. 

OpenTSDB is a time-series database built on top of the venerable hbase. It allows you to aggregate and crunch many thousands of time-series metrics and digest them into useful statistics and graphs.

But the best part is the tagging system that allows you to build dynamic and useful graphs on the fly. With every metric you send, you simply attach arbitrary tags like "datacenter=ec2 cluster=production05 branch=master". Later on you can bring these up to compare minute differences between systems.
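To give you an idea of what that looks like on the wire, a tagged data point is just a single line in OpenTSDB's telnet-style put format (the hostname and values below are made up):

# Push one tagged data point at the TSD's telnet interface (default port 4242)
echo "put varnishstat.cache_hit 1376150400 1234 datacenter=ec2 cluster=production05" | nc tsd.example.com 4242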

This kind of monitoring blows "enterprise" solutions like Zabbix and Nagios out of the water. There's no way you could fit this kind of data into either rrdtool or whatever the heck Zabbix uses to store it (MYSQL??!?!). It's also an "agentless" solution, which makes it well suited for the cloud.

Tcollector

Now you can get realtime metrics on how your varnish web accelerator is doing.  I wrote a tcollector plugin to slurp counters from varnishstat and send them to TSDB.

There's a pull request up to merge the collector into the tcollector repo, but in the meantime you can find the varnish collector script here.

The Code



#!/usr/bin/python

"""Send varnishstat counters to TSDB"""

import subprocess
import sys
import json
import time

from collectors.lib import utils

interval = 15 # seconds

# Prefixes here will be prepended to each metric name before being sent
metric_prefix = ['varnishstat']

# Add any additional tags you would like to include in this array as strings
#
# tags = ['production=false', 'cloud=amazon']
tags = []

# By default varnishstat returns about 300 metrics and not all of them are
# very useful.
#
# If you would like to collect all of the counters simply set vstats to "all"
#
# vstats = 'all'

# Some useful default values to send
vstats = [
  'client_conn',
  'client_drop',
  'client_req',
  'cache_hit',
  'cache_hitpass',
  'cache_miss'
]

def main():
  utils.drop_privileges()

  while True:
    try:
      if vstats == "all":
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-j"],
          stdout=subprocess.PIPE,
        )
      else:
        fields = ",".join(vstats)
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-f" + fields, "-j"],
          stdout=subprocess.PIPE,
        )
    except OSError, (errno, msg):
      # Die and signal to tcollector not to run this script.
      sys.stderr.write("Error: %s" % msg)
      sys.exit(13)

    metrics = ""
    for line in stats.stdout.readlines():
      metrics += line
    metrics = json.loads(metrics)

    # We'll use the timestamp provided by varnishstat for our metrics
    pattern = '%Y-%m-%dT%H:%M:%S'
    timestamp = int(time.mktime(time.strptime(metrics['timestamp'], pattern)))
    for k, v in metrics.iteritems():
      if k != 'timestamp':
        # Prepend any provided prefixes to each metric name
        metric_name = ".".join(metric_prefix) + "." + k
        print "%s %d %s %s" % \
          (metric_name, timestamp, v['value'], " ".join(tags))

    sys.stdout.flush()
    time.sleep(interval)

if __name__ == "__main__":
  sys.exit(main())
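If you want to try it before the pull request is merged, the script just gets dropped into tcollector's collectors/0 directory (where the long-running collectors live) and marked executable. Roughly, assuming your tcollector checkout is in /usr/local/tcollector:

# Paths here are assumptions -- adjust to wherever tcollector lives on your hosts
cp varnishstat.py /usr/local/tcollector/collectors/0/
chmod +x /usr/local/tcollector/collectors/0/varnishstat.py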

Saturday, August 3, 2013

Auto Install Logstash and Kibana on Ubuntu 12.04

What is Logstash?

Logstash is a log indexer built on top of elasticsearch. It aggregates logs from multiple sources and allows you to query them using the Apache Lucene query parser syntax. Because the data lives in elasticsearch, it scales easily, which matters with big data because you never really know how big your logs are going to get.

Now it's totally automated! Run this script and you'll be shipping and indexing logs in no time.

How it works

Logstash has two parts, the indexer and the server. The indexer works on a specific datasource to collect logs and ship them to the server. The indexer can also be something totally unrelated to Logstash (for example, rsyslogd). 
If you do use logstash to ship logs you can do interesting things, such as mutate them, add tags, or disregard them altogether.
Adding tags to certain types of logs allows you to quickly retrieve them and keep track of trending information.
The server keeps logs in a redis queue until they can be drained into elasticsearch. Neither redis nor elasticsearch has to run on the server itself, but both are required and both are installed here.
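To make the indexer/server split concrete, here's a bare-bones shipper-side sketch in logstash 1.x config syntax. The file path, redis host, and key below are placeholders -- the install script in this post writes its own configs:

# Tail syslog and push events into the server's redis queue
sudo tee /etc/logstash/shipper.conf >/dev/null <<'EOF'
input  { file  { path => "/var/log/syslog" type => "syslog" } }
output { redis { host => "logs.example.com" data_type => "list" key => "logstash" } }
EOF

Point the logstash agent at that file with -f and it will start shipping.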

The frontend

While not a direct part of the logstash project, Kibana works on top of logstash to give you visualization and monitoring tools. Kibana also gives you the flexibility to define patterns and filters and then watch the stream for these matches as they happen in realtime.
[screenshot: logstash running with varnishncsa]

Setup

The entire setup has been automated. Simply run:
$ sudo ./logstash_server
All of the logstash services (elasticsearch, logstash, and Kibana) will be listening on their default ports, except Kibana, which runs on port 80.
You may want to change the default data directory for Elasticsearch.
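For example, on a standard package install you could point path.data at a bigger disk and restart (the paths below are assumptions):

# Move elasticsearch's data directory off the root volume
echo 'path.data: /mnt/elasticsearch/data' | sudo tee -a /etc/elasticsearch/elasticsearch.yml
sudo service elasticsearch restart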
For more information or if you found a bug, please visit my github repo for this project here. I've tested the install on fresh installs of Ubuntu 12.04. In production I am indexing about 800 logs per second and it is handling it quite nicely.