Tuesday, September 3, 2013

Quickly get public IP of ec2 instance

Here are two ways to get the public IP or DNS name of an EC2 instance:

Ask a remote webserver

This is my preferred method because it's easy to remember. Simply run

curl eth0.me

And you'll be greeted with the IP address of the host that contacted the server.

If curl isn't installed, you can also use wget:

wget -q eth0.me -O-

Full disclosure: I run this website. My goal is to keep it simple and fast. No HTML, no newlines, nothing extra. Just an IPv4 address.
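
Since the response is nothing but a bare IPv4 address, it drops straight into shell scripts. A minimal sketch (the variable name is my own):

# Capture the public IP for use elsewhere in a script
PUBLIC_IP=$(curl -s eth0.me)
echo "this host appears to the internet as ${PUBLIC_IP}"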

Using EC2's metadata server
If you're on an EC2 instance you can also get this information by curling the metadata server accessible to every EC2 instance.

Public IPv4 address:

curl http://169.254.169.254/latest/meta-data/public-ipv4

Public hostname:

curl http://169.254.169.254/latest/meta-data/public-hostname
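
If you want both values in one shot, a quick loop over the two paths above works (the echo adds the newline the metadata service leaves off):

for path in public-ipv4 public-hostname; do
  curl -s http://169.254.169.254/latest/meta-data/${path}; echo
done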

Final thoughts

The benefit of using EC2's metadata server is you can also get the public hostname and it's guaranteed to work behind strange NAT or proxy rules. The drawback is it's harder to remember.

Running curl against eth0.me is fast and easy to remember, but keep in mind it returns the IP as seen by the webserver, and not necessarily the IP of the host making the request!

Saturday, August 10, 2013

Varnish collector plugin for OpenTSDB



Have you checked out OpenTSDB yet? It's pretty nifty. 

OpenTSDB is a time-series database built on top of the venerable HBase. It allows you to aggregate and crunch many thousands of time-series metrics and digest them into useful statistics and graphs.

But the best part is the tagging system, which lets you build dynamic and useful graphs on the fly. With every metric you send, you simply attach arbitrary tags like "datacenter=ec2 cluster=production05 branch=master". Later on you can pull these up to compare minute differences between systems.
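
For reference, a datapoint pushed over OpenTSDB's telnet-style interface is just one line: metric name, timestamp, value, then the tags (the values below are made up):

put varnishstat.cache_hit 1376150400 102345 datacenter=ec2 cluster=production05 branch=master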

This kind of monitoring blows "enterprise" solutions like Zabbix and Nagios out of the water. There's no way you could fit this kind of data into either rrdtool or whatever the heck Zabbix uses to store it (MYSQL??!?!). It's also an "agentless" solution, which makes it well suited for the cloud.

Tcollector

Now you can get realtime metrics on how your varnish web accelerator is doing. I wrote a tcollector plugin to slurp counters from varnishstat and send them to TSDB.

There's a pull request up to merge the collector into the tcollector repo, but in the meantime you can find the varnish collector script here.

The Code



#!/usr/bin/python

"""Send varnishstat counters to TSDB"""

import re
import subprocess
import sys
import json
import time

from collectors.lib import utils

interval = 15 # seconds

# Prefixes here will be prepended to each metric name before being sent
metric_prefix = ['varnishstat']

# Add any additional tags you would like to include in this array as strings
#
# tags = ['production=false', 'cloud=amazon']
tags = []

# By default varnishstat returns about 300 metrics and not all of them are
# very useful.
#
# If you would like to collect all of the counters simply set vstats to "all"
#
# vstats = 'all'

# Some useful default values to send
vstats = [
  'client_conn',
  'client_drop',
  'client_req',
  'cache_hit',
  'cache_hitpass',
  'cache_miss'
]

def main():
  utils.drop_privileges()

  while True:
    try:
      if vstats == "all":
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-j"],
          stdout=subprocess.PIPE,
        )
      else:
        fields = ",".join(vstats)
        stats = subprocess.Popen(
          ["varnishstat", "-1", "-f" + fields, "-j"],
          stdout=subprocess.PIPE,
        )
    except OSError, (errno, msg):
      # Die and signal to tcollector not to run this script.
      sys.stderr.write("Error: %s\n" % msg)
      sys.exit(13)

    metrics = ""
    for line in stats.stdout.readlines():
      metrics += line
    metrics = json.loads(metrics)

    # We'll use the timestamp provided by varnishstat for our metrics
    pattern = '%Y-%m-%dT%H:%M:%S'
    timestamp = int(time.mktime(time.strptime(metrics['timestamp'], pattern)))
    for k, v in metrics.iteritems():
      if k != 'timestamp':
        # Prepend any provided prefixes to each metric name
        metric_name = ".".join(metric_prefix) + "." + k
        print "%s %d %s %s" % \
          (metric_name, timestamp, v['value'], ",".join(tags))

    sys.stdout.flush()
    time.sleep(interval)

if __name__ == "__main__":
  sys.exit(main())
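
When run under tcollector, the script just prints one line per counter to stdout and tcollector relays them to TSDB. With the default empty tag list, the output looks something like this (values invented):

varnishstat.cache_hit 1376150400 102345
varnishstat.cache_miss 1376150400 8101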

Saturday, August 3, 2013

Auto Install Logstash and Kibana on Ubuntu 12.04

What is Logstash?

Logstash is a log indexer that stores and indexes its data in elasticsearch. It aggregates logs from multiple sources and lets you query them using the Apache Lucene query parser syntax.
Because elasticsearch does the heavy lifting, your data scales easily. This is an important factor when dealing with big data, because you never really know how big your logs are going to get.
Now it's totally automated! Run this script and you'll be shipping and indexing logs in no time.

How it works

Logstash has two parts: the indexer and the server. The indexer sits on a specific datasource, collects logs, and ships them to the server. The indexer can also be something totally unrelated to Logstash (rsyslogd, for example).
If you do use logstash to ship logs you can do interesting things, such as mutating them, adding tags, or disregarding them altogether.
Adding tags to certain types of logs lets you quickly retrieve them and keep track of trending information.
The server keeps logs in a redis queue until they can be drained into elasticsearch. Neither redis nor elasticsearch has to live on the server itself, but both are required, and this script installs them there.
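
You can watch the queue drain with redis-cli while logs are flowing. The list key depends on your logstash config; "logstash" is a common default and is assumed here:

$ redis-cli llen logstash
(integer) 4211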

The frontend

While not a direct part of the logstash project, Kibana works on top of logstash to give you visualization and monitoring tools. Kibana also gives you the flexibility to define patterns and filters and then watch the stream for these matches as they happen in realtime.
[Screenshot: logstash running with varnishncsa]

Setup

The entire setup has been automated. Simply run:
$ sudo ./logstash_server
All of the logstash services (elasticsearch, logstash, and Kibana) will listen on their default ports, except Kibana, which runs on port 80.
You may want to change the default data directory for Elasticsearch.
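A quick way to verify everything came up (elasticsearch answers on its default port 9200, Kibana on port 80):
$ curl -s http://localhost:9200/
$ curl -sI http://localhost/ | head -1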
For more information, or if you find a bug, please visit my github repo for this project here. I've tested the install on fresh installs of Ubuntu 12.04. In production I'm indexing about 800 logs per second and it's handling it quite nicely.

Monday, July 22, 2013

Converting EC2 pem file to ssh public key

EC2 supplies your private key as an OpenSSL "pem" file. Sometimes you just need the ssh public key version of this file.

There are a few ways to do this using openssl and other tools, but I prefer using ssh-keygen to create a public key from a private key.

ssh-keygen -y -f yourprivatekey.pem

This will print out the public key from any given private key. Works with EC2 pem files.
Just drop the output of this command into ~/.ssh/authorized_keys and you should be good to go!
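
Or, on the host you want to log into, do it all in one line:

ssh-keygen -y -f yourprivatekey.pem >> ~/.ssh/authorized_keys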

Wednesday, July 10, 2013

Using sources.list.d instead of endlessly appending to sources.list

There's a "new way" to using all of your custom repo lists, and that's to append it to a special fine in the  /etc/apt/sources.list.d/ directory. This helps keep things a tidy and organized.

Simply add any sources you would like to use to a separate file within this directory.
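
For example, to give the Percona repository its own file (double-check the mirror URL and distribution name against Percona's docs for your release):

$ echo "deb http://repo.percona.com/apt precise main" | sudo tee /etc/apt/sources.list.d/percona.list
$ sudo apt-get update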

Don't be afraid if you get this message:

N: Ignoring file 'percona' in directory '/etc/apt/sources.list.d/' as it has no filename extension

It just means you need to rename the file and append ".list" to the end of it.

Monday, June 17, 2013

Zabbix mysql blocked by bin_log errors

Zabbix never fails to find new and unusual ways to break. The most recent breakage occurred because the housekeeper runs "unsafe" SQL. You'll find this repeated a thousand times in mysql-error.log:

130617 17:30:35 [Note] The following warning was suppressed 50 times during the last 38 seconds in the error log
130617 17:30:35 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: delete from history_uint where itemid=2506343 limit 500

I just decided to disable the housekeeper. It drags database performance down anyway. In /etc/zabbix/zabbix_server.conf, uncomment the following line:

DisableHousekeeping=1

And then restart Zabbix. You should be good to go. You can also vote on my bug report here.
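
If you'd rather script the change, something like this should do it. Note the sed pattern assumes the line ships commented out, so eyeball your config first:

$ sudo sed -i 's/^#\s*DisableHousekeeping=.*/DisableHousekeeping=1/' /etc/zabbix/zabbix_server.conf
$ sudo service zabbix-server restart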

Friday, June 14, 2013

Setting up a firewall on Your Raspberry Pi

Raspberry Pi Firewall

You have two good options for protecting your Raspberry Pi with a software firewall. The first is the tried and true iptables. The second is much easier to use and configure: Debian's "ufw" service. I'll show you how to firewall your Raspberry Pi with ufw.

Before we start messing around with firewall rules, I always like to leave myself a backdoor. We're going to continually re-allow access from our local network. We'll open up a screen session and start a loop. When we're sure everything is good, we'll close our screen session.

You can learn more about the awesome program screen here.

$ sudo apt-get install -y screen
$ screen -S firewall
$ while true; do sudo ufw allow from 192.168.1.0/24; sleep 60; done
(disconnect from the screen session by typing "ctrl+a d")

Great, now we have a backdoor in case we lock ourselves out. Every 60 seconds our session will try to allow every address from 192.168.1.1-255 to access every port on the host, so you'll only ever be locked out for up to a minute. Trust me, you do not want to skip this step.

We can use ufw to open up different ports. Here's my basic setup.

# Allow port 22 to everyone in the world
sudo ufw allow 22

# Allow all ports on my local network
sudo ufw allow from 192.168.1.0/24

# Allow web ports to everyone
sudo ufw allow 80

sudo ufw --force enable

You can check the status:

$ ufw status
Status: active

To                         Action      From
--                         ------      ----
Anywhere                   ALLOW       192.168.0.0/24
Anywhere                   ALLOW       192.168.1.0/24
80                         ALLOW       Anywhere
22                         ALLOW       Anywhere
80                         ALLOW       Anywhere (v6)
22                         ALLOW       Anywhere (v6)

Now all of the Raspberry Pi's ports are open to our local network, while everyone else can only reach ports 22 and 80. If you're done making changes to the firewall and are positive you're not locked out, go ahead and kill the screen loop:

$ screen -r
(hit ctrl+c to stop the loop, then ctrl+d to close the session)

Now you've got every port but 22 and 80 locked down from the outside. But your Raspberry Pi probably isn't exposed to the public internet yet. For that to happen, we're going to add our Raspberry Pi to the DMZ on our wireless router's firewall.

A firewall DMZ means that every port will be forwarded to this specific host by default. This makes our Raspberry Pi the first point of entry into our home network. You can connect to it from anywhere, and even use your Raspberry Pi as an ssh tunnel.
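
For example, here's a quick SOCKS proxy through the Pi from anywhere (the hostname and user below are placeholders):

$ ssh -D 8080 -N pi@your.public.ip.address

Point your browser's SOCKS proxy at localhost:8080 and your traffic rides home through the Pi.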

You can usually find the dmz settings by logging into your router, which is typically found at 192.168.1.1 or 192.168.0.1.

[Screenshot: DMZ settings for the Tomato wireless firmware]

Now you can run some external port scans to make sure only the right ports are actually open. You can use inCloak's tool. Since we opened up every port to our local network, the scan has to come from outside the network.

Here's the scan on my network, which has my Raspberry Pi in the DMZ.

[Screenshot: external port scan results]
Success. It looks like ports 22 and 80 are open and everything else is closed off. Now you can "safely" expose your Raspberry Pi to the public internet.

Up next: Protecting Your Raspberry Pi With fail2ban and SSH Private Keys