What is Logstash?
Logstash is a log indexer built on top of Elasticsearch. It aggregates logs from multiple sources and lets you query them using the Apache Lucene query parser syntax.
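As a rough illustration of what such a query can look like, the sketch below builds a Lucene query string and wraps it in an Elasticsearch URI search URL. The field names (tags, message), the host, the logstash-* index pattern, and the helper name build_search_url are assumptions made for illustration; the actual field names depend on your Logstash version and configuration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, lucene_query, size=10):
    """Return an Elasticsearch URI-search URL for a Lucene query string."""
    # urlencode percent-escapes the query so it is safe to put in a URL.
    params = urlencode({"q": lucene_query, "size": size})
    return f"{base_url}/{index}/_search?{params}"


# Hypothetical field names: events tagged "nginx" whose message mentions "error".
query = 'tags:"nginx" AND message:error'
url = build_search_url("http://localhost:9200", "logstash-*", query)
print(url)
```

The same query string can be typed into the Kibana search box without the URL encoding.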
Because Logstash stores its data in Elasticsearch, it scales easily. That matters when dealing with big data, because you never really know how large your logs are going to get.
The installation is now fully automated! Run this script and you'll be shipping and indexing logs in no time.
How it works
Logstash has two parts: the shipper and the server. A shipper runs against a specific data source, collects its logs, and ships them to the server. The shipper can also be something entirely unrelated to Logstash (for example, rsyslogd).
If you do use Logstash to ship logs, you can do interesting things with them, such as mutating them, adding tags, or discarding them altogether.
Adding tags to certain types of logs allows you to quickly retrieve them and keep track of trending information.
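To make the mutate, tag, and discard steps concrete, here is a minimal Python sketch of the idea. It is not Logstash's own filter configuration; the function name process, the event layout, and the matching rules ("healthcheck", "ERROR") are all hypothetical.

```python
def process(event):
    """Return the (possibly modified) event, or None to discard it."""
    message = event.get("message", "")

    # Discard: drop noisy events altogether.
    if "healthcheck" in message:
        return None

    # Work on a copy so the caller's event is left untouched.
    result = dict(event, tags=list(event.get("tags", [])))

    # Mutate: strip surrounding whitespace from the message.
    result["message"] = message.strip()

    # Tag: mark error lines so they are quick to retrieve later.
    if "ERROR" in message:
        result["tags"].append("error")
    return result


events = [
    {"message": "  ERROR disk full on /var  "},
    {"message": "GET /healthcheck 200"},
    {"message": "user login ok", "tags": ["auth"]},
]
kept = [e for e in map(process, events) if e is not None]
for event in kept:
    print(event)
```

A tag added this way becomes a field you can search on, which is what makes tagged logs fast to pull up later.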
The server keeps logs in a Redis queue until they can be drained into Elasticsearch. Neither Redis nor Elasticsearch has to run on the same machine as the server, but both are required, and this script installs them there.
While not a direct part of the Logstash project, Kibana works on top of Logstash to give you visualization and monitoring tools. Kibana also lets you define patterns and filters and then watch the stream for matches as they happen in real time.
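The buffering behavior can be sketched in a few lines of Python. This is an in-memory stand-in, not real Redis or Elasticsearch client code: the class BufferedPipeline, its method names, and the store_available flag are hypothetical, with a deque playing the role of the queue and a list playing the role of the index.

```python
from collections import deque


class BufferedPipeline:
    """Toy model of the queue that sits between log shipping and indexing."""

    def __init__(self):
        self.queue = deque()  # stand-in for the Redis queue
        self.index = []       # stand-in for Elasticsearch

    def ship(self, event):
        """Accept an incoming log event; this never waits on the index."""
        self.queue.append(event)

    def drain(self, store_available=True, batch_size=100):
        """Move up to batch_size queued events into the index, oldest first."""
        if not store_available:
            return 0  # events simply stay queued until the store is back
        moved = 0
        while self.queue and moved < batch_size:
            self.index.append(self.queue.popleft())
            moved += 1
        return moved


pipeline = BufferedPipeline()
for line in ["first", "second", "third"]:
    pipeline.ship(line)

stalled = pipeline.drain(store_available=False)  # index is down: nothing moves
drained = pipeline.drain()                       # index is back: queue empties
print(stalled, drained, pipeline.index)
```

The point of the queue is that incoming logs are accepted even while indexing is slow or unavailable, and nothing is lost as long as the queue itself survives.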
The entire setup has been automated. Simply run:
$ sudo ./logstash_server
All of the Logstash services (Elasticsearch, Logstash, and Kibana) listen on their default ports, except Kibana, which runs on port 80.
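After the script finishes, a quick way to confirm that the services came up is to try a TCP connection to each port. The sketch below assumes everything runs on localhost and uses 9200 (the Elasticsearch HTTP default), 6379 (the Redis default), and 80 for Kibana as stated above; the helper name is_listening is hypothetical, and the ports should be adjusted if your setup differs.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Assumed ports; change these to match your installation.
services = {"Elasticsearch": 9200, "Redis": 6379, "Kibana": 80}
for name, port in services.items():
    state = "up" if is_listening("127.0.0.1", port) else "not reachable"
    print(f"{name} (port {port}): {state}")
```

A successful connection only shows that something is listening on the port, not that the service behind it is healthy.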
You may want to change the default data directory for Elasticsearch (the path.data setting in elasticsearch.yml), since that is where the indexed logs accumulate.
For more information, or if you find a bug, please visit my GitHub repo for this project here. I've tested the script on fresh installs of Ubuntu 12.04. In production I am indexing about 800 logs per second, and it is handling the load quite nicely.