

[{"source":"/resources/log/apache/alba/textimager.log","offset":6482657,"FileStateOS":{"inode":122945543,"device":30},"timestamp":"2016-11-03T11:36:36.686925326+01:00","ttl":-1000000000}]
curl -XPUT 'localhost:9200/_template/textimager?pretty' -d'
{
"template": "textimager*",
"settings": {
"number_of_shards": 1
},
"mappings" : {
"log" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"@version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"agent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"auth" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"beat" : {
"properties" : {
"hostname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"bytes" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"clientip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"geoip" : {
"properties" : {
"city_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"continent_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"coordinates" : {
"type" : "float"
},
"country_code2" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country_code3" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dma_code" : {
"type" : "long"
},
"ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"latitude" : {
"type" : "float"
},
"location" : {
"type" : "geo_point"
},
"longitude" : {
"type" : "float"
},
"postal_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"region_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"region_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timezone" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"host" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"httpversion" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"ident" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"input_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"offset" : {
"type" : "long"
},
"referrer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"request" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"response" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"tags" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}'

h2. ELK-Stack Version 5.0 on Lamaran

h3. Elasticsearch 5.0

Elasticsearch is a database optimized for search. In this stack it stores the log events and dynamically aggregates them, e.g. by time or location. After installing Elasticsearch (listening on localhost:9200), the following curl commands are useful when working with the ELK stack:


List all indices:

<pre>
curl -XGET 'localhost:9200/_aliases?pretty'
</pre>
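
Check that the node is up and healthy (standard cluster health API):

<pre>
curl -XGET 'localhost:9200/_cluster/health?pretty'
</pre>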


Delete the template "template_1":

<pre>
curl -XDELETE 'localhost:9200/_template/template_1?pretty'
</pre>
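
To delete an index rather than a template (here an example index named "index_1"):

<pre>
curl -XDELETE 'localhost:9200/index_1?pretty'
</pre>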


List all templates:

<pre>
curl -XGET 'localhost:9200/_template?pretty'
</pre>
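
As an example of the aggregation use case mentioned above, a query like the following (an illustrative sketch; it assumes the "textimager" index and mapping created further down this page) counts requests per day and lists the most frequent countries:

<pre>
curl -XPOST 'localhost:9200/textimager/_search?pretty' -d'
{
  "size": 0,
  "aggs": {
    "requests_per_day": {
      "date_histogram": { "field": "@timestamp", "interval": "day" }
    },
    "top_countries": {
      "terms": { "field": "geoip.country_name.keyword", "size": 10 }
    }
  }
}'
</pre>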





h3. Filebeat 5.0:

Filebeat tails log files, converts each line into a JSON object and forwards the result to a port or to another file. In @filebeat.yml@ you configure which log files Filebeat should read and where the result should be sent. In the configuration below, Filebeat reads the Apache logs under @/resources/log/apache/alba/@ and ships every line to port 5043:

<pre>
filebeat.prospectors:
- input_type: log
  paths:
    - /resources/log/apache/alba/*.log

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5043"]
</pre>



The Filebeat configuration file is located at @/etc/filebeat/filebeat.yml@.
Complete file content: https://gist.githubusercontent.com/endsub/88ca7101e6b01bebf97ff601deca5052/raw/0209a0b5b2fddf83006544d2f2e25029f645018c/filebeat.yml
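
Before (re)starting Filebeat it can be worth checking the configuration syntax. The @-configtest@ flag should be available in Filebeat 5.x (if it is not, the binary will simply complain about an unknown flag):

<pre>
/usr/share/filebeat/bin/filebeat -configtest -c /etc/filebeat/filebeat.yml
</pre>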

Each file processed by Filebeat has an entry in a registry, in which the current read offset is saved. The offset tells Filebeat where it previously stopped reading in the log file, so that a log line is not read twice, e.g. when Filebeat is restarted. This becomes a problem if you want to load the same log file more than once (e.g. to build two Elasticsearch indices from one log file). To reset the offsets, simply remove the registry:

<pre>rm /var/lib/filebeat/registry</pre>

After setting everything up, *start Filebeat in the foreground for easy debugging*:

<pre>/usr/share/filebeat/bin/filebeat -e -d "*" -c /etc/filebeat/filebeat.yml</pre>

To *start Filebeat as a service* :

<pre>
/etc/init.d/filebeat start
</pre>

This service is configured to log to @/var/log/filebeat/@

h3. Logstash 5.0: 

Now that Filebeat is running and sending log lines to port 5043, we configure Logstash to listen on this port. Logstash is a data collection engine and is used here as a parser: it turns the incoming data stream into Elasticsearch-compatible JSON objects with separate fields such as time, location, IP, etc. There are two different configuration files for Logstash:

1. Configuration of the Logstash pipeline (number of workers, batch sizes etc.). These settings affect the scalability of the stack; since we only deal with a few log files, they are not critical for our use case (an illustrative excerpt follows below). Location:

<pre>
/etc/logstash/logstash.yml
</pre>
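
For reference, the pipeline-related settings in this file look roughly like this (illustrative values, not necessarily the ones used on Lamaran):

<pre>
# /etc/logstash/logstash.yml (excerpt)
pipeline.workers: 2        # number of filter/output worker threads
pipeline.batch.size: 125   # events per worker batch
log.level: info            # set to debug for troubleshooting
path.logs: /var/log/logstash
</pre>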

2. Configuration of the input, the output and the parsing process. Create this config file at the following location:

<pre>
/opt/logstash/logger.conf
</pre>

Content: 

<pre>
input {
  # receive events from Filebeat
  beats {
    port => 5043
    codec => "plain"
  }
}

filter {
  # parse the Apache combined log format into separate fields
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  # use the request timestamp from the log line as the event timestamp
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

  # resolve the client IP to a geo location
  geoip {
    source => "clientip"
    target => "geoip"
    database => "/etc/logstash/GeoLite2-City.mmdb"
    # build [geoip][coordinates] as [longitude, latitude]
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }

  # the coordinates arrive as strings; convert them to float
  mutate {
    convert => [ "[geoip][coordinates]", "float" ]
  }
}

output {
  # index into Elasticsearch and echo every event to stdout for debugging
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "textimager"
  }
  stdout { codec => rubydebug }
}
</pre>

In the config file above, the output is the local Elasticsearch host (port 9200); the parsed events are stored in an index named "textimager".
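
Once events are flowing, the index should show up (with a growing document count) in:

<pre>
curl -XGET 'localhost:9200/_cat/indices?v'
</pre>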

To *make geo location identification work* as configured above, download the GeoLite2-City database to the system:

<pre>
cd /etc/logstash/
wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
gunzip GeoLite2-City.mmdb.gz
</pre>
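
Since the Logstash service runs as the @logstash@ user (see the Upstart config further down), the database file has to be readable by that user. A quick syntax check of the pipeline config is also possible; @-t@ (@--config.test_and_exit@) is part of Logstash 5.x:

<pre>
/usr/share/logstash/bin/logstash -t -f /opt/logstash/logger.conf --path.settings /etc/logstash
</pre>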

Kibana will not recognize the geo_point datatype unless an index template defines it. Create the template with the following curl command; it will then be applied to every Elasticsearch index whose name matches "textimager*":

{{collapse(curl command...)
<pre>
curl -XPUT 'localhost:9200/_template/textimager?pretty' -d'
{
"template": "textimager*",
"settings": {
"number_of_shards": 1
},
"mappings" : {
"log" : {
"properties" : {
"@timestamp" : {
"type" : "date"
},
"@version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"agent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"auth" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"beat" : {
"properties" : {
"hostname" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"version" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"bytes" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"clientip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"geoip" : {
"properties" : {
"city_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"continent_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"coordinates" : {
"type" : "float"
},
"country_code2" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country_code3" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"country_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"dma_code" : {
"type" : "long"
},
"ip" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"latitude" : {
"type" : "float"
},
"location" : {
"type" : "geo_point"
},
"longitude" : {
"type" : "float"
},
"postal_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"region_code" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"region_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timezone" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"host" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"httpversion" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"ident" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"input_type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"message" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"offset" : {
"type" : "long"
},
"referrer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"request" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"response" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"source" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"tags" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"timestamp" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"type" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"verb" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
}
}'
</pre>
}}
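
To verify that the template was installed:

<pre>
curl -XGET 'localhost:9200/_template/textimager?pretty'
</pre>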



*Start Logstash in the foreground*:

<pre>
/usr/share/logstash/bin/logstash --debug -f /opt/logstash/logger.conf
</pre>

*Start Logstash as a service*: 

Logstash on Ubuntu 14 uses Upstart as its service manager. First open @/etc/init/logstash.conf@ and make sure the Upstart configuration looks like the following:

<pre>
description     "logstash"
start on filesystem or runlevel [2345]
stop on runlevel [!2345]

respawn
umask 022
nice 19
chroot /
chdir /
#limit msgqueue <softlimit> <hardlimit>
#limit nice <softlimit> <hardlimit>
limit nofile 16384 16384
#limit rtprio <softlimit> <hardlimit>
#limit sigpending <softlimit> <hardlimit>
setuid logstash
setgid logstash
console log # log stdout/stderr to /var/log/upstart/


exec /usr/share/logstash/bin/logstash "--path.config" "/opt/logstash/logger.conf"  "--path.settings" "/etc/logstash"
</pre>
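
With the Upstart configuration in place, start the service:

<pre>
sudo initctl start logstash
</pre>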

"--path.config" is used to configure the directory, in which input, ouput and parsing settings for Logstash are fetched, in this case it's in @/opt/logstash/logger.conf@
"--path.settings" includes pipeline configurations (logstash.yml) an in our case is located in "/etc/logstash"

The Upstart service will log into : @/var/log/upstart/logstash.log@
Logstash itself should log into : @/usr/share/logstash/logs/logstash-plain.log@
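
To follow the service output while testing:

<pre>
tail -f /var/log/upstart/logstash.log /usr/share/logstash/logs/logstash-plain.log
</pre>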

h3. Kibana 5.0: 

After the Elasticsearch index has been created, it can be used in Kibana for visualization and exploration:

# On the Kibana interface, go to the Management menu
# Click on "Index pattern"
# Click "Add new" to add an Elasticsearch index to Kibana
# Enter the Elasticsearch index name and click "Create" to create the index pattern in Kibana
# Go to "Discover" and change the time range in the upper right corner from "Last 15 minutes" to a longer range to see more log entries, then start visualizing
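
If the index pattern does not show up or Discover stays empty, first confirm that documents actually reached the index (here the "textimager" index from above):

<pre>
curl -XGET 'localhost:9200/textimager/_count?pretty'
</pre>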

h3. Additional notes

Logstash directory layout (where configs, logs and plugin files live): https://www.elastic.co/guide/en/logstash/5.0/dir-layout.html

Logging of Logstash itself can be tuned in @/etc/logstash/logstash.yml@:

<pre>
log.level: debug
path.logs: /var/log/logstash
</pre>

On older installations Logstash is started with @/etc/init.d/logstash start@ instead of Upstart.

When Filebeat runs in the foreground from @/usr/share/filebeat/bin@, its registry lives in @/usr/share/filebeat/bin/data/registry@; when it runs as a service, the registry is @/var/lib/filebeat/registry@, the configuration is read from @/etc/filebeat/filebeat.yml@ and logs go to @/var/log/filebeat/@. Remove the respective registry file to make Filebeat re-read a log file.

The templates of the production Elasticsearch on the university server can be listed at: logger.hucompute.org/elasticsearch/_template?pretty

Delete an existing index named "logger":

<pre>
curl -XDELETE 'http://localhost:9200/logger/'
</pre>