Real-Time Log Analysis, Part 4: Filebeat

Filebeat is a log file shipper. After you install it on a server, Filebeat monitors log directories or specified log files, tails them (watching for changes and reading continuously), and forwards the lines to Elasticsearch, Logstash, File, Kafka, Redis, or the Console.
Official documentation: https://www.elastic.co/guide/en/beats/filebeat/current/index.html
Chinese documentation: https://kibana.logstash.es/content/beats/

Below we use Filebeat to collect the access log generated by OpenResty and ship it to the Kafka cluster.

Environment

Installation

sudo rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

sudo vim /etc/yum.repos.d/filebeat.repo and add the following:

[filebeat-5.x]
name=Elastic repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
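
If you want to confirm that yum can now see the Elastic repository before installing, listing the enabled repos should show the filebeat-5.x entry defined above (output varies by system):

sudo yum repolist enabled | grep filebeat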

Install the package:

sudo yum install filebeat

Enable Filebeat to start on boot:

sudo systemctl enable filebeat
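
As an optional sanity check, you can verify the installed binary and that the unit is registered; the -version flag should be accepted by the 5.x binary, and the path assumes the default rpm layout listed below:

/usr/share/filebeat/bin/filebeat -version
systemctl is-enabled filebeat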

  • Installed file layout:
    Config directory: /etc/filebeat/
    Config file: /etc/filebeat/filebeat.yml
    Log directory: /var/log/filebeat/
    bin directory: /usr/share/filebeat/bin
  • Default Filebeat output fields (a sample event is sketched after this list):
    beat.hostname: hostname of the machine the beat runs on
    beat.name: the name set in the shipper section; if not set, it equals beat.hostname
    @timestamp: the time the line was read
    type: the value set via document_type
    input_type: whether the event came from "log" or "stdin"
    source: the full path of the source file
    offset: the offset of this line within the file
    message: the log line content
    fields: any extra fixed fields you add (via the fields setting in the config) are stored in this object
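
For illustration only, a raw event built from these defaults might look roughly like the following; the hostname, offset, and timestamp are made-up values, and the message is shortened:

{
  "@timestamp": "2017-09-06T02:18:11.419Z",
  "beat": { "hostname": "web01", "name": "web01" },
  "input_type": "log",
  "source": "/var/log/nginx/we_click_access.log",
  "offset": 12345,
  "type": "we_click_access",
  "message": "192.168.2.182 - - [01/Sep/2017:07:53:30 +0000] \"GET /v1/ad/click/... HTTP/1.1\" 200 43 ..."
}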

Configuration

sudo vim /etc/hosts and add the following entries (the hosts of the Kafka cluster; see the earlier article Real-Time Log Analysis, Part 3: Kafka):

192.168.2.230 kafka01
192.168.2.231 kafka02
192.168.2.232 kafka03
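
If you want to make sure this machine can actually reach the brokers before going further, a quick check might look like this (nc is assumed to be available; install netcat/nmap-ncat if it is not):

ping -c 1 kafka01
nc -zv kafka01 9092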

sudo vim /etc/filebeat/filebeat.yml and change the configuration as follows:

#=========================== Filebeat prospectors =============================
filebeat.prospectors:
- input_type: log
  enabled: true
  paths:
    - /var/log/nginx/we_click_access.log
  document_type: we_click_access
  encoding: plain
  ignore_older: 0
  tail_files: true

#================================ Processors ===================================
processors:
# Drop some default fields we do not need (@timestamp and type cannot be dropped)
- drop_fields:
    fields: ["beat", "input_type", "source", "offset"]

#================================ Outputs ======================================
output.kafka:
  enabled: true
  # The list of Kafka broker addresses from where to fetch the cluster metadata.
  # The cluster metadata contain the actual Kafka brokers events are published
  # to.
  hosts: ["kafka01:9092", "kafka02:9092", "kafka03:9092"]
  # Kafka version filebeat is assumed to run against. Defaults to the oldest
  # supported stable version (currently version 0.8.2.0)
  # Valid values are 0.8.2.0, 0.8.2.1, 0.8.2.2, 0.8.2, 0.8, 0.9.0.0, 0.9.0.1, 0.9.0, 0.9, 0.10.0.0, 0.10.0, 0.10
  version: "0.10"
  # The Kafka topic used for produced events. The setting can be a format string
  # using any event field. To set the topic from document type use `%{[type]}`.
  topic: '%{[type]}'
  partition.round_robin:
    reachable_only: true
  # The number of concurrent load-balanced Kafka output workers (default 1)
  worker: 2
  required_acks: 1
  compression: gzip
  max_message_bytes: 10000000 # 10MB

#================================ Logging ======================================
# Available log levels are: critical, error, warning, info, debug
logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  rotateeverybytes: 52428800 # 50MB
  keepfiles: 5
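
Before starting the service, you can ask Filebeat to validate this file; the 5.x binary accepts a -configtest flag for this (replaced by `filebeat test config` in later releases):

sudo /usr/share/filebeat/bin/filebeat -configtest -c /etc/filebeat/filebeat.yml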

Create the corresponding topic on the Kafka cluster (see the earlier article Real-Time Log Analysis, Part 3: Kafka):

bin/kafka-topics.sh --create --zookeeper kafka01:2181 --replication-factor 1 --partitions 3 --topic we_click_access
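
To verify that the topic was created with the expected partitions and replication factor, describe it (run from the Kafka installation directory, like the command above):

bin/kafka-topics.sh --describe --zookeeper kafka01:2181 --topic we_click_access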

Start

sudo service filebeat start
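
With the logging section configured above (path /var/log/filebeat, name filebeat), you can follow Filebeat's own log to confirm that it started and is publishing to Kafka; the file name may gain a numeric suffix as it rotates:

sudo tail -f /var/log/filebeat/filebeat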

Testing

  • If you followed the earlier article Real-Time Log Analysis, Part 2: OpenResty and have OpenResty running,
    visit "http://192.168.2.236/v1/ad/click/?pubid=123&campid=456&gaid=abcdefg"
    and you will see a log line similar to the following in /var/log/nginx/we_click_access.log:

    192.168.2.182 - - [01/Sep/2017:07:53:30 +0000] "GET /v1/ad/click/?pubid=123&campid=456&gaid=abcdefg HTTP/1.1" 200 43 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "-" "3fe63613-83b1-474b-bd29-2d21d8709980:,12345:,45678:,XY:,0.252:,0.252:,2:,sub1::sub2,::,sub3"
  • If you did not follow the earlier articles and install OpenResty, you can write test data into the log file by hand:

    echo '192.168.2.182 - - [01/Sep/2017:07:53:30 +0000] "GET /v1/ad/click/?pubid=123&campid=456&gaid=abcdefg HTTP/1.1" 200 43 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36" "-" "3fe63613-83b1-474b-bd29-2d21d8709980:,12345:,45678:,XY:,0.252:,0.252:,2:,sub1::sub2,::,sub3"' >>/var/log/nginx/we_click_access.log
  • Use the console consumer to read the data written to Kafka above and verify that it was written correctly:

    bin/kafka-console-consumer.sh --zookeeper kafka01:2181 --topic we_click_access --from-beginning
  • If everything is working, you should see data like the following:

    {"@timestamp":"2017-09-06T02:18:11.419Z","message":"192.168.2.182 - - [01/Sep/2017:07:53:30 +0000] \"GET /v1/ad/click/?pubid=123\u0026campid=456\u0026gaid=abcdefg HTTP/1.1\" 200 43 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36\" \"-\" \"3fe63613-83b1-474b-bd29-2d21d8709980:,12345:,45678:,XY:,0.252:,0.252:,2:,sub1::sub2,::,sub3\"","type":"we_click_access"}

Next we will set up an Elasticsearch cluster, then use Logstash to consume the data above from Kafka and output it to Elasticsearch, and finally set up Grafana for real-time visualization.

