Our previous two guides covered how to install Apache Hadoop and how to install Apache Spark. The next technical step is to get data from a source such as server log files. Here is how to install the fluentd agent on Ubuntu 16.04 as an intermediate step of log data collection for Hadoop. We can receive the data on a laptop. This step is performed on the servers you want to monitor, not on other servers.
Install fluentd Agent : Preface For Log Data Collection For Hadoop
Here is the official website of fluentd :
    http://www.fluentd.org
It is beautiful software written in Ruby. Fluentd is a Big Data tool that can work with unstructured data in real time. Like Kafka, another tool in this space, it handles event logs, application logs, and clickstreams; its job is to collect and route them rather than to analyse them. It can output the collected data over HTTPS, with settings to secure the transport. It is better to get used to such tools before jumping into analysing data with Apache Hadoop, Spark or plain ElasticSearch. For higher security, do not use the public IP of the web server associated with the domain in question, except for debugging, initial setup etc.
---
The above illustration shows our plan of tutorials. In the future we will stream this data to :
- A local computer like a MacBook Pro
- A server running a Big Data specific tool like Apache Hadoop or Spark, alone or in combination.
- A server running primarily basic analysis tools like ElasticSearch, or rather the ELK Stack.
- A server running a Big Data specific tool like Apache Hadoop or Spark, alone or in combination, together with ElasticSearch and Kibana.
This way of learning will help you never fumble with the question “will I use ElasticSearch or Apache Hadoop?”. The Elastic analytics stack is gaining popularity for various reasons. ElasticSearch uses a JSON-based query language, which is much easier to master than Hadoop’s MapReduce. Developers are also more comfortable maintaining an ElasticSearch instance than Hadoop. Of course, this architecture can lose data on the ElasticSearch side. In such a case you need Treasure Data in front of ElasticSearch to “buffer” it. We are simply using fluentd to maintain backward and forward compatibility without disturbing the main server too much. Where there’s a shell, there is a way. The way is for the hackers too.
Steps To Install fluentd Agent on Ubuntu 16.04
It is recommended to set up ntpd on the server being logged, to prevent invalid timestamps in logs. The maximum number of file descriptors also needs to be increased. First check the current limits, run :
    su - root
    ulimit -n
    ulimit -Hn
    ulimit -Sn
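Expect a value like 65535 against ulimit -n. If you get a value like 1024, the limit is too low. The system-wide limit lives in /proc/sys/fs/file-max, which is a kernel interface to read, not a file to edit with nano :
    cat /proc/sys/fs/file-max
To raise it on the running system, run :
    sysctl -w fs.file-max=100000
This change is lost on reboot. To make it permanent, open :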
    nano /etc/sysctl.conf
add/modify :
    fs.file-max = 100000
Save the file and exit. The setting is applied at the next reboot, or immediately with sysctl -p. Next, open :
    nano /etc/security/limits.conf
add/modify :
    root soft nofile 65536
    root hard nofile 65536
Save the file and exit. The new limits apply from the next login session. Next, open :
    nano /etc/sysctl.conf
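add/modify (net.ipv4.tcp_tw_recycle exists on the stock 4.4 kernel of Ubuntu 16.04, but it was removed in Linux 4.12 and is known to break clients behind NAT, so leave that line out on newer kernels) :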
    net.ipv4.tcp_tw_recycle = 1
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 10240 65535
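Once these settings are saved and the server has been rebooted, a short script along the following lines can confirm that the file descriptor limits took effect. This is only a sketch: `MIN_NOFILE` and `check_limit` are illustrative names made up for this example, and the thresholds simply mirror the values used above.

```shell
#!/usr/bin/env bash
# Sketch only: MIN_NOFILE and check_limit are illustrative names,
# not part of td-agent or Ubuntu.
MIN_NOFILE=65535

# check_limit NAME VALUE MIN
# Prints "OK ..." and returns 0 when VALUE is "unlimited" or >= MIN,
# otherwise prints "LOW ..." and returns 1.
check_limit() {
  local name="$1" value="$2" min="$3"
  if [ "$value" = "unlimited" ]; then
    echo "OK   $name=$value"
    return 0
  fi
  if [ "$value" -ge "$min" ] 2>/dev/null; then
    echo "OK   $name=$value"
    return 0
  fi
  echo "LOW  $name=$value (want >= $min)"
  return 1
}

# Per-process limits of the current shell (soft, then hard).
check_limit "soft nofile" "$(ulimit -Sn)" "$MIN_NOFILE" || true
check_limit "hard nofile" "$(ulimit -Hn)" "$MIN_NOFILE" || true

# System-wide limit; the /proc file is present on Linux only.
if [ -r /proc/sys/fs/file-max ]; then
  check_limit "fs.file-max" "$(cat /proc/sys/fs/file-max)" 100000 || true
fi
```

Run it as the user whose limits matter (root in this guide). Any line starting with LOW points at a setting that did not apply.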
Save the file and exit, then reboot so that all of the settings take effect. For Ubuntu 16.04, run this as root :
    cd ~
    # forget any cached sudo credentials; the commands below need root
    sudo -k
    # add the Treasure Data package signing key and repository
    curl https://packages.treasuredata.com/GPG-KEY-td-agent | apt-key add -
    echo "deb http://packages.treasuredata.com/2/ubuntu/xenial/ xenial contrib" > /etc/apt/sources.list.d/treasure-data.list
    apt-get update
    apt-get install -y td-agent
    # install the plugins with td-agent's bundled gem command
    /opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-elasticsearch
    /opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-record-modifier
    /opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-secure-forward
    # append td-agent to the adm group (-a keeps its existing groups) so it can read /var/log
    usermod -aG adm td-agent
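Since a missing plugin only shows up later as an error when td-agent starts, it can help to confirm that all three plugins are really installed. The following is a minimal sketch, assuming the usual `gem list` output format of `name (version)`; `REQUIRED_PLUGINS` and `missing_plugins` are hypothetical names used only for this example.

```shell
#!/usr/bin/env bash
# Sketch only: REQUIRED_PLUGINS and missing_plugins are illustrative names.
REQUIRED_PLUGINS="fluent-plugin-elasticsearch fluent-plugin-record-modifier fluent-plugin-secure-forward"

# missing_plugins
# Reads "gem list" style output ("name (version)") on stdin and
# prints every required plugin that does not appear in it.
missing_plugins() {
  local listing name
  listing="$(cat)"
  for name in $REQUIRED_PLUGINS; do
    printf '%s\n' "$listing" | grep -q "^$name " || echo "$name"
  done
}

# Usage on the server (prints nothing when all plugins are present):
#   /opt/td-agent/embedded/bin/fluent-gem list | missing_plugins
```

The function reads from stdin so it can be tried with saved or hand-written output before pointing it at the real fluent-gem command.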
You can control the software with :
    /etc/init.d/td-agent start
    /etc/init.d/td-agent stop
    /etc/init.d/td-agent restart
    /etc/init.d/td-agent status
Carefully run this command after changing the IP and port to match your setup :
    nc -vzw1 log-ip 24224
log-ip is your log server IP and 24224 is your port.
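If the log server is still booting or a firewall rule was just changed, a single nc call may fail for reasons that clear up a few seconds later. The check above can be wrapped in a small retry loop, sketched here. `wait_for_port` is a hypothetical helper name, and the attempt count and delay are arbitrary defaults chosen for this example.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_port is an illustrative helper, not a td-agent tool.

# wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
# Tries "nc -z -w1 HOST PORT" up to ATTEMPTS times, sleeping DELAY_SECONDS
# between tries. Returns 0 as soon as the port accepts a TCP connection,
# 1 if every attempt fails.
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-5}" delay="${4:-2}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if nc -z -w1 "$host" "$port" 2>/dev/null; then
      echo "reachable: $host:$port (attempt $i)"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "unreachable: $host:$port after $attempts attempts"
  return 1
}

# Usage, with log-ip replaced by your log server IP:
#   wait_for_port log-ip 24224
```

A return code of 1 only says that no TCP connection could be opened. It does not tell whether the cause is the firewall, the IP, or a fluentd server that is not running.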
open :
    nano /etc/td-agent/td-agent.conf
add/modify :
    ###############################################
    ##
    ## Forwarder
    ## Section 1 :: Sending Logs
    ###############################################

    ## Forward logs to fluentd server
    ## (plain forward output; the secure_forward plugin is needed for an encrypted transport)
    <match **>
      type forward
      send_timeout 10s
      recover_wait 5s
      heartbeat_interval 1s
      phi_threshold 16
      hard_timeout 60s

      <server>
        host log-ip
        port 24224
        weight 20
      </server>
    </match>

    ## Section 2 :: Reading Logs
    ###############################################
    ## tail nginx logs
    ## add tag nginx.access
    #
    <source>
      type tail
      format nginx
      path /var/log/nginx/access.log
      tag nginx.access
      # Select a file to store offset position
      pos_file /tmp/nginx-access-td-agent
    </source>

    ################################################
    ## tail nginx logs
    ## add tag nginx.error
    <source>
      type tail
      format /^(?<time>[^ ]+ [^ ]+) \[(?<log_level>.*)\] (?<pid>\d*).(?<tid>[^:]*): (?<message>.*)$/
      path /var/log/nginx/error.log
      tag nginx.error
      # Select a file to store offset position
      pos_file /tmp/nginx-error-td-agent
      time_format %Y/%m/%d %H:%M:%S
    </source>

    ################################################
    ## tail syslog
    ## add tag syslog
    <source>
      type tail
      format syslog
      path /var/log/syslog
      tag syslog
      pos_file /tmp/syslog-td-agent.tmp
    </source>
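An unclosed section such as a `<source>` without its `</source>` is an easy mistake in a file like this. Before restarting the agent, a rough sanity check can count opening and closing directives. The sketch below is only that: `check_conf_balance` is a hypothetical helper name, it compares counts rather than checking nesting, and it is no substitute for fluentd's own parser (fluentd's --dry-run option, where available, is the more thorough check).

```shell
#!/usr/bin/env bash
# Sketch only: check_conf_balance is an illustrative helper, not a td-agent tool.

# check_conf_balance FILE
# Counts lines that open a directive ("<source>", "<match **>") and lines
# that close one ("</source>"). Comment lines start with "#" and are not
# counted. Returns 0 when both counts are equal, 1 otherwise.
check_conf_balance() {
  local conf="$1" opens closes
  opens=$(grep -cE '^[[:space:]]*<[A-Za-z]' "$conf" || true)
  closes=$(grep -cE '^[[:space:]]*</[A-Za-z]' "$conf" || true)
  if [ "$opens" -eq "$closes" ]; then
    echo "balanced: $opens opening, $closes closing"
    return 0
  fi
  echo "unbalanced: $opens opening, $closes closing"
  return 1
}

# Usage:
#   check_conf_balance /etc/td-agent/td-agent.conf
```

Equal counts do not prove the file is valid; unequal counts do mean something is wrong.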
Run :
    /etc/init.d/td-agent restart
    /etc/init.d/td-agent status
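You can run cURL to POST a test event. This needs an http source, which the default td-agent.conf defines on port 8888; port 24224 speaks the forward protocol, not HTTP :
    curl -X POST -d 'json={"json":"message"}' http://localhost:8888/debug.test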
There is more documentation on the port part :
    http://docs.fluentd.org/articles/out_secure_forward
Here is plugin management :
    http://docs.fluentd.org/articles/plugin-management
fluentd has some of the worst documentation on this earth. I ran this command many steps earlier :
    /opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-secure-forward
otherwise you would get an error when using the secure forward plugin. They actually have a lot of stuff, including a web UI :
    https://github.com/fluent