There is a reason why we compared Elasticsearch with Apache Hadoop. Here is how to install and configure Elasticsearch with Apache Hadoop, Flume and Kibana. We have also provided links to the official configuration references. Before running the commands, we suggest reading the text under the next sub-header.
README To Install, Configure Elasticsearch with Apache Hadoop
Previously, we published a few important guides. Even if you have been using Apache Hadoop for a few months, it will not harm to read the guides listed below; they contain more explanatory text than mere installation commands.
- Installing Apache Hadoop
- Installing Apache Spark
- Installing Fluentd
- Installing Elastic Stack / ELK Stack
How to Install, Configure Elasticsearch with Apache Hadoop
The minimum requirement is HDP 2.0, or the HDP Sandbox for HDP 2.0, on CentOS. In other words, follow the Hadoop installation guide, which is number one in the list above. First, we need to set up Java and PATH:
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
java -version
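These exports only last for the current shell. A minimal sketch to make them persistent, assuming a CentOS-style /etc/profile.d (adjust the path if your JVM lives elsewhere):

cat > /etc/profile.d/java.sh <<'EOF'
# Point JAVA_HOME at the default JVM and put its binaries on PATH
export JAVA_HOME=/usr/java/default
export PATH=$JAVA_HOME/bin:$PATH
EOF
source /etc/profile.d/java.sh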
Next, we need to install Flume:
yum install flume-agent flume
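To confirm the install went through, you can ask Flume for its version (this assumes the flume-ng launcher ends up on your PATH, which HDP's packaging normally arranges):

flume-ng version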
Next, we need to install Elasticsearch, which is number four in the list of guides above. We need to open and modify the file elasticsearch.yml:
nano /etc/elasticsearch/elasticsearch.yml
These are the settings you need to modify:
cluster.name: "logsearch"
node.name: "node1"
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 1
path.data: /data1,/data2,/data3,/data4
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
Logs are written to /var/log/elasticsearch. You can cd there, open the logs and adjust things later. We can control Elasticsearch with the usual service commands:
/etc/init.d/elasticsearch start
/etc/init.d/elasticsearch status
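To verify the node actually came up with the settings above, query the HTTP API; the root endpoint returns node and version information, and the cluster health endpoint should report a cluster named logsearch (assuming the default HTTP port 9200):

curl http://localhost:9200/
curl 'http://localhost:9200/_cluster/health?pretty=true'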
Next, we need to install Kibana, which is also covered in guide number four above. Update the logstash index pattern to the pattern the Flume Elasticsearch sink writes: in app/dashboards/logstash.json, the entries [logstash-]YYYY.MM.DD need their date parts separated by dashes instead, i.e. [logstash-]YYYY-MM-DD.
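You can make that edit with a one-liner; a sketch assuming Kibana is unpacked under /usr/share/kibana (adjust the path to wherever you installed it):

sed -i 's/\[logstash-\]YYYY\.MM\.DD/[logstash-]YYYY-MM-DD/g' /usr/share/kibana/app/dashboards/logstash.json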
Now, we need to work with Flume:
mkdir /usr/lib/flume/plugins.d
cp $elasticsearch_home/lib/elasticsearch-0.90*jar /usr/lib/flume/plugins.d
cp $elasticsearch_home/lib/lucene-core-*jar /usr/lib/flume/plugins.d
We can update the Flume configuration to tail a local file and index its contents into Elasticsearch in logstash format. In real deployments, the Flume Log4j Appender, Syslog TCP Source, Flume Client SDK or Spool Directory Source are used instead. The configuration below tails a local file; you must understand what you are doing, because this is only for testing.
# Source: tail a local file through the exec source into a memory channel
agent.sources = tail
agent.channels = memoryChannel
agent.channels.memoryChannel.type = memory
agent.sources.tail.channels = memoryChannel
agent.sources.tail.type = exec
agent.sources.tail.command = tail -F /tmp/es_log.log
# Interceptors: i1 extracts source/type/src_path headers from each line,
# i2 stamps the event time, i3 adds the host header
agent.sources.tail.interceptors = i1 i2 i3
agent.sources.tail.interceptors.i1.type = regex_extractor
agent.sources.tail.interceptors.i1.regex = (\w.*):(\w.*):(\w.*)\s
agent.sources.tail.interceptors.i1.serializers = s1 s2 s3
agent.sources.tail.interceptors.i1.serializers.s1.name = source
agent.sources.tail.interceptors.i1.serializers.s2.name = type
agent.sources.tail.interceptors.i1.serializers.s3.name = src_path
agent.sources.tail.interceptors.i2.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
agent.sources.tail.interceptors.i3.type = org.apache.flume.interceptor.HostInterceptor$Builder
agent.sources.tail.interceptors.i3.hostHeader = host
# Sink: index events into Elasticsearch in logstash format
agent.sinks = elasticsearch
agent.sinks.elasticsearch.channel = memoryChannel
agent.sinks.elasticsearch.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
agent.sinks.elasticsearch.batchSize = 100
agent.sinks.elasticsearch.hostNames = your.IP.here:9300
agent.sinks.elasticsearch.indexName = logstash
agent.sinks.elasticsearch.clusterName = logsearch
agent.sinks.elasticsearch.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
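Save this as your agent configuration; on HDP it commonly lives at /etc/flume/conf/flume.conf, though the exact path depends on the packaging, so treat it as an assumption. Since the properties above name the agent agent, pass the same name on the command line. Once the steps below are done, you can run the agent in the foreground for testing:

flume-ng agent --conf /etc/flume/conf --conf-file /etc/flume/conf/flume.conf --name agent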
Now, create the test log file:
touch /tmp/es_log.log
Open it:
nano /tmp/es_log.log
Populate it with entries like these:
website:weblog:login_page weblog data1
website:weblog:profile_page weblog data2
website:weblog:transaction_page weblog data3
website:weblog:docs_page weblog data4
syslog:syslog:sysloggroup syslog data1
syslog:syslog:sysloggroup syslog data2
syslog:syslog:sysloggroup syslog data3
syslog:syslog:sysloggroup syslog data4
Restart Flume:
/etc/init.d/flume-agent restart
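Once events start flowing, the sink should create dated indices from the logstash index name. A quick way to check that the tailed lines arrived is a search across the index pattern (the exact index names depend on the sink's date formatting, hence the wildcard):

curl 'http://localhost:9200/logstash-*/_search?q=*&pretty=true'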
Frankly, this is a typical test setup. For case-specific setup and usage, you need to read:
https://github.com/elastic/elasticsearch-hadoop
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/reference.html