Apache Tika is a content analysis framework. Think of it as the web equivalent of right-clicking a file on the desktop and choosing Properties, with the addition that it can also detect the content type. Apache Tika detects and extracts metadata and text from many different file types, and it can identify more than 1400 of them. Tika is related to the Apache Nutch codebase, and a Python port is available too. On a server, Tika can be deployed in several ways to integrate with various blogging platforms and CMSs (including WordPress). Here is how to install Apache Tika on an Ubuntu server. Tika is easy and light to install, and it can even be tested on Windows with Ubuntu bash running. Tika also has a GUI when run from a desktop operating system.
How to Install Apache Tika on Ubuntu Server
We need a recent Java Runtime Environment (JRE) and Maven. Because Tika is built from source here, the Java Development Kit (JDK) is also needed to compile it. First, update the package lists and upgrade the installed packages with apt:
```
apt update && apt upgrade
```
Installing Java
Install the Java Runtime Environment (JRE) with the following command:
```
apt install default-jre
```
If you need the Java Development Kit (JDK), run one of the following commands. The JDK contains the JRE, so there is no disadvantage in installing the JDK instead of the JRE:
```
apt install default-jdk
## or, for a specific version
apt install openjdk-8-jdk
```
Oracle JDK is the official JDK, but Oracle no longer provides it as a default installation for Ubuntu. It could be installed from the WebUpd8 PPA (this PPA has since been discontinued, so the commands below may fail on a current system):
```
sudo apt install software-properties-common apt-transport-https unzip wget curl nano -y
sudo add-apt-repository ppa:webupd8team/java
sudo apt update && sudo apt upgrade
sudo apt install oracle-java8-installer
```
When there are multiple Java installations on your server, the Java version to use as default can be chosen with the following command:
```
sudo update-alternatives --config java
```
You will get something like this:
```
  Selection    Path                                             Priority   Status
------------------------------------------------------------
* 0            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1061       manual mode
  2            /usr/lib/jvm/java-8-oracle/jre/bin/java          1062       manual mode
```
The same goes for the Java compiler (javac), keytool, javadoc and jarsigner:
```
sudo update-alternatives --config javac
sudo update-alternatives --config keytool
sudo update-alternatives --config javadoc
sudo update-alternatives --config jarsigner
```
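The interactive menus above are awkward in a provisioning script. As a rough sketch, the choice can be made non-interactively: `update-alternatives --list java` prints one registered path per line, and `--set` selects one without prompting. The helper name `pick_java` is made up for this example, and the pattern `java-8-oracle` assumes the paths shown in the sample output.

```shell
# pick_java PATTERN
# Reads candidate paths on stdin and prints the first one that
# contains PATTERN as a fixed string (no regex interpretation).
pick_java() {
  grep -F -- "$1" | head -n 1
}

# Usage (needs root and at least one registered Java alternative):
#   choice=$(update-alternatives --list java | pick_java java-8-oracle)
#   [ -n "$choice" ] && sudo update-alternatives --set java "$choice"
```

If nothing matches, `pick_java` prints nothing, so the `[ -n "$choice" ]` guard keeps `--set` from being called with an empty path.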
We can set the JAVA_HOME environment variable. The installation directory for each Java version can be read from the example above by dropping the trailing /jre/bin/java, like:
```
/usr/lib/jvm/java-8-oracle
/usr/lib/jvm/java-7-openjdk-amd64
...
```
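Reading the directory off the menu by eye is easy to get wrong, so here is a small sketch that derives it from the path of the `java` binary. The function name `java_home_from_bin` is hypothetical, and the logic assumes the layout shown above, where the binary lives at `<home>/bin/java` or `<home>/jre/bin/java`.

```shell
# java_home_from_bin PATH_TO_JAVA
# Strips a trailing /bin/java, then a trailing /jre if one is left,
# and prints what remains as the candidate JAVA_HOME.
java_home_from_bin() {
  p=$1
  p=${p%/bin/java}
  p=${p%/jre}
  printf '%s\n' "$p"
}

# Usage: resolve the symlink chain behind the active java first.
#   java_home_from_bin "$(readlink -f "$(command -v java)")"
```

`readlink -f` follows the `/usr/bin/java` and `/etc/alternatives/java` symlinks to the real binary, so the result reflects whichever version is currently selected.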
Open the file /etc/environment:
```
nano /etc/environment
```
And add the appropriate line, replacing /usr/lib/jvm/java-8-oracle with your own path:
```
JAVA_HOME="/usr/lib/jvm/java-8-oracle"
```
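For an unattended setup, the same edit can be scripted instead of made in an editor. This is only a sketch: `set_env_var` is a made-up helper that rewrites a simple `KEY="value"` file so that the key appears exactly once, which suits the plain assignment format used in /etc/environment.

```shell
# set_env_var FILE KEY VALUE
# Drops any existing KEY= line from FILE, then appends KEY="VALUE".
set_env_var() {
  file=$1
  key=$2
  value=$3
  touch "$file" || return 1
  tmp=$(mktemp) || return 1
  # grep -v exits non-zero when it prints nothing; that is not an error here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s="%s"\n' "$key" "$value" >> "$tmp"
  # Write back through cat so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (as root):
#   set_env_var /etc/environment JAVA_HOME /usr/lib/jvm/java-8-oracle
```

Running it twice leaves a single JAVA_HOME line, so it is safe to call from a script that may be re-run.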
Reload and test:
```
source /etc/environment
echo $JAVA_HOME
```
Installing Maven
We also have an older guide on installing JDK 8 and Maven 3 on Ubuntu 14.04. Apache Maven is easy to install and has good official documentation. The download page links to the installation instructions:
```
http://maven.apache.org/download.cgi
```
In short, this is the way:
```
cd /opt
wget http://redrockdigimark.com/apachemirror/maven/maven-3/3.5.3/binaries/apache-maven-3.5.3-bin.tar.gz
tar -xzvf apache-maven-3.5.3-bin.tar.gz
ls -al
## you'll see a directory like apache-maven-3.5.3
## remove the tarball
rm apache-maven-3.5.3-bin.tar.gz
## give it a sane name
mv apache-maven-3.5.3/ apache-maven/
```
Now add that directory to $PATH by creating a profile script:
```
cd /etc/profile.d/
nano maven.sh
```
Add:
```
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export M2_HOME=/opt/apache-maven
export MAVEN_HOME=/opt/apache-maven
export PATH=${M2_HOME}/bin:${PATH}
```
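One side effect of a plain PATH export is that every time the script is sourced again, the Maven directory is prepended once more. A possible refinement, sketched here with a hypothetical helper called `path_prepend`, only adds the directory when it is not already listed.

```shell
# path_prepend DIR
# Puts DIR at the front of PATH unless PATH already contains it.
path_prepend() {
  case ":${PATH}:" in
    *":$1:"*) ;;                # already present, leave PATH untouched
    *) PATH="$1:${PATH}" ;;
  esac
}

# Usage inside maven.sh, in place of the last export line:
#   path_prepend "${M2_HOME}/bin"
#   export PATH
```

Wrapping PATH in colons before matching means a directory is only treated as present when it is a complete entry, not a prefix of a longer one.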
Save the file and run these:
```
chmod +x maven.sh
source maven.sh
```
Test:
```
mvn --version
mvn --help
```
As you can see, the official guides are rather thin on these steps.
Installing Apache Tika
This is the official download site:
```
https://tika.apache.org/download.html
```
Download the source archive:
```
cd /opt
wget http://redrockdigimark.com/apachemirror/tika/tika-1.18-src.zip
```
Then uncompress it and install with Maven:
```
unzip tika-1.18-src.zip
ls -al
cd tika-1.18
mvn install
```
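A full `mvn install` also runs the project's test suite, which can take a long time on a small server. As a hedged sketch, the build can be wrapped in a helper that checks for the unpacked source first and passes Maven's standard `-DskipTests` property. The function name `build_tika` and the default directory `/opt/tika-1.18` are assumptions based on the steps above.

```shell
# build_tika [SRC_DIR]
# Runs the Maven build in SRC_DIR without executing the tests.
build_tika() {
  src=${1:-/opt/tika-1.18}
  if [ ! -f "$src/pom.xml" ]; then
    echo "no pom.xml in $src; unpack the source archive first" >&2
    return 1
  fi
  # -DskipTests still compiles the tests but does not run them.
  (cd "$src" && mvn install -DskipTests)
}

# Usage:
#   build_tika /opt/tika-1.18
```

The subshell around `cd` keeps the caller's working directory unchanged whether the build succeeds or fails.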
Done. You can now test Tika from its base directory. These guides cover usage:
```
https://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/index.html
https://wiki.apache.org/tika/TikaBatchUsage
http://tika.apache.org/1.18/gettingstarted.html
```
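As a quick smoke test, the command-line application built above can be run directly. The sketch below assumes the build left the jar at `tika-app/target/tika-app-1.18.jar` under `/opt/tika-1.18`; adjust `TIKA_HOME` if yours differs. The wrapper name `tika_app` is made up for this example, while `--detect`, `--metadata`, `--text` and `--gui` are options of the tika-app command line.

```shell
# Location of the built application jar (assumed from the steps above).
TIKA_HOME=${TIKA_HOME:-/opt/tika-1.18}
TIKA_APP_JAR=${TIKA_APP_JAR:-$TIKA_HOME/tika-app/target/tika-app-1.18.jar}

# tika_app ARGS...
# Runs tika-app with the given arguments, failing early if the jar is missing.
tika_app() {
  if [ ! -f "$TIKA_APP_JAR" ]; then
    echo "tika-app jar not found: $TIKA_APP_JAR" >&2
    return 1
  fi
  java -jar "$TIKA_APP_JAR" "$@"
}

# Usage:
#   tika_app --detect   /path/to/file    # print the detected media type
#   tika_app --metadata /path/to/file    # print the extracted metadata
#   tika_app --text     /path/to/file    # print the extracted plain text
#   tika_app --gui                       # open the GUI (desktop only)
```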
This ends the tutorial.