In our previously published article, Approaches of Deep Learning : Part 1, we explained the most important basics of artificial intelligence, machine learning and deep learning.
Those fundamentals are important for understanding the rest of the topic. In this article, Approaches of Deep Learning : Part 2, we focus on the benefits and application examples of deep learning, artificial neural networks, and an overview of deep learning use cases.
Approaches of Deep Learning : Benefits and Application Examples
Deep Learning can be used in a variety of applications.
---
Recognition of standalone objects in pictures and videos: Deep learning algorithms allow a machine to classify and recognize objects even when the same kind of object looks a little different. Example: a deep learning program has learned what a city bus looks like, that is, which features characterize a bus. When it sees pictures of buses from another country, it still recognizes them as buses because the essential features of a bus still apply.
Detecting objects that are partly obscured, for example by other objects: Deep learning allows an object to be recognized even though part of it is hidden. Example: the deep learning program knows what a stop sign looks like, and a camera films a stop sign that is almost covered by a tree. The program still picks out the essential features such as color, shape and lettering and recognizes a stop sign.
Voice recognition: Deep learning makes it possible to communicate with machines by voice. The machines learn new words and usages and independently expand their language repertoire. Apple’s Siri is one example.
Translation of spoken texts: Deep learning makes it possible to convert spoken language into text and translate it. Tonal languages such as Chinese can also be supported: deep learning speech recognition algorithms recognize the different pitches in such languages, which usually have to be translated differently.
Advanced artificial intelligence in computer games: In many PC and console games, the artificial intelligence used to consist of a set of rules predefined by the programmers, with the computer-controlled object choosing at random which of the predefined actions to trigger. Through deep learning, such an object can learn from the player’s behavior and work out its own routines. The computer opponent thus becomes less predictable.
Predictive Analytics: Deep Learning makes it possible, for example, to continuously analyze customer data from a CRM system in order to make certain predictions about future customer behavior.
Artificial Neural Network
Artificial neural networks are composed of many small artificial neurons and belong to the field of artificial intelligence. They are mathematical models based on biological neural networks, in particular the human brain. The human brain has tens of billions of neurons (nerve cells). A neuron consists essentially of a cell body, dendrites and an axon. Stimuli are picked up via the dendrites and passed on to the cell body and the axon hillock, which is responsible for the function of the neuron: it processes the incoming stimuli by adding them up. The shorter a dendrite, the stronger the stimulus that arrives. As the stimuli are transmitted, an excitation potential builds up at the axon hillock. If the summed stimuli exceed the threshold value, an action potential is triggered; the cell is said to “fire”. When the cell fires, the signal travels along the axon to the synapses, which connect to other neurons. A single neuron is directly connected to roughly 2,000 other neurons. Stimuli are always processed in one direction only, which is also called the “transmitter / receiver” principle. The whole process is colloquially referred to as “thinking”.
Building an Artificial Neuron
Artificial neurons are modeled on the neurons of the human brain. They have n inputs through which stimuli are received, and an output that propagates the processed stimuli to other artificial neurons. They can work according to the all-or-nothing principle (output 1 or 0), or they can produce graded outputs that depend on the input. Depending on the activation function, the output values lie between 0 and 1 or between -1 and +1. The inputs of an artificial neuron correspond to the dendrites of a biological neuron. Each input has its own weight, and each input value is multiplied by its weight. Once all input values have been multiplied by their weights, a sum function adds the products together. This sum function is an elementary component of an artificial neuron. Its result is passed to the activation function, which decides which value appears at the output.
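The weighted sum followed by an all-or-nothing decision can be sketched in a few lines of Python. This is only a minimal illustration: the function name neuron_output, the threshold parameter and the example numbers are assumptions made for this sketch and are not taken from any particular library.

```python
def neuron_output(inputs, weights, threshold=0.0):
    """All-or-nothing artificial neuron: weighted sum, then a threshold decision."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    # Sum function: multiply every input by its weight and add up the products.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # Activation: output 1 ("fire") if the sum reaches the threshold, otherwise 0.
    return 1 if weighted_sum >= threshold else 0


# Hypothetical example values: three inputs with hand-picked weights.
print(neuron_output([1.0, 0.5, -1.0], [0.8, 0.4, 0.3], threshold=0.5))
```

With these example values the weighted sum is 0.8 + 0.2 - 0.3 = 0.7, which is above the threshold of 0.5, so the neuron fires.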
Activation functions
Generating an output from an artificial neuron requires an activation function. It receives the result of the sum function, i.e. the weighted input values. In principle the activation value can lie in any range; as already explained, common ranges are 0 to 1 and -1 to +1. Three important activation functions are the step function, the linear function and the sigmoid function. The step function, also called the threshold function, outputs 1 if the result of the sum function is greater than or equal to 0 and outputs 0 if it is less than 0. A neuron with the threshold function therefore only passes something on to the network when it outputs a 1. With the linear function, the output increases linearly with the input. The sigmoid function is a smooth, non-linear function whose output lies between 0 and 1, and it is considered the most realistic of the three. These three are not the only activation functions in use.
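The three activation functions named above can be written down directly. The following is a sketch in plain Python; the function names and the slope parameter of the linear function are assumptions chosen for illustration.

```python
import math


def step(z):
    """Threshold function: 1 if the summed input is >= 0, otherwise 0."""
    return 1.0 if z >= 0 else 0.0


def linear(z, slope=1.0):
    """Linear function: the output grows in proportion to the input."""
    return slope * z


def sigmoid(z):
    """Sigmoid function: smooth, non-linear, output between 0 and 1."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Compare the three functions on a few sample sums.
for z in (-2.0, 0.0, 2.0):
    print(z, step(z), linear(z), round(sigmoid(z), 3))
```

The threshold function jumps abruptly from 0 to 1, while the sigmoid makes the same transition gradually, which is why it is often preferred when small changes in the input should lead to small changes in the output.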
An artificial neural network consists of several artificial neurons that are connected to one another. There are usually three kinds of layers in a network: an input layer, a hidden layer and an output layer. The input layer is responsible for data acquisition; the data can come from different sources, e.g. from other programs. The output layer provides the calculated result. The hidden layer includes all neurons that lie between the input layer and the output layer, and it is so called because it is not in direct contact with the outside world. The hidden part can itself consist of several layers; such networks are called multi-layer neural networks.
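How data flows from the input layer through a hidden layer to the output layer can be sketched as follows. This is a simplified illustration under stated assumptions: every neuron uses the sigmoid function, there are no bias terms, and the weights are hand-picked example values rather than learned ones. The names layer_forward and network_forward are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights):
    """Outputs of one layer; weights[j][i] links input i to neuron j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def network_forward(inputs, layers):
    """Pass the input values through every layer in turn."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]]
output_weights = [[1.0, -1.0, 0.5]]
print(network_forward([1.0, 0.0], [hidden_weights, output_weights]))
```

Adding further weight matrices to the list of layers turns the same sketch into a network with several hidden layers.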
Use-Cases of Deep Learning
The use of deep learning in image recognition and face recognition is on the rise. For face recognition, important features must first be extracted from the input so that they can be compared. Various facial recognition methods already exist.
Template Matching: The most commonly used method is template matching. Here, the face is compared with a template, a predefined face that contains the most important facial features, e.g. eyes, nose and mouth. The template must be of very high quality so that different faces can be compared against it. For a face to be recognized, it has to be matched against many faces from a database. The comparison yields a vector of the features in which the faces are similar. However, this method is computationally very expensive.
Geometric features: This method extracts the positions of the most important features of a face, namely the nose, the mouth and the eyes, into a vector. The distances between these features are also calculated and stored in the vector.
Fourier Transformation: The idea behind the Fourier transformation is to transform the input image and the comparison image into the frequency domain. In terms of their frequency components, the images can be compared more easily.
Elastic graphs: This method lays a grid over the face. The grid is called a labeled graph. Using the grid and its nodes, an algorithm calculates the important features of a face. To compare a new face with an existing one, the grid of the existing face is laid over the new face and adjusted until it fits as closely as possible. If the adjusted grid fits, there is a match.
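Of the methods listed above, the geometric-features approach is the easiest to illustrate in code. The sketch below assumes that the positions of the eyes, nose and mouth have already been found by some other step; the landmark names, the coordinates and the function names are hypothetical. It requires Python 3.8 or later for math.dist.

```python
import math
from itertools import combinations


def feature_vector(landmarks):
    """Pairwise distances between facial landmarks, in a fixed order."""
    names = sorted(landmarks)
    return [math.dist(landmarks[a], landmarks[b]) for a, b in combinations(names, 2)]


def face_distance(landmarks_a, landmarks_b):
    """Smaller values mean the two faces have more similar geometry."""
    if set(landmarks_a) != set(landmarks_b):
        raise ValueError("both faces need the same landmark names")
    return math.dist(feature_vector(landmarks_a), feature_vector(landmarks_b))


# Hypothetical pixel coordinates for two faces.
face_a = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}
face_b = {"left_eye": (32, 41), "right_eye": (69, 40), "nose": (50, 62), "mouth": (51, 83)}
print(face_distance(face_a, face_b))
```

Because only distances between features are stored, the vector stays the same when the whole face is shifted within the image. It is not independent of image scale, so a practical system would normalize the distances first.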
Text recognition algorithms are used in many areas. Two different approaches are presented below.
Text recognition: Optical Character Recognition (OCR) is the automatic recognition of characters by a computer. A printed or handwritten text is optically scanned, and algorithms recognize the characters in it. Colloquially, one can say that the computer copies the text it sees. The technology is mainly used in document and form processing and in archival systems. To use it, you need not only a scanner but also suitable software that can convert the recognized text into formats such as DOC, HTML, PDF or TXT. You can then work with the recognized text. We have given a practical use case of using IBM Watson to analyze texts in a Google Docs text document.
Text mining: Analyzing texts with a computer is still a tough challenge, but not an impossible task for algorithms. It requires a combination of linguistic and statistical methods. Text mining is a largely automatic process for extracting specific information and knowledge from texts. It uses techniques developed in areas such as Natural Language Processing (NLP), Information Retrieval, Information Extraction and AI. Text mining is considered a special form of data mining. One of the most important differences between the two lies in the underlying data. Text mining treats its input as unstructured, while data mining works on data in first normal form, in relational database terminology, which means that the individual values are atomic. Strictly speaking, however, the text in text mining is not completely unstructured, because it already follows a certain grammar. Furthermore, the text is structured by headings and paragraphs.
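A very small first step in text mining is to count which terms occur most often in a text. The sketch below shows only this statistical part, without any linguistic analysis; the stop-word list and the function name term_frequencies are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately short list of words that carry little meaning.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to", "for"})


def term_frequencies(text, top_n=3):
    """Return the top_n most frequent terms as (term, count) pairs."""
    # Split the unstructured text into lower-case word tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


sample = "Text mining extracts knowledge from text. Mining text is a challenge."
print(term_frequencies(sample))
```

Real text mining systems go much further, for example by reducing words to their stems or by weighting terms across many documents, but they typically start from token counts like these.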
Speech processing: Speech recognition can be divided into two types: speaker-independent and speaker-dependent. In speaker-independent speech recognition, the user can start without a previous training phase. Speaker-dependent speech recognition is different: before first use, the system must be trained on the user’s particular pronunciation.
Homophones are a major problem in speech recognition. Homophones are words that have different meanings but are pronounced the same, such as “two” and “too”.
Application Examples of Deep Learning
Deep learning will be used in many areas in the future. Google and Apple already take advantage of deep learning in their smartphone operating systems Android and iOS, where it is used for speech recognition. Facebook uses facial recognition on uploaded photos of friends. Deep learning is also finding use in other sectors, such as advertising, finance and medicine. The researcher Andrew Ng has developed a prediction method for hard drive failure. In medicine, deep learning is expected to be used to predict disease progression, for example in cancer, which could revolutionize medicine. In drug development, deep learning could predict the best drugs for a given disease. As described above, deep learning is also used for recognition of standalone objects in pictures and videos, voice recognition, translation of spoken texts, advanced artificial intelligence in computer games, predictive analytics (predictions with probabilities) and so on.
Conclusion
A major breakthrough for artificial intelligence and artificial neural networks was achieved by Google’s subsidiary DeepMind in 2016, when its program AlphaGo defeated a top human player at the board game Go.
An artificial neural network is not easy for an ordinary user to understand. To make it more transparent, Google provides a “playground” for neural networks on a TensorFlow website (http://playground.tensorflow.org) where multi-layered neural networks can easily be put together. Their behavior can be observed before, during and after the calculation.
Google is currently researching a deep learning system that can determine where a picture was taken on the basis of its pixels alone. In initial tests, the computer was far ahead of humans.
As early as 1943, the first pioneers took on the challenge of machine learning and neural networks. They believed that the structure of the brain, with its neurons, could be reproduced as neural networks, with the goal of developing a simple method of teaching machines to think. It is astonishing that the idea could even be conceived at that time with such limited resources.