This post continues the third part of the series. Learning from data starts with two things: sources of data from which a forecast can be generated, and a methodology for evaluating them. To create stock forecasts with machine learning, the sources must first be defined in terms of the data required. For a forecast, the data should come from the same sources that other stockholders use for information, so that the interplay between supply and demand can be estimated better. Sources that provide information about future dividends, interest rates and future prices are suitable for this purpose. The information can be collected automatically from online portals or news magazines, or entered manually. Manual entry gives the user additional control: it increases the effort, but it ensures that only checked data is used for a forecast.
As a basis for the machine evaluation of the data, the neural network is discussed further in this work. A neural network consists of artificial neurons and belongs to the field of artificial intelligence. Neural networks are used successfully in pattern recognition, categorization and forecasting. In contrast to a conventional computer program, a neural network does not have to be explicitly programmed or adapted for a specific situation. A network can be trained with existing data and then react associatively, based on what it has learned, in a new or unknown situation.
Internally, a network works with neurons that are linked together in different layers. The input layer receives the inputs from the outside world and forwards them to the hidden layers. A hidden layer evaluates the input signals and passes them on to the output layer, where the cumulative result is output. Some applications do not use a hidden layer; the information is then passed directly from the input layer to the output layer. The task of each neuron is to absorb information from the environment or from other neurons and to pass it on in a modified form. Depending on how far the network has been trained, a given set of inputs is associated with a set of outputs. During learning, the neurons can individually establish links to other neurons, change weights or remove links.
As already explained, the links between individual neurons carry a weight. The higher the absolute value of the weight, the greater the impact on the next unit. A positive weight means that a neuron has an exciting effect on the next layer. A negative weight means that the neuron inhibits the next layer. A weight of zero has no effect on the next layer. In addition to the weight, the value passed to the next neuron also depends on the output of the sending neuron.
Only the product of the sending neuron's output and the weight gives the value that is passed on as input to the next neuron. The input for the next neuron can also be specified as a formula: input = output × weight, summed over all incoming links. The content of the output layer is determined by the arrangement and weighting of the individual neurons. A small change in a weight can therefore influence the result of the loan approval mentioned in the example above.
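The weighting rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the network used in this series: the function names `neuron_input` and `next_layer_inputs` are hypothetical, and no activation function is applied because the text does not specify one.

```python
def neuron_input(outputs, weights):
    """Net input of one neuron: each sender's output times its link weight, summed."""
    return sum(o * w for o, w in zip(outputs, weights))


def next_layer_inputs(outputs, weight_rows):
    """Net input for every neuron of the next layer.

    weight_rows[j] holds the weights of all links leading into neuron j.
    """
    return [neuron_input(outputs, row) for row in weight_rows]


if __name__ == "__main__":
    sender_outputs = [1.0, 0.5, 2.0]
    # Positive weight: exciting; negative weight: inhibiting; zero: no effect.
    weights = [0.8, -0.4, 0.0]
    print(neuron_input(sender_outputs, weights))
```

Changing a single entry of `weights` changes the net input, which is how a small change in a weight can alter what the output layer finally reports.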
Before data can be used to derive features that can be learned, it must first be prepared. A clean data set is the basis of machine learning. Even with the best classification method, the computer will not produce good results if the data is insufficiently filtered or not adapted. The data must therefore be cleaned first in order to achieve a satisfactory result. Cleaning the data involves relatively high effort, but it has a decisive influence on the quality of the results.
---
In the second stage, the data were examined for possible outliers. Price changes of more than 10% were defined as outliers. If an outlier was detected, it was checked whether it was a faulty data point or a genuine price change. Faulty data points would have been corrected by interpolation. In the third and final stage, the completeness of the data was ensured. As with faulty data points, missing periods would have been filled by interpolation. The data obtained in this way can then be used for the further steps toward automatically predicting the share price.
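The second and third stages can be sketched as follows. This is a minimal sketch under stated assumptions: prices are a plain list of floats with `None` marking a missing period, the 10% threshold is taken from the text, linear interpolation stands in for the unspecified "interpolation options", and the function names are hypothetical.

```python
def flag_outliers(prices, threshold=0.10):
    """Return the indices whose price differs from the previous one by more than threshold."""
    flagged = []
    for i in range(1, len(prices)):
        prev, cur = prices[i - 1], prices[i]
        if prev is None or cur is None or prev == 0:
            continue  # no relative change can be computed here
        if abs(cur - prev) / abs(prev) > threshold:
            flagged.append(i)
    return flagged


def interpolate_gaps(prices):
    """Fill None entries linearly between the nearest known neighbours.

    Gaps at the very start or end have only one neighbour and stay None.
    """
    filled = list(prices)
    known = [i for i, p in enumerate(filled) if p is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            step = (filled[right] - filled[left]) * k / gap
            filled[left + k] = filled[left] + step
    return filled


if __name__ == "__main__":
    closes = [100.0, 101.0, 115.0, 114.0]
    print(flag_outliers(closes))
    print(interpolate_gaps([100.0, None, None, 106.0]))
```

A flagged index is only a candidate. Whether it is a faulty data point or a genuine price change still has to be decided, as described above. A point judged faulty could be set to `None` and then filled by `interpolate_gaps`.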
To recognize price trends and price changes at a glance, they are usually displayed graphically, for example with a line chart or a bar chart. While people can easily evaluate this type of representation, it is not well suited to computers. For a computer, the data must be described numerically or mathematically. In addition to the price information, the required data includes average values and features that describe a relative position. Relative values can be determined from the differences between the opening, high, low and closing prices. A candlestick pattern can be used to describe price trends mathematically for the computer. Rising prices are drawn with a white candle body. The high and low prices are marked at the two outer ends: the high at the top and the low at the bottom. The closing price lies directly below the high price and the opening price above the low price. Falling prices are drawn with a black candle body. As with the white candle body, the high price is marked at the top and the low price at the bottom. Unlike the white candle body, the opening price lies below the high and the closing price above the low. The following figure shows the structure of a candlestick.
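The candlestick described above can also be expressed as numbers that a computer can work with. The following is a minimal sketch: the `Candle` class, the `describe` function and the feature names are hypothetical, and the label "flat" for a candle whose opening and closing prices are equal is an assumption, since the text only covers rising and falling prices.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def describe(candle):
    """Turn one candlestick into relative numeric features."""
    body_top = max(candle.open, candle.close)
    body_bottom = min(candle.open, candle.close)
    if candle.close > candle.open:
        color = "white"  # rising price: close sits above open
    elif candle.close < candle.open:
        color = "black"  # falling price: open sits above close
    else:
        color = "flat"   # assumption: not covered by the text
    return {
        "color": color,
        "body": body_top - body_bottom,          # distance between open and close
        "upper_shadow": candle.high - body_top,  # high down to the body
        "lower_shadow": body_bottom - candle.low,  # body down to the low
        "range": candle.high - candle.low,       # full high-to-low span
    }


if __name__ == "__main__":
    print(describe(Candle(open=10.0, high=12.0, low=9.5, close=11.5)))
```

Each feature is a difference between two of the four prices, so it describes the shape of the candle independently of the absolute price level.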
To be continued in the next part.