Replication, in the literal sense of the word, is the mere production of multiple copies (duplicates) of the same data, but the term usually also implies that the copies are regularly synchronized. In data processing, replication is generally used to make data available in multiple places. It serves on the one hand as a data backup and on the other to shorten response times, especially for read accesses.
The simplest form of data replication is storing a copy of a file; a more advanced form is the copy-and-paste function of modern operating systems. Replication in the literal sense also includes duplicating optical data carriers in a pressing plant or with a disc burner.
Write accesses are generally more complex because the data is kept in several places. In the common master/slave replication, a distinction is made between the “original” of the data (primary data) and the dependent copies. With equivalent copies (as in version management), replication must apply merge strategies to reconcile the data sets (colloquially called synchronization, which differs from synchronization in the strict sense).
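One possible merge strategy for equivalent copies can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes a "last writer wins" rule, and the names Versioned, modified_at and merge_last_writer_wins are hypothetical and do not come from any particular system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A value tagged with the logical time of its last modification."""
    value: str
    modified_at: int  # assumed logical timestamp; a larger number means newer


def merge_last_writer_wins(
    a: dict[str, Versioned], b: dict[str, Versioned]
) -> dict[str, Versioned]:
    """Merge two equivalent copies; for each key keep the newer entry."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        # Take the entry from b if the key is new or b's entry is more recent.
        if current is None or entry.modified_at > current.modified_at:
            merged[key] = entry
    return merged


# Two copies that were changed independently of each other.
copy_a = {"title": Versioned("Draft", 1), "author": Versioned("Kim", 2)}
copy_b = {"title": Versioned("Final", 3), "status": Versioned("open", 1)}
merged = merge_last_writer_wins(copy_a, copy_b)
```

Last writer wins silently discards the older of two conflicting changes; version management systems typically use more careful strategies, so this sketch only shows where a merge step fits.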
---
Sometimes it is important to know how up to date the replicas are. Depending on the type of replication, a certain amount of time passes between the processing or creation of the primary data and its replication. This interval is usually referred to as latency.
Synchronous replication
Replication is synchronous when a change operation on a data object can only complete once it has also been performed on the replicas. Implementing this requires a protocol that ensures the atomicity (indivisibility) of transactions, a commit protocol. Synchronous replication strategies:
- ROWA (read one, write all) procedure
- Voting procedure, e.g. weighted voting
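The ROWA idea (read one, write all) can be illustrated with a small in-memory sketch. It assumes a simplified two-step prepare/commit exchange as a stand-in for a real commit protocol; the classes Replica and RowaStore and the available flag are hypothetical names chosen for illustration.

```python
class Replica:
    """In-memory stand-in for one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.staged = {}       # changes prepared but not yet committed
        self.available = True  # assumed switch to simulate an outage

    def prepare(self, key, value):
        """Stage a change; refuse if this replica is unreachable."""
        if not self.available:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        self.data[key] = self.staged.pop(key)

    def abort(self, key):
        self.staged.pop(key, None)


class RowaStore:
    """Read one, write all: a write succeeds only on every replica."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        prepared = []
        for replica in self.replicas:
            if replica.prepare(key, value):
                prepared.append(replica)
            else:
                # One replica refused: undo the staged change everywhere.
                for done in prepared:
                    done.abort(key)
                return False
        for replica in self.replicas:
            replica.commit(key)
        return True

    def read(self, key):
        # Any single available replica is sufficient for reading.
        for replica in self.replicas:
            if replica.available:
                return replica.data.get(key)
        raise RuntimeError("no replica available")


replicas = [Replica("a"), Replica("b"), Replica("c")]
store = RowaStore(replicas)
ok = store.write("x", 1)          # all replicas reachable
replicas[1].available = False
failed = store.write("x", 2)      # one replica down, write is rejected
value = store.read("x")
```

The sketch shows the trade-off of this strategy: reads are cheap because every replica is current, but a single unreachable replica blocks all writes.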
Examples of synchronous replication include:
- Warm standby replication
- Hot standby replication of Microsoft SQL Server databases
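The weighted voting strategy listed above can be sketched in a similar way. The following is an illustrative sketch for a single data object, assuming the usual quorum conditions (read quorum plus write quorum exceeds the total number of votes, and two write quorums always overlap); the class names and the version counter are hypothetical.

```python
class VotingReplica:
    """One copy of a single data object, carrying a number of votes."""

    def __init__(self, votes):
        self.votes = votes
        self.version = 0
        self.value = None
        self.available = True  # assumed switch to simulate an outage


class WeightedVotingStore:
    def __init__(self, replicas, read_quorum, write_quorum):
        total = sum(r.votes for r in replicas)
        # Every read quorum must overlap every write quorum, and any two
        # write quorums must overlap each other.
        if read_quorum + write_quorum <= total or 2 * write_quorum <= total:
            raise ValueError("quorums do not overlap")
        self.replicas = replicas
        self.read_quorum = read_quorum
        self.write_quorum = write_quorum

    def _gather(self, needed):
        """Collect available replicas until their votes reach the quorum."""
        chosen, votes = [], 0
        for replica in self.replicas:
            if replica.available:
                chosen.append(replica)
                votes += replica.votes
                if votes >= needed:
                    return chosen
        return None

    def write(self, value):
        quorum = self._gather(self.write_quorum)
        if quorum is None:
            return False
        # The quorum contains at least one replica with the latest version.
        new_version = max(r.version for r in quorum) + 1
        for replica in quorum:
            replica.version, replica.value = new_version, value
        return True

    def read(self):
        quorum = self._gather(self.read_quorum)
        if quorum is None:
            raise RuntimeError("read quorum not reachable")
        # The copy with the highest version number is the current one.
        return max(quorum, key=lambda r: r.version).value


replicas = [VotingReplica(2), VotingReplica(1), VotingReplica(1)]
store = WeightedVotingStore(replicas, read_quorum=2, write_quorum=3)
first_write = store.write("v1")   # reaches the replicas holding 2 + 1 votes
replicas[0].available = False
read_value = store.read()         # remaining replicas still form a read quorum
second_write = store.write("v2")  # only 2 of the 3 required votes reachable
```

Unlike ROWA, a write here does not need every replica, at the cost of reads having to consult several copies and compare version numbers.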
Asynchronous replication
If there is a latency between the processing of the primary data and its replication, the replication is called asynchronous. The data sets are in sync (identical) only at the moment of replication. A simple variant of asynchronous replication is “file transfer replication”, the transfer of files via FTP or SSH.
The replicas therefore represent only a snapshot of the primary data at a given point in time. At the database level, the transaction logs can be shipped from one server to another at short intervals and replayed into the database there. Assuming an intact network, the latency then corresponds to the interval at which the transaction logs are written. Asynchronous replication strategies:
- Merge replication
- Primary Copy
- Snapshot replication
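The shipping of transaction logs described above can be illustrated with a minimal in-memory sketch. It assumes a log that is a simple ordered list of key/value changes; Primary, AsyncReplica and apply_log are hypothetical names and do not correspond to any specific database product.

```python
class Primary:
    """Holds the primary data and records every change in a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # simplified transaction log: ordered (key, value) pairs

    def write(self, key, value):
        # The write completes immediately; replicas are not involved.
        self.data[key] = value
        self.log.append((key, value))


class AsyncReplica:
    """Catches up with the primary only when the log is shipped."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def apply_log(self, primary):
        """Replay all log entries written since the last shipment."""
        pending = primary.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


primary = Primary()
replica = AsyncReplica()

primary.write("a", 1)
primary.write("b", 2)
before_shipping = replica.data.get("a")     # replica has not seen the change yet
shipped = replica.apply_log(primary)        # periodic shipment of the log

primary.write("a", 3)
lag = len(primary.log) - replica.applied    # changes the replica is still missing
stale_value = replica.data["a"]             # still the snapshot from the last shipment
```

In a real deployment apply_log would be triggered at fixed intervals, and that interval bounds how far the replica can fall behind the primary.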
Pros and cons of replication
Advantages of replicas in distributed database systems:
- Increased availability of data
- Acceleration of read accesses
- Better options for load balancing and query optimization
Disadvantages:
- High update effort
- Increased disk space requirements
- Redundant data sets that may become inconsistent, for example if the network connection is disrupted