In our previous articles, we discussed Key-Value Databases and Document-Oriented Databases. The Column-oriented Database is another major type of database used in data science.
Column-oriented databases store data tables by column rather than by row. They can use query languages such as SQL and can serve data for extract, transform, load (ETL) and data visualization. As a result, the database can access data in a granular way, reading only the columns a query needs instead of scanning and discarding unwanted data. By contrast, a common method of storing data is to serialize each row. Suppose two persons are having a text chat; in JSON form, the row-wise data looks like this:
[
  {
    "message": "Hi Abhishek. How are you?",
    "timestamp": 1571909530,
    "senderId": 111111,
    "seen": true
  },{
    "message": "This is Kiki.",
    "timestamp": 1571909607,
    "senderId": 111112,
    "seen": true
  },{
    "message": "Hi Kiki. I am fine. How are you?",
    "timestamp": 1571909653,
    "senderId": 111113,
    "seen": false
  }
]
I used UNIX timestamps from the time of writing to create the conversation, and the sender IDs are imaginary. Below is its column-oriented representation:
{
  "messages": ["Hi Abhishek. How are you?", "This is Kiki.", "Hi Kiki. I am fine. How are you?"],
  "timestamps": [1571909530, 1571909607, 1571909653],
  "senderId": [111111, 111112, 111113],
  "seen": [true, true, false]
}
We can use a script to convert the row-wise data into the above form. In binary form, column-oriented data benefits from simple alignment, because every value in a column has the same type. This refers to data structure alignment, a topic we leave out of this article. A column-oriented database is also a good solution for size reduction, because it avoids repetition (byte packing): the field names are not repeated for every record. That said, not everything about column-oriented databases is great.
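As a minimal sketch of such a script, the Python below transposes the row-wise chat log into the columnar form and then packs the timestamp column into fixed-width binary values. The function name `rows_to_columns` and the `COLUMN_NAMES` mapping are hypothetical helpers made up for this illustration, and the 4-byte little-endian unsigned integer encoding is an assumption chosen for the example, not the format of any particular database.

```python
import json
import struct

# Row-wise chat log from the example above.
rows = [
    {"message": "Hi Abhishek. How are you?", "timestamp": 1571909530,
     "senderId": 111111, "seen": True},
    {"message": "This is Kiki.", "timestamp": 1571909607,
     "senderId": 111112, "seen": True},
    {"message": "Hi Kiki. I am fine. How are you?", "timestamp": 1571909653,
     "senderId": 111113, "seen": False},
]

# Row key -> column key, matching the names used in the columnar JSON above.
# (Hypothetical mapping, introduced only for this sketch.)
COLUMN_NAMES = {
    "message": "messages",
    "timestamp": "timestamps",
    "senderId": "senderId",
    "seen": "seen",
}


def rows_to_columns(rows):
    """Transpose a list of row dicts into a dict of column lists."""
    columns = {name: [] for name in COLUMN_NAMES.values()}
    for row in rows:
        for row_key, column_key in COLUMN_NAMES.items():
            columns[column_key].append(row[row_key])
    return columns


columns = rows_to_columns(rows)
print(json.dumps(columns, indent=2))

# Byte packing (assumed encoding): every timestamp becomes a 4-byte
# little-endian unsigned integer, and the key name is not stored per value.
timestamps = columns["timestamps"]
packed_timestamps = struct.pack(f"<{len(timestamps)}I", *timestamps)
print(len(packed_timestamps))  # 3 values * 4 bytes = 12 bytes
```

The packed column can be read back with `struct.unpack` using the same format string. Real column stores typically add compression and encoding schemes on top of this idea.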
Row-oriented databases store data row-wise: the column values of each row are stored together. The data files occupy a certain number of physical blocks, and each block holds a certain number of rows. This layout suits transactional (OLTP) workloads. The data of a column-oriented database persists in a set of column files, each comprising several physical disk blocks, with one column stored as a single column file. This layout is more suitable for analytical (OLAP) workloads.
A SATA hard drive has an average seek time on the order of 20 milliseconds, while DRAM access on recent Intel processors takes around 60 nanoseconds. A column-oriented database boosts performance by reducing the amount of data that has to be read from disk. Column-oriented databases work well with data that is much larger than the available RAM, or where stream processing is needed.
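The difference in disk reads can be illustrated with a small Python sketch. It writes the same chat data once as a single row file and once as one file per column, then counts the unseen messages both ways. The file names (`chat.rows`, `chat.<column>.col`), the one-JSON-value-per-line layout, and the helper functions are all assumptions made up for this example; real engines use binary blocks and compression rather than text files.

```python
import json
import os
import tempfile

rows = [
    {"message": "Hi Abhishek. How are you?", "timestamp": 1571909530,
     "senderId": 111111, "seen": True},
    {"message": "This is Kiki.", "timestamp": 1571909607,
     "senderId": 111112, "seen": True},
    {"message": "Hi Kiki. I am fine. How are you?", "timestamp": 1571909653,
     "senderId": 111113, "seen": False},
]


def write_row_store(rows, directory):
    """Row layout: each record is serialized whole, one JSON line per row."""
    path = os.path.join(directory, "chat.rows")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


def write_column_store(rows, directory):
    """Column layout: one file per column, one value per line."""
    paths = {}
    for key in rows[0]:
        path = os.path.join(directory, f"chat.{key}.col")
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row[key]) + "\n")
        paths[key] = path
    return paths


def count_unseen_row_store(path):
    """Reads and parses every full row, then discards the unwanted fields."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if not json.loads(line)["seen"])


def count_unseen_column_store(paths):
    """Reads only the 'seen' column file; other columns are never opened."""
    with open(paths["seen"], encoding="utf-8") as f:
        return sum(1 for line in f if not json.loads(line))


with tempfile.TemporaryDirectory() as tmp:
    row_path = write_row_store(rows, tmp)
    col_paths = write_column_store(rows, tmp)
    unseen_rows = count_unseen_row_store(row_path)
    unseen_cols = count_unseen_column_store(col_paths)
    row_bytes = os.path.getsize(row_path)            # bytes scanned, row layout
    seen_bytes = os.path.getsize(col_paths["seen"])  # bytes scanned, column layout

print("unseen messages:", unseen_rows, unseen_cols)
print("bytes read (row file vs. 'seen' column file):", row_bytes, seen_bytes)
```

Both queries return the same count, but the columnar version touches only the small `seen` file. The same example hints at the trade-off: fetching or updating one complete message needs a single read in the row layout, but one access per column file in the column layout.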