#0 Scylla: A data-intensive DB
There are numerous frameworks and languages to work with as a software engineer. I have already worked with PHP, JS, Python, Java and Go, and along the way I used different frameworks within the same language. Most of the time, languages and frameworks (and, as we will see, databases) are tools to reach an objective. In most cases I used relational databases to store data. They are the most common kind, and they are what engineers and computer scientists are taught in college. More recently, however, I was introduced to Scylla, a NoSQL wide-column database. Scylla is optimized for high throughput and recommended for applications that deal with large amounts of data. This initial text gives an idea of how Scylla is positioned compared with relational databases, and over the next weeks I will go through the main features implemented in Scylla.
Relational Databases vs. NoSQL
Relational databases store data in what are called tables. Don't take the word relational for granted: it describes how the tables relate to each other. These relations are defined by sharing a unique identifier between tables: the primary key (PK). When a PK is used to link to another table, the column in which the PK is stored is called a foreign key (FK). These relations guarantee what is called referential integrity: if an FK in one table points to a PK in another, the database makes sure the referenced row exists. This protects the relational structure of the database. It is something that doesn't exist in most NoSQL databases, as they sacrifice consistency to a certain degree. For example, Scylla runs on distributed servers. How can one make sure the data on one server is the same as the data persisted on another? NoSQL databases focus mainly on availability. Rather than providing strong consistency, they provide eventual consistency. This means a read may return stale data that does not yet reflect the latest write, until that write has reached every replica.
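To make referential integrity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `authors` and `posts` tables are hypothetical and exist only for this illustration. The database accepts a post that points to an existing author and rejects one that points to an author that does not exist.

```python
import sqlite3


def demo_referential_integrity():
    """Return (rejected, post_count) after trying to insert an orphan row."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical tables: posts.author_id is an FK to authors.id (the PK).
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " author_id INTEGER NOT NULL REFERENCES authors(id),"
        " title TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
    # Valid: author 1 exists, so the link is accepted.
    conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'Hello')")
    try:
        # Invalid: there is no author 99, so the database refuses the row.
        conn.execute("INSERT INTO posts (author_id, title) VALUES (99, 'Orphan')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    conn.close()
    return rejected, post_count
```

Calling `demo_referential_integrity()` shows that the orphan row never reaches the table. In a database without foreign keys, that check would be the application's job.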
Entity-based modeling
This way of modeling a DB is strongly connected to the relational model. The main focus is to create a structure that can represent the data regardless of the application. It is fully focused on how the data will be stored and how the relations will be represented in the DB. The application is built on top of this model. The main advantage of this approach is that data is conceived and generated with the guarantee that the relations between entities are protected by referential integrity. Entity-based modeling also requires thinking through normalization to avoid redundant data across different tables and entities.
Query-based modeling
NoSQL databases, on the other hand, require abandoning normalization to some degree to take advantage of their capabilities. For this we apply query-based data modeling. Its main focus is to model the database around the queries it will serve. This means all the data the application needs for a given request can be found with a single query to the database. Making this shift in data modeling can be tricky.
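The difference between the two modeling styles can be sketched with plain Python dictionaries standing in for tables. Everything here (`users`, `posts`, `posts_by_user`) is made up for illustration and is not Scylla code. The normalized version stores each fact once and combines two structures at read time, while the query-based version duplicates the author name so that one lookup answers the question "which posts did this user write?".

```python
# Entity-based (normalized): each fact is stored once and linked by id.
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
posts = [
    {"id": 10, "user_id": 1, "title": "Intro to Scylla"},
    {"id": 11, "user_id": 1, "title": "Compaction"},
    {"id": 12, "user_id": 2, "title": "Kernels"},
]


def posts_with_author_normalized(user_id):
    # Two structures are combined at read time, like a JOIN.
    author = users[user_id]["name"]
    return [
        {"title": p["title"], "author": author}
        for p in posts
        if p["user_id"] == user_id
    ]


# Query-based (denormalized): one structure shaped after the query.
# The author name is copied into every row, which is redundant on purpose.
posts_by_user = {}
for p in posts:
    row = {"title": p["title"], "author": users[p["user_id"]]["name"]}
    posts_by_user.setdefault(p["user_id"], []).append(row)


def posts_with_author_denormalized(user_id):
    # A single lookup by key returns everything the application needs.
    return posts_by_user.get(user_id, [])
```

Both functions return the same rows. The price of the second one is that a change to a user's name has to be written to every copy.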
Scylla & Wide-column databases
Wide-column databases, a type of NoSQL database, are often described as column-oriented DBMSs. The data is stored in tables, but each row can have column names and formats that differ from the other rows. In the case of Scylla, the data is also spread across different servers. However, handling transactions can be costly, as data is typically stored as columns and possibly across different files.
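A toy model can help picture these two ideas: rows that do not share the same columns, and rows that live on different servers. The sketch below is only an assumption-laden illustration. The node names, the `ToyWideColumnTable` class and the hash-modulo placement are all invented here and do not reflect how Scylla actually assigns data to nodes.

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for(partition_key):
    # Hash the key so that the same key always maps to the same node.
    # This modulo scheme is a simplification chosen for the sketch.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


class ToyWideColumnTable:
    def __init__(self):
        # One dictionary of rows per node.
        self.storage = {node: {} for node in NODES}

    def write(self, partition_key, **columns):
        # Each row is its own dict, so rows may carry different columns.
        rows = self.storage[node_for(partition_key)]
        rows.setdefault(partition_key, {}).update(columns)

    def read(self, partition_key):
        # Only the node that owns the key is consulted.
        return self.storage[node_for(partition_key)].get(partition_key, {})
```

For example, `table.write("user:1", name="Ada", email="ada@example.com")` and `table.write("user:2", name="Linus", country="FI")` produce two rows in the same table with different column sets, each stored on whichever node its key hashes to.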
To work around this and optimize reads and writes, Scylla offers compaction strategies that favor one or the other. These strategies reorganize the SSTables to trade off storage, read, and write amplification. Each one has to be chosen carefully for the specific use case to avoid bottlenecks. These strategies will be the first topic next.
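As a rough picture of what compaction does, the sketch below merges several SSTables into one and keeps only the newest version of each key. It is a toy under stated assumptions: an SSTable is modeled as a list of `(key, (timestamp, value))` pairs sorted by key, and `compact` is a hypothetical helper, not one of Scylla's strategies.

```python
def compact(sstables):
    """Merge toy SSTables into one, keeping the newest version of each key.

    Each SSTable is a list of (key, (timestamp, value)) pairs.
    """
    newest = {}
    for table in sstables:
        for key, (timestamp, value) in table:
            # Keep a version only if it is newer than the one seen so far.
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # The merged table is written out sorted by key, like its inputs.
    return [(key, newest[key]) for key in sorted(newest)]


older = [("a", (1, "x")), ("b", (1, "y"))]
newer = [("a", (2, "x2")), ("c", (2, "z"))]
merged = compact([older, newer])
```

After the merge, a read for key `"a"` has one table to check instead of two, and the outdated version no longer takes up space. The cost is that the data had to be rewritten, which is the kind of trade-off the different strategies balance.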
Next up
As we have seen, despite having been around for years, SQL databases struggle with the mutable, unpredictable and distributed scenarios that come with big data. NoSQL offers the responsiveness, availability and flexibility needed for data-intensive scenarios. In the next texts we will cover some aspects of Scylla that allow developers to explore and optimize these characteristics.