We'll start with the storage layer. Understanding how it works is a good way to understand what happens at higher layers.
The storage layer provides:
• Parallelism and efficiency.
In other words, concurrent access. As soon as we try to benefit from parallelism, the problem of concurrent access inevitably arises: several operations go for the same resource at once, and it can be written incorrectly, corrupted in the middle of a write, or go wrong in who knows what other way.
• Reliability: recovery from failures.
The second problem is sudden failure. Providing reliability means not only making the solution as failure-resistant as possible, but also knowing how to recover quickly when something does happen.
When I talk about integrity, foreign keys and so on, everybody somehow hems and haws and says that they check all of that at the code level. But as soon as you say: "Let's take your salary as an example. They transfer your salary, but it never arrives", it somehow becomes clearer at once. I don't know why, but eyes light up and there is sudden interest in foreign keys and constraints.
Below is some code in a made-up programming language.
balance = 1000,
curr = 'RUR'
send_money(account_a, account_b, 100);
send_money(account_a, account_c, 200);
account_a->balance = ????
Let's say we have a bank account with a balance of 1,000 rubles, and there are two functions. We do not care right now how they are built inside; they transfer 100 and 200 rubles from account a to other bank accounts.
Now the question: how much money will end up on the balance of account a? Most likely, you will answer 700.
This is where the problems with concurrent data access begin. Because the language is fictitious, it is completely unclear how it is implemented, whether these functions are executed simultaneously, and how they are built internally. We can assume that send_money() is not an elementary action: it has to check the balance and the destination of the transfer, and perform checks 1 and 2. These are not elementary operations, and they take some time. That is why the order in which the elementary operations inside them are executed matters.
In the sequence "read the balance", "write the other balance", the important question is: when did we read this balance? If both functions do it simultaneously, a conflict occurs. They run roughly in parallel: each reads the same balance value, transfers the money, and writes back its own result.
There may be a whole family of conflicts, as a result of which the balance may end up at 800 rubles, at 700 rubles as it should be, or something breaks and the balance becomes null. Unfortunately, this does happen if you do not give it due attention. We will talk about how to fight it.
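To make the conflict concrete, here is a minimal Python sketch that replays every possible interleaving of the two transfers. It assumes a deliberately simplified model that is not in the original pseudocode: each transfer is just one read of account a's balance followed by one write. The names `run`, `valid`, `t1` and `t2` are hypothetical, introduced only for illustration.

```python
from itertools import permutations

def run(schedule):
    """Replay one interleaving of two transfers that debit the same account."""
    balance = 1000                      # shared balance of account a
    amounts = {"t1": 100, "t2": 200}    # the two send_money() calls
    seen = {}                           # the value each transfer read earlier
    for transfer, step in schedule:
        if step == "read":
            seen[transfer] = balance    # remember what we saw at read time
        else:                           # "write": uses the possibly stale value
            balance = seen[transfer] - amounts[transfer]
    return balance

steps = [("t1", "read"), ("t1", "write"), ("t2", "read"), ("t2", "write")]

def valid(schedule):
    """A transfer must read the balance before it writes it."""
    return all(
        schedule.index((t, "read")) < schedule.index((t, "write"))
        for t in ("t1", "t2")
    )

# Collect the distinct final balances over all valid interleavings.
outcomes = sorted({run(s) for s in permutations(steps) if valid(s)})
print(outcomes)  # [700, 800, 900]
```

Only the schedules in which one transfer finishes before the other starts give 700. When both read 1000 first, the later write overwrites the earlier one, so one of the debits is lost; in this simplified model that leaves either 800 or 900 on the account.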
In theory, everything is simple: we can execute the operations one by one and everything will be fine. In practice, there can be a lot of these operations, and doing them strictly sequentially can be problematic.
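The "one by one" approach can be sketched with a single global lock, so that no two transfers ever overlap. This is only an illustration in Python under assumptions of our own: the `Account` class, the `transfer_lock` name and the body of `send_money` are hypothetical, since the original deliberately does not say how the function is built.

```python
import threading

# One global lock: every transfer waits for the previous one to finish.
transfer_lock = threading.Lock()

class Account:
    def __init__(self, balance):
        self.balance = balance

def send_money(src, dst, amount):
    """Move money between accounts; the whole check-and-update is serialized."""
    with transfer_lock:
        if src.balance < amount:        # check the balance
            raise ValueError("insufficient funds")
        src.balance -= amount           # read and write happen under the lock
        dst.balance += amount

account_a, account_b, account_c = Account(1000), Account(0), Account(0)
threads = [
    threading.Thread(target=send_money, args=(account_a, account_b, 100)),
    threading.Thread(target=send_money, args=(account_a, account_c, 200)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account_a.balance)  # 700
```

The result is always 700, but only because the transfers never run in parallel: with many operations, every one of them queues on the same lock.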
If you remember, a few years ago there was a story when Oracle went down at Sberbank and card processing stopped. They then asked the public for advice and roughly indicated how many logs the database wrote. These are huge numbers and serious concurrency issues.
Executing operations serially is not a good idea for the simple reason that there are a lot of operations and we get no benefit from parallelism. We can, of course, divide operations into groups that won't conflict with each other. Such approaches exist too, but they are not very classical for modern databases.
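As a rough illustration of splitting operations into non-conflicting groups, the Python sketch below places each transfer into the first group that does not yet touch either of its accounts. The function name `group_non_conflicting` and the greedy strategy are assumptions made for this example, not a description of how any particular database does it.

```python
def group_non_conflicting(transfers):
    """Greedily place each (src, dst, amount) transfer into the first group
    whose transfers touch none of the same accounts."""
    groups = []  # list of (accounts_touched, transfers_in_group)
    for src, dst, amount in transfers:
        for touched, members in groups:
            if src not in touched and dst not in touched:
                touched.update((src, dst))
                members.append((src, dst, amount))
                break
        else:
            # Every existing group conflicts, so start a new one.
            groups.append(({src, dst}, [(src, dst, amount)]))
    return [members for _, members in groups]

transfers = [("a", "b", 100), ("a", "c", 200), ("d", "e", 50)]
groups = group_non_conflicting(transfers)
print(groups)
# [[('a', 'b', 100), ('d', 'e', 50)], [('a', 'c', 200)]]
```

Transfers inside one group share no accounts, so they could run in parallel, while the groups themselves run one after another. The two transfers from account a end up in different groups, which is exactly the pair that conflicted above.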