Channel: StackExchange Replication Questions

What exactly are secondary members acknowledging in replicasets with MongoDB?


I am trying to better understand the mechanics and behavior of MongoDB's replication features, and I hope to get some insight from someone who is familiar with MongoDB's internals. My aim is to understand MongoDB's durability guarantees in a high-availability setup. The data I want to store has uniqueness constraints (two unique indexes) that maintain a 1:1 relationship between two unique values in one document. Inconsistencies or only partial durability would be fatal for this 1:1 relation and hard to resolve once they have occurred, especially if they go undetected.

My question:

What exactly are secondaries acknowledging in a replica set when the write concern is "majority"?

Is it only that the single statement has been received and applied on the secondary, or that the whole oplog up to and including the new statement has been applied?

Explanation of what exactly I am looking for:

According to MongoDB's own documentation and some other sources (e.g. Aphyr's attempt at simulating network partitions and his follow-up), using write concern "majority" ensures that no write is lost, even in partition scenarios and the like. (Since I cannot post more than two links, I cannot reference MongoDB's documentation.) Stale reads are acceptable in my use case.

But one thing I cannot figure out from the documentation is what exactly is acknowledged by the replicas.

Replica sets are replicated via the oplog (a collection of the statements that modified data), which is distributed asynchronously. To avoid a rollback when a failed primary rejoins the cluster with writes that were never distributed (because of the asynchronous nature), they recommend using the majority write concern (Doc->Replica->Replica Concepts->High Availability->Rollback). To my understanding, there is a case where this cannot be achieved, depending on what exactly is acknowledged.

Let's say we have three instances, (1), (2), (3). One is primary (1,P) and the other two are secondaries (2,S), (3,S).

We now try to insert two new rows/docs, {1} and {2}. Due to network instability, the instances will at times be unavailable to each other.

Let's say the secondaries acknowledge single statements from the primary; then the following can happen:

  1. Instance (3,S) is unavailable, but inserting record {1} succeeds since 2 out of 3 instances are available (majority achieved).

Current state:

(1,P):{{1}}

(2,S):{{1}}

(3,S):{{}} unreachable

  2. Instance (2,S) becomes unavailable and instance (3,S) happens to come back online (not restarted, just reachable by the others again). Meanwhile, the primary has not initiated any asynchronous replication. Record {2} is inserted. Majority is achieved again (2 out of 3).

Current state:

(1,P):{{1},{2}}

(2,S):{{1}} unreachable

(3,S):{{2}}
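The two steps above can be reproduced with a toy model. This is only a sketch of the hypothesis "each statement is acknowledged in isolation"; it makes no claim about how MongoDB actually behaves, and the function name `insert_per_statement` and the node labels are made up for illustration.

```python
# Hypothetical model: a node stores and acknowledges each statement on its own,
# without checking that it already holds the earlier statements.
def insert_per_statement(nodes, reachable, doc):
    """Store `doc` on every reachable node; return True if a majority stored it."""
    acks = 0
    for name in reachable:
        nodes[name].append(doc)
        acks += 1
    return acks > len(nodes) // 2


nodes = {"1,P": [], "2,S": [], "3,S": []}
ok1 = insert_per_statement(nodes, ["1,P", "2,S"], 1)  # step 1: (3,S) unreachable
ok2 = insert_per_statement(nodes, ["1,P", "3,S"], 2)  # step 2: (2,S) unreachable
print(ok1, ok2, nodes)
# True True {'1,P': [1, 2], '2,S': [1], '3,S': [2]}
```

Both inserts report a majority, yet no secondary holds both documents.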

If (1,P) goes down, an election will happen and (2) or (3) will become primary. But that makes the database inconsistent, since the new primary has no knowledge of the other document. Some sort of rollback is then bound to happen when the original primary comes back online, and data is lost.
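To spell out what would be lost, here is a small sketch that compares the writes reported as majority-acknowledged with the data each election candidate holds in the state described above. The helper `lost_on_failover` is hypothetical and only does set arithmetic; it does not model a real election.

```python
def lost_on_failover(acknowledged, candidate_data):
    """Return the acknowledged docs that a newly elected primary would not have."""
    return sorted(set(acknowledged) - set(candidate_data))


acknowledged = [1, 2]                    # both inserts were reported as successful
secondaries = {"2,S": [1], "3,S": [2]}   # state after the second insert
lost = {name: lost_on_failover(acknowledged, data)
        for name, data in secondaries.items()}
print(lost)
# {'2,S': [2], '3,S': [1]}
```

Whichever secondary wins, one acknowledged document is missing from the new primary.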

If acknowledgements are made only after applying all oplog entries up to the latest one, this should not happen. (3,S) would have both entries when acknowledging {2}, and would then also win the election. No data is lost.

I am well aware that this is unlikely to happen. But if it does, I don't see how it could be detected or avoided unless automatic failover is simply disabled.
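The alternative hypothesis can be sketched the same way: a secondary may only acknowledge a write once it has copied every earlier entry of the primary's log as well. Again, this is a toy model under that assumption, not MongoDB's implementation; `replicate_prefix` and `insert_with_prefix_ack` are names invented for this example.

```python
def replicate_prefix(primary_log, secondary_log):
    """Copy every entry the secondary is missing, in order; return its new length."""
    secondary_log.extend(primary_log[len(secondary_log):])
    return len(secondary_log)


def insert_with_prefix_ack(logs, primary, reachable, doc):
    """Append `doc` on the primary; a secondary acks only once it has caught up."""
    logs[primary].append(doc)
    target = len(logs[primary])
    acks = 1  # the primary itself
    for name in reachable:
        if name != primary and replicate_prefix(logs[primary], logs[name]) >= target:
            acks += 1
    return acks > len(logs) // 2


logs = {"1,P": [], "2,S": [], "3,S": []}
ok1 = insert_with_prefix_ack(logs, "1,P", ["2,S"], 1)  # step 1: (3,S) unreachable
ok2 = insert_with_prefix_ack(logs, "1,P", ["3,S"], 2)  # step 2: (2,S) unreachable
print(ok1, ok2, logs)
# True True {'1,P': [1, 2], '2,S': [1], '3,S': [1, 2]}
```

In this model the secondary that is reachable in the second step ends up with both documents before it acknowledges, so at least one election candidate holds every acknowledged write.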

