Channel: StackExchange Replication Questions

SOLR replication: different behavior for network cut off vs. machine restart


We are trying to set up a SolrCloud environment with 1 shard and 2 replicas (one of which is the leader). The replicas are coordinated by a 3-instance ZooKeeper ensemble.
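For reference, a topology like this (one shard, two replicas, leader election via ZooKeeper) is typically created through the Collections API CREATE action. The minimal sketch below only builds the request URL. The host, port and collection name are hypothetical placeholders, and config-set parameters are deliberately omitted.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with your own Solr node and collection name.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "test_collection"


def create_collection_url(base: str, name: str, shards: int, replicas: int) -> str:
    """Build a Collections API CREATE request for the given topology."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replicas,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # 1 shard, 2 replicas: ZooKeeper elects one of the two replicas as leader.
    print(create_collection_url(SOLR_BASE, COLLECTION, shards=1, replicas=2))
```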

The setup seems fine during normal operation: data is replicated at runtime.

Now we are trying to simulate failures in several scenarios:

  1. Shutting down one of the replicas, in two variants: the leader and the non-leader
  2. Cutting off the network so that the non-leader replica goes down

In both cases, data is written continuously to the SolrCloud cluster.
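The continuous writes during such a test could come from a small loop like the one below. It is a minimal sketch assuming the JSON update handler at /update and a schema whose only required field is id; the URL, collection name and id format are hypothetical. commitWithin=1000 asks Solr to make each batch visible within about one second. The URL should point at a node that stays up during the experiment; otherwise the writer itself fails while that node is down.

```python
import json
import time
import urllib.request
from itertools import count
from typing import Dict, Iterator, List

# Hypothetical endpoint; adjust host, collection and commitWithin to your setup.
UPDATE_URL = "http://localhost:8983/solr/test_collection/update?commitWithin=1000"


def doc_batches(batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield an endless stream of batches with monotonically increasing ids."""
    ids = count()
    while True:
        yield [{"id": f"doc-{next(ids)}"} for _ in range(batch_size)]


def post_batch(url: str, batch: List[Dict[str, str]]) -> int:
    """POST one JSON array of documents to the update handler; return HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def write_continuously(url: str, batch_size: int = 10, pause_s: float = 0.5) -> None:
    """Keep indexing while a replica is stopped or cut off from the network."""
    for batch in doc_batches(batch_size):
        post_batch(url, batch)
        time.sleep(pause_s)
```

Because ids increase monotonically, the highest id on each replica afterwards gives a quick hint of where replication stopped.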

CASE 1: The replication process starts after the failed machine boots up again. The complete data set is present in both replicas, and everything works fine.
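Whether "the complete data set is present in both replicas" can be checked by querying each core directly with distrib=false, so that each request only counts that core's own documents. The core URLs below are hypothetical (use the names shown in the Solr admin UI), and equal counts are only meaningful once indexing has stopped and a commit has made the documents visible on both cores.

```python
import json
import urllib.request
from typing import Dict
from urllib.parse import urlencode

# Hypothetical core URLs: use the actual core names of your two replicas.
REPLICA_CORES = {
    "leader": "http://solr1:8983/solr/test_collection_shard1_replica_n1",
    "follower": "http://solr2:8983/solr/test_collection_shard1_replica_n2",
}


def count_url(core_url: str) -> str:
    """Query one core only (distrib=false) and ask for the hit count, no rows."""
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def fetch_count(core_url: str) -> int:
    """Return numFound for a single core."""
    with urllib.request.urlopen(count_url(core_url), timeout=10) as resp:
        return json.load(resp)["response"]["numFound"]


def counts_match(counts: Dict[str, int]) -> bool:
    """True when every replica reports the same number of documents."""
    return len(set(counts.values())) <= 1
```

For example, {name: fetch_count(url) for name, url in REPLICA_CORES.items()} can be passed to counts_match() after both cases to compare the outcome.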

CASE 2: Once reconnected to the network, the non-leader replica starts the recovery process, but for some reason the new data from the leader is not replicated to the previously failed replica.
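To see what state the reconnected replica actually ends up in, the Collections API CLUSTERSTATUS action reports each replica's state (for example active, recovering, down or recovery_failed) and which replica is the leader. The sketch below fetches and flattens that response. The host and collection name are hypothetical, and it assumes the usual cluster → collections → shards → replicas layout of the JSON response.

```python
import json
import urllib.request
from typing import List, Tuple
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address
COLLECTION = "test_collection"            # hypothetical collection name


def fetch_cluster_status(base: str, collection: str) -> dict:
    """Call CLUSTERSTATUS for one collection and return the parsed JSON."""
    params = {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    url = f"{base}/admin/collections?{urlencode(params)}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def replica_states(status: dict, collection: str) -> List[Tuple[str, str, str, bool]]:
    """Flatten CLUSTERSTATUS into (shard, core_node, state, is_leader) rows."""
    rows = []
    shards = status["cluster"]["collections"][collection]["shards"]
    for shard_name, shard in shards.items():
        for node_name, replica in shard["replicas"].items():
            # The leader flag is reported as the string "true" when present.
            is_leader = replica.get("leader") == "true"
            rows.append((shard_name, node_name, replica["state"], is_leader))
    return rows
```

Polling replica_states(fetch_cluster_status(SOLR_BASE, COLLECTION), COLLECTION) after reconnecting the network shows whether the non-leader ever returns to active, or stays in recovering.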

Comparing the logs from both cases, I don't understand why, in CASE 2, SOLR reports

RecoveryStrategy ###### currentVersions as populated, but RecoveryStrategy ###### startupVersions=[[]] (empty),

whereas in CASE 1 the RecoveryStrategy ###### startupVersions list is filled with the same entries that appear in currentVersions in CASE 2.
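As the names suggest, startupVersions appears to be the list of recent update versions the replica had when the core started, and currentVersions the list it has now. The hypothetical helper below is not Solr's RecoveryStrategy code; it only illustrates the kind of comparison these two lists allow: locating the newest startup version inside the current list to count how many updates arrived since startup. It assumes both lists are ordered newest first. With an empty startupVersions list, as in CASE 2, there is no reference point for such a comparison.

```python
from typing import List, Optional


def updates_since_startup(startup_versions: List[int],
                          current_versions: List[int]) -> Optional[int]:
    """Count versions in current_versions that are newer than the newest startup version.

    Hypothetical illustration. Both lists are assumed to be ordered newest-first.
    Returns None when startup_versions is empty (nothing to compare against)
    or when the newest startup version no longer appears in current_versions.
    """
    if not startup_versions:
        return None
    newest_at_startup = startup_versions[0]
    try:
        # Index of the newest startup version = number of newer entries before it.
        return current_versions.index(newest_at_startup)
    except ValueError:
        return None
```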

The general question is: why does restarting SOLR result in a successful recovery, while reconnecting the network does not?

Thanks for any tips / leads! Greg

