I have two HBase clusters (running version 0.98.6-cdh5.3.8) with multiple tables being replicated from one cluster to the other. Some of the tables no longer need to be replicated, so I want to change the REPLICATION_SCOPE value from 1 to 0 for those tables. One caveat: I cannot easily stop either of the clusters, so this has to be done in a live environment.
All the documentation I found online indicates that you need to disable the table before you can alter the replication scope, for example:
hbase> disable 'example_table'
hbase> alter 'example_table', {NAME => 'example_family', REPLICATION_SCOPE => '1'}
hbase> enable 'example_table'
However, I cannot find any information on what happens to a live table when it is temporarily disabled and then re-enabled. Will there be gaps in the data, or will the application that uses the table crash?
Also, I have read somewhere that in newer releases of HBase (> 0.94) you can alter tables without having to disable them. All the examples I found online regarding the alter HBase shell command do not mention anything about having to disable the table first.
To test this out, I altered the replication scope value of an 'enabled' table in a sandbox environment and it appears to have accepted the change:
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
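For reference, the change I am after would look roughly like the following. This is only a sketch that reuses the placeholder table and column family names from the example above (not my real ones), with the scope set to 0 to turn replication off and no disable/enable around it. The describe call is just there to check that the new REPLICATION_SCOPE value shows up on the column family afterwards:
hbase> alter 'example_table', {NAME => 'example_family', REPLICATION_SCOPE => '0'}
hbase> describe 'example_table'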
So now I am confused. My questions are as follows:
Do I actually NEED to disable a table before I alter the replication scope?
If so, how will this affect everything else that relies on that table being enabled?
Or is there a better way of turning off replication for a particular table while keeping everything else running?