I've followed the Apache "Single Node Setup" instructions, which set dfs.replication on the single node.
I then followed the "Cluster Setup" guide, but it doesn't mention this property, so I don't know whether it should be set on the Namenode, or also (or only) on the Datanodes.
I have also read that setting multiple (comma-separated) paths in dfs.datanode.data.dir on Datanodes will replicate data on all paths.
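To make this concrete, here is a rough sketch of the kind of hdfs-site.xml I have in mind. The replication value of 1 is what the Single Node Setup guide uses; the two directory paths are made-up placeholders for illustration only, not taken from any guide.

```xml
<configuration>
  <!-- Value used by the Single Node Setup guide -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Hypothetical paths: two local directories, comma-separated -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/disk1/hdfs,/data/disk2/hdfs</value>
  </property>
</configuration>
```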
So my question is: on which node(s) does dfs.replication have an effect? And if multiple paths are set for dfs.datanode.data.dir, are these extra replications independent and local to each Datanode, or are they also tied in some way to the dfs.replication factor?
Also, what is the use of this extra local replication on Datanodes when the data is already replicated on other nodes?