I've been reading HDFS balancer questions this weekend and couldn't find an answer to mine in what I've seen here.
So here it goes: "I have a Hadoop grid with replication factor = 4 and 4 DNs. I expect all of them to hold roughly the same blocks and have similar disk usage across the grid, but that's NOT what's happening. One DN is at 50% usage while the others are at around 20%."
I ran hdfs balancer this weekend to try to balance disk usage, but nothing happened. I always end up with 5 idle tasks doing nothing, even though the job detects almost 300GB of imbalanced data on the grid. I've tried tweaking the threshold and policy parameters, to no avail.
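To make the imbalance concrete, here is a rough sketch of the arithmetic as I understand it. It assumes all four DNs have equal capacity (an assumption, so the cluster average is just the mean of the per-node percentages) and uses the balancer's default threshold of 10%. The node names and the helper function are made up for illustration.

```python
def nodes_outside_threshold(utilization_pct, threshold_pct=10.0):
    """Return nodes whose utilization differs from the cluster average
    by more than threshold_pct, mapped to their deviation in points.

    Assumes equal capacity on every node, so the cluster average is the
    plain mean of the per-node utilization percentages.
    """
    avg = sum(utilization_pct.values()) / len(utilization_pct)
    return {
        name: round(used - avg, 2)
        for name, used in utilization_pct.items()
        if abs(used - avg) > threshold_pct
    }


# Hypothetical node names; the percentages are the ones described above.
usage = {"dn1": 50.0, "dn2": 20.0, "dn3": 20.0, "dn4": 20.0}
print(nodes_outside_threshold(usage))
```

Under those assumptions the average is 27.5%, so only the 50% node falls outside the threshold, which seems consistent with the balancer reporting data to move.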
Any tips? Thanks!