I am looking for a suitable MySQL replication technology.
I'll try to explain our use case first.
We have several applications:
- 1 public web application
- multiple customized enterprise web applications
The public app has its own DB and each enterprise app also has a separate DB.
The applications are very similar (at least on the DB side). Enterprise applications usually have a few customizations depending on the customer's needs, but for the sake of simplicity, let's say the DB schemas for all applications are identical.
A lot of the DB data should be the same in all applications (lookup tables, countries, currencies, built-in items, our "knowledge base" entries...). Pretty much everything that doesn't directly depend on the user account should be the same in all applications.
This is pretty hard to maintain. For example, when we add a new built-in item, we have to add it to every DB.
So, we would like to partition our data in multiple databases:
- 1 shared DB (which contains all shared tables and data)
- 1 DB for each app (which contains only the app-specific tables)
On the application side, we are working with NHibernate. It can handle multiple databases seamlessly, as long as they are on the same host.
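To make the intended layout concrete, here is a rough sketch of the kind of cross-database join we rely on. It uses Python's built-in sqlite3 module and SQLite's ATTACH purely as a stand-in for two MySQL databases on the same host; the `shared.countries` and `accounts` tables and their columns are made-up examples, not our real schema.

```python
import sqlite3


def build_demo():
    # One connection plays the role of the single DB host.
    conn = sqlite3.connect(":memory:")
    # "shared" stands in for the shared DB; "main" is the app-specific DB.
    conn.execute("ATTACH DATABASE ':memory:' AS shared")
    conn.execute(
        "CREATE TABLE shared.countries (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE main.accounts "
        "(id INTEGER PRIMARY KEY, owner TEXT, country_id INTEGER)"
    )
    # Hypothetical sample rows, only here so the join returns something.
    conn.executemany(
        "INSERT INTO shared.countries VALUES (?, ?)",
        [(1, "Croatia"), (2, "Germany")],
    )
    conn.executemany(
        "INSERT INTO main.accounts VALUES (?, ?, ?)",
        [(10, "alice", 2), (11, "bob", 1)],
    )
    return conn


def accounts_with_country(conn):
    # App-specific rows joined against a lookup table in the shared DB.
    rows = conn.execute(
        "SELECT a.owner, c.name "
        "FROM main.accounts AS a "
        "JOIN shared.countries AS c ON c.id = a.country_id "
        "ORDER BY a.id"
    )
    return rows.fetchall()
```

In MySQL the equivalent query would simply qualify each table with its database name, which is what works today only while both databases live on the same host.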
We have a couple of problems with this approach:
- we are expecting a lot more enterprise applications in the future, which means a lot of data. Keeping all the databases on the same host is just not a viable long-term option
- there is a single point of failure. If the DB host goes down, all our apps go down
Now, I am looking into clustering solutions to take care of these problems, but don't really know what technology I should go with.
So, to summarize, these are our clustering requirements:
- distribute multiple DBs in the cluster
- make the applications see the cluster as a single host, so we can join between the tables physically located on different hosts
- have a load balancer
- have a failover mechanism
- decent performance when switching from a standard InnoDB single host DB (without the need for altering a lot of queries)
Currently we are using a single host DB for each application (without the shared DB) and optionally another host for master-slave replication, with a failover balancer.
I have been looking at different solutions, but have some questions about all of them:
MySQL Cluster (NDB) looks very interesting. The biggest problem I see is that it stores everything in memory. Since there are multiple databases (and more will be added continuously), we can't hold everything in RAM. Is there a way to configure NDB to keep only a subset of the databases on specific data nodes, so that no node needs to allocate RAM for the entire data set?
Galera Cluster also looks interesting. It seems to require less modification than NDB when switching from a single host to a cluster. DBs are stored on disk, not in memory, which is more scalable for us. But it has a similar problem, only with disk instead of memory. If I understood correctly, it replicates the whole DB to every node, and that is a lot of data. Disk space is much less of a problem than RAM, but I am worried about performance problems when handling such a huge database. Is there any way to distribute partial data in Galera?
All other suggestions and technologies are welcome...
Am I barking up the wrong tree? Could the same be done with something like HiveDB or some other project?
Or should I rethink the whole shared DB concept? I know we can get the same effect by keeping a single separate DB for each application and tweaking the replication config, so that all "shared" data is replicated from our public app to all the enterprise apps. But that still takes some maintenance and can lead to errors. I think a shared DB is a much more elegant solution, at least from an architectural point of view.
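For reference, this is roughly what I mean by tweaking the replication config. The small Python helper below is only a sketch that prints the filter lines each enterprise replica would need. It assumes the database has the same name on the public source and on the replica (`app_db` is a placeholder), and the table names are just examples. `replicate-do-table` is the standard MySQL replica option for table-level filtering.

```python
def replication_filter_lines(db_name, shared_tables):
    """Return my.cnf lines that limit a replica to the given shared tables.

    db_name and shared_tables are placeholders; the database is assumed
    to be named identically on the source and on the replica.
    """
    lines = ["[mysqld]"]
    # Deduplicate and sort so the generated config is stable between runs.
    for table in sorted(set(shared_tables)):
        lines.append(f"replicate-do-table={db_name}.{table}")
    return lines


if __name__ == "__main__":
    # Hypothetical list of shared tables.
    for line in replication_filter_lines("app_db", ["countries", "currencies"]):
        print(line)
```

Every new shared table means regenerating and redeploying this list on every enterprise replica, which is exactly the kind of maintenance I would like to avoid.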