MySQL in a DataWarehouse environment?

Let’s start with a bunch of links shall we?

  1. Flickr’s MySQL Database
  2. Second Life’s MySQL Database
  3. A small Data Warhouse running on MySQL; A big DataWarehouse on MySQL

Obviously, any open-source database that can (given the right hardware) reliably and quickly handle a 3TB database deserves some serious respect.

Though I am a HUGE Oracle fan, given these statistics, I would expect to be hearing some talk about using MySQL in the Data Warehousing projects that my company works on (atleast as a alternative to the more heavywieght database vendors). But I can’t say I have.

My own guess is that this stems from a few different reasons:

  • Corporate Standards – most companies have IT handbooks that allow no freedom of choice when it comes to database technologies
  • The Linux “bent“ – much of the documentation available on the Web today regarding configuring and tuning a MySQL instance assumes you have a Linux back-end. In a corporate environment, it’s more likely to be either Windows or an odd flavour of UNIX like AIX. Good luck finding info on how to handle a MySQL database in that setup
  • Uncertain future – MySQL doesn’t support transactions or hot backups directly. For both you have to rely on InnoDB, its storage engine and the InnoDB Backup tool. In a commercial environment, InnoDB Backup is critical as it allows for hot backups, which is a must for all production systems. Worryingly for MySQL though, InnoDB is now part of Oracle and no one knows what Oracle will do once the current contract between InnoDB and MySQL expires. Given Oracle’s history with the OLAP tool Express, I don’t have very high hopes.

I guess the last one might not seem as important, but it happened just at a time when MySQL was gaining a lot of momentum and that’s what makes it more damaging.

Also the articles linked above on using MySQL in a Data Warehouse suggest using custom-built ETL tools and that just doesn’t make sense for me. Custom-building ETL tools makes things like building slowly changing dimensions in a Data Warehouse an very difficult task, whereas it becomes much less challenging when you have a recognized ETL tool to take of the grunt work like converting codes to IDs.

While I am no doubt impressed by the scalability of a MySQL database, I’d say it has a while to go before it emerges as a real contender in the Data Warehousing space.

PS: For those of you interested in how Web2.0 companies use databases (as highlighted in links 1 and 2 at the beginning of the article, here’s a counterpoint)