Last week, a GridIron representative published a response to my original post about solving Oracle performance problems. The response includes some valid points but shows that the author is not familiar with the Kaminario K2’s full capabilities.
GridIron: There is no such thing as non-disruptive deployment of new storage arrays. In a production environment, halting a system to perform data migration, validate that migration and then restart the environment can be time and resource intensive. Converting existing scripts and operating procedures to use a new vendor’s snapshot features can be equally complicated and risky. With GridIron’s transparent network-based deployment, no changes are required to business processes or applications and there is no data migration involved – it is truly non-disruptive!
Kaminario Response: Yes, in many cases customers will plan a downtime window for loading data onto the Kaminario K2. Where downtime is not an option, a customer can instead build a mirror dynamically (with ASM or at the OS level) and, once the mirror is synchronized, decide whether to drop the old storage or keep it. GridIron claims to be a truly non-disruptive solution, but I wonder what happens when one of their boxes fails. According to a GridIron document, a failed unit can be bypassed through simple zoning changes in the Fibre Channel fabric. Changing Fibre Channel zoning in an active system can affect the entire fabric and is not a recommended operation. To avoid it, customers would need a mirroring setup with two GridIron boxes (one acting as a mirror) and a sophisticated configuration to make the solution truly HA. That seems very expensive, and I am not sure how feasible it is.
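The ASM mirror approach I mention can be done entirely online. As a sketch (hypothetical disk and disk group names; exact syntax and paths depend on the Oracle release and the environment), adding the new array's LUNs and dropping the old ones in a single statement lets ASM migrate the data with one online rebalance while the database stays up:

```sql
-- Hypothetical names; run against an ASM instance.
-- Add LUNs from the new array and drop the old array's disks in one
-- statement, so ASM moves the data with a single online rebalance.
ALTER DISKGROUP data
  ADD  DISK '/dev/oracleasm/disks/NEW_LUN1',
            '/dev/oracleasm/disks/NEW_LUN2'
  DROP DISK OLD_LUN1, OLD_LUN2
  REBALANCE POWER 4;

-- Monitor migration progress and the estimated time to completion:
SELECT operation, state, est_minutes FROM v$asm_operation;
```

Once the rebalance completes, the old disks are empty and can be unpresented from the hosts.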
GridIron: The difference in size between Oracle block cache and the dataset is so vast that the Oracle block cache cannot effectively hold anything but the hottest blocks. SAN based caches can be scaled to be many times the size of the largest memory footprints (impossible to construct using server DRAM) yet be a fraction of the dataset size and the (implicitly larger) physical storage footprint. Using sophisticated caching algorithms based on performance feedback is the key to making caches effective for large datasets. GridIron customers who evaluated server-based, storage-based and network-based caching using flash can testify to the advantages that proper algorithms bring to the picture.
Kaminario Response: In the end, it all boils down to performance. No matter how good the algorithms are, a SAN-based cache will not outperform placing the entire dataset on a Kaminario K2. For one thing, no caching algorithm can predict truly random access (that is why it is called random). This is especially true given the K2's scale-out ability, which supports large capacities. If you invest in flash, invest in the solution that gives you the best performance and value. I am not aware of any GridIron customer that evaluated the Kaminario K2 and achieved better performance from GridIron; I am open to that challenge.
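To illustrate the point about random access (a toy model, not a benchmark of either product): for uniformly random block reads, no replacement policy can push the hit rate above cache_size / dataset_size, as a quick LRU simulation shows. All sizes below are made up for illustration.

```python
import random
from collections import OrderedDict

def hit_rate(dataset_blocks, cache_blocks, accesses, seed=0):
    """Simulate an LRU cache under uniformly random block reads."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys in LRU order
    hits = 0
    for _ in range(accesses):
        block = rng.randrange(dataset_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used
    return hits / accesses

# A cache holding 10% of a uniformly, randomly accessed dataset
# converges to roughly a 10% hit rate, no matter how clever the policy.
rate = hit_rate(dataset_blocks=100_000, cache_blocks=10_000, accesses=200_000)
print(f"hit rate ~ {rate:.2f}")
```

Skewed workloads cache better, of course; the point is that the dynamic, random portion of an application's I/O sees little benefit.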
GridIron: Holding a dataset vs. making that dataset available at high bandwidth are entirely different problems. Holding data on a single SSD drive would only be available at the bandwidth of that single drive and the queue depth of that drive’s controller. Data within a storage array populated with SSD disks is limited by the RAID structure – RAID 5 cannot deliver more bandwidth than 4 disks and cannot deliver higher IOPS than 4 disks (assuming perfectly spread random reads). GridIron architecture can spread data over 100 disks per appliance to deliver highest concurrent bandwidth at levels not available from primary storage arrays.
Kaminario Response: The author did not really address my point that many applications have dynamic access patterns and will get very little benefit from caching. It is also clear that the author is not familiar with the K2’s internal architecture: we use neither RAID 5 nor a single SSD drive. Kaminario uses a unique RAID10 HD (Hybrid Distributed) algorithm in which all internal SSD drives participate. That is how we deliver throughput and IOPS like no other storage.
GridIron: Write throughput of spinning disks is higher than SSDs. If a database is write throughput bound, there is no doubt you would be wasting money on an all-flash array.
Kaminario Response: Kaminario is different from some of the newer SSD players, who stack drives without true scale-out performance. To match the write throughput the Kaminario K2 supports, you would need hundreds or thousands of spinning disks.
Additionally, for many Oracle applications the write latency, not the throughput, is the bottleneck. This is especially true when lazy writes are not used (e.g. flushing the redo buffer at commit, or performing sort operations on disk). A network cache solution does not improve write latency.
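As a toy illustration of why redo write latency matters at commit time (the latencies below are invented for illustration, not measurements of any product): a commit cannot complete until the redo buffer reaches stable storage, so a read cache in front of slow disks leaves that wait untouched.

```python
# Toy model of commit ("log file sync") latency.
# All latency figures are illustrative assumptions, not measured values.
def commit_latency_ms(redo_write_ms, cpu_ms=0.05):
    # The commit cannot return until the redo write is durable on disk.
    return cpu_ms + redo_write_ms

hdd_commit = commit_latency_ms(redo_write_ms=5.0)    # spinning-disk write
flash_commit = commit_latency_ms(redo_write_ms=0.3)  # flash write
print(f"HDD commit:   {hdd_commit:.2f} ms")
print(f"flash commit: {flash_commit:.2f} ms")
# A read cache in front of the HDD does not shorten the 5 ms write path.
```

A commit-heavy workload repeats this wait thousands of times per second, so the write path, not read caching, sets the floor on response time.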
The author argued that ASM resilvering will bring the database server to its knees. This, too, can be managed: DBAs control the rebalance overhead through the rebalance power setting. The author’s throughput calculations are also incorrect.
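For reference, rebalance power can be adjusted even while a rebalance is running. A sketch (hypothetical disk group name; the valid power range depends on the Oracle release):

```sql
-- Throttle a running or future rebalance to limit its I/O impact
-- on production (hypothetical disk group name "data").
ALTER DISKGROUP data REBALANCE POWER 1;

-- Monitor the rebalance and its estimated time to completion:
SELECT operation, state, power, est_minutes FROM v$asm_operation;
```

A low power value trades a longer resync for minimal interference with the application; a high value does the opposite.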
Organizations evaluating storage solutions should consider:
- What performance do you get for your investment?
- Is the storage really highly available?
- Is the storage ready for enterprise deployment?
Hold vendors accountable. Compare and decide for yourself which solutions best meet your needs. If you are not familiar with Kaminario, you can sign up for a free application performance assessment. We encourage further discussion about performance. Check out my previous post about performance, IOPS and latency.