Centiq Blog

Aug 10
2011

OpenStack IO performance unreliable due to DiskScrubbing

Posted by: Robin Webster in Infrastructure

Tagged in: Virtualisation , Storage , Robin Webster , Cloud

Following a bit of reading (not enough, as it turns out), including an article on cloud performance on thebitsource, we concluded that our small-scale development app, which relies on a single MySQL server, would be more than catered for by a Rackspace cloud server. We had a small (but consistent) IO requirement and modest memory/CPU needs.

The first month of service was a complete success and we began to consider migrating live systems to cloud servers. Then we were suddenly hit by a dramatic drop in IO performance lasting 6 hours, which made the instance unusable during the online day.

 

The average wait time for IO increased to 60ms, compared to less than 1ms during normal service. The Rackspace support team responded quickly to our support ticket to let us know it was due to DiskScrubbing.
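For anyone wanting to watch the same figure on their own Linux instance, here is a minimal sketch of how an average IO wait can be derived from two snapshots of /proc/diskstats. It assumes the standard diskstats field layout and divides the time spent on completed reads and writes by the number of completed IOs, which is how I understand the `await` column in iostat to be calculated; it is not necessarily the exact tool we used. The function names and the device name `sda` are illustrative only.

```python
import time


def parse_diskstats(text):
    """Return {device: (completed_ios, ms_spent_on_io)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        name = fields[2]
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        stats[name] = (reads + writes, read_ms + write_ms)
    return stats


def average_wait_ms(before, after, device):
    """Average milliseconds per completed IO between two snapshots."""
    ios = after[device][0] - before[device][0]
    ms = after[device][1] - before[device][1]
    return ms / ios if ios else 0.0


def sample(device, interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return the average wait."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return average_wait_ms(before, after, device)
```

Calling `sample("sda")` on the instance every few minutes and logging the result would show whether a slowdown lines up with a jump from around 1ms to tens of milliseconds.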

The OpenStack system initiates a DiskScrubbing procedure each time an OS image is deleted, to ensure your data is not still lurking on the disk when you leave. Writing lots of zeros across an area of the disk kills IO on that disk for other users.

So I guess that for the first month we were either on our own on a server or just had quiet neighbours. We soon guessed that this problem was bound to recur when there are APIs available to quickly create and destroy instances, and we were not wrong. The problem came back over and over; the worst period was 18 hours of dreadful IO. We asked to be moved to a new host. The move was seamless, but our new neighbours were just as noisy and our service was often unusable.

Rackspace's only suggested resolution was to design our app better for the cloud. That would be fine if we were ready to scale beyond a single cloud server instance, which for this particular app we are not. I had wrongly assumed that storage could be provisioned from outside the server you are on, to remove this IO bottleneck. But at the time of writing only the Rackspace Cloud Files service was available, which they clearly state is not suitable for database environments. Feeling more than a little burnt by the whole experience, we swapped to a dedicated host from UK2 which meets our needs for cost and performance. (The Rackspace entry point for dedicated servers was quite a significant jump from the cloud offering.)

I'd like to try the experiment again. Having re-read the comments in response to the above bitsource article recommending Amazon's Elastic Block Store (EBS), I think we would have a very different experience. But with billing per IO and by the size of disk used, I think we could quickly get up to the cost of a dedicated host. If I manage to convince the developers to risk the pain again, I will give it a go. Watch this space!
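To make that cost worry a little more concrete, here is a rough sketch of the arithmetic, assuming a bill made up of a fixed instance charge, a charge per GB of volume per month and a charge per million IO requests. Every name and price below is a placeholder I have made up for illustration; none of them are real quotes from Amazon or anyone else.

```python
def monthly_cloud_cost(instance_monthly, size_gb, io_requests,
                       price_per_gb_month, price_per_million_io):
    """Estimated monthly bill: instance + volume size charge + per-IO charge."""
    storage = size_gb * price_per_gb_month
    io = (io_requests / 1_000_000) * price_per_million_io
    return instance_monthly + storage + io


def break_even_io_requests(dedicated_monthly, instance_monthly, size_gb,
                           price_per_gb_month, price_per_million_io):
    """IO requests per month at which the cloud bill matches a dedicated host."""
    remaining = dedicated_monthly - instance_monthly - size_gb * price_per_gb_month
    if remaining <= 0:
        return 0.0  # the cloud option already costs more before any IO is billed
    return remaining / price_per_million_io * 1_000_000


# Placeholder figures only, all in the same (hypothetical) currency.
estimate = monthly_cloud_cost(
    instance_monthly=40.0, size_gb=100, io_requests=50_000_000,
    price_per_gb_month=0.10, price_per_million_io=0.10,
)
```

Plugging in real prices and an IO count measured on the current server would show how close a busy month gets to the dedicated host's price.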

Comments (1)
written by Robin, 09:16 August 11, 2011
Since writing this blog I have found a couple more great blogs on the topic of IO on the cloud, suggesting that EBS might not be a straightforward answer to our problem.
http://victortrac.com/EC2_Ephe...BS_Volumes
http://blog.rightscale.com/200...explained/

It seems that the only cost-effective way to achieve a reliable, consistent average disk wait time of less than 5ms is an entry-level dedicated server. We currently pay 100GBP per month for a suitable server with RAID1 internal disks to meet this objective. Financially it makes perfect sense; from an environmental perspective it makes no sense at all. We barely even register any CPU utilisation, so there is a pair of PSUs in that machine that are going to be wasting power. Hopefully in time there will be more guarantees available on minimum IO performance that will make cloud servers an option for this kind of workload.
