Centiq Blog


Dec 04
2009

Storage de-duplication and keeping down the costs of storing information

Posted by: Alastair Williams in IT Industry

Tagged in: Storage


Despite the rise in hardware-based de-duplication technology within the storage market, I believe the impact and benefits of these tools are greatest lower down the storage tiers. Where an application requires Tier 1 storage, many storage managers have discounted de-dupe because the service design requires data to be stored on multiple, widely separated devices, and because they are nervous of the performance implications where DBAs are already stretched trying to maintain service levels. This was the case when we discussed the growing storage pain of a worldwide investment bank. Moving down to lower-priority applications or backup environments that can live on slower storage, de-duplication becomes more relevant. However, storage costs in these tiers are typically lower to begin with, so carefully weigh the effort against the return.

 

Before undertaking a de-duplication review, establish which tier is causing the most pain. If it is the Tier 1 costs, then investing in archive technology rather than changing the hardware platform will probably give far greater returns. If it is the Tier 2 test and development environments, right-sizing the test data and then de-duplicating will give the best returns. Tier 3 file systems and backup systems tend to return the greatest de-dupe benefits.

 

The following hypothetical table for a 10TB business-critical database application assumes that 75% of the data is historic and can be archived (we have seen both higher and lower levels achieved), and compares that approach with de-duplication.

 

 

| Storage level and database instance | Existing environment | With archive (75% historic information) and right-sized test environments | With de-duplication |
|---|---|---|---|
| Tier 1 – Production | 10TB | 2.5TB | 10TB |
| Tier 1 – Recovery clone | 10TB | 2.5TB | 10TB (different array) |
| Tier 1 – HA of Production | 10TB | 2.5TB | 10TB (different array) |
| Tier 1 – HA of Recovery clone | 10TB | 2.5TB | 10TB (different array) |
| Tier 2 – Development copy | 10TB | 500GB | 10TB (separate array), maybe illegal |
| Tier 2 – Development clone | 10TB | 500GB (or 50GB if deduped) | 50 - 500GB with block de-dupe |
| Tier 2 – Development clone | 10TB | 500GB (or 50GB if deduped) | 50 - 500GB with block de-dupe |
| Tier 2 – Test / UAT | 10TB | 2.5TB | 10TB (separate array) |
| Tier 2 – Archive | n/a | 7.5TB | n/a |
| Tier 2 – Archive HA | n/a | 7.5TB | n/a |
| Total required capacity and footprint | 40TB tier 1 + 40TB tier 2 = 80TB total | 10TB tier 1 + 19TB tier 2 = 29TB total | 40TB tier 1 + 21TB tier 2 = 61TB total |
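The totals in the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the scenario names and the `totals` and `saving_vs_existing` helpers are made up for illustration, and the "50 - 500GB" ranges are taken at their upper bound (0.5TB), which is an assumption that happens to match the quoted 21TB tier 2 figure.

```python
# Capacities in TB, copied from the hypothetical table above.
# Assumption: "50 - 500GB" ranges are counted at the upper bound (0.5TB).
SCENARIOS = {
    "existing": {
        "tier1": [10, 10, 10, 10],
        "tier2": [10, 10, 10, 10],
    },
    "archive": {
        "tier1": [2.5, 2.5, 2.5, 2.5],
        # dev copy, two dev clones, test/UAT, archive, archive HA
        "tier2": [0.5, 0.5, 0.5, 2.5, 7.5, 7.5],
    },
    "dedupe": {
        "tier1": [10, 10, 10, 10],
        # dev copy, two block de-duped dev clones, test/UAT
        "tier2": [10, 0.5, 0.5, 10],
    },
}


def totals(scenario):
    """Return (tier 1 TB, tier 2 TB, overall TB) for one scenario."""
    tiers = SCENARIOS[scenario]
    tier1 = sum(tiers["tier1"])
    tier2 = sum(tiers["tier2"])
    return tier1, tier2, tier1 + tier2


def saving_vs_existing(scenario):
    """Fraction of the existing 80TB footprint that a scenario removes."""
    baseline = totals("existing")[2]
    return 1 - totals(scenario)[2] / baseline


if __name__ == "__main__":
    for name in SCENARIOS:
        tier1, tier2, overall = totals(name)
        print(f"{name}: tier 1 {tier1}TB, tier 2 {tier2}TB, total {overall}TB, "
              f"saving {saving_vs_existing(name):.1%}")
```

On these figures the archive and right-sizing approach removes roughly 64% of the original 80TB footprint, against roughly 24% for de-duplication alone.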

 

 

This does not mean that de-duplication does not have its place. Having found the fit, the next thing to consider is which de-duplication technology fits best: is single-instance or block-level de-dupe required? For file systems both are desirable; however, if the file system is distributed for use across multiple branches, multiple instances may need to be retained. Centralised file archives are perfectly suited to single-instance storage. A single small change to a file, however, will create a new object and footprint unless block-level de-dupe is also enabled.
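The difference between the two approaches can be illustrated with a toy model. The sketch below is not how any particular product works: the 4KB fixed block size, the SHA-256 fingerprints and the function names are all assumptions chosen for illustration. It stores three files (an original, an exact copy, and a copy with one byte changed) and reports how many bytes each approach would keep.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, for illustration only


def digest(data: bytes) -> str:
    """Content fingerprint used to recognise identical data."""
    return hashlib.sha256(data).hexdigest()


def single_instance_store(files):
    """Keep each distinct whole file once; return the bytes stored."""
    store = {}
    for content in files:
        store.setdefault(digest(content), content)
    return sum(len(content) for content in store.values())


def block_level_store(files, block_size=BLOCK_SIZE):
    """Keep each distinct fixed-size block once; return the bytes stored."""
    store = {}
    for content in files:
        for offset in range(0, len(content), block_size):
            block = content[offset:offset + block_size]
            store.setdefault(digest(block), block)
    return sum(len(block) for block in store.values())


def make_example_files():
    """An original file, an exact copy, and a copy with one byte changed."""
    # 100 blocks, each filled with a different repeating 4-byte pattern.
    original = b"".join(
        i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(100)
    )
    edited = bytearray(original)
    edited[10] ^= 0xFF  # in-place change inside the first block
    return [original, original, bytes(edited)]


if __name__ == "__main__":
    files = make_example_files()
    print("raw bytes:      ", sum(len(f) for f in files))
    print("single instance:", single_instance_store(files))
    print("block level:    ", block_level_store(files))
```

Single-instance storage collapses the exact copy but keeps the edited file in full, whereas the block-level store only adds the one changed block. Note that fixed-size blocks only cope well with in-place changes like this one; an insertion shifts every later block, which is one reason some de-duplication schemes use variable-size chunking instead.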

 

A recent healthcheck for an insurance house revealed not only a large number of duplicate instances, equating to nearly 10% of the total capacity, but also a significant number of files with the same name but different sizes. This raises a question: whilst IT may be able to address the storage issue by introducing de-dupe technology, what is being done to ensure that employees are using the right “version” of a file, that work is not being duplicated, or that sensitive data is not being held inappropriately, such as in home directories?
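A rough version of this kind of check can be scripted. The sketch below is not the tooling used in the healthcheck; it is a minimal illustration, with a hypothetical `scan` function, that walks a directory tree, groups files by content hash to find duplicate instances, and flags file names that appear with more than one size.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root):
    """Report duplicate content and same-name/different-size files."""
    by_content = defaultdict(list)    # (size, digest) -> paths
    sizes_by_name = defaultdict(set)  # file name -> sizes seen
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            size = os.path.getsize(path)
            total_bytes += size
            by_content[(size, file_digest(path))].append(path)
            sizes_by_name[name].add(size)
    # Every copy beyond the first in a group is redundant capacity.
    duplicate_bytes = sum(
        size * (len(paths) - 1) for (size, _), paths in by_content.items()
    )
    return {
        "total_bytes": total_bytes,
        "duplicate_bytes": duplicate_bytes,
        "duplicate_groups": [
            sorted(paths) for paths in by_content.values() if len(paths) > 1
        ],
        "conflicting_names": sorted(
            name for name, sizes in sizes_by_name.items() if len(sizes) > 1
        ),
    }


if __name__ == "__main__":
    # Small throwaway tree: one exact duplicate and one name clash.
    with tempfile.TemporaryDirectory() as root:
        for sub in ("a", "b"):
            os.makedirs(os.path.join(root, sub))
        samples = {
            ("a", "report.doc"): b"x" * 100,
            ("b", "report.doc"): b"y" * 40,
            ("b", "copy.doc"): b"x" * 100,
        }
        for (sub, name), data in samples.items():
            with open(os.path.join(root, sub, name), "wb") as fh:
                fh.write(data)
        result = scan(root)
        print(result["duplicate_bytes"], "of", result["total_bytes"],
              "bytes are duplicates")
        print("same name, different sizes:", result["conflicting_names"])
```

The duplicate figure is something de-dupe technology can recover. The list of conflicting names is a prompt for the information management questions above, which no storage feature will answer.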

 

It is my view that de-dupe does have value; however, it should not be seen as a cure for, or an alternative to, a well-designed corporate information management policy.

 
Re-asking the question as “keeping down the cost of storing current information” could open doors to far greater savings.
