
Information Architecture – Data Privacy and the DPA


In this series of articles I will be covering many of the challenges facing IT departments with regard to the management of business information. Whilst the series is not focused on any specific industry, and therefore not on specific uses of information in the day-to-day running of an organisation, I hope to provide insight into key areas of data management, demystify jargon and provide some simple techniques for finding out what data is being stored and why.

Following the announcement that the maximum fine for data privacy breaches is to rise from £5,000 to £500,000, this first article covers the eight guiding principles of the Data Protection Act.

History of the Data Protection Act

The Data Protection Act 1998 is the United Kingdom’s legal implementation of the European Union Data Protection Directive (1995). It defines an organisation’s obligations for the control of personal and sensitive personal data, and the responsibilities of that organisation’s data controller.

In the Act, the following is defined as “sensitive personal data”:

  1. the racial or ethnic origin of the data subject,
  2. his political opinions,
  3. his religious beliefs or other beliefs of a similar nature,
  4. whether he is a member of a trade union (within the meaning of the Trade Union and Labour Relations (Consolidation) Act 1992),
  5. his physical or mental health or condition,
  6. his sexual life,
  7. the commission or alleged commission by him of any offence, or
  8. any proceedings for any offence committed or alleged to have been committed by him, the disposal of such proceedings or the sentence of any court in such proceedings.

The control and security of this information is the responsibility of any organisation (data controller) that:

  1. is established in the United Kingdom, where the data are processed in the context of that establishment, or
  2. is established neither in the United Kingdom nor in any other EEA State but uses equipment in the United Kingdom for processing the data other than for the purposes of transit through the United Kingdom.

(In the second case, additional obligations have to be complied with.)

The Data Protection Act is enforced by the Information Commissioner’s Office (ICO), which is responsible for investigating breaches and bringing cases to tribunal.

For the latest information on enforcement action, see: http://www.ico.gov.uk/what_we_cover/data_protection/enforcement.aspx

The second main element of the Act is to entitle individuals to find out what personal information an organisation holds about them, how that information is being processed and why it is being retained.

The 8 Guiding Principles and their Impact on Information Management

The fundamentals of the act are based on eight guiding principles:

  1. Personal data shall be processed fairly and lawfully and, in particular, shall not be processed unless
    1. at least one of the conditions in Schedule 2 is met, and
    2. in the case of sensitive personal data, at least one of the conditions in Schedule 3 is also met.
  2. Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.
  3. Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.
  4. Personal data shall be accurate and, where necessary, kept up to date.
  5. Personal data processed for any purpose or purposes shall not be kept for longer than is necessary for that purpose or those purposes.
  6. Personal data shall be processed in accordance with the rights of data subjects under this Act.
  7. Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data and against accidental loss or destruction of, or damage to, personal data.
  8. Personal data shall not be transferred to a country or territory outside the European Economic Area unless that country or territory ensures an adequate level of protection for the rights and freedoms of data subjects in relation to the processing of personal data.

These principles raise a number of challenges for IT departments and for the business they support. The following are just some examples of where organisations may be in breach.

Breaches of the Act

The most commonly reported breaches currently relate to “end point security”. Misplaced memory sticks and stolen laptops are by far the most prevalent failures. Firewalls and similar measures provide necessary protection from external attack but, with a rising number of incidents originating within the organisation, data officers should review technologies that lock down either the devices or the information itself.

Areas that are particularly susceptible are an organisation’s test, development and training environments, for two reasons. First, the data held there is very unlikely to be used for the purpose for which it was provided, in breach of principle 2. Second, the security and visibility of data in these environments are generally less well monitored.

Deletion of data in accordance with principle 5 is rare in most organisations, many of which continue to run legacy systems. Data controllers need to establish the long-term retention needs of this historic data with the relevant departments, and set in motion processes to delete the pertinent records. Controllers should take particular care when reviewing unstructured data, as Word documents and spreadsheets are susceptible to copying and renaming. Modern classification and search tools may help organisations with widespread unstructured data stores. For companies that are unsure of their current unstructured estate, the next article in this series will show how to interrogate file system metadata to better understand the types of data being stored.
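As a taste of that metadata-only approach, here is a minimal Python sketch, assuming nothing more than read access to a file share. It walks a directory tree, counts files by extension and lists files not modified within a cut-off period as candidates to review against principle 5. The function name `summarise_estate` and the six-year cut-off are hypothetical choices for illustration, not figures from the Act; any real retention period must be agreed with the relevant departments.

```python
import os
import time
from collections import Counter
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def summarise_estate(root, stale_after_years=6):
    """Walk a directory tree and summarise file types and ages.

    Returns (count of files per extension, list of files not modified
    within stale_after_years). Only metadata is read, never file contents.
    The default cut-off is a hypothetical example, not a legal requirement.
    """
    by_extension = Counter()
    stale = []
    cutoff = time.time() - stale_after_years * SECONDS_PER_YEAR
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # unreadable or vanished file: skip it
            by_extension[path.suffix.lower() or "<none>"] += 1
            if info.st_mtime < cutoff:
                stale.append(str(path))
    return by_extension, stale
```

For example, `counts, candidates = summarise_estate("/srv/shared")` (a hypothetical path) gives a first picture of what is stored and what may be past its retention period. Last-modified time is only a rough proxy: copies made by some tools get fresh timestamps, which is one reason classification and search tools remain useful.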

Best Practice Guidelines

The Data Protection Act doesn't guarantee personal privacy at all costs, but aims to strike a balance between the rights of individuals and the sometimes competing interests of those with legitimate reasons for using personal information. It applies to some paper records as well as computer records.

This short checklist will help you comply with the Data Protection Act. Being able to answer 'yes' to every question does not guarantee compliance, and you may need more advice in particular areas, but it should mean that you are heading in the right direction.

  • Do I really need this information about an individual? Do I know what I'm going to use it for?
  • Do the people whose information I hold know that I've got it, and are they likely to understand what it will be used for?
  • If I'm asked to pass on personal information, would the people about whom I hold information expect me to do this?
  • Am I satisfied the information is being held securely, whether it's on paper or on computer? And what about my website? Is it secure?
  • Is access to personal information limited to those with a strict need to know?
  • Am I sure the personal information is accurate and up to date?
  • Do I delete or destroy personal information as soon as I have no more need for it?
  • Have I trained my staff in their duties and responsibilities under the Data Protection Act, and are they putting them into practice?
  • Do I need to notify the Information Commissioner and if so is my notification up to date?

To help determine how well you comply with the data protection principles, the Information Commissioner’s Office (ICO) provides an audit guide.

Additionally, the British Standards Institution (BSI) has released BS 10012:2009, which provides a framework for organisations to make effective use of information within the confines of the Act by introducing a Personal Information Management System (PIMS).

References

If you have questions regarding an individual organisation’s implementation of the Act and its legal standing, we recommend consulting a lawyer who specialises in your industry.

Attribution: Some reference material quoted directly from the Data Protection Act or the Information Commissioner’s Office.
 

Calculating the TCO of Oracle licences for infrastructure choice


Oracle licensing is a significant proportion of your total cost of ownership (TCO) and can affect your choice of server platform and overall architecture design.

This article is based on my interpretation of the licence text available on the Oracle website, and on the opinions of contacts I have made in the Oracle licensing industry. Whilst I am confident that the information is accurate enough to produce a TCO document to assist with infrastructure choice, I recommend confirming licence costs for your final hardware choice with Oracle prior to purchase.

This article references list prices taken from the Oracle online store. If you have an existing relationship with an Oracle licence vendor, you may receive a substantial discount on these prices. I’m resisting the temptation to comment on the relative performance of each chipset for Oracle loads (and hence on price-performance metrics) in this article; I intend to cover this in a separate follow-up article.

Different Oracle Licence Editions

Key feature summary | Express Edition (10g only) | Standard Edition One | Standard Edition | Enterprise Edition
Named User Plus licence | Free | £110 per user | £213 per user | £579 per user per core
Processor licence | Free | £3,535 per socket | £10,665 per socket | £28,947 per core
Maximum CPUs | 1 CPU | 2 sockets | 4 sockets | No limit
RAM | 1GB | OS maximum | OS maximum | OS maximum
Database size | 4GB | No limit | No limit | No limit
Windows | x | x | x | x
Linux | x | x | x | x
Unix (AIX/HP-UX/Solaris) | - | x | x | x
64-bit support | - | x | x | x
RAC (Real Application Clusters) | - | - | x | Option
RAC One Node | - | - | - | Option
Integrated clusterware | - | x | x | x
Automatic workload management | - | - | x | x
Enterprise Manager | - | x | x | x
ASM (Automatic Storage Management) | - | x | x | x
Data Guard (DR server transaction replication) | - | - | - | Option

For a full list of the Oracle features available in each edition, see http://www.oracle.com/database/product_editions.html

The Oracle Database is available in four different editions, in an attempt to appeal to markets of all sizes:

  • Express Edition
  • Standard Edition One
  • Standard Edition
  • Enterprise Edition

Express Edition (XE) will only use a single CPU, even when installed on a multi-processor machine. It is aimed mainly at developers, at small software vendors that need a free database to distribute with their applications, and at education.

Standard Edition (SE) and Standard Edition One (SEO) licences were introduced to compete with Microsoft SQL Server, and offer great value compared with the Enterprise Edition (EE) licence. The unit price is a fraction of EE’s and, unlike EE, is charged per socket rather than per core, so for a quad-core Intel chip you pay for only a single SE licence. The SEO licence is limited to a single server with at most 2 sockets.

Enterprise Edition (EE), by contrast, is priced per core, with a core factor for each chip type to compensate for differences in core performance. Unlike SE, where RAC is included, EE requires RAC to be bought as an additional option. Many features are available with EE that are not available with SE, but the majority of them are licensed separately at additional cost. One popular option for EE is Data Guard.

So why buy EE when the SE feature set will do? Unfortunately, SE is limited to 4 sockets per cluster. Examples of the maximum physical server capacity allowed under SE are:

  • A single server physically capable of containing no more than 4 sockets
  • A pair of servers that are physically capable of containing no more than 2 sockets each running in a RAC active-active cluster
  • Four single socket machines running in a RAC active-active cluster.

In each case a socket could contain multiple cores (see the note on MCMs below). So if the systems were based on the popular quad-core x86 processors, each example would have a maximum core count of 16. POWER6 and Itanium chips have two cores per socket, so a maximum of 8 cores could be licensed in these examples. The POWER7 chip is rumoured to be available with 4, 6 and 8 cores, so watch out for a change in the SE licence model following the release of these machines.

The following table shows the pricing impact for the three chip types mentioned above (x86, POWER6 and Itanium) at the maximum configuration allowed under the SE licence.

Standard Edition (price per socket = £10,665)
Chip | No. of sockets | Extended cost | Cores per socket | No. of cores | £/core
Itanium dual core | 4 | £42,660 | 2 | 8 | £5,333
POWER6 dual core | 4 | £42,660 | 2 | 8 | £5,333
x86 quad core | 4 | £42,660 | 4 | 16 | £2,666
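The Standard Edition arithmetic is simple enough to script when comparing more configurations. This minimal Python sketch reproduces the table, assuming the £10,665 per-socket list price and the 4-socket limit stated above; the function name `se_cost` is hypothetical.

```python
# Standard Edition list price per socket, as quoted in the table above (GBP).
SE_PRICE_PER_SOCKET = 10_665
# SE is limited to 4 sockets per server or cluster.
SE_MAX_SOCKETS = 4


def se_cost(sockets, cores_per_socket, price_per_socket=SE_PRICE_PER_SOCKET):
    """Return (extended cost, total cores, cost per core) for Standard Edition.

    SE is charged per occupied socket, so the number of cores per socket
    changes only the cost per core, never the total.
    """
    if sockets > SE_MAX_SOCKETS:
        raise ValueError("Standard Edition is limited to 4 sockets")
    extended = sockets * price_per_socket
    cores = sockets * cores_per_socket
    return extended, cores, extended / cores


for chip, cores_per_socket in [("Itanium dual core", 2),
                               ("POWER6 dual core", 2),
                               ("x86 quad core", 4)]:
    extended, cores, per_core = se_cost(SE_MAX_SOCKETS, cores_per_socket)
    print(f"{chip}: £{extended:,} for {cores} cores, £{per_core:,.2f} per core")
```

The table rounds the per-core figures to whole pounds; the sketch keeps the pence so the per-socket effect stays visible.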

The next table shows comparative EE licence pricing for systems of the same size. Note that EE is priced per core, using the core factor supplied by Oracle to weight the pricing according to chipset.

Enterprise Edition (price per core = £28,947)
Chip | No. of cores | Core factor | Units to licence | Total cost
Itanium dual core | 8 | 0.5 | 4 | £115,788
POWER6 dual core | 8 | 1.0 | 8 | £231,576
x86 quad core | 16 | 0.5 | 8 | £231,576
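The core factor calculation can be sketched the same way. This Python sketch assumes the £28,947 per-unit price above and that fractional results of cores × core factor are rounded up to a whole licence, which is our understanding of Oracle's rule and should be checked against the current core factor table. The function name `ee_cost` is hypothetical, and the £42,660 SE figure is taken from the previous table.

```python
import math

# Enterprise Edition list price per processor licence, from the table above (GBP).
EE_PRICE_PER_UNIT = 28_947
# Extended cost of the largest SE configuration (4 sockets x £10,665).
SE_MAX_CONFIG_COST = 42_660


def ee_cost(cores, core_factor, price_per_unit=EE_PRICE_PER_UNIT):
    """Return (units to licence, total cost) for Enterprise Edition.

    Units = cores x core factor, with any fraction rounded up to a whole
    licence (assumed rounding rule; verify with Oracle before relying on it).
    """
    units = math.ceil(cores * core_factor)
    return units, units * price_per_unit


for chip, cores, factor in [("Itanium dual core", 8, 0.5),
                            ("POWER6 dual core", 8, 1.0),
                            ("x86 quad core", 16, 0.5)]:
    units, total = ee_cost(cores, factor)
    ratio = total / SE_MAX_CONFIG_COST
    print(f"{chip}: {units} units, £{total:,} ({ratio:.1f}x the SE cost)")
```

Printing the ratio alongside the total makes the step between the two licensing models explicit for each chipset.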

There is clearly a big step between the two licensing models: for these configurations, EE costs between roughly 2.7 and 5.4 times as much as SE. If your system will fit under SE, you will make significant savings.

If the architecture contains Multi-Chip Modules (MCMs), you need to count each chip on the MCM as a socket. There has been some debate on the internet over what constitutes an MCM in the eyes of Oracle (see “Oracle licence on MCM”, a debate in the Google archive). When the rule was introduced it referred to the IBM POWER4/5 MCMs, but since then some Intel chips have been described as being in an MCM. In my investigation I spoke with a distributor who reassured me that this rule does not apply to Intel chipsets, but please check with your Oracle representative before making your purchase.

If you choose to licence Oracle per user, the infrastructure choice will not make a difference to the Oracle licence cost, so if you are building a TCO model for infrastructure comparison you can exclude the Oracle cost. You may, however, need to do a separate TCO comparison to confirm that licensing by user rather than by CPU is the most cost-effective route. Remember that you need to multiply the number of users and cores by the Enterprise Edition “named user plus” price (see the Oracle licence guidelines).

Coming soon:

Oracle infrastructure designs
Choosing the right OS and hardware for Oracle


 

Questioning Disk Solutions


When someone specifies a storage requirement, they usually specify only the desired capacity, and perhaps a preferred method of host attachment. Sometimes the requirement is as basic as "I'd like 2TB of storage, please". But how effective is this at fulfilling real-world requirements? Taking this request as an example, we can look at two extreme solutions:

  • Two 1TB SATA disks in an external SAS enclosure.
  • Fourteen 146GB 15K spindles in an external SAN-attached disk subsystem with 8GB of cache.

One of these solutions costs around £200 from a high street store; the other can cost over a hundred times more from a dedicated storage reseller. So which is the correct solution? Chances are that neither is, but when performance issues are encountered on the host system, it is usually the big black unknown void of disk storage at which the finger points first. So how do you go about sizing a storage solution that meets people's expectations in both capacity and performance? What questions need to be asked? In this article I'll cover some of the high-level basics of storage specification and highlight the questions that should be asked when looking for storage. To some of you this will be fairly common knowledge, but for those who are new to storage or fancy a concise introduction, it may prove useful.

Question 1: How many disks do I require?

When it comes to creating a storage solution to meet or exceed specific performance levels, it's the number of disks (or “spindles”), and not necessarily their capacity, that is the most important factor to consider. It is worth remembering that each disk is a physical unit: a spinning platter with a read/write head that moves above its surface. In a computing world where even the speed of light can at times be considered slow and problematic, physically moving a piece of metal around at sub-sonic speeds is practically antiquated. Even in an ideal world, discounting the movement of the disk head and using an unrealistically optimised sequential read or write, the maximum data transfer rate of a 15,000rpm SCSI disk is around 80-100MB/s. Each read or write request passed to or from the spindle is an I/O operation, and the number of these operations per second (IOPS) is also limited. Putting the maths around track lengths and rotational speeds aside, in an ideal world you can squeeze around 200 IOPS from a single spindle. However, introduce aggressive seek times, so that the head has to move back and forth across the platter (normal behaviour for the more random disk operations of the real world), and these figures can immediately halve.
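For readers who do want the maths, a common back-of-the-envelope model treats each random I/O as one average seek plus, on average, half a revolution of the platter. The Python sketch below applies that model. The seek times are assumed illustrative values, not vendor specifications, and the model ignores transfer time, command queuing and caching.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def estimate_iops(rpm, avg_seek_ms):
    """Rough random IOPS for one spindle: one seek plus half a rotation per I/O.

    Ignores transfer time, command queuing and caching.
    """
    return 1_000 / (avg_seek_ms + rotational_latency_ms(rpm))


print(estimate_iops(15_000, 3.0))         # short seeks: 200.0
print(estimate_iops(15_000, 8.0))         # longer, random seeks: 100.0
print(round(estimate_iops(7_200, 8.5)))   # assumed SATA-class figures
```

With a short 3ms seek, the 15,000rpm disk lands on the roughly 200 IOPS quoted above; stretching the seek to 8ms halves it, mirroring the effect of random access described in the text.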

Disk devices are surprisingly slow; it's only with a more layered structure, command queuing and heavy use of cache that we can make a dent in these performance figures. Attempts can be made at the OS level to further tune the layout of data across the disks, but unless the configuration was very bad to begin with, only a 5-10% gain in performance can be expected. The easiest and most effective way to improve performance in a disk subsystem is to increase the number of spindles the data is spread across.

Take our earlier example. The two SATA disks, with their slower spin speeds (5,400 or 7,200rpm) and more basic data transfer protocols, will manage a hundred or so IOPS and perhaps 40MB/s. The fourteen SCSI-based disks, with their faster spindles (15,000rpm), have the capacity for a total of 2,800 IOPS (14 × 200) and a throughput of 1,400MB/s (14 × 100MB/s). This example may be a little extreme, but it shows that the same capacity requirement can produce a dramatic difference in performance. So when choosing the capacity of the disks, what you're essentially defining is a compromise between cost and the number of spindles. While 450GB disks may be cheaper than 146GB disks in terms of “megabytes per dollar”, 146GB disks give you more “spindles per dollar” because each disk is cheaper. Buying smaller disks also forces administrators to spread data across more spindles, rather than placing it all on as few spindles as possible, which helps to guarantee a level of performance.
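The trade-off can be sketched as taking the largest of three spindle counts: one driven by capacity, one by IOPS and one by throughput. In this minimal Python sketch the per-disk figures (200 IOPS, 100MB/s) are the idealised 15K numbers used above, while the performance targets (2,000 IOPS, 500MB/s) and the function name are hypothetical. RAID overhead and hot spares are deliberately ignored.

```python
import math


def spindles_needed(capacity_gb, disk_gb, target_iops, target_mb_s,
                    iops_per_disk, mb_s_per_disk):
    """Return (spindles, limiting factor) for data spread evenly across disks.

    Takes the largest of the capacity-, IOPS- and throughput-driven counts.
    Ignores RAID overhead and hot spares.
    """
    counts = {
        "capacity": math.ceil(capacity_gb / disk_gb),
        "iops": math.ceil(target_iops / iops_per_disk),
        "throughput": math.ceil(target_mb_s / mb_s_per_disk),
    }
    limit = max(counts, key=counts.get)
    return counts[limit], limit


# The "2TB, please" request with hypothetical performance targets.
for disk_gb in (146, 450):
    print(disk_gb, spindles_needed(2_000, disk_gb, 2_000, 500, 200, 100))
```

With 146GB disks, capacity drives the count to 14 spindles; with 450GB disks, capacity alone would need only 5, so the IOPS target becomes the limit at 10. Larger disks only save money when the performance requirement is modest.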

Question 2: How much capacity do I actually require?

There is less physics but more maths involved in answering this question. We'll assume for now that we know exactly how much file space a system requires, ignoring concepts such as HSM, thin provisioning, snapshots, over-estimation and so forth.

Firstly, you need to ask yourself: binary or decimal? Most people are used to dealing with binary gigabytes, where 1024KB equals 1MB and 1024MB equals 1GB. However, most storage manufacturers use decimal gigabytes, where 1000KB equals 1MB, 1000MB equals 1GB and so forth, because, quite frankly, that way it sounds like their disks can store more data than they actually can. 24MB per gigabyte may not sound like much between friends, but in the terabyte realm the gap widens: a decimal terabyte is only about 0.91 binary terabytes, so around 9% of the capacity you might expect is missing. The IEC (and later the IEEE) introduced the prefixes "GiB" (gibibytes) and "TiB" (tebibytes) to denote binary rather than decimal quantities; however, this is still not common practice and therefore cannot be relied upon.
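A quick conversion makes the gap concrete. This Python sketch divides the decimal byte count by the binary one for each prefix; the function name `decimal_to_binary` is hypothetical.

```python
def decimal_to_binary(amount, power):
    """Convert `amount` decimal units (10**(3*power) bytes each) into binary
    units (2**(10*power) bytes each): power=3 for GB -> GiB, 4 for TB -> TiB."""
    return amount * 10 ** (3 * power) / 2 ** (10 * power)


for label, power in [("GB", 3), ("TB", 4)]:
    binary = decimal_to_binary(1, power)
    shortfall = (1 - binary) * 100
    print(f"1 {label} = {binary:.4f} {label[0]}iB, a shortfall of {shortfall:.1f}%")

print(f"A '2TB' request = {decimal_to_binary(2, 4):.2f} TiB")
```

The shortfall grows with each prefix: about 6.9% at the gigabyte scale and about 9.1% at the terabyte scale.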

Another common mistake is not considering the difference between raw space and usable space. There are two areas where this comes into play. The one most people are familiar with is RAID. Whether striping with parity (RAID5 or 6) or mirroring the data (RAID1 or 10), you're going to end up with a lot less usable space than raw disk capacity. Remember also that RAID-5 carries a significant write penalty, so anything with performance in mind should be considered for RAID-1 protection. That's half your disks gone, and your 2TB solution now provides only 1TB of usable capacity. The second, more hidden, loss of usable capacity is at disk level. In nearly all storage subsystems the disks have some kind of configuration storage area: a portion of the disk used to store information on the format of the disk, the controller configuration, array and logical drive information and so forth. On IBM DS3000 to DS5000 hardware, for example, this is called the "DACSTOR" area. NetApp and N Series devices have similar areas, with a whopping 10%-20% overhead on the capacity available to the end hosts due to the volume management of the WAFL system they use. While this volume information is very useful when migrating disks from one storage subsystem to another, it saps away more of your usable space. So now your 146GiB disk looks capable of only around 138GiB or less once it's formatted and ready to use.
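Putting the two losses together, usable capacity is roughly the number of data-bearing disks multiplied by the per-disk capacity left after configuration overhead. The Python sketch below assumes simple textbook layouts (RAID1/10 lose half the disks, RAID5 one disk's worth to parity, RAID6 two) and takes the overhead as a single hypothetical fraction; real subsystems report their own figures, and the function names are hypothetical.

```python
def data_disks(disks, raid_level):
    """Disks whose capacity is usable for data, assuming simple layouts."""
    if raid_level in ("1", "10"):
        if disks % 2:
            raise ValueError("mirrored arrays need an even number of disks")
        return disks // 2          # half the disks hold mirror copies
    if raid_level == "5":
        return disks - 1           # one disk's worth of parity
    if raid_level == "6":
        return disks - 2           # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {raid_level}")


def usable_gb(disks, disk_gb, raid_level, overhead_fraction=0.0):
    """Usable capacity after RAID and per-disk configuration/format overhead."""
    return data_disks(disks, raid_level) * disk_gb * (1 - overhead_fraction)


for level in ("10", "5", "6"):
    print(f"RAID{level}: {usable_gb(14, 146, level):,.0f}GB usable from 14 x 146GB")
```

For the fourteen 146GB disks, RAID10 gives 1,022GB, roughly half the 2TB requested, while RAID5 would give 1,898GB at the cost of the write penalty described above. Passing, say, `overhead_fraction=0.1` models the lower end of the NetApp overhead mentioned in the text.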

Then there is the matter of hot spares. These are empty disks left spinning in the subsystem enclosures, ready to take over from any drive that is failing or about to fail. While you don't need many to protect an entire storage subsystem, leaving them out can have an impact on smaller solutions where a specific number of disks has been ordered. So remember to take hot spares, as well as the overheads mentioned above, into account when specifying the number of spindles. For the most part a couple of disks will suffice, but make sure that they are large enough to protect all the arrays. In the DS4000 range, for example, a larger hot spare can temporarily replace a failed smaller drive, but the reverse is not true.
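Counting spares and checking their size can be done together. This minimal Python sketch assumes the DS4000-style rule described above, where a spare can stand in for any failed drive of equal or smaller capacity; the function name and the disk mix are hypothetical.

```python
def plan_order(array_disk_sizes_gb, spare_sizes_gb):
    """Return (total disks to order, sorted array disk sizes no spare can cover).

    Assumes a spare can replace a failed drive of equal or smaller capacity.
    """
    largest_spare = max(spare_sizes_gb, default=0)
    uncovered = sorted({size for size in array_disk_sizes_gb if size > largest_spare})
    return len(array_disk_sizes_gb) + len(spare_sizes_gb), uncovered


arrays = [146] * 14 + [300] * 8     # hypothetical mix of array members
print(plan_order(arrays, [146, 146]))
print(plan_order(arrays, [300, 300]))
```

With two 146GB spares the 300GB array is left unprotected; choosing 300GB spares covers both arrays for the same number of disks ordered.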

Question 3: What attachment do I need?

Once the storage subsystem is configured, it needs to be attached to the host. But do you really need the latest and greatest fibre speeds at the host level? To add to the general confusion, now that we're talking about communications, speeds are quoted in gigabits rather than gigabytes, and always in decimal. An 8Gb Fibre Channel link also uses 8b/10b encoding, so its nominal data rate is 800MB/s, or around 763MiB/s. Due to protocol overhead, transmission times and so forth, the actual data throughput will be around 10% less than this. As we saw in an earlier example, fourteen spindles performing a perfect, and highly unlikely, continuous sequential read or write are needed to reach 1,400MB/s. Two 8Gb fibre connections (which also provide redundancy) can just cope with this unlikely scenario. However, given that real-world throughputs are much lower, 4Gb fibre would probably be more than adequate in our example for most hosts with the given number of spindles.
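The link arithmetic can be sketched as follows. The nominal per-direction rates (100MB/s for 1Gb Fibre Channel, doubling with each generation to 800MB/s for 8Gb) are standard figures; the 10% overhead is the rough allowance used in the text, and the function names are hypothetical.

```python
import math

# Nominal per-direction data rates (decimal MB/s) for Fibre Channel generations.
FC_NOMINAL_MB_S = {1: 100, 2: 200, 4: 400, 8: 800}


def mib_per_s(mb_per_s):
    """Convert decimal MB/s to binary MiB/s."""
    return mb_per_s * 1_000_000 / 2 ** 20


def links_needed(required_mb_s, link_gb, overhead=0.10):
    """Links required to carry required_mb_s after an assumed protocol overhead."""
    effective = FC_NOMINAL_MB_S[link_gb] * (1 - overhead)
    return math.ceil(required_mb_s / effective)


print(round(mib_per_s(FC_NOMINAL_MB_S[8]), 1))
print(links_needed(1_400, 8))
print(links_needed(1_400, 4))
```

For the idealised 1,400MB/s, two 8Gb links are just enough, whereas 4Gb links would need four; for the much lower throughputs seen in practice, fewer and slower links suffice. Redundancy still calls for at least two paths regardless of the result.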

iSCSI is still trying to make headway, and provides a cheaper alternative to fibre-based storage networks. Using 1Gb Ethernet we can expect a throughput of around 95MiB/s with a direct connection, probably nearer 80MiB/s in a real environment. While that's not stunning, it is usually enough to do the job for small, non-critical systems. The thing to always bear in mind with iSCSI, though, is that dedicated networks should be used for iSCSI traffic. Start sending other data across the same network and it will generate a lot of unpredictable performance bottlenecks. Other network-based protocols, such as CIFS and NFS, can also provide connectivity to storage areas. However, these are file-based rather than block-based protocols, intended for file sharing rather than storage sharing, so they are less effective when you simply want raw performance.

New to the scene is FCoE (Fibre Channel over Ethernet). This is essentially the Fibre Channel protocol run over Ethernet hardware and cabling, similar in fashion to iSCSI but with more dedicated hardware infrastructure. However, FCoE integrates better with existing FC infrastructure, as it carries all the same management and control protocols as a SAN. It is mainly intended for data centres that want to consolidate the cable types used and ease the complexity of managing many hardware systems. At the moment it is unlikely to replace optical fibre outright, since it is currently easier to achieve higher speeds over greater distances using optical rather than copper cabling.

Finally there's SAS, or Serial Attached SCSI. It shares its physical and electrical interface with SATA, and SAS controllers can also speak the SATA protocol, which is why SATA devices can co-exist on SAS disk backplanes. (The reverse, however, is not true, and drive connectors are keyed to prevent it.) Current bus speeds are 3Gbit/s and 6Gbit/s, so while slower than current fibre technology, SAS still provides a more than ample connection speed to a single host.

With these speeds in mind, the scale of the solution is what defines the connectivity used. If you want only two or three hosts sharing a storage device, go down the SAS route. If you are looking to start a storage network of five or six systems but have no real plans to scale out further, iSCSI provides a cheap entry point. However, if you want a multi-host, multi-subsystem storage network, Fibre Channel networks are still the way to go.

Question 4: How much cache do I need?

Most storage subsystems contain some cache memory. The term “cache” (coined by Lyle R. Johnson of IBM back in 1967, if Wikipedia is to be believed) refers to an area of memory within a device from which desired data can be fetched in less time than from its ultimate location. In our case, this is usually some form of memory DIMM rather than the disk spindle itself. Note that there is "cache" and "non-volatile cache", and in some storage subsystems there is a difference between the two. "Cache" is usually used only to store data that has already been committed to disk, and is used to fulfil subsequent read requests for the same data. "Non-volatile cache", however, can also store incoming writes, since in the event of a failure the data can remain within this memory area without having been committed to disk. How this is achieved varies by subsystem: some simply have internal batteries to retain data within the memory chips themselves, while others de-stage that data to a dedicated disk or USB device before the device shuts down completely. It's this non-volatility that makes non-volatile cache expensive, and explains why cache sizes are small compared with host memory. However, when it comes to cache, it's usually the non-volatile cache that brings the greatest performance benefit. This is because written data can be placed in the cache and an acknowledgement sent back to the host immediately, with the data committed to disk at a time more convenient for the subsystem controller. This helps to mask the significant write delays on RAID5 and RAID6 arrays caused by parity calculations. Likewise, any data committed via this cache can hang around in case it's needed again. But because so much data can pass through in a short space of time, data cannot stay in the cache for long before it is replaced.

So the bigger the cache, the better? Not always. While too little cache can be detrimental, having too much will not impact performance, but it will impact the cost of the storage subsystem. Whether a cache is too big normally comes down to applications that are considered "cache-unfriendly", in that they never access the same data twice or are very write intensive. Cache is useful for burst writes and coping with peak loads; however, when writes are sustained, the cache will eventually fill up with uncommitted writes, and data can then only be committed as fast as it can be written to the underlying disks. At that point the cache provides very little benefit to the host. To tell what cache size will be of benefit, and what the optimal trade-off between cost and capacity is, the application and the way it accesses the disk need to be well understood. Again, other than a finger-in-the-air approach, the only way to determine an accurate cache size is through either application testing or using existing application workload data to simulate response times with different cache sizes. Failing that, simply go for what you can afford!
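Both points can be explored with a small simulation. The Python sketch below replays a block-access trace through a cache of a given size to estimate the hit ratio, and uses a simple steady-rate model to estimate how long a sustained write burst can be absorbed before the cache fills. It assumes least-recently-used (LRU) replacement, whereas real subsystems use their own, often proprietary, policies; the traces, rates and function names are hypothetical.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, cache_blocks):
    """Replay a block-access trace through an LRU cache holding cache_blocks
    entries and return the fraction of accesses served from cache."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(trace) if trace else 0.0


def seconds_until_cache_full(cache_mb, write_mb_s, destage_mb_s):
    """Time a sustained write burst can be absorbed before the cache fills.
    Returns infinity if the disks can keep up with the incoming writes."""
    if write_mb_s <= destage_mb_s:
        return float("inf")
    return cache_mb / (write_mb_s - destage_mb_s)


reuse_trace = [1, 2, 3, 4] * 25     # the same four blocks, read repeatedly
streaming_trace = list(range(100))  # every block read exactly once

for size in (3, 4, 64):
    print(size, lru_hit_ratio(reuse_trace, size), lru_hit_ratio(streaming_trace, size))

# 8,000MB of cache, 400MB/s of incoming writes, disks destaging at 200MB/s.
print(seconds_until_cache_full(8_000, 400, 200))
```

The streaming trace never scores a hit however large the cache, which is the cache-unfriendly case. The reuse trace shows that under LRU a cache slightly too small for the working set can be almost useless, while one just large enough serves 96% of accesses. The write model shows that 8,000MB absorbs a 400MB/s burst for only 40 seconds when the disks can destage 200MB/s.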

Summary

So when ordering a storage subsystem, ask yourself the following questions...

How many spindles do I need?
What are my RAID requirements?
What capacity disk do I need to select?
How much cache do I require?

And finally...

How much capacity do I need?

 

SAP Sizing


More Than Just a Number

Introduction

It is common knowledge in IT that hardware gets smaller. The same cannot be said for workloads. The need to record, retrieve and analyse information is growing at a tremendous rate, and that is reflected in the growth of the IT systems that perform these workloads. SAP is no exception. Estimating growth, as well as understanding your current workload, are important elements of sizing a new SAP system. Correct sizing is the happy place between systems that constantly seem to be on the brink of grinding to a halt and systems that perform like a dream but only run at 15% utilisation. When systems are undersized and struggle to perform, the end-user experience is poor, and this can lead to widespread negativity among users towards the solution. When systems are oversized, the end users may be happy, but you have needlessly overspent on hardware and, in most cases, software too. So first, you must ensure that the sizing number you get is the right one. These articles do not attempt to help you arrive at that number; this article is the first in a series that aims to assist you once your new system has been accurately sized. The series is concerned with SAP's benchmark measure, known as SAPS, and will explain how the SAPS benchmark is calculated and how to interpret the benchmarks when comparing hardware and virtualisation options.

For guidance on SAP sizing, refer to your SAP consultant or try these links.

Vendor specific links.

Once you have gathered all the required information and processed it through the SAP Quicksizer, you will have a number of results. These results are normally:

SAPS (CPU), Memory, DB Memory, App Memory and Disk DB (MB).

This series of articles is wholly aimed towards the figure quoted as SAPS.

In the beginning

SAP formalised the benchmarking of systems with the SAPS benchmark in 1997. According to SAP's benchmark records, the first system to be awarded a SAPS benchmark score was a Compaq Armada 7730 MT laptop featuring a 166MHz Pentium MMX processor with 144MB of memory. The system was running Microsoft NT4 and Oracle 3.1G. The following graphs have been compiled using benchmark data available on 21/1/2010 and show how the systems submitted for benchmarks changed from 1998 to 2009.

SAP Sizing: Average number of cores

In the late 1990s multicore processors were not yet available, so for pre-multicore systems the graph represents the average number of physical processors. The graph shows a rather slow increase in the average number of cores used in benchmarked systems until the end of 2007. 2008 and 2009 saw large rises, reaching an average of around 35 cores in 2009.
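The averaging behind a graph like this can be sketched as follows: group benchmark records by year, count each physical processor as one core where no core count exists, and take the mean. This is an assumed reconstruction of the approach, not the author's actual method; the record layout and names are hypothetical, and the sample records are made up rather than taken from SAP's benchmark list.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Optional


@dataclass
class BenchmarkRecord:
    year: int
    processors: int
    cores: Optional[int] = None  # None for pre-multicore entries that list processors only


def effective_cores(record: BenchmarkRecord) -> int:
    """Treat each physical processor as one core when no core count is recorded."""
    return record.cores if record.cores is not None else record.processors


def average_cores_by_year(records: Iterable[BenchmarkRecord]) -> Dict[int, float]:
    """Group benchmark records by year and average their effective core counts."""
    by_year: Dict[int, List[int]] = defaultdict(list)
    for record in records:
        by_year[record.year].append(effective_cores(record))
    return {year: mean(counts) for year, counts in sorted(by_year.items())}


if __name__ == "__main__":
    # Made-up records for illustration only; these are not real SAP benchmark entries.
    sample = [
        BenchmarkRecord(1998, processors=1),
        BenchmarkRecord(1998, processors=4),
        BenchmarkRecord(2009, processors=2, cores=16),
        BenchmarkRecord(2009, processors=8, cores=64),
    ]
    print(average_cores_by_year(sample))
```

Falling back to the processor count keeps the early years comparable with the multicore era instead of dropping them from the series.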

SAP Sizing: Average processor clock speed

The average clock speeds of benchmarked systems show a steady increase from 1997 to 2005. Since 2005 clock speeds seem to have plateaued. However, the previous graph shows that the number of cores has increased, so we would expect overall system performance to keep increasing.

SAP Sizing: Average main memory

Average main memory size has increased enormously since 1997. In 1997 the average amount of memory in benchmarked systems was 1.7GB; in 2009 it was 174GB, over 100 times more. In addition, 2008 and 2009 saw the largest rises in memory size, which is in line with the rise in the number of cores.
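The "over 100 times" figure can be checked from the two averages quoted above, and the same numbers give the constant yearly growth rate that would produce that increase over the twelve years from 1997 to 2009. The sketch assumes smooth compound growth, which the graph shows is a simplification, since much of the rise came in 2008 and 2009; the function names are our own.

```python
def growth_factor(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Average main memory of benchmarked systems, figures from the article.
    mem_1997_gb, mem_2009_gb = 1.7, 174
    years = 2009 - 1997
    print(f"{growth_factor(mem_1997_gb, mem_2009_gb):.1f}x overall")
    print(f"about {compound_annual_growth(mem_1997_gb, mem_2009_gb, years):.0%} a year")
```

This prints a growth factor of about 102.4 and an equivalent compound rate of roughly 47% a year.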

SAP Sizing: Average benchmark result (SAPS)

This final graph shows the average benchmark result for systems submitted to SAP. Like the number of cores and the amount of memory, the average benchmark result saw steady growth from 1997 to 2007 and large rises in 2008 and 2009.

Two conclusions can be drawn from this data. As expected, the average benchmark result of systems submitted to SAP has increased over time. Multicore technology and the large increases in memory seem to have contributed more to performance than improvements in CPU clock speed.

Although SAP workloads are growing in the majority of cases, it is unlikely that the average SAP workload has seen the same explosive growth as the average benchmarked system in 2008 and 2009. So is there a need for such large systems? Some SAP customers may require a single SAP instance with 256 cores and terabytes of memory, but it is more likely that customers running these large systems are using them for virtualisation (much more on this in later articles). Virtualisation allows multiple SAP applications and instances to run on the same hardware without the need for tens, or in some cases hundreds, of individual servers.
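A first-pass check for consolidating several SAP instances onto one large virtualised host is to add up their SAPS requirements and compare the total with the host's benchmark. The sketch below does that under two loudly labelled assumptions: a flat 10% virtualisation overhead and a 65% target utilisation. Both figures, the function name and the instance list are hypothetical, not vendor or SAP numbers.

```python
from typing import Dict, Tuple


def consolidation_check(instance_saps: Dict[str, float], host_saps: float,
                        overhead: float = 0.10,
                        target_utilisation: float = 0.65) -> Tuple[bool, float]:
    """Return (fits, projected_utilisation) for virtualising instances on one host.

    `overhead` (assumed virtualisation cost) and `target_utilisation` are
    illustrative values, not vendor or SAP figures.
    """
    demand = sum(instance_saps.values()) * (1 + overhead)
    projected = demand / host_saps
    return projected <= target_utilisation, projected


if __name__ == "__main__":
    # Hypothetical per-instance SAPS requirements.
    instances = {"ERP production": 8_000, "BW": 6_000, "Test/QA": 2_000}
    fits, u = consolidation_check(instances, host_saps=30_000)
    print(fits, f"{u:.0%}")
```

Here 16,000 SAPS of instances plus the assumed overhead would use about 59% of a 30,000 SAPS host, so it fits; on a 20,000 SAPS host the same mix would reach 88% and fail the check.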

The SAP software model allows both 'Scale up' (multiple instances virtualised on large systems) and 'Scale out' (spreading the load of a single system across multiple smaller systems). Both approaches have their positives and negatives, and they will be covered in later articles. When the 'Scale up' versus 'Scale out' argument is added to the hundreds of different SAP certified benchmarks, it becomes apparent that choosing hardware for a new solution is not just a case of looking through the benchmark figures and choosing a close match from the hardware vendor you usually buy from.
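To make the two options concrete, the sketch below compares, for one hypothetical requirement, how many identical smaller servers a scale-out design would need against the benchmark a single scale-up host would need, at the same assumed 65% target utilisation. It assumes the load divides evenly across servers, which typically holds better for SAP application servers than for the database; the function names and figures are illustrative only.

```python
import math


def scale_out_servers(required_saps: float, saps_per_server: float,
                      target_utilisation: float = 0.65) -> int:
    """Smallest number of identical servers keeping each at or below the target load.

    Assumes the workload divides evenly across servers (an illustrative simplification).
    """
    return math.ceil(required_saps / (saps_per_server * target_utilisation))


def scale_up_minimum(required_saps: float, target_utilisation: float = 0.65) -> float:
    """Benchmarked SAPS a single large host would need for the same target load."""
    return required_saps / target_utilisation


if __name__ == "__main__":
    need = 40_000  # hypothetical SAPS requirement
    print(scale_out_servers(need, saps_per_server=10_000))  # number of smaller servers
    print(round(scale_up_minimum(need)))                     # SAPS for one large host
```

For a 40,000 SAPS requirement this gives seven 10,000 SAPS servers, or a single host benchmarked at about 61,538 SAPS. The arithmetic is only the starting point; licensing, resilience and management effort all weigh on the choice.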

The next article in this series explains how the SAPS benchmarking is carried out and what information SAP makes available about each benchmark. It will also cover the operating systems and databases that each of the major hardware vendors chose when submitting their systems for certification.

 


