IT-Director.com
Analysis

Big Data and In-Memory Database

By: Joe Clabby, President, Clabby Analytics
Published: 18th March 2013
Copyright Clabby Analytics © 2013

For some unknown reason, the topic of memory has come up a lot in my research this week. It started when I was comparing the cache designs of IBM's POWER7+ microprocessor and Intel's i7 x86 architecture, then moved on to how much main memory each system could support, and then I decided to add an IBM System z mainframe to the discussion.

As I looked at each processor environment, here's what I found:

  • The System z has a tremendous amount of on-chip cache: 960 kB Level 1, 12 MB Level 2, 48 MB Level 3, and 384 MB Level 4.
  • The POWER7+ has 512 kB Level 1, 2 MB Level 2, and 80 MB Level 3.
  • For Intel, I chose the Xeon E7-8870 because it has a high core count and a clock speed close to that of an average i7. It has 480 kB Level 1, 2 MB Level 2, and 30 MB Level 3.

Why is this important? Because the closer data sits to the processor, the faster that data can be processed.
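One standard way to quantify the "closer is faster" point is average memory access time (AMAT): every access pays the hit latency of the first cache level, and only the fraction that misses goes on to pay for the next, slower level. The minimal sketch below computes AMAT for a multi-level hierarchy. The latencies and hit rates are hypothetical round numbers chosen for illustration, not measurements of System z, POWER7+ or the E7-8870.

```python
def amat(levels, memory_latency_ns):
    """Average memory access time (ns) for a multi-level cache hierarchy.

    levels: (hit_latency_ns, hit_rate) pairs ordered from L1 outward, where
    hit_rate is the local hit rate of accesses that reach that level.
    A miss at one level is served by the next level; a miss at the last
    cache level is served by main memory.
    """
    # Fold from main memory inward: AMAT_i = t_i + (1 - h_i) * AMAT_(i+1)
    time_ns = memory_latency_ns
    for hit_latency_ns, hit_rate in reversed(levels):
        time_ns = hit_latency_ns + (1.0 - hit_rate) * time_ns
    return time_ns


# Hypothetical hierarchy: per-level latency in ns and local hit rate.
two_levels = [(1.0, 0.90), (4.0, 0.80)]
three_levels = two_levels + [(12.0, 0.50)]

print(f"L1+L2 only: {amat(two_levels, 100.0):.2f} ns")    # 3.40 ns
print(f"L1+L2+L3:   {amat(three_levels, 100.0):.2f} ns")  # 2.64 ns
```

In this made-up example, adding a third level that catches half of the remaining misses cuts the average access time from about 3.4 ns to about 2.6 ns, even though that level is slower than L1 and L2, because fewer accesses have to travel all the way to main memory.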

I then looked at how much main memory each system can address, and here's what I found:

  • The System z can address up to 3 TB of main memory.
  • The POWER7+ can address up to 4 TB of main memory (in a Power 770 configuration).
  • The E7 chip specifications say that it can address up to 512 GB. Last I looked, Hewlett-Packard's BladeSystem topped out at 576 GB, Cisco's UCS B230 M2 at 512 GB, and Dell's blade environment at 640 GB (but, as I recall, using all 640 GB at three DIMMs per channel may result in unbalanced performance). Still, I have run across some non-blade vendor configurations with around 1.5 TB of memory, and I think IBM's MAX5 architecture can take this up to 1 TB.

Why is this important? Again, because the more data that you can place near the processor, the faster that data can be processed.
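A quick way to put these main-memory figures to work is to ask whether a given database fits entirely in RAM. This minimal sketch uses the maximum memory sizes quoted above, a hypothetical 2 TB dataset, and an assumed 25% headroom for the operating system, database engine and working space; real sizing would also have to account for indexes, compression and growth.

```python
# Maximum main memory as quoted in this article, in GB (1 TB taken as 1024 GB).
QUOTED_MAX_MEMORY_GB = {
    "System z": 3 * 1024,
    "Power 770 (POWER7+)": 4 * 1024,
    "Xeon E7 (chip specification)": 512,
}


def fits_in_memory(dataset_gb, memory_gb, headroom=0.25):
    """Return True if the dataset fits while leaving `headroom` (a fraction of
    memory) free for the OS, database engine and working space. The 25%
    default is an assumption, not a vendor sizing rule."""
    return dataset_gb <= memory_gb * (1.0 - headroom)


DATASET_GB = 2000  # hypothetical 2 TB database

for system, memory_gb in QUOTED_MAX_MEMORY_GB.items():
    verdict = "fits" if fits_in_memory(DATASET_GB, memory_gb) else "does not fit"
    print(f"{system}: {DATASET_GB} GB dataset {verdict} in {memory_gb} GB")
```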

I then started to think about IBM's Flex System architecture. This environment can run POWER and/or x86 chips (note: POWER chips can process twice as many threads and have significantly more on-chip cache). It has access to plenty of main memory, and it offers eight internal, on-the-compute-node solid state drives that can act as extended memory and can accelerate applications that benefit from high IOPS (input/output operations per second) performance. Applications that perform extremely well in a Flex System environment include various data mining and database applications, multimedia streaming and video-on-demand, a wealth of financial services applications (that rely on fast results for quick decision making), surveillance and security applications (especially real-time security checks against reference materials), and video rendering. I then asked myself: are Big Data applications appropriate for this environment?
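The link between device latency and IOPS can be sketched with Little's law: the average number of outstanding I/Os (the queue depth) equals throughput multiplied by latency, so IOPS is roughly queue depth divided by latency. The latencies below are hypothetical placeholders for a spinning disk and a solid state drive, and the formula assumes a steady state in which latency does not grow as queue depth rises, which real devices do not guarantee.

```python
def iops_from_latency(queue_depth, latency_s):
    """Little's law: queue_depth = IOPS * latency, so IOPS = queue_depth / latency."""
    return queue_depth / latency_s


def throughput_mb_s(iops, block_size_kb):
    """Data rate implied by an IOPS figure at a fixed block size (1 MB = 1024 kB)."""
    return iops * block_size_kb / 1024


# Hypothetical latencies for small random reads, for illustration only.
HDD_LATENCY_S = 0.005    # 5 ms
SSD_LATENCY_S = 0.0001   # 0.1 ms

for name, latency in [("HDD", HDD_LATENCY_S), ("SSD", SSD_LATENCY_S)]:
    ops = iops_from_latency(queue_depth=1, latency_s=latency)
    print(f"{name}: {ops:,.0f} IOPS, {throughput_mb_s(ops, 4):.1f} MB/s at 4 kB blocks")
```

The point of the comparison is that a 50x cut in per-operation latency translates directly into roughly 50x more small random I/Os per second at the same queue depth, which is why IOPS-bound workloads gain so much from SSDs placed close to the compute node.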

This meant I had to venture away from memory into storage (storage feeds memory). IBM's PureSystems/Flex System architecture offers access to large amounts of internal storage (blades typically do not). IBM's Storwize V7000 storage array can be mounted within a Flex System environment, which speeds access to data because there is no need for multiple hops. Additionally, PureSystems/Flex System compute nodes have direct access to up to eight SSDs located within each node. These SSDs act like extended, fast memory, and they are also positioned to deliver 'hot data' rapidly to the compute elements.
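Serving 'hot data' from a small, fast tier is essentially caching. As an illustration only (not how IBM or any other vendor actually manages its SSDs), here is a least-recently-used (LRU) cache standing in for the fast tier, placed in front of a dictionary that stands in for slower storage. The class name `HotDataCache` and its interface are hypothetical.

```python
from collections import OrderedDict


class HotDataCache:
    """LRU cache standing in for a small fast tier (e.g. local SSD or DRAM)
    in front of a slower backing store. Illustrative only."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # any dict-like slow store
        self.entries = OrderedDict()        # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.backing_store[key]     # slow path: fetch from the far tier
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


slow_store = {block_id: f"block-{block_id}" for block_id in range(100)}
cache = HotDataCache(capacity=3, backing_store=slow_store)
for block_id in [1, 2, 1, 3, 1, 4, 1, 2]:
    cache.get(block_id)
print(cache.hits, cache.misses)  # 3 5
```

Block 1 is requested repeatedly, so it stays in the fast tier and is served from there on every request after the first, while less frequently used blocks are evicted to make room.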

I then started thinking outside the box (literally, about external storage subsystems). IBM's storage offerings are particularly strong in tiering (placing the most frequently used data on fast disk for fast access), compression, and interoperability. But it is tiering that interests me most because, yet again, it places hot data closer to the processor. And the closer data is to the processor, the faster it can be processed.
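Tiering can be sketched as a ranking problem: count how often each extent of data is accessed, then place the most frequently accessed extents on the fast tier. The function below is a deliberately simplified, hypothetical policy and not IBM's tiering algorithm; production tiering software also weighs recency, extent size and the cost of migrating data between tiers.

```python
from collections import Counter


def plan_tiers(access_log, fast_tier_slots):
    """Split extents into fast-tier and slow-tier sets by access frequency.

    access_log: iterable of extent IDs, one entry per access.
    fast_tier_slots: how many extents the fast tier can hold.
    Ties are broken by first appearance in the log (Counter ordering).
    """
    counts = Counter(access_log)
    ranked = [extent for extent, _ in counts.most_common()]
    return set(ranked[:fast_tier_slots]), set(ranked[fast_tier_slots:])


# Hypothetical access log: extent "a" is the hottest, "c" and "d" are cold.
log = ["a", "b", "a", "c", "a", "b", "d"]
fast, slow = plan_tiers(log, fast_tier_slots=2)
print(sorted(fast), sorted(slow))  # ['a', 'b'] ['c', 'd']
```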

What I think we're going to see soon is systems designed around in-memory database processing. Traditional blade architecture is not positioned to support very large memory (VLM) databases due to memory/footprint constraints. But other architectures such as traditional mainframes, Power Systems, and scale-up x86 designs are indeed well positioned for in-memory database processing. 
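To make "in-memory database" concrete at a very small scale, SQLite can keep an entire database in RAM when it is opened with the special name `":memory:"`. This is only a stand-in for the very large memory databases discussed above, and the table, function name and rows are hypothetical.

```python
import sqlite3


def total_value(rows, symbol):
    """Load rows into a throwaway in-memory SQLite database and aggregate them."""
    conn = sqlite3.connect(":memory:")  # the database lives only in RAM
    try:
        conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
        conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
        (total,) = conn.execute(
            "SELECT SUM(qty * price) FROM trades WHERE symbol = ?", (symbol,)
        ).fetchone()
        return total  # None if no rows match
    finally:
        conn.close()


# Hypothetical rows; symbols and prices are made up.
rows = [("AAA", 100, 2.5), ("BBB", 300, 1.0), ("AAA", 50, 3.0)]
print(total_value(rows, "AAA"))  # 400.0
```

Because nothing is written to disk, every query is served from memory, which is the same property that in-memory database platforms exploit at terabyte scale; the trade-off is that the data disappears when the connection closes unless it is persisted elsewhere.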

Next week I'm starting a research report on system designs and will discuss this topic in greater depth. But I would welcome any feedback and thoughts from readers of this article in the meantime. Please consider dropping me an e-mail or commenting on this article.


Reader Comments

Posted: 18th March 2013 | By Philip Howard :

Yes, but there's always the question of how you use memory or SSDs. Oracle, for example, uses its Flash Cache very differently from the way that IBM does.

Posted: 18th March 2013 | By Pae MunKyu :

Yes, in-memory database vendors like Altibase are waiting for systems designed around in-memory databases.

Posted: 18th March 2013 | By Joe Clabby :

Phil is right. There are differences in how vendors are using solid state drives. This is why I'm expecting some big news in system designs this year. My belief is that we're going to see a bunch of new Big Data configurations that feature large amounts of solid state storage and use it like memory. I've actually started writing a system design report that covers converged systems, expert integrated systems and this new class of solid state systems.

Posted: 18th March 2013 | By Ranjit Nayak :

Tier 0 SSD cards, such as the ones from LSI/Cisco and EMC for Cisco blade servers, are addressing data proximity. More details are in this video:
http://www.youtube.com/watch?v=BTpJReLB8hA&feature=youtu.be

Posted: 18th March 2013 | By Joe Clabby :

Thanks Pae. This is exactly why I started using this blog. I'm looking for as much field insight as I can get. Please keep the feedback coming.


Published by: Electronicdawn Ltd.