Architecting the future

At the end of last year, IBM announced its first set of Intel dual core servers. At the same time, and almost disguised by the dual core news, it announced the third generation of its Enterprise X-Generation architecture - X3. The goal was to deal with I/O latencies, especially memory, that IBM claimed would reduce the benefits of multi-core, multi-socket Intel machines. It was also necessary to reduce the gap between Intel and AMD in the way memory was being handled.

The primary target of X3 was dual core, quad socket servers, and at its heart was the Hurricane chipset. By restructuring how memory was used, X3 reduced memory latency from 256ns in the X2 generation of servers to under 108ns. X3 also addressed a significant problem with addressing memory slots.
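To put the latency figures in proportion, the short Python sketch below works out the relative improvement. It uses only the two numbers quoted above; the function and variable names are illustrative, and because the result is a ratio it does not depend on the unit the figures are quoted in.

```python
def latency_reduction(old: float, new: float) -> float:
    """Return the fractional reduction when latency drops from old to new."""
    return (old - new) / old


# Figures quoted in the article for X2 and X3 (same unit for both).
x2_latency = 256
x3_latency = 108  # quoted as "under 108", so this is a lower bound on the gain

reduction = latency_reduction(x2_latency, x3_latency)
ratio = x2_latency / x3_latency

print(f"Latency reduced by at least {reduction:.1%}")
print(f"X2 latency is at least {ratio:.2f} times X3 latency")
```

In other words, the quoted figures imply that memory latency was more than halved between the two generations.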

Xeon servers have a performance problem once they move beyond the first four memory slots. I/O delay means that when more than four slots are used, the memory bus speed has to be reduced by up to 50 per cent. IBM targeted this problem with X3 in order to address two issues: speed and cost. Speed, by ensuring that all memory slots could be addressed at the highest possible speed. Cost, by allowing the customer to use 8 x 2GB rather than 4 x 4GB DDR memory modules. The price premium of 4GB over 2GB is around 30 per cent, so a significant saving could be had.
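The cost argument can be sketched with a little arithmetic. The Python below is a rough illustration only: it assumes the 30 per cent premium applies per gigabyte, and the base price is a made-up placeholder rather than a real list price. Only the module counts, sizes and the 30 per cent figure come from the text above.

```python
def memory_cost(modules: int, size_gb: int, price_per_gb: float) -> float:
    """Total cost of a memory configuration at a given price per gigabyte."""
    return modules * size_gb * price_per_gb


BASE_PRICE_PER_GB = 100.0  # hypothetical price for 2GB modules, arbitrary currency
PREMIUM_4GB = 0.30         # assumption: the 30 per cent premium is per gigabyte

# Both configurations give 16GB in total.
cost_8x2 = memory_cost(8, 2, BASE_PRICE_PER_GB)
cost_4x4 = memory_cost(4, 4, BASE_PRICE_PER_GB * (1 + PREMIUM_4GB))

saving = 1 - cost_8x2 / cost_4x4

print(f"8 x 2GB: {cost_8x2:.0f}")
print(f"4 x 4GB: {cost_4x4:.0f}")
print(f"Saving from using 2GB modules: {saving:.1%}")
```

Under those assumptions, filling all eight slots with 2GB modules costs roughly 23 per cent less than reaching the same capacity with four 4GB modules, and that saving only makes sense if the extra slots can run at full bus speed.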

With the launch of its new four-core Intel servers, IBM has decided that it doesn't need to update that architecture just yet. For IBM shareholders that's probably a relief. After all, IBM admitted that X3 had cost over $100m to develop and just 18 months on, it would be hard to justify another update.

Moving to eight cores

By the end of 2007 or early 2008, we should see the first eight core processors. They are certainly scheduled to be in engineering by then. Given the speed with which we moved from single to dual and from dual to quad core, it's not unreasonable to expect to see servers using eight core processors in 2008.

With single core processors, moving from quad processor to eight processor architectures required a whole new system design. This was hugely expensive and forced a number of vendors out of the eight-way market completely. With the arrival of multi-core processors capable of using existing architectures, the emphasis has been on providing more memory and improving the overall I/O so that all the cores can run at full speed.

With quad core, quad processor machines already announced and expected to ship in early 2007, we are now back at the absolute limit of managing I/O. Tim Dougherty, director for BladeCenter strategy at IBM, admits to serious concerns over core density and architectures. "Core requirements for I/O are going to outstrip current architectures. We are already looking at whether quad core will really be effective beyond two sockets [processors] and may have to reduce the sockets in order to manage the demands for I/O."

Dougherty was prepared to go even further than this by suggesting that eight core may be the most that can realistically be supported using current architectures. He even suggested that eight core could herald a return to single processor and away from the multi-processor motherboard.

The key to making the most of multi-core servers in the future, according to Dougherty, will be virtualisation. This is not the virtualisation that we generally see today on Intel platforms, with a host OS and lots of clients, but true hardware virtualisation that brings across the lessons from mini and mainframe computers. Dynamic resource management will be a big part of this, and IBM has announced that it will introduce a new virtualisation solution of its own in the first quarter of 2008. Virtualisation on its own, however, doesn't address the I/O problem.

Defining rival x86 and x64 platforms

The memory architectures of AMD and Intel show a clear difference in I/O capability. Intel uses a Front Side Bus (FSB), which requires an external memory controller to connect the FSB to memory and I/O functions. Intel is looking at changing this to remove the memory controller bottleneck. AMD uses its own HyperTransport bus, and the memory controller is built into the Opteron processor. Although each processor has its own memory, it also checks the memory attached to the other processors. This creates an interesting problem where running Opteron in combinations of three and six is actually faster than four and eight.

With X3, IBM managed to bring the memory latency performance of Xeon closer to that of the Opteron. X4 may take it closer still, but until Intel changes its memory architecture, which is a major issue for Intel and the whole motherboard industry, Xeon will still lag behind AMD.

Dealing with the I/O problems of quad core, quad processor machines and eight core processors is not going to be easy. IBM's answer, the X4 architecture, will not appear until the first quarter of 2008, and it's not unreasonable to expect it to be tied to the announcement of IBM's first eight core server. This would mirror the way that X3 was tied to its dual core announcement.

Core density was seen as the way to keep increasing processor performance without raising clock speeds, and so to sidestep the problem of dissipating the heat that faster speeds generated. Now we have hit another bottleneck: I/O. This should have been expected but seems to have caught people out. Where we go from here, and whether IBM can resolve the I/O problem before Intel does, remains to be seen.