Cray. Despite everything that has happened over the years, from technological advances to organisational wobbles, bankruptcies and buy-outs, the name retains a certain cachet. They make supercomputers! Their computers come (came) with seats and bubbling coolant systems and everything! To someone growing up with early examples of rudimentary computing in the home, Cray was the stuff of Tomorrow’s World, Bond villains, and more. This was what real computers were all about.
Despite the growing power of ordinary computers, and the opportunities offered by parallelisation and the Cloud, those early memories of Cray superness were sufficient to pique my interest when a recent press release from semantic technology company Cambridge Semantics landed in my inbox. Digging a little further, it rapidly became apparent that Cambridge Semantics were not alone. Of the eight ‘Solution Partners’ listed for Cray’s XMT supercomputer, not one would look out of place presenting or exhibiting at a semantic technology event. For some reason, semantics appears to have replaced the quirky styling, bubbling coolant and circular seating of yesteryear.
Shoaib Mufti, Cray’s Director of Knowledge Management, is quick to highlight the opportunities created by combining hardware specifically optimised for computing across complex graph data with a semantic technology community that structures so much of its data in the form of graphs. California’s Franz Inc., one of those Cray Solution Partners, would appear to agree. Craig Norvell, Franz VP of Global Sales & Marketing, enthused that “They talked about what we talk about” when I asked about his interest in Cray.
Everything about the hardware is optimised to churn through large quantities of data, very quickly, with vital statistics that soon become silly. A single processor “can sustain 128 simultaneous threads and is connected with up to 8 GB of memory.” The Cray XMT comes with at least 16 of those processors, and can scale to over 8,000 of them in order to handle over 1 million simultaneous threads with 64 TB of shared system memory. Should you want to, you could easily hold the entire Linked Data Cloud in main memory for rapid analysis without the usual performance bottleneck introduced by swapping data on and off disks.
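As a rough check on those figures, the arithmetic can be worked through in a few lines of Python. This is a back-of-envelope sketch only: the processor count of 8,192 is an assumption (the text says only "over 8,000"), the function name is made up for illustration, and memory is counted in binary units (1 TB = 1,024 GB).

```python
# Back-of-envelope check of the Cray XMT figures quoted above.
THREADS_PER_PROCESSOR = 128   # from the quoted specification
MEMORY_PER_PROCESSOR_GB = 8   # "up to 8 GB" per processor
PROCESSORS = 8192             # ASSUMPTION: the text says only "over 8,000"


def system_totals(processors):
    """Return (total threads, total memory in TB) for a processor count."""
    threads = processors * THREADS_PER_PROCESSOR
    memory_tb = processors * MEMORY_PER_PROCESSOR_GB / 1024  # 1 TB = 1024 GB
    return threads, memory_tb


threads, memory_tb = system_totals(PROCESSORS)
print(f"{threads:,} threads, {memory_tb:g} TB of shared memory")
```

Under those assumptions the totals come to 1,048,576 threads and 64 TB, which is consistent with the "over 1 million simultaneous threads" and "64 TB" quoted above. The minimum 16-processor configuration works out at 2,048 threads and 128 GB.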
It’s here that Cray’s strengths begin to differentiate them from the rest of the pack. Big Data is an increasingly hot topic, as web giants and traditional enterprises alike rush to unlock the value hidden in their rapidly expanding data stores. Rather than follow the trail blazed by Google, LinkedIn, Facebook and others, Mufti contends that Cray’s approach increasingly makes more sense. Whilst consumer web companies traditionally deploy massive server farms comprising large numbers of relatively weak commodity servers, Cray tackles similar computing problems with one large computer. Instead of dividing a task up into discrete chunks that can be run on each of these small computers, a Cray removes that complexity (and management overhead) by keeping all of the data together (often in RAM) and tackling all of it at once. Well-rehearsed arguments about the cost-effectiveness, fault tolerance and scalability of the approach taken by Google et al. continue to hold, but Mufti’s persuasive contention is that the rise of the Real-Time Web makes distributed processing of data increasingly ineffective. When time is money, and when users want updates as soon as something happens, the overhead of receiving data, dividing it up into manageable chunks, sharing those chunks out across a large number of computers, having all of those computers come up with their part of the answer, gathering all those answers up in the right order, combining them to compute the final answer, and then reporting that to the user begins to look too great. And then, clearly, you need a Cray.
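The contrast between the two approaches can be sketched in miniature. The following Python is purely illustrative and is not how Cray or any web company actually implements this: the edge list is made-up data, the function names are hypothetical, and the "workers" are simulated in a single process, so it shows the extra steps rather than the real network and coordination costs.

```python
from collections import Counter

# Toy edge list standing in for a graph dataset (hypothetical data).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "e")]


def degree_counts(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts


def single_machine(edges):
    """Everything in one memory space: a single pass, no coordination."""
    return degree_counts(edges)


def scatter_gather(edges, workers):
    """Commodity-cluster style: split, compute partial answers, then merge."""
    chunks = [edges[i::workers] for i in range(workers)]   # divide and share out
    partials = [degree_counts(chunk) for chunk in chunks]  # each worker's part
    total = Counter()
    for partial in partials:                               # gather and combine
        total.update(partial)
    return total


# Both routes reach the same answer; only the number of steps differs.
assert single_machine(EDGES) == scatter_gather(EDGES, workers=3)
print(single_machine(EDGES).most_common(1))  # [('a', 4)]
```

Counting degrees splits cleanly across chunks. Queries that follow paths through a graph generally do not, because a path can cross chunk boundaries at any hop, and that is the kind of workload where keeping the whole graph in one shared memory is argued to pay off.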
The multi-threaded, parallel approach adopted by Cray’s XMT to achieve its levels of performance requires a bespoke operating system. A fairly common variant of Linux sits in front of this, enabling developers to run their existing applications on the Cray without a complete re-write. Cray is currently working with its Solution Partners to finalise an API that will optimise calls to the computer’s behind-the-scenes power. This API will be based upon the World Wide Web Consortium’s SPARQL query specification, allowing just about any Linux application that can generate SPARQL queries to run on the Cray with a 10-100x speed increase over conventional hardware. With further development work, an existing application might be fully ported onto the Cray’s own operating system to achieve even greater levels of performance.
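Since the Cray API was still being finalised, it cannot be shown here. What can be sketched is the application side: a program that generates a SPARQL query and packages it as a standard SPARQL Protocol request, in which the query travels in a `query` parameter. In this Python sketch the endpoint URL, the person IRI and the function names are all hypothetical; only the FOAF vocabulary and the `query` parameter come from published specifications.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint; the article does not describe Cray's actual interface.
ENDPOINT = "http://sparql.example.org/query"


def friends_of_friends_query(person_iri):
    """Build a SPARQL query for a two-hop walk across a graph."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT DISTINCT ?friendOfFriend WHERE {\n"
        f"  <{person_iri}> foaf:knows ?friend .\n"
        "  ?friend foaf:knows ?friendOfFriend .\n"
        "}"
    )


def request_url(query, endpoint=ENDPOINT):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = friends_of_friends_query("http://example.org/people/alice")
print(query)
print(request_url(query))
```

The appeal of a SPARQL-based interface is that nothing in this code needs to know what hardware sits behind the endpoint; in principle the same query could be pointed at a conventional triple store or at a Cray.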
Some of those application domains that are stretching current semantic technologies to their limit are also the domains to which Cray has traditionally sold: oil & gas, pharma, defence, intelligence. The synergies are great, and – so long as the ‘Cray people’ and the ‘semantics people’ inside those organisations can work together – the opportunities are clear. Rich, massive datasets, processed and explored in (or near) real-time rather than shunted off into batch jobs.
Mufti also points to new opportunities for Cray and its Solution Partners: tackling the huge “unpredictable complex graphs” of data that underlie intelligent searches across web data, and “offloading the thinking” [from your cellphone] by moving the complexity of increasingly rich mobile applications from the device to a (Cray) server out in the Cloud. In both cases, structure and semantics are already becoming increasingly prevalent (look at Google’s Rich Snippets, Siri, and more), and in both cases timeliness matters.
Sean Martin, Founder & CTO of Cambridge Semantics, shares the enthusiasm of both Cray’s Mufti and Franz’s Norvell. He suggests that Cray’s processing of large data volumes in memory is an extreme example of a growing trend that also extends down to the humble desktop. Throughout the computing industry, both RAM and solid state Flash drives (SSDs) are getting cheaper. Storage vendors are increasingly utilising RAM and SSDs to provide rapid access to ‘hot’ time-sensitive data, and even in consumer devices (the MacBook Air is perhaps the most high-profile example) Flash-based SSDs are completely replacing traditional hard drives to accelerate data access and extend battery life. Considered as a whole, Martin suggests, this shift toward increasingly rapid availability of stored data presages new application models. Citing examples from his own company, where they are increasingly replacing disk drives with SSDs and increasing the amount of RAM they install, Martin describes seeing a “huge difference” in application performance. The essentially random manner in which semantic data is stored and retrieved (large numbers of requests for small chunks of data stored anywhere, rather than small numbers of requests for long sequential reads off one area of a disk) magnifies this effect. Traditional hard drives are limited by the mechanical movement of the read/write heads, which must be positioned over the right part of the spinning platter before any data can be retrieved. Flash-based memory is far more amenable to this sort of data access. Although an extreme example, Cray’s approach may well begin to influence those developing applications for more humble computing infrastructure.
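The random access pattern Martin describes can be illustrated with a toy example. The Python below is a sketch under stated assumptions: the triples are invented, list positions stand in for locations on a storage device, and the `walk` function is a hypothetical helper rather than part of any real triple store.

```python
# Toy triples in the order they might sit in storage (hypothetical data).
TRIPLES = [
    ("alice", "knows", "dave"),
    ("bob", "worksFor", "acme"),
    ("carol", "knows", "alice"),
    ("dave", "knows", "bob"),
    ("acme", "locatedIn", "boston"),
    ("erin", "knows", "carol"),
]

# Index from each subject to the storage positions of its triples.
BY_SUBJECT = {}
for position, (subject, _, _) in enumerate(TRIPLES):
    BY_SUBJECT.setdefault(subject, []).append(position)


def walk(start, hops):
    """Follow the first outgoing edge from each node, recording positions read."""
    positions, node = [], start
    for _ in range(hops):
        if node not in BY_SUBJECT:
            break
        position = BY_SUBJECT[node][0]
        positions.append(position)
        node = TRIPLES[position][2]  # the object becomes the next subject
    return positions


print("graph walk reads positions:", walk("erin", 5))            # [5, 2, 0, 3, 1]
print("sequential scan reads positions:", list(range(len(TRIPLES))))
```

The graph walk jumps backwards and forwards through storage, whereas the scan reads neighbouring positions in order. On a spinning disk each jump implies a mechanical seek; in RAM or Flash it does not.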
The image of a Cray X-MP48 supercomputer was taken at the École polytechnique fédérale de Lausanne in Lausanne, Switzerland, by ‘Rama.’ The image is shared on Wikipedia under a Creative Commons Attribution Share Alike license.
The image of a Cray XMT supercomputer was taken from the Cray website, © Cray Inc.