A new kind of database query processing technology is emerging: one based on graphics processing units (GPUs). Unlike conventional technology that limits its processing to CPUs, these new products can crunch large sets of numbers in parallel in a fraction of the time that the same queries require on regular systems. This is because GPUs, which were originally developed to perform the calculations necessary to drive real-time graphics, can concentrate terrific processing power on many problems simultaneously. GPU databases promise to be a key to handling very complex queries, where answers are needed to drive ongoing processing in an increasingly dynamic, stream-driven, smart processing environment.
Since the first database was created in the 1960s, database technology has evolved to provide better support for complex queries. Such queries pose tough challenges because, in addition to requiring the collection of data from all over the database, they involve many calculations that are best made in parallel. Decades ago, we accepted that such queries could run a long time; we would submit them in batch and hope they didn't time out during the 6-hour overnight batch window. Then came symmetric multiprocessing (SMP) or, for greater scalability, massively parallel processing (MPP) systems on clusters. Using these architectures, our biggest queries could run in just a few hours. Amazing. But when we are examining complex scenarios, even that is just not fast enough. Today, most leading relational database management systems feature in-memory, columnar, compressed data organization and the use of vector processing, including single instruction, multiple data (SIMD) functionality, and hours have turned into minutes.
But for the application of complex analytics to decisions "in the moment," even a minute or two can be too long. GPUs have become common in computers that support gaming, live video streaming, and similar workloads. These chips can also be used, if managed correctly, to carry out large numbers of concurrent complex calculations for analytic purposes. Their parallel calculation capability, combined with vastly more cores than are found on a typical CPU, helps account for their power. A number of emerging products have appeared that use this capability to deliver ultra-fast database query performance where such calculation power is required.
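To make the idea of many concurrent calculations concrete, here is a minimal sketch of the split, reduce, and combine pattern that parallel aggregation relies on. It is not taken from any product discussed here: the function names, the sample `sales` column, and the use of Python threads as stand-ins for GPU cores are all assumptions made for illustration. Python threads will not deliver a GPU-style speedup; the sketch only shows the shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Aggregate one slice of the column (hypothetical helper).

    On a GPU, each slice would be handled by its own group of threads.
    """
    return sum(chunk)


def parallel_sum(column, workers=4):
    """Split a column into chunks, reduce each concurrently, then combine."""
    if not column:
        return 0
    size = -(-len(column) // workers)  # ceiling division: rows per chunk
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # final combine step over the partial results


# Illustrative data only: one numeric column of 1,000 values.
sales = list(range(1, 1001))
total = parallel_sum(sales)
```

The same pattern extends to other aggregates that can be combined from partial results, such as counts, minimums, and maximums, which is one reason analytic queries are a natural fit for hardware with many cores.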
Vendors and Products
Although this space is still emerging, there are a few representative vendors (and products) worth examining.
MapD uses the GPU to accelerate very complex queries. It doesn't yet support the full SQL-92 language, having concentrated on query support. Nonetheless, in this area, it offers some interesting capability. According to the company's literature, it offers a "software platform that leverages GPU computing to enable real-time interactive analysis of multi-billion row datasets 100x-1000x faster than existing solutions." This product can be deployed on a single server, with up to 8 GPUs and a quarter terabyte of GPU RAM together with 2 terabytes of conventional RAM. No surprise, one of the company's backers is the GPU manufacturer Nvidia.
GPUdb is similar to MapD in that it also offers an RDBMS for analytic purposes powered by GPU technology. The company claims this differentiator: according to its website, it has "the only GPU accelerated OLAP data engine with built in visualization." The company has also developed solutions for various scenarios in agriculture, communications, energy, finance, government, media and advertising, network security, and retail analytics.
Unlike MapD and GPUdb, Blazegraph is a graph DBMS. A graph database stores data in the form of graphs, which are relationship structures built using nodes, edges, and properties. A graph database can accept any data structure without a schema, and can be used to discover a data structure from its relationships. Apart from the challenge of complex traversal of such relationships, a graph DBMS may be called upon to collect large amounts of diverse data and perform rapid calculations. For this, Blazegraph offers Blazegraph GPU, which leverages the power of GPU processing to greatly accelerate such calculations. According to the website, this approach provides "a way to exploit the main-memory bandwidth advantages of GPUs to provide extreme scaling that is 100s of times faster and 40X more affordable, 10,000X faster than disk-based, and 100s of times faster than CPU main memory-based approaches."
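As a rough illustration of the nodes, edges, and properties described above, the following sketch builds a tiny in-memory property graph and traverses it. It does not reflect Blazegraph's actual API or storage model; the `PropertyGraph` class, its methods, and the sample customer and order data are hypothetical and exist only to show the data structure.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph (hypothetical): no schema is declared up front."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # node id -> list of (label, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, label, target, **props):
        # Create endpoints on demand so any shape of data can be accepted.
        for node_id in (source, target):
            if node_id not in self.nodes:
                self.add_node(node_id)
        self.edges[source].append((label, target, props))

    def reachable(self, start, label=None):
        """Breadth-first traversal; optionally follow only one edge label."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            current = queue.popleft()
            for edge_label, target, _props in self.edges.get(current, []):
                if label is not None and edge_label != label:
                    continue
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Illustrative data only.
g = PropertyGraph()
g.add_node("alice", kind="customer")
g.add_node("order1", kind="order", total=120.0)
g.add_edge("alice", "PLACED", "order1", channel="web")
g.add_edge("order1", "CONTAINS", "widget", quantity=2)

everything = g.reachable("alice")          # follows every edge label
placed = g.reachable("alice", "PLACED")    # follows only PLACED edges
```

Note that the "widget" node is created implicitly when the second edge is added, which mirrors the point that a graph store can take in data without a predefined schema. Traversals like `reachable` are the kind of work that grows expensive on large graphs.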
Not all databases require the extremely high-performance calculation power of GPU technology. As the industry moves increasingly toward stream-driven analytics, however, this capability could be key. Also, as newer approaches to application development seek to leverage analytics in the context of transaction processing (as evidenced in the emerging analytic-transaction data platform), the speed of the analytics will become critical. In these ways and others, GPU technology is likely to play a major role in enabling powerful high-speed analytics that can keep up with the pace of business.