Exploiting Huge Databases In Real-Time: The GPU Strategy

Everybody is aware that we use more and more data every day. As the saying goes, knowledge is power. The problem we face nowadays is the amount of data we can gather. A few years ago it was just the data we could obtain from around us, and on many occasions it was limited to a few devices connected to their primary systems (the “old” SCADA systems), while all other data had to be either entered manually or assumed.


Now this story has changed. SCADA systems have evolved into the Internet of Things, where everything is providing, in many cases, gigabytes or even terabytes of information. To add to the challenge, internal data is “crossed” with external data and merged into a system for further analysis… and the problem begins. How do you search for valid data (junk data is also inserted, as is data that is not valid for your specific queries) and get results in seconds or less, sometimes even in real time? The answer has been out there for some time, but only now have people realized the real value of this new way of thinking about the technology stack: welcome to GPU databases!

What is a “GPU”?

Many will think “You are kidding me, a GPU is what I have in my computer for my graphics”, and that is correct… partially. A GPU (graphics processing unit) database is a database that uses the GPU to perform database operations. While this seems simple, we have to think about the nature of GPU cards.

Until recently everybody identified the term GPU with the video card. Video cards were mainly used to create graphics, as they accelerate the process of presenting information on the screen (many pixels, redrawn a certain number of times per second). The higher the resolution (the number of pixels presented) and the higher the refresh rate, the more powerful the video card needed to be; hence the separation between the work done by the computer's CPU (central processing unit) and the work done by the more specialized card that handles the graphics.

To put this reasoning in context: a video card has hundreds or thousands of small cores that, working in parallel, get through operations of the same sort much faster than a CPU can. For instance, think of “rendering”, that is, giving texture to graphics. The GPU card can do this at very high speed, many times per second. If we were to leave this to the CPU, the resolution we would obtain on our monitors would be extremely low, since the quality depends on how much data is turned into graphics and on how many times per second that operation is performed. GPUs handle data that is structured and needs to be processed in the same way.
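The idea of “the same operation on every piece of data” can be sketched in a few lines. The example below is plain Python, not GPU code, and the `temperatures` column and the 40.0 threshold are made up for illustration. It splits a numeric column into chunks and applies one identical comparison to each chunk independently, which is the pattern a GPU spreads over its many cores. Python threads will not make this particular scan faster; the point is only to show the shape of the work.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, threshold):
    """Apply the same comparison to every value in one chunk."""
    return [value > threshold for value in chunk]


def parallel_scan(column, threshold, workers=4):
    """Split a numeric column into chunks and scan each one independently."""
    size = max(1, (len(column) + workers - 1) // workers)
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in chunk order, so the mask lines up with the column
        results = list(pool.map(scan_chunk, chunks, [threshold] * len(chunks)))
    return [flag for part in results for flag in part]


# Hypothetical sensor readings; no chunk depends on any other chunk
temperatures = [18.5, 42.0, 21.3, 55.1, 19.9, 61.7, 20.2, 47.8]
mask = parallel_scan(temperatures, threshold=40.0)
hot = [t for t, keep in zip(temperatures, mask) if keep]
```

Because every chunk can be processed without knowing anything about the others, adding more workers (or, on a GPU, more cores) is what lets this kind of query scale.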

As far as performance is concerned, studies show that a single 2U GPU server can outrun a full 42U rack of “traditional” servers. Some solutions can run close to 100x faster than CPU-based ones, on databases ranging from terabytes to petabytes!

If the performance is so good, many may be asking “why don’t you just implement these GPU servers, then?” Simple: because one solution does not fit all. See below for more clarification:

As we can see, we can benefit from a GPU architecture if the database we are to explore is mainly structured and has been converted into a “graphical” mode database. Let’s see how we can make these changes.

The change to GPU utilization

Firstly, GPUs do not perform well at text manipulation. They are designed to work on 2D and/or 3D data. Thus, the first thing you have to think about is how to change the data in your SQL (or preferred) database into something that can be interpreted as a graphic.
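One widely used way of turning text into something numeric is dictionary encoding: each distinct string is replaced by a small integer code, and the strings themselves are kept in a lookup table on the side. The sketch below is an assumption about what such a conversion could look like, not a description of any particular GPU database; the `countries` column and its values are hypothetical.

```python
def dictionary_encode(values):
    """Replace each distinct string with a small integer code."""
    dictionary = {}  # maps string -> integer code, in order of first appearance
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        codes.append(dictionary[value])
    return codes, dictionary


# Hypothetical text column
countries = ["ES", "US", "ES", "FR", "US", "ES"]
codes, dictionary = dictionary_encode(countries)

# A text filter such as country = 'ES' becomes a plain integer comparison
target = dictionary["ES"]
matches = [row for row, code in enumerate(codes) if code == target]
```

After encoding, the column is a fixed-width array of integers, and the text lookup happens only once per query instead of once per row.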

Something to keep in mind is that GPUs are like muscle while CPUs are like brain. By using GPUs you are adding pure power to your queries, thus accelerating the overall process when those queries have to be run against a massive database.

Which GPU hardware or software to use?

Remember that a GPU solution is not just the software that “links” with your already populated database; you also need the hardware that will drive your data gathering. Many vendors offer software and hardware as a single solution, since they handle the “transition” by linking your database with their GPU solution and adding the server needed, either a physical server or a cloud deployment.

There are a number of commercial solutions. Here are some of the more “popular” ones:

You will have to try them to see which one fits you best, as some are geared more towards SQL, others towards PostgreSQL, etc. Also, some are available as open source (“free”) as well as a commercial solution, depending on the features you are looking for. Among the open-source ones we can find:

  • MapD (open source or commercial, depending on features). The free version allows SQL processing on however many GPUs are available in a single server.
  • GPUOpen

If you want to investigate the software side and use a cloud deployment, here are some solutions:

Mind you, these are just the best known, but the offer is endless. Almost everybody offering cloud-server solutions offers a GPU card option as well.

If you are more of a “physical” hardware person, the offer is almost as interesting. To mention just a few:

Examples of GPU utilization in real life

Wherever there is a need to extract data from a huge database and make sense out of it at very high speed, there is an opportunity to use GPU technology. Here are a few examples:

  • Bitcoin: Many “miners” and companies use the GPU advantage to mine bitcoins… even though it is not on the rise any more.
  • Oil & gas companies: With the IoT and the enormous amount of data being gathered in every oil and gas field, GPU databases are used to align all the data and forecast results that can guide the exploration of new areas.
  • Medicine and pharmaceutical companies: Research and development is paramount in this sort of company. The sooner results are found, the sooner medicines can be put on the market or alternatives can be pursued, thus saving money and time.
  • E-commerce and logistics: Analysing market trends by exploring data on the internet (e.g. Facebook and Twitter), comparing those trends against our products, and sending the products along the right routes, avoiding routes that are dangerous or difficult because of the weather (including the forecast in our studies), can help us reach our customers before the novelty wears off, something that for some commodities happens in a matter of hours.
  • Card fraud and banking systems: The ability to correlate international data when a user is making a transaction in one place while other authentication factors indicate something totally different (stolen credit card details used from a third country while the voice recognition or the fingerprint comes from another place, alongside a mobile geolocation that does not match the “in theory” physical location).
  • Real-time mapping: The ability to do real-time data mining and incorporate the data into maps for user information.
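To make the card fraud case more concrete, here is a minimal sketch of the kind of check that would be run against every incoming transaction: compare where the card was used with where the owner's phone reports being, and flag the pair when they are too far apart. It uses the standard haversine great-circle formula. The function names, the coordinates and the 100 km threshold are all hypothetical, chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_suspicious(card_location, phone_location, max_distance_km=100.0):
    """Flag a transaction when card and phone are further apart than the threshold."""
    return haversine_km(*card_location, *phone_location) > max_distance_km


# Hypothetical example: card used near Singapore, phone located in Madrid
madrid = (40.4168, -3.7038)
singapore = (1.3521, 103.8198)
flagged = is_suspicious(singapore, madrid)
```

Each transaction is checked independently with the same arithmetic, which is why this sort of workload suits the many-core approach described earlier.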

So, if you are in need of:

  • Large-scale data mining across huge databases.
  • Very high speed results (near or even real-time).

the answer is clear: use a GPU infrastructure. It will give you a competitive advantage over other companies searching for results in a “more traditional” way via CPUs.

