IBM Speeds Up Machine Learning Process by 10 Times


The hardware used to train machine learning algorithms typically faces one of two problems: speed or capacity constraints. Even if a server has enough memory to hold all of the data needed for the process, training can take anywhere from a few hours to several weeks.

The much faster graphics processing units (GPUs) may seem like the obvious candidates for speeding up compute-intensive workloads, but their memory tops out at 16 gigabytes, and that’s a problem when handling datasets with terabytes of data.

One solution has been to cut the data into smaller chunks and feed these 16GB portions into the GPU one at a time, an expensive way to move data. The time it takes to transfer each batch from main memory to the GPU can become so costly that it may completely outweigh the benefit of using the GPU in the first place.
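To make that cost concrete, here is a minimal sketch of the chunked approach, assuming PyTorch; the array sizes are scaled-down stand-ins for illustration, not real 16GB portions. The structural point is that every pass over the dataset repeats every host-to-GPU copy.

```python
import torch

# Illustrative sizes only; a real workload would stream far larger chunks.
dataset = torch.randn(1_000_000, 128)   # full dataset stays in host (CPU) RAM
chunk_rows = 100_000                    # rows per chunk small enough for GPU memory
device = "cuda" if torch.cuda.is_available() else "cpu"

for epoch in range(3):                  # every epoch repeats all the transfers below
    for start in range(0, dataset.shape[0], chunk_rows):
        # The host-to-device copy is the expensive step described above.
        chunk_gpu = dataset[start:start + chunk_rows].to(device)
        # ... one training pass over chunk_gpu would run here ...
```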

IBM Research, together with École Polytechnique Fédérale de Lausanne (EPFL), has developed a different approach. The scheme determines which smaller part of the data is most important to the training algorithm at any given moment. For most datasets of interest, the importance of each data point can change during the training process. By processing the data points in the right order, machine learning becomes much faster.
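The selection idea can be sketched in a few lines of NumPy. The importance measure below is a simple hinge-loss proxy invented for illustration; DuHL’s actual criterion is based on duality gaps, and the data, budget and step size here are likewise assumptions, not IBM’s implementation.

```python
import numpy as np

def importance_scores(w, X, y):
    # Stand-in importance measure: per-example hinge-loss violation for a
    # linear classifier. DuHL derives importance from duality gaps instead;
    # this proxy only demonstrates the selection mechanics.
    return np.maximum(0.0, 1.0 - y * (X @ w))

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 50))    # full dataset, held in host memory
y = rng.choice([-1.0, 1.0], size=10_000)
w = np.zeros(50)
budget = 1_000                           # how many points the accelerator can hold

for _ in range(20):
    scores = importance_scores(w, X, y)
    keep = np.argsort(scores)[-budget:]  # ship only the currently most useful points
    Xb, yb, sb = X[keep], y[keep], scores[keep]
    active = sb > 0                      # points still violating the margin
    grad = -(yb[active][:, None] * Xb[active]).sum(axis=0) / budget
    w -= 0.1 * grad                      # stand-in for the accelerator-side update
```

As training progresses, fewer points violate the margin, so the selected subset concentrates on the examples that still matter, mirroring the behaviour described here.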

According to IBM Research, this method can speed up training tenfold compared with traditional methods used in combination with limited training memory. A 30-gigabyte training dataset can then be processed in less than one minute on a single GPU, because the scheme uses that one unit to the fullest.

The scheme cherry-picks the items in a big data stream that will make a difference to the training process and ignores data that is irrelevant at that particular moment.

IBM lists the example of training an algorithm to distinguish between the photos of cats and dogs: “Once the algorithm can distinguish that a cat’s ears are typically smaller than a dog’s, it retains this information and skips reviewing this feature, eventually becoming faster and faster.”

IBM has developed a new, reusable component for training machine learning models on heterogeneous compute platforms, called Duality-gap based Heterogeneous Learning (DuHL). The scheme can also be applied to limited-memory accelerators other than GPUs, such as systems using field-programmable gate arrays (FPGAs).

The data-heavy nature of social media and online marketing makes them obvious targets for this approach, which can be used to predict which ads to display to visitors. Additional applications include finding patterns in telecoms data and fraud detection.

One of the biggest financial savings for users comes from running DuHL as a service in the cloud, where resources such as GPUs are typically billed by the hour. The scheme will be particularly valuable for researchers, developers and data scientists who need to train large-scale machine learning models.