Implementação Eficiente de PSO em GPUs Usando Memória de Baixa Latência (Efficient PSO Implementation on GPUs Using Low-Latency Memory)

Eric H. M. Silva (ehms@ecomp.poli.br)1, Carmelo J. A. Bastos Filho (carmelofilho@ieee.org)1


1Universidade de Pernambuco

This paper appears in: Revista IEEE América Latina

Publication Date: May 2015
Volume: 13,   Issue: 5 
ISSN: 1548-0992


Abstract:
This paper proposes an efficient implementation of the Particle Swarm Optimization (PSO) algorithm using the shared memory available in the Graphics Processing Units (GPUs) of CUDA (Compute Unified Device Architecture) platforms. In our proposal, each dimension of each particle is mapped to a thread, and the threads are executed in parallel within a GPU block. Since each GPU block allows only a limited number of parallel threads, we propose the use of multiple sub-swarms. Each sub-swarm is executed in a single GPU block, aiming to maximize data alignment and avoid instruction branch divergence. We also propose two communication mechanisms and two topologies that allow the sub-swarms to exchange information and collaborate through the GPU global memory. The results for 8 sub-swarms, each with 32 particles and 32 dimensions, show speedups of up to 100 times over the serial implementation and up to 5 times over a state-of-the-art PSO implementation for CUDA. Our proposal makes it possible to deploy PSO algorithms for continuous optimization problems with many input variables, a type of problem that is very common in engineering.
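To make the thread mapping described above concrete, the following is a minimal CUDA sketch, not the authors' code: blockIdx.x selects a sub-swarm, threadIdx.y a particle, and threadIdx.x a dimension, with the sub-swarm state staged in low-latency shared memory and written back to global memory for inter-swarm communication. All names (PARTICLES, DIMS, the sphere objective, update_subswarm) and the PSO constants are illustrative assumptions.

// Hypothetical sketch of the thread-per-dimension mapping: one block = one
// sub-swarm, one thread = one dimension of one particle (assumed layout).
#include <cuda_runtime.h>
#include <curand_kernel.h>

#define PARTICLES 32
#define DIMS      32
#define W   0.7298f   // inertia weight (common PSO constants, assumed)
#define C1  1.4960f
#define C2  1.4960f

__global__ void update_subswarm(float *gbest_pos,   // per-sub-swarm best, in global memory
                                float *positions,   // [subswarms][PARTICLES][DIMS]
                                float *velocities,
                                float *pbest_pos,
                                float *pbest_fit,
                                unsigned long long seed)
{
    // Low-latency shared memory holds the sub-swarm state during the update.
    __shared__ float x[PARTICLES][DIMS];
    __shared__ float v[PARTICLES][DIMS];

    int p = threadIdx.y;                               // particle inside the sub-swarm
    int d = threadIdx.x;                               // dimension handled by this thread
    int g = ((blockIdx.x * PARTICLES) + p) * DIMS + d; // global-memory offset

    curandState rng;
    curand_init(seed, g, 0, &rng);

    // Stage the particle state in shared memory (coalesced loads).
    x[p][d] = positions[g];
    v[p][d] = velocities[g];
    __syncthreads();

    // Velocity and position update; every dimension advances in parallel.
    float r1 = curand_uniform(&rng);
    float r2 = curand_uniform(&rng);
    v[p][d] = W * v[p][d]
            + C1 * r1 * (pbest_pos[g] - x[p][d])
            + C2 * r2 * (gbest_pos[blockIdx.x * DIMS + d] - x[p][d]);
    x[p][d] += v[p][d];
    __syncthreads();

    // Illustrative objective: sphere function, evaluated by thread d == 0 of
    // each particle; the personal best is refreshed in global memory.
    if (d == 0) {
        float s = 0.0f;
        for (int k = 0; k < DIMS; ++k) s += x[p][k] * x[p][k];
        if (s < pbest_fit[blockIdx.x * PARTICLES + p]) {
            pbest_fit[blockIdx.x * PARTICLES + p] = s;
            for (int k = 0; k < DIMS; ++k)
                pbest_pos[((blockIdx.x * PARTICLES) + p) * DIMS + k] = x[p][k];
        }
    }
    __syncthreads();

    // Write the updated state back to global memory, where the sub-swarm
    // communication topology (not shown) can read it.
    positions[g]  = x[p][d];
    velocities[g] = v[p][d];
}

A launch such as dim3 block(DIMS, PARTICLES); update_subswarm<<<8, block>>>(...) would run 8 sub-swarms of 32 x 32 = 1024 threads each, matching the configuration reported in the abstract; the selection of each sub-swarm's best and the inter-swarm exchange mechanisms proposed in the paper are omitted from this sketch.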

Index Terms:
Particle Swarm Optimization, Swarm intelligence, Graphics Processing Units, CUDA, Parallel Computing, Shared Memory.   

