A GPU-accelerated semi-implicit fractional-step method for numerical solutions of incompressible Navier-Stokes equations SCIE SCOPUS

Title
A GPU-accelerated semi-implicit fractional-step method for numerical solutions of incompressible Navier-Stokes equations
Authors
Ha, Sanghyun; Park, Junshin; You, Donghyun
Date Issued
2018-01
Publisher
ACADEMIC PRESS INC ELSEVIER SCIENCE
Abstract
Utility of the computational power of Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. The Alternating Direction Implicit (ADI) and the Fourier-transform-based direct solution methods used in the semi-implicit fractional-step method take advantage of multiple tridiagonal matrices whose inversion is known as the major bottleneck for acceleration on a typical multi-core machine. A novel implementation of the semi-implicit fractional-step method designed for GPU acceleration of the incompressible Navier-Stokes equations is presented. Aspects of the programming model of Compute Unified Device Architecture (CUDA), which are critical to the bandwidth-bound nature of the present method, are discussed in detail. A data layout for efficient use of CUDA libraries is proposed for acceleration of tridiagonal matrix inversion and fast Fourier transform. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Performance of the present method using CUDA is assessed by comparing the speed of solving three tridiagonal matrices using ADI with the speed of solving one heptadiagonal matrix using a conjugate gradient method. An overall speedup of 20 times is achieved using a Tesla K40 GPU in comparison with a single-core Xeon E5-2660 v3 CPU in simulations of turbulent boundary-layer flow over a flat plate conducted on over 134 million grid points. A higher speedup of 48 times is reached for the same problem using a Tesla P100 GPU. (C) 2017 Elsevier Inc. All rights reserved.
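As an illustration of the tridiagonal solves the abstract refers to, the following is a minimal CPU sketch of the Thomas algorithm applied to a batch of right-hand sides, as would arise when one ADI sweep produces an independent tridiagonal system per grid line. It is not the authors' implementation: the paper relies on CUDA libraries on the GPU, whereas this is plain Python, and the function names `solve_tridiagonal` and `solve_batch` are hypothetical. The sketch assumes every system in the batch shares the same coefficients and needs no pivoting (for example, a diagonally dominant matrix).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve one tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is ignored),
    diag[i]  multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] does not affect the result).
    Assumes no pivoting is needed, e.g. a diagonally dominant matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_batch(lower, diag, upper, rhs_batch):
    """Solve many independent systems that share the same coefficients.

    Each entry of rhs_batch stands for one grid line of an ADI sweep.
    The systems do not depend on each other, which is what makes this
    step a candidate for parallel execution.
    """
    return [solve_tridiagonal(lower, diag, upper, rhs) for rhs in rhs_batch]


if __name__ == "__main__":
    lower = [0.0, -1.0, -1.0, -1.0]
    diag = [4.0, 4.0, 4.0, 4.0]
    upper = [-1.0, -1.0, -1.0, 0.0]
    batch = [[2.0, 4.0, 6.0, 13.0], [3.0, 2.0, 2.0, 3.0]]
    for solution in solve_batch(lower, diag, upper, batch):
        print(solution)
```

Each system costs O(n) operations but is inherently sequential along the line, so parallelism in this sketch comes only from the independence of the lines in the batch.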
URI
https://oasis.postech.ac.kr/handle/2014.oak/96092
DOI
10.1016/j.jcp.2017.09.055
ISSN
0021-9991
Article Type
Article
Citation
JOURNAL OF COMPUTATIONAL PHYSICS, vol. 352, pp. 246-264, 2018-01

Related Researcher

You, Donghyun (유동현)
Dept. of Mechanical Engineering