New Computationally Cost-Effective Implementation of Online Nesting for a Regional Model

Monday, 14 December 2015
Poster Hall (Moscone South)
Ryuji Yoshida1, Tsuyoshi Yamaura1, Sachiho A. Adachi1, Seiya Nishizawa1, Hisashi Yashiro1, Yousuke Sato2 and Hirofumi Tomita1, (1)RIKEN Advanced Institute for Computational Sciences, Kobe, Japan, (2)RIKEN Advanced Institute for Computational Science (AICS), Kobe, Japan
A new cost-effective implementation of online nesting is developed to improve computational performance, which is as important as physical performance in numerical weather prediction and regional climate experiments. A nesting system is an indispensable component of a down-scaling experiment. Online nesting has advantages over offline nesting: a shorter update interval for boundary data and no need for intermediate files. However, the computational efficiency of online nesting has not been evaluated in much detail. In the conventional implementation (CVI) of online nesting, the MPI processes are arranged as a single group, and that group manages all of the nested domains. In the new implementation (NWI), the MPI processes are divided into several groups, and each group is assigned to one domain. Ideally, therefore, almost no processes idle. In addition, the outer-domain calculation can be overlapped with the inner-domain calculation, and with an appropriate assignment of processes the time spent transferring data from the outer domain to the inner domain can also be hidden behind the inner-domain calculation.
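The per-domain grouping of processes can be sketched as follows. This is an illustrative Python sketch, not SCALE's actual code: it mimics the color-based partitioning that an MPI call such as MPI_Comm_split would perform, and the function name and split rule are assumptions made here for illustration.

```python
# Hypothetical sketch: partition a pool of MPI ranks into per-domain groups,
# in the spirit of MPI_Comm_split (each rank gets a "color" = domain index).
# The function name and the contiguous split rule are assumptions, not SCALE's API.

def split_ranks(total_ranks, group_sizes):
    """Return colors[rank] -> domain index, assigning contiguous rank blocks."""
    assert sum(group_sizes) == total_ranks, "every rank must belong to one group"
    colors = []
    for domain, size in enumerate(group_sizes):
        colors.extend([domain] * size)  # 'size' consecutive ranks serve 'domain'
    return colors

# Example matching the double-nested experiment described below:
# 90 ranks split into an outer-domain group (9) and an inner-domain group (81).
colors = split_ranks(90, [9, 81])
```

With such a split, every rank belongs to exactly one domain group, so no rank sits idle while another domain is being advanced.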

We applied the NWI to the SCALE model (Nishizawa et al. 2015), a regional weather prediction model developed by RIKEN AICS, and evaluated its computational performance in a double-nested experiment on the K computer. The grid numbers (x, y, z) were (120, 108, 40) for the outer domain, with 7.5 km horizontal grid spacing, and (180, 162, 60) for the inner domain, with 2.5 km horizontal grid spacing. Ninety processes were used in both the CVI and the NWI. In the NWI, the MPI processes were divided into two groups: 9 processes were assigned to the outer domain and 81 to the inner domain. The NWI was 1.2 times faster than the CVI. The benefit of the NWI should become larger as more domains are nested.
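A back-of-the-envelope check shows why the 9/81 split is a reasonable load balance. The grid sizes and process counts come from the experiment above; the 3:1 time-step ratio is an assumption introduced here (the time step of an explicit model typically scales with the grid spacing, 7.5 km versus 2.5 km), not a figure stated in the abstract.

```python
# Rough per-process workload for the double-nested experiment.
# Grid sizes and process counts are from the text; the steps_ratio of 3
# is an ASSUMPTION (time step proportional to the 7.5 km / 2.5 km spacing).

outer_points = 120 * 108 * 40   # outer-domain grid points
inner_points = 180 * 162 * 60   # inner-domain grid points
steps_ratio = 7.5 / 2.5         # assumed: outer domain takes 1/3 as many steps

# Grid points advanced per process, per inner-domain time step:
outer_cost = outer_points / 9 / steps_ratio
inner_cost = inner_points / 81
print(outer_cost, inner_cost)
```

Under this assumption the outer-domain group advances about 19,200 points per process per inner step and the inner-domain group about 21,600, so the two groups are roughly balanced and the outer-domain work can be hidden behind the inner-domain calculation.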