efficiency is mainly due to the non-scalable border size and secondarily to the unbalanced computational load caused by the multi-resolution grids.

5.3 Strong Scalability 1D vs HSFC

This section investigates the performance enhancements introduced by the 2D partitioning based on Hilbert Space Filling Curves. Furthermore, it determines the lower bound for perfect scaling of CUDA kernels processing internal blocks (lines 7 and 12 of Algorithm 1).

The ideal setting for capturing such aspects is a BUQ grid with uniform resolution and a rectangular arrangement of blocks. The grid is internally handled as a BUQG (i.e., using the block numbering mentioned in Section 2.1). This setting allows us to validate the BUQG infrastructure without incurring the multi-resolution overheads that may introduce unbalanced kernel executions.

The input model for the strong scalability test is a circular dam break discretized by means of a uniform BUQ grid containing approximately 82 million cells. This model is designed to guarantee that the HSFC algorithm creates rectangular partitions that minimize the border size.

The test is performed on 8 to 64 GPUs. For a fair comparison, each GPU should run kernels with nearly 100% efficiency, so a minimal grid size is guaranteed for each GPU. This requires an initial grid that cannot be stored in GPU memory with fewer than 8 GPUs.

Figure 16 shows the improvements introduced by HSFC. As stated in Section 5.2, border kernels of 1D partitioning do not scale, and tests show that the efficiency drops almost linearly with the number of GPUs, reaching 73% for 64 GPUs. This is not the case for HSFC, which reaches 85% efficiency with 64 GPUs and a partition size of approximately 1.3 million cells. Tests with smaller input grids show a loss of efficiency in the inner kernel on the single GPU. This is the empirical measure of the suggested GPU load for best occupancy.
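
The different scaling of the border size under the two partitionings can be illustrated with a small geometric estimate. The sketch below is not taken from the paper: it assumes a square grid of 9056 × 9056 cells (a hypothetical shape chosen only because it gives about 82 million cells), a one-cell-wide border, interior partitions only, and ideal rectangular tiles for the 2D case. All function names are illustrative.

```python
import math


def near_square_factors(parts: int) -> tuple[int, int]:
    """Factor `parts` as px * py with px <= py and px as close to sqrt(parts) as possible."""
    px = math.isqrt(parts)
    while parts % px != 0:
        px -= 1
    return px, parts // px


def border_cells_1d(n_side: int, parts: int) -> int:
    """Border cells of an interior strip: two full-width edges, whatever `parts` is."""
    return 2 * n_side


def border_cells_2d(n_side: int, parts: int) -> int:
    """Border cells of an interior rectangular tile in a px-by-py arrangement."""
    px, py = near_square_factors(parts)
    return 2 * (n_side // px + n_side // py)


N_SIDE = 9056  # hypothetical side length; 9056 * 9056 is roughly 82 million cells

for parts in (8, 16, 32, 64):
    cells = N_SIDE * N_SIDE // parts
    b1 = border_cells_1d(N_SIDE, parts)
    b2 = border_cells_2d(N_SIDE, parts)
    print(f"{parts:2d} partitions: {cells:>10d} cells each, "
          f"1D border {b1} ({b1 / cells:.3%}), 2D border {b2} ({b2 / cells:.3%})")
```

Under these assumptions the 1D border stays constant while the partition shrinks, so its relative weight grows linearly with the number of partitions, whereas the 2D border shrinks together with the tile. With 64 partitions each tile holds about 1.28 million cells, in line with the partition size reported above.
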
The HSFC algorithm shows visible improvements, compared to 1D, starting from 16 partitions.

Fig. 16: 2D-HSFC and 1D partitioning strong scalability tests.

5.4 Dynamic Load Balancing on a realistic scenario

This section shows the improvements provided by dynamic partitioning. We focus on HSFC partitioning, since we showed that it is the best performer in static cases. The input model for this test is a circular dam break located at the centre of a domain of size 50 m × 50 m and discretized by means of a multi-resolution grid containing $\sim 6.5 \times 10^{6}$ cells. The grid is shown in Figure 17 (left), where the wet and dry portions of the grid are shown in red and blue, respectively. We introduce a vertical wall that confines the water to the right side of the domain, while the left region is non-wettable. The wall is covered by high-resolution blocks and is visible as a denser vertical line. Figure 17 (right) shows the end of the simulation, where water completely covers the right region.
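
To make the idea of load-aware partitioning along a space-filling curve concrete, the following minimal sketch orders the blocks of a tiny 4 × 4 arrangement along a Hilbert curve and cuts the curve into contiguous chunks of roughly equal weight. It is a generic illustration, not the scheme used in this work: the block weights (4 for blocks in the wettable right half, 1 for the others), the greedy prefix-sum split, and all function names are assumptions made for the example.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve of an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def split_along_curve(weights: list[int], parts: int) -> list[int]:
    """Assign each block (given in curve order) to a partition by its cumulative weight."""
    total = sum(weights)
    owner = []
    cumulative = 0
    for w in weights:
        owner.append(min(parts - 1, cumulative * parts // total))
        cumulative += w
    return owner


N = 4        # hypothetical 4 x 4 arrangement of blocks
PARTS = 2    # number of partitions (GPUs)

# Blocks sorted by their position along the curve.
blocks = sorted(((x, y) for x in range(N) for y in range(N)),
                key=lambda b: hilbert_index(N, b[0], b[1]))
# Assumed cost model: blocks in the right half are four times as expensive.
weights = [4 if x >= N // 2 else 1 for x, _ in blocks]

owner = split_along_curve(weights, PARTS)
loads = [sum(w for w, o in zip(weights, owner) if o == p) for p in range(PARTS)]
print("partition loads:", loads)
```

Because each partition is a contiguous stretch of the curve, re-running the split with updated weights moves only the cut points, which is what makes this kind of ordering convenient when the wet region changes during a simulation.
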