Multi-dimensional arrays need to be padded in the fastest-moving dimension so that every array section (each column in Fortran, each row in C/C++) starts at the desired byte boundary:
- Fortran: first array dimension
- C/C++: last array dimension
npadded = ((n + veclen - 1) / veclen) * veclen   (integer division: rounds n up to the next multiple of veclen)
- No alignment requested: veclen = 1
- 16-byte alignment (SSE): veclen = 4 (sp) or 2 (dp)
- 32-byte alignment (AVX2): veclen = 8 (sp) or 4 (dp)
- 64-byte alignment (AVX-512): veclen = 16 (sp) or 8 (dp)
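The veclen values above are the alignment in bytes divided by the element size in bytes (4 for single precision, 2 x 4 = 8 for double). A minimal sketch of the padding formula follows; the helper name padded_extent and the extent n = 1001 are hypothetical, chosen only for illustration:

```fortran
program padding_demo
  implicit none
  integer, parameter :: n = 1001   ! hypothetical extent, not a multiple of any veclen

  ! Arguments: extent, alignment in bytes, element size in bytes
  print '(a,i0)', 'no alignment, sp: ', padded_extent(n,  4, 4)   ! veclen = 1  -> 1001
  print '(a,i0)', 'SSE,          sp: ', padded_extent(n, 16, 4)   ! veclen = 4  -> 1004
  print '(a,i0)', 'SSE,          dp: ', padded_extent(n, 16, 8)   ! veclen = 2  -> 1002
  print '(a,i0)', 'AVX2,         sp: ', padded_extent(n, 32, 4)   ! veclen = 8  -> 1008
  print '(a,i0)', 'AVX2,         dp: ', padded_extent(n, 32, 8)   ! veclen = 4  -> 1004
  print '(a,i0)', 'AVX-512,      sp: ', padded_extent(n, 64, 4)   ! veclen = 16 -> 1008
  print '(a,i0)', 'AVX-512,      dp: ', padded_extent(n, 64, 8)   ! veclen = 8  -> 1008

contains

  ! Hypothetical helper: round n up to the next multiple of veclen,
  ! where veclen = align_bytes / elem_bytes (at least 1).
  pure function padded_extent(n, align_bytes, elem_bytes) result(npadded)
    integer, intent(in) :: n, align_bytes, elem_bytes
    integer :: npadded, veclen
    veclen  = max(1, align_bytes / elem_bytes)
    npadded = ((n + veclen - 1) / veclen) * veclen   ! integer division truncates
  end function padded_extent

end program padding_demo
```

An extent that is already a multiple of veclen is left unchanged, so the formula never pads more than veclen - 1 elements per column.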
Example:
real, allocatable :: a(:,:), b(:,:), c(:,:)
!dir$ attributes align : 32 :: a,b,c
...
allocate (a(npadded,n))
allocate (b(npadded,n))
allocate (c(npadded,n))
...
do j=1,n
  do k=1,n
!dir$ vector aligned
    do i=1,npadded
      c(i,j) = c(i,j) + a(i,k) * b(k,j)
    end do
  end do
end do
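Because the inner loop runs to npadded, it also reads and writes the padding rows n+1..npadded of a and c. One way to keep those rows harmless is to zero the arrays before filling them. The sketch below is a self-contained version of the example under that assumption; the size n = 5, the fill values and the final checks are hypothetical additions for illustration. The !dir$ lines are Intel compiler directives and are treated as ordinary comments by other compilers.

```fortran
program padded_matmul
  implicit none
  integer, parameter :: n = 5        ! hypothetical problem size
  integer, parameter :: veclen = 8   ! 32-byte alignment, single precision
  integer :: npadded, i, j, k
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  !dir$ attributes align : 32 :: a, b, c

  ! Pad the first (fastest-moving) dimension up to a multiple of veclen
  npadded = ((n + veclen - 1) / veclen) * veclen
  allocate (a(npadded,n), b(npadded,n), c(npadded,n))

  ! Zero everything first so the padding rows n+1..npadded hold defined values
  a = 0.0
  b = 0.0
  c = 0.0

  ! Fill only the logical n x n part with small, exactly representable values
  do j = 1, n
    do i = 1, n
      a(i,j) = real(i + j)
      b(i,j) = real(i - j)
    end do
  end do

  ! Each column starts on a 32-byte boundary, so the inner loop can be
  ! declared aligned; it also sweeps over the (zero) padding rows.
  do j = 1, n
    do k = 1, n
      !dir$ vector aligned
      do i = 1, npadded
        c(i,j) = c(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do

  ! Sanity checks: logical part matches matmul, padding stayed zero
  if (maxval(abs(c(1:n,1:n) - matmul(a(1:n,1:n), b(1:n,1:n)))) > 1.0e-4) then
    error stop 'result mismatch'
  end if
  if (any(c(n+1:npadded,:) /= 0.0)) error stop 'padding not zero'
  print *, 'ok'
end program padded_matmul
```

The extra iterations over the padding rows cost a little arithmetic but let every vector load and store in the inner loop be an aligned, full-width operation with no remainder loop.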