Multi-dimensional arrays need to be padded in the fastest-moving dimension, so that every array section (each column in Fortran, each row in C/C++) starts on the desired byte boundary:

  • Fortran: first array dimension 
  • C/C++: last array dimension  

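npadded = ((n + veclen - 1) / veclen) * veclen

Here / is integer division, so npadded is n rounded up to the next multiple of veclen, where veclen is the number of array elements per aligned block: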

  • No alignment requested: veclen = 1
  • 16-byte alignment (SSE): veclen = 4 (single precision, sp) or 2 (double precision, dp)
  • 32-byte alignment (AVX2): veclen = 8 (sp) or 4 (dp)
  • 64-byte alignment (AVX-512): veclen = 16 (sp) or 8 (dp)
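
For the C/C++ case, where the last dimension is the one to pad, the rounding formula can be sketched as below. This is only an illustration under stated assumptions: the helper names padded_length and alloc_padded_matrix are hypothetical (they do not come from any library), the elements are single-precision floats, and C11 aligned_alloc is assumed to be available on the target platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round n up to the next multiple of veclen. */
size_t padded_length(size_t n, size_t veclen)
{
    assert(veclen > 0);
    return ((n + veclen - 1) / veclen) * veclen;
}

/* Hypothetical helper: allocate an nrows x n row-major float matrix whose
 * last (fastest-moving) dimension is padded so that every row starts on an
 * align_bytes boundary.
 * Assumptions: align_bytes is a power of two and at least sizeof(float).
 * The padded row length (in elements) is returned through row_stride.
 * The caller releases the memory with free(). */
float *alloc_padded_matrix(size_t nrows, size_t n, size_t align_bytes,
                           size_t *row_stride)
{
    size_t veclen  = align_bytes / sizeof(float);  /* elements per block */
    size_t npadded = padded_length(n, veclen);

    *row_stride = npadded;

    /* aligned_alloc expects the size to be a multiple of the alignment;
     * this holds because each padded row is a multiple of align_bytes. */
    return aligned_alloc(align_bytes, nrows * npadded * sizeof(float));
}
```

Element (r, c) of the matrix is then addressed as a[r * row_stride + c]. With n = 10 and 32-byte alignment, veclen is 8 and each row is padded to 16 floats (64 bytes), so every row begins on a 32-byte boundary.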

Example:

real, allocatable:: a(:,:), b(:,:), c(:,:)
!dir$ attributes align : 32 :: a,b,c
...
allocate (a(npadded,n))
allocate (b(npadded,n))
allocate (c(npadded,n))
...
do j=1,n
  do k=1,n
    !dir$ vector aligned
    do i=1,npadded
      c(i,j) = c(i,j) + a(i,k) * b(k,j)
    end do
  end do
end do
