The complementary data movement, the WRITE primitive, is invoked
when an arbitrary processor writes q elements from
a local array to a remote location. Again, many parallel platforms
provide both blocking and non-blocking write calls. The BDM
complexity of the WRITE primitive is again given by Eq. (1).
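
To make the distinction between blocking and non-blocking writes concrete, the following is a minimal single-process sketch, not any platform's actual interface. The class Machine and the methods write, write_nb, and sync are hypothetical names introduced here for illustration. Each "processor" is modeled as a plain Python list, and the sketch shows only the data-movement semantics, not the communication cost of Eq. (1).

```python
class Machine:
    """Toy stand-in for p processors, each with its own local memory.

    Hypothetical illustration only: names and semantics are assumptions,
    not the API of any real parallel platform.
    """

    def __init__(self, p, mem_size):
        self.mem = [[0] * mem_size for _ in range(p)]
        self.pending = []  # non-blocking writes that have not completed yet

    def _check(self, local, q, dst_proc, dst_off):
        # Reject transfers that would read past the source or write past
        # the end of the destination processor's memory.
        if q < 0 or q > len(local):
            raise ValueError("q exceeds the length of the local array")
        if dst_off < 0 or dst_off + q > len(self.mem[dst_proc]):
            raise ValueError("write falls outside the remote memory")

    def write(self, local, q, dst_proc, dst_off):
        """Blocking WRITE: the q elements are in place on return."""
        self._check(local, q, dst_proc, dst_off)
        self.mem[dst_proc][dst_off:dst_off + q] = local[:q]

    def write_nb(self, local, q, dst_proc, dst_off):
        """Non-blocking WRITE: queue a snapshot of the data and return."""
        self._check(local, q, dst_proc, dst_off)
        self.pending.append((list(local[:q]), dst_proc, dst_off))

    def sync(self):
        """Complete every outstanding non-blocking write."""
        for data, dst_proc, dst_off in self.pending:
            self.mem[dst_proc][dst_off:dst_off + len(data)] = data
        self.pending.clear()
```

In this sketch, a caller of write may reuse the local array immediately, whereas a caller of write_nb must call sync before relying on the remote data being present. Real platforms differ in how completion is signaled.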