The complementary data movement, the WRITE primitive, is invoked when a processor writes q elements from a local array to a remote location. Again, many parallel platforms provide both blocking and non-blocking write calls. The BDM complexity of a WRITE is likewise given by Eq. (1).
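
To make the distinction between the two call styles concrete, the following is a minimal single-process sketch in Python that simulates each processor's memory as a list. All names here (`Processor`, `write_blocking`, `write_nonblocking`, `sync`) are hypothetical and chosen for illustration; they are not the API of any particular platform. The non-blocking variant copies the source data when the write is queued, which is a simplifying assumption: on a real machine the local buffer would typically have to remain unmodified until the write completes.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """One simulated processor with its own local memory (illustrative only)."""
    memory: list
    pending: list = field(default_factory=list)  # queued non-blocking writes


def _check(procs, dst, offset, local, q):
    """Reject writes that would fall outside the source or destination array."""
    if q < 0 or q > len(local):
        raise ValueError("q exceeds the local array")
    if offset < 0 or offset + q > len(procs[dst].memory):
        raise ValueError("write falls outside the remote array")


def write_blocking(procs, dst, offset, local, q):
    """Blocking WRITE: the q elements are in dst's memory when this returns."""
    _check(procs, dst, offset, local, q)
    procs[dst].memory[offset:offset + q] = local[:q]


def write_nonblocking(procs, src, dst, offset, local, q):
    """Non-blocking WRITE: queue the transfer on src and return immediately.

    Assumption: the data is copied at call time so later changes to
    `local` do not affect the queued write.
    """
    _check(procs, dst, offset, local, q)
    procs[src].pending.append((dst, offset, list(local[:q])))


def sync(procs, src):
    """Complete every write previously queued by processor src."""
    for dst, offset, data in procs[src].pending:
        procs[dst].memory[offset:offset + len(data)] = data
    procs[src].pending.clear()
```

In this sketch a blocking write is visible at the destination as soon as the call returns, whereas a non-blocking write becomes visible only after the issuing processor calls `sync`, which is where computation could be overlapped with the transfer.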