gds_mlx5_post_send in PR #86 can be optimized further. See:

- https://github.com/gpudirect/libmlx5/commits/fixes
- https://github.com/gpudirect/libmlx5/commits/copyblock-4.3