For data with lots of repeated segments, there could be additional bandwidth savings from caching the block checksums and, whenever a block repeats, sending a CLONE directive telling the far end to make a local copy of the block it already has.
Looking at the code, this doesn't look like it would be too difficult to implement. I might get to it eventually if you're interested.
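
Here's a rough Python sketch of the idea, just to make it concrete. It is not based on the project's actual wire format: the `DATA`/`CLONE` tuples, the `encode`/`decode` names, the tiny block size, and the use of SHA-256 as the block checksum are all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def encode(data: bytes, block_size: int = BLOCK_SIZE):
    """Sender side: emit DATA for a first-seen block, CLONE for a repeat."""
    seen = {}  # cached block checksum -> offset of the first occurrence
    directives = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # The far end already holds these bytes: send only a reference.
            directives.append(("CLONE", seen[digest], len(block)))
        else:
            seen[digest] = offset
            directives.append(("DATA", block))
    return directives


def decode(directives) -> bytes:
    """Far end: rebuild the stream, copying locally for each CLONE."""
    out = bytearray()
    for directive in directives:
        if directive[0] == "DATA":
            out += directive[1]
        else:
            _, src, length = directive
            out += out[src:src + length]  # local copy, nothing on the wire
    return bytes(out)
```

With `b"abcdabcdabcdxy"` as input, only the first `abcd` and the trailing `xy` go over the wire as data; the two repeats become `CLONE` directives pointing at offset 0. A real implementation would probably want to guard against checksum collisions before trusting a match.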