I'm not sure if it's appropriate to create issues like this; feel free to close it without warning. Otherwise, I'd ask for it to stay open for some time in case somebody is interested.
Why?: The two existing backends for pixelfly use either huggingface blocksparse or triton. However, these are not always available, e.g. when training on TPUs or when using custom parameters (triton supports only a few block sizes).
What?: Below you can find a (limited) re-implementation of pixelfly in pure pytorch. Instead of block-sparse kernels, this implementation exploits the fact that the butterfly layout has an equal number of nonzero blocks in each row. This allows a two-stage procedure (sketched after the list):
- compute all candidate blocks with a regular (dense) matmul, using `[in_features, (block_size * blocks_per_input)]` weights
- aggregate blocks according to the butterfly layout using `F.embedding_bag(..., mode='sum')`
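
Here's a minimal, self-contained sketch of that two-stage idea (not the gist itself). All names and shapes are illustrative, and the block layout below is a random placeholder with equal block counts per output, not a real butterfly pattern:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes (not from the gist): square layout, equal block counts everywhere.
batch, block_size = 3, 4
num_in_blocks = num_out_blocks = 8
blocks_per_input = 2                      # nonzero output blocks per input block
in_features = num_in_blocks * block_size
out_features = num_out_blocks * block_size
num_candidates = num_in_blocks * blocks_per_input

# Placeholder layout: each candidate block goes to exactly one output block, so every
# output block sums the same number of candidates (like butterfly, but chosen randomly).
layout = torch.randperm(num_candidates).reshape(num_out_blocks, blocks_per_input)

x = torch.randn(batch, in_features)
weight = torch.randn(num_in_blocks, block_size, blocks_per_input * block_size) * 0.02

# Stage 1: one dense (batched) matmul computes every candidate output block.
x_blocks = x.reshape(batch, num_in_blocks, block_size)
candidates = torch.einsum("bni,nio->bno", x_blocks, weight)
candidates = candidates.reshape(batch, num_candidates, block_size)

# Stage 2: sum candidates into output blocks with embedding_bag(mode='sum');
# each row of `layout` is a fixed-size bag of candidate indices for one output block.
flat = candidates.permute(1, 0, 2).reshape(num_candidates, batch * block_size)
out = F.embedding_bag(layout, flat, mode="sum")   # [num_out_blocks, batch * block_size]
out = out.reshape(num_out_blocks, batch, block_size).permute(1, 0, 2).reshape(batch, out_features)
print(out.shape)  # torch.Size([3, 32])
```

In the actual gist the indices come from the butterfly pattern and the layer handles weight initialization, bias, autocast, etc.; the snippet above only shows the dense-matmul + embedding_bag aggregation trick.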
Here's the implementation: https://gist.github.com/justheuristic/9e4fb81381451a4bc8cbfee0a5100eba
It's heavily inspired by the original code and re-uses parts of blocksparse_linear.py.
It's a single file, requires only pytorch and einops, and is compatible with TPUs. The speed-ups are comparable (see example_and_tests), and it supports custom block sizes, tf32, autocast, etc. You can also easily re-write this in tensorflow using tfa.EmbeddingBag.
Feel free to use for whatever :)