2 Comments

The LOMO paper is weird. They present reordering the update operations as something new: each parameter is updated as soon as its gradient is computed, so the full set of gradients never has to be held in memory. But I'm pretty sure JAX supports this out of the box, and I think DeepSpeed does it as well for Torch. Their real claim seems to be that you can productively do full-parameter fine-tuning with SGD instead of Adam, but they don't run the experiments needed to make that comparison thoroughly.
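
For anyone unfamiliar with the trick, here's a toy sketch of fusing plain SGD into the backward pass. It's my own illustration, not LOMO's code and not how JAX or DeepSpeed implement it. The model (a hypothetical chain of scalar weights, h_i = w_i * h_{i-1}, with squared-error loss) and all function names are made up for the example. The only subtle part is the ordering: each weight's old value is used to propagate the gradient upstream *before* that weight is updated, which is why the fused version matches the ordinary compute-everything-then-update step while keeping only one gradient alive at a time.

```python
from typing import List


def forward(weights: List[float], x: float) -> List[float]:
    """Toy model (hypothetical): activations h_0..h_L with h_{i+1} = w_i * h_i."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def sgd_standard(weights: List[float], x: float, target: float, lr: float) -> List[float]:
    """Baseline: materialize every gradient first, then update all weights."""
    acts = forward(weights, x)
    upstream = acts[-1] - target  # dL/dh_L for L = 0.5 * (h_L - target)^2
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = upstream * acts[i]     # dL/dw_i
        upstream = upstream * weights[i]  # dL/dh_i, using w_i
    return [w - lr * g for w, g in zip(weights, grads)]


def sgd_fused(weights: List[float], x: float, target: float, lr: float) -> List[float]:
    """Fused: apply each update during the backward pass, holding one gradient at a time."""
    weights = list(weights)  # copy so the caller's list is not mutated
    acts = forward(weights, x)
    upstream = acts[-1] - target
    for i in reversed(range(len(weights))):
        grad = upstream * acts[i]         # dL/dw_i
        upstream = upstream * weights[i]  # propagate with the *old* w_i first...
        weights[i] -= lr * grad           # ...then update it and let grad go
    return weights


print(sgd_standard([0.5, 2.0, -1.5], 1.0, 3.0, 0.1))
print(sgd_fused([0.5, 2.0, -1.5], 1.0, 3.0, 0.1))
```

Both calls should print (up to float rounding) the same weights. Swapping the last two lines of the fused loop would silently change the result, which is the whole "ordering" point.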

Also, lol at 8 RTX 3090s (192 GB of VRAM in total) counting as "low resources". They're not remotely competing for the same market as LoRA etc.