Discussion about this post

Davis Yoshida

The LOMO paper is weird... They present the ordering of the update operations as something new, but I'm pretty sure JAX supports this out of the box, and I think DeepSpeed does it as well for PyTorch. Their main claim seems to be that you can productively finetune with SGD instead of Adam, but they don't provide the experiments needed to make that comparison thoroughly.
