generation capabilities and near-sota representations at 229x fewer flops [paper]
✍️
why are we modelling proteins the same way we model text? is it just because amino acids are written as letters?
inverse scaling in protein LMs is not real in well-conditioned settings.
i've been told that this view of representations and generation capabilities as distinct is hinton-pilled and rbm-core. i have yet to read up on it.