Benchmark Scores


| Rank | Model name | Model type | Avg. Spearman | Std. Error of Diff. to Best Score* | Spearman by Function: Activity | Spearman by Function: Binding | Spearman by Function: Expression | Spearman by Function: Organismal Fitness | Spearman by Function: Stability | Spearman by MSA Depth: Low depth | Spearman by MSA Depth: Medium depth | Spearman by MSA Depth: High depth | Spearman by Taxon: Human | Spearman by Taxon: Other Eukaryote | Spearman by Taxon: Prokaryote | Spearman by Taxon: Virus | Spearman by Mutation Depth: 1 | Spearman by Mutation Depth: 2 | Spearman by Mutation Depth: 3 | Spearman by Mutation Depth: 4 | Spearman by Mutation Depth: 5+ | Description | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | ProSST (K=2048) | Hybrid - Structure & PLM | 0.507 | 0.0 | 0.476 | 0.445 | 0.53 | 0.431 | 0.653 | 0.473 | 0.511 | 0.578 | 0.516 | 0.573 | 0.549 | 0.454 | 0.521 | 0.394 | 0.317 | 0.277 | 0.332 | ProSST (K=2048) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 2 | ProSST (K=4096) | Hybrid - Structure & PLM | 0.498 | 0.008 | 0.444 | 0.472 | 0.507 | 0.416 | 0.652 | 0.477 | 0.488 | 0.579 | 0.497 | 0.574 | 0.547 | 0.44 | 0.505 | 0.426 | 0.388 | 0.342 | 0.408 | ProSST (K=4096) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 3 | ProSST (K=1024) | Hybrid - Structure & PLM | 0.485 | 0.006 | 0.433 | 0.436 | 0.499 | 0.414 | 0.642 | 0.466 | 0.473 | 0.58 | 0.483 | 0.568 | 0.539 | 0.436 | 0.492 | 0.434 | 0.373 | 0.341 | 0.403 | ProSST (K=1024) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 4 | ProSST (K=512) | Hybrid - Structure & PLM | 0.471 | 0.006 | 0.423 | 0.431 | 0.479 | 0.394 | 0.629 | 0.446 | 0.458 | 0.566 | 0.472 | 0.55 | 0.529 | 0.408 | 0.476 | 0.41 | 0.354 | 0.314 | 0.383 | ProSST (K=512) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 5 | PoET (200M) | Hybrid - Alignment & PLM | 0.47 | 0.009 | 0.494 | 0.396 | 0.466 | 0.475 | 0.519 | 0.488 | 0.472 | 0.515 | 0.482 | 0.541 | 0.464 | 0.491 | 0.466 | 0.299 | 0.419 | 0.395 | 0.418 | PoET (200M) | Truong, Timothy F. and Tristan Bepler. PoET: A generative model of protein families as sequences-of-sequences. NeurIPS. |
| 6 | ProSST (K=128) | Hybrid - Structure & PLM | 0.469 | 0.007 | 0.417 | 0.44 | 0.473 | 0.387 | 0.628 | 0.451 | 0.453 | 0.561 | 0.466 | 0.545 | 0.523 | 0.415 | 0.472 | 0.418 | 0.376 | 0.332 | 0.394 | ProSST (K=128) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |

* Non-parametric bootstrap standard error of the difference between the Spearman performance of a given model and that of the best overall model (i.e., TranceptEVE), computed over 10,000 bootstrap samples drawn from the set of proteins in the ProteinGym substitution benchmark.
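
The standard error described in the footnote can be illustrated with a minimal sketch. It assumes each model has one Spearman score per protein and that the overall score is a plain mean over proteins; the benchmark's actual aggregation may differ. The function name `bootstrap_se_of_diff` and its parameters are hypothetical and are not taken from the ProteinGym codebase.

```python
import numpy as np


def bootstrap_se_of_diff(model_scores, best_scores, n_boot=10_000, seed=0):
    """Bootstrap standard error of the mean Spearman difference to the best model.

    Assumption: both inputs hold one Spearman value per protein, in the same
    protein order, and the overall score is an unweighted mean over proteins.
    """
    model_scores = np.asarray(model_scores, dtype=float)
    best_scores = np.asarray(best_scores, dtype=float)
    if model_scores.ndim != 1 or model_scores.shape != best_scores.shape:
        raise ValueError("expected two 1-D arrays of equal length")

    # Per-protein difference between the best model and the given model.
    diffs = best_scores - model_scores
    n = diffs.size

    # Resample proteins with replacement: one row of indices per bootstrap draw.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, n, size=(n_boot, n))

    # Mean difference within each bootstrap sample.
    boot_means = diffs[idx].mean(axis=1)

    # The standard error is the spread of those bootstrap means.
    return float(boot_means.std(ddof=1))
```

Under these assumptions, comparing the best model with itself gives a per-protein difference of zero everywhere, so the function returns 0.0, which is consistent with the 0.0 shown for the top-ranked row.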