Benchmark Scores


| Rank | Model name | Input modalities | Avg. Spearman | Std. error of diff. to best score* | Function: Activity | Function: Binding | Function: Expression | Function: Organismal Fitness | Function: Stability | MSA depth: Low | MSA depth: Medium | MSA depth: High | Taxon: Human | Taxon: Other Eukaryote | Taxon: Prokaryote | Taxon: Virus | Mutation depth: 1 | Mutation depth: 2 | Mutation depth: 3 | Mutation depth: 4 | Mutation depth: 5+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AIDO Protein-RAG (16B) | Structure & MSA | 0.518 | 0.0 | 0.517 | 0.426 | 0.522 | 0.491 | 0.635 | 0.498 | 0.534 | 0.585 | 0.531 | 0.587 | 0.558 | 0.522 | 0.527 | 0.414 | 0.419 | 0.394 | 0.414 |
| 2 | VenusREM | Structure & MSA | 0.518 | 0.005 | 0.495 | 0.454 | 0.533 | 0.459 | 0.65 | 0.495 | 0.524 | 0.577 | 0.529 | 0.582 | 0.549 | 0.492 | 0.534 | 0.397 | 0.355 | 0.322 | 0.368 |
| 3 | ProSST (K=2048) | Single sequence & Structure | 0.507 | 0.006 | 0.476 | 0.445 | 0.53 | 0.431 | 0.653 | 0.465 | 0.507 | 0.58 | 0.516 | 0.573 | 0.549 | 0.454 | 0.521 | 0.394 | 0.317 | 0.277 | 0.332 |
| 4 | ProSST (K=4096) | Single sequence & Structure | 0.498 | 0.009 | 0.444 | 0.472 | 0.507 | 0.416 | 0.652 | 0.472 | 0.481 | 0.583 | 0.497 | 0.574 | 0.547 | 0.44 | 0.505 | 0.426 | 0.388 | 0.342 | 0.408 |
| 5 | S3F-MSA | Structure & MSA | 0.496 | 0.007 | 0.502 | 0.44 | 0.479 | 0.477 | 0.581 | 0.469 | 0.509 | 0.547 | 0.502 | 0.558 | 0.521 | 0.502 | 0.499 | 0.333 | 0.378 | 0.346 | 0.383 |
| 6 | S2F-MSA | Structure & MSA | 0.488 | 0.007 | 0.498 | 0.432 | 0.472 | 0.472 | 0.567 | 0.463 | 0.502 | 0.536 | 0.495 | 0.546 | 0.513 | 0.493 | 0.491 | 0.303 | 0.346 | 0.318 | 0.362 |
| 7 | ProSST (K=1024) | Single sequence & Structure | 0.485 | 0.009 | 0.433 | 0.436 | 0.499 | 0.414 | 0.642 | 0.457 | 0.466 | 0.585 | 0.483 | 0.568 | 0.539 | 0.436 | 0.492 | 0.434 | 0.373 | 0.341 | 0.403 |

Model Details

| Rank | Model name | Description | References |
|---|---|---|---|
| 1 | AIDO Protein-RAG (16B) | AIDO Protein-RAG (16B) | Sun, N., Zou, S., Tao, T., Mahbub, S., Li, D., Zhuang, Y., Wang, H., Cheng, X., Song, L., & Xing, E.P. (2024). Mixture of Experts Enable Efficient and Effective Protein Understanding and Design. bioRxiv. |
| 2 | VenusREM | VenusREM | Yang Tan, Ruilin Wang, Banghao Wu, Liang Hong, Bingxin Zhou. (2024). Retrieval-Enhanced Mutation Mastery: Augmenting Zero-Shot Prediction of Protein Language Model. ArXiv, abs/2410.21127. |
| 3 | ProSST (K=2048) | ProSST (K=2048) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 4 | ProSST (K=4096) | ProSST (K=4096) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |
| 5 | S3F-MSA | S3F with MSA retrieval | Zuobai Zhang, Pascal Notin, Yining Huang, Aurelie C. Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, Jian Tang. (2024). Multi-Scale Representation Learning for Protein Fitness Prediction. NeurIPS. |
| 6 | S2F-MSA | S2F with MSA retrieval | Zuobai Zhang, Pascal Notin, Yining Huang, Aurelie C. Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, Jian Tang. (2024). Multi-Scale Representation Learning for Protein Fitness Prediction. NeurIPS. |
| 7 | ProSST (K=1024) | ProSST (K=1024) | Mingchen Li, Yang Tan, Xinzhu Ma, Bozitao Zhong, Ziyi Zhou, Huiqun Yu, Wanli Ouyang, Liang Hong, Bingxin Zhou, Pan Tan. (2024). ProSST: Protein language modeling with quantized structure and disentangled attention. bioRxiv. |

* Non-parametric bootstrap standard error of the difference between the Spearman performance of a given model and that of the best overall model, computed over 10k bootstrap samples from the set of proteins in the ProteinGym substitution benchmark.
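
The bootstrap described in the footnote can be sketched roughly as follows. This is a minimal illustration, not the benchmark's own code: it assumes one Spearman value per protein for each model, resamples proteins with replacement, and takes the standard deviation of the resampled mean differences. The function name `bootstrap_se_of_diff`, the sample values, and the simple per-protein averaging are all assumptions made for the example; the benchmark's actual aggregation may differ.

```python
import random
import statistics


def bootstrap_se_of_diff(model_scores, best_scores, n_boot=10_000, seed=0):
    """Bootstrap standard error of mean(model - best), resampling proteins.

    Hypothetical helper: both inputs hold one Spearman value per protein,
    in the same protein order.
    """
    if len(model_scores) != len(best_scores):
        raise ValueError("score lists must cover the same proteins in the same order")
    rng = random.Random(seed)
    # Paired per-protein differences, so each resample keeps the pairing intact.
    diffs = [m - b for m, b in zip(model_scores, best_scores)]
    n = len(diffs)
    boot_means = []
    for _ in range(n_boot):
        sample = rng.choices(diffs, k=n)  # draw n proteins with replacement
        boot_means.append(sum(sample) / n)
    # The spread of the bootstrap means estimates the standard error.
    return statistics.stdev(boot_means)


# Made-up per-protein Spearman values, for illustration only.
best = [0.52, 0.61, 0.43, 0.55, 0.48, 0.66]
other = [0.50, 0.58, 0.45, 0.51, 0.47, 0.60]

se_same = bootstrap_se_of_diff(best, best, n_boot=1000)
se_other = bootstrap_se_of_diff(other, best, n_boot=1000)
print(se_same, se_other)
```

Comparing the best model against itself gives differences that are all zero, which is why the top-ranked row reports a standard error of 0.0.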