DeepSeek releases Prover-V2 model with 671 billion parameters
DeepSeek today released a new model, DeepSeek-Prover-V2-671B, on the open-source AI community Hugging Face. The model reportedly uses the more efficient safetensors file format and supports multiple numerical precisions, allowing it to be trained and deployed faster and with fewer resources. It has 671 billion parameters and appears to be an upgraded version of the Prover-V1.5 mathematical model released last year.

In terms of architecture, the model is built on DeepSeek-V3 and adopts a mixture-of-experts (MoE) design, with 61 Transformer layers and a hidden dimension of 7,168. It also supports ultra-long contexts, with maximum position embeddings of 163,840, enabling it to handle complex mathematical proofs, and it uses FP8 quantization to reduce model size and improve inference efficiency.
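As a rough illustration only, the following Python sketch uses the Hugging Face transformers library to read the model's published configuration and check the architecture figures cited above without downloading the full 671B-parameter weights. The repository id deepseek-ai/DeepSeek-Prover-V2-671B and the trust_remote_code=True flag are assumptions based on how DeepSeek's V3-series models are typically distributed, not details confirmed in this article.

```python
# Sketch: inspect the published config to verify the reported architecture
# numbers (layers, hidden size, context length) without loading any weights.
from transformers import AutoConfig

# Repository id assumed from the release announcement; requires network access.
config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",
    trust_remote_code=True,  # DeepSeek-V3-style configs may ship custom code
)

print(config.num_hidden_layers)        # expected: 61 Transformer layers
print(config.hidden_size)              # expected: 7,168-dimensional hidden states
print(config.max_position_embeddings)  # expected: 163,840 maximum positions
```

The same approach works for any model hosted on Hugging Face, since the configuration file is small and can be fetched independently of the checkpoint shards.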