[2024.05.24] We release an updated version, InternLM2-Math-Plus, in four sizes (1.8B, 7B, 20B, and 8x22B) with state-of-the-art performance. We significantly improve informal math reasoning (chain-of-thought and code interpreter) and formal math reasoning (LEAN 4 translation and LEAN 4 theorem proving).
[2024.02.10] We add the tech report and citation reference.
[2024.01.31] We add MiniF2F results with evaluation codes!
[2024.01.29] We add checkpoints from ModelScope and update results for majority voting and Code Interpreter. Tech report is on the way!
[2024.01.26] We add checkpoints from OpenXLab, which makes downloads easier for users in China!
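The majority-voting results mentioned above follow the common self-consistency recipe: sample several chain-of-thought solutions per problem and keep the most frequent final answer. A minimal sketch, assuming final answers have already been extracted as strings (the function name is illustrative, not the project's actual harness):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled solutions.

    `answers` is a list of final-answer strings already extracted from
    each sampled chain-of-thought (answer extraction is not shown here).
    """
    tally = Counter(a.strip() for a in answers)
    answer, _count = tally.most_common(1)[0]
    return answer

# Example: 5 samples, 3 of which agree on "42".
print(majority_vote(["42", "41", "42", "43", "42"]))  # → 42
```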
# Performance

## Formal Math Reasoning

We evaluate InternLM2-Math-Plus on the formal math reasoning benchmark MiniF2F-test. The evaluation setting is the same as Llemma with LEAN 4.
| Models                           | MiniF2F-test |
|----------------------------------|--------------|
| ReProver                         | 26.5         |
| LLMStep                          | 27.9         |
| GPT-F                            | 36.6         |
| HTPS                             | 41.0         |
| Llemma-7B                        | 26.2         |
| Llemma-34B                       | 25.8         |
| InternLM2-Math-7B-Base           | 30.3         |
| InternLM2-Math-20B-Base          | 29.5         |
| InternLM2-Math-Plus-1.8B         | 38.9         |
| InternLM2-Math-Plus-7B           | 43.4         |
| InternLM2-Math-Plus-20B          | 42.6         |
| InternLM2-Math-Plus-Mixtral8x22B | 37.3         |
## Informal Math Reasoning

We evaluate InternLM2-Math-Plus on the informal math reasoning benchmarks MATH and GSM8K. InternLM2-Math-Plus-1.8B outperforms MiniCPM-2B at the smallest model size. InternLM2-Math-Plus-7B outperforms Deepseek-Math-7B-RL, the state-of-the-art open-source math reasoning model. InternLM2-Math-Plus-Mixtral8x22B achieves 68.5 on MATH (with Python) and 91.8 on GSM8K.
| Model                            | MATH | MATH-Python | GSM8K |
|----------------------------------|------|-------------|-------|
| MiniCPM-2B                       | 10.2 | -           | 53.8  |
| InternLM2-Math-Plus-1.8B         | 37.0 | 41.5        | 58.8  |
| InternLM2-Math-7B                | 34.6 | 50.9        | 78.1  |
| Deepseek-Math-7B-RL              | 51.7 | 58.8        | 88.2  |
| InternLM2-Math-Plus-7B           | 53.0 | 59.7        | 85.8  |
| InternLM2-Math-20B               | 37.7 | 54.3        | 82.6  |
| InternLM2-Math-Plus-20B          | 53.8 | 61.8        | 87.7  |
| Mixtral8x22B-Instruct-v0.1       | 41.8 | -           | 78.6  |
| Eurux-8x22B-NCA                  | 49.0 | -           | -     |
| InternLM2-Math-Plus-Mixtral8x22B | 58.1 | 68.5        | 91.8  |
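The MATH-Python column reflects a code-interpreter-style setting, in which the model writes a Python program that is executed to produce the final answer. A hypothetical scoring helper for such a response (the function name and the `answer`-variable convention are illustrative assumptions, not the project's actual evaluation harness):

```python
def score_python_response(generated_code: str, reference: str) -> bool:
    """Execute model-generated Python and compare its `answer` variable
    against the reference string. Real harnesses sandbox this step."""
    namespace: dict = {}
    try:
        exec(generated_code, namespace)  # illustrative; not sandboxed
    except Exception:
        return False  # crashing programs score zero
    return str(namespace.get("answer")) == reference

model_output = "answer = sum(range(1, 11))"  # toy generated program
print(score_python_response(model_output, "55"))  # True
```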
We also evaluate models on MathBench-A. InternLM2-Math-Plus-Mixtral8x22B achieves performance comparable to Claude 3 Opus.
| Model                            | Arithmetic | Primary | Middle | High | College | Average |
|----------------------------------|------------|---------|--------|------|---------|---------|
| GPT-4o-0513                      | 77.7       | 87.7    | 76.3   | 59.0 | 54.0    | 70.9    |
| Claude 3 Opus                    | 85.7       | 85.0    | 58.0   | 42.7 | 43.7    | 63.0    |
| Qwen-Max-0428                    | 72.3       | 86.3    | 65.0   | 45.0 | 27.3    | 59.2    |
| Qwen-1.5-110B                    | 70.3       | 82.3    | 64.0   | 47.3 | 28.0    | 58.4    |
| Deepseek-V2                      | 82.7       | 89.3    | 59.0   | 39.3 | 29.3    | 59.9    |
| Llama-3-70B-Instruct             | 70.3       | 86.0    | 53.0   | 38.7 | 34.7    | 56.5    |
| InternLM2-Math-Plus-Mixtral8x22B | 77.5       | 82.0    | 63.6   | 50.3 | 36.8    | 62.0    |
| InternLM2-Math-20B               | 58.7       | 70.0    | 43.7   | 24.7 | 12.7    | 42.0    |
| InternLM2-Math-Plus-20B          | 65.8       | 79.7    | 59.5   | 47.6 | 24.8    | 55.5    |
| Llama3-8B-Instruct               | 54.7       | 71.0    | 25.0   | 19.0 | 14.0    | 36.7    |
| InternLM2-Math-7B                | 53.7       | 67.0    | 41.3   | 18.3 | 8.0     | 37.7    |
| Deepseek-Math-7B-RL              | 68.0       | 83.3    | 44.3   | 33.0 | 23.0    | 50.3    |
| InternLM2-Math-Plus-7B           | 61.4       | 78.3    | 52.5   | 40.5 | 21.7    | 50.9    |
| MiniCPM-2B                       | 49.3       | 51.7    | 18.0   | 8.7  | 3.7     | 26.3    |
| InternLM2-Math-Plus-1.8B         | 43.0       | 43.3    | 25.4   | 18.9 | 4.7     | 27.1    |
# Citation and Tech Report

```bibtex
@misc{ying2024internlmmath,
title={InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning},
author={Huaiyuan Ying and Shuo Zhang and Linyang Li and Zhejian Zhou and Yunfan Shao and Zhaoye Fei and Yichuan Ma and Jiawei Hong and Kuikun Liu and Ziyi Wang and Yudong Wang and Zijian Wu and Shuaibin Li and Fengzhe Zhou and Hongwei Liu and Songyang Zhang and Wenwei Zhang and Hang Yan and Xipeng Qiu and Jiayu Wang and Kai Chen and Dahua Lin},
year={2024},
eprint={2402.06332},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```