| Date | Model | TU | TBO | TCO | DA | ADA | Avg. |
|---|---|---|---|---|---|---|---|
| Sep 1, 2025 | Qwen3-32B (open source) | 83.12 | 84.12 | 77.57 | 73.15 | 76.84 | 72.96 |
| Sep 1, 2025 | Qwen3-14B (open source) | 81.30 | 83.13 | 78.48 | 68.56 | 46.23 | 71.54 |
| Sep 1, 2025 | Qwen3-8B (open source) | 81.70 | 81.56 | 75.36 | 64.30 | 43.15 | 69.21 |
| Sep 1, 2025 | Qwen2.5-72B-Instruct (open source) | 81.90 | 74.85 | 72.65 | 71.03 | 45.29 | 69.15 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-32B (open source) | 80.00 | 81.46 | 70.59 | 65.74 | 40.32 | 67.62 |
| Sep 1, 2025 | DeepSeek-R1-0528-Qwen3-8B (open source) | 77.49 | 82.38 | 70.34 | 56.69 | 36.42 | 64.66 |
| Sep 1, 2025 | Llama-3.1-70B-Instruct (open source) | 75.46 | 69.69 | 69.69 | 63.72 | 40.91 | 63.81 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-14B (open source) | 79.57 | 75.05 | 66.88 | 57.83 | 30.92 | 62.05 |
| Sep 1, 2025 | TableGPT2-7B (open source) | 75.07 | 64.07 | 64.35 | 59.90 | 37.14 | 60.10 |
| Sep 1, 2025 | Seed-Coder-8B-Instruct (open source) | 65.49 | 60.10 | 57.24 | 66.94 | 39.15 | 57.78 |
| Sep 1, 2025 | Table-R1-Zero-7B (open source) | 73.24 | 57.97 | 60.79 | 61.58 | 33.93 | 57.50 |
| Sep 1, 2025 | Qwen2.5-Coder-7B-Instruct (open source) | 69.09 | 65.88 | 57.99 | 59.64 | 32.66 | 57.05 |
| Sep 1, 2025 | Qwen2.5-7B-Instruct (open source) | 70.22 | 58.71 | 58.06 | 62.12 | 32.48 | 56.32 |
| Sep 1, 2025 | Qwen2.5-Math-72B-Instruct (open source) | 68.31 | 62.83 | 58.74 | 58.10 | 31.79 | 55.95 |
| Sep 1, 2025 | Llama-3.1-8B-Instruct (open source) | 62.44 | 54.30 | 51.86 | 58.47 | 26.77 | 50.77 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-7B (open source) | 57.03 | 59.14 | 50.17 | 54.63 | 22.45 | 48.68 |
| Sep 1, 2025 | Yi-Coder-9B-Chat (open source) | 50.87 | 56.67 | 48.54 | 53.91 | 30.33 | 48.07 |
| Sep 1, 2025 | Table-R1-SFT-7B (open source) | 57.63 | 52.04 | 47.42 | 43.56 | 36.92 | 47.51 |
| Sep 1, 2025 | Deepseek-coder-33B-instruct (open source) | 46.26 | 55.38 | 44.68 | 51.28 | 27.82 | 45.08 |
| Sep 1, 2025 | Deepseek-R1-Distill-Llama-8B (open source) | 48.39 | 52.23 | 44.16 | 35.46 | 15.05 | 41.06 |
| Sep 1, 2025 | Qwen2.5-Math-7B-Instruct (open source) | 33.26 | 37.07 | 27.97 | 29.03 | 11.35 | 27.74 |
| Sep 1, 2025 | Kimina-Prover-Preview-Distill-7B (open source) | 19.20 | 15.57 | 14.37 | 13.49 | 6.08 | 13.74 |

| Date | Model | TU | TBO | TCO | DA | ADA | Avg. |
|---|---|---|---|---|---|---|---|
| Sep 1, 2025 | Qwen3-32B (open source) | 53.07 | 38.26 | 50.10 | 27.93 | 25.44 | 38.96 |
| Sep 1, 2025 | Qwen2.5-72B-Instruct (open source) | 53.84 | 33.00 | 47.60 | 27.97 | 23.48 | 37.18 |
| Sep 1, 2025 | Qwen3-8B (open source) | 51.52 | 37.95 | 47.20 | 25.96 | 22.51 | 37.03 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-32B (open source) | 52.84 | 35.52 | 46.40 | 27.51 | 21.81 | 36.82 |
| Sep 1, 2025 | Qwen3-14B (open source) | 51.15 | 39.00 | 48.04 | 23.47 | 22.33 | 36.80 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-14B (open source) | 50.04 | 28.66 | 42.51 | 26.04 | 17.00 | 32.85 |
| Sep 1, 2025 | Qwen2.5-7B-Instruct (open source) | 45.63 | 28.58 | 43.69 | 25.26 | 18.69 | 32.37 |
| Sep 1, 2025 | Qwen2.5-Coder-7B-Instruct (open source) | 44.15 | 29.54 | 42.93 | 24.02 | 17.81 | 31.69 |
| Sep 1, 2025 | Table-R1-Zero-7B (open source) | 44.84 | 27.42 | 41.42 | 25.35 | 17.76 | 31.36 |
| Sep 1, 2025 | TableGPT2-7B (open source) | 43.25 | 31.83 | 42.08 | 16.82 | 15.56 | 29.91 |
| Sep 1, 2025 | Seed-Coder-8B-Instruct (open source) | 36.65 | 28.60 | 38.52 | 26.97 | 17.21 | 29.59 |
| Sep 1, 2025 | DeepSeek-R1-0528-Qwen3-8B (open source) | 35.39 | 26.80 | 33.79 | 16.58 | 14.48 | 25.41 |
| Sep 1, 2025 | Deepseek-R1-Distill-Qwen-7B (open source) | 32.20 | 25.14 | 31.41 | 23.78 | 11.22 | 24.75 |
| Sep 1, 2025 | Qwen2.5-Math-72B-Instruct (open source) | 31.11 | 24.92 | 30.53 | 18.89 | 16.53 | 24.40 |
| Sep 1, 2025 | Llama-3.1-70B-Instruct (open source) | 30.98 | 20.59 | 29.91 | 10.49 | 11.39 | 20.67 |
| Sep 1, 2025 | Deepseek-R1-Distill-Llama-8B (open source) | 30.71 | 21.49 | 24.94 | 13.66 | 7.65 | 19.69 |
| Sep 1, 2025 | Table-R1-SFT-7B (open source) | 25.17 | 16.95 | 23.34 | 14.24 | 6.20 | 17.18 |
| Sep 1, 2025 | Llama-3.1-8B-Instruct (open source) | 20.00 | 19.88 | 22.96 | 12.44 | 9.96 | 17.05 |
| Sep 1, 2025 | DeepSeek-coder-33B-instruct (open source) | 22.09 | 20.42 | 17.75 | 14.10 | 8.33 | 16.54 |
| Sep 1, 2025 | Yi-Coder-9B-Chat (open source) | 22.89 | 17.90 | 19.02 | 9.56 | 8.92 | 15.66 |
| Sep 1, 2025 | Qwen2.5-Math-7B-Instruct (open source) | 9.26 | 13.53 | 13.26 | 6.00 | 6.22 | 9.65 |
| Sep 1, 2025 | Kimina-Prover-Preview-Distill-7B (open source) | 2.49 | 4.30 | 3.39 | 2.19 | 1.75 | 2.82 |
About TReB
TReB is a comprehensive table reasoning evaluation benchmark that measures both shallow table understanding abilities and deep table reasoning abilities.
Overall, we construct a high-quality dataset to evaluate 5 core skills of LLMs: Table Understanding (TU), Table Basic Operation (TBO), Table Computational Operation (TCO), Data Analysis (DA), and Advanced Data Analysis (ADA).
Across these skills, we propose a total of 20 sub-tasks.
The evaluation framework supports 3 distinct inference modes (TCoT, PoT, and ICoT), encouraging more robust reasoning.
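
For reference, the Avg. column in the leaderboards above appears to be the unweighted mean of the five skill scores (TU, TBO, TCO, DA, ADA). The minimal Python sketch below illustrates that calculation; the `average_score` helper and the example row are illustrative only and are not part of the official TReB evaluation code.

```python
# Minimal sketch (assumption): recompute a leaderboard-style Avg. as the
# unweighted mean of the five TReB skill scores. This matches the rows of
# the leaderboards above but is not taken from the official evaluation code.
SKILLS = ("TU", "TBO", "TCO", "DA", "ADA")

def average_score(skill_scores: dict[str, float]) -> float:
    """Return the unweighted mean over the five core skills, rounded to 2 decimals."""
    return round(sum(skill_scores[s] for s in SKILLS) / len(SKILLS), 2)

# Example: the Qwen3-14B row from the first leaderboard above.
qwen3_14b = {"TU": 81.30, "TBO": 83.13, "TCO": 78.48, "DA": 68.56, "ADA": 46.23}
print(average_score(qwen3_14b))  # 71.54, matching the Avg. column
```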
News
- Jun. 18, 2025: The benchmark paper, code, and dataset are all released! Please check them out and submit your results to the leaderboard. Thank you for all the feedback!
Challenges from TReB

Submission
We warmly welcome submissions to our leaderboard, including both your own methods and results showcasing the latest model performance! TReB features two separate leaderboards. Please refer to the Submission Guidelines below for details, and submit your results as instructed to jttreb2025@gmail.com.
Citation
@misc{li2025trebcomprehensivebenchmarkevaluating,
      title={TReB: A Comprehensive Benchmark for Evaluating Table Reasoning Capabilities of Large Language Models},
      author={Ce Li and Xiaofan Liu and Zhiyan Song and Ce Chi and Chen Zhao and Jingjing Yang and Zhendong Wang and Kexin Yang and Boshen Shi and Xing Wang and Chao Deng and Junlan Feng},
      year={2025},
      eprint={2506.18421},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.18421},
}