Update README with details on the AIDE paper release and METR's evaluation
README.md
@@ -6,11 +6,13 @@
 [](https://discord.gg/Rq7t8wnsuA) 
 [](https://twitter.com/WecoAI) 
 
-AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task.
+AIDE is an LLM agent that generates solutions for machine learning tasks just from natural language descriptions of the task. This repository implements the AIDE agent described in our paper, [AIDE: AI-Driven Exploration in the Space of Code](https://arxiv.org/pdf/2502.13138). We recommend checking out the [project page](https://www.aide.ml) and [technical report](https://www.weco.ai/blog/technical-report) for a quick summary of the method and results.
 
 AIDE is the state-of-the-art agent on OpenAI's [MLE-bench](https://arxiv.org/pdf/2410.07095), a benchmark composed of 75 Kaggle machine learning tasks, where we achieved four times more medals compared to the runner-up agent architecture.
 
-
+METR's [RE-Bench](https://arxiv.org/pdf/2411.15114) shows that AIDE not only performs well on machine learning tasks but also generalizes to AI R&D tasks such as optimizing low-level Triton kernels and fine-tuning GPT-2 for QA, even surpassing the performance of human experts.
+
+In our own benchmark of over 60 Kaggle data science competitions, AIDE surpassed 50% of Kaggle participants on average.
 
 More specifically, AIDE has the following features:
 
@@ -246,3 +248,18 @@ At its core, Solution Space Tree Search consists of three main components:
 By repeatedly applying these steps, AIDE navigates the vast space of possible solutions, progressively refining its approach until it converges on the optimal solution for the given data science problem.
 
 
+
+# Citation
+
+If you use AIDE in your work, please cite the following paper:
+```bibtex
+@misc{aide2025,
+  title={AIDE: AI-Driven Exploration in the Space of Code},
+  author={Zhengyao Jiang and Dominik Schmidt and Dhruv Srikanth and Dixing Xu and Ian Kaplan and Deniss Jacenko and Yuxiang Wu},
+  year={2025},
+  eprint={2502.13138},
+  archivePrefix={arXiv},
+  primaryClass={cs.AI},
+  url={https://arxiv.org/abs/2502.13138},
+}
+```
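The README context above describes AIDE repeatedly applying generate, evaluate, and select steps to refine solutions. The loop below is a rough sketch of that idea, not AIDE's actual implementation. It assumes the following:

- A "solution" is a plain number rather than a generated program.
- `propose` and `evaluate` are hypothetical stand-ins for the LLM-driven generator and the code-execution evaluator.
- Selection is simplified to greedily refining the best-scoring node.
- The debugging of failed runs is omitted.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """A candidate solution in the search tree (hypothetical structure)."""
    solution: float                                  # toy stand-in for a generated program
    score: float = float("-inf")                     # filled in by the evaluator
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)


def tree_search(
    propose: Callable[[Optional[Node]], float],
    evaluate: Callable[[float], float],
    steps: int,
    num_drafts: int = 3,
) -> Node:
    """Greedy solution-space tree search (simplified sketch)."""
    nodes: List[Node] = []
    for _ in range(steps):
        # Generation: draft from scratch until there are a few root nodes,
        # then refine the current best node.
        parent = None if len(nodes) < num_drafts else max(nodes, key=lambda n: n.score)
        child = Node(solution=propose(parent), parent=parent)
        # Evaluation: score the candidate (AIDE runs code; here we call a function).
        child.score = evaluate(child.solution)
        if parent is not None:
            parent.children.append(child)
        nodes.append(child)
    # Selection: return the best solution found across the whole tree.
    return max(nodes, key=lambda n: n.score)


if __name__ == "__main__":
    rng = random.Random(0)

    def evaluate(x: float) -> float:
        # Hypothetical objective standing in for a validation metric; best at x = 3.
        return -(x - 3.0) ** 2

    def propose(parent: Optional[Node]) -> float:
        # Draft: random guess. Improve: small perturbation of the parent's solution.
        if parent is None:
            return rng.uniform(-10.0, 10.0)
        return parent.solution + rng.gauss(0.0, 1.0)

    best = tree_search(propose, evaluate, steps=50)
    print(f"best solution {best.solution:.3f}, score {best.score:.4f}")
```

Returning the maximum over every evaluated node means extra search steps can never make the reported result worse. Whether they make it better depends on how good the proposals are.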