Andrej Karpathy has released autoresearch, a lightweight Python tool that lets AI agents autonomously run machine learning experiments. The project is essentially a slimmed-down, single-file version of the nanochat LLM training core, tuned to run on a single NVIDIA GPU.
Here’s the gist: the tool sets up a feedback loop between human researchers and AI agents. Humans write high-level instructions in a Markdown file, while the agent makes incremental edits to a Python training script. It runs short training sprints of roughly five minutes each, evaluates every run with a validation metric called bits-per-byte (BPB), and commits a change only if the metric actually improves.
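The article doesn't show autoresearch's internals, but the accept-only-if-improved gate it describes can be sketched roughly as follows. The function names and the exact BPB formula here are assumptions for illustration: BPB is commonly computed by converting summed cross-entropy loss from nats to bits and normalizing by the raw byte count of the evaluated text, which makes scores comparable across tokenizers.

```python
import math

def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy loss (in nats) over a validation set
    into bits per byte: divide by ln(2) to get bits, then by byte count.
    This is a common formulation, not necessarily autoresearch's exact one."""
    return total_loss_nats / (math.log(2) * total_bytes)

def accept_change(baseline_bpb: float, candidate_bpb: float) -> bool:
    """Commit a candidate edit only if it strictly lowers validation BPB,
    mirroring the 'commit only on improvement' rule described above."""
    return candidate_bpb < baseline_bpb
```

For example, a validation set of 1,000 bytes with a summed loss of `800 * ln(2)` nats scores 0.8 BPB; an edit that moves BPB from 0.95 to 0.90 would be committed, while one that leaves it unchanged would be discarded.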
Karpathy’s initial tests suggest this autonomous tuning can be effective, reducing validation loss in early runs. Tobi Lutke of Shopify also gave it a spin, reporting a 19% improvement in validation scores, with a smaller model outperforming its larger, manually tuned counterpart.
For developers, this points to a shift in ML model development: rather than manually tweaking hyperparameters, the work becomes crafting the right prompts to guide these agents. At a concise 630 lines of code, the system fits comfortably within modern LLMs’ context windows, giving an agent a holistic view of the codebase and minimizing errors. Expect this to become a staple for those pursuing cutting-edge efficiency in ML training.
Read more at MarkTechPost…
