RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens — TOGETHER

GPT-4: The RedPajama project aims to create fully open-source AI models and to close the quality gap between open and closed models. Working with major research institutions, the project has completed its first step: reproducing the LLaMA training dataset of over 1.2 trillion tokens. The project presents this milestone as the beginning of AI’s Linux moment, one meant to foster creativity and innovation in the field.
