GPT-5 isn’t just crunching language anymore; it’s beginning to take part in actual laboratory workflows. In recent collaborative work with a biosecurity startup called Red Queen Bio, the model was put into a wet-lab context, where scientists handle liquids, biological materials, and real experimental protocols. Over repeated cycles of proposing changes based on data, having those changes executed by human technicians, and then suggesting further refinements, the system improved a standard molecular cloning procedure by a factor of 79.

Cloning is one of those foundational tools in biotechnology that underpins everything from genetic engineering to protein design, so small gains in cloning efficiency can cascade into big productivity boosts across research and development. What sets this work apart is not just the size of the improvement, but the iterative loop between the AI’s suggestions and experimental feedback: a step toward more dynamic, data-driven exploration of lab conditions. (Axios)
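To make the shape of that loop concrete, here is a minimal sketch in Python. Everything in it is a stand-in: `run_in_lab` is a noisy toy scoring function playing the role of the human-executed experiment, and `propose_revision` is a random local tweak playing the role of the model’s data-informed suggestion. None of this is Red Queen Bio’s or OpenAI’s actual code.

```python
import random

def run_in_lab(params):
    """Stand-in for technicians executing a protocol and measuring
    cloning efficiency (here, a noisy function of two toy parameters)."""
    t, m = params["temp_c"], params["incubation_min"]
    signal = -((t - 42) ** 2) / 50 - ((m - 90) ** 2) / 2000
    return signal + random.gauss(0, 0.05)

def propose_revision(best_params, history):
    """Stand-in for the model proposing a tweak given prior results."""
    return {
        "temp_c": best_params["temp_c"] + random.uniform(-2, 2),
        "incubation_min": best_params["incubation_min"] + random.uniform(-15, 15),
    }

def optimize(rounds=20):
    """Propose, execute, keep what works, repeat."""
    best = {"temp_c": 30.0, "incubation_min": 30.0}
    best_score = run_in_lab(best)
    history = [(best, best_score)]
    for _ in range(rounds):
        candidate = propose_revision(best, history)  # model suggests
        score = run_in_lab(candidate)                # humans execute and measure
        history.append((candidate, score))
        if score > best_score:                       # refine from the best so far
            best, best_score = candidate, score
    return best, best_score

print(optimize())
```

The real workflow differs in every detail, but the control flow is the same: the model only ever sees results, and every physical step stays in human hands.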
What’s happening here isn’t autonomous science; it’s augmented science. The model doesn’t run experiments on its own: the human scientists set up the framework, execute the protocols, and interpret the results. But by identifying adjustments that result in measurable gains and proposing them in successive rounds, the AI is functioning as a reasoning partner, not just a static knowledge base. (Reddit)
The broader context for this work is a shift in how advanced AI models are evaluated in scientific domains. Traditional measures like benchmark scores don’t capture what happens when a system engages with the messy, empirical world of physical experiments. OpenAI and others working on similar tools are increasingly building frameworks to test how models propose hypotheses, adapt based on outcomes, and interact with experts in iterative workflows. (OpenAI)
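One way such a framework can differ from a static benchmark is in what it scores. A hypothetical sketch: instead of grading a single answer, the harness lets the system interact with a simulated experiment over several rounds and measures how much the outcome improves. The agent and experiment below are illustrative placeholders, not any framework’s published code.

```python
import random

def simulated_experiment(setting):
    """Toy stand-in for a physical experiment with one tunable knob;
    the optimum sits at setting = 0.7."""
    return 1.0 - abs(setting - 0.7) + random.gauss(0, 0.02)

def local_search_agent(best_setting):
    """Stand-in for the system under test: random local search."""
    return min(1.0, max(0.0, best_setting + random.uniform(-0.1, 0.1)))

def evaluate_adaptation(agent, rounds=15):
    """Score adaptation, not one-shot accuracy: the metric is the gain
    from the first measurement to the best one found."""
    best_setting = 0.1
    first = best = simulated_experiment(best_setting)
    for _ in range(rounds):
        setting = agent(best_setting)
        result = simulated_experiment(setting)
        if result > best:
            best_setting, best = setting, result
    return best - first

print(evaluate_adaptation(local_search_agent))
```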
These capabilities are emerging alongside improvements in other areas of scientific reasoning. New benchmarks designed to measure how well models handle expert-level scientific questions across physics, chemistry, and biology have shown notable gains in recent versions of the GPT-5 family, though challenges remain before such systems can operate without careful expert oversight. (OpenAI)
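The question-answering side of those benchmarks reduces, in spirit, to the familiar grade-against-a-key loop. A hypothetical sketch, where the three questions and `ask_model` are placeholders rather than any published benchmark’s harness:

```python
from collections import defaultdict

QUESTIONS = [  # placeholder items standing in for expert-written questions
    {"subject": "physics",   "prompt": "placeholder physics question",   "answer": "B"},
    {"subject": "chemistry", "prompt": "placeholder chemistry question", "answer": "D"},
    {"subject": "biology",   "prompt": "placeholder biology question",   "answer": "A"},
]

def ask_model(prompt):
    """Stand-in for a call to the model under evaluation."""
    return "B"

def per_subject_accuracy(questions):
    """Grade each answer against the expert key, grouped by subject."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["subject"]] += 1
        if ask_model(q["prompt"]).strip() == q["answer"]:
            correct[q["subject"]] += 1
    return {s: correct[s] / total[s] for s in total}

print(per_subject_accuracy(QUESTIONS))
```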
What’s most interesting isn’t just the specific result in cloning efficiency, but what it suggests about the human-AI partnership in research. At present, researchers frame questions, control safety, and make final judgments. The model accelerates exploration within that structure, pointing out paths that might take much longer to uncover through manual trial-and-error alone. As these workflows mature, AI may shift from being a tool that answers questions to one that helps shape the questions scientists ask — all while the critical judgment and validation remain firmly in human hands.