Devin sets a new state of the art on the SWE-Bench coding benchmark, has passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork.

Devin is an autonomous agent that solves engineering tasks through the use of its own shell, code editor, and web browser.

When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.

Check out what Devin can do in the thread below.

Devin, the first AI software engineer

1/4 Devin can learn how to use unfamiliar technologies.

2/4 Devin can contribute to mature production repositories.

3/4 Devin can train and fine-tune its own AI models.

4/4 We even tried giving Devin real jobs on Upwork, and it could do those too!

For more details on Devin, check out the blog post here: https://cognition-labs.com/blog