
GenAI’s impact on developers is ‘meh,’ improvements needed, study says

Research from Uplevel Shows 41% Increase in Bug Rate with GenAI for Coding — Plus No Notable Improvements to Key Efficiency Metrics or Burnout Risk

Lines of code generated. Time required to resolve bugs. Coding speed. Software developers often look to improve these metrics, but is generative AI (GenAI) the answer? A new study from Uplevel indicates that today’s GenAI-based developer tools don’t tend to increase coding efficiency and actually increase bug rates, suggesting that improvements may be needed to drive greater business value.

The research was conducted by Uplevel Data Labs, Uplevel’s data science arm. It examined a sample of 800 software developers on large engineering teams whose organizations had adopted Microsoft’s GitHub Copilot, a GenAI-based coding assistant and developer tool. Uplevel’s study looked at key engineering metrics before and after Copilot implementation. The developers studied were segmented into a test group (those with Copilot access) and a control group (those without access), with similar roles, working days and coding volume in both groups.

Interest in GenAI has exploded over the last year, and so have GenAI-based business applications. Uplevel’s research comes at a time when many organizations are taking stock of their investments, making sure they’re not just flashy but actually drive value. And with 97% of software developers, programmers and engineers having used AI-powered coding tools at work, businesses want to understand whether these tools improve outcomes.

Here’s what Uplevel’s data on Copilot shows:

  • No significant change to efficiency metrics with GenAI — Developers with Copilot did not see an increase in coding speed. In terms of pull request (PR) cycle time (the time to merge code into a repository) and PR throughput (number of pull requests merged), Copilot neither helped nor hurt developers.
  • 41% increase in bug rate — Post-Copilot implementation, developers saw a 41% increase in bugs within pull requests — suggesting the tool may impact code quality.
  • Doesn’t mitigate risk of burnout — Uplevel’s “Sustained Always On” metric (measuring working time outside of standard hours: a leading indicator of burnout) decreased for both control and test groups during the course of the study. But it decreased by 17% for those with Copilot access and, notably, 28% for those without — showing Copilot isn’t the most effective way to alleviate burnout.
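The metrics named above can be made concrete with a small sketch. It is not Uplevel's methodology, which the study summary does not describe. It only shows one plausible way to compute PR throughput, PR cycle time and a bugs-per-PR rate from pull request records, and to express a before/after difference as a percentage. The record layout (`opened`, `merged`, `bugs`), the function names and every sample value are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Hypothetical pull request records. All values are invented for illustration
# and are not taken from the Uplevel study.
BEFORE = [
    {"opened": "2024-01-02T09:00", "merged": "2024-01-03T09:00", "bugs": 1},
    {"opened": "2024-01-04T09:00", "merged": "2024-01-04T21:00", "bugs": 0},
    {"opened": "2024-01-05T09:00", "merged": None, "bugs": 0},  # never merged
]
AFTER = [
    {"opened": "2024-03-04T09:00", "merged": "2024-03-05T09:00", "bugs": 1},
    {"opened": "2024-03-06T09:00", "merged": "2024-03-06T21:00", "bugs": 1},
]


def cycle_time_hours(pr):
    """Hours between opening a PR and merging it."""
    opened = datetime.fromisoformat(pr["opened"])
    merged = datetime.fromisoformat(pr["merged"])
    return (merged - opened).total_seconds() / 3600


def summarize(prs):
    """Compute throughput, mean cycle time and bugs per merged PR."""
    merged = [pr for pr in prs if pr["merged"] is not None]
    if not merged:
        return {"throughput": 0, "mean_cycle_time_hours": None, "bugs_per_pr": None}
    return {
        "throughput": len(merged),
        "mean_cycle_time_hours": mean(cycle_time_hours(pr) for pr in merged),
        "bugs_per_pr": sum(pr["bugs"] for pr in merged) / len(merged),
    }


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) * 100 / before


if __name__ == "__main__":
    before, after = summarize(BEFORE), summarize(AFTER)
    print("before:", before)
    print("after:", after)
    print("bug rate change (%):",
          percent_change(before["bugs_per_pr"], after["bugs_per_pr"]))
```

In a study design like the one described, the same summary would be computed for both the test group and the control group, so that a change seen with Copilot can be compared against the change seen without it. The burnout figures above illustrate why: the metric fell in both groups.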

“Engineering teams today seek to allocate their time to the highest value work, complete that work as effectively as possible, and do so without burning out,” said Joe Levy, CEO of Uplevel. “They look at data to drive decision-making, and right now, the data doesn’t show appreciable gains in these specific areas through generative AI. But innovation moves quickly, and we’re not suggesting that developers ignore GenAI-based tools like Copilot, Gemini or CodeWhisperer. These tools are all new, there is a learning curve, and most teams have yet to land on the most effective use cases that improve productivity. We will continue to watch these insights as the adoption of GenAI continues to grow, and we recommend anyone investing in GenAI tools do the same.”

Uplevel’s study was a non-biased one; the company does not offer a competitive solution. As engineering teams seek to ascertain whether GenAI solutions (along with other development and collaboration tools) free up developers and improve performance metrics, Uplevel surfaces and interprets the insights they need. For more information, see uplevelteam.com.
