Boosting Software Developer Efficiency with AI's Second Opinion: GitHub Copilot CLI's Rubber Duck

In the fast-paced world of software development, ensuring code quality and architectural soundness from the outset can significantly boost software developer efficiency. GitHub Copilot CLI is taking a bold step in this direction with its new experimental feature, "Rubber Duck." This innovative tool introduces a novel concept: giving your primary AI coding agent a "second opinion" from an entirely different AI model family.

Two AI models collaborating to review code in a digital editor.

A Second Opinion for Smarter Code

Traditionally, AI coding agents follow a loop of task assessment, planning, implementation, and testing. While powerful, this process can have blind spots. Early-stage decisions, if flawed, can lead to compounding errors, making fixes more costly and time-consuming down the line. Even self-reflection, a proven technique, is limited by a single model's inherent biases and training data.

Enter Rubber Duck. Operating in experimental mode within GitHub Copilot CLI, this feature acts as an independent reviewer. When your primary orchestrator is a Claude model, Rubber Duck leverages GPT-5.4, a model from a complementary AI family. Its mission is to provide a focused critique, highlighting potential issues the primary agent might have overlooked—such as questionable assumptions, missed details, or critical edge cases.

Enhancing Performance and Catching Critical Errors

The impact of this cross-family review is significant. Evaluations on SWE-Bench Pro, a benchmark for complex, real-world coding problems, showed impressive results. Claude Sonnet 4.6, when paired with Rubber Duck (GPT-5.4), closed 74.7% of the performance gap between Sonnet and the more powerful Opus 4.6 running alone. This boost is particularly evident in difficult, multi-file tasks that typically require extensive steps, where Sonnet + Rubber Duck scored up to 4.8% higher on the hardest problems.

Rubber Duck excels at identifying subtle yet impactful issues:

  • Architectural Catches: spotting fundamental design flaws, such as a proposed scheduler that would exit immediately or loop forever.
  • One-Liner Bugs with Big Impact: flagging a seemingly minor loop error that silently overwrites data, causing significant data loss without any error message.
  • Cross-File Conflicts: identifying dependencies where new code might break existing functionality in other files, preventing silent failures after deployment.
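To make the second category concrete, here is a hypothetical one-line loop bug of the kind described above. All names are invented for illustration; this is not code from the article's evaluations:

```python
def index_records(records):
    """Build a lookup of records keyed by type.

    Bug: keying on a non-unique field means later records
    silently overwrite earlier ones -- no error is raised.
    """
    index = {}
    for record in records:
        index[record["type"]] = record  # overwrites any prior record of the same type
    return index

records = [
    {"type": "invoice", "id": 1},
    {"type": "invoice", "id": 2},  # silently replaces id 1 in the index
    {"type": "receipt", "id": 3},
]
index = index_records(records)
print(len(records), len(index))  # 3 records in, only 2 survive
```

Nothing here raises an exception or logs a warning, which is exactly why a single agent can miss it and why an independent reviewer focused on questionable assumptions is useful.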

By catching these issues early, Rubber Duck directly contributes to improved software developer efficiency, reducing the need for extensive debugging and rework.

Illustration of an AI catching errors early in the software development planning stage.

When and How Rubber Duck Activates

GitHub Copilot can invoke Rubber Duck both automatically and on demand. For complex tasks, it proactively seeks critiques at key checkpoints where feedback provides the highest return:

  • After drafting a plan, to prevent compounding errors.
  • Following a complex implementation, for a second set of eyes on intricate code.
  • After writing tests but before execution, to identify coverage gaps or flawed assertions.

Rubber Duck can also activate reactively if the primary agent gets stuck, and users can request a critique at any point, prompting Copilot to incorporate the feedback and show the resulting changes.
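The checkpoint pattern described above can be sketched in Python. Everything here (class names, the critique protocol, the stub behavior) is invented for illustration and is not Copilot CLI's actual implementation; it only shows the shape of a workflow where a cross-family reviewer is consulted at high-leverage checkpoints:

```python
from dataclasses import dataclass

@dataclass
class Critique:
    has_issues: bool
    notes: str = ""

class PrimaryAgent:
    """Stub for the orchestrating model (e.g. a Claude model)."""
    PHASES = ["plan_drafted", "implementation_done", "tests_written", "done"]

    def __init__(self):
        self.phase_index = -1
        self.revisions = 0

    def step(self):
        self.phase_index += 1
        return self.PHASES[self.phase_index]

    def revise(self, critique):
        self.revisions += 1  # incorporate reviewer feedback into the artifact

class ReviewerAgent:
    """Stub for the cross-family reviewer (the 'rubber duck')."""
    def review(self, phase):
        # In this toy example, flag an issue only at the planning stage.
        return Critique(has_issues=(phase == "plan_drafted"),
                        notes="questionable assumption in the plan")

# The checkpoints named in the article: after planning, after a complex
# implementation, and after writing tests but before running them.
CHECKPOINTS = {"plan_drafted", "implementation_done", "tests_written"}

def run_task(primary, reviewer):
    while True:
        phase = primary.step()
        if phase == "done":
            return primary.revisions
        if phase in CHECKPOINTS:  # proactive second opinion at key checkpoints
            critique = reviewer.review(phase)
            if critique.has_issues:
                primary.revise(critique)

revisions = run_task(PrimaryAgent(), ReviewerAgent())
print(revisions)  # one revision, triggered by the plan-stage critique
```

Requesting the critique right after planning mirrors the article's point: a flawed plan is cheapest to fix before any implementation compounds the error.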

Community Insights and Future Directions

The community's initial reactions highlight both excitement and areas for refinement. Users are keen to understand the impact on quota and premium requests, a common concern with advanced AI features. Another key question revolves around configuring Rubber Duck to use specific models or custom instructions.

One user, logar16, reported being "inundated" with over ten Rubber Duck agents during a session with Claude Opus 4.6. This experience prompted crucial questions about control and efficiency:

  • Can we limit how frequently it launches ducks?
  • Can we turn off the rubber-duck tool for subagents and leave it to the fleet's orchestrator to decide when to invoke it?
  • Can we specify when rubber-duck can be used, e.g., only after a plan is made but before implementation, or never for tests?
  • Can we simply turn off the feature if it isn't adding value?
  • Can the system warn the agent when more than one rubber duck is running at a time, or when a rubber duck is running on every modified file?

These insights underscore the importance of fine-grained control and transparency in AI-assisted workflows. While the potential for boosting software developer efficiency through intelligent, multi-model review is clear, managing resource consumption and preventing "AI overload" will be key to its widespread adoption and success. As Rubber Duck continues its experimental phase, community feedback will undoubtedly shape its evolution into an even more powerful and user-friendly tool.

To try Rubber Duck, install GitHub Copilot CLI and run the /experimental slash command. It's available when you select any Claude model and have access to GPT-5.4.
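For reference, the steps might look like the following. The `/experimental` command comes from the article; the install and launch commands are assumptions based on the standard npm distribution of GitHub Copilot CLI, so check GitHub's documentation for the current instructions:

```shell
# Install GitHub Copilot CLI (install command assumed; verify against GitHub's docs)
npm install -g @github/copilot

# Start an interactive session
copilot

# Then, inside the session, enable experimental features:
# /experimental
```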
