The Jagged Frontier: Why Human-AI Collaboration Fails and What Enterprise Leaders Can Do About It
Apr 15, 2026
By Dr. Felicia Newhouse, Founder, AI-Powered Women
There is an assumption at the heart of most enterprise AI strategies that recent research has shown to be wrong. The assumption is that pairing humans with AI always produces better results than either working alone. It sounds intuitive. It seems like common sense. And a landmark study from Harvard Business School demonstrates that it fails in precisely the situations where organizations most need it to work.
In 2023, Fabrizio Dell'Acqua and colleagues ran a controlled experiment with 758 consultants at Boston Consulting Group, in what remains one of the largest and most rigorous field studies of AI in professional knowledge work. The results revealed what the researchers called a "jagged technological frontier," an irregular boundary between tasks where AI dramatically improves human performance and tasks where AI causes human performance to degrade. The frontier is jagged because there is no clean line between where AI is reliable and where it is misleading. And the factor that determines which side of the frontier a team lands on is the human's ability to recognize the boundary (Dell'Acqua et al., 2023).
This finding should reshape how every enterprise leader thinks about AI deployment, AI training, and AI-augmented decision-making.
What 758 Consultants Revealed
The experiment was designed to test a straightforward question: does access to frontier AI models (specifically GPT-4) improve the performance of skilled professionals on realistic consulting tasks?
The answer was: it depends entirely on the task.
On tasks that fell inside AI's capability frontier, consultants using GPT-4 produced outputs roughly 40 percent higher in quality than those of the control group. They completed more tasks, finished them faster, and outperformed their peers who worked without AI access. For these tasks, the technology worked exactly as advertised.
On tasks that fell outside the frontier, the results reversed. Consultants using GPT-4 performed worse than consultants working without AI. The degradation was measurable and consistent. The consultants with the most powerful tools available produced the lowest-quality work on these tasks (Dell'Acqua et al., 2023).
The mechanism was what the researchers described as "falling asleep at the wheel." When consultants had access to AI, they became less likely to apply their own expertise. They anchored on AI-generated suggestions even when those suggestions were wrong. The more confident the AI output appeared, the more the consultants deferred to it. This is automation bias, a well-documented cognitive pattern, amplified here by the fluency and apparent confidence of large language model outputs.
The consultants were not novices. They were experienced professionals at one of the world's leading management consulting firms. If automation bias distorted their judgment, it will distort the judgment of professionals in every organization deploying AI at scale.
The Metacognitive Skill Gap
Dr. Ethan Mollick at the Wharton School has been running parallel research on AI in professional settings, and his findings reinforce the same core dynamic. The value of human-AI collaboration is contingent on the human's metacognitive skill: the ability to evaluate when AI is helpful and when it is misleading. This skill does not develop from tool exposure alone. Giving someone access to a powerful AI tool does not teach them to recognize when the tool is wrong (Mollick, 2024).
This is the gap that most enterprise AI training programs fail to address. The standard approach focuses on how to use the tools: prompt construction, feature navigation, workflow integration. These are necessary skills, but they do not develop the judgment required to work on the jagged frontier. An employee can be proficient at prompting a language model and still lack the ability to identify when the model's output is plausible, well-structured, and completely wrong.
The metacognitive skill of "appropriate reliance," knowing when to trust AI and when to override it, is a learnable capability. Research on calibrated trust in automated systems has shown that structured training in identifying AI failure modes, evaluating output confidence, and applying domain-specific judgment significantly improves human performance in AI-augmented tasks. The training works. The problem is that most organizations have not built it into their readiness investments.
The Real Cost of the Skills Gap
The practical implication of the Dell'Acqua study is significant and uncomfortable. For tasks at the boundary of AI capability, providing AI tools without structured readiness training is worse than providing no tools at all. The cost is not merely missed productivity. It is actively degraded decision quality.
Consider what this means at enterprise scale. An organization deploys a generative AI platform to thousands of knowledge workers. Some tasks those workers perform fall clearly inside the AI frontier. Other tasks fall outside it. Without the metacognitive skill to distinguish between the two, workers apply AI to both categories with the same level of trust. On the inside-frontier tasks, productivity improves. On the outside-frontier tasks, decision quality degrades, and neither the workers nor their managers can see it happening because the AI outputs look polished and confident regardless of their accuracy.
The net effect across the organization depends on the ratio of inside-frontier to outside-frontier tasks. For routine, well-structured work, the gains dominate. For complex, judgment-intensive work, the losses may offset or exceed the gains. And the highest-stakes decisions in any organization tend to cluster at the boundary of AI capability, exactly where the risk of degradation is greatest.
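To see why the ratio matters, it helps to write the tradeoff down. In a deliberately crude model, if a fraction p of an organization's tasks fall inside the frontier with an average quality gain g, and the remainder fall outside it with an average quality loss l, the expected net change is p × g − (1 − p) × l. The Python sketch below simply evaluates that expression for two hypothetical portfolios; the parameter values are illustrative assumptions, not figures from the Dell'Acqua study.

```python
# Back-of-the-envelope model of net quality impact across a task portfolio.
# All parameter values are illustrative assumptions, not results from
# Dell'Acqua et al. (2023).

def net_quality_change(inside_fraction: float,
                       inside_gain: float,
                       outside_loss: float) -> float:
    """Expected quality change when AI is applied uniformly to every task.

    inside_fraction: share of tasks inside the AI capability frontier
    inside_gain:     average quality gain on inside-frontier tasks
    outside_loss:    average quality loss on outside-frontier tasks
    """
    return inside_fraction * inside_gain - (1 - inside_fraction) * outside_loss

# Routine-heavy portfolio: most tasks inside the frontier.
print(round(net_quality_change(0.80, 0.40, 0.20), 2))  # 0.28: gains dominate

# Judgment-heavy portfolio: most tasks at or beyond the boundary.
print(round(net_quality_change(0.35, 0.40, 0.20), 2))  # 0.01: gains nearly erased
```

The point of the toy model is not the numbers but the structure: because outside-frontier losses are invisible (the outputs look polished either way), an organization can drift toward the break-even point without anyone noticing.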
Building Appropriate Reliance
The path forward is not to limit AI deployment. The performance gains on inside-frontier tasks are real and significant. The path forward is to invest in the human capability that determines whether those gains extend across the organization or get undermined by degraded judgment on boundary tasks.
Organizations need to build what the research calls "appropriate reliance." This means three things in practice.
First, structured training on recognizing AI failure modes specific to the organization's domain. A consulting firm's failure modes differ from a healthcare system's, which differ from a financial services company's. Generic AI literacy programs do not develop this capability.
Second, evaluation frameworks that give professionals a systematic process for assessing AI output quality before acting on it. This is analogous to the verification protocols that exist in fields like aviation and medicine. The stakes of AI-augmented decision-making in enterprise settings justify a similar level of process discipline. A minimal sketch of what such a framework might look like follows the third point below.
Third, ongoing development that evolves with the technology. The jagged frontier is not static. As models improve, the boundary shifts. Tasks that were outside the frontier last quarter may be inside it this quarter, and new boundary cases emerge. Readiness is a continuous investment.
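As a concrete illustration of the second point, here is a minimal sketch of what an evaluation checklist might look like if encoded as a lightweight tool. Every question, name, and threshold in it is a hypothetical placeholder rather than a published protocol; a real framework would encode the organization's domain-specific failure modes.

```python
# Minimal sketch of a pre-action review checklist for AI output.
# All questions and the acceptance threshold are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Check:
    question: str  # what the human reviewer must verify
    passed: bool   # the reviewer's judgment, recorded explicitly

def clears_review(checks: list[Check], required_pass_rate: float = 1.0) -> bool:
    """Return True only if the output clears the review bar."""
    return sum(c.passed for c in checks) / len(checks) >= required_pass_rate

checks = [
    Check("Is every factual claim traceable to a source I verified?", True),
    Check("Does this task sit inside the frontier for this model?", False),
    Check("Would I reach the same conclusion without the AI draft?", True),
]

if not clears_review(checks):
    print("Do not act on this output; route it to expert review.")
```

The code is trivial by design. The mechanism is what matters: it forces an explicit, recorded human judgment at exactly the moment automation bias would otherwise take over, the same logic that makes pre-flight checklists effective.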
This is the work at the center of the AI-Powered Women AI Leadership Readiness Program, and it is a featured track at the 2026 MIT Conference (September 12-13, Kresge Auditorium). Sessions on the jagged frontier, appropriate reliance, and metacognitive skill development in AI-augmented environments will feature researchers from the teams behind the studies cited here. Register now to join more than 1,200 enterprise leaders building the human capability layer that determines whether AI investment produces value.
References
Dell'Acqua, F., McFowland, E., Mollick, E., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge Worker Productivity and Quality. Harvard Business School Working Paper No. 24-013.
Mollick, E. (2024). Co-Intelligence: Living and Working with AI. Portfolio/Penguin. See also Mollick's ongoing research on AI in professional settings and metacognitive skill in human-AI collaboration at the Wharton School, University of Pennsylvania.
Dr. Felicia Newhouse is the founder of AI-Powered Women and convener of the 2026 MIT Conference on AI Leadership Readiness. The AI Leadership Readiness Program develops the metacognitive skills enterprise leaders need to navigate the jagged frontier of human-AI collaboration.