
Excel Charting : AI-Powered Chart Design Recommendations


Introduction

Creating charts is easy. Making them GOOD is hard. Users struggled with chart type selection, styling decisions, and best practices—resulting in suboptimal visualizations even when data was correct.
As Lead Designer for Copilot Chart Design Recommendations, I designed an LLM-powered system that analyzes data shape, infers user intent, and suggests optimal chart configurations. This required deep collaboration with ML engineers to train and tune the RAG models with data visualization principles—essentially encoding expert knowledge into AI prompts.
The result bridges the gap between 'chart exists' and 'chart communicates effectively,' democratizing data visualization expertise for 400M users.
 
 
 
Results Overview
100% Execution Success
9 mins User Effort Saved
172 Clicks Eliminated
FY26 H1 Shipping Timeline

The Problem: The Chart Design Expertise Gap

What Users Were Struggling With
"I created a chart but don't know if I picked the right type." — Intermediate User
"It looks boring. How do I make it presentation-ready?" — Enterprise Analyst
"I spend hours tweaking charts to look professional." — Power User
The pattern was clear: Users could INSERT charts (thanks to our P0 improvements), but they couldn't optimize them.
| Pain Point | User Behavior | Business Impact |
| --- | --- | --- |
| Chart Type Uncertainty | Try multiple types, delete, start over. 5-10 minute cycle. | Wasted time, user frustration, suboptimal final choices |
| Styling Paralysis | Don't know which formatting options matter. Either over-style or under-style. | Charts look unprofessional or cluttered |
| Best Practice Ignorance | Unaware of data viz principles (e.g., start axis at zero, use direct labels) | Misleading visuals, poor communication |

Why Competitors Had the Advantage

Competitive analysis revealed sophisticated design assistance:
  • Tableau: 'Show Me' feature auto-recommends chart types based on field types
  • Power BI: Smart narrative and formatting suggestions
  • ChatGPT Code Interpreter: Generates Python code with matplotlib best practices baked in
  • Napkin AI: Fully automated design from text descriptions
CRITICAL GAP: Competitors democratized design expertise. Excel required users to LEARN design. We needed to embed expertise IN THE TOOL.
 
 

🎯 As a User (Functional & Emotional JTBDs)

Primary Hero JTBD

"When I insert a chart, help me create a visualization that tells my story effectively — without needing to be a data viz expert."
This breaks down into three interconnected user jobs:
| Job Category | User Statement | Pain Point It Solves |
| --- | --- | --- |
| 1. Chart Type Selection | "Help me figure out which chart best represents my data" | 5-10 minute trial-and-error cycles; users try multiple types, delete, start over |
| 2. Visual Design | "Make my chart look professional and presentation-ready" | Charts described as "boring," "old-fashioned," "embarrassing to present" |
| 3. Best Practice Application | "Tell me what I don't know about good data visualization" | Users unaware of principles like "start Y-axis at zero" or "use direct labels" |

Supporting JTBDs (from research)

| JTBD | Description | Copilot Intent Share |
| --- | --- | --- |
| Comparative Analysis | Compare values across categories, geographies, or periods to uncover insights | Part of 83% "Create Chart" intents |
| Presentation & Storytelling | Make complex information clear, engaging, persuasive in meetings/reports | 9.6% of explicit intents |
| Trend Analysis | Visualize how metrics change over time to identify patterns | Primary use case for Line charts |
| Answering Business Questions | Create ad-hoc visuals to answer specific questions quickly | Core Excel workflow |

The "Magic Wand" Quote (from User Research)

"If you could wave a magic wand, what would you change?"
Users wanted three things:
  1. Automatic chart creation — "Based on my specific goal and storytelling needs, help me tell my story"
  2. Automatic beautification — "Make my charts look beautiful without me having to figure it out"
  3. Natural language customization — "Let me ask for customizations in plain English"

💼 As a Business (Strategic JTBDs)

Primary Hero JTBD

"Increase chart adoption and retention to keep users within the M365 ecosystem for their data visualization needs — preventing defection to competitors."

Business Metrics We're Driving

| Metric | Baseline Problem | Target Impact |
| --- | --- | --- |
| Chart Kept Rate | ~45% of charts deleted in same session | Push toward >70% retention |
| Chart Create MAU | Only 2% of MAU on web create charts | Increase top-of-funnel creation |
| Net Chart Creation | Inserts minus deletes was too low | Increase net positive |
| Data Viz NPS | Charting issues dragging down Excel NPS | Measurable improvement |
| Copilot Tried/Enabled | Design Recommendations as gateway | Lift adoption rate |

Strategic Business JTBDs

| Business Job | Why It Matters | How Design Recommendations Solves It |
| --- | --- | --- |
| 1. Compete Defense | Tableau, Power BI, ChatGPT Code Interpreter, Napkin AI democratizing design expertise | Embed expertise IN the tool; no learning curve required |
| 2. Copilot Adoption | Only ~9% of Copilot users engaged with chart-related prompts | Proactive recommendations at insert = gateway to Copilot |
| 3. User Retention | Users looking outside M365 for data viz needs | "Wow moment" on first chart = sticky behavior |
| 4. Unlock Latent Demand | 33% of commercial users want to create charts but don't | Remove friction to convert intent → action |

The Business Funnel Problem (from telemetry)

400M Excel Users
  ↓ -8% (awareness)
368M aware of Charts
  ↓ -98% (friction!)
~8M actually creating charts
  ↓ -97.3% (customization pain)
~10M customizing charts
The massive drop from awareness → creation is where AI Design Recommendations lives. It attacks the -98% conversion gap.

🔗 The JTBD Interlock

| User JTBD | Business JTBD | Feature Outcome |
| --- | --- | --- |
| "Help me pick the right chart" | Reduce trial-and-error abandonment | Chart type recommendations with rationale |
| "Make it look professional" | Increase "kept" rate | One-click style optimization |
| "I don't know what good looks like" | Build user capability over time | Educational rationale with "Learn More" |
| "I need this fast" | Drive Copilot adoption | 9 mins, 172 clicks saved per chart |

 

My Role: Designing AI as Design Partner

  • End-to-end UX strategy for Copilot-powered design recommendations
  • AI prompt engineering collaboration — co-designed LLM prompts with ML team for chart analysis, gave examples of visually stunning data viz.
  • Recommendation interaction patterns — preview, apply, undo flows
  • RAG model training & tuning — defined data viz properties that inform chart type recommendations
  • Multi-recommendation handling — when LLM suggests 3-5 improvements, how to present without overwhelming
  • Trust-building mechanisms — explainability, rationale, learn more links
 

Design recommendations should feel like:

  • A helpful colleague, not a know-it-all boss
  • Suggestions, not mandates — users always have final say
  • Educational — explain WHY, don't just say WHAT
  • Confidence-building — help users become better designers over time
 
 

Design Process: From Data Shape to Design Intelligence

Phase 1: Define Recommendation Taxonomy

I mapped the universe of chart improvements into categories, prioritized by impact on user success:
| Category | Example Recommendations | Priority |
| --- | --- | --- |
| Chart Type | Switch to line chart for time series, use stacked bar for part-to-whole | P0 |
| Data Configuration | Swap axes, sort by value, remove outliers from view | P0 |
| Visual Styling | Add data labels, increase contrast, adjust legend position | P1 |
| Best Practices | Start Y-axis at zero, remove gridlines for clarity, add title | P1 |
| Accessibility | Improve color contrast, add alt text, use patterns not just colors | P2 |
 

Phase 2: Training & Tuning the RAG Model for Chart Recommendations

This was the technical heart of the project. Working closely with the ML team, I defined the data visualization properties that would inform the model's recommendations. The challenge: LLMs aren't inherently good at numerical reasoning, but chart recommendations require understanding data patterns.
The Data Shape Analysis Framework
I defined a taxonomy of data properties the model needed to analyze before making recommendations:
| Data Property | Detection Method | Chart Type Signal |
| --- | --- | --- |
| Field Types | Numeric, Date, Categorical, Boolean | Date fields → Line/Area charts; Categorical → Bar/Column |
| Cardinality | Count of unique values per column | Low cardinality (<7) → Pie; High cardinality → Bar |
| Value Ranges | Min, Max, Distribution analysis | Wide range + outliers → Scatter; Normalized → Stacked |
| Time Granularity | Daily, Weekly, Monthly, Quarterly, Yearly | Continuous time → Line; Discrete periods → Column |
| Part-to-Whole Signal | Values sum to 100% or total | Composition data → Pie/Donut or Stacked Bar |
| Comparison Intent | Multiple series, Actual vs Target patterns | Multi-series comparison → Clustered Bar or Combo |
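To make this concrete, here is a minimal Python sketch of the data-shape analysis step. The column classification and signal names are illustrative assumptions; the shipped detection logic is internal to Excel.

```python
from datetime import date

def analyze_data_shape(columns: dict[str, list]) -> dict:
    """Hypothetical sketch: classify each column (field type, cardinality)
    and derive a coarse chart-type signal, mirroring the table above."""
    profile = {}
    for name, values in columns.items():
        if all(isinstance(v, date) for v in values):
            field_type = "date"
        elif all(isinstance(v, bool) for v in values):
            field_type = "boolean"
        elif all(isinstance(v, (int, float)) for v in values):
            field_type = "numeric"
        else:
            field_type = "categorical"
        profile[name] = {"type": field_type, "cardinality": len(set(values))}

    # Coarse, illustrative signals: dates suggest trend charts,
    # a low-cardinality category suggests part-to-whole charts.
    has_date = any(p["type"] == "date" for p in profile.values())
    low_card_cat = any(
        p["type"] == "categorical" and p["cardinality"] < 7
        for p in profile.values()
    )
    if has_date:
        signal = "line_or_area"
    elif low_card_cat:
        signal = "pie_or_donut"
    else:
        signal = "bar_or_column"
    return {"columns": profile, "signal": signal}

shape = analyze_data_shape({
    "month": [date(2024, m, 1) for m in range(1, 7)],
    "revenue": [120, 135, 150, 148, 160, 172],
})
print(shape["signal"])  # line_or_area
```

In production this profile would be computed deterministically and handed to the model, so the LLM never has to infer field types from raw cell values.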
Insight Generation Architecture: Why We Chose a Hybrid Approach
Early experiments revealed a critical learning: LLMs alone can't reliably analyze numerical data.
| Criteria | Approach 1: Raw Data → LLM | Approach 2: Statistical Summary → LLM |
| --- | --- | --- |
| Accuracy | Low — LLMs struggle with numerical reasoning | High — Pre-computed stats are reliable |
| Token Efficiency | Poor — Large tables exceed limits | Efficient — Summary fits within limits |
| Latency | Low — Single LLM call | Higher — Multiple steps required |
Our chosen architecture: A multi-step pipeline where Python code computes statistical summaries (mean, median, outliers, trends), then passes this structured data to the LLM for recommendation generation. This hybrid approach delivered >95% accuracy in insight generation.
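The first half of that pipeline can be sketched in Python. The function and payload names are assumptions, and the LLM call itself is omitted; the point is that deterministic code computes the statistics the model would otherwise get wrong.

```python
import statistics

def summarize_series(name: str, values: list[float]) -> dict:
    """Step 1 (sketch): pre-compute the statistical summary
    instead of sending raw rows to the LLM."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "series": name,
        "mean": round(mean, 2),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        # Simple z-score outlier flag; the threshold is illustrative.
        "outliers": [v for v in values if stdev and abs(v - mean) / stdev > 2],
        "trend": "up" if values[-1] > values[0] else "down_or_flat",
    }

def build_llm_payload(summaries: list[dict], current_type: str) -> str:
    """Step 2 (sketch): the compact summary, not the raw table,
    goes into the recommendation prompt."""
    return (
        f"Current chart type: {current_type}\n"
        f"Statistical summary: {summaries}\n"
        "Recommend up to 3 better chart configurations with rationale."
    )

summary = summarize_series("revenue", [120, 135, 150, 148, 160, 172])
print(summary["trend"])  # up
```

Keeping the summary small also addresses the token-efficiency row in the comparison above: a six-row series compresses to a handful of numbers regardless of table size.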
 
 

Phase 3: LLM Prompt Co-Design

Working with the ML team, I co-created prompts optimized for chart type selection. The key was encoding data visualization best practices into the prompt structure.
Prompt Architecture
"Given a chart with [data structure], current type [X], analyze if a better visualization exists.
Consider: 1) Data relationships, 2) Storytelling intent, 3) Visual clarity.
Return top 3 recommendations with executable chart config and brief rationale."
Chart Type Selection Logic (Encoded in Prompts)
I defined the decision tree that the model uses to recommend chart types:
| User Intent / Data Shape | Recommended Chart | Why This Works |
| --- | --- | --- |
| Time series with trend | Line Chart | Shows change over time; eye follows the trajectory |
| Categorical comparison | Clustered Bar/Column | Easy side-by-side comparison; clear value differences |
| Part-to-whole (<7 categories) | Pie/Donut Chart | Intuitive percentage representation; limited categories |
| Part-to-whole (>7 categories) | Stacked Bar/Area | Handles many categories; shows composition |
| Correlation/distribution | Scatter/Bubble Chart | Reveals relationships; shows outliers clearly |
| Actual vs. Target | Combo Chart | Different visual encoding for different data types |
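The decision tree can be illustrated as code. The `shape` flags below are hypothetical; in production this logic was encoded in the prompt rather than hard-coded, so treat this as a readable restatement of the table, not the shipped implementation.

```python
def recommend_chart_type(shape: dict) -> tuple[str, str]:
    """Illustrative encoding of the chart-type decision tree.
    Returns (chart_type, rationale); keys in `shape` are assumptions."""
    if shape.get("has_time_series"):
        return "line", "Shows change over time; eye follows the trajectory"
    if shape.get("actual_vs_target"):
        return "combo", "Different visual encoding for different data types"
    if shape.get("part_to_whole"):
        if shape.get("category_count", 0) < 7:
            return "pie", "Intuitive percentage representation; limited categories"
        return "stacked_bar", "Handles many categories; shows composition"
    if shape.get("correlation"):
        return "scatter", "Reveals relationships; shows outliers clearly"
    # Default: categorical comparison.
    return "clustered_bar", "Easy side-by-side comparison; clear value differences"

chart, why = recommend_chart_type({"part_to_whole": True, "category_count": 4})
print(chart)  # pie
```

Note that the branches are ordered: a time axis wins over everything else, matching the priority the prompt gives to trend storytelling.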
Prompt Iteration & Quality Tuning
Through iterative testing on 20+ sample datasets across industries (Telecom, Finance, Manufacturing, Retail), we tuned the prompts to:
  1. Maximize accuracy — >95% factually correct recommendations
  2. Improve analytical depth — avoid "obvious" suggestions, focus on insights
  3. Ensure clarity — plain language explanations users understand
  4. Add validation layer — constraint-based prompts + validation before showing users
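A validation layer of the kind described in point 4 might look like this in outline. The field names and constraint set are assumptions, not the shipped implementation; the idea is simply to reject impractical LLM output before it reaches the user.

```python
SUPPORTED_TYPES = {"column", "bar", "line", "scatter", "pie", "combo"}  # MVP set

def validate_recommendation(rec: dict, column_names: set[str]) -> list[str]:
    """Sketch of a post-LLM validation pass. Returns a list of errors;
    an empty list means the recommendation is safe to show."""
    errors = []
    if rec.get("chart_type") not in SUPPORTED_TYPES:
        errors.append(f"unsupported chart type: {rec.get('chart_type')}")
    for field in rec.get("fields", []):
        # Guard against hallucinated column references.
        if field not in column_names:
            errors.append(f"hallucinated field: {field}")
    if rec.get("chart_type") == "pie" and len(rec.get("fields", [])) > 2:
        errors.append("pie charts support one category + one value field")
    return errors

errors = validate_recommendation(
    {"chart_type": "radar", "fields": ["Region", "Sales"]},
    {"Region", "Sales"},
)
print(errors)  # ['unsupported chart type: radar']
```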
 
 

Phase 4: Technical Architecture — Vega-Lite as Interchange Format

A critical design decision: How do we translate LLM recommendations into executable chart changes?
The Pipeline
  1. LLM generates Vega-Lite JSON (or Python code using the Altair library)
  2. Conversion layer translates Vega-Lite → IChartSettings JSON (Excel's native format)
  3. Interactive preview renders in Copilot chat pane using Ivy Web
  4. On apply → Native Office Chart inserted on Excel grid
Why Vega-Lite? It's a declarative grammar for visualization that's both human-readable and machine-parseable. This allows the LLM to generate valid chart specifications without hallucinating invalid code.
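A minimal sketch of that conversion step, using a simplified stand-in for the IChartSettings schema (the real schema is internal to Excel, so the output keys here are assumptions):

```python
# A minimal Vega-Lite spec of the kind the LLM might emit (line chart).
vega_lite_spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "Month", "type": "temporal"},
        "y": {"field": "Revenue", "type": "quantitative"},
    },
}

# Hypothetical mapping from Vega-Lite marks to native chart types.
MARK_TO_EXCEL = {"line": "Line", "bar": "BarClustered", "arc": "Pie"}

def to_chart_settings(spec: dict) -> dict:
    """Sketch of the conversion layer: Vega-Lite → native chart settings.
    Unknown marks raise KeyError, surfacing invalid LLM output early."""
    return {
        "chartType": MARK_TO_EXCEL[spec["mark"]],
        "categoryField": spec["encoding"]["x"]["field"],
        "valueField": spec["encoding"]["y"]["field"],
    }

print(to_chart_settings(vega_lite_spec)["chartType"])  # Line
```

Because Vega-Lite is a closed, declarative vocabulary, a lookup-table conversion like this fails loudly on anything the LLM invents, which is exactly the hallucination guard the pipeline relies on.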
Native Charts vs. AI-Generated Images
Key Design Decision: Generate NATIVE Excel charts, not PNG images. Competitors' AI-generated images look good but can't be tweaked. Our approach maintains editability, data-binding, and refreshability.
 

Phase 5: UX Pattern — The Recommendation Card

I designed a modular 'recommendation card' system that presents AI suggestions without overwhelming users:
| Card Component | Purpose & Rationale |
| --- | --- |
| Visual Preview | Side-by-side before/after thumbnail — lets users see impact instantly |
| Plain Language Title | 'Switch to line chart' not 'Recommendation 1' — clarity over abstraction |
| Rationale Text | 'Line charts show trends over time more clearly' — builds understanding |
| One-Click Apply | Instant transformation — removes friction from adoption |
| Learn More Link | For users who want deeper understanding — educational value |
Critical interaction detail: Recommendations are stateless — users can try one, undo, try another. No commitment required. This reduces anxiety and encourages exploration.
Phase 6: Entry Points & Triggers
Design recommendations needed multiple access points to meet users where they are:
| Phase | Entry Point | Behavior |
| --- | --- | --- |
| Crawl (MVP) | Auto-trigger on insert | Proactive suggestion after chart appears — highest user intent moment |
| Walk | On-chart button | For existing charts, contextual UI affordance (skittle) |
| Run | Chat prompt | 'Make this chart better' in Copilot sidebar |
| Run | Selection detection | Prompts user to select data if none chosen — solves key pain point |
 

Initial direction, design and concepts

 
 
 
 
 

Key Design Decisions & Trade-offs

Native Charts vs. AI-Generated Images
Choice: Generate NATIVE Excel charts, not PNG images
Why: Editable, data-bound, refreshable. Competitors' AI-generated images look good but can't be tweaked.
Top 1 vs. Top 3 Recommendations
Choice: Show 1-3 recommendations, prioritized by confidence
Why: Balance guidance with choice. 1 felt prescriptive, 5+ overwhelmed. 3 was sweet spot.
 
In-Pane Preview vs. Live Preview
Choice: Thumbnail preview in pane, NOT live chart manipulation on hover
Why: Live preview felt janky/overwhelming. Thumbnails gave control without distraction.
 
Explain Rationale vs. Just Apply
Choice: Always show WHY, not just WHAT to change
Why: Builds user understanding over time. Trust through transparency.
 
 

Impact & Results

 

  • 16.5% chart usage among Copilot-enabled users (vs. 8.8% for non-Copilot users)
  • 34.7 avg. chart actions per Copilot user (a high engagement signal)
 
| Metric | Before | After |
| --- | --- | --- |
| Chart deletion rate (same session) | ~40% | <10% |
| Users who kept recommended changes | N/A | >80% |
| Chart feature MAU growth (2.5 years) | Baseline | +180% |
 
 
Technical Success Metrics (Internal Testing)
✅  100% execution success rate: LLM-generated chart code worked every time in controlled tests
✅  9 mins, 172 clicks saved: measured against the manual chart optimization workflow
✅  All common chart types supported: column, bar, line, scatter, pie, combo (full MVP coverage)
User Testing Insights
👨🏽‍🦱 "This is like having a data viz expert sitting next to me." — Power User
👨🏻 "I learned more about charts from these suggestions than from tutorials." — Novice
👩🏼 "Finally! I don't have to guess if my chart is good." — Intermediate User
Key finding: Users didn't just apply recommendations — they LEARNED from them. Over time, they started making better initial choices.
Strategic Impact
  • Proved AI-native differentiation — Excel is the only tool that combines native charts + AI recommendations + editability
  • Unlocked Copilot adoption — Design recommendations were #2 most-used Copilot feature after Insights
  • Elevated Excel's positioning — From 'spreadsheet tool' to 'intelligent design assistant'
  • Foundation for future — Opened door for AI assistance in formatting, tables, and beyond
 
 

Key Learnings: Designing for AI Collaboration

What Worked Exceptionally Well
  • Co-designing prompts with ML team — Designer + data scientist collaboration produced better outcomes than either alone
  • Explaining rationale — Educational approach built trust and improved user skill over time
  • Prioritizing native charts — Maintained editability vs. competitors' static AI outputs
  • Modular card pattern — Flexible system that scaled from 1 to N recommendations
Challenges & How We Solved Them
  • LLM sometimes suggested impractical changes — Constraint-based prompts, validation layer before showing to users
  • Preview thumbnails took too long to generate — Cached common transformations, optimized rendering pipeline
  • Users wanted to compare multiple recommendations side-by-side — Added comparison view in Walk phase (deferred from MVP)
 

The Bigger Lesson for AI Product Design

AI features succeed when they:
  1. Augment, don't automate — User stays in control, AI provides options
  2. Explain the 'why' — Black box AI breeds mistrust
  3. Preserve human creativity — Recommendations, not prescriptions
  4. Build expertise over time — Users learn patterns and become less dependent on AI