
Excel Charting : AI-Powered Chart Design Recommendations


Introduction

Creating charts is easy. Making them GOOD is hard. Users struggled with chart type selection, styling decisions, and best practices—resulting in suboptimal visualizations even when data was correct.
As Lead Designer for Copilot Chart Design Recommendations, I designed an LLM-powered system that analyzes data shape, infers user intent, and suggests optimal chart configurations. This required deep collaboration with ML engineers to train and tune the RAG models with data visualization principles—essentially encoding expert knowledge into AI prompts.
The result bridges the gap between 'chart exists' and 'chart communicates effectively,' democratizing data visualization expertise for 400M users.
 
 
 
Results Overview
100% Execution Success
9 mins User Effort Saved
172 Clicks Eliminated
FY26 H1 Shipping Timeline

The Problem: The Chart Design Expertise Gap

What Users Were Struggling With
"I created a chart but don't know if I picked the right type." — Intermediate User
"It looks boring. How do I make it presentation-ready?" — Enterprise Analyst
"I spend hours tweaking charts to look professional." — Power User
The pattern was clear: Users could INSERT charts (thanks to our P0 improvements), but they couldn't optimize them.
| Pain Point | User Behavior | Business Impact |
| --- | --- | --- |
| Chart Type Uncertainty | Try multiple types, delete, start over. 5-10 minute cycle. | Wasted time, user frustration, suboptimal final choices |
| Styling Paralysis | Don't know which formatting options matter. Either over-style or under-style. | Charts look unprofessional or cluttered |
| Best Practice Ignorance | Unaware of data viz principles (e.g., start axis at zero, use direct labels) | Misleading visuals, poor communication |

Why Competitors Had the Advantage

Competitive analysis revealed sophisticated design assistance:
  • Tableau: 'Show Me' feature auto-recommends chart types based on field types
  • Power BI: Smart narrative and formatting suggestions
  • ChatGPT Code Interpreter: Generates Python code with matplotlib best practices baked in
  • Napkin AI: Fully automated design from text descriptions
CRITICAL GAP: Competitors democratized design expertise. Excel required users to LEARN design. We needed to embed expertise IN THE TOOL.
 
 

🎯 As a User (Functional & Emotional JTBDs)

Primary Hero JTBD

"When I insert a chart, help me create a visualization that tells my story effectively — without needing to be a data viz expert."
This breaks down into three interconnected user jobs:
| Job Category | User Statement | Pain Point It Solves |
| --- | --- | --- |
| 1. Chart Type Selection | "Help me figure out which chart best represents my data" | 5-10 minute trial-and-error cycles; users try multiple types, delete, start over |
| 2. Visual Design | "Make my chart look professional and presentation-ready" | Charts described as "boring," "old-fashioned," "embarrassing to present" |
| 3. Best Practice Application | "Tell me what I don't know about good data visualization" | Users unaware of principles like "start Y-axis at zero" or "use direct labels" |

Supporting JTBDs (from research)

| JTBD | Description | Copilot Intent Share |
| --- | --- | --- |
| Comparative Analysis | Compare values across categories, geographies, or periods to uncover insights | Part of 83% "Create Chart" intents |
| Presentation & Storytelling | Make complex information clear, engaging, persuasive in meetings/reports | 9.6% of explicit intents |
| Trend Analysis | Visualize how metrics change over time to identify patterns | Primary use case for Line charts |
| Answering Business Questions | Create ad-hoc visuals to answer specific questions quickly | Core Excel workflow |

The "Magic Wand" Quote (from User Research)

"If you could wave a magic wand, what would you change?"
Users wanted three things:
  1. Automatic chart creation — "Based on my specific goal and storytelling needs, help me tell my story"
  2. Automatic beautification — "Make my charts look beautiful without me having to figure it out"
  3. Natural language customization — "Let me ask for customizations in plain English"

💼 As a Business (Strategic JTBDs)

Primary Hero JTBD

"Increase chart adoption and retention to keep users within the M365 ecosystem for their data visualization needs — preventing defection to competitors."

Business Metrics We're Driving

| Metric | Baseline Problem | Target Impact |
| --- | --- | --- |
| Chart Kept Rate | ~45% of charts deleted in same session | Push toward >70% retention |
| Chart Create MAU | Only 2% of MAU on web create charts | Increase top-of-funnel creation |
| Net Chart Creation | Inserts minus deletes was too low | Increase net positive |
| Data Viz NPS | Charting issues dragging down Excel NPS | Measurable improvement |
| Copilot Tried/Enabled | Design Recommendations as gateway | Lift adoption rate |

Strategic Business JTBDs

| Business Job | Why It Matters | How Design Recommendations Solves It |
| --- | --- | --- |
| 1. Compete Defense | Tableau, Power BI, ChatGPT Code Interpreter, Napkin AI democratizing design expertise | Embed expertise IN the tool; no learning curve required |
| 2. Copilot Adoption | Only ~9% of Copilot users engaged with chart-related prompts | Proactive recommendations at insert = gateway to Copilot |
| 3. User Retention | Users looking outside M365 for data viz needs | "Wow moment" on first chart = sticky behavior |
| 4. Unlock Latent Demand | 33% of commercial users want to create charts but don't | Remove friction to convert intent → action |

The Business Funnel Problem (from telemetry)

400M Excel Users
  ↓ -8% (awareness)
368M aware of Charts
  ↓ -98% (friction!)
~8M actually creating charts
  ↓ -97.3% (customization pain)
~10M customizing charts
The massive drop from awareness → creation is where AI Design Recommendations lives. It attacks the -98% conversion gap.

🔗 The JTBD Interlock

| User JTBD | Business JTBD | Feature Outcome |
| --- | --- | --- |
| "Help me pick the right chart" | Reduce trial-and-error abandonment | Chart type recommendations with rationale |
| "Make it look professional" | Increase "kept" rate | One-click style optimization |
| "I don't know what good looks like" | Build user capability over time | Educational rationale with "Learn More" |
| "I need this fast" | Drive Copilot adoption | 9 mins, 172 clicks saved per chart |

 

My Role: Designing AI as Design Partner

  • End-to-end UX strategy for Copilot-powered design recommendations
  • AI prompt engineering collaboration — co-designed LLM prompts with ML team for chart analysis, gave examples of visually stunning data viz.
  • Recommendation interaction patterns — preview, apply, undo flows
  • RAG model training & tuning — defined data viz properties that inform chart type recommendations
  • Multi-recommendation handling — when LLM suggests 3-5 improvements, how to present without overwhelming
  • Trust-building mechanisms — explainability, rationale, learn more links
 

Design recommendations should feel like:

  • A helpful colleague, not a know-it-all boss
  • Suggestions, not mandates — users always have final say
  • Educational — explain WHY, don't just say WHAT
  • Confidence-building — help users become better designers over time
 
 

Design Process: From Data Shape to Design Intelligence

Phase 1: Define Recommendation Taxonomy

I mapped the universe of chart improvements into categories, prioritized by impact on user success:
| Category | Example Recommendations | Priority |
| --- | --- | --- |
| Chart Type | Switch to line chart for time series, use stacked bar for part-to-whole | P0 |
| Data Configuration | Swap axes, sort by value, remove outliers from view | P0 |
| Visual Styling | Add data labels, increase contrast, adjust legend position | P1 |
| Best Practices | Start Y-axis at zero, remove gridlines for clarity, add title | P1 |
| Accessibility | Improve color contrast, add alt text, use patterns not just colors | P2 |
 

Phase 2: Training & Tuning the RAG Model for Chart Recommendations

This was the technical heart of the project. Working closely with the ML team, I defined the data visualization properties that would inform the model's recommendations. The challenge: LLMs aren't inherently good at numerical reasoning, but chart recommendations require understanding data patterns.
The Data Shape Analysis Framework
I defined a taxonomy of data properties the model needed to analyze before making recommendations:
| Data Property | Detection Method | Chart Type Signal |
| --- | --- | --- |
| Field Types | Numeric, Date, Categorical, Boolean | Date fields → Line/Area charts; Categorical → Bar/Column |
| Cardinality | Count of unique values per column | Low cardinality (<7) → Pie; High cardinality → Bar |
| Value Ranges | Min, Max, Distribution analysis | Wide range + outliers → Scatter; Normalized → Stacked |
| Time Granularity | Daily, Weekly, Monthly, Quarterly, Yearly | Continuous time → Line; Discrete periods → Column |
| Part-to-Whole Signal | Values sum to 100% or total | Composition data → Pie/Donut or Stacked Bar |
| Comparison Intent | Multiple series, Actual vs Target patterns | Multi-series comparison → Clustered Bar or Combo |
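To make this concrete, here is a minimal Python sketch of the data-shape analysis step. The column classification and signal names are illustrative assumptions; the shipped detection logic is internal to Excel.

```python
from datetime import date

def analyze_data_shape(columns: dict[str, list]) -> dict:
    """Hypothetical sketch: classify each column (field type, cardinality)
    and derive a coarse chart-type signal, mirroring the table above."""
    profile = {}
    for name, values in columns.items():
        if all(isinstance(v, date) for v in values):
            field_type = "date"
        elif all(isinstance(v, bool) for v in values):
            field_type = "boolean"
        elif all(isinstance(v, (int, float)) for v in values):
            field_type = "numeric"
        else:
            field_type = "categorical"
        profile[name] = {"type": field_type, "cardinality": len(set(values))}

    # Coarse, illustrative signals: dates suggest trend charts,
    # a low-cardinality category suggests part-to-whole charts.
    has_date = any(p["type"] == "date" for p in profile.values())
    low_card_cat = any(
        p["type"] == "categorical" and p["cardinality"] < 7
        for p in profile.values()
    )
    if has_date:
        signal = "line_or_area"
    elif low_card_cat:
        signal = "pie_or_donut"
    else:
        signal = "bar_or_column"
    return {"columns": profile, "signal": signal}

shape = analyze_data_shape({
    "month": [date(2024, m, 1) for m in range(1, 7)],
    "revenue": [120, 135, 150, 148, 160, 172],
})
print(shape["signal"])  # line_or_area
```

In production this profile would be computed deterministically and handed to the model, so the LLM never has to infer field types from raw cell values.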
Insight Generation Architecture: Why We Chose a Hybrid Approach
Early experiments revealed a critical learning: LLMs alone can't reliably analyze numerical data.
| Criteria | Approach 1: Raw Data → LLM | Approach 2: Statistical Summary → LLM |
| --- | --- | --- |
| Accuracy | Low — LLMs struggle with numerical reasoning | High — Pre-computed stats are reliable |
| Token Efficiency | Poor — Large tables exceed limits | Efficient — Summary fits within limits |
| Latency | Low — Single LLM call | Higher — Multiple steps required |
Our chosen architecture: A multi-step pipeline where Python code computes statistical summaries (mean, median, outliers, trends), then passes this structured data to the LLM for recommendation generation. This hybrid approach delivered >95% accuracy in insight generation.
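The first half of that pipeline can be sketched in Python. The function and payload names are assumptions, and the LLM call itself is omitted; the point is that deterministic code computes the statistics the model would otherwise get wrong.

```python
import statistics

def summarize_series(name: str, values: list[float]) -> dict:
    """Step 1 (sketch): pre-compute the statistical summary
    instead of sending raw rows to the LLM."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "series": name,
        "mean": round(mean, 2),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        # Simple z-score outlier flag; the threshold is illustrative.
        "outliers": [v for v in values if stdev and abs(v - mean) / stdev > 2],
        "trend": "up" if values[-1] > values[0] else "down_or_flat",
    }

def build_llm_payload(summaries: list[dict], current_type: str) -> str:
    """Step 2 (sketch): the compact summary, not the raw table,
    goes into the recommendation prompt."""
    return (
        f"Current chart type: {current_type}\n"
        f"Statistical summary: {summaries}\n"
        "Recommend up to 3 better chart configurations with rationale."
    )

summary = summarize_series("revenue", [120, 135, 150, 148, 160, 172])
print(summary["trend"])  # up
```

Keeping the summary small also addresses the token-efficiency row in the comparison above: a six-row series compresses to a handful of numbers regardless of table size.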
 
 

Phase 3: LLM Prompt Co-Design

Working with the ML team, I co-created prompts optimized for chart type selection. The key was encoding data visualization best practices into the prompt structure.
Prompt Architecture
"Given a chart with [data structure], current type [X], analyze if a better visualization exists.
Consider: 1) Data relationships, 2) Storytelling intent, 3) Visual clarity.
Return top 3 recommendations with executable chart config and brief rationale."
Chart Type Selection Logic (Encoded in Prompts)
I defined the decision tree that the model uses to recommend chart types:
| User Intent / Data Shape | Recommended Chart | Why This Works |
| --- | --- | --- |
| Time series with trend | Line Chart | Shows change over time; eye follows the trajectory |
| Categorical comparison | Clustered Bar/Column | Easy side-by-side comparison; clear value differences |
| Part-to-whole (<7 categories) | Pie/Donut Chart | Intuitive percentage representation; limited categories |
| Part-to-whole (>7 categories) | Stacked Bar/Area | Handles many categories; shows composition |
| Correlation/distribution | Scatter/Bubble Chart | Reveals relationships; shows outliers clearly |
| Actual vs. Target | Combo Chart | Different visual encoding for different data types |
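The decision tree can be illustrated as code. The `shape` flags below are hypothetical; in production this logic was encoded in the prompt rather than hard-coded, so treat this as a readable restatement of the table, not the shipped implementation.

```python
def recommend_chart_type(shape: dict) -> tuple[str, str]:
    """Illustrative encoding of the chart-type decision tree.
    Returns (chart_type, rationale); keys in `shape` are assumptions."""
    if shape.get("has_time_series"):
        return "line", "Shows change over time; eye follows the trajectory"
    if shape.get("actual_vs_target"):
        return "combo", "Different visual encoding for different data types"
    if shape.get("part_to_whole"):
        if shape.get("category_count", 0) < 7:
            return "pie", "Intuitive percentage representation; limited categories"
        return "stacked_bar", "Handles many categories; shows composition"
    if shape.get("correlation"):
        return "scatter", "Reveals relationships; shows outliers clearly"
    # Default: categorical comparison.
    return "clustered_bar", "Easy side-by-side comparison; clear value differences"

chart, why = recommend_chart_type({"part_to_whole": True, "category_count": 4})
print(chart)  # pie
```

Note that the branches are ordered: a time axis wins over everything else, matching the priority the prompt gives to trend storytelling.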
Prompt Iteration & Quality Tuning
Through iterative testing on 20+ sample datasets across industries (Telecom, Finance, Manufacturing, Retail), we tuned the prompts to:
  1. Maximize accuracy — >95% factually correct recommendations
  2. Improve analytical depth — avoid "obvious" suggestions, focus on insights
  3. Ensure clarity — plain language explanations users understand
  4. Add validation layer — constraint-based prompts + validation before showing users
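A validation layer of the kind described in point 4 might look like this in outline. The field names and constraint set are assumptions, not the shipped implementation; the idea is simply to reject impractical LLM output before it reaches the user.

```python
SUPPORTED_TYPES = {"column", "bar", "line", "scatter", "pie", "combo"}  # MVP set

def validate_recommendation(rec: dict, column_names: set[str]) -> list[str]:
    """Sketch of a post-LLM validation pass. Returns a list of errors;
    an empty list means the recommendation is safe to show."""
    errors = []
    if rec.get("chart_type") not in SUPPORTED_TYPES:
        errors.append(f"unsupported chart type: {rec.get('chart_type')}")
    for field in rec.get("fields", []):
        # Guard against hallucinated column references.
        if field not in column_names:
            errors.append(f"hallucinated field: {field}")
    if rec.get("chart_type") == "pie" and len(rec.get("fields", [])) > 2:
        errors.append("pie charts support one category + one value field")
    return errors

errors = validate_recommendation(
    {"chart_type": "radar", "fields": ["Region", "Sales"]},
    {"Region", "Sales"},
)
print(errors)  # ['unsupported chart type: radar']
```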
 
 

Phase 4: Technical Architecture — Vega-Lite as Interchange Format

A critical design decision: How do we translate LLM recommendations into executable chart changes?
The Pipeline
  1. LLM generates Vega-Lite JSON (or Python code using the Altair library)
  2. Conversion layer translates Vega-Lite → IChartSettings JSON (Excel's native format)
  3. Interactive preview renders in Copilot chat pane using Ivy Web
  4. On apply → Native Office Chart inserted on Excel grid
Why Vega-Lite? It's a declarative grammar for visualization that's both human-readable and machine-parseable. This allows the LLM to generate valid chart specifications without hallucinating invalid code.
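A minimal sketch of that conversion step, using a simplified stand-in for the IChartSettings schema (the real schema is internal to Excel, so the output keys here are assumptions):

```python
# A minimal Vega-Lite spec of the kind the LLM might emit (line chart).
vega_lite_spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "Month", "type": "temporal"},
        "y": {"field": "Revenue", "type": "quantitative"},
    },
}

# Hypothetical mapping from Vega-Lite marks to native chart types.
MARK_TO_EXCEL = {"line": "Line", "bar": "BarClustered", "arc": "Pie"}

def to_chart_settings(spec: dict) -> dict:
    """Sketch of the conversion layer: Vega-Lite → native chart settings.
    Unknown marks raise KeyError, surfacing invalid LLM output early."""
    return {
        "chartType": MARK_TO_EXCEL[spec["mark"]],
        "categoryField": spec["encoding"]["x"]["field"],
        "valueField": spec["encoding"]["y"]["field"],
    }

print(to_chart_settings(vega_lite_spec)["chartType"])  # Line
```

Because Vega-Lite is a closed, declarative vocabulary, a lookup-table conversion like this fails loudly on anything the LLM invents, which is exactly the hallucination guard the pipeline relies on.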
Native Charts vs. AI-Generated Images
Key Design Decision: Generate NATIVE Excel charts, not PNG images. Competitors' AI-generated images look good but can't be tweaked. Our approach maintains editability, data-binding, and refreshability.
 

Phase 5: UX Pattern — The Recommendation Card

I designed a modular 'recommendation card' system that presents AI suggestions without overwhelming users:
| Card Component | Purpose & Rationale |
| --- | --- |
| Visual Preview | Side-by-side before/after thumbnail — lets users see impact instantly |
| Plain Language Title | 'Switch to line chart' not 'Recommendation 1' — clarity over abstraction |
| Rationale Text | 'Line charts show trends over time more clearly' — builds understanding |
| One-Click Apply | Instant transformation — removes friction from adoption |
| Learn More Link | For users who want deeper understanding — educational value |
Critical interaction detail: Recommendations are stateless — users can try one, undo, try another. No commitment required. This reduces anxiety and encourages exploration.
Phase 6: Entry Points & Triggers
Design recommendations needed multiple access points to meet users where they are:
| Phase | Entry Point | Behavior |
| --- | --- | --- |
| Crawl (MVP) | Auto-trigger on insert | Proactive suggestion after chart appears — highest user intent moment |
| Walk | On-chart button | For existing charts, contextual UI affordance (skittle) |
| Run | Chat prompt | 'Make this chart better' in Copilot sidebar |
| Run | Selection detection | Prompts user to select data if none chosen — solves key pain point |
 

Initial direction, design and concepts

 
 
 
 
 

Key Design Decisions & Trade-offs

Native Charts vs. AI-Generated Images
Choice: Generate NATIVE Excel charts, not PNG images
Why: Editable, data-bound, refreshable. Competitors' AI-generated images look good but can't be tweaked.
Top 1 vs. Top 3 Recommendations
Choice: Show 1-3 recommendations, prioritized by confidence
Why: Balance guidance with choice. 1 felt prescriptive, 5+ overwhelmed. 3 was sweet spot.
 
In-Pane Preview vs. Live Preview
Choice: Thumbnail preview in pane, NOT live chart manipulation on hover
Why: Live preview felt janky/overwhelming. Thumbnails gave control without distraction.
 
Explain Rationale vs. Just Apply
Choice: Always show WHY, not just WHAT to change
Why: Builds user understanding over time. Trust through transparency.
 
 

Impact & Results

 

  • 16.5% chart usage among Copilot-enabled users (vs. 8.8% for non-Copilot users)
  • 34.7 avg. chart actions per Copilot user (a high engagement signal)
 
| Metric | Before | After |
| --- | --- | --- |
| Chart deletion rate (same session) | ~40% | <10% |
| Users who kept recommended changes | N/A | >80% |
| Chart feature MAU growth (2.5 years) | Baseline | +180% |
 
 
Technical Success Metrics (Internal Testing)
✅  100% execution success rate: LLM-generated chart code worked every time in controlled tests
✅  9 mins, 172 clicks saved: measured against the manual chart optimization workflow
✅  All common chart types supported: column, bar, line, scatter, pie, combo (full MVP coverage)
User Testing Insights
👨🏽‍🦱 "This is like having a data viz expert sitting next to me." — Power User
👨🏻 "I learned more about charts from these suggestions than from tutorials." — Novice
👩🏼 "Finally! I don't have to guess if my chart is good." — Intermediate User
Key finding: Users didn't just apply recommendations — they LEARNED from them. Over time, they started making better initial choices.
Strategic Impact
  • Proved AI-native differentiation — Excel is the only tool that combines native charts + AI recommendations + editability
  • Unlocked Copilot adoption — Design recommendations were #2 most-used Copilot feature after Insights
  • Elevated Excel's positioning — From 'spreadsheet tool' to 'intelligent design assistant'
  • Foundation for future — Opened door for AI assistance in formatting, tables, and beyond
 
 

Key Learnings: Designing for AI Collaboration

What Worked Exceptionally Well
  • Co-designing prompts with ML team — Designer + data scientist collaboration produced better outcomes than either alone
  • Explaining rationale — Educational approach built trust and improved user skill over time
  • Prioritizing native charts — Maintained editability vs. competitors' static AI outputs
  • Modular card pattern — Flexible system that scaled from 1 to N recommendations
Challenges & How We Solved Them
  • LLM sometimes suggested impractical changes — Constraint-based prompts, validation layer before showing to users
  • Preview thumbnails took too long to generate — Cached common transformations, optimized rendering pipeline
  • Users wanted to compare multiple recommendations side-by-side — Added comparison view in Walk phase (deferred from MVP)
 

The Bigger Lesson for AI Product Design

AI features succeed when they:
  1. Augment, don't automate — User stays in control, AI provides options
  2. Explain the 'why' — Black box AI breeds mistrust
  3. Preserve human creativity — Recommendations, not prescriptions
  4. Build expertise over time — Users learn patterns and become less dependent on AI