
Comparison · Enterprise AI Architecture

Glean Knowledge Graph vs Microsoft Graph: the architectural fork that decides enterprise search at scale

Glean doubled ARR to $200M on the back of a proprietary knowledge graph; Microsoft Copilot leans on the federated Microsoft Graph API. Which architecture wins which buyer? A deep-dive on graph staleness, write paths, lock-in risk, and the cost-per-query math both sides hide.

Chandler Benson

Contributing Writer · Customer Programs

Reviewed by Tommy Tao

11 min · Updated May 27, 2026

Editor's verdict

Glean owns its data flywheel. Microsoft rents you access to one. The choice is rarely about which architecture is better in the abstract. It is about whether you want a vendor to internalize the indexing problem (Glean) or whether you accept federation through APIs you already pay for as part of your M365 footprint (Microsoft). Glean wins on content freshness for its top-tier connectors, on ranking quality, and on connector breadth across the SaaS estate. Microsoft wins on permission freshness, on the cost-per-query math when your data already sits inside the Graph, and on regulatory residency when your tenant is the only place the data should live. The third path, federated MCP across both with no proprietary index in the middle, is where governance-first platforms compete on neither side's terms.

Scorecard

Category	Glean Knowledge Graph	Microsoft Graph API
Architectural posture	Proprietary indexed knowledge graph	Federated API over Microsoft tenant data
Freshness / write-path latency	Seconds-to-minutes for top-tier connectors; up to 24h worst case for long-tail and custom connectors	Live API over SharePoint/Exchange indexes (~15 min typical for SharePoint); ACLs resolved at request time
Cross-SaaS connector breadth	100+ connectors across the SaaS estate	Microsoft estate native + 1,400+ Power Platform connectors
Rate-limit headroom	Internal capacity, less publicly documented	Throttling published per service (on the order of 10K requests/10 min per app)
Lock-in risk	Proprietary index, vendor dependency	Microsoft stack dependency, mitigated by API openness
Cost-per-query elasticity	Amortized over $200M ARR; flat per-seat fee	API calls metered against tenant quota; compute lives in your tenant
Permission model fidelity	Permission-mirroring index, eventually consistent	Live ACL enforcement at source
Governance auditing	Per-query attribution, vendor audit trail	Purview unified audit log, native tenant signal

Build a knowledge graph, or federate one

Glean's bet was made in 2019 and is now compounding. Build a proprietary knowledge graph over the customer's SaaS estate. Index it. Re-rank against it. Sell the graph as the differentiator. Six years later that bet has produced a company that doubled ARR to roughly $200M in 2025 per Futurum Group coverage [1] and shipped a permissions-aware retrieval surface that competitors have spent two years trying to copy.

Microsoft's bet was made four years earlier and is structurally different. The Microsoft Graph API, GA since 2015, exposes every entity inside a Microsoft 365 tenant (users, files, mail, calendar, chat, sites, devices) as a queryable surface [2]. Copilot did not build a parallel index. It calls the Graph live, scopes results by the caller's permissions, and threads the response back to the LLM. The compute lives in the tenant. The index is implicit.

These are not the same product strategy in different paint. They are different theories of where enterprise data should sit when an agent asks for it. The Glean theory says the customer's data is too scattered and too slow to query in place, so a permissioned mirror is the only way to make retrieval fast and ranking good. The Microsoft theory says the data already sits inside a tenant the customer pays Microsoft to host, so there is no reason to copy it — federation is the architectural primitive.

What Glean actually built

Per Glean's published architecture talks and Futurum's deep-dive coverage [1], the Glean Knowledge Graph is a permissioned, multi-tenant inverted index with a vector layer alongside, fed by 100+ connectors that pull from Salesforce, Slack, Confluence, Jira, ServiceNow, Box, Drive, GitHub, Notion, and the rest of the SaaS estate the customer authorizes. The index stores permission tuples per document — "this file is visible to these users at this moment" — and the connector refresh job updates both content and ACLs on a configurable cadence.
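To make "permission tuples per document" concrete, here is a minimal sketch of an inverted index that stores an allowed-principal set next to each document and filters postings at query time. It is an illustration under stated assumptions, not Glean's implementation. The class and method names (`PermissionedIndex`, `upsert`, `search`) are hypothetical, tokenization is naive whitespace splitting, and there is no vector layer or ranking.

```python
from collections import defaultdict


class PermissionedIndex:
    """Hypothetical inverted index carrying a permission tuple per document."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> doc ids
        self.acl: dict[str, frozenset[str]] = {}  # doc id -> allowed principals
        self.text: dict[str, str] = {}            # doc id -> indexed text

    def upsert(self, doc_id: str, text: str, allowed: set[str]) -> None:
        # A connector refresh rewrites both the content and the ACL snapshot.
        self._remove_postings(doc_id)
        self.text[doc_id] = text
        self.acl[doc_id] = frozenset(allowed)
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def _remove_postings(self, doc_id: str) -> None:
        old = self.text.get(doc_id)
        if old is None:
            return
        for term in old.lower().split():
            self.postings[term].discard(doc_id)

    def search(self, query: str, principals: set[str]) -> list[str]:
        # Intersect the postings for every query term, then drop documents
        # whose mirrored ACL shares no principal with the caller.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(d for d in hits if self.acl[d] & principals)


idx = PermissionedIndex()
idx.upsert("doc-1", "Q3 pipeline review", {"alice", "sales-team"})
idx.upsert("doc-2", "Q3 hiring plan", {"hr-team"})
print(idx.search("q3", {"bob", "sales-team"}))  # ['doc-1']
```

The key property is that the ACL the filter checks is a copy made at refresh time. That copy is the source of the eventual-consistency trade-off discussed below.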

The flywheel is the part that matters. Every Glean customer adds query telemetry, click signals, and dwell-time data to a shared ranking model that no single customer could train on their own. The ranking gets better as the customer base grows, even before any individual customer's data changes. That is what "data flywheel" means in this category, and it is the structural reason Glean's per-seat price holds across cycles — the customer is buying the ranking model, not the index.

Trade-offs sit on the same shelf. The index is eventually consistent. The refresh window for a permission change in Salesforce is on the order of seconds-to-minutes per Glean's docs, which is fast for a permissioned mirror but not the live ACL enforcement you get when the source system answers the query directly. The vendor owns the ranking signal, so a customer who churns off Glean loses six to eighteen months of ranking tuning the moment the contract ends.

What Microsoft Graph actually is (and what Copilot does with it)

Microsoft Graph is a REST API surface, not a search product. The endpoints expose Office 365 entities through a unified namespace: `/me/messages`, `/users/{id}/drive/root/children`, `/sites/{id}/lists`, plus search endpoints like `/search/query` that wrap the underlying SharePoint and Exchange indexing layers [2]. The compute that runs those endpoints lives in the customer tenant. The data does not leave the tenant. Permissions resolve against Entra ID at request time.
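A Microsoft Graph search call is a single POST to `/search/query` with a JSON body that lists entity types and a KQL query string. The sketch below uses only the standard library to build that request; it does not send it. The `build_search_request` name, the `size` default, and the token placeholder are assumptions for illustration. Acquiring a delegated Entra ID token (for example with MSAL) is out of scope.

```python
import json
import urllib.request
from typing import Optional

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query: str, access_token: str,
                         entity_types: Optional[list[str]] = None,
                         size: int = 25) -> urllib.request.Request:
    """Build (but do not send) a Microsoft Graph search request."""
    body = {
        "requests": [{
            "entityTypes": entity_types or ["driveItem"],
            "query": {"queryString": query},  # KQL is accepted here
            "from": 0,
            "size": size,
        }]
    }
    return urllib.request.Request(
        GRAPH_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # A delegated token: Graph trims results to what this caller can see.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is `urllib.request.urlopen(req)` once `access_token` is a real delegated token. The sketch carries the caller's token rather than an app-only one because the permission trimming described above happens server-side, against that identity.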

Microsoft 365 Copilot rides on top of that. A Copilot query becomes a sequence of Graph calls scoped by the caller's permissions, plus a call to Azure OpenAI to draft an answer grounded in the retrieved Graph results. The retrieval scaffolding (semantic kernel, the orchestration layer, the rerankers) is Microsoft's; the data plane is the customer's tenant. There is no proprietary index sitting between them, which is why "Copilot indexing" is a different shape of work than "Glean indexing." Copilot relies on the SharePoint search index and the Exchange index that the customer already pays for as part of M365.
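The retrieve-then-ground flow can be sketched as a function with two injected dependencies. One runs a permission-scoped Graph query on the caller's behalf; the other drafts an answer from the retrieved snippets. This is a hypothetical shape, not Microsoft's orchestration code. `GraphHit`, `answer_with_grounding`, and the prompt format are assumptions, and real orchestration adds query rewriting, reranking, and caching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GraphHit:
    source: str   # e.g. a driveItem or message id
    snippet: str  # text the search endpoint returned for this hit


def answer_with_grounding(
    question: str,
    search_as_caller: Callable[[str], list[GraphHit]],
    draft: Callable[[str], str],
    max_hits: int = 5,
) -> tuple[str, list[str]]:
    """Retrieve with the caller's permissions, then ground the draft in the hits."""
    # 1. Retrieval runs under the caller's identity, so documents the caller
    #    cannot open never reach the prompt.
    hits = search_as_caller(question)[:max_hits]
    if not hits:
        return "No accessible sources found.", []
    # 2. Build a grounded prompt that numbers each hit for citation.
    context = "\n".join(f"[{i}] {h.snippet}" for i, h in enumerate(hits, 1))
    prompt = (
        "Answer using only the sources below and cite them by number.\n"
        f"{context}\n\nQuestion: {question}"
    )
    # 3. Hand the prompt to the model; return the sources for attribution.
    return draft(prompt), [h.source for h in hits]
```

Injecting `search_as_caller` keeps the permission boundary outside the orchestration function. That mirrors the split described above: the scaffolding is the vendor's, and the data plane answers as the user.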

The strength is structural. No data residency surprise. No vendor index to invalidate when an ACL changes. No license-tier mismatch between the search seat and the storage seat. The weakness is structural too: throttling. Microsoft Graph publishes its rate limits per service. They vary, but many sit on the order of 10,000 requests per 10 minutes per app, scoped per tenant or per mailbox depending on the service, with HTTP 429 responses when you exceed them [3]. A Copilot rollout at 5,000 active seats hits those limits faster than anyone expects.
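One mitigation the platform team owns is client-side budgeting. Keep a sliding window of recent request timestamps and hold calls once the window is full, rather than discovering the limit through 429s. This is a minimal sketch under three assumptions:

- a single process;
- one quota of 10,000 requests per 600 seconds, matching the figure above (real quotas vary by service and scope);
- an injectable clock so the sketch can be tested.

`SlidingWindowBudget` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowBudget:
    """Client-side limiter: at most `limit` requests in any `window_s` seconds."""

    def __init__(self, limit: int = 10_000, window_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps still inside the window

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the budget."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

    def seconds_until_free(self) -> float:
        """How long to wait before the oldest slot in the window frees up."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            return 0.0
        return self.window_s - (now - self.sent[0])
```

On `False`, a caller would wait `seconds_until_free()` or serve the request from cache. A token bucket would use constant memory but permits bursts. A sliding window is the closer match to an "N requests per 10 minutes" quota.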

Freshness and write-path latency

The freshness question is where the two architectures genuinely differ. A Microsoft Graph query returns whatever the SharePoint index and Exchange index hold at request time. The SharePoint indexer typically processes updates within 15 minutes [7]; the Exchange indexer is faster [2]. For a Copilot query that asks "what did Maria send me this morning," the lag is operationally invisible: Exchange answers in seconds.

Glean's freshness depends on the connector. The Salesforce connector ships near-real-time webhook support; the Confluence connector runs scheduled pulls; the long-tail connectors run batch refresh on cadences ranging from hourly to overnight. Glean's own engineering blog has been candid that the worst-case freshness window for a custom connector can be 24 hours, with the median in single-digit minutes for the top-tier connectors.

For permission changes specifically, the asymmetry inverts. Glean's permission-mirror refresh has a non-zero window between the moment an ACL changes in Salesforce and the moment that change reflects in the Glean index. During that window, a user who lost access in the source system can still surface the document through Glean. Microsoft Graph does not have that window — the ACL resolves against Entra ID at request time, so permission changes are effectively immediate. That is the cleanest single argument for the federated architecture in regulated environments.
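The permission window is easiest to see in a toy model. The model pairs a mirror that snapshots a source ACL on each refresh with a source that is checked live. This sketch is a hypothetical simulation with made-up names (`SourceSystem`, `PermissionMirror`) and a fixed refresh interval. It does not model Glean's webhooks or Entra ID token lifetimes, only the gap between an ACL change and the next mirror refresh.

```python
from dataclasses import dataclass, field


@dataclass
class SourceSystem:
    """Stands in for the system of record (e.g. Salesforce); ACLs are live."""
    acl: dict[str, set[str]] = field(default_factory=dict)

    def can_read(self, user: str, doc: str) -> bool:
        return user in self.acl.get(doc, set())


@dataclass
class PermissionMirror:
    """Copies source ACLs every `refresh_every` seconds (eventually consistent)."""
    source: SourceSystem
    refresh_every: float
    snapshot: dict[str, set[str]] = field(default_factory=dict)
    last_refresh: float = float("-inf")

    def maybe_refresh(self, now: float) -> None:
        if now - self.last_refresh >= self.refresh_every:
            # Copy each set so later source edits do not leak into the snapshot.
            self.snapshot = {d: set(u) for d, u in self.source.acl.items()}
            self.last_refresh = now

    def can_read(self, user: str, doc: str, now: float) -> bool:
        self.maybe_refresh(now)
        return user in self.snapshot.get(doc, set())


src = SourceSystem({"deal-42": {"alice", "bob"}})
mirror = PermissionMirror(src, refresh_every=300.0)
print(mirror.can_read("bob", "deal-42", now=0.0))    # True: first refresh
src.acl["deal-42"].discard("bob")                    # access revoked at the source
print(src.can_read("bob", "deal-42"))                # False: live check sees it
print(mirror.can_read("bob", "deal-42", now=120.0))  # True: inside the window
print(mirror.can_read("bob", "deal-42", now=300.0))  # False: next refresh
```

The third `print` is the compliance problem in one line: the source already says no, while the mirror still says yes.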

Throttling and scale: where the cost-per-query lives

Microsoft Graph publishes its throttling thresholds. The numbers vary by endpoint, but the pattern is consistent: each application gets a per-tenant quota, breaches return HTTP 429 with a Retry-After header, and sustained breach triggers backoff windows of minutes [3]. For Copilot specifically, Microsoft has tuned the orchestration layer to batch and cache Graph calls — but at scale (5,000+ active Copilot seats sending mixed queries) you will hit the throttle. The architectural response is to pre-warm caches, batch reads, and accept that some queries take seconds longer than the demo suggested.
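The 429-plus-Retry-After contract is simple to honor in client code. This minimal sketch assumes the caller supplies a `send` function that returns a status code and a headers mapping. That interface is hypothetical, chosen so the example runs without network access. The sketch prefers the server's Retry-After value and falls back to capped exponential backoff when the header is missing. Only the delta-seconds form of Retry-After is parsed; anything else falls through to the backoff.

```python
import time
from typing import Callable, Mapping

Response = tuple[int, Mapping[str, str]]  # (status code, headers)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on HTTP 429, honoring Retry-After when the server provides it."""
    for attempt in range(max_attempts):
        status, headers = send()
        # Return anything that is not a throttle, and give up on the last try.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        # Plain dict lookup is case-sensitive; real HTTP header objects are not.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server-specified wait in seconds
        else:
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)  # fallback
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable and lets an async caller swap in its own wait. The cap on the fallback delay stops a long throttle from producing multi-minute sleeps on its own.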

Glean's internal capacity is less publicly documented. The product does not publish per-tenant rate limits in the way Microsoft does, because the index is Glean's to scale. That is a product strength — the customer does not own the capacity-planning problem — and an opacity risk. If Glean has a bad quarter on infrastructure spend, the response is on Glean's side of the SLA. If Microsoft Graph has a throttling incident, the response is on the tenant's side of the SLA, with public dashboards and known mitigation patterns.

Cost-per-query math falls out of this asymmetry. Glean amortizes its indexing and serving cost across roughly $200M of ARR [1], which is how the per-seat price stays flat as the customer's query volume grows. Microsoft Graph's serving cost lives inside the customer's M365 license, because the customer already pays for SharePoint search and Exchange search as part of E3/E5. The marginal cost of a Copilot query is therefore the Azure OpenAI inference behind it, which the customer pays for through the flat $30/seat/month Copilot add-on [4], not the Graph read.

Lock-in: an eight-dimension matrix

Lock-in is the procurement word for "how expensive is exit." Both architectures lock the buyer in. The shapes of the lock-in differ enough that some procurement teams pick one over the other on this alone.

Lock-in posture on eight dimensions, public-source-only

Dimension	Glean Knowledge Graph	Microsoft Graph (via Copilot)
Data residency at rest	Glean cloud (multi-tenant SaaS); VPC option for enterprise	Customer M365 tenant; no copy created
Index portability on exit	Index is proprietary; ranking signal is not exportable	No vendor index to migrate; data stays in tenant
Identity binding	Glean SSO + SCIM; permission mirror layered on Entra/Okta	Entra ID native; no parallel identity surface
Connector portability	Glean-built connectors; custom connectors are tenant-owned	Power Platform connectors are Microsoft-owned; custom connectors via Connectors SDK
Ranking model portability	Cross-tenant ranking; signal does not export	SharePoint search ranking; tenant-owned indexing
Pricing escalator exposure	Per-seat with reported 7-12% renewal escalator	E3/E5 base + $30 Copilot add-on; July 2026 base rises to $39/$60 (about 8% and 5%)
Cloud dependency	AWS-hosted multi-tenant; private VPC for enterprise	Azure-hosted by definition; cross-cloud is not the design
MCP ecosystem fit	MCP support GA; closed-connector center of gravity	MCP GA in Copilot Studio May 2025; Graph remains the primary federation surface

Read that matrix once for what is symmetric — both vendors lock you in on identity, on connector investment, on pricing escalator timing. Read it again for what is not. Glean locks you into a multi-tenant SaaS where the proprietary index is the moat. Microsoft locks you into Azure, the M365 license stack, and the assumption that your data lives where Microsoft already keeps it. If your procurement guardrails forbid one of those postures, the architectural choice is decided before the eval starts.

Cost-per-query, with the math

Reduce both architectures to a unit cost and the comparison sharpens. A 1,500-seat enterprise running moderate query volume — say 40 queries per active seat per month, 600 active seats — generates 24,000 queries per month, or roughly 290K per year.

On Glean, the cost basis is the per-seat fee. Third-party reporting puts Glean's seat price near $45-50/user/month, with a Work AI add-on near $15/user/month that applies to agentic queries [5]. At 1,500 seats × $45 × 12 months, that is $810K of base license. Divide by 290K queries and the unit cost is roughly $2.79 per query, but only at moderate volume. Push to 100 queries per active seat per month (720K queries per year) and the unit cost falls to roughly $1.13, well under half. That is the per-seat-pricing magic: light users subsidize heavy users, and Glean does not care because the price is flat.
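The Glean arithmetic, written out so readers can swap in their own seat count, price, and volume. The inputs are the article's scenario: 1,500 licensed seats, 600 active, $45/seat/month. The function name is ours. The Work AI add-on is left out, matching the base-license calculation above. With the exact 288,000 annual queries the unit cost is $2.8125; the $2.79 above comes from rounding volume up to 290K.

```python
def per_query_cost(annual_license: float, active_seats: int,
                   queries_per_active_seat_month: int) -> float:
    """Annual license spend divided by annual query volume."""
    annual_queries = active_seats * queries_per_active_seat_month * 12
    return annual_license / annual_queries


glean_base = 1_500 * 45 * 12  # all licensed seats pay, active or not
print(glean_base)                                # 810000
print(per_query_cost(glean_base, 600, 40))       # 2.8125
print(per_query_cost(glean_base, 600, 100))      # 1.125
```

Volume sits in the denominator, so 2.5× the queries cuts the unit cost to 40 percent of the moderate-volume figure.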

On Microsoft 365 Copilot, the cost basis is the $30/seat/month Copilot add-on plus the E3 or E5 base the seat already carries [4]. Counting only the Copilot delta — the marginal cost of enabling Graph + Azure OpenAI orchestration — that is $30 × 1,500 × 12 = $540K. Same query volume gives roughly $1.86 per query. Lower than Glean. But the math hides the E3/E5 base ($648K-$1.03M depending on tier), without which Copilot does not function. Loaded all-in cost-per-query is in the $4-6 band, and the customer is paying for SharePoint search and Exchange search whether Copilot uses them or not.
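The Copilot side of the same comparison, as a sketch. Prices are the pre-July-2026 list prices from [4]. The function name, and the choice to charge all 1,500 licensed seats against the 600 active users' queries (the same convention as the Glean math), are our assumptions. At exactly 288,000 queries, the add-on alone works out to $1.875 per query; the $1.86 above uses 290K. The loaded figures land at about $4.13 (E3) and $5.44 (E5), inside the $4-6 band.

```python
from typing import Optional

COPILOT_ADDON = 30                  # $/user/month [4]
BASE_PRICE = {"E3": 36, "E5": 57}   # $/user/month before July 1, 2026 [4]


def copilot_cost_per_query(seats: int, active_seats: int,
                           queries_per_active_seat_month: int,
                           tier: Optional[str] = None) -> float:
    """Copilot add-on cost per query; include the E3/E5 base when tier is set."""
    monthly_price = COPILOT_ADDON + (BASE_PRICE[tier] if tier else 0)
    annual_spend = seats * monthly_price * 12
    annual_queries = active_seats * queries_per_active_seat_month * 12
    return annual_spend / annual_queries


print(copilot_cost_per_query(1_500, 600, 40))        # 1.875  (add-on only)
print(copilot_cost_per_query(1_500, 600, 40, "E3"))  # 4.125  (E3 loaded)
print(copilot_cost_per_query(1_500, 600, 40, "E5"))  # 5.4375 (E5 loaded)
```

Updating `BASE_PRICE` to the July 2026 figures ($39/$60) raises both loaded numbers. The add-on-only figure stays the same.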

Forrester's TEI report on Glean's work-AI platform [6] models a different scenario, with a composite organization that lands a three-year ROI of 304% — but the underlying per-query cost is not published. Procurement teams who want the math have to build it themselves, on their volume, with their seat mix. The headline number is rarely the decision input.

When each architecture wins

Architecture-level recommendations sit underneath product-level recommendations. We cover the feature-by-feature product comparison in our Glean vs Microsoft Copilot piece. The architecture comparison is the question you answer first, because it narrows the product list.

Decision framework for the architectural fork

If your situation is…	Architectural starting point
Knowledge fragmented across 30+ SaaS tools, M365 is one of many	Indexed knowledge graph (Glean)
Workforce lives in Teams/Outlook/SharePoint, M365 is the spine	Federated graph (Microsoft Copilot)
Regulated industry, no data may leave tenant	Federated graph (Microsoft Copilot)
AWS-first stack, Microsoft is a minor footprint	Indexed knowledge graph (Glean) or third path
Query freshness on permission changes is a compliance requirement	Federated graph (Microsoft Copilot)
Cross-tenant analytics or multi-org search is the use case	Indexed knowledge graph (Glean)
MCP-first architecture, multi-LLM routing mandated	Third path (governance-first MCP platform)
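Teams that encode procurement guardrails as policy-as-code sometimes turn a table like this into an ordered rule list. The sketch below is hypothetical: the flag names are ours. The ordering (hard compliance constraints first, then MCP mandates, then estate shape) is an assumption the table itself does not state. When several rows match, a real evaluation weighs them rather than taking the first hit.

```python
from dataclasses import dataclass

INDEXED = "Indexed knowledge graph (Glean)"
FEDERATED = "Federated graph (Microsoft Copilot)"
THIRD_PATH = "Third path (governance-first MCP platform)"


@dataclass
class Situation:
    saas_tool_count: int = 0
    m365_is_spine: bool = False
    data_must_stay_in_tenant: bool = False
    aws_first: bool = False
    acl_freshness_is_compliance: bool = False
    cross_tenant_search: bool = False
    mcp_first_multi_llm: bool = False


# Ordered (predicate, recommendation) pairs; the first match wins.
RULES = [
    (lambda s: s.data_must_stay_in_tenant, FEDERATED),
    (lambda s: s.acl_freshness_is_compliance, FEDERATED),
    (lambda s: s.mcp_first_multi_llm, THIRD_PATH),
    (lambda s: s.cross_tenant_search, INDEXED),
    (lambda s: s.aws_first, f"{INDEXED} or third path"),
    (lambda s: s.saas_tool_count >= 30 and not s.m365_is_spine, INDEXED),
    (lambda s: s.m365_is_spine, FEDERATED),
]


def starting_point(s: Situation) -> str:
    """Return the architectural starting point for the first matching rule."""
    for predicate, recommendation in RULES:
        if predicate(s):
            return recommendation
    return "No single row applies; run the full evaluation"
```

Putting the residency and ACL-freshness rules first reflects the point made earlier: a guardrail that forbids one posture decides the fork before the eval starts.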

A third path: federate without owning the index

Disclosure: Explore Agentic is published by ASCENDING, which builds Jarvis AI. The next paragraph is going to describe how a governance-first MCP platform sits between the two architectural bets above. Read it as that — an architectural option some buyers consider, not an attempt to relitigate the Glean-vs-Microsoft choice for everyone.

Both Glean and Microsoft assume a single primary graph. Glean builds one. Microsoft federates over the one already inside the customer's tenant. Neither posture composes well when the customer's data lives across AWS, Azure, and SaaS: three primary surfaces governed by three different identity systems. The MCP-first architectural option [9] is to skip the proprietary index entirely and let an MCP gateway federate retrieval across whichever sources answer best per query. Jarvis Registry [8] runs that pattern. Connector calls go through MCP servers, the gateway logs every retrieval and tool call, and the LLM routing layer chooses between OpenAI, Anthropic, and Bedrock per query rather than per vendor agreement.
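Here is a minimal sketch of the gateway pattern. It assumes each MCP server is reachable through a transport callable that takes a JSON-RPC message and returns the parsed response. Real MCP clients speak stdio or Streamable HTTP and perform an `initialize` handshake first; both are omitted here. The message shape follows MCP's `tools/call` method. The server names, the `search` tool, and the `AuditedGateway` class are hypothetical, and this is not Jarvis Registry's code.

```python
import itertools
import time
from typing import Any, Callable

Transport = Callable[[dict[str, Any]], dict[str, Any]]


class AuditedGateway:
    """Fans one query out to several MCP servers and logs every tool call."""

    def __init__(self, servers: dict[str, Transport]) -> None:
        self.servers = servers
        self.audit_log: list[dict[str, Any]] = []
        self._ids = itertools.count(1)  # JSON-RPC request ids

    def call_tool(self, server: str, tool: str, arguments: dict[str, Any],
                  user: str) -> dict[str, Any]:
        request = {
            "jsonrpc": "2.0",
            "id": next(self._ids),
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
        response = self.servers[server](request)
        # Record who asked what, where, and whether the call failed.
        failed = ("error" in response
                  or bool(response.get("result", {}).get("isError", False)))
        self.audit_log.append({
            "ts": time.time(), "user": user, "server": server,
            "tool": tool, "arguments": arguments, "error": failed,
        })
        return response

    def federated_search(self, query: str, user: str) -> dict[str, Any]:
        """Call every registered server's `search` tool; keep per-server results."""
        return {
            name: self.call_tool(name, "search", {"query": query}, user)
            for name in self.servers
        }
```

Routing the drafted answer to OpenAI, Anthropic, or Bedrock would happen after `federated_search`. It is left out so the sketch stays focused on the audit trail, which is the control-plane property discussed next.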

The trade-off is honest. You do not get Glean's cross-tenant ranking model — that flywheel only spins when one vendor sees query telemetry across many customers. You do not get Microsoft's tenant-native freshness — federation through MCP adds a connector hop that the Graph API does not. What you get is a control plane the procurement team can audit on its own terms, with MCP as the open protocol underneath and the LLM as a swappable backend. For organizations whose architectural risk is single-vendor lock-in more than retrieval quality, that trade is the reason the third path exists.

Frequently asked

Is Glean's Knowledge Graph actually a graph database?

Practically no, marketing yes. Glean's architecture talks describe an inverted index with a vector layer and permission tuples per document; the "graph" framing emphasizes the entity-and-relationship surface (people, documents, projects, channels) the product reasons over, not an underlying property-graph database like Neo4j. The product behavior is closer to a permissioned search index with entity-resolution on top than to a classical knowledge-graph product.

Does Microsoft Copilot have anything equivalent to Glean's data flywheel?

Not in the cross-tenant sense Glean does. Microsoft Copilot improves through tenant-local signal (your clicks, your dwell time) but does not share ranking signal across tenants by design. That is a regulatory choice as much as a product choice — cross-tenant signal sharing in M365 would be a major data-protection event. Glean's multi-tenant ranking model is the structural feature Microsoft's federated architecture cannot copy without crossing customer boundaries.

What happens to my data when I leave Glean?

The connector configuration is yours and exportable. The indexed content is reconstructable from your source systems (the source of truth never moved). The ranking signal (six to eighteen months of query telemetry, click-through, and tuning) is Glean's and does not export. That last item is the exit cost most procurement teams underestimate, and is the structural reason vendors with cross-tenant ranking models hold pricing power across renewal cycles.

Does Microsoft Graph throttling actually bite Copilot deployments?

Yes, at scale. Microsoft has tuned the Copilot orchestration layer to batch and cache Graph calls precisely because the published per-resource throttle thresholds [3] would otherwise be hit by mid-sized deployments. For a 5,000-seat Copilot rollout with mixed query patterns, the operational response is cache warming, query batching, and accepting that some long-tail queries take seconds longer than the demo. The throttle is rarely a deployment blocker; it is usually a tuning exercise the platform team owns.

Can I run Glean and Microsoft Copilot side by side?

Many enterprises do. The pattern is typically Microsoft Copilot for in-Teams and in-Outlook surfaces where M365 is the spine, and Glean for cross-SaaS search where the M365 estate is one of many. The two do not conflict architecturally — Glean indexes a permissioned mirror, Copilot federates over the tenant — but the seat-license math gets uncomfortable above 1,000 active seats, which is where the third-path MCP-federation option starts getting evaluated as a consolidation play.

References

Sources & citations

Each [n] above points here. URLs go to the publisher's canonical page. The access date is the day we last opened the link and confirmed the cited claim was still on the page. If a source has rotted, file a correction at /about#corrections.

[1]
Futurum Group. Glean Doubles ARR to $200M on Knowledge Graph Bet
https://futurumgroup.com/insights/glean-doubles-arr-to-200m-can-its-knowledge-graph-beat-copilot/ · accessed 2026-05-27

Q1 2026 Futurum coverage of Glean's $200M ARR milestone and knowledge-graph architecture explainer.
[2]
Microsoft Learn. Overview of Microsoft Graph
https://learn.microsoft.com/en-us/graph/overview · accessed 2026-05-27

Canonical Microsoft Graph API documentation; endpoint surface and indexing model.
[3]
Microsoft Learn. Microsoft Graph throttling guidance
https://learn.microsoft.com/en-us/graph/throttling · accessed 2026-05-27

Per-service rate limit documentation; many limits run on the order of 10K requests per 10 minutes per app, scoped per tenant or per mailbox depending on the service.
[4]
Microsoft. Microsoft 365 Copilot Plans and Pricing
https://www.microsoft.com/en-us/microsoft-365-copilot/pricing · accessed 2026-05-27

$30/user/month Copilot add-on; E3 base $36 rising to $39 on July 1, 2026; E5 base $57 rising to $60.
[5]
Workativ. Glean Pricing: Costs, Hidden Fees & TCO 2026
https://workativ.com/ai-agent/blog/glean-pricing · accessed 2026-05-27

Third-party Glean pricing analysis: ~$45-50/user/month base, $15/user/month Work AI add-on.
[6]
Forrester Research. The Total Economic Impact of Glean's Work AI Platform
https://tei.forrester.com/go/Glean/workAIplatform/ · accessed 2026-05-27

Forrester TEI commissioned by Glean; 304% three-year ROI on the modeled composite organization.
[7]
Microsoft Learn. SharePoint search architecture
https://learn.microsoft.com/en-us/sharepoint/search/search-architecture-overview · accessed 2026-05-27

SharePoint search index documentation; ~15 minute typical indexing latency for content updates.
[8]
AWS Marketplace. Jarvis: Simplifying AI Adoption (ASCENDING Inc.)
https://aws.amazon.com/marketplace/pp/prodview-ckf77lbx67sx2 · accessed 2026-05-27

Jarvis Registry listing on AWS Marketplace. Starter $1,500/month, Pro $2,500/month, Custom Enterprise.
[9]
Anthropic. Introducing the Model Context Protocol
https://www.anthropic.com/news/model-context-protocol · accessed 2026-05-27

November 2024 MCP launch; protocol underlying the federation-without-proprietary-index architectural option.

You may also want

Comparison

Glean pricing in 2026 and the cheaper paths to the same outcome

The pricing teardown underneath the architectural choice. Per-seat math, hidden fees, TCO bands.


Comparison

Glean vs Microsoft Copilot

The product-layer comparison sitting on top of the architectural fork.


Insight

Glean Skills adaptive reasoning: an audit

Companion audit of Glean's Skills product, which sits on top of the knowledge graph layer.


Agent Registry

Agent orchestration layer

Where MCP-based federation sits architecturally relative to proprietary-graph and tenant-graph options.
