    Strategy 11 min

    One Assistant, Many Sources: Why "Index Everything" Is the Wrong First Move

    Every enterprise knowledge project starts with the same sentence: we want one assistant that knows everything. The vendor who says yes first is the wrong vendor.

    Sachin Shah

    CRO, Certainly · June 28, 2026

    Editorial diagram of a federated enterprise knowledge assistant

    A scene from last quarter

    A CIO at a logistics firm I had been talking to for six months sent me a single line at midnight. "We just paused the project. The legal team found six documents in the index that nobody should have been able to see." The platform they had piloted, not ours, had done what every vendor's first slide promises. It had indexed everything. The permission model in SharePoint, Salesforce and the legal share had been flattened on the way in. Nobody noticed until a curious engineer asked the assistant a question and got an answer they should never have been able to get.

    Six months of work paused in a single afternoon. The failure was not the model. It was the architecture. And the architecture was chosen on the first slide, before anyone in legal or security was in the room.

    What the executive is actually asking for

    Strip the request down. "One assistant that knows everything" means three things at once. Employees stop hunting across five systems. Each employee only sees what they are allowed to see. The assistant tells the truth, with sources, and refuses when it does not know.

    Three constraints make this harder than the first slide pretends. Every source has a different permissions model. The data is rarely clean. And the most useful knowledge is the most sensitive. Deal notes. HR records. Contracts. Board materials. The failure mode of a bad implementation is a data leak. Not a hallucination. A leak.

    Once you frame the request that way, the answer rules out half the vendor pitches before they begin.

    The wrong pattern

    I have watched enterprises lose six months to the same architecture. Copy every document into a new vector index. Put a chat interface on top. Demo it to the board in four weeks. Pause the project in twelve.

    Three failure modes, every time.

    Permissions collapse. A new index has a single access model. Rebuilding the original permission tree across SharePoint, Salesforce, the wiki and the file share inside a fresh system is a multi-year project, and it is wrong the moment a permission changes upstream. And permissions change every day.

    Stale data. The index drifts the moment a document changes. Nightly re-indexing is expensive and still leaves an eight-hour window of wrong answers. Hourly is more expensive and still leaves one.

    Audit gap. When a regulator or your own legal team asks "why did the assistant tell this employee that," tracing the answer to the original source becomes forensic work. You can do it. You will not enjoy doing it.

    If a vendor's first move is "let us index everything," they are solving for their demo, not your permission model. Walk out.

    The right pattern

    Leave the data where it lives. Query each source at the moment the question is asked. Use the employee's identity, not a service account. Cite every answer. Log every retrieval.

    Four components.

    Per-source connectors. Each system has a small connector that knows how to authenticate as the requesting user, query, and return ranked snippets with citations. The connector respects whatever permissions the source already enforces. If the user cannot read the document in SharePoint, the connector cannot retrieve it. The assistant cannot see it either. The permissions live where they already lived.
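
    A minimal sketch of the connector contract, assuming a toy in-memory source. The names Snippet, Connector and InMemoryConnector, the user tokens and the sample documents are all made up for illustration. A real connector would call the source's own API with the user's credentials and let the source do the filtering.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Snippet:
    text: str
    source: str    # which system the snippet came from
    citation: str  # pointer back to the original document


class Connector(Protocol):
    """Hypothetical contract: every search carries the requesting user's identity."""

    def search(self, user_token: str, question: str) -> list[Snippet]: ...


class InMemoryConnector:
    """Toy stand-in for a real source that enforces its own access rules."""

    def __init__(self, name: str, docs: dict[str, tuple[str, set[str]]]):
        # docs maps doc_id -> (text, set of user tokens allowed to read it)
        self.name = name
        self._docs = docs

    def search(self, user_token: str, question: str) -> list[Snippet]:
        words = set(question.lower().split())
        hits = []
        for doc_id, (text, readers) in self._docs.items():
            if user_token not in readers:
                continue  # the source says no, so nothing is returned
            if words & set(text.lower().split()):
                hits.append(Snippet(text, self.name, f"{self.name}://{doc_id}"))
        return hits


# Made-up sample data: bob can read the leave policy but not the salary bands.
hr = InMemoryConnector("hr", {
    "leave-policy": ("parental leave policy text", {"alice", "bob"}),
    "salary-bands": ("salary bands text", {"alice"}),
})
alice_hits = hr.search("alice", "salary bands")
bob_hits = hr.search("bob", "salary bands")
```

    The point of the shape is that there is no code path that searches without a user token. The assistant cannot see more than the person asking.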

    A routing layer. When the employee asks a question, a router decides which sources to query. "What is our parental leave policy" goes to HR. "What did we agree with Acme on pricing" goes to the CRM and the contract store. A modern router uses the question, the user's role and the conversation context to decide. Sources the user has no rights to are not queried at all.
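
    A rough sketch of the routing rule, assuming keyword matching as a stand-in for whatever a production router does with the question, the role and the conversation context. The Source type, the source names, the topics and the roles are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    topics: frozenset[str]         # keywords this source is good for (toy heuristic)
    allowed_roles: frozenset[str]  # roles with any rights on the source


def route(question: str, user_roles: set[str], sources: list[Source]) -> list[str]:
    """Return the sources to query: relevant to the question AND reachable by the user."""
    words = set(question.lower().replace("?", "").split())
    selected = []
    for source in sources:
        if not (user_roles & source.allowed_roles):
            continue  # no rights on the source: it is never queried
        if words & source.topics:
            selected.append(source.name)
    return selected


SOURCES = [
    Source("hr_kb", frozenset({"leave", "policy", "benefits"}), frozenset({"employee"})),
    Source("crm", frozenset({"pricing", "acme", "deal"}), frozenset({"sales"})),
    Source("contracts", frozenset({"pricing", "contract", "agree"}), frozenset({"sales", "legal"})),
]

hr_route = route("What is our parental leave policy?", {"employee"}, SOURCES)
sales_route = route("What did we agree with Acme on pricing", {"employee", "sales"}, SOURCES)
blocked_route = route("What did we agree with Acme on pricing", {"employee"}, SOURCES)
```

    The rights check runs before the relevance check. An employee without a sales role asking about Acme pricing gets an empty route, not a denied query.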

    Grounded synthesis. The model receives the retrieved snippets and the question. It is constrained to answer only from the snippets, with citations. If retrieval returns nothing relevant, the assistant says so. This is the rule that turns hallucination into a tail risk.
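
    One way to sketch the grounding rule, assuming the model is reached through a caller-supplied function. Here call_model, the prompt wording and the refusal text are placeholders, not any particular vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "I do not know. No source I can reach on your behalf covers this."


@dataclass(frozen=True)
class Snippet:
    text: str
    citation: str


def build_grounded_prompt(question: str, snippets: list[Snippet]) -> Optional[str]:
    """Return a prompt restricted to the snippets, or None if there is nothing to ground on."""
    if not snippets:
        return None
    numbered = "\n".join(
        f"[{i}] {s.text} (source: {s.citation})"
        for i, s in enumerate(snippets, start=1)
    )
    return (
        "Answer using ONLY the numbered snippets below. "
        "Cite snippet numbers for every claim. "
        f"If the snippets do not answer the question, reply exactly: {REFUSAL}\n\n"
        f"{numbered}\n\nQuestion: {question}"
    )


def answer(question: str, snippets: list[Snippet], call_model: Callable[[str], str]) -> str:
    prompt = build_grounded_prompt(question, snippets)
    if prompt is None:
        return REFUSAL  # empty retrieval: refuse before the model is ever called
    return call_model(prompt)
```

    The empty-retrieval branch never reaches the model, so that refusal does not depend on the model behaving. The instruction inside the prompt covers the softer case where snippets come back but do not answer the question.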

    An audit trail. Every query, every source touched, every snippet retrieved, every answer returned. Logged with the user's identity and timestamp. Compliance can reconstruct any conversation in any quarter. The midnight email from the CIO never has to happen.
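
    A minimal sketch of what one audit record could hold, assuming JSON lines and a plain Python list standing in for an append-only log store. The field names are illustrative, not a standard.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    user_id: str
    question: str
    sources_queried: list[str]
    citations: list[str]
    answer: str
    # UTC timestamp recorded when the record is created
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def write_audit(record: AuditRecord, log: list[str]) -> None:
    """Append one JSON line per answered question."""
    log.append(json.dumps(asdict(record), sort_keys=True))


def reconstruct(user_id: str, log: list[str]) -> list[dict]:
    """Replay everything one user asked and was told, in order."""
    return [entry for entry in map(json.loads, log) if entry["user_id"] == user_id]
```

    When legal asks why the assistant told an employee something, the answer is a filter over this log rather than a forensic exercise.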

    The label for this is permission-aware federated retrieval. The label matters less than the property. Data stays in place. Identity travels with the query. Every answer is auditable.

    Where MCP earns its keep

    The Model Context Protocol has become the connective tissue for this architecture. Each source exposes a small MCP server. The assistant calls it at query time. Adding a new source is a config change, not a re-indexing project. The same MCP layer lets the assistant take actions under the user's identity. File a ticket. Update a CRM record. Schedule a meeting. The assistant stops being a search tool and starts being an operating tool.
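
    To make the "config change" point concrete, here is a small sketch of a source registry. It does not use the MCP SDK or any real MCP API. SourceConfig, load_sources, the URLs and the action names are hypothetical, and the only thing it illustrates is that a new source becomes one more entry rather than a new index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceConfig:
    name: str
    server_url: str           # hypothetical endpoint of the source's MCP server
    actions: tuple[str, ...]  # tool names the assistant may call as the user


def load_sources(raw: list[dict]) -> dict[str, SourceConfig]:
    """Build the registry the assistant consults at query time."""
    registry: dict[str, SourceConfig] = {}
    for entry in raw:
        cfg = SourceConfig(
            entry["name"], entry["server_url"], tuple(entry.get("actions", ()))
        )
        if cfg.name in registry:
            raise ValueError(f"duplicate source: {cfg.name}")
        registry[cfg.name] = cfg
    return registry


CONFIG = [
    {"name": "hr_kb", "server_url": "https://mcp.example.internal/hr",
     "actions": ["search"]},
]

# Adding a source is one more entry. There is no re-indexing step.
CONFIG.append(
    {"name": "ticketing", "server_url": "https://mcp.example.internal/tickets",
     "actions": ["search", "file_ticket"]}
)

REGISTRY = load_sources(CONFIG)
```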

    If your platform vendor cannot speak MCP, ask why. The protocols are moving faster than most roadmaps. Vendors that already support MCP ship integrations in days. Vendors that do not support it take quarters.

    The governance layer

    Five controls that make this defensible to your security and legal teams. Print them, hand them to the vendor at the first meeting, and refuse to move forward without each.

    Identity passthrough. The assistant never queries a source with a service account that has elevated permissions. It always acts as the employee. No exceptions, including for convenience during the pilot.

    Source allowlist. Each use case has an explicit list of sources it can touch. HR cannot read sales contracts. Sales cannot read HR records. The assistant cannot reach further than the use case allows.
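
    A sketch of the allowlist as a default-deny lookup. The use case and source names are examples only.

```python
# Hypothetical allowlist: use case -> sources it may touch.
USE_CASE_ALLOWLIST: dict[str, frozenset[str]] = {
    "hr_policy": frozenset({"hr_kb"}),
    "sales_enablement": frozenset({"wiki", "contracts", "crm"}),
}


def permitted_sources(use_case: str, requested: list[str]) -> list[str]:
    """Keep only sources the use case is explicitly allowed to touch."""
    allowed = USE_CASE_ALLOWLIST.get(use_case, frozenset())  # unknown use case: nothing
    return [source for source in requested if source in allowed]


hr_view = permitted_sources("hr_policy", ["hr_kb", "contracts"])
unknown_view = permitted_sources("new_pilot", ["hr_kb", "contracts"])
```

    A use case that is missing from the list reaches nothing. Forgetting to configure something fails closed.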

    Data class controls. Sensitive document classes (legal, board materials, personal data) require explicit opt-in from the document owner before they are reachable. Default is unreachable.
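
    The opt-in rule can be sketched as a check that runs on top of the source's own permissions, not instead of them. The class labels and the Document type are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical labels for the sensitive classes named above.
SENSITIVE_CLASSES = frozenset({"legal", "board", "personal_data"})


@dataclass(frozen=True)
class Document:
    doc_id: str
    data_class: str
    owner_opted_in: bool = False  # default is unreachable


def is_reachable(doc: Document) -> bool:
    """Sensitive classes need an explicit owner opt-in; everything else passes this gate."""
    if doc.data_class in SENSITIVE_CLASSES:
        return doc.owner_opted_in
    return True
```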

    Retention and logging. Conversation logs, retrieval logs and source logs each have their own retention rules. Encrypted, access-controlled, aligned to records management. Defensible to a regulator without rebuilding.

    Refusal patterns. The assistant is configured to refuse certain question shapes. Impersonation requests. Sweeping data exports. Anything that looks like an exfiltration attempt. The refusal is policy, not vibes.
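
    "Policy, not vibes" means the refusal rules are written down and testable. A deliberately crude sketch, assuming regular expressions as a stand-in for whatever rule engine or classifier the platform provides. The two patterns are examples, not a complete policy.

```python
import re
from typing import Optional

# Named rules so the audit log can record WHICH policy triggered the refusal.
REFUSAL_RULES: list[tuple[str, re.Pattern[str]]] = [
    ("impersonation", re.compile(r"\bpretend (to be|you are)\b", re.IGNORECASE)),
    ("bulk_export", re.compile(r"\b(export|dump|list) (all|every)\b", re.IGNORECASE)),
]


def check_refusal(question: str) -> Optional[str]:
    """Return the name of the first rule the question trips, or None if it may proceed."""
    for name, pattern in REFUSAL_RULES:
        if pattern.search(question):
            return name
    return None
```

    Pattern lists like this are easy to evade, so treat them as a floor. The property worth keeping is that each refusal maps to a named rule that security can review.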

    Most of these are policy choices. The platform's job is to make them enforceable and to make them visible to the team that audits them.

    The rollout pattern that actually works

    Start narrow. Prove value. Widen. I have never seen a successful enterprise knowledge assistant that did not follow this sequence.

    Quarter one. One use case, one source, one audience. Usually HR policy on the HR knowledge base. Small audience. Clear success metric, usually ticket deflection. Tight feedback loop.

    Quarter two. Two more use cases, three more sources. Sales enablement on the wiki and contract store. IT support on the ticketing system and IT knowledge base. Same architecture, more connectors.

    Quarter three and onward. New use cases ship as configuration. The platform team owns the architecture. Business teams own their use cases.

    Enterprises that try to launch ten use cases in quarter one almost always stall. Enterprises that launch one and let it work do not.

    Five non-negotiables

    If I were the executive sponsor, I would not move past the first demo without all five.

    1. Permission-aware retrieval with identity passthrough, demonstrated on your real sources.
    2. Per-answer citations linking to the original source, not a re-indexed copy.
    3. MCP support, or a credible plan for it inside a quarter.
    4. A full audit log of every query, source touched and snippet retrieved.
    5. A failure mode that says "I do not know," rather than improvising, when retrieval returns nothing.

    If a vendor cannot demonstrate all five live on your data, the production deployment will not behave the way the sales deck claims. The midnight email from the CIO is already in the pipeline. It just has not been sent yet.

    The test I would run before the next vendor meeting

    Hand the vendor one real document from each of your three most sensitive sources. Ask them to demonstrate retrieval under three different user identities, one of whom should not be able to see one of the documents. Watch what happens. If the unauthorised user sees anything, the conversation is over. If the authorised users see the right things with the right citations, you are looking at a real platform.
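
    The test above can be written down as an access matrix and a pass/fail check. A sketch under assumptions: the document IDs and user names are placeholders, and retrieve stands for whatever call the vendor's platform exposes. The two sample platforms are toys that show what passing and failing look like.

```python
from typing import Callable, Optional

# Expected access matrix: document -> identities that SHOULD be able to see it.
EXPECTED = {
    "contracts://doc-a": {"alice", "bob", "carol"},
    "hr://doc-b": {"alice", "bob", "carol"},
    "board://doc-c": {"alice", "bob"},  # carol must not see this one
}
USERS = ["alice", "bob", "carol"]

# (user, document) -> citation returned by the platform, or None if nothing came back
Retrieve = Callable[[str, str], Optional[str]]


def run_vendor_test(retrieve: Retrieve, users: list[str]) -> list[str]:
    """Return a list of failures. An empty list means the platform passed."""
    failures = []
    for doc, readers in EXPECTED.items():
        for user in users:
            citation = retrieve(user, doc)
            if user not in readers and citation is not None:
                failures.append(f"LEAK: {user} retrieved {doc}")
            elif user in readers and citation != doc:
                failures.append(f"MISS: {user} did not get {doc} with the right citation")
    return failures


def permission_aware_platform(user: str, doc: str) -> Optional[str]:
    return doc if user in EXPECTED[doc] else None


def flattened_index(user: str, doc: str) -> Optional[str]:
    return doc  # permissions were lost on the way in, so everyone sees everything


passing = run_vendor_test(permission_aware_platform, USERS)
failing = run_vendor_test(flattened_index, USERS)
```

    A single LEAK line ends the conversation. MISS lines are worth a second look, since they usually point at a citation that leads to a copy rather than the original.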

    If you want to run that test against ours, book a working session. We will use your sources, your identities, your documents. The strongest signal I can give you is what the audit log looks like at the end of the session. The second strongest is what the assistant says when retrieval returns nothing. Both are quieter than the demo most vendors are still running. They are also the only two answers your CIO needs.

    Frequently Asked Questions

    Can one assistant reach SharePoint, Salesforce and PDFs at once?

    Yes, and the right architecture leaves each source where it lives. Per-source connectors query under the user's identity. The assistant only surfaces what the user is already allowed to see. The wrong architecture copies everything into a new index and inherits a permissions problem it cannot fix.

    Does this require moving our data?

    It should not. If a vendor's first move is "let us index everything into our store," you are taking on a permissions and compliance project you did not need. Retrieval at query time, identity passthrough, data stays in place.

    How do we stop the assistant hallucinating internal facts?

    Three controls together. Every answer cites the source. The model is constrained to refuse when retrieval is empty. A continuous evaluation set catches drift. Hallucination on internal knowledge is an architecture problem, not a model problem.
