Kivuli Index Lab
Research · Direction 2 — Language & form · comparison of models · 7 May 2026

Do engines agree on Kenyan sector pictures?

A comparison of how major answer engines describe the same Kenyan sectors, counties and business forms across repeated prompts.

Outcome

The lab finds that engine agreement is uneven: some sector pictures converge around familiar Kenyan categories, while names, counties and business forms often shift enough to change the practical meaning of an answer.

A Kenyan sector can look settled in one engine and oddly hollow in another. The lab studies that gap as answer behaviour, because disagreement across engines can expose which parts of the sector are well evidenced and which are only being guessed into shape.

In one comparison run, the lab asked several engines for examples of Kenyan businesses serving agricultural buyers outside Nairobi. One answer leaned toward agritech language. Another named broad supplier categories but few places. A third produced a neat paragraph about smallholder support while avoiding business names almost entirely. The prompt had not changed much. The sector picture had.

A tourism run showed the same problem from another angle. A composite coastal tour operator with a working site, reviews and licence references appeared as a relevant kind of business in one answer, disappeared in another, and was pushed behind Nairobi-based operators in a third. One engine mentioned the coast but gave the answer the bones of a Nairobi shortlist. The lab classified the states separately instead of calling the whole run inconsistent and moving on.

Agreement Is Not The Same As Accuracy

When engines disagree, the tempting conclusion is that one is right and another is wrong. Sometimes that is true. A named business may be misplaced, a county may be confused, or a sector may be described with stale language. More often, the lab sees a softer problem: each engine selects a different slice of the available evidence, then presents that slice as if it were the sector.

Engine agreement is the degree to which different AI systems produce the same answer state for a Kenyan sector, typically because their source paths and category assumptions lead them toward similar named, skipped, blurred or displaced results. The definition matters because agreement can be misleading. Two engines can agree on a narrow Nairobi-heavy picture and still underrepresent county-level operators. A single dissenting answer may be noisy, or it may be the only one noticing a missing business form.

The lab therefore treats agreement as an object of study, not a stamp of truth. ChatGPT, Gemini, Perplexity, Google AI Overviews and Copilot are observed as answer systems with different habits. Some answer in a broad explanatory voice. Some show citations or source-like trails more readily. Some lean into search-shaped summaries. The lab does not rank them here. It records whether they name, skip, blur or displace the same Kenyan businesses, counties and business forms under comparable prompts.

This approach slows the reading down. A quick user may ask three engines, see three different lists and shrug. The lab asks a narrower question: what changed in the answer state? Did a sector remain visible while business names changed? Did Nairobi stay constant while Mombasa, Kisumu or Nakuru moved in and out? Did informal enterprise vanish in all engines, or only in those that favoured formal websites? The disagreement begins to have edges.

How The Lab Compares Sector Pictures

A repeatable comparison begins with a sector prompt that can be reconstructed. The lab records the prompt type, sector, county or region, engine, language, answer date and classification logic. It does not expect identical wording. A model can describe the same sector in different prose and still produce the same answer state. The question is whether the underlying picture holds.
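A minimal sketch of what one recorded observation might look like, assuming the fields listed above map onto a simple record. The class name, field names and the helper same_answer_state are illustrative, not the lab's actual schema, and the sample values are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One engine's answer to one reconstructable prompt (hypothetical schema)."""
    prompt_type: str
    sector: str
    region: str          # county or region named in the prompt
    engine: str
    language: str
    answer_date: date
    answer_state: str    # "named", "skipped", "blurred" or "displaced"


def same_answer_state(observations: list[Observation]) -> bool:
    """True when every observation carries the same answer state,
    regardless of how differently the answers were worded."""
    return len({obs.answer_state for obs in observations}) == 1


# Placeholder run: three engines, different prose, one shared state.
run = [
    Observation("sector examples", "professional services", "Kisumu",
                engine, "en", date(2026, 1, 1), "skipped")
    for engine in ("engine_a", "engine_b", "engine_c")
]
print(same_answer_state(run))
```

The comparison deliberately ignores the answer text. Only the classified state is compared, which mirrors the point that identical wording is not expected.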

For example, in a professional-services prompt, one engine may name accounting and legal firms, another may discuss consulting categories, and a third may offer buyer advice. If all three skip county-level operators and treat Nairobi as the default commercial location, the lab marks a shared regional skew. If one names a county association while others ignore it, that becomes a divergence worth testing again. The comparison is less tidy than a table, but more honest.

The composite coastal tour operator from the research plan is useful here because it tests several pressures at once: region, licence wording, seasonality and the difference between working local evidence and stronger national visibility. An engine that names Nairobi operators while speaking generally about coastal tourism may not be hallucinating. It may be taking the easiest evidence path. That still matters to a coastal business trying to understand why it appears only as a category, not as a named option.

The composite Nakuru farm-supply cooperative works differently. It tests whether engines can hold a group enterprise in view without converting it into a conventional company or generic supplier. One engine may name a cooperative. Another may talk about agricultural input dealers. A third may name a Nairobi agribusiness serving the same market. The lab classifies those as named, blurred and displaced states, not as small wording differences.
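The four answer states can be sketched as a small decision rule. This is an illustration under assumptions: the three yes/no inputs and the order in which they are checked are hypothetical, chosen to match the cooperative example above, and do not describe the lab's actual classification logic.

```python
from enum import Enum


class AnswerState(Enum):
    NAMED = "named"
    SKIPPED = "skipped"
    BLURRED = "blurred"
    DISPLACED = "displaced"


def classify(target_named: bool, other_business_named: bool,
             category_mentioned: bool) -> AnswerState:
    """Assign one answer state from three hand-coded observations.

    Assumed precedence: a named target wins; otherwise another named
    business in its place counts as displacement; otherwise a
    category-only mention counts as blurring; otherwise the answer skipped it.
    """
    if target_named:
        return AnswerState.NAMED
    if other_business_named:
        return AnswerState.DISPLACED
    if category_mentioned:
        return AnswerState.BLURRED
    return AnswerState.SKIPPED


# The three engines in the cooperative example, coded by hand.
first = classify(target_named=True, other_business_named=False, category_mentioned=True)
second = classify(target_named=False, other_business_named=False, category_mentioned=True)
third = classify(target_named=False, other_business_named=True, category_mentioned=True)
print(first.value, second.value, third.value)
```

The inputs still have to come from a human reading of each answer. The rule only keeps the labels consistent once that reading is done.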

Patterns Of Disagreement The Lab Watches

The first pattern is name churn. The sector remains recognisable, but the named examples change from engine to engine. In tourism, this may look like one answer naming operators, another naming platforms, and another avoiding names. In agriculture, it may swap cooperatives for agritech firms. Name churn is not automatically a flaw, but it warns the reader that the sector picture is not settled enough to support a strong conclusion from one answer.
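Name churn can be made visible without scoring it. The sketch below, using hypothetical placeholder names and a made-up function name, separates the names every engine gives from the names only one engine gives. It uses plain set operations and assumes names have already been normalised by hand.

```python
def name_churn(names_by_engine: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return (names shared by every engine, names unique to each engine)."""
    all_sets = list(names_by_engine.values())
    shared = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for engine, names in names_by_engine.items():
        # Union of every other engine's names; empty when there are no others.
        others = set().union(
            *(n for e, n in names_by_engine.items() if e != engine)
        )
        unique[engine] = names - others
    return shared, unique


# Placeholder names only; no real businesses are implied.
answers = {
    "engine_a": {"Operator X", "Operator Y"},
    "engine_b": {"Operator X", "Platform Z"},
    "engine_c": {"Operator X"},
}
shared, unique = name_churn(answers)
print(shared)
print(unique)
```

A small shared set alongside several engine-specific names is the pattern described above: the sector is recognisable, but the examples are not settled.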

The second pattern is county drift. A prompt asks about Kenyan businesses, perhaps even includes a non-Nairobi county, but the answer slides toward Nairobi examples or national phrasing. If every engine does this, the lab reads it as a likely benchmark pattern: a regional skew shared across systems, not a quirk of one. If only one does, the run may reflect that engine’s source mix or prompt interpretation. The difference matters because county drift can hide behind fluent prose.
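The distinction between drift in every engine and drift in only one can be sketched as follows. The function name and the returned labels are assumptions made for illustration. The "partial drift" label for in-between cases is also an assumption, since the text describes only the two ends.

```python
def read_county_drift(drifted_by_engine: dict[str, bool]) -> str:
    """Label a run by how many engines slid toward Nairobi or national phrasing.

    Each value is a hand-coded judgement: True if that engine's answer drifted.
    """
    if len(drifted_by_engine) < 2:
        raise ValueError("need at least two engines to compare")
    drifted = sorted(e for e, d in drifted_by_engine.items() if d)
    if not drifted:
        return "no drift"
    if len(drifted) == len(drifted_by_engine):
        return "shared drift"                    # candidate benchmark pattern
    if len(drifted) == 1:
        return f"isolated drift: {drifted[0]}"   # may reflect one engine's source mix
    return "partial drift"                       # assumed label; worth re-running


print(read_county_drift({"engine_a": True, "engine_b": True, "engine_c": True}))
print(read_county_drift({"engine_a": False, "engine_b": True, "engine_c": False}))
```

Neither label is a finding on its own. A shared drift still needs repeated runs before it is treated as a pattern.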

The third pattern is business-form compression. Cooperatives, SACCOs, jua kali enterprises and mobile-first sellers may appear as concepts while disappearing as operating entities. An answer can sound inclusive because it mentions these forms. Yet if it names only formal firms, the practical picture remains narrow. The lab has learned to mark this carefully, because readers often notice missing names only after they have trusted the category description.

The fourth pattern is language divergence. English prompts may produce a formal, investor-facing version of a sector. Swahili wording or mixed local phrasing may surface different category assumptions, though not always better named presence. In some runs the Swahili prompt makes the social shape of the sector clearer while losing specific business references. That is not a simple win or loss. It is a change in answer state.

What Agreement Can And Cannot Prove

If several engines name the same business in response to a sector prompt, the business has a stronger observed presence. That does not mean the lab treats the business as the best, largest or most representative. It means the business is easier for multiple systems to retrieve, summarise or repeat. The difference between visibility and market standing is one of the lab’s hardest lines.

If several engines skip the same county or enterprise form, the finding becomes more useful. A single omission can be bad luck, vague prompting or a temporary source issue. Repeated omission across engines suggests the public evidence path may be thin or poorly connected. The lab still avoids claiming cause too quickly. Maybe the evidence exists but is not phrased in extractable ways. Maybe the business form is locally legible and machine-awkward. Maybe the prompt asks for a category that engines interpret through urban examples.

Agreement on distortion is also possible. Several engines can blur a cooperative into a private supplier, describe mobile-first sellers as if they were ecommerce sites, or treat tourism operators as interchangeable booking pages. This kind of agreement can be more dangerous than disagreement because it reassures the reader. When the same compression appears in several places, it starts to feel like common knowledge.

The lab’s benchmark frame holds these cases apart. It records presence, omission, inaccuracy, regional skew, language divergence and business-form mismatch as separate weaknesses. A sector picture may agree on one dimension and diverge on another. That makes the method slower, but it prevents the answer from becoming a smooth surface with hidden cracks.
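One way to hold these weaknesses apart is to compare engines one dimension at a time. The sketch below assumes each engine's answer has been hand-coded with a true or false flag per dimension. The dimension keys, function name and sample flags are illustrative, not the lab's recorded data.

```python
DIMENSIONS = (
    "presence",
    "omission",
    "inaccuracy",
    "regional_skew",
    "language_divergence",
    "business_form_mismatch",
)


def split_by_agreement(flags_by_engine: dict[str, dict[str, bool]]) -> tuple[list[str], list[str]]:
    """Return (dimensions where all engines match, dimensions where they differ)."""
    agree, diverge = [], []
    for dim in DIMENSIONS:
        values = {flags[dim] for flags in flags_by_engine.values()}
        (agree if len(values) == 1 else diverge).append(dim)
    return agree, diverge


# Placeholder coding: both engines skew regionally, only one compresses the business form.
base = {dim: False for dim in DIMENSIONS}
coded = {
    "engine_a": {**base, "presence": True, "regional_skew": True},
    "engine_b": {**base, "presence": True, "regional_skew": True,
                 "business_form_mismatch": True},
}
agree, diverge = split_by_agreement(coded)
print(agree)
print(diverge)
```

Agreement on a dimension here includes agreement on a weakness, such as a shared regional skew, so the agreed list is not a list of things the engines got right.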

Reading Multiple Engines Without Turning It Into A Ranking

The lab does not use this comparison to declare which engine is best for Kenyan business research. That would be a thinner study and a more brittle one. Interfaces change. Source access changes. Answer style changes. A ranking written as if it will remain stable would age badly.

A better use of multi-engine comparison is diagnostic. If a Kenyan sector appears clearly in one engine and weakly in another, the question is what kind of evidence each engine seems to be following. Are named businesses supported by websites, directories, reviews, county pages or news mentions? Are informal enterprises present only when the prompt names them directly? Does a cited answer still compress the business form? The lab studies these behaviours because they help businesses and public bodies decide where evidence is thin.

For trade bodies, disagreement can be useful. If every engine gives a different picture of a sector, the sector may lack stable public descriptions that connect business names, counties, services and forms. A trade body cannot control answer engines, but it can publish clearer sector evidence. For county offices, repeated county omission may point to weak local data or pages that describe programmes without naming the enterprise landscape.

For individual businesses, the lesson is narrower. A single favourable answer in one engine should not be treated as durable visibility. A single absence should not be treated as permanent failure. The lab encourages readers to compare answer states across engines before deciding what problem they are facing. Sometimes the issue is naming. Sometimes it is displacement. Sometimes the business is present, but with the wrong category attached.

Limits Of Cross-Engine Observation

The method has limits that the lab states plainly. Answer engines are unstable, and their interfaces do not expose every source path. A repeated prompt may produce different wording later. A model may use sources the reader cannot inspect. Some engines may summarise live web material while others rely on a different retrieval layer or memory-like patterns. The lab can record answer behaviour; it cannot fully reverse-engineer the system behind it.

The comparison also does not measure national representation. The lab’s samples are descriptive, built around sectors, counties, languages, business forms and evidence conditions. They are designed to reveal patterns, not to produce a single score for Kenya or a permanent league table for engines. Exact percentages would look crisp and mean less than they promise.

There is one more caution. Engine agreement can harden a weak picture. If several systems repeat the same Nairobi-heavy or formal-firm-heavy view, readers may assume the sector itself is built that way. The lab’s work pushes against that reflex. Agreement is a signal to inspect, not a place to stop.

The practical conclusion is plain enough: Kenyan sector pictures should be checked across engines when the decision matters. The point is not to average the answers. It is to see which businesses are named, which counties vanish, which forms are blurred and which references take space that others might reasonably occupy. That is where disagreement becomes evidence.
