Research · Direction 2 — Language & form · repeatable run · 2 Apr 2026

Does Swahili wording change Kenyan sector visibility?

A language-variance study of how English and Swahili prompts can change Kenyan sector representation in AI answers.

Outcome

Swahili wording can change the answer state of Kenyan sectors by shifting category labels, county cues and business-form meanings. The lab treats language divergence as a visibility condition to record, not as a simple translation issue.

A prompt does not only ask a question. It carries the language habits of a market, and in Kenya those habits can move a business from named presence to blur.

A composite run began with a plain English prompt about farm-supply cooperatives in Nakuru. The answer named agricultural suppliers, mentioned inputs for smallholder farmers, and gave a tidy national summary. Then the team rewrote the question in Swahili, keeping the sector and county intention as close as possible. The answer changed its grip. Some formal supplier language weakened. Local trading words came forward. One cooperative-like example stayed visible, but its role was described less cleanly.

The lab did not treat that as a translation puzzle. It was a visibility event. The same sector, county and business form had travelled through two language paths and arrived in different shapes. One answer sounded like a market overview written for procurement. The other sounded closer to local service seeking, but with looser institutional boundaries. Neither version could be declared the “real” one without more runs.

Language changes the test object

A Kenyan sector is not always the same object in English and Swahili prompts. English business wording tends to carry formal categories: supplier, operator, fintech provider, professional services firm, cooperative, SACCO. Swahili wording may carry everyday search habits, service verbs, local category names and mixed registers. Sometimes it is a direct translation. Often it is a shift in how the question imagines the business.

Language divergence is a change in answer state that follows a change in prompt language: English and Swahili wording can activate different category labels, regions and business forms.

That working definition keeps the analysis grounded. The lab is not measuring which language is better. It is recording whether the answer state changes when the language changes. A business may be named in English and skipped in Swahili. A sector may be clear in Swahili and generic in English. A county cue may matter in one language and fade in another.

This matters for Kenyan visibility because English often holds the more formal public evidence. Websites, licence pages, company descriptions and service pages commonly lean English, especially in sectors that sell to institutions, tourists or investors. Swahili may be closer to how some buyers describe a need, but thinner in published business evidence. The answer engine is caught between the language of records and the language of use.

A model can therefore answer a Swahili prompt by translating the question inward, retrieving English-shaped evidence, then writing back in Swahili. When that happens, the answer may sound fluent while still carrying English category assumptions. The lab watches for that pressure point, though it does not claim to see the internal retrieval path. The observed clue is the answer state.

What changes when the wording changes

The first visible change is category naming. In English, a prompt about “farm-supply cooperatives” may produce answers that distinguish cooperatives, agrovet shops, suppliers and agribusiness firms. In Swahili, depending on the wording, the answer may move toward sellers of farm inputs, farmer groups or general agricultural support. The difference is subtle until a specific business form matters.

A second change is county handling. English prompts with county names often keep the place as a formal geographic filter. Swahili prompts may bring forward local phrasing, route language or community-level terms. That can help a business whose public identity is tied to local service. It can also blur the answer if the engine lacks enough Swahili evidence to connect the wording to a named enterprise.

A third change appears in buyer framing. English prompts about professional services may invite firm-style answers: agencies, consultants, providers. Swahili wording may ask more naturally about help, service access or where to find a certain kind of support. In a human market, those may be equivalent. In an AI answer, they may produce different candidate sets.

The lab also records when language changes the confidence of the answer. Some English answers over-formalise Kenyan sectors, turning mixed business realities into neat company lists. Some Swahili answers stay more general, offering category advice rather than naming businesses. That does not mean Swahili is weaker. It means the available evidence and the model’s learned phrasing may be less tightly joined for named business inclusion.

The hardest cases to classify are mixed-language answers. A prompt in Swahili may return English business names, English category labels and Swahili explanatory sentences. That hybrid can be useful to a reader, but it complicates classification. Was the sector represented in Swahili, or was English evidence simply wrapped in Swahili? The lab marks the answer state first and leaves deeper causation for repeated comparison.

The four states across two languages

The lab applies the same anchor classification in both languages: named, skipped, blurred or displaced. The point is not to create separate English and Swahili rules. It is to see whether the same Kenyan sector changes state when the language path changes.

Named means a business, sector, county or business form is directly identified in a recognisable way. Skipped means it is absent even though the prompt made it relevant. Blurred means the answer compresses a specific business or business form into a generic label. Displaced means another reference occupies the space the tested object could reasonably have taken.
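As a minimal sketch, the four states could be held as a fixed vocabulary so that English and Swahili runs are labelled with exactly the same terms. The names `AnswerState` and `parse_state` are illustrative assumptions, not the lab's actual tooling.

```python
from enum import Enum


class AnswerState(Enum):
    """The four anchor states, shared by both language paths."""
    NAMED = "named"          # directly identified in a recognisable way
    SKIPPED = "skipped"      # absent although the prompt made it relevant
    BLURRED = "blurred"      # compressed into a generic label
    DISPLACED = "displaced"  # another reference occupies its space


def parse_state(label: str) -> AnswerState:
    """Turn a hand-written label into a state; raises ValueError if unknown."""
    return AnswerState(label.strip().lower())
```

Rejecting unknown labels keeps a reviewer from quietly inventing a fifth state for one language only.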

In a composite Nakuru farm-supply cooperative example, the English prompt may name a cooperative or at least a supplier category near the county. The Swahili prompt may blur it into sellers of farm inputs, or it may skip named organisations and answer with advice on where farmers usually buy supplies. A third run may displace the cooperative with a better-known Nairobi agribusiness because the model finds stronger English traces behind the Swahili question.

That movement is the study object. The lab is less interested in declaring one answer correct than in mapping the conditions under which the state changes. If a sector is named in English and blurred in Swahili across several comparable runs, that suggests a language-evidence gap. If Swahili names local forms more clearly in some categories, that is just as important. The benchmark frame should have room for both directions.
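One way to picture that mapping is a tally of state changes across paired runs, kept separately for each direction. The sketch below assumes each comparable run has already been labelled by hand as an (English state, Swahili state) pair. The function name and the sample pairs are hypothetical, made up only to show the shape of the count.

```python
from collections import Counter


def divergence_counts(paired_runs):
    """Count (english_state, swahili_state) pairs where the state changed.

    Pairs with the same state in both languages are left out, so the
    result holds divergence only, with each direction counted separately.
    """
    return Counter((en, sw) for en, sw in paired_runs if en != sw)


# Illustrative labels only; these are not results from the study.
sample_runs = [
    ("named", "blurred"),
    ("named", "blurred"),
    ("named", "named"),
    ("blurred", "named"),
]
counts = divergence_counts(sample_runs)
```

Because the pair is ordered, a sector named in English and blurred in Swahili is counted apart from one blurred in English and named in Swahili, which leaves room for both directions.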

This is also where the lab watches for false equivalence. A literal translation can preserve dictionary meaning while changing search behaviour. “Tour operator,” “kampuni ya utalii,” and a more everyday phrase about arranging trips may invite different answer habits. A clean benchmark cannot assume they are interchangeable. It has to record the prompt wording with enough detail that another reader can reconstruct the test path.
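A record along the following lines would be one way to keep the wording reconstructable. It is a sketch under stated assumptions: the field names, the `PromptRecord` class, the run identifier and the example values are all hypothetical and do not describe the lab's real schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptRecord:
    """One prompt as it was actually asked, plus the state it received."""
    run_id: str
    language: str        # e.g. "en" or "sw"
    prompt_wording: str  # verbatim text, never a back-translation
    sector: str
    county: str
    business_form: str
    answer_state: str    # named, skipped, blurred or displaced


def to_json_line(record: PromptRecord) -> str:
    """Serialise one record as a single JSON line, keeping non-ASCII text."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)


# Placeholder values for illustration; not an observed run.
example = PromptRecord(
    run_id="example-001",
    language="sw",
    prompt_wording="kampuni ya utalii",
    sector="tourism",
    county="Nairobi",
    business_form="tour operator",
    answer_state="blurred",
)
line = to_json_line(example)
```

Storing the verbatim wording rather than a normalised translation is what lets "tour operator" and "kampuni ya utalii" be compared as two separate test paths.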

Sector by sector, the language problem is uneven

Tourism often has stronger English evidence because it faces international visitors. In English prompts, named operators may appear more readily, especially around Nairobi, safari routes and well-known destinations. Swahili prompts can shift the frame toward domestic travel, local arrangements or county-specific phrasing. That can reveal useful differences, but it can also reduce named presence if the public evidence is mostly English.

Agriculture behaves differently. Swahili may better match how some farmers describe needs, inputs and local support. Yet formal records for cooperatives, suppliers and programmes may still be in English or mixed administrative language. The answer may understand the need but fail to attach it to a named local business. The result is a blurred answer that feels locally sensible but remains weak as business representation.

Fintech is another uneven field. English prompts often align with public product pages, investor-style descriptions and formal service categories. Swahili prompts may focus on use: sending money, receiving payments, borrowing, saving, paying suppliers. A model can answer those use cases without naming the same firms it would name in English. That changes the visibility frame from company presence to service explanation.

Professional services may suffer from over-translation. A legal, accounting, design or consulting query in Swahili can produce general advice instead of named providers, especially outside Nairobi. The lab does not read this as a failure of Swahili. It reads it as evidence that named provider traces may be thinner along that language path. The sector exists; the answer path cannot always hold it.

These sector differences stay descriptive. The lab does not claim full coverage of Kenyan language behaviour. It compares prompt sets, records answer states and looks for repeated divergence. Where the pattern holds, the material can support a benchmark frame. Where it appears once, it remains an observation.

Why divergence matters for public use

A county office, trade body or business owner may be tempted to test AI visibility only in English because English is the primary website language for many Kenyan organisations. That would miss part of the market. It would also miss a particular kind of distortion: the business may look visible in formal English and vanish when the question sounds closer to local demand.

The reverse can happen too. A sector may be well described in Swahili as an activity, yet weakly represented through named enterprises. A reader searching for service access gets a useful answer. A researcher studying business visibility sees a gap. Those are different judgments, and the lab keeps them apart.

For public bodies, language divergence is especially important because policy and support work often moves between formal English documentation and local-language service realities. If AI answers can name the formal programme in English but blur the operating businesses in Swahili, then the benchmark has found more than a content problem. It has found a translation pressure point in the evidence infrastructure.

For businesses, the lesson is quieter. A bilingual presence is not only a matter of having two versions of the same slogan. The important question is whether the business remains the same answer object across both languages. Does the county stay attached? Does the service boundary survive? Does the business form remain visible? Does a cooperative stay a cooperative?

Limits of the language comparison

The lab cannot see exactly which language the model used internally to retrieve or compose an answer. A Swahili response may rely on English evidence. An English answer may be influenced by multilingual traces. The observable unit is the answer state, not the hidden route.

Translation itself can also introduce noise. Some English business terms do not have one stable Swahili equivalent across all contexts. Some Swahili prompts naturally use mixed vocabulary. A rigid translation protocol would look clean on paper and still fail to match how people ask questions. The lab therefore records the wording and classifies the answer, while staying cautious about broad language claims.

Another limit is evidence availability. Some Kenyan businesses operate strongly in spoken, social or mobile-first channels but leave limited text that an answer engine can reuse. If a Swahili prompt skips them, the cause may be language, public evidence, business form, reviews, county data or a blend of all of these. The material can identify divergence; it cannot always isolate the single reason.

The benchmark remains useful precisely because it is modest. It asks whether English and Swahili prompts produce the same answer state for a Kenyan sector. When they do not, the difference is recorded rather than smoothed over. That small discipline keeps language from being treated as a surface layer. In Kenyan AI visibility, language is part of the measurement.
