In the last couple of days Google has added Swahili to the list of languages supported by its Translate service. On the one hand I’m very happy to see this addition, as I think it has the potential to be a big step forward for development in East Africa. On the other hand, first impressions suggest the service still has a long way to go.

One of the main problems for Google is that Swahili is an agglutinative language, meaning that it strings morphemes (the smallest parts of a word that carry meaning or grammatical function) together to form longer words. It can therefore be difficult for a machine to know where one morpheme ends and the next begins.

Here are some very simple examples that I tried putting into Google:

| Swahili | Morphemes | English | Google Translate |
|---|---|---|---|
| kupika | ku-pika | to cook | cooking |
| ninapika | ni-na-pika | I am cooking | I cooked |
| nilipika | ni-li-pika | I cooked | I cooked |
| nitapika | ni-ta-pika | I will cook | I cooked |
| sijapika | si-ja-pika | I have not cooked | I cooked |
| apike | a-pik-e | let him cook | apike |
| umepika | u-me-pika | you have cooked | has cooked |
| tutapika | tu-ta-pika | we will cook | we cooked |
| watakapopika | wa-taka-po-pika | when they will cook | will kakopika |
| mlipokuwa mnapika | m-li-po-kuwa m-na-pika | when you (pl) were cooking | as they were cooked |
| ikipikwa nasi | i-ki-pik(w)a na-si | if it is cooked by us | it be boiled us |
| bado kidogo | bado ki-dogo | not quite yet | still little |

To be fair, from what I’ve seen, the translation of single words isn’t bad at all. Where it falls down is in the grammar: it translated the Swahili present, past, future and negative-perfect tenses all as English past!
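
To make the segmentation problem concrete, here is a minimal Python sketch of how the simplest verb forms in the table split into subject prefix, tense marker and stem. It is only an illustration, not how Google Translate works: the function name `segment` and the small prefix tables are my own assumptions, they cover only the forms listed above, and the sketch cheats by being told the stem (`pika`) in advance, which is exactly the part a real system has to work out for itself.

```python
# Illustrative prefix tables (hypothetical, limited to forms used in the post).
SUBJECT = {"ni": "I", "u": "you", "a": "he/she",
           "tu": "we", "m": "you (pl)", "wa": "they"}
TENSE = {"na": "present", "li": "past", "ta": "future", "me": "perfect"}
# Negative forms use their own subject and tense markers.
NEG_SUBJECT = {"si": "I (negative)"}
NEG_TENSE = {"ja": "negative perfect"}


def segment(word, stem="pika"):
    """Split a subject-tense-stem verb form, assuming the stem is already known.

    Returns ((subject, tense, stem), (subject_gloss, tense_gloss)),
    or None when the word does not fit this simple pattern.
    """
    if not word.endswith(stem):
        return None
    prefix = word[: -len(stem)]
    for subjects, tenses in ((SUBJECT, TENSE), (NEG_SUBJECT, NEG_TENSE)):
        for subj in subjects:
            tense = prefix[len(subj):]
            if prefix.startswith(subj) and tense in tenses:
                return (subj, tense, stem), (subjects[subj], tenses[tense])
    return None


for w in ["ninapika", "nilipika", "nitapika", "sijapika", "kupika", "apike"]:
    print(w, segment(w))
```

The four tensed forms come out with four different tense glosses (present, past, future, negative perfect), while `kupika` (infinitive) and `apike` (subjunctive, with a changed final vowel) return `None` because they do not fit the subject-tense-stem pattern. Even this toy needs a separate rule for each construction, which gives a feel for why the longer forms in the table are hard.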

Going the other way, here are a few English examples I tried:

| English | Swahili | Google Translate | English back-translation |
|---|---|---|---|
| many people | watu wengi | watu wengi | many people |
| many trees | miti mingi | miti mingi | many trees |
| many elephants | tembo wengi | wengi tembo | many elephants [wrong word order] |
| many cars | magari mengi | wengi magari | many cars [wrong word order and agreement] |
| I am cooking | ninapika | I am kupikia | “I am” to cook with |
| I cooked | nilipika | mimi kupikwa | I to be cooked |
| To be fair, from what I’ve seen the translations of single words isn’t bad at all. | Kwa kweli, kutokana na yale ambayo nimeyaona, utafsiri wa maneno ya pekee siyo mbaya | Kuwa na haki, kutokana na yale I’ve amemwona zote maneno ya wimbo sio mbaya wakati wote. | In truth, coming from what “I’ve” he has seen all [wrong agreement and word order] words of song not bad all the time |
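
The “many …” rows come down to two rules: the adjective follows the noun, and its form depends on the noun’s class, with animate nouns such as tembo taking the people-class form. The Python sketch below encodes just that, as an illustration under my own assumptions: the names `NOUN_CLASS`, `ANIMATE`, `MANY` and `many` are hypothetical, the class numbers follow the usual Bantu numbering, and the word lists cover only the four nouns in the table.

```python
# Plural noun classes for the nouns in the table (conventional Bantu numbering).
NOUN_CLASS = {"watu": 2, "miti": 4, "magari": 6, "tembo": 10}
# Animate nouns take class 2 (people) agreement regardless of their own class.
ANIMATE = {"watu", "tembo"}
# Forms of the adjective -ingi ("many") for each plural class.
MANY = {2: "wengi", 4: "mingi", 6: "mengi", 10: "nyingi"}


def many(noun):
    """Return 'many <noun>' in Swahili: noun first, then the agreeing adjective."""
    agreement_class = 2 if noun in ANIMATE else NOUN_CLASS[noun]
    return f"{noun} {MANY[agreement_class]}"


for noun in ["watu", "miti", "tembo", "magari"]:
    print(many(noun))
```

This reproduces the Swahili column of the first four rows. Google’s “wengi magari” gets both rules wrong: the adjective comes first, and it uses the people-class form instead of mengi.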

At this point it looks to be a decent dictionary (although with nothing like the depth of the excellent Kamusi Project), and it actually does OK with set phrases. However, once you get past the set phrases that it knows, it seems unable to handle even relatively simple grammar and come up with a meaningful translation.

This is obviously a work in progress, as the “Contribute a better translation” option shows. It would be interesting to know whether Google takes these user-contributed translations and tries to work out how the grammars and structures of the languages compare, or whether it simply remembers the set translation in case anyone enters the exact same phrase again. The first would be fascinating to investigate, whereas I fear the second would be like trying to empty the ocean with a teaspoon.

  1. MattW says:

    Interesting. Little sad you made tables but nice to reap the benefit of! One minor thing though, is that ‘kupika’ can also be ‘cooking’ as well as ‘to cook’ so they are not necessarily wrong there!

    • Mark says:

      What do you have against tables Mr Wisbey? Yeah maybe I was a bit harsh on kupika… but it does highlight again the immense difficulties of machine translation between languages that have very different structures – in order to know which it was the machine would have to understand the context of the word in the sentence.

  2. Andy S says:

    Interesting post. A good test for any translation is to translate from one language into another, then back to the first, and see how closely it matches.

    And I like your tables, they make me feel all 2003. 🙂

    • Mark says:

      Yeah – maybe I should have done the two-way translation thing – I’m sure there would have been some interesting results!

      Glad the tables are providing entertainment to everyone. I wondered why WordPress didn’t have a table option and I had to manually enter the html… my web design skills are obviously stuck in 2003…! 🙂

  3. Beatrice says:

    Good to know that Swahili is attaining the same status as any other foreign language. The tables are unavoidable when teaching many concepts in Swahili. I agree that the context has to be considered because one word could have so many meanings; kupika is essentially to cook BUT when a possessive noun like kwake/his, kwao/theirs comes after kupika (kupika kwake) the meaning changes to his cooking/their cooking.

  4. richard says:

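    Kaka, asante sana kwa kunifungua macho kuhusiana na tafsiri ya tovuti ya google kwenda katika lugha ya kiswahili! ni jambo la maendelea sana kama ulivyo sema hapo awali. Lakini ni changamoto kwetu sisi Watanzania kuhakisha kwamba matumizi ya kiswahili ktk tovuti ya google yanaendelea na kuboreshwa zaidi.

    [English translation: Brother, thank you very much for opening my eyes about the Google site’s translation into the Swahili language! It is a matter of great progress, as you said earlier. But it is a challenge for us Tanzanians to ensure that the use of Swahili on the Google site continues and is improved further.]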

  5. Chris says:

    I haven’t spoken any Swahili in almost ten years so when I learned about the Google facility, I thought I’d test my memory.

    It can only really be useful as a dictionary tool, because of the very reasons you outline.

    That said, it did get ‘Mungu ibariki Tanzania’ first time of asking…

  6. Lorna says:

    In Kiswahili the tables are called jedwali; they are very important and common in Kiswahili school textbooks.
    Your explanation is wonderful, keep it up.
