In the last couple of days Google has added Swahili to the list of languages supported by its Translate service. On the one hand, I’m very happy to see this addition, as I think it has the potential to be a big step forward for development in East Africa. On the other hand, first impressions suggest the service still has a long way to go.
One of the main problems for Google is that Swahili is an agglutinative language, meaning that it strings morphemes (the smallest parts of a word that carry meaning or grammatical function) together to form longer words. It can therefore be difficult for a machine to work out where each morpheme begins and ends.
Here are some very simple examples that I tried putting into Google:
|Swahili||Morphemes||English||Google Translate|
|ninapika||ni-na-pika||I am cooking||I cooked|
|nilipika||ni-li-pika||I cooked||I cooked|
|nitapika||ni-ta-pika||I will cook||I cooked|
|sijapika||si-ja-pika||I have not cooked||I cooked|
|apike||a-pik-e||let him cook||apike|
|umepika||u-me-pika||you have cooked||has cooked|
|tutapika||tu-ta-pika||we will cook||we cooked|
|watakapopika||wa-taka-po-pika||when they will cook||will kakopika|
|mlipokuwa mnapika||m-li-po-kuwa m-na-pika||when you (pl) were cooking||as they were cooked|
|ikipikwa nasi||i-ki-pik(w)a na-si||if it is cooked by us||it be boiled us|
|bado kidogo||bado ki-dogo||not quite yet||still little|
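To give a feel for what splitting a word into morphemes involves, here is a minimal Python sketch that segments a few of the verb forms from the table. The affix tables and the function names `segment` and `gloss` are my own, made up for illustration. They cover only a handful of the forms above, and they are not meant to suggest how Google Translate works internally.

```python
# Toy affix tables covering only a few forms from the table above.
# These are illustrative assumptions, not a full Swahili grammar.
SUBJECT_PREFIXES = {
    "ni": "I",
    "si": "I (negative)",
    "u": "you",
    "tu": "we",
}
TENSE_MARKERS = {
    "na": "present",
    "li": "past",
    "ta": "future",
    "me": "perfect",
    "ja": "not yet (negative perfect)",
}
STEMS = {"pika": "cook"}


def segment(word):
    """Return (subject, tense, stem) if the word fits the toy template, else None."""
    for subject in SUBJECT_PREFIXES:
        if not word.startswith(subject):
            continue
        rest = word[len(subject):]
        for tense in TENSE_MARKERS:
            stem = rest[len(tense):]
            if rest.startswith(tense) and stem in STEMS:
                return subject, tense, stem
    return None


def gloss(word):
    """Return (hyphenated segmentation, rough gloss), or None if unsegmentable."""
    parts = segment(word)
    if parts is None:
        return None
    subject, tense, stem = parts
    meaning = f"{SUBJECT_PREFIXES[subject]} + {TENSE_MARKERS[tense]} + {STEMS[stem]}"
    return "-".join(parts), meaning


for word in ["ninapika", "nilipika", "nitapika", "sijapika", "umepika", "tutapika", "apike"]:
    print(word, gloss(word))
```

Even this toy version shows the difficulty: the tense lives in a two-letter marker in the middle of the word, and a form that falls outside the template (such as the subjunctive apike) simply cannot be segmented, so it comes back as `None`.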
To be fair, from what I’ve seen the translations of single words isn’t bad at all. Where it falls down is the grammar: the Swahili present, past, future and negative-perfect forms above all came back as the English past tense!
Going the other way, here are a few English examples I tried:
|English||Swahili||Google Translate||English back-translation|
|many people||watu wengi||watu wengi||many people|
|many trees||miti mingi||miti mingi||many trees|
|many elephants||tembo wengi||wengi tembo||many elephants [wrong word order]|
|many cars||magari mengi||wengi magari||many cars [wrong word order and agreement]|
|I am cooking||ninapika||I am kupikia||“I am” to cook with|
|I cooked||nilipika||mimi kupikwa||I to be cooked|
|To be fair, from what I’ve seen the translations of single words isn’t bad at all.||Kwa kweli, kutokana na yale ambayo nimeyaona, utafsiri wa maneno ya pekee siyo mbaya||Kuwa na haki, kutokana na yale I’ve amemwona zote maneno ya wimbo sio mbaya wakati wote.||In truth, coming from what “I’ve” he has seen all [wrong agreement and word order] words of song not bad all the time|
At this point it looks to be a decent dictionary (although with nothing like the depth of the excellent Kamusi Project), and it actually does OK with set phrases. However, once you get past the set phrases that it knows, it seems unable to understand the relatively simple grammar and come up with a meaningful translation.
This is obviously a work in progress, as the “Contribute a better translation” option shows. It would be interesting to know whether Google takes these user-contributed translations and tries to work out how the grammars and structures of the languages compare, or whether it simply remembers the set translation in case anyone enters the exact same phrase again. The first would be fascinating to investigate, whereas I fear the second would be like trying to empty the ocean with a teaspoon.