This post is about how I have come to use the words “crowdsourced” and “community” to distinguish different, but related, activities. I’ve been working on Twitter’s community translation tools since before they were launched, and this is a lesson I’ve learned during that time. It all started with my reply to a Quora topic, and much of the information was already covered there. But since Quora is a smaller community than the web at large, I wanted to reformat the information for a wider audience and change some of the examples to be a little clearer.
I originally called what we do “crowdsourced translation”, but over time I’ve come to call it “community translation” and to appreciate the subtle difference between the two terms, or at least I’ve started using them to mean two slightly different, but similar, things. The difference is a little interesting on its own, but it’s most interesting because it highlights the biggest benefits Twitter has seen from translating this way.
I see “crowdsourcing” as an activity that requires very little or no prior knowledge of the problem space at hand and which can benefit solely from a wealth of different opinions/views/activities. A great example of this is search engine relevance evaluations. At their most basic these evaluations ask a group of people to search for a term and mark different links on the results as on- or off-topic. In more complicated scenarios they ask the participants to re-order the links into the “best” order, or to compare two result pages side-by-side and pick which is best. In all of these scenarios the activity does not require that the person have any idea how search engines work. You’re not asking an expert in the field to evaluate something, you’re asking the largest, most diverse group possible to evaluate it so you can understand the problem.
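To make the relevance-evaluation example a bit more concrete, here is a minimal sketch of how a pile of on-/off-topic marks might be combined. Everything in it is an assumption for illustration: the participant IDs, the query, the URLs and the simple majority rule are all made up, and real evaluation pipelines are certainly more involved. The point is only that nothing here depends on who the participants are or what they know.

```python
from collections import Counter, defaultdict

# Hypothetical judgments: (participant, query, url, label).
# All values are invented for illustration.
judgments = [
    ("p1", "jaguar speed", "example.com/cat", "on"),
    ("p2", "jaguar speed", "example.com/cat", "on"),
    ("p3", "jaguar speed", "example.com/cat", "off"),
    ("p1", "jaguar speed", "example.com/car", "off"),
    ("p2", "jaguar speed", "example.com/car", "off"),
    ("p3", "jaguar speed", "example.com/car", "on"),
]


def aggregate(judgments):
    """Combine per-participant labels into one label per (query, url).

    Uses a plain majority vote (an assumed rule); an exact tie counts as "off".
    Returns {(query, url): (label, share_of_on_votes)}.
    """
    votes = defaultdict(Counter)
    for _participant, query, url, label in judgments:
        votes[(query, url)][label] += 1

    results = {}
    for key, counts in votes.items():
        total = sum(counts.values())
        on_share = counts["on"] / total  # Counter returns 0 for a missing label
        results[key] = ("on" if on_share > 0.5 else "off", on_share)
    return results


for (query, url), (label, share) in aggregate(judgments).items():
    print(query, url, label, round(share, 2))
```

Notice that the participant ID is ignored entirely when counting. That is the crowdsourcing assumption in a nutshell: more, and more varied, opinions are what make the result better, not expertise.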
In the case of Twitter’s Translation Center we rely heavily on our users’ prior knowledge of Twitter itself. While that might seem like a minor difference, it matches exactly what Nicholas Muldoon said about Atlassian in the original Quora topic: “translations provided by our Language Service Provider were not translated by people who used the product”. Twitter has also experimented with Language Service Providers (LSPs); they do a great job at some types of tasks and a poor one at others. The reason I now call it “community translation” is that we have taken an existing community of users and asked them to help based on prior knowledge. We did not take a group of people at random, but an existing community (bilingual Twitter users). We also didn’t take a collection of professional translators because, while our task is translation, our actual problem space is Twitter and how it uses language.
One quick example highlights the difference in quality between community translation and professional translation: Twitter uses the word “unfollow” to label the button that stops following another account. If you ask an LSP to translate this, they will likely return a phrase such as “Nicht mehr folgen” in German. If you ask a translator (or linguist) directly, they will probably point out that “unfollow” isn’t a word. Twitter’s language is very informal in English, and community translation has helped us keep that playful tone in every language (even the ungrammatical “Entfolgen” in German). This might seem like a small thing, but the consistency of product language with the product itself is really the key to translation quality. The localization industry has known this for ages and has made great strides in “source document quality” to that end (limited grammars, style guides, etc.).
As Alex Lane said in the original Quora topic, nobody is crowdsourcing advertising, accounting or legal translation work (he might have meant the functions and not translations, but I took it to mean translation). Even Twitter does not do that, because the language of those domains is formal. Our community of translators is an asset because they know our domain so well. They don’t know those more formal domains, so they would do as poorly at those as an LSP does with things like “unfollow”.