Economically, this market is finally starting to take shape. The ideas and attempts have been out there for a few years, but until recently consumer companies were on the fence about whether the blogosphere is worth listening in on. Umbria claims it will have $2M in revenue this year and be profitable next year, but the overall market for this kind of service is still only $20M, according to the article (Intelliseek has about a third of that market).
Technologically, Umbria also sounds pretty interesting. They claim to have a competitive edge in automating most of the process:
Umbria’s solution is entirely software-based. [Umbria’s] competitors also meet with clients to interpret the data and suggest strategic responses. “Ultimately we rely on both technology and humans for analysis,” says Max Kalehoff, marketing director for BuzzMetrics [another Umbria competitor]. “Umbria takes an extremely automated approach.”
Umbria’s technology sounds like a pipeline of parsers that generates features that in turn drive product and sentiment classifiers (and those drive reporting):
Every few hours Umbria sends an application called a spider out over the web to scour the blogosphere for postings about the firm’s clients, most of which are big consumer companies, such as Electronic Arts, SAP, and Sprint. By analyzing keywords in blogs, Umbria can classify each citation thematically. In the case of Sprint, for example, Umbria’s software can tell whether a blogger is talking about customer service, the company’s advertisements, or a particular calling plan.
Another big challenge is to decipher what’s on a blogger’s mind. To figure out whether an opinion is strong or tepid, for example, it helps to know that “awesome” is a stronger endorsement than “pretty cool,” and that “shoddy” is less damning than “abominable.” Umbria has several employees with Ph.D.s in linguistics and artificial intelligence who are forever tweaking the software to make it better at categorizing opinions.
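To make the pipeline idea concrete, here is a minimal sketch of how keyword-driven theme classification and graded sentiment scoring might fit together. This is my guess at the general shape, not Umbria's actual software: the keyword lists, the numeric scores, and every function name below are hypothetical. Only the ordering of the four opinion words comes from the article.

```python
import re

# Hypothetical keyword lists per theme; the article does not publish Umbria's.
TOPIC_KEYWORDS = {
    "customer service": {"support", "representative", "helpdesk"},
    "advertising": {"ad", "ads", "commercial", "billboard"},
    "calling plan": {"plan", "minutes", "roaming", "contract"},
}

# Hypothetical intensity scores in [-1, 1]. The numbers are made up;
# only the relative ordering of these phrases is taken from the article.
SENTIMENT_LEXICON = {
    "awesome": 1.0,
    "pretty cool": 0.5,
    "shoddy": -0.5,
    "abominable": -1.0,
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify_topics(tokens):
    """Return every theme that shares at least one keyword with the post."""
    token_set = set(tokens)
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items()
        if token_set & keywords
    )


def score_sentiment(text):
    """Average the scores of all lexicon phrases found; 0.0 if none match."""
    lowered = text.lower()
    hits = [
        score for phrase, score in SENTIMENT_LEXICON.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
    return sum(hits) / len(hits) if hits else 0.0


def analyze(post):
    """Run the toy pipeline: tokens -> themes, raw text -> sentiment."""
    return {
        "topics": classify_topics(tokenize(post)),
        "sentiment": score_sentiment(post),
    }


print(analyze("The new plan is pretty cool"))
```

A real system would presumably learn these weights and handle negation, sarcasm, and misspellings, which is where I suspect the linguists and the per-client tweaking come in.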
I can’t help thinking that more manual tweaking goes into each client’s setup than this description lets on, but still, I’m glad they’re seeing success, and I bet those linguists are having fun with the blogosphere, even if they have to do a bit of slumming to come up with their rules:
The software can also estimate the author’s age and gender. Elongated spellings (“soooooooo”), multiple exclamation marks (!!!), and acronyms such as POS (“parent over shoulder”) suggest a teenage female member of Generation Y (born after 1979). The blogger is probably a teenage boy if a posting is rife with hip-hop terminology such as “aight” (translation: “all right”) and “true dat” (“I agree!”).
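The cues the article lists are easy to picture as surface-level rules. Here is a rough sketch, assuming simple regular expressions and an unweighted vote; the function names, the four-letter threshold for "elongated", and the tie-breaking are my own inventions, and a production system would surely be more careful than this.

```python
import re

# The same letter four or more times in a row, e.g. "soooooooo" (threshold is assumed).
ELONGATED = re.compile(r"([a-z])\1{3,}")
# Two or more exclamation marks in a row.
MULTI_BANG = re.compile(r"!{2,}")
# Cue words taken from the article's examples.
TEEN_ACRONYMS = {"pos"}
HIP_HOP_TERMS = ("aight", "true dat")


def demographic_cues(post):
    """Detect the surface cues the article mentions (hypothetical rule set)."""
    lowered = post.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    return {
        "elongated_spelling": bool(ELONGATED.search(lowered)),
        "multiple_exclamations": bool(MULTI_BANG.search(post)),
        "teen_acronym": bool(words & TEEN_ACRONYMS),
        "hip_hop_term": any(
            re.search(r"\b" + re.escape(term) + r"\b", lowered)
            for term in HIP_HOP_TERMS
        ),
    }


def guess_profile(post):
    """Crude unweighted vote over the cues; ties and no cues give 'unknown'."""
    cues = demographic_cues(post)
    girl_votes = sum(
        cues[name]
        for name in ("elongated_spelling", "multiple_exclamations", "teen_acronym")
    )
    boy_votes = int(cues["hip_hop_term"])
    if girl_votes > boy_votes:
        return "teenage girl"
    if boy_votes > girl_votes:
        return "teenage boy"
    return "unknown"


print(guess_profile("That movie was soooooooo good!!!"))
print(guess_profile("aight, true dat"))
```

Rules this shallow will misfire constantly (a retail blogger writing about a POS terminal would count as a teen acronym hit), which is part of why I doubt the process is as hands-off as advertised.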
There you have it: you don’t even have to know the language to have your voice heard by the people who want to sell you more stuff. Now that’s power. On one side of that function, at least.