Grokker is back, as Yahoo! front end

Grokker has resurfaced as a front end for Yahoo! search results. It uses a Java plugin (why not Shockwave?) and devotes a sizeable chunk of the page to Overture ads. For some search use cases, the results are marginally more useful than Vivisimo’s clusters: instead of being completely data-driven (as the clusters appear to be on Vivisimo), Grokker seems to use a shallow taxonomy for presenting high-level categories. That makes at least the labels more recognizable, but the overall effect is still to obscure rather than illuminate the search space.

For example, search for campaign finance on Grokker, and you get the following top-level categories:

  • Campaign Reporting (disclosures by campaigns and oversight groups?)
  • Data (statistics? Does this include the above reporting?)
  • General (aha! Also known as “miscellaneous”)
  • Information (Includes subgroups “Search” and “Finance Database” — shouldn’t the latter have been in “Data”, above? Besides, are the other groups not information?)
  • Institute (hmm…)
  • Law (this turns out to be a useful partition)
  • Public Policy (duh. I knew there was a public policy angle to this topic…)
  • U.S. Supreme Court (shouldn’t this be within “Law”?)
  • More… (… of what?)

Besides one or two promising (but sparsely populated) categories, these results consist mostly of vague (and overlapping) subsets that force the user into pruning the search tree without any good basis for his choices. The underlying weakness of this kind of presentation is that the clustering system is superimposing an organization on a presumed topic with only a keyword query to go on. The organization is derived from keyword search results, which are notoriously short on precision.

Because I dragged Vivisimo into this, here, for completeness, are the top-level categories from a search for campaign finance on Vivisimo:

  • Campaign Finance Reform
  • Money, Politics
  • Campaign Finance Law
  • Campaign Finance Reports
  • Secretary of State
  • Board
  • Presidential
  • Campaign Finance Bill
  • Bush
  • Cato, Representative

I said above that Grokker’s presumed shallow taxonomy makes its labels more recognizable (avoiding categories like “Board” in the Vivisimo results), but overall, I’d have to say that the absence of groups like “Information” and “General” in the Vivisimo results is actually refreshing. Unfortunately, each of the clusters has only between a dozen and twenty results. So again, given that all of this organization is based simply on a two-word query, pruning all 22 million Yahoo! search results down to fewer than twenty seems less than useful unless the results are selected and partitioned in a very thoughtful way (and they usually aren’t in web-wide search).

There’s a huge spectrum of information needs out there (writing term papers, researching a product, finding a fix to a computer issue), and I’m sure some of them are well-served by clustering approaches like Vivisimo’s and Grokker’s. But I find that for most of my search use cases, a well-ranked list of all matching pages still yields the most transparency and flexibility. When clustering of web-wide search results approaches what you’d expect from a human reference librarian, that’s when this approach will get interesting.