I’m feeling lucky: can algorithms engineer serendipity?

(Originally posted at Nieman Lab July 16, 2014)


I’m feeling lucky: Can algorithms better engineer serendipity in research — or in journalism?

Some historical collections are aiming to enable serendipitous content discovery, peering beyond the current limitations of search to capture happy accidents.

Let’s say you have a research topic, and maybe even an angle. You dive in by reading the canonical classics, all of which seem to cite one another, and maybe some of the most recent debates. Now what? Or perhaps you’ve been studying the same topic for years and feel stuck. How can you find a fresh take on a stale debate?

By this point, you might have exhausted the help that discovery platforms like Google and Facebook can provide. Google will reveal the most-cited works (especially on the more specialized Google Scholar or Google News), and Facebook might yield the ones your friends or subject experts value — but there’s no easy way to break out of the networks that define these platforms. Libraries provide content-based discovery portals, which offer one way out, but they often give you too much to wade through, with clunky interfaces and varying levels of relevance.

These limitations are not exclusive to serious researchers. News consumers frequent the same platforms, and they are subsequently directed to the most cited, the most retweeted, and the most relevant keywords. Network-based, big data methods for sorting the wheat from the chaff carry promise, but they rely on their own assumptions about value (mostly based on what’s already popular or viral), and they risk boxing out hidden gems and chance encounters in the process. In other words, the filter bubble affects history scholars as much as casual news browsers — and scholars’ careers often depend on unearthing something rare and different.

As a result, some researchers in the humanities and library worlds are looking for possible paths out of the research bubble for historians and scholars. By looking towards existing browsing and searching habits in both physical and digital environments, they hope to help scholars never miss the information they need — a problem that carries great weight in the news world as well.

The goal, in effect, is to increase the role of serendipitous discovery in online research. Old-school types are nostalgic for the days of walking into the library stacks and seeing what books catch one’s eye; digital tools often have trouble enabling this sort of accidental discovery, where a user finds something valuable that they didn’t even know they wanted.

But serendipitous encounters don’t have to be analog; if anything, digital tools should be able to foster more serendipity, since they can effortlessly reorder categories, effectively rearranging stacks based on the researcher’s avenue of inquiry. But how would one engineer serendipity — and can we even call something serendipitous if it was engineered?

What is serendipitous?

Serendipity can be loosely defined as a chance encounter or an accidental discovery that leads to added insight or value. It seems random, but this definition goes beyond merely injecting randomness into an algorithm. One definition proposed by Gary Fine and James Deegan is “the unique and contingent mix of insight coupled with chance.” The “insight” part is crucial. Serendipity requires a user who is ready to make connections that aren’t obviously there — making it a particularly difficult problem for a computer.

In attempting to classify serendipity, Stephann Makri and Ann Blandford see three facets: how unexpected was the encounter or connection; how much insight did it require from the person making it; and how much value did it give them? Whether or not this works for every instance, it shows the variety of ways in which one can define an encounter as serendipitous, and how often a seemingly lucky event was in fact somewhat directed. Finding a fortuitous article on Facebook or making an important contact at a conference still requires following the right person or attending the right conference.

Anabel Quan-Haase, a professor at the University of Western Ontario, has been researching the role of serendipity in the research process for humanities scholars. She sees the process of serendipitous discovery as a function of a researcher’s “prepared mind,” the first step for serendipity. In order for any accidental connection to occur, the user must be ready for it, which makes the timing of the encounter crucial. This may set back digital tools that are geared towards highly targeted search, where a user is already in the mindset of looking for a specific item.

Beyond a prepared mind, a user must notice the find, stop to review it, extract the information, and finally return to it for future use. In each of these steps, design and user experience play a crucial role — even beyond the initial question of engineering a random-but-relevant encounter.

Engineering serendipity

A user must be mentally prepared for an accidental insight, but it’s unlikely that they’re thinking “I’m feeling serendipitous today.” So standalone platforms that encourage random discoveries could limit the ways in which serendipity can integrate into our digital lives.

But Quan-Haase says that adding serendipity to targeted search would make little sense, given how long we have spent honing the search experience for specific results. Instead, perhaps, we could augment rather than replace search — something like a “serendipity widget,” for example, in the sidebar of a search interface.

Such a widget could display articles at random pertaining to the keyword — or perhaps it could target a little further. One could envision a system that looks at your past searches and attempts to blend them with your present one, or grabs exclusively from sources you don’t normally peruse. You might call this a separate facet for targeted discovery rather than a truly serendipitous encounter (again, there are levels of serendipity), but it could serve the goal of finding what you didn’t know you wanted.
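As a rough illustration of the widget idea above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the function name, the article records (dictionaries with "title", "source", and lowercase "keywords"), and the notion of a "usual sources" set are hypothetical, not part of any existing tool.

```python
import random


def serendipity_widget(query, past_queries, articles, usual_sources, k=3, rng=None):
    """Pick up to k random articles for a sidebar (hypothetical sketch)."""
    rng = rng or random.Random()

    # Blend the present query with past ones into a single bag of terms.
    terms = set(query.lower().split())
    for past in past_queries:
        terms.update(past.lower().split())

    # Keep articles that share at least one term with the blend
    # and that come from a source the user does not normally read.
    pool = [
        article
        for article in articles
        if terms & set(article["keywords"])
        and article["source"] not in usual_sources
    ]

    # Random rather than ranked: the point is the chance encounter.
    return rng.sample(pool, min(k, len(pool)))
```

Sampling at random from a loosely relevant pool, instead of ranking it, is the design choice that keeps the widget from collapsing back into ordinary search.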

Many users in Quan-Haase’s studies cite Twitter as a serendipitous platform. I for one have found many of my most useful sources while randomly browsing Twitter, sometimes after hours of fruitless searching in specialized databases. I know I don’t see every tweet by everyone I follow, but I also know that some of the most inspiring tweets or links won’t be found by simple heuristics like most-retweeted or most-favorited — so I often follow my firehose in hopes of a nugget of gold, and quite often I am not disappointed.

This suggests that Twitter may be a more serendipitous platform than Facebook or Google, which emphasize more targeted customization and personalization. That quality, along with the Twitter API’s ease of use, might also explain why many organizations use Twitter to create whimsical bots that inject a bit of randomness into your feed.

For instance, the Digital Public Library of America’s DPLA Bot grabs a random noun and uses its API to share the first result it finds. Lamenting that “the API has no means of calling up totally random items,” the DPLA Bot aims to “infuse what we all love about libraries — serendipitous discovery — into the DPLA.” For now though, this random dive into digital stacks is not personalized, which means you could be in the wrong section of the library.
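The random-noun approach described above can be sketched in a few lines of Python. This is not the DPLA Bot's actual code: the search function is passed in as a hypothetical stand-in for whatever API client a bot would use, and the retry limit is an assumption added for the example.

```python
import random


def random_item(nouns, search, rng=None, max_tries=5):
    """Return (noun, first_result) for a randomly chosen noun, or None.

    `search` is a hypothetical callable that takes a keyword and returns
    a list of results; it stands in for a real collection API client.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        noun = rng.choice(nouns)
        results = search(noun)
        if results:
            # Share the first hit, as the bot described above does.
            return noun, results[0]
    # Every attempt came back empty.
    return None
```

Because the noun is chosen without reference to the reader, the result is random but not personalized, which is exactly the limitation noted above.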

The British Library’s Mechanical Curator similarly posts random resources with no customization, but its special focus on images in the library’s 17th- to 19th-century collections gives it a lighter and more visual feel. More for curiosity seekers than serious researchers, the library suggests on its blog that “the pursuit of knowledge is not the point.”

The TroveNewsBot, built on the National Library of Australia’s 370 million resources, features more interactivity. Send the bot any text, and it will dig through the Trove API for a matching result.

It doesn’t stop there: adding #earliest gives the first result in their collection, #latest the most recent; you can also limit the query by year and location. Give the bot a URL and it will fetch the link’s keywords and query the API with them, allowing TroveNewsBot to “respond” to any article on the web. The bot strikes a nice balance between targeted search and random luck, although your luck starts to run out if your interests lie far from Trove’s collections (primarily, Australian newspapers published between 1803 and 1954). Regardless, it’s good fun, as exemplified by the TroveNewsBot’s guide to child rearing.
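To make the modifier idea concrete, here is a small Python sketch of how a bot might separate the #earliest and #latest tags from the search text. It is an illustration only, not TroveNewsBot's implementation: the function names and the result records (dictionaries with an ISO-format "date") are hypothetical, and the year and location limits are left out because their syntax is not given here.

```python
def parse_request(text):
    """Split a message into search words and a sort preference."""
    sort = "relevance"
    words = []
    for token in text.split():
        tag = token.lower()
        if tag == "#earliest":
            sort = "earliest"
        elif tag == "#latest":
            sort = "latest"
        else:
            words.append(token)
    return {"query": " ".join(words), "sort": sort}


def pick_result(results, sort):
    """Choose one result; ISO dates ("1854-03-01") sort as plain strings."""
    if not results:
        return None
    if sort == "earliest":
        return min(results, key=lambda r: r["date"])
    if sort == "latest":
        return max(results, key=lambda r: r["date"])
    # Default: trust the order the search returned.
    return results[0]
```

For example, `parse_request("child rearing #earliest")` yields the query "child rearing" with the sort set to "earliest".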

Designing for serendipity

Veering away from Twitter, one tool that seems to get serendipity right is Serendip-o-matic, a project of the One Week | One Tool initiative. Brian Croxall explains that due to the project’s one-week time frame, experimentation and play were baked into the development process, and emphasized at the outset over feature-complete engineering marvels. Rather than using language like “select” or “upload,” they suggest that you “grab some text.” When you hit “Make some magic!” the tool peruses digital collections from the DPLA, Trove, Europeana, and Flickr, returning a series of multimedia documents that hopefully broaden your horizons on the topic at hand.

As might be expected, some results are more serendipitous than others. It’s also hard to know why a certain image or document was selected, which could otherwise be helpful in directing future searches. All the same, Serendip-o-matic’s playful setup and language prime the user well for making accidental discoveries.

These tools (along with others, such as the EuropeanaBot) are primarily targeting digital humanists and historians who are in a rut, but they each have their own insights about what is serendipitous versus simply random. It is difficult to plan for unplanned discoveries, especially so for a computer. Events are only serendipitous in hindsight, consisting of varying levels of planning versus dumb luck. But it seems quite possible to design for serendipitous discoveries, and to help put a user in the mindset for it.

Imagine a “serendipity widget” in your Facebook or Twitter feed, or on the sidebar of a New York Times article. The number and variety of signals that could go into it are endless, and many would bring their own biases. All the same, it would at least offer another pathway into news that relies on different assumptions, adds a sense of playfulness, and reminds a user that there’s more than one way to slice content.

Injecting randomness and play into recommendation systems could be valuable in its own right, but it seems especially timely given the current moment’s intense focus on content personalization. We all want relevant information, but perhaps you want to see something that users unlike you liked, or something no one has ever stumbled across before. Controlled randomness could be one small way to push back on hyper-curation.
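One way to picture controlled randomness is a feed that occasionally swaps a personalized pick for something from outside the ranking. The Python sketch below is a hypothetical illustration under that assumption; the function name and the `rate` parameter are invented for the example and do not describe any platform's system.

```python
import random


def mix_in_randomness(ranked, catalog, rate=0.2, rng=None):
    """Replace roughly `rate` of the ranked slots with unranked items."""
    rng = rng or random.Random()

    # Items the personalized ranking did not surface at all.
    outsiders = [item for item in catalog if item not in ranked]
    rng.shuffle(outsiders)

    mixed = []
    for item in ranked:
        if outsiders and rng.random() < rate:
            # Give this slot to a chance encounter.
            mixed.append(outsiders.pop())
        else:
            mixed.append(item)
    return mixed
```

Setting `rate` to 0 leaves the personalized feed untouched, while higher values trade relevance for surprise, so the amount of randomness stays under the designer's control.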

Photo by Bob Gaffney used under a Creative Commons license.