Presented with Daniel Craigmile at NICAR 2016.
Whether they’re built by data journalists, search engines, web archivists or malicious troublemakers, bots are substantial users of news websites. At The Texas Tribune, bots account for half of our site’s traffic, and they’re responsible for our search presence and our archival legacy, as well as for site attacks and performance problems. In this session, we dove into the world of bot users and shared some tips for identifying and managing these crawlers and scrapers: helping the “good” bots do their work and keeping the “bad” ones from wreaking havoc.
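A minimal sketch of the kind of identification work the session covered (not code from the talk): scan a combined-format access log and tally requests whose User-Agent matches a few well-known crawler signatures. The log path and the signature list here are illustrative assumptions.

```python
import re
from collections import Counter

# Tokens that well-behaved crawlers commonly include in their User-Agent headers.
BOT_SIGNATURES = ("googlebot", "bingbot", "yandex", "baiduspider",
                  "ia_archiver", "archive.org_bot", "facebookexternalhit")

# In combined-format logs, the User-Agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def classify(line):
    """Return 'bot', 'human', or 'unknown' for one access-log line."""
    match = UA_PATTERN.search(line)
    if not match:
        return "unknown"
    ua = match.group(1).lower()
    return "bot" if any(sig in ua for sig in BOT_SIGNATURES) else "human"

if __name__ == "__main__":
    counts = Counter()
    with open("access.log") as log:  # hypothetical log path
        for line in log:
            counts[classify(line)] += 1
    total = sum(counts.values()) or 1
    for label, n in counts.most_common():
        print(f"{label:>8}: {n:>8} ({n / total:.1%})")
```

Note that this only catches bots that announce themselves; scrapers that spoof a browser User-Agent need other signals, such as request rate or IP reputation.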
We approached the topic through the lens of a mysterious site that served an exact mirror of The Texas Tribune for several weeks, straining our servers and skewing our analytics. In our effort to identify the source and method behind this mystery site, we enlisted the aid of investigative reporters, business staff, and systems engineers. Along the way we learned a great deal about bots, and how much they have changed and multiplied in recent years. We believe that closer tracking of bot users could lead to new stories and insights for data and tech reporting.