March 25, 2006, 11:38 am

scrAPIs

Though no one has a concrete definition of “Web 2.0”, it’s at least partially about APIs (Application Programming Interfaces), which allow programmers to connect applications together in meaningful ways.

But some companies and applications may be slow to offer APIs. Thor Muller suggests a concept called “scrAPIs” or scraped APIs:

“APIs are the gold standard for data access. They’re managed as critical pieces of a larger application, and it shows, with their usually reliable performance, solid documentation and adherence to best practices. But…there are real costs associated with developing, supporting and maintaining APIs. Most businesses, even Internet and software firms, have no plans to open up their APIs, and for good reason.

“So we scrape. We build little collections of tortured code that splice and dice html, text files, PDFs and other documents to pull out the structured data that our apps need. We have no idea whether we’re the only ones in the world parsing for a particular set of data, or if there are hundreds of others duplicating the effort.”

Scraping may not be cost-effective for one-off projects (though some companies make scraping easier), so it makes sense to build open-source scraped APIs that everyone can share.
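
To make the idea concrete, here is a minimal Python sketch of the kind of scraping code described above, wrapped in one reusable function that others could share. The page markup, the `member` class name, and the `scrape_members` function are all hypothetical, made up for illustration; a real site would need its own parsing rules.

```python
from html.parser import HTMLParser


class MemberParser(HTMLParser):
    """Collects the text of every <li class="member"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.members = []
        self._in_member = False

    def handle_starttag(self, tag, attrs):
        # Start a new entry only for list items marked as members.
        if tag == "li" and dict(attrs).get("class") == "member":
            self._in_member = True
            self.members.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_member = False

    def handle_data(self, data):
        # Text outside a member item (ads, navigation) is ignored.
        if self._in_member:
            self.members[-1] += data


def scrape_members(html):
    """Return a clean list of names pulled out of raw HTML."""
    parser = MemberParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.members if name.strip()]


page = (
    '<ul><li class="member"> Ann Lee </li>'
    '<li class="ad">Buy now</li>'
    '<li class="member">Bob Ray</li></ul>'
)
print(scrape_members(page))  # ['Ann Lee', 'Bob Ray']
```

The point of the wrapper is that callers see only `scrape_members`; when the site's markup changes, only the parser class has to be fixed.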

For example, the LDS Church offers private photo directories for every ward (congregation), but each person’s photo must be uploaded individually by the ward clerk. That is such a hassle that most wards don’t use the online photo directory.

On a Sunday afternoon a few weeks ago, I built a PHP script that would automatically upload all the photos for our ward. I resized the photos with ImageMagick, matched them to their names, and then gave the script my admin name and password, and it did the rest.
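
The PHP script itself is not shown here. As a rough illustration of the matching and resizing steps, here is a Python sketch instead. It assumes photos are named like `ann_lee.jpg` and that ImageMagick's `convert` command is available; the file-naming scheme, the 200x200 size, and the function names are assumptions for the example, not details of the original script.

```python
from pathlib import Path


def normalize(name):
    """Reduce 'Ann Lee' or 'ann_lee' to a comparable key like 'annlee'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_photos(names, photo_paths):
    """Pair each member name with the photo whose file name matches it."""
    by_key = {normalize(Path(p).stem): p for p in photo_paths}
    matched = {}
    unmatched = []
    for name in names:
        photo = by_key.get(normalize(name))
        if photo is None:
            unmatched.append(name)  # no photo on disk for this person
        else:
            matched[name] = photo
    return matched, unmatched


def resize_command(src, dst, size="200x200"):
    """Build (but do not run) an ImageMagick command that fits src within size."""
    return ["convert", str(src), "-resize", size, str(dst)]


matched, unmatched = match_photos(
    ["Ann Lee", "Bob Ray"], ["photos/ann_lee.jpg", "photos/carl.jpg"]
)
print(matched)    # {'Ann Lee': 'photos/ann_lee.jpg'}
print(unmatched)  # ['Bob Ray']
print(resize_command("photos/ann_lee.jpg", "small/ann_lee.jpg"))
```

To actually resize a file, the list from `resize_command` could be passed to `subprocess.run(..., check=True)`. Newer ImageMagick releases name the command `magick` rather than `convert`.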

If I develop this script into a public API that any ward clerk can use, I will have built a “scrAPI”. “Scraping programmers” like me can maintain it and make sure it continues to interface properly with the Church website, and a much larger group of API programmers can think of creative ways to get photos in and out of their ward directories. Maybe some future digital camera with Internet connectivity will upload ward pictures directly to the ward website through an API. Even if the Church never offers an API, wards could hypothetically use this scraped API.
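
One way to picture the split between the two groups of programmers is a small wrapper class. This Python sketch is hypothetical throughout: the `WardDirectory` class, its methods, and the `/login` and `/photos/upload` paths are invented for illustration and do not describe the Church website. The `post` function is passed in, so the scraping layer can be swapped out, or faked as it is here.

```python
class WardDirectory:
    """Hypothetical scrAPI: a stable interface over a site with no official API."""

    def __init__(self, post, username, password):
        # `post(path, fields)` is whatever actually talks to the website.
        self._post = post
        self._username = username
        self._password = password
        self._logged_in = False

    def _login(self):
        # Scraping programmers maintain this part when the site changes.
        self._post("/login", {"user": self._username, "pass": self._password})
        self._logged_in = True

    def upload_photo(self, member_name, photo_path):
        # API programmers only ever call this method.
        if not self._logged_in:
            self._login()
        return self._post("/photos/upload", {"name": member_name, "file": photo_path})


calls = []


def fake_post(path, fields):
    """Stand-in for real HTTP so the sketch runs without a network."""
    calls.append(path)
    return "ok"


directory = WardDirectory(fake_post, "clerk", "secret")
directory.upload_photo("Ann Lee", "small/ann_lee.jpg")
directory.upload_photo("Bob Ray", "small/bob_ray.jpg")
print(calls)  # ['/login', '/photos/upload', '/photos/upload']
```

Anyone building on `upload_photo` never needs to know how the login or the form submission works, which is what would let a scraped API stand in for an official one.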

Sources:
ThorMuller.com
ProgrammableWeb.com
