Reminiscing about Provo411.com and Scraping the Course Catalog

One of my first web development projects and biz partnerships with Brian Stucki was Provo411.com. We were roommates at BYU and conceived of a website where students could share events — parties, concerts, football games, etc. We were already in our beds for the night when the idea came, but we couldn’t go to sleep before buying the domain. I think it was the first domain I ever bought. It was September 2002.

I developed a calendar in PHP and wrote a few scripts to scrape byucougars.com and retrieve the sports schedules. I also developed a WML app so Brian and I could add events to the calendar from our pre-iPhone mobile phones. I recall being at a party in south Provo, in a former dental office, and using my Nextel phone to add the party to Provo411. If you go back far enough, you can see events on the calendar. My brother Alan did the artwork.

I always wanted Provo411.com to have a course schedule alert system. Perhaps students would pay $3 to receive an email or SMS alert when hard-to-get classes had an opening. It shouldn’t have been hard technically, but the publicly available course catalog isn’t updated in real time. I could have scraped the authenticated course catalog on Route Y, but BYU might have objected, and it would have been a fragile business model.

My brother Michael recently came home from his mission and started school at CSN. The business classes he wanted were full, so I put the old “course schedule alert” idea to the test with some new tools — Ruby and Mac OS X’s speech. Here’s what I came up with:

#!/usr/bin/env ruby

# Output an audible message via Mac OS X's speech function.
# Defined first so it exists before the script calls it.
def say(message)
    system("say", message)
end

# a list of course call numbers to check
call_numbers = %w{ 46405 46407 46409 46411 46415 46413 53252 53254 53256 53258 53260 53262 53268 53270 53272 53274 46423 46435 53276 46443 }

# auth_token obtained via Firefox+TamperData while my brother logged into CSN
auth_token = "123456789012345"

say "Checking"

call_numbers.uniq.sort.each do |call_number|
    # POST the lookup form for one call number and capture the headers and body
    c = `curl -si -d CONVTOKEN=#{auth_token} -d AUDITT=N -d CALLT=#{call_number} -d CONTINUE=Continue "https://bighorn.nevada.edu/sis_csn/XSMBWEBM/SIVRE04.STR"`
    print "Call number #{call_number}: "
    if c =~ /<p class="p5">([^<]+)<br\/>/m
        if $1.strip.empty?
            # a blank message paragraph means no "class is full" notice
            puts "May have openings"
            3.times { say "Michael, class number #{call_number} may be open!" }
        else
            puts $1.strip
        end
    else
        puts "could not find message"
        say "Help. I cannot access the C S N website."
        # stop the whole run; the token has probably expired
        exit 1
    end
    # pause between requests so the site isn't hammered
    sleep 5
end
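
The parsing step can be tried on its own without touching the CSN site. Here’s a minimal sketch, assuming the status message sits in the same `<p class="p5">` paragraph followed by `<br/>` that the script looks for. The `seat_status` helper and the sample HTML are made up for illustration and are not taken from the real page.

```ruby
# Hypothetical helper: classify one response body from the registration page.
# Returns :open when the message paragraph is blank, the message text when
# one is present, and nil when the expected markup is missing.
STATUS_PATTERN = %r{<p class="p5">([^<]+)<br/>}m

def seat_status(html)
  match = STATUS_PATTERN.match(html)
  return nil unless match

  message = match[1].strip
  message.empty? ? :open : message
end

# Made-up sample bodies, not real CSN output
full_page  = %(<p class="p5">Class is full<br/></p>)
open_page  = %(<p class="p5">   <br/></p>)
error_page = %(<html>Session expired</html>)

p seat_status(full_page)   # prints "Class is full"
p seat_status(open_page)   # prints :open
p seat_status(error_page)  # prints nil
```

Keeping the pattern in one small function like this would make it easy to adjust if the page layout changes.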

We set this to run every 15 minutes on the living room iMac, and we turned up the volume. Every 15 minutes we could hear “Checking” from the computer. A few hours later we heard the script announce that a class had opened up. Michael, I’m still waiting for my $3.

9 replies on “Reminiscing about Provo411.com and Scraping the Course Catalog”

  1. i dont understand script, but i love this story. i remember when you told me about this when you used it at BYU–i thought “he should charge 3$ for this!”
      1. Hey, it seems a lot of people have had the same idea! I am the admin at schedule snatcher and we are thinking of moving up to BYU-I. I was wondering if any of you guys would like to help out with the code you have already been using. 😀
  2. Richard! this is Jesse. I knew you from enclave… i was friends with jamie woodward. I found your site googling something. anyway, crazy script. genius.
  3. Cool script. You might want to also check out the ‘mechanize’ gem. It acts as a browser in maintaining session cookies, setting the User Agent string to appear to be Safari/Firefox/IE, etc. It has CSS selector support for scraping information that you want from a page or submitting forms. It easily turns any website into a web service.


    Jimmy

  4. @D. Jamison: Awesome thoughts and comments. Thanks. I see it’s not a lone interest of mine.

    Your idea of giving people software they can run on their own computer seems on the money. Wesabe.com does this (with personal finances) – they offer a desktop app for logging into your bank account and pulling in your transactions. Your bank password stays safely on your machine. I imagine this should nearly eradicate concerns about liability, anonymizing traffic, randomizing requests, etc.

  5. I’m glad to see that others recognize the need for a third party to assist in course registration. This seems to be a common problem and I don’t expect the schools have any interest in addressing it. I had written something like what you have there for undergrad. I went a couple steps further: mine registered me for the class and sent me a text when I was in.

    I let a couple friends use my code but I otherwise forgot about the idea until law school. Something about those first-year grades really gives everyone a lot of anxiety. And we all wanted to have our results the instant they came out. I was even checking constantly on the weekends, knowing that they wouldn’t be released on a Saturday. I finally put together a website that automated the task. I had some friends help test it. It worked as advertised, sending alerts when new grades arrived.

    I realized, like you did, that the school may not appreciate being hammered by such a system. I came up with a couple solutions. The first is to add some intelligence to the checking. You can aggregate requests for a particular course (whether it be for grades or for a spot in registration) and then your requests become O(num. of courses) instead of O(num. of users) (depending on how the school’s system works, of course). I did some research and discovered that many schools use the same 3rd parties for their web/database systems. So expanding to several schools seemed like it would be easy: customizing, when necessary, for proprietary systems.

    But that only works if students want to be notified when grades are available – not for alerting the students to their individual grade. In order to do that, the student has to authenticate to the site. Sure, aggregation could reduce the polling for updates and then the individuals could be logged in to retrieve corresponding data (or register them). It just seemed to snowball into lots of traffic. More than that – we’d be requiring the credentials of the students in order to do their F5ing for them. There are privacy/security issues there and I’m sure it violates the policies of every school.

    The solution was to put the tools at the edge: let the users run the software. And this is where I abandoned the project – I didn’t care about my grades any longer. But the idea was this: give users some app that they can run on their machine. The credentials remain with the student, and the traffic hitting the school is now distributed. The traffic could even be made to look like human browsing (with a fake user-agent and random delays between requests).

    Anyway, I think the demand is there. Students who desperately need a class to graduate will pay for this and I think overzealous grade checkers will too. The money comes in through sales of apps that expire and use the right licenses. Maybe there’s a web-based solution that can be platform-independent, but I don’t know what it is.
