Charlottetown Building Permits in RSS

From rukapedia
Revision as of 10:38, 16 February 2006 by Peter

This page documents an experimental script that scrapes the City of Charlottetown website and creates an RSS feed of weekly building permit summaries.

Download Source Code

You can browse the source code in a web browser, or check out the latest version with Subversion from:

svn://svn.reinvented.net/CharlottetownRSS/BuildingPermits

or

http://svn.reinvented.net/CharlottetownRSS/BuildingPermits

Dependencies

The script has the following dependencies:

MySQL Table for Permit Summary Cache

You'll need a MySQL server where you can create a table to hold a cache of the weekly summaries already discovered. In a database on that server, create a table with the following structure:

CREATE TABLE `permitcache` (
  `number` int(11) NOT NULL auto_increment,
  `dateadded` datetime NOT NULL default '0000-00-00 00:00:00',
  `url` text NOT NULL,
  `heading` varchar(50) NOT NULL default '',
  `filesize` int(11) NOT NULL default '0',
  UNIQUE KEY `numberdex` (`number`)
);

You don't have to call the table permitcache; you can set any name you like for it in the script.

Install

To get the script running, take the following steps:

  1. Create the MySQL table as above, and note the hostname, database name, table name, and the username and password used to connect.
  2. In the permits2rss.php script, modify the user-configurable options:
    1. Set the MySQL server, database and table information.
    2. Change the URL of the web page containing the index of weekly permit summaries, if required.
    3. Change the variables used to store information to be embedded in the RSS feed, if required.
    4. Set the location for the RSS feed to be created; this should be a web-accessible file location if you want to make the RSS feed public.

If all goes according to plan, you should now be able to run the script as follows:

php ./permits2rss.php

...and the result should be an RSS file in the location you specified.
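
To make the intended output more concrete, here is a minimal sketch of how rows from the cache table might be turned into an RSS 2.0 document. It is not taken from permits2rss.php: the function name build_rss and the $channel array keys are hypothetical, the row keys simply reuse the heading, url and dateadded columns from the table above, and treating dateadded as UTC is an assumption.

```php
<?php
// Hypothetical helper, NOT the code in permits2rss.php.
// $channel: array with 'title', 'link' and 'description'.
// $rows: arrays keyed by the permitcache columns 'heading', 'url', 'dateadded'.
function build_rss(array $channel, array $rows)
{
    // Escape text so it is safe inside an XML element.
    $esc = function ($text) {
        return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<rss version="2.0">' . "\n<channel>\n";
    foreach (array('title', 'link', 'description') as $field) {
        $xml .= "<$field>" . $esc($channel[$field]) . "</$field>\n";
    }

    foreach ($rows as $row) {
        // Assumption: the MySQL datetime is stored in UTC.
        $added = new DateTime($row['dateadded'], new DateTimeZone('UTC'));
        $xml .= "<item>\n";
        $xml .= '<title>' . $esc($row['heading']) . "</title>\n";
        $xml .= '<link>' . $esc($row['url']) . "</link>\n";
        $xml .= '<guid>' . $esc($row['url']) . "</guid>\n";
        // RSS 2.0 expects RFC 822 style dates; DATE_RSS is PHP's format for that.
        $xml .= '<pubDate>' . $added->format(DATE_RSS) . "</pubDate>\n";
        $xml .= "</item>\n";
    }

    return $xml . "</channel>\n</rss>\n";
}
```

The returned string could then be written to the configured feed location with file_put_contents().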

For the feed to be useful, set the script up to run regularly, perhaps once a day, as a cron job, so that the RSS file stays current.

Example

I've set up a Test RSS Feed. Examine the contents of this feed if you want to see the intended result of the script.

Bugs and To Do

The City of Charlottetown web server appears to be somewhat flaky, and will sometimes return an HTTP/1.1 500 Internal Server Error, which causes the script to fail like this:

Warning: file_get_contents(http://www.city.charlottetown.pe.ca/residents/application_fees.cfm): failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error
 in /root/cron/buildingpermits/permits2rss.php on line 161

The code should probably be more graceful about handling this.
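
One possible approach, sketched here rather than taken from the script, is to wrap the fetch in a small retry helper and give up quietly if the server keeps failing. The function name fetch_with_retry and its parameters are hypothetical; the only behaviour it relies on is that file_get_contents() returns false when the HTTP request fails, with the @ operator suppressing the warning shown above.

```php
<?php
// Hypothetical helper, NOT part of permits2rss.php.
// Tries to fetch $url up to $attempts times, pausing $delaySeconds between
// tries. Returns the response body, or false if every attempt failed.
// $fetcher is optional and exists so the fetch can be swapped out in tests.
function fetch_with_retry($url, $attempts = 3, $delaySeconds = 5, $fetcher = null)
{
    if ($fetcher === null) {
        // Suppress the warning; the false return value signals the failure.
        $fetcher = function ($u) {
            return @file_get_contents($u);
        };
    }

    for ($try = 1; $try <= $attempts; $try++) {
        $body = $fetcher($url);
        if ($body !== false) {
            return $body;
        }
        // Wait before the next attempt, but not after the last one.
        if ($try < $attempts) {
            sleep($delaySeconds);
        }
    }

    return false;
}

// Possible use in place of the bare file_get_contents() call:
//
//   $html = fetch_with_retry('http://www.city.charlottetown.pe.ca/residents/application_fees.cfm');
//   if ($html === false) {
//       // Leave the existing RSS file alone and wait for the next cron run.
//       exit(1);
//   }
```

Exiting without touching the existing RSS file means a temporary outage on the City's server leaves subscribers with the last good feed instead of a broken one.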