Charlottetown Building Permits in RSS
This page documents an experimental script to scrape the City of Charlottetown website to create an RSS feed of weekly building permit summaries.
Download Source Code
You can browse the source code from a browser, or grab the latest version of the source code using Subversion at:
The script has the following dependencies:
- The XML_HTMLSax PEAR package. You can probably install this with:
pear install XML_HTMLSax
- The sweet FeedCreator.class.php RSS creation class.
MySQL Table for Permit Summary Cache
You'll need a MySQL server where you can create a table to hold a cache of the weekly summaries already discovered. In a database on that server, create a table with the following structure:
CREATE TABLE `permitcache` (
  `number` int(11) NOT NULL auto_increment,
  `dateadded` datetime NOT NULL default '0000-00-00 00:00:00',
  `url` text NOT NULL,
  `heading` varchar(50) NOT NULL default '',
  `filesize` int(11) NOT NULL default '0',
  UNIQUE KEY `numberdex` (`number`)
)
You don't have to call the table permitcache -- you can set an arbitrary name for it in the script.
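To illustrate how a cache table like this might be consulted, here is a minimal sketch. It is not taken from permits2rss.php: the function name cache_summary_if_new, the use of PDO, and the rule "a summary is new if its URL is not yet in the table" are all assumptions made for the example.

```php
<?php
// Sketch only: the function name, the PDO connection, and the "new if URL unseen"
// rule are assumptions for illustration, not the script's actual code.
function cache_summary_if_new(PDO $db, string $table, string $url, string $heading, int $filesize): bool
{
    // Table names cannot be bound as query parameters, so accept only simple identifiers.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException('Unexpected table name');
    }

    // Has this summary URL been recorded already?
    $check = $db->prepare("SELECT COUNT(*) FROM $table WHERE url = ?");
    $check->execute([$url]);
    if ((int) $check->fetchColumn() > 0) {
        return false; // already in the cache, nothing to add
    }

    // Record the newly discovered summary; `number` is filled in by auto_increment.
    $insert = $db->prepare(
        "INSERT INTO $table (dateadded, url, heading, filesize) VALUES (?, ?, ?, ?)"
    );
    $insert->execute([date('Y-m-d H:i:s'), $url, $heading, $filesize]);
    return true;
}
```

With a connection opened along the lines of new PDO('mysql:host=...;dbname=...', $user, $password), where the host, database, and credentials are whatever you noted for your own server, the function returns true the first time a summary URL is seen and false on later runs.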
To get the script running, take the following steps:
- Create the MySQL table as above, and note the hostname, database name, table name, and authentication information for the table.
- In the permits2rss.php script, modify the user-configurable options:
- Set the MySQL server, database and table information.
- Change the URL of the web page containing the index of weekly permit summaries, if required.
- Change the variables used to store information to be embedded in the RSS feed, if required.
- Set the location for the RSS feed to be created; this should be a web-accessible file location if you want to make the RSS feed public.
If all goes according to plan, you should now be able to run the script as follows:
...and the result should be an RSS file in the location you specified.
For the feed to be useful, set the script up to run regularly, perhaps once a day, as a cron job, so that the RSS file is always current.
I've set up a Test RSS Feed. Examine the contents of this feed if you want to see the intended result of the script.
Bugs and To Do
The City of Charlottetown web server appears to be somewhat flaky, and will sometimes return an HTTP/1.1 500 Internal Server Error, which causes the script to fail like this:
Warning: file_get_contents(http://www.city.charlottetown.pe.ca/residents/application_fees.cfm): failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in /root/cron/buildingpermits/permits2rss.php on line 161
The code should probably be more graceful about handling this.
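One possible way to handle this, sketched here under assumptions rather than taken from the script: wrap the fetch in a small helper that suppresses the warning, retries a few times, and returns null if the page still cannot be read. The helper name fetch_page_or_null, the retry count, and the delay are all hypothetical.

```php
<?php
// Hypothetical helper, not part of permits2rss.php: fetch a page and return
// null on failure instead of letting file_get_contents() emit a warning.
function fetch_page_or_null(string $url, int $retries = 2, int $delaySeconds = 5): ?string
{
    // Give up on a single request after 30 seconds (an arbitrary choice).
    $context = stream_context_create(['http' => ['timeout' => 30]]);

    for ($attempt = 0; $attempt <= $retries; $attempt++) {
        // @ silences the warning; failure is reported through the false return value.
        $html = @file_get_contents($url, false, $context);
        if ($html !== false) {
            return $html;
        }
        // Pause before trying again, but not after the final attempt.
        if ($attempt < $retries) {
            sleep($delaySeconds);
        }
    }
    return null;
}
```

A caller could then check for null and exit quietly, leaving the existing RSS file and cache untouched until the next scheduled run.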