Software: RSS and CaRP

By mtrac · Jan 31, 2005
    Introduction

    I run a community-oriented VBulletin BBS and VBAdvanced Portal. This article is based on my experience as both a provider and user of RSS feeds.

    What is RSS?

    RSS is an abbreviation for Really Simple Syndication. You can get the technical details at http://blogs.law.harvard.edu/tech/rss. Also see http://www.geckotribe.com/rss/about-rss.php. Rather than worrying about what it is, you should think about what it can do for you as an administrator:
    • You can distribute your board’s latest content to anyone with an RSS reader who wants it. That includes other areas of your board. Users don’t even have to visit, unless they see something that interests them. If you use VB, I recommend this hack: http://www.vbulletin.org/forum/showthread.php?t=69834.
    • You can get fresh, free content for your site from throughout the Internet. RSS feeds aren’t limited to news and geek sites; even the venerable Farmer’s Almanac has several. Some sites that normally require registration, such as the New York Times, give RSS users a free pass. But how to pull feeds? With CaRP.
    What is CaRP?

    CaRP is a PHP-based Caching RSS Processor, available at http://www.geckotribe.com/rss/carp. It can take pretty much any RSS feed, including a non-compliant one, and generate HTML from it. Its power comes from its caching, formatting, and filtering capabilities. It is also fairly easy to use. Two lines of code will get you on your way.

    The GPL version is free, but you will probably need to purchase a more capable version as you get deeper into RSS. It’s not expensive. There is a feature comparison at http://www.geckotribe.com/rss/carp/features.php.

    Installing CaRP

    Download and follow the directions. The semi-automated installation didn’t work for me, so I had to change access permissions manually. For many of you this may be second nature, but the last time I had to worry about, let alone change, *NIX permissions was on a PDP-11/70 around 1983. I used my ISP’s utility. You’ll also need to understand your directory structure, which won’t look like your URL.

    CaRP’s author offers an installation service if you get in over your head. You might – I’m not the only one I know who initially had trouble. You can also try seeking assistance at http://www.voy.com/188105/.

    All installed! Now what?

    Two pieces of advice. First, the files you create need to end with .php, not .htm or .html; e.g., index.php. Second, don’t use an HTML editor, such as FrontPage, unless you can directly edit the code. Otherwise, the editor will automagically turn your code into a bunch of HTML special characters. “&lt;?php” is not going to work. Windows Notepad serves my purposes. If you need something more elaborate, try http://www.thelinuxconsultancy.co.uk/phpeditors.
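
    To see what such an editor does to your code, here is a small sketch that uses PHP's own htmlspecialchars function to mimic the conversion. The variable names are made up for illustration, and a real editor may escape more or less than this.

```php
<?php
// What you typed: the opening tag the PHP interpreter looks for.
$typed = '<?php';

// What a WYSIWYG editor may save instead: the angle bracket becomes
// an HTML entity, so the server no longer sees a PHP tag at all.
$saved = htmlspecialchars($typed);

echo $saved, "\n"; // prints &lt;?php
```

    If you view the source of a page that shows your code instead of running it, this is the first thing to look for.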

    Create the following file, call it carptest.php, and put it on your server. Note that “path” will generally be “home” and “to” is generally your user account. Also note *NIX servers are case-sensitive.

    PHP:
    <?php
    require_once '/path/to/carp/carp.php';
    CarpCacheShow('http://www.geckotribe.com/press/rss/pr.rss');
    ?>
    Try loading the file in your browser; e.g., http://www.yourdomain.com/carptest.php. Hopefully, you’ll get a list of Gecko Tribe press releases.

    [R]equire_once will always come before any other CaRP statements. CarpCacheShow does quite a bit behind the scenes. It retrieves an RSS feed from the source, caches it on the server, and then displays it. Successive calls to CarpCacheShow will retrieve the feed from the local cache, not Gecko Tribe, until the cache interval elapses. The default interval is 60 minutes, though like almost everything else it can be configured.
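
    The caching rule described above can be sketched in a few lines. This is not CaRP's code; cache_is_fresh is a hypothetical helper that only illustrates the idea of serving the local copy until the interval has elapsed.

```php
<?php
// Hypothetical helper, not part of CaRP: decide whether a cached copy
// is still fresh enough to serve without contacting the feed's source.
function cache_is_fresh(int $cachedAt, int $now, int $intervalMinutes = 60): bool
{
    return ($now - $cachedAt) < $intervalMinutes * 60;
}

$cachedAt = 1000000; // time the feed was cached, in Unix seconds

var_dump(cache_is_fresh($cachedAt, $cachedAt + 59 * 60)); // bool(true): serve the cache
var_dump(cache_is_fresh($cachedAt, $cachedAt + 61 * 60)); // bool(false): fetch again
```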

    Some examples

    When I first added weather to my site, I used the National Weather Service’s RSS feed. While timely, all it currently has is inclement and severe weather warnings. I later added a feed from WeatherClicks, and ultimately dropped the NWS. The following is stripped-down, but works.

    PHP:
    <?php
    require_once '/home/account/carp/carp.php';

    // Override CaRP's defaults before showing the feed.
    CarpConf('poweredby','');
    CarpConf('carperrors',0);
    CarpConf('cborder','image');
    CarpConf('filterout','title:ad');
    CarpConf('cacheinterval',60);

    // Second argument names the cache file.
    CarpCacheShow('http://www.weatherclicks.com/rss/07020','carpweathertest');
    ?>
    Note the numerous CarpConf statements. They override CaRP’s default settings:

    poweredby: Removes the CaRP blurb at the end of each feed. CaRP’s author requests you not do this with the free version, but you can if you want.

    carperrors: Setting this to zero suppresses errors. I learned the hard way that feeds can be perfect 364 days a year and go haywire the 365th. This is the feed’s fault, not CaRP’s.

    cborder: Allows you to specify what information you want from the channel. CaRP has its own internal field names for title, link, url, image, date, author, and description. The only thing I want is the image. Note you’ll need either Koi or Evolution to display images. Important: you should familiarize yourself with the XML making up the feed. It’s not rocket science. Otherwise, you will grope in the dark for the feed’s contents, and what you want to use and remove.
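
    One way to get familiar with a feed's XML is to list the element names it uses. The sketch below uses PHP's SimpleXML extension on a made-up, minimal RSS 2.0 document; a real feed will have more fields, and you would load its contents in place of the sample string.

```php
<?php
// A made-up, minimal RSS 2.0 document standing in for a real feed.
$xml = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weather</title>
    <link>http://www.example.com/</link>
    <description>Sample channel</description>
    <item>
      <title>Sunny</title>
      <link>http://www.example.com/1</link>
      <description>Clear skies all day.</description>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($xml);

// List the channel-level element names.
$channelFields = [];
foreach ($feed->channel->children() as $child) {
    $channelFields[] = $child->getName();
}
echo implode(', ', $channelFields), "\n";

// List the element names carried by the first item.
$itemFields = [];
foreach ($feed->channel->item[0]->children() as $child) {
    $itemFields[] = $child->getName();
}
echo implode(', ', $itemFields), "\n";
```

    Once you know which elements a feed actually carries, it is much easier to decide what to show and what to drop.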

    filterout: I’m removing some advertising from the feed. You’re on your own regarding the legality of filtering feeds, but in this example the feed is providing scraped U.S. Government data. Note I’ve left in their logo and a bunch of clickable links, so I don’t feel too bad. [F]ilterout is available in the GPL and Koi versions. Evolution includes it, but also offers a plugin with vastly better filtering than the other versions.
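
    The idea behind a setting like 'title:ad' can be illustrated with plain PHP. This is a rough sketch only: the filter_out function and the sample items are hypothetical, and the case-insensitive substring match is my assumption, so CaRP's own matching rules may well differ.

```php
<?php
// Hypothetical stand-in for a parsed feed: each item is an array of fields.
$items = [
    ['title' => 'Ad: Buy umbrellas', 'link' => 'http://www.example.com/ad'],
    ['title' => 'Today: Sunny',      'link' => 'http://www.example.com/today'],
];

// Keep only the items whose given field does NOT contain the word.
// Assumption: case-insensitive substring match; CaRP may match differently.
function filter_out(array $items, string $field, string $word): array
{
    return array_values(array_filter(
        $items,
        fn(array $item): bool => stripos($item[$field] ?? '', $word) === false
    ));
}

$kept = filter_out($items, 'title', 'ad');
foreach ($kept as $item) {
    echo $item['title'], "\n";
}
```

    A short word such as "ad" can also match inside longer words under a substring rule, so check the filtered output against the raw feed before trusting it.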

    cacheinterval: This is superfluous since the default is 60 minutes, but it reminds me how often the feed is refreshed.

    Now it’s showtime, literally. CaRP grabs the feed, applies the configuration settings to it, stores it locally on the server in a file called carpweathertest, and displays it. I suggest you use a named cache file, though it's not essential. Important: developing a script is usually an iterative process. You create your PHP file, run it, and see how it looks. While CaRP has a function to clear the cache, I suggest you go to the server and physically delete the cache file. If you don’t, CaRP will retrieve the original cache contents all day long. Well, for the cache interval, anyhow. Nothing is more frustrating than changing your code but getting the same results.
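
    If you would rather delete the cache file from a script than by hand, a minimal sketch might look like the following. The delete_cache_file helper is hypothetical, and the demonstration uses a throwaway temporary file because the location of your CaRP cache directory depends on your installation.

```php
<?php
// Hypothetical helper: remove a cache file so the next run fetches fresh data.
// Returns true if the file is gone afterwards.
function delete_cache_file(string $path): bool
{
    if (is_file($path)) {
        return unlink($path);
    }
    return true; // nothing to delete
}

// Demonstration with a throwaway file. In practice $path would point at the
// named cache file in your own cache directory (an assumption to adjust).
$path = tempnam(sys_get_temp_dir(), 'carp');
file_put_contents($path, 'stale feed output');

var_dump(delete_cache_file($path)); // bool(true)
var_dump(file_exists($path));       // bool(false)
```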

    As mentioned in the introduction, I use the VBAdvanced portal. I currently pull two feeds, in two separate modules. The base for both modules is the code at http://www.vbadvanced.com/forum/showthread.php?t=1011. The next example demonstrates Evolution, but only to overcome what is either a bug or an annoyance.

    PHP:
    <?php
    $feed = 'Today\'s Advice';

    // Capture CaRP's output in a buffer instead of sending it to the browser.
    ob_start();
    require_once '/home/account/carp/carp.php';
    CarpLoadPlugin('replacetext.php');
    CarpConf('carperrors',0);
    CarpConf('cachetime','6:00');
    CarpConf('cborder','');
    CarpConf('iorder','desc');
    CarpConf('poweredby','');
    CarpConf('descriptiontags','b|/b|i|/i|a|/a|p|/p');
    ReplaceTextConf(1,'desc',0,'<br />','');
    CarpCacheShow('http://www.almanac.com/rss/advice.xml','Advice');

    // Copy the buffered HTML into a string for the VBAdvanced template.
    $string = ob_get_contents();
    ob_end_clean();
    eval('$home[$mods[\'modid\']][\'content\'] .= "'.fetch_template('adv_portal_fa_advice').'";');
    CarpConfReset();
    ?>
    I’ll only describe the important changes from the previous example.

    ob_start(): This sends CaRP’s output to a buffer, after which it will go into a string.

    CarpLoadPlugin: This loads the replacetext plugin. Evolution comes with a number of plugins and offers extensibility for more.

    cachetime: This feed updates only once per day, I believe at midnight. I update the cache once per day, at 6:00AM. A minor note is that the cache only updates when the script runs. Don’t be surprised if your file time is 7:28, because CaRP itself isn’t going to go out and get it for you like a cron job. See http://www.geckotribe.com/rss/carp/docs/examples/refresh.php if you need to update on a schedule.
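
    The "refresh at a set time, but only when someone runs the script" behavior can be sketched as follows. This is not CaRP's implementation; stale_for_daily_refresh is a hypothetical helper, and the dates are sample values.

```php
<?php
// Hypothetical helper, not CaRP's code: with a daily refresh time, the cache
// is stale once the most recent refresh moment is later than the cache file.
function stale_for_daily_refresh(int $cachedAt, int $now, string $refreshTime): bool
{
    // Today's refresh moment, e.g. "6:00" today in the default timezone.
    $refresh = strtotime(date('Y-m-d', $now) . ' ' . $refreshTime);
    if ($refresh > $now) {
        $refresh = strtotime('-1 day', $refresh); // not reached yet today
    }
    return $cachedAt < $refresh;
}

date_default_timezone_set('UTC');

// The cache was written by yesterday's first visit after 6:00.
$cachedAt = strtotime('2005-01-30 07:28');

// A visit at 5:00 the next morning still gets the cached copy.
var_dump(stale_for_daily_refresh($cachedAt, strtotime('2005-01-31 05:00'), '6:00')); // bool(false)

// A visit at 7:28 triggers the refresh, so the new file time is 7:28, not 6:00.
var_dump(stale_for_daily_refresh($cachedAt, strtotime('2005-01-31 07:28'), '6:00')); // bool(true)
```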

    descriptiontags: My understanding of this is that only the HTML tags specified, each separated with a pipe character, will display in the feed. So, I’ve specified the tags I want. Unfortunately, it lets the <br /> tag through. One of these days I’ll check with the author to see why that is, but I’m pretty sure it’s intentional.
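
    For comparison, PHP's built-in strip_tags function applies the same kind of allow-list. The sample description is made up, and this is only an analogy for what descriptiontags appears to do. Note that strip_tags does remove the <br /> tag, unlike the behavior I saw with CaRP.

```php
<?php
// Made-up description containing allowed and disallowed tags.
$description = '<p>Plant <b>peas</b> <font color="red">early</font>.<br /></p>';

// Keep only the tags named in the allow-list; all other tags are removed,
// but the text inside them is kept.
$clean = strip_tags($description, '<b><i><a><p>');

echo $clean, "\n"; // prints <p>Plant <b>peas</b> early.</p>
```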

    ReplaceTextConf: This is a powerful plugin that will replace both text and regular expressions with almost anything you want. Here, I’m using it to get rid of the <br /> at the end of the feed by replacing it with an empty string.
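
    The two kinds of replacement mentioned above, plain text and regular expression, look like this in ordinary PHP. This sketch uses the standard str_replace and preg_replace functions rather than the plugin itself, and the sample text is made up.

```php
<?php
$description = 'Plant peas early.<br />';

// Plain text replacement: every "<br />" becomes an empty string.
$plain = str_replace('<br />', '', $description);

// Regular-expression replacement: only a trailing <br>, <br/> or <br />.
$regex = preg_replace('#<br\s*/?>\s*$#i', '', $description);

echo $plain, "\n"; // prints Plant peas early.
echo $regex, "\n"; // prints Plant peas early.
```

    The regular-expression form is the safer choice when you only want to trim the end of a description and leave any line breaks in the middle alone.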

    ob_get_contents(): This puts CaRP’s output into a string of HTML that VBAdvanced then displays.
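
    Stripped of everything CaRP-specific, the buffering pattern looks like this. The show_feed function is a made-up stand-in for any function that echoes HTML directly.

```php
<?php
// Made-up stand-in for a function that echoes its HTML directly.
function show_feed(): void
{
    echo '<ul><li>Sample item</li></ul>';
}

ob_start();                  // start capturing output
show_feed();                 // nothing reaches the browser yet
$string = ob_get_contents(); // copy the captured output into a string
ob_end_clean();              // discard the buffer and stop capturing

// The HTML now lives in $string and can be placed wherever it is needed.
echo strlen($string), "\n";
```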

    CarpConfReset: This resets all of CaRP’s configuration options to defaults. If you pull more than one feed, and the settings differ at all, I suggest you include this.

    Is that all?

    No! I’ve barely scratched the surface, especially concerning formatting. Download it, play with it, and RTFM. If you need help, try the support group mentioned above.

    Where can I find feeds?
    • http://www.syndic8.com
    • http://www.thefeeddirectory.com
    • http://www.completerss.com
    • http://chordata.geckotribe.com
    • The Pluck RSS reader, available at http://www.pluck.com, comes with an excellent assortment of pre-configured feeds and also offers a feed of the day.
    • Almost every major news site has one, though in some cases you need to register. Look for references to XML or RSS.
    • Try asking the site owner, though you may not like the answer you get. I was hoping to get an XML feed of local home listings from MLS, since the listings are already available to the public on their site. Turns out they have one, but it’s only available to members.
    I imagine that in the future the major search engines will index feeds directly and squash the specialized ones. Presently, they only index the smaller engines.

    Note that you won’t necessarily find a feed for everything. For instance, none of the state and local papers I’m familiar with have one on their site.

    Important: if your site offers a feed, you should submit the feed’s address wherever you can. I have mine as part of the site description. I also describe the feeds in messages and the FAQs, and have an icon that links to a FAQ.

    Conclusion

    I don’t know if RSS is the next big thing, but there is a tremendous amount of content that is yours for the taking, particularly if you run a personal or non-commercial site. You should avail yourself of it with CaRP.
