Most webloggers pay an inordinate amount of attention to their referer logs, the logs made by the web server that show the page from which a visitor linked to their site. A significant fraction of these inbound links comes from Internet search engines such as Google, Yahoo, or Excite, and the search terms people used to find your site can be as entertaining as they are informative. The German word Zeitgeist means, literally, "time ghost," and figuratively the "spirit of the time," and search engine traffic can provide a peek at this.
Many ISPs and web hosting providers offer a dry, matter-of-fact view of referer logs in the form of a "stats page," but if you want to share a view of the inbound search traffic with your readers, pointing them at your stats page is probably not the best way to go about it.
If you have access to the raw log files for your site, and your web server is configured to capture referer information, then you can use a Perl script to build a web page that shows the search terms from search engine referrals in a decorative way, with the text size of each term indicating how many times it was used to find your site.
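To make the size-by-frequency idea concrete, here is a minimal sketch of one way to map a count onto a font size. It is not taken from Zeitgeist.pm; the linear scale, the pixel range, the font_size helper, and the sample counts are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical counts of how often each search phrase led to the site.
my %count = (
    'perl zeitgeist' => 12,
    'referer logs'   => 5,
    'time ghost'     => 1,
);

# Map a count onto a font size between $min_px and $max_px (linear scale).
# $lo and $hi are the smallest and largest counts seen.
sub font_size {
    my ($n, $lo, $hi, $min_px, $max_px) = @_;
    return $min_px if $hi == $lo;    # all counts equal; avoid dividing by zero
    return int($min_px + ($n - $lo) * ($max_px - $min_px) / ($hi - $lo));
}

# Most frequent phrase first.
my @sorted = sort { $count{$b} <=> $count{$a} } keys %count;
my ($hi, $lo) = ($count{ $sorted[0] }, $count{ $sorted[-1] });

# The sample phrases contain no HTML metacharacters; real search terms
# would need escaping before being printed into a page.
for my $term (@sorted) {
    my $px = font_size($count{$term}, $lo, $hi, 10, 32);
    print qq{<span style="font-size: ${px}px">$term</span>\n};
}
```

A logarithmic scale may look better when one phrase dominates the others; the linear version is just the simplest thing to read.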
The first thing you need to do is make sure that your web server log file is recording referer information. There are two approaches to referer logs: combine them with the access log, or keep a separate log just for referrals. If your web server is already using the Combined Log Format, your log entries will look similar to this:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The http://www.example.com/start.html part is the referer. The Apache configuration for Combined Log Format is:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
If your web server is not already configured to use Combined Log Format, then you should consider making a separate referer log (because who knows what other hacks are already parsing your current log format!). To do that in Apache, add the following to your Apache configuration:
LogFormat "%h [%{Referer}i] : %U" reflog
CustomLog /var/log/httpd/referer_log reflog
(making adjustments to suit where your log files are kept). This will result in referer log entries that have the following format:
127.0.0.1 [http://www.example.com/start.html] : /apache_pb.gif
You may have to work with your ISP or web server admin to arrange these things.
Referrals from search engines usually have the form:
http://search.engine.com/search?q=this+is+what+I+seek&other=engineSpecficGarbage
The trick is to locate all of the referral URLs that come from search engines, parse out the search terms, and keep track of how often each one occurs and which page it points to.
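The steps just described can be sketched in a few lines of Perl. This is only an illustration of the idea, not the code inside Zeitgeist.pm: the search_terms helper is hypothetical, and the assumption that the query lives in a parameter named q, p, or query will not hold for every search engine. Tracking the target page is left out to keep it short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the search phrase from a referrer URL, or undef if there is none.
# Assumes the engine passes the phrase in a parameter named q, p, or query;
# real engines vary, so treat this list as illustrative.
sub search_terms {
    my ($url) = @_;
    my ($query) = $url =~ /\?([^#]*)/ or return;       # no query string
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value && $key =~ /^(?:q|p|query)$/;
        $value =~ tr/+/ /;                              # '+' encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # undo %XX escapes
        return lc $value;                               # fold case for counting
    }
    return;                                             # no search parameter
}

# Made-up referrers in the form shown above, plus one ordinary link.
my @referers = (
    'http://search.engine.com/search?q=this+is+what+I+seek&other=engineSpecficGarbage',
    'http://search.engine.com/search?other=x&q=time%20ghost',
    'http://search.engine.com/search?q=Time+Ghost',
    'http://www.example.com/start.html',    # not a search; skipped
);

# Count how many times each phrase appears.
my %count;
for my $ref (@referers) {
    my $terms = search_terms($ref);
    $count{$terms}++ if defined $terms;
}

# Print the phrases, most frequent first.
printf "%3d  %s\n", $count{$_}, $_
    for sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
```

Because the phrases are lowercased and unescaped before counting, "time%20ghost" and "Time+Ghost" are tallied as the same phrase.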
That's where the Zeitgeist.pm Perl module comes in. Once you know what format your referer logs are in, where they are kept, and where you want your zeitgeist page to live, you can use Zeitgeist.pm to write a script that does all the work. If you have separate referer logs in the format shown above, then here is a simple script that will take /var/log/httpd/referer_log and build a zeitgeist page at /home/user/www/zeitgeist.html:
#!/usr/bin/perl
use strict;
use warnings;
use Zeitgeist;    # loads Zeitgeist.pm from @INC

my $reflog    = '/var/log/httpd/referer_log';      # input: referer log
my $zeitgeist = '/home/user/www/zeitgeist.html';   # output: generated page

my $z = new Zeitgeist();
$z->readlogs( files => [$reflog] );
$z->toHTML($zeitgeist);
You could run this by hand whenever you like, or set up a cron job that runs it periodically to keep the page up to date.
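As an illustration of the cron approach, a crontab entry along these lines would rebuild the page once an hour. The script location /home/user/bin/zeitgeist.pl is an assumption; substitute wherever you saved the script, and pick a schedule that suits your traffic.

```text
# minute hour day-of-month month day-of-week  command
# (hypothetical script path; runs at 15 minutes past every hour)
15 * * * * /home/user/bin/zeitgeist.pl
```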
If your logs are in Combined Log Format, you will have to make a couple of adjustments:
my $reflog = '/var/log/httpd/access_log';
my $z = new Zeitgeist( refpos => 10, targetpos => 6);
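If you want to check where the values 10 and 6 come from, you can split a sample entry on whitespace and number the pieces from zero. The following sketch does that for the two sample entries shown earlier; it only demonstrates the counting and makes no claim about how Zeitgeist.pm parses lines internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample entries copied from the two log formats shown earlier.
my $combined = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
             . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
             . '"http://www.example.com/start.html" '
             . '"Mozilla/4.08 [en] (Win98; I ;Nav)"';
my $reflog   = '127.0.0.1 [http://www.example.com/start.html] : /apache_pb.gif';

for my $line ($combined, $reflog) {
    my @words = split ' ', $line;               # split on runs of whitespace
    print "$_: $words[$_]\n" for 0 .. $#words;  # number the words from zero
    print "\n";
}
```

In the Combined Log Format entry, word 10 is the quoted referer and word 6 is the requested path. In the separate referer log entry, the bracketed referer is word 1 and the path is word 3.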
The refpos and targetpos options give the positions of the referer URL and the target URL in each log entry, counting "words" separated by whitespace, starting at zero.
There are other situations that the Zeitgeist.pm module can deal with. For example, say you rotate and compress your log files periodically, but you still want to include the search terms from the compressed logs. Also, you would prefer to have '+' signs separating the search terms in the HTML output (instead of the default · character). For the latter, you would pass the separator option to the Zeitgeist::new method:
my $z = new Zeitgeist( refpos    => 10,
                       targetpos => 6,
                       zcat      => '/usr/local/bin/zcat',
                       separator => '+' );
$z->readlogs( files => [$reflog, "$reflog.1.gz", "$reflog.2.gz"] );
Zeitgeist.pm automatically tries to decompress files ending in '.gz'. You may need to tell Zeitgeist where your decompression binary is located, which is what the zcat option above does. You can also pass readlogs an open FileHandle object, if you need to do something more complex to get at your referer information:
use FileHandle;    # needed for FileHandle objects
$z->readlogs( handle => new FileHandle("/my/groovy/hack |") );
Since the output is pretty plain, you can sandwich it between header and footer HTML code by supplying Zeitgeist::new with the names of files to include at the top and bottom of the output:
my $z = new Zeitgeist( header => "/home/user/www/z_head.html",
                       footer => "/home/user/www/z_foot.html" );
Alternatively, you could include the zeitgeist.html file in another page using a server-side include:
<html>
...
<!--#include file="zeitgeist.html" -->
...
</html>