Updated 08 Nov 2023, version 17.
This is the help file for Super Webtrax version S24.
Your feedback on the help file and the program is welcome.
SWT is open source and can be downloaded from https://github.com/thvv/swt
Super Webtrax (SWT) reads web server logs and produces an HTML page containing a daily web site usage report, covering the previous day's usage. The report has multiple report sections and many options.
Web servers, such as Apache and Nginx, write a log file entry every time they send a file to a user. Once a day, SWT loads a web server log into a MySQL database. SWT expands templates to produce HTML reports with graphs and tables.
I look at the report every day for my own sites.
A visit to your site is a sequence of web page views from the same net address. If SWT hasn't seen this address before, it's a new visitor. Some visits are from humans using a web browser: these are "non-indexer (or NI) visits". The rest are from web indexers, such as Google and Yahoo, that build search indexes, or from web crawlers mining pages for advertising: these are "indexer visits".
The report web page is divided into report sections by headings in blue bands. Click the little control on the extreme right of the blue band to expand a report section into a more detailed version.
For most people, the Month Summary and the Visit Details sections are the most interesting.
SWT is a web server log file analysis program. It works best on logs that include the "referrer" and "browser" fields, such as the "NCSA Combined Format."
A typical Apache log file record looks like this:
207.46.13.81 - - [03/Jun/2021:00:01:14 -0400] "GET /mtbs/mtb757.html HTTP/1.1" 301 515 "-" "Mozilla/5.0"
Each log record has nine fields, separated by spaces. If a field might contain spaces, it is enclosed in quotes. The fields are
[IP Address] [-] [username] [timestamp] [request] [statuscode] [bytes] [referrer] [user_agent]
where [request] is [verb] [resource] [protocol]
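Before running SWT you can sanity-check a combined-format log with ordinary shell tools. For example, this one-liner counts hits by status code; the log file name is illustrative, and it assumes well-formed requests, so that the status code is the ninth space-separated token:
awk '{ print $9 }' access.log | sort | uniq -c | sort -rn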
SWT reads a web server log file, loads the log file into a MySQL database, and writes over 40 different graphical and tabular report sections summarizing visits and hits.
SWT output is extremely extensible and customizable.
SWT is written in Perl and uses MySQL (both are free), and is therefore portable to many platforms.
Super Webtrax has been used since 2006 on a small number of sites. See the "future work" section below. I have used SWT on web sites that have a few dozen hits per day, and ones that had a million hits per day. I have extended SWT for specialized sites with custom reports that summarize logs generated by server-side applications, and reports that look for particular access patterns, such as a "funnel analysis" that analyzed users' progress through a transaction.
I have used SWT with web server log file extracts that cover a day's worth of accesses, from various ISPs. For example, Pair Networks places a daily log extract in the directory www_logs, named www.yyyyMMdd, if you configure this option.
I have used SWT on Unix and Linux server machines that generate log files covering many days, and occasionally roll over to a new log. For example, I have set up virtual servers on Rackspace, installed Apache and MySQL, and used a program, logextractor2 (supplied with SWT) to extract the previous day's log records into a temporary log file, and fed that file to SWT. I ran logextractor2 like this:
logextractor2 -day yesterday /var/log/www_log /var/log/www_log.0 > oneday.log
to handle the case where the log might roll over during a day and split usage into two files. Then I fed oneday.log to SWT.
I have used SWT to analyze traffic on a group of web server machines, extracting each server's previous-day log data and then modifying and merging the logs into a single stream of records, and producing one combined SWT report for all the servers.
SWT is oriented toward producing a single daily usage report and is not appropriate for real-time traffic monitoring. Sites with very large numbers of hits per day might want to create additional reports that summarize features of their usage.
SWT can only display information from the logs it's given; some information may not get into the server logs because of
Information that isn't in the HTTP protocol. There is no unique identification of the person viewing the file included in the protocol. What we have is the IP address used by the computer that requested the file. Assuming that this address corresponds to a single computer or a single "visitor" to the web site doesn't account for various IP sharing arrangements, proxies, multi-user computers, serial use of the same computer by many people, dialup pools, and many other possible confounding factors. Super Webtrax aggregates successive hits from the same IP address within a configurable period of time into a "visit."
Caching at the visitor's browser. An end user's browser may display a page or image to the screen without reading a file from the web, if it thinks it has seen it before. In this case, the web server doesn't know it happened and doesn't write a log record.
Caching at a network proxy. An end user's web browser may request a file, and some intermediate server may answer the request. AOL, for example, used to cache pages, images, and applets somewhere between your web server and the end user. The web server saw far fewer hits than you might expect, and if you combined all the AOL hits together (as you might with the webtrax "pre_domain" mapping), then the resulting path through the site appeared to be one visitor who jumped from file to file in unexpected ways.
Other proxy behavior. For example, it used to be that if a visitor from Microsoft visited your site, the server logged a whole cloud of hits from multiple different IPs, but it was all the same visitor, or a mix of multiple visitors. There was no way to tell from the logs. Other VPN and cloaking proxies can similarly break the association between IP address in the log and actual visitor behavior.
Browser prefetching. An end user's browser may see links on a page and pre-fetch resources in case the end user requests them. This may cause hits to appear in the server log even though the end user never views the resources; this behavior can also cause the time between pages to display as zero.
Site slurpers. A web log may contain many records with timestamps very close together from the same IP, generated by a program that reads website files and follows their links. Common programs such as wget and curl can cause such log sequences. Some web browsers "read ahead" and issue background requests for pages linked to by the one a visitor is viewing, in case links are followed. Web indexers find your site somehow, and then read every page they can find, by chasing all pages' links.
Misleading information sent to the web server. Web server logs contain two data fields copied from the visitor's request: referring URL and user agent. These fields are interesting enough to report, but can be spoofed by the requestor. Several browsers allow the user to specify what user agent to use in requests, in order to elicit desired web server responses. Web crawler programs routinely insert bogus data into these fields. (Malicious attackers can also try to insert attack code, so these fields must be handled with care.)
Web server behavior writing the log. Some web servers may discard log events in order to keep up at times of heavy load. If the disk partition where the log resides becomes full, the web server may keep serving files but skip writing the logs. Log entries may not be written in the order that requests were issued from the end user: I have seen cases on some web servers where the log entry for a graphic linked by a page has a timestamp before the page's entry.
Web client bugs. Some browsers, crawlers, and web apps send HTTP requests that are not in the standard form.
SWT ignores these problems. This is reasonable for web sites with light to medium activity, where a burst of accesses from the same IP address is usually the result of one user's actions. SWT does not use web cookies or JavaScript or Flash code to distinguish visitors.
It is up to the reader of SWT reports to interpret patterns of accesses in the report, for instance noticing that pages are requested faster than a human user could read or click, or the sequence of pages read does not follow from the link structure of the site. SWT has a few heuristics for marking some visits as "Indexer," for example if a session begins with a hit on robots.txt or has a user_agent of a known web crawler.
Webtrax was a Perl program originally written by John Callender on best.com about 1995, and substantially enhanced by Tom Van Vleck from 1996 to 2005. Many users contributed suggestions and features. Like any program that grew incrementally over ten years, Webtrax had its share of mistakes and problems. Its large number of options (over 80) and their inconsistent naming and interaction became an embarrassment. The non-modularity of the program made sense when it was little, but became a problem that inhibited further enhancements. Vestigial features that might or might not work littered the code. Perl was a wonderful tool for writing Webtrax, but it was used in a low-level way and its performance and memory consumption limited the size of log that could be processed.
Super Webtrax represents a second generation, begun in 2006. It loads the log data into a temporary MySQL database table and then generates report sections from queries against the database. The new version uses 7-10 times less CPU and substantially less memory (a report on 241,000 hits took less than 30 minutes to create). Because each report section is generated from one or more database queries, adding new report sections and debugging is easier. SWT's totals are more accurate and consistent, and data is more consistently sanitized against XSS attacks. Each report section is generated from a template using expandfile.
The downside of the new implementation is that users need to install more tools in order to run the program, and report developers need to know more (e.g. SQL) to enhance the output.
The programs have been tested on FreeBSD, Linux, and Mac OS X.
Super Webtrax has been used for years by its developer. Click the "info" link in the navigation bar for a list of bugs and changes.
The current version of Super Webtrax assumes a knowledgeable UNIX shell programmer is configuring it. There is no GUI-based installer or configurator: the scripts configure and install set up the system, and the user can re-run them to make some configuration changes, or edit SQL statements in a text file.
If the SQL database or the computer running SWT crashes during report generation, recovery requires a knowledgeable UNIX shell programmer to look at the partial results and take appropriate action to restart.
Further work could be done to optimize the SQL database structure and queries to improve performance and scalability.
Super Webtrax displays three kinds of information:
SWT loads web server log records into a database table so that they can be queried with SQL, and deletes the hits and detailed derived information when the next day's log is processed. The fundamental design assumption is that the user does not have enough storage to save every log record indefinitely. Instead, the program saves cumulative counts in SQL tables.
SWT performs the following steps:
SWT is driven by a shell script that invokes functions for each step. One function loads the database; another performs SQL queries for various totals and loads their result into environment variables. For report sections, the reporting function invokes the utility expandfile to expand a template. Each template fetches its SQL query from the database, and then invokes SQL and formats the results into HTML. Different templates use different environment variables; a single template can be used to create multiple report sections by changing the SQL tables containing the query and the variables used for labeling the output. For example, there is one template that produces a horizontal bar chart preceded by three columns of numbers. Setting appropriate parameters in the configuration can cause different columns of different tables in the database to be queried and displayed. The configuration variables for SWT are stored in SQL tables.
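As a sketch of the mechanism (the parameter and template names below are illustrative, the real driver sets many more variables, and this assumes expandfile writes its expansion to standard output), producing one report section amounts to something like:
expandfile sitename="User Website" querytable="wtpiequeries" barchart.htmt >> swtreport.html
where the template fetches its SQL query from the named table, runs it, and emits HTML.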
Super Webtrax produces its output in a series of report sections. Each report section can be enabled or disabled. By default, SWT produces its output in a web page, showing the heading for each report section and a brief summary. Clicking a control on the web page replaces the summary with the full content of the report section. SWT can optionally write additional output files for input to other programs. Below are all the report sections.
(Report section numbers in the list below will vary depending on which report sections a site enables.)
Navigation links
These links to each report section are automatically generated.
Links to the release notes and to the current configuration are also included.
Optional preamble copied from a text file. See the "preamble" global option.
Month Summary
Usage by day for the last month, and comparison of usage to yesterday, a week ago, and the month average.
Table showing date, visits, MB, file hits, and HTML hits by day for the last month.
Highest numbers in this listing are colored red, lowest blue.
Horizontal bar chart for each day striped with green for HTML files, blue for graphics and red for other files.
Table comparing today's usage to usage on the previous day, usage a week ago, and average usage for the last N days.
Figures are shown in gray if the change is small, and in red (or blue if negative) if the change is large.
Short Report: comparison table only.
Pie Charts
Pie charts summarizing usage.
(These pie charts are produced by a Javascript program that uses the CANVAS tag.
If the program can tell that CANVAS is not supported, it will display a tabular representation instead.)
You can configure which charts are shown in the long and short reports. Each chart is identified by a three-letter code; for example, the chart of All Visits by Browser is designated AVR. There are 96 possible charts, but a few are uninteresting, e.g. Indexer Hits by Class, since all hits from an indexer are in the class "indexer."
Short Report: The default short report has five charts, AVC: All Visits by Class, NVR: Non-Indexer Visits by Browser, NVP: Non-Indexer Visits by Platform, NVO: Non-Indexer Visits by Continent, ABF: All MB by File type.
Pie charts are selected for inclusion in the short report view if their shortweight is nonzero, ordered by the value of shortweight. Pie charts are selected for inclusion in the long report view if their longweight is nonzero, ordered by the value of longweight.
tablecode | shortweight | longweight | title |
---|---|---|---|
ABC | 000 | 050 | MB by Class |
ABF | 060 | 180 | MB by Filetype |
ABO | 000 | 050 | MB by Continent |
ABP | 000 | 050 | MB by Platform |
ABR | 000 | 050 | MB by Browser |
ABS | 000 | 050 | MB by Source |
ABT | 000 | 150 | MB by Country |
ABY | 000 | 050 | MB by City |
AHC | 000 | 050 | Hits by Class |
AHF | 000 | 190 | Hits by Filetype |
AHO | 000 | 050 | Hits by Continent |
AHP | 000 | 050 | Hits by Platform |
AHR | 000 | 050 | Hits by Browser |
AHS | 000 | 050 | Hits by Source |
AHT | 000 | 170 | Hits by Country |
AHY | 000 | 050 | Hits by City |
AVC | 090 | 130 | Visits by Class |
AVF | 000 | 050 | Visits by Filetype |
AVO | 000 | 100 | Visits by Continent |
AVP | 000 | 050 | Visits by Platform |
AVR | 000 | 050 | Visits by Browser |
AVS | 000 | 140 | Visits by Source |
AVT | 000 | 160 | Visits by Country |
AVY | 000 | 050 | Visits by City |
AWC | 000 | 050 | Views by Class |
AWF | 000 | 000 | Views by Filetype |
AWO | 000 | 050 | Views by Continent |
AWP | 000 | 050 | Views by Platform |
AWR | 000 | 050 | Views by Browser |
AWS | 000 | 050 | Views by Source |
AWT | 000 | 050 | Views by Country |
AWY | 000 | 050 | Views by City |
IBC | 000 | 000 | Ix MB by Class |
IBF | 000 | 050 | Ix MB by Filetype |
IBO | 000 | 050 | Ix MB by Continent |
IBP | 000 | 000 | Ix MB by Platform |
IBR | 000 | 000 | Ix MB by Browser |
IBS | 000 | 000 | Ix MB by Source |
IBT | 000 | 050 | Ix MB by Country |
IBY | 000 | 050 | Ix MB by City |
IHC | 000 | 000 | Ix Hits by Class |
IHF | 000 | 050 | Ix Hits by Filetype |
IHO | 000 | 000 | Ix Hits by Continent |
IHP | 000 | 000 | Ix Hits by Platform |
IHR | 000 | 000 | Ix Hits by Browser |
IHS | 000 | 000 | Ix Hits by Source |
IHT | 000 | 050 | Ix Hits by Country |
IHY | 000 | 050 | Ix Hits by City |
IVC | 000 | 000 | Ix Visits by Class |
IVF | 000 | 050 | Ix Visits by Filetype |
IVO | 000 | 000 | Ix Visits by Continent |
IVP | 000 | 000 | Ix Visits by Platform |
IVR | 000 | 000 | Ix Visits by Browser |
IVS | 000 | 000 | Ix Visits by Source |
IVT | 000 | 050 | Ix Visits by Country |
IVY | 000 | 050 | Ix Visits by City |
IWC | 000 | 000 | Ix Views by Class |
IWF | 000 | 000 | Ix Views by Filetype |
IWO | 000 | 050 | Ix Views by Continent |
IWP | 000 | 000 | Ix Views by Platform |
IWR | 000 | 000 | Ix Views by Browser |
IWS | 000 | 000 | Ix Views by Source |
IWT | 000 | 050 | Ix Views by Country |
IWY | 000 | 050 | Ix Views by City |
NBC | 000 | 050 | NI MB by Class |
NBF | 000 | 080 | NI MB by Filetype |
NBO | 000 | 050 | NI MB by Continent |
NBP | 000 | 050 | NI MB by Platform |
NBR | 000 | 050 | NI MB by Browser |
NBS | 000 | 050 | NI MB by Source |
NBT | 000 | 051 | NI MB by Country |
NBY | 000 | 050 | NI MB by City |
NHC | 000 | 050 | NI Hits by Class |
NHF | 000 | 090 | NI Hits by Filetype |
NHO | 000 | 050 | NI Hits by Continent |
NHP | 000 | 050 | NI Hits by Platform |
NHR | 000 | 050 | NI Hits by Browser |
NHS | 000 | 050 | NI Hits by Source |
NHT | 000 | 070 | NI Hits by Country |
NHY | 000 | 050 | NI Hits by City |
NVC | 000 | 050 | NI Visits by Class |
NVF | 000 | 050 | NI Visits by Filetype |
NVO | 060 | 050 | NI Visits by Continent |
NVP | 070 | 110 | NI Visits by Platform |
NVR | 080 | 120 | NI Visits by Browser |
NVS | 000 | 050 | NI Visits by Source |
NVT | 000 | 060 | NI Visits by Country |
NVY | 000 | 050 | NI Visits by City |
NWC | 000 | 050 | NI Views by Class |
NWF | 000 | 000 | NI Views by Filetype |
NWO | 000 | 050 | NI Views by Continent |
NWP | 000 | 050 | NI Views by Platform |
NWR | 000 | 050 | NI Views by Browser |
NWS | 000 | 050 | NI Views by Source |
NWT | 000 | 050 | NI Views by Country |
NWY | 000 | 050 | NI Views by City |
To disable or enable a pie chart for the short or long views, add a line like the following to swt-user.sql:
UPDATE wtpiequeries SET longweight = '000' WHERE tablecode = 'NVC';
Analysis
Table summarizing usage totals.
Table: hits, visits, and MB for HTML files, graphics, visits with no HTML, head pages, searches, links, indexers, and authenticated visits.
No short/long report.
HTML pages
Report of hits on HTML pages with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source.
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Graphic files
Report of hits on graphic files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source.
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
CSS files
Report of hits on css files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Flash files
Report of hits on flash files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Files Downloaded
Report of hits on binary download files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Sound files
Report of hits on sound files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
XML files
Report of hits on XML files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Java Class files
Report of hits on Java Class files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Source files
Report of hits on source files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Other files
Report of hits on other files with horizontal bar chart striped by hit source.
Table: filename, KB, Hits
Horizontal bar chart of hits, striped by hit source
Short Report: total KB, total hits, most popular filename, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular filename has been on top.
Files not found
Report of attempts to access nonexistent files.
Summary of missing file pathnames showing number of hits.
Expected missing files, such as Microsoft discussion hits and Java class machinery requests, are excluded via the wtexpected404 table.
Short Report: total hits, most popular filename, hits.
Forbidden transactions
Table showing attempts to access files denied by .htaccess restriction.
Table showing forbidden transactions denied by .htaccess restriction (403 error code): filename, KB, number of hits, referring domain.
Short Report: total files, total KB, total hits, most popular filename, KB, hits, referrer.
Illegal referrers
Report showing non-HTML files not referred by a local HTML file with horizontal bar chart.
Table showing illegal references, i.e., non-HTML files not referred by a local HTML file: referring page (hotlinked), filename, KB, number of hits.
Horizontal bar chart of number of hits.
Short Report: total domains, total files, total KB, total hits.
Hits by access time
Vertical bar chart: hits by access time, striped by html/graphic/other.
Vertical bar chart: hits by access time, striped by html/graphic/other, with total hits and MB.
No short/long report.
NI Visits by duration
Report of non-indexer visits ordered by estimated duration with horizontal bar chart.
Table of visits, excluding indexers and RSS feeds, showing visitor, hits, HTML hits, KB, and duration of visit.
Horizontal bar chart of duration, colored according to visit class.
Short Report: total domains, total KB, total hits, longest visitor, hits, HTML hits, KB, duration.
NI Visits by number of hits
Report of non-indexer visits ordered by number of hits.
Table of visits, excluding indexers and RSS feeds, showing visitor (if unique), hits, KB, and number of visits with that many hits.
New visitor names shown in color.
Horizontal bar chart of number of visits, striped by visit class.
Short Report: total hits, total KB, total NI visits; visitor name if unique, largest number of hits in a visit, KB.
NI Visits by number of page views
Report of non-indexer visits ordered by number of page views.
Table of visits counted by HTML hits (i.e., page views), excluding indexers and RSS feeds, showing visitor (if unique), HTML hits, KB, and number of visits with that many hits.
New visitor names shown in color.
Horizontal bar chart of number of visits, striped by visit class.
Short Report: total hits, total KB, total NI visits; visitor name if unique, largest number of HTML hits in a visit, KB.
Visitors
Report of visits by domain with horizontal bar chart striped by visit class.
Table of visitor domain name, Visits, DSPV, KB, Hits.
New visitors are shown in color with blank DSPV.
Horizontal bar chart of number of hits, striped by visit class.
Short Report: same as long report but ignoring indexer visits.
Visits by Top Level Domain
Report of visits by toplevel domain with horizontal bar chart striped by visit class.
Table of toplevel domain name and explanation, Visits, KB, Hits.
Horizontal bar chart of number of hits, striped by visit class, for each toplevel domain.
Short Report: total TLDs, total visits, KB, hits; TLD with largest number of hits, KB, visits.
Visits by Second Level Domain
Report of visits by second level domain with horizontal bar chart striped by visit class.
Table: second level domain name, Visits, KB, Hits.
Horizontal bar chart of number of hits, striped by html/graphic/other, for each second level domain.
Short Report: total domains, total visits, KB, hits.
Visits by Third Level Domain
Report of visits by third level domain with horizontal bar chart striped by file type.
Table: third level domain name, Visits, KB, Hits.
Horizontal bar chart of number of hits, striped by html/graphic/other, for each third level domain.
Short Report: total domains, total visits, KB, hits.
Visits by City
Report of non-indexer visits by city with horizontal bar chart striped by visit class.
Table: Country code, continent, country name, city, Visits.
Horizontal bar chart of visits by city showing country, continent and visits, striped by visit class.
Short Report: total visits; Country code, continent, country name, city with most visits, visits.
Visits by Authenticated Users
Report of visits by username used to authenticate with horizontal bar chart striped by file type.
Table: authenticated user name, Visits, KB, Hits.
Horizontal bar chart of number of hits, striped by html/graphic/other, for each user ID.
Short Report: total domains, total visits, KB, hits.
Visits by Class
Report of hits by class, striped by source.
Table: visit class, Visits, KB, Hits.
Horizontal bar chart of hits by class, striped by hit source.
Short Report: total classes, total visits, total KB, total hits.
Hits by Browser
Report of hits by browser, striped by visit class.
Table: browser, type and platform, Visits, KB, Hits
Horizontal bar chart of hits, striped by visit class.
Short Report: browser count, total visits, total KB, total hits; most popular browser, visits, KB, hits.
Hits by Query
Report of hits by query with horizontal bar chart.
Table: query, KB, Hits.
Horizontal bar chart of hits.
Short Report: number of queries, total KB, total hits; query with largest number of hits, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular query has been on top.
Word Usage in Queries
Report of words in queries with horizontal bar chart.
Table: Words used in queries, number of uses today.
Words are strings of letters: punctuation and digits are ignored.
Horizontal bar chart of uses.
Short Report: number of words, total uses.
Visits by Search Engine
Report of hits by search engine with horizontal bar chart.
Horizontal bar chart of hits, striped blue for image search, green otherwise.
Short Report: number of engines, total KB, total hits; engine with largest number of hits, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular engine has been on top.
Files Crawled by Google
Report of files crawled by Google showing time.
Table: filename, KB, date for files crawled by googlebot.
(This report section is an example that shows how to generate a specific table.
I chose Google because most hits come from Google; similar reports could look for other crawlers instead or in addition.)
Short Report: filename, KB, date, for most recent file crawled.
Hits by Referrer
Report of hits by referrer with horizontal bar chart.
Table: referring URL, Visits, KB, Hits.
Referrers are hotlinked. Beware of following links, because they could be to a malicious website.
New referrers are shown in red.
Watched referrers are shown in a color specified in the preferences.
Horizontal bar chart of hits.
Short Report: number of referrers, total KB, total hits; referrer with largest number of hits, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular referrer has been on top.
Hits by Referring Domain
Report of hits by referring domain with horizontal bar chart.
Table: referring Domain, Visits, KB, Hits.
Watched domains are shown in a color specified in the preferences.
Horizontal bar chart of hits.
Short Report: number of referring domains, total KB, total hits; referring domain with largest number of hits, KB, hits.
The short report indicates the percent change of the totals since the previous day, and the number of days the most popular referring domain has been on top.
Number of Hits by file size
Report of hits by file size with horizontal bar chart.
Count of files by log10(size).
Table: file size bucket, KB, Hits.
Horizontal bar chart of hits.
Short Report: size bucket with largest number of hits, KB, hits.
Hits by Local Query
Report of hits by local query with horizontal bar chart.
Table: Filename, local query, Hits.
Horizontal bar chart of hits.
Short Report: total hits; local query with largest number of hits, hits.
Repeated Hits by Domain
Report of repeated hits by domain with horizontal bar chart.
Table: Excess loading of a file by a domain. Filename, domain, browser, Hits.
Horizontal bar chart of hits.
This report shows real hits only, from non-indexer visits. Repeated loading may indicate that the user is not caching the file content, or is attacking the site.
Short Report: total hits.
Attacks on the site
Table showing various attacks on the site.
Table: Hits on selected CGIs with no HTML or graphics. Time, attacking domain, filename attacked, return code.
Table: Probes for known holes on the site. Time, attacking domain, filename attacked, return code.
Table: Shellshock attacks. Time, attacking domain, filename attacked, return code.
Table: log4j attacks. Time, attacking domain, filename attacked, return code.
Table: Excessive use of specified files. Attacking domain, filename attacked, return code, number of hits.
Short Report: total CGI, probe, shellshock, log4j, and overuse attacks, shown in red if nonzero.
Transactions by server return code
Table of transactions by return code.
Table: Transactions, KB, server return code, explanation.
Short Report: percentage of good transactions, in red if it is below the specified threshold.
Transactions by protocol verb
Table of transactions by protocol verb.
Table: Transactions/KB by protocol verb
Short Report: verb with largest number of transactions.
Visit Details
Listing of HTML files accessed in each visit by time.
An entry for each visit including
date and time,
domain ID (IP address or domain name),
content files viewed (e.g. HTML, PHP, and PDF but not graphics and machinery),
time between files,
query and referring URL,
total hits and bandwidth,
user agent,
visit class, and
authenticated user name.
Visits are selected for display by evaluating "criteria" for its side effect on the variable "print". If "print" is 1, the visit is displayed.
In April 2023, I added a few additional variables that can be used in the "criteria": 'ninvisit', 'htmlinvisit', 'graphicsinvisit', 'bytesinvisit', and 'duration'. Thus one can compute the average time between html pages for a visit, and suppress those visits that appear to be from site suckers.
The visit details report section also includes "events" from an optional event log table. Events from the event log are shown in time sequence in italics.
Visitors new today have their domain name shown in blue.
Authenticated sessions have the domain name shown with a yellow background.
The first time a referring URL appears, it is shown in red.
Search queries are shown in green.
Watched filenames are shown in a color specified in the preferences.
Watched referrers are shown in a color specified in the preferences.
Domains and browsers can also be watched: visits selected by these rules can be shown in the short report, summarized, or hidden.
Filenames are shown in a color specified in the preferences if the server return code has specific values,
e.g. missing files (code 404) are shown in light gray.
The long report provides a control that can hide or show visits with the visit class "indexer".
Short Report: Same report, using different criteria to select visits for printing.
The default criteria select visits containing a watched file or those from new referrers.
Cumulative Non-search Hits by Referrer Domain
Report of cumulative hits by referring domain with horizontal bar chart.
Table: KB, Hits by referring domain this year
Watched domains are shown in a color specified in the preferences.
Horizontal bar chart of hits. (The database actually has the referring URL, but usually it is more useful to have the domain only, e.g. digg.com)
Short Report: total referrers, MB, hits; referring domain with largest number of hits, MB, hits.
Cumulative Hits by Query
Report of cumulative hits by query with horizontal bar chart.
Table: KB, Hits by query this year
Horizontal bar chart of hits, striped by HTML/Graphic/Other.
Short Report: total queries, MB, hits; query with largest number of hits, MB, hits.
Cumulative Query Word Usage
Report of cumulative words in queries with horizontal bar chart.
Table: Words used in queries, cumulative number of uses.
Horizontal bar chart of uses.
Short Report: number of words, total uses.
Cumulative hits by visitor
Report of cumulative hits by visitor with horizontal bar chart.
Table: visitor name, DSLV, Visits, KB, Hits.
Horizontal bar chart of hits, striped by HTML/Graphic/Other.
Short Report: total visitors, MB, hits; visitor with most hits, MB, hits.
Data for this report is trimmed to "ndomhistdays" days.
Visitors by days since last visit
Report of number of visitors by days since last visit with horizontal bar chart.
Table: Visitor name, days since last visit.
Horizontal bar chart of hits, striped by HTML/Graphic/Other.
Short Report: total visitors and new visitors today.
Cumulative hits on HTML Pages
Report of cumulative hits by HTML page with horizontal bar chart.
Table: filename, Visits, KB, Hits.
Horizontal bar chart of hits, striped by HTML/Graphic/Other.
Short Report: total visitors, MB, hits; file with largest number of hits.
Hits by month
Vertical bar chart of hits by month striped by html/graphic/other.
vertical bar chart: hits by month, bars striped by HTML/Graphic/Other.
No long/short report.
Traffic by year
Report of traffic by year with horizontal bar chart.
Horizontal bar chart, one row per year, of hits striped by HTML/Graphic/Other.
Short Report: total visitors, bytes, hits
Optional postamble copied from a text file. See the "postamble" global option.
Navigation links
Paths through the site
SWT also produces a .dot file for input to GraphViz, to visualize transitions between pages.
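The .dot file (paths.dot by default) can be rendered with the GraphViz dot command; for example, to produce a PNG (the output file name is your choice):
dot -Tpng paths.dot -o paths.png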
Dashboard
SWT also produces dash.csv, a CSV-format file containing daily totals for inclusion in a dashboard.
Super Webtrax requires perl5 and MySQL 4.1 or better because it uses nested queries. The "configure" script will try a sample query to test that this feature works.
Perform the following steps on the machine that will be running Super Webtrax:
Create a directory named bin in your home directory for personal command line tools, and add it to your PATH.
cd $HOME
mkdir bin
echo "export PATH=$HOME/bin:$PATH" >> .bash_profile
. .bash_profile
(the last line above assumes your shell is bash.)
Make sure you have a reasonably recent version of MySQL installed. You should install it before you install CPAN module DBD::mysql. Set up a database username and password.
"On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file. This occurs even if a --port or -P option is given to specify a port number. To ensure that the client makes a TCP/IP connection to the local server, use --host or -h to specify a host name value of 127.0.0.1, or the IP address or name of the local server. You can also specify the connection protocol explicitly, even for localhost, by using the --protocol=TCP option."
Set up the file .my.cnf in your home directory. It should look like
[client]
user=dbusername
password=pass
host=domain
database=dbname
[mysqldump]
user=dbusername
password=pass
host=domain
database=dbname
Execute chmod 600 .my.cnf.
Make sure the mysql command works.
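One quick check, assuming the .my.cnf file above is in place, is:
mysql -e 'SELECT VERSION();'
It should print the server version without prompting for a password.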
Make sure you have a reasonably recent version of Perl installed. (On a Mac, see https://formyfriendswithmacs.com/cpan.html).
Make a link from /usr/local/bin/perl to the Perl you will be using, so that shebang lines will work.
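For example, if the Perl you want is in /opt/local/bin (a MacPorts-style path; adjust for your system, and use sudo if necessary):
sudo ln -s /opt/local/bin/perl /usr/local/bin/perl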
Set your environment variables, for example if your Perl version is 5.26, to
export VERSIONER_PERL_PREFER_32_BIT="no"
export PERL5LIB="$HOME/bin:/opt/local/lib/perl5/5.26"
export PERL_LOCAL_LIB_ROOT="/opt/local/lib/perl5/5.26"
export PERL_MB_OPT="--install_base \"/opt/local/lib/perl5/5.26\""
export PERL_MM_OPT="INSTALL_BASE=/opt/local/lib/perl5/5.26"
Install the CPAN modules LWP::Simple, Term::ANSIColor, DBI, DBD::mysql, XML::LibXML, and XML::Simple. (You have to install MySQL first because DBD::mysql's installation tests access to MySQL.)
Install expandfile to your $HOME/bin. Install the Perl modules expandfile.pm, readbindsql.pm, readbindxml.pm, and readapacheline.pm in your $HOME/bin. (These files are supplied in the tools subdirectory.)
Type the command expandfile and you should get a usage message like USAGE: expandfile [var=value]... tpt....
Create a MySQL database for the log data. (If you wish to produce multiple SWT reports on one machine, you must create a different database for each report.)
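For example (the database name is illustrative, and depending on how your MySQL account is set up you may also need CREATE USER and GRANT statements):
mysql -e 'CREATE DATABASE swtlogs;'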
If you are going to do GeoIP processing on your log file, using the MaxMind free geolocation database,
Create the directory swt/install-swt in your home directory.
Visit https://github.com/thvv/swt in your browser. Click the green "Code" button. You can choose "Clone" or "Download ZIP." Move the downloaded files into your swt/install-swt directory. This populates the directory install-swt, including subdirectories install-swt/tools and install-swt/live.
Configure SWT by executing the command
cd install-swt; ./configure
The first time you run configure, you will be asked for multiple configuration items; answer these questions.
The result of running configure is a file CONFIGURE.status which records the desired configuration. If you run configure when a CONFIGURE.status exists, it asks the questions again, but provides a default answer contained in the file, so that it is easy to change just a few configuration values, and just hit RETURN to accept the rest.
The configure script runs simple tests to ensure that mysql works and can create, load, and query SQL database tables with the supplied MySQL server name, database, userid, and password.
The configure script tests to ensure that expandfile works and can access the question answers from the shell environment. It then checks that expandfile can access the database and perform a nested SELECT with the supplied MySQL server name, database, userid, and password.
configure tests to make sure that logextractor2 works and that the MaxMind database is found, if GeoIP processing was selected.
configure uses expandfile to generate shell helper and configuration files that will be used when swt is executed.
Check over the files that result from configuration, and then execute:
./install
The new software will be installed in the installation directory you specified. If it appears that this is a COLD install, you will be asked
reset database???
and if you answer yes, the cumulative databases will be re-initialized.
You can tailor your SWT configuration to your local setup by modifying the file swt-user.sql. The configure script sets up an initial version, but you may wish to add more information, such as:
To set up log file translation, tailor the cron job script created by configure to create your report page once a day.
If the web logs provided by your ISP do not contain referrer and agent, then Super Webtrax will not work well for you. If you control the web server configuration, select the NCSA Combined log format.
Where are your web logs and what are they called? The generated cron job assumes that some other agency places the logs in a specific directory, possibly gzipped. If your web logs must be copied from another machine, you may need to ensure that you can access the logs and handle the case of log rollover. This will require some shell script editing.
Does your web server log contain hits from one day or many?
How does your web server log indicate the source of a web transaction?
If the cachefile gets to be large, like 16MB, then bad things will happen: truncate it every so often.
Do you want IPs shown with a country name suffix and optional city, e.g. 12.178.27.243[US/Palo Alto CA] or adsl-226-123-174.mem.bellsouth.net[US]? logextractor2 can do this at additional cost in processing time, by using free data from MaxMind. To do this,
Does your web server log directly to SQL? If so, you will need to write a program to extract hit data from the table written by the server, and modify the swt and swtfunct.sh scripts to run it in place of logvisits.
Try running Super Webtrax once from the command line and see what happens.
./swt http_log_file
It should produce swtreport.html. Correct any problems.
The installer generates a cron script to run Super Webtrax every night and to move the output files to your web statistics display directory. This job may require hand editing to adapt it to your operating system and account. Because jobs started by cron do not execute your shell startup, you should set $PATH and $PERL5LIB in your crontab. Try running the cron script from the command line to see if it creates a report page that looks right, and correctly moves it into your web space.
When the cron script is ready to install, use the facilities provided by your account to schedule it. On Linux or Unix, this may be the crontab -e command, or some other method provided by your operating system resource manager. Wait till the job runs and check the output. There can be access problems because the environment for cron jobs is not the same as the command line. Once you get a clean run of SWT, it should run without further supervision. I set my cron jobs up so that the program output is mailed to me every day, and glance at the message and delete it.
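For example, a crontab entry might look like the following sketch (the wrapper script name and paths are illustrative; most cron implementations let you set environment variables at the top of the crontab):
PATH=/home/me/bin:/usr/local/bin:/usr/bin:/bin
PERL5LIB=/home/me/bin
15 4 * * * /home/me/swt/run-swt-nightly.sh > /home/me/swt/cron.log 2>&1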
About once a month, I visit each client machine and delete files that have been processed.
The main parts of SWT are:
Super Webtrax uses MySQL to store its data. It stores three kinds of data in the database: the current day's log records and derived detail (deleted when the next day's log is processed), cumulative counts kept from day to day, and configuration tables.
Your nightly cron script will obtain a log data file to be processed, and then execute
./swt inputfile
If the log file name ends with ".gz", ".z", or ".Z", the file will be read through gzcat to unzip it.
The swt script reads the log and generates the report page. The cron script is responsible for obtaining the day's log file, running swt, moving the report files into your web space, and cleaning up temporary files.
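A minimal nightly wrapper might look like this sketch (the tool location, log paths, and destination directory are illustrative):
cd $HOME/swt
./tools/logextractor2 -day yesterday /var/log/www_log /var/log/www_log.0 > oneday.log
./swt oneday.log
mv swtreport.html $HOME/public_html/stats/
rm oneday.log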
The "config" link on the navigation links bar at the top and bottom of the web page goes to a generated configuration web page that displays the current reporting configuration.
swtconfig.htmi is a configuration file that contains the location of the database server, database name, and user name and password on the server. Pointed to by swt. Secure this correctly if you are on a shared server. You may want to set up a .my.cnf file for use by mysql command (also secured).
mysqlload and mysqlrun are shell scripts used to load data into the MySQL database and to run individual MySQL commands. mysqldumpcum is a shell script that invokes mysqldump to dump the cumulative tables. Pointed to by swt. These files may contain the database password, if the user cannot set up a .my.cnf file. Secure them correctly.
The following values can be set in swt-user.sql in the wtglobalvalues table.
Name | Meaning | Default |
---|---|---|
CHECKSWTFILES | script that makes sure files are present .. can be overridden by swt-user.sql | $HOME/swt/tools/checkswtfiles |
CLEANUP | File deletion command. For debugging, change rm to echo | rm |
COMMANDPREFIX | Prefix commands with this command, can be "nice" or null | nice |
CONFIGFILE | Pathname of database configuration file -- should be mode 400, has database password | swtconfig.htmi |
cumquerybytemin | min number of bytes to keep query in wtcumquery table | 2500 |
cumquerycntmin | min number of queries to keep query in wtcumquery table | 2 |
DASHFILE | name of output dashboard report | swtdash.csv |
DATADIR | Directory where data files are kept .. can be overridden by swt-user.sql | $HOME/swt |
ECHO | change to "true" to shut program up | echo |
EXPAND | Template expander command | ./tools/expandfile |
gbquota | bandwidth quota in GB for this account | 0 |
gbquotadrophighest | Y to drop the highest day of the month in the bandwidth calculation | N |
glb_bar_graph_height | height of a bar in horiz graph, also width of bar in vertical graph | 10 |
glb_bar_graph_width | width in pixels of horizontal bar graph | 500 |
IMPORTANT | component name of output important visits report | important |
IMPORTANTFILE | name of last-7-days report | important.html |
LOGVISITS | perl prog | perl ./tools/logvisits3.pl |
MYSQLDUMPCUM | how to invoke mysqldump, contains password, must match config file, mode 500 | ./tools/mysqldumpcum |
MYSQLLOAD | invoke mysql to source a file, contains password, must match config file, mode 500 | ./tools/mysqlload |
MYSQLRUN | invoke mysql for one command, contains password, must match config file, mode 500 | ./tools/mysqlrun |
ndomhistdays | number of days to keep in wtdomhist table | 366 |
OUTPUTFILE | name of output report | swtreport.html |
PATHSFILE | name of output paths report | paths.dot |
pieappletheight | Height of pie chart | 220 |
pieappletwidth | Width of pie chart | 260 |
postamble | File copied at bottom of report | |
preamble | File copied at top of report | |
PRINTVISITDETAIL | perl prog | perl ./tools/printvisitdetail3.pl |
PROGRAMDIR | Directory where templates are installed .. can be overridden by swt-user.sql | $HOME/swt |
REPORTDIR | where to move the output report .. should be overridden by swt-user.sql | $HOME/swt/live |
returnurl | URL of user website, for return link | index.html |
siteid | Short name for the dashboard | User |
sitename | Title for the usage report | User Website |
stylesheet | name of style sheet | swtstyle.css |
TOOLSDIR | Directory where tool programs are kept .. can be overridden by swt-user.sql | $HOME/swt/tools |
urlbase | Absolute URL prefix to live help | http://www.multicians.org/thvv/ |
VISITDATA | perl prog | perl ./tools/visitdata3.pl |
visitdata_refspamthresh | more than this many different referrers in one visit is spamsign | 2 |
WORDLIST | perl prog | perl ./tools/wordlist3.pl |
wtversion | version of this program for selecting help file | S24 |
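For example, to override one of these values you can append an UPDATE statement to swt-user.sql; a hypothetical sketch (the wtglobalvalues column names used here are assumptions):
cat >> swt-user.sql <<'EOF'
UPDATE wtglobalvalues SET value = 'My Website' WHERE name = 'sitename';
EOF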
The following configuration tables can be extended or altered in swt-user.sql.
Table | Contents |
---|---|
wtboring | Boring pages. Discourage display of a visit in the important details. |
wtcolors | Watched pages. Which pages should be shown in details and what color to display them in. |
wtexpected404 | Files expected to be not found. Suppress these from the not found listing. |
wtglobalvalues | Global constants and configuration, documented above. |
wthackfilenames | File names that do not exist on the site and that attackers look for. Evidence of hacker attacks. |
wthackfiletypes | File suffixes that do not exist on the site and that attackers look for. Evidence of hacker attacks. |
wtheadpages | Which pages are head pages. |
wtindexers | Web crawlers. Which user agents are web spiders etc. |
wtlocalreferrerregexp | Local referrer definitions. Defines which domains count as part of the website. |
wtpclasses | File assignments to visit class. |
wtpiequeries | Pie chart queries and weights. |
wtpredomain | Transformations applied to the source domain of each hit before processing. |
wtprepath | Transformations applied to file paths before processing. |
wtprereferrer | Transformations applied to referrers before processing. |
wtreferrercolor | Watched referrers. Which referring pages should be shown in details and in color. |
wtreportoptions | Report option values, documented with the individual reports. |
wtretcodes | Return code explanation. Describes the HTTP error codes. |
wtrobotdomains | Robot domains. Which domains are used only by web crawlers. |
wtshowanyway | Combinations of referrer and pathname to display even if wtsuffixclass says not to. |
wtsuffixclass | Suffix classes. Grouping of file suffixes, and display options. |
wtvclasses | Visit class definitions and color assignments. |
wtvsources | Visit source definitions and color assignments. These sources are built into visitdata.pl. |
wtwatch | Watch for domains and browsers to display specially in details report. |
The web page is formatted using a standard style sheet unless you override the wtglobalvalues.stylesheet configuration item in swt-user.sql. If no style sheet is specified, the following definitions are used:
<style>
/* Styles for superwebtrax report */
BODY {background-color: #ffffff; color: #000000;}
H1, H2, H3, H4 {font-family: sans-serif; font-weight: bold;}
h1 {font-size: 125%;}
h2 {font-size: 110%;}
h3 {font-size: 100%;}
h4 {font-size: 95%;}
th {font-family: sans-serif;}
.headrow {background-color: #ddddff;}
h2 {background-color: #bbbbff;}
h3 {background-color: #ccddff;}
.brow {}
.vc {}
.indexer {font-style: italic;}
.refdom {font-weight: bold;}
.firstrefdom {font-weight: bold; color: blue;}
.authsess {background-color: #ffffaa;}
.newref {color: red;}
.max {color: red;}
.min {color: blue;}
.query {color: green;}
.details {font-size: 80%;}
.details dt {float: left}
.details dd {margin-left: 40px}
.legendbar {font-size: 80%; font-weight: normal;}
.navbar {font-size: 70%;}
.chart {}
.chart td {font-size: 90%; padding-top: 0; padding-bottom: 0; margin-top: 0; margin-bottom: 0; border-top-width: 0; border-bottom-width: 0; line-height: 90%;}
.monthsum {}
.monthsum td {font-size: 80%; padding-top: 0; padding-bottom: 0; margin-top: 0; margin-bottom: 0; border-top-width: 0; border-bottom-width: 0; line-height: 80%;}
.analysis {font-size: 90%;}
.analysis td {font-size: 90%;}
.sessd {}
.pie {}
.fnf {color: gray;}
.cac {color: pink;}
.fbd {color: green;}
.flg {color: purple;} /* flag for 'wtwatch' notes */
.filetype {font-size: 80%;}
.illegal {}
.subtitle {font-size: 80%;}
.fineprint {font-size: 80%;}
.logtime {font-style: italic;}
.logtext {font-style: italic;}
.cpr2 {padding-right: 2em;} /* cell-pad-right */
.cpl2 {padding-left: 2em;} /* cell-pad-left */
.numcol {padding-left: 5px; text-align: right;} /* cell-pad-left-align-right */
.mthsum {padding-right: 10px;}
.alert {background-color: #ffffff; color: red;}
.vhisto {padding: 0 10px 0 0}
.vbar {padding: 0 2px 0 1px; margin: 0 0 0 0; vertical-align: bottom; font-family: sans-serif; font-size: 8pt;}
img.block {display: block;}
a:link {color: #0000ff;}
a:visited {background-color: #ffffff; color: #777777;}
a:hover {background-color: #ffdddd; color: black;}
a:active {background-color: #ffffff; color: #ff0000;}
.ctl {font-size: 12pt; float: right;} /* for the [+] control */
.h2title {text-decoration: none; color: black;}
h2 a:link {color: black;}
h2 a:visited {color: black;}
h2 a:hover {background-color: #ffdddd; color: black;}
h2 a:active {color: black;}
.starthidden {display: none;}
.short {font-size: 80%;}
.datatable {margin: 10px; float: left;}
.datatable td {padding-left: 5px;}
.datacanvas {float: left;}
.inred {color: red;}
.inblue {color: blue;}
.ingray {color: gray;}
.ingreen {color: green;}
.inorange {color: orange;}
.inpink {color: pink;}
.inpurple {color: purple;}
.inyellow {color: yellow;}
.inblack {color: black;}
.incyan {color: cyan;}
.indarkblue {color: darkblue;}
.infuchsia {color: fuchsia;}
.ingoldenrod {color: goldenrod;}
.inindigo {color: indigo;}
.inlightgreen {color: lightgreen;}
.inlime {color: lime;}
.inmaroon {color: maroon;}
.innavy {color: navy;}
.inolive {color: olive;}
.insilver {color: silver;}
.inteal {color: teal;}
.inviolet {color: violet;}
.inwhite {color: white;}
</style>
If the log being processed includes the referrer string, this indicates what page a visitor's browser was displaying when it generated a request for your file. If the hit was generated by a search engine, the query may be included in the referrer string. (Google has mostly stopped including this: see "Search Queries and Engines" below.) SWT uses the referrer string to drive a lot of its analyses. Web crawlers sometimes spoof the referrer string; web proxies sometimes remove it.
SWT detects some queries as coming from search engines. Many popular engines are built in. In 2013, Google changed to use HTTPS security for many search references. This changed the way that it presents links to result sites; the links no longer show the text for a search query in the REFERRER field, so SWT cannot display or summarize such queries.
If what appears to be a visit starts with a hit referred by a local page, this may be a sign that the visitor is accessing the site very slowly, or that the visitor is accessing your site through a proxy that uses more than one address (microsoft.com seems to do this). Some web servers seem to put hits in their logs out of order, which can also produce this pattern. I have set the default expire_time up to 30 minutes, and still see a lot of these on my site. Visits that begin with a "local" hit are marked with "*" in the visit details.
Configure this in swt-user.sql table wtlocalreferrerregexp.
Many people have asked to be able to ignore their own hits on their site.
By default, SWT summarizes hits by toplevel domain, e.g. ".com". For toplevel domains that correspond to a country, the country name is shown.
Configure this in swt-user.sql tables wtpredomain, wtprepath, wtprereferrer.
SWT produces a table of all transactions logged by the web server, organized by return code. Most of the transactions will have code 200; but code 304 means that a distant proxy was checking to see if the file had changed, so it also counts as a hit. Code 206 means that part of the file was returned; a big file might be requested in chunks. Currently SWT counts all of these transactions as hits, since it can't tell which partial content answers are part of the same request. Other return codes are counted but their transaction is not considered a hit.
SWT produces a chart of total accesses by platform, that is, by operating system, based on the user-agent string. This chart matches patterns against the user-agent string sent by the browser and is not 100% accurate (browsers and crawlers can misrepresent themselves).
Each file transmitted by the server to a browser is logged by the web server as a "hit." For example, if a visitor visits an HTML page that refers to three GIFs, a .css file, and a Java applet, this visit would generate at least six hits, one for the HTML file, three for the GIF files, one for the css file, and one (or more) for the applet binary. (Assuming the visitor's browser has Java enabled and is loading images.) SWT can be told to ignore certain hits in various ways.
If there is a sequence of hits from the same domain, these are counted as a single visit. If the hits stop for longer than a certain idle time, and then start again, SWT will see two visits. You can configure the length of the idle interval. (See "How can a visit be 'local?'".)
If a section of the website requires the visitor to give a userid and password, the entire visit by the user will be marked with the userid given.
Each visit is classified with an identifier. Visits that appear to be from web indexers are classified as "indexer". Web indexer visits are detected if the visit touches the file "robots.txt" (which must exist for this check to work), or if it comes from a web indexer URL (from table wtrobotdomains) or user agent (from table wtindexers) listed in the configuration.
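For example, rows like the following would classify Bing's crawler as an indexer; the table names wtrobotdomains and wtindexers come from the text above, but the single-column layouts and the patterns shown are assumptions for illustration:
INSERT INTO wtrobotdomains VALUES('search.msn.com');   -- indexer domain pattern; column layout assumed
INSERT INTO wtindexers VALUES('bingbot');              -- indexer user-agent pattern; column layout assumed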
Non-indexer visits are assigned a class by looking at the class of each file hit, and choosing the most popular.
The table wtpclasses in the configuration defines a class specifier for file pathnames relative to the server root. File names can be specified as a path name or as a directory prefix; the most specific matching value is chosen. Directory prefixes should begin and end with a slash. If no matching class is found and the hit pathname contains a directory, the first-level directory name is used as the class name. The class specifier is a comma-separated list of class names, so a file can be declared a member of more than one class; an ambiguous specification containing N classes gives each class a weight of 1/N.
The table wtvclasses in the configuration specifies the color the class will be shown in, and a short explanation of the class.
For comparing website activity, HTML page loads (that is, hits on pages with a suffix associated with 'html' in table wtsuffixclass) are more interesting than hits.
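Here is a hedged configuration sketch tying these tables together. The table names come from the paragraphs above, but the column layouts and the class names ("bikes", "home") are assumptions for illustration:
INSERT INTO wtsuffixclass VALUES('html','html');            -- assumed columns: suffix, class
INSERT INTO wtsuffixclass VALUES('htm','html');
INSERT INTO wtpclasses VALUES('/mtbs/','bikes');            -- assumed columns: path or directory prefix, class specifier
INSERT INTO wtpclasses VALUES('/index.html','home,bikes');  -- ambiguous specifier: each class weighs 1/2
INSERT INTO wtvclasses VALUES('bikes','green','mountain bike pages');  -- assumed columns: class, color, explanation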
Each hit is classified as to its source; a hit may be the result of a search, the result of a link, generated by a web indexer, a local reference, or unspecified.
Search engines are detected by looking at the referring URL, which has the URL of the search engine's page, and often the query used to search.
When a hit appears to come from a search engine, SWT tries to determine what the engine was searching for. It can't always extract the query; some engines, like Gamelan, don't put the query term in the referring URL, and in these cases SWT doesn't show a query. Google encrypts the query, so those are not shown either. See "Search Queries and Engines" above.
Other web sites' pages can have hyperlinks to yours. If the person browsing your site uses a browser that sends the referrer info, and if your web server puts that information in the log, you can see who links to you and how often those links are used. SWT will summarize the number of links to your pages.
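Conceptually this summary is just a count of hits per referring page, something like the following sketch; wtlog and its referrer column are assumed names, and example.com is a placeholder for your own domain:
SELECT referrer, COUNT(*) AS uses
FROM wtlog
WHERE referrer <> '-'                        -- skip hits with no referrer
  AND referrer NOT LIKE '%example.com%'      -- placeholder: keep only external links
GROUP BY referrer
ORDER BY uses DESC;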
An illegal hit is a reference to an object on your site that is not a source file (for example, a graphic) from a referrer that is not a source file on your site. One cause is people linking to your graphics from their own pages. Another possible cause is an incorrect referrer string sent by a browser.
Visits that do not reference any source files are summarized separately. Such visits may result from web crawlers that look only at graphics files or PDF files, or from illegal references to your graphics from others' sites, or from a reference to a graphic, PDF, or whatever on your site in a mail message. These visits are not shown in the visit details section.
Each hit comes from a machine identified by its Internet domain name like barney.rubble.com. If the visitor cannot be identified by name, its IP Number is shown. If geoIP processing is performed by logextractor2, the IP will have a country name suffix (and optional city name) in brackets.
Days since previous visit.
Days since last visit. 0 if visited today.
Toplevel domains are the least specific part of the name, like .com or .de.
SWT accumulates some statistics for a period longer than a day... someday.
Search engines work by reading your pages and building a big index on disk. Doing this creates a sequence of hits in your log. SWT will count these separately if you tell it the names of the search engines' indexers or domains, and if the browser (user agent) name is provided in the log. You can suppress these indexer visits from the visit details by setting an option; if the hits are displayed, they are in the CSS class "indexer", which a custom style sheet can decorate.
You can suppress visits with fewer than a specified number of HTML pages: the default is 1. You can suppress visits by indexers.
Here is an example listing:
16:38 xxx01.xxx.net -- g.html (gamelan:-) 0:01, ga.class 2:25, gv.class [4, 212 KB; MSIE] {code}
For each visit, Webtrax shows
The program logextractor2 is supplied with SWT. It reads an NCSA [combined] web server log and extracts a day's worth of data. It optionally does reverse DNS lookup on numeric IPs. It also optionally does geoIP lookup on numeric IPs, and Super Webtrax will accept domains with the geoIP lookup already done.
nice logextractor2 [-dns cachefile] [-geoipcity $HOME/lib/GeoLite2-City.mmdb] -day mm/dd/yyyy filepath ... > outpath
nice logextractor2 [-dns cachefile] [-geoipcity $HOME/lib/GeoLite2-City.mmdb] -day yyyy-mm-dd filepath ... > outpath
nice logextractor2 [-dns cachefile] [-geoipcity $HOME/lib/GeoLite2-City.mmdb] -day yesterday filepath ... > outpath
nice logextractor2 [-dns cachefile] [-geoipcity $HOME/lib/GeoLite2-City.mmdb] -day all filepath ... > outpath
Finds all log entries that occurred on the given day and writes them to stdout.
The program can use the free geo-location database provided by MaxMind Inc at www.maxmind.com by specifying the -geoipcity argument with the path of the binary "city" database. You need a (free) license to download the database.
Your web server may serve pages for multiple domains. One way to handle this is to map each domain to a separate directory and use the ability to name a visit class after a toplevel directory. However, some servers produce a separate virtual-host web usage log for each domain served; in that case the per-domain logs can be merged into a single log, altering the toplevel directory of each record so the source domain is preserved. The programs combinelogs and logmerge are supplied with SWT for this purpose: logmerge reads multiple NCSA [combined] web server logs, merges them, and writes a combined log; combinelogs finds the files to merge and prepares the arguments. This facility should be run before processing with logextractor2.
nice $BIN/combinelogs combinelogs.conf | sh
where combinelogs.conf looks like
www -drop /thvv
lilli.com -add /lilli
formyfriendswithmacs.com -add /formyfriendswithmacs
multicians.org
Each line in combinelogs.conf lists the prefix of one log file. combinelogs looks in the current working directory for sets of logs having the same date and invokes logmerge to merge them. Log files are expected to be named e.g. www.20110418.gz; the .gz suffix is optional, but the date digits are required. If -add or -drop is specified, it is passed through to logmerge to alter the top-level file pathname by adding or dropping a prefix. The resulting output is named comb.20110418.gz.
Each web server computer that prepares an SWT report runs a CRON job that invokes SWT once a day to generate an HTML-formatted report on the previous day's usage. These computers can obtain the web logs for the previous day in several ways:
SWT uses a set of tables with fixed names for each report that it produces. To generate more than one SWT report on a server, you need to create more than one MySQL database. Each SWT instance will have its own directory subtree containing tailored files generated by configure and install. The instance will have its own swt-user.sql containing report tailoring and parameters.
For example, I manage an ISP account server used for client websites that provides virtual hosts for 13 domains. Separate web logs are generated by the ISP for each domain. One of these domains gets its own SWT report; the rest of the domains are combined into a single report.
To set this up, create a MySQL database for each group of sites, and a directory where SWT will be installed. Arrange for separate web logs to be generated for each site. Then install SWT once for each group. I set up the CRON job for daily log processing so that it moves the logs for each group into a different directory, and then runs SWT daily processing twice, from the different install directories. The 11 logs for the combined sites are processed by logmerge to create a combined log, with file names rewritten to include the site ID.
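Creating the databases themselves is plain MySQL; something like the following (the database names are placeholders) before running configure and install for each group:
CREATE DATABASE swt_mysite;      -- placeholder name: the domain that gets its own report
CREATE DATABASE swt_clients;     -- placeholder name: the combined report for the remaining domains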
Here is info on my personal setup that maintains and updates SWT installations on five web server machines. (Some of this needs to be updated.)
At a non-published URL, I have set up a CGI that displays a daily status table. It expands an HTMX template that shows information for the previous day:
I usually check this once a day.
Every site will have a different swt-user.sql and cron job. You may need a local copy of the main shell script swt. Most control table changes can be done in swt-user.sql since it overrides swt.sql. New report sections can be provided to all SWT clients, or written specially for a particular client.
To create a new report section:
Individual reports are expanded by the "sectionrep" macro in swt, given the section ID (as listed in the "swtreports" table) as an argument. The "wtreportoptions" table identifies parameters for reports, and these values are mapped into environment variables visible to expandfile; e.g. ('rpt_403','template','report403.htmt','') will cause the variable rpt_403_template to be defined with value report403.htmt at runtime. Each report template .htmx file has similar boilerplate in its header, some of which, including the show/hide control, is in rptheader.htmi. If a report has a short and a long form, the character ⊗ is shown in the H2; clicking on any text in the H2 switches the report from short to long and back. Tables start open instead of closed if a line like the following is included in swt-user.sql:
INSERT INTO wtreportoptions VALUES('rpt_domain','start','long','long=start with long report') ON DUPLICATE KEY UPDATE optvalue='long';
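Similarly, the rpt_403 template parameter mentioned above could be set or overridden in swt-user.sql with a row following the same pattern; this is a sketch built from the example values in the preceding paragraph:
INSERT INTO wtreportoptions VALUES('rpt_403','template','report403.htmt','') ON DUPLICATE KEY UPDATE optvalue='report403.htmt';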
Copyright (c) 2006-2023 by Tom Van Vleck