Statistical Analysis Of Pageviews On Web Sites
Price
Free (open access)
Volume
28
Pages
Published
2002
Size
550 kb
Paper DOI
10.2495/DATA020941
Copyright
WIT Press
Author(s)
P H A J M van Gelder, G Beijer & M Berger
Abstract
Pageview statistics are useful to describe and predict the behaviour of clients on internet sites. From a theoretical point of view, the number of pageviews during a day should be Poisson distributed. However, violation of the stationarity assumptions usually causes other distribution types to fit pageview data much better. This paper describes a procedure that homogenises the data (with detrending techniques) and allows several distribution functions as possible candidates. A goodness-of-fit test then selects the optimal distribution for the given dataset. Particular attention is paid to the occurrence probabilities of large numbers of pageviews on different types of slightly correlated websites. The paper furthermore presents models for forecasting the number of pageviews during the rest of the day (given the number of pageviews earlier that day) and for giving uncertainty intervals with that forecast.
1 Introduction
Pageview statistics are useful to describe and predict the behaviour of clients on internet sites. Typical questions related to visitor behaviour concern the frequency and length of visits during a certain time period, the entrance and exit locations of visitors, the percentage of visitors who reach key pages (such as a sign-up page, cash register, etc.), the paths they take, the traffic trend, the prediction of traffic spikes, the accommodation of server space for increased traffic, the adjustment for browser technology, the evaluation of behaviour variations among subsets of customers, the change during sales, etc. However, these questions are difficult to answer because of several boundary conditions: human behaviour is highly stochastic, and data can be incomplete or noisy because of proxy servers, firewalls, caching,
Keywords