Association and Sequence Mining in Web Usage
Abstract
Web servers worldwide generate a vast amount of information on web users’ browsing
activities. Several researchers have studied these so‐called clickstream or web access log
data to better understand and characterize web users. Clickstream data can be enriched
with information about the content of visited pages and the origin (e.g., geographic,
organizational) of the requests. The goal of this project is to analyse user behaviour by
mining enriched web access log data. With the continued growth and proliferation of e‐commerce,
Web services, and Web‐based information systems, the volumes of clickstream
and user data collected by Web‐based organizations in their daily operations have reached
astronomical proportions. This information can be exploited in various ways, such as
enhancing the effectiveness of websites or developing targeted web marketing campaigns.
Patterns discovered by web usage mining are usually represented as collections of pages, objects,
or resources that are frequently accessed by groups of users with common needs or interests.
This paper provides an overview of how frequent pattern techniques can be used
to discover different types of patterns in a Web log database. We focus on association
mining as a data mining technique for extracting potentially useful knowledge
from web usage data. We implemented in Java, using the NetBeans IDE, a program that
identifies associations between pages from user sessions. To illustrate the approach, we used
the log files of a commercial website.
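To make the idea of page associations concrete, the following is a minimal sketch, not the program described above. It assumes that sessions have already been reconstructed from the log and that each session is reduced to the set of pages the user visited. It uses the standard definitions: the support of a rule A -> B is the fraction of sessions containing both A and B, and its confidence is the fraction of sessions containing A that also contain B. For brevity it only considers rules between pairs of pages rather than running the full Apriori algorithm; the class name `PageAssociations`, the method `mine`, and the example page paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: mines association rules "A -> B" between pairs of pages,
 * where each session is the set of pages visited by one user.
 */
public class PageAssociations {

    /** A rule "antecedent -> consequent" with its support and confidence. */
    public static final class Rule {
        public final String antecedent;
        public final String consequent;
        public final double support;
        public final double confidence;

        Rule(String antecedent, String consequent, double support, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.support = support;
            this.confidence = confidence;
        }
    }

    public static List<Rule> mine(List<Set<String>> sessions, double minSupport, double minConfidence) {
        int n = sessions.size();
        Map<String, Integer> pageCount = new HashMap<>();
        Map<List<String>, Integer> pairCount = new HashMap<>();

        // Count in how many sessions each page, and each unordered page pair, occurs.
        for (Set<String> session : sessions) {
            List<String> pages = new ArrayList<>(new TreeSet<>(session)); // sorted, so each pair has one key
            for (int i = 0; i < pages.size(); i++) {
                pageCount.merge(pages.get(i), 1, Integer::sum);
                for (int j = i + 1; j < pages.size(); j++) {
                    pairCount.merge(List.of(pages.get(i), pages.get(j)), 1, Integer::sum);
                }
            }
        }

        List<Rule> rules = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> e : pairCount.entrySet()) {
            double support = (double) e.getValue() / n;
            if (support < minSupport) {
                continue; // the pair is not frequent enough
            }
            String a = e.getKey().get(0);
            String b = e.getKey().get(1);
            // A frequent pair {a, b} yields two candidate rules: a -> b and b -> a.
            addIfConfident(rules, a, b, e.getValue(), pageCount.get(a), support, minConfidence);
            addIfConfident(rules, b, a, e.getValue(), pageCount.get(b), support, minConfidence);
        }
        return rules;
    }

    private static void addIfConfident(List<Rule> rules, String from, String to, int pairSessions,
                                       int fromSessions, double support, double minConfidence) {
        double confidence = (double) pairSessions / fromSessions; // estimate of P(to | from)
        if (confidence >= minConfidence) {
            rules.add(new Rule(from, to, support, confidence));
        }
    }

    public static void main(String[] args) {
        // Hypothetical sessions; real ones would come from a sessionized access log.
        List<Set<String>> sessions = List.of(
                Set.of("/index", "/products", "/cart"),
                Set.of("/index", "/products"),
                Set.of("/index", "/contact"),
                Set.of("/products", "/cart"));
        for (Rule r : mine(sessions, 0.5, 0.6)) {
            System.out.printf("%s -> %s (support %.2f, confidence %.2f)%n",
                    r.antecedent, r.consequent, r.support, r.confidence);
        }
    }
}
```

Counting pairs directly is only practical for small page sets; a full implementation would typically use Apriori-style candidate pruning to extend the search to larger page sets.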