Abstract

This paper considers the problem of mining closed frequent itemsets over a data stream sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets would make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets contain a boundary between the closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining the transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than representative algorithms of the state-of-the-art approaches.
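
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of what "closed frequent itemsets over a sliding window" means. It is not the CET algorithm described in the paper: it recomputes everything from the window contents on each query, which is exactly the cost the CET is meant to avoid. The names `closed_frequent_itemsets`, `SlidingWindowMiner`, `window_size`, and `min_support` are hypothetical and chosen only for illustration. An itemset is treated as frequent if it occurs in at least `min_support` transactions of the window, and as closed if no proper superset has the same support.

```python
from collections import deque
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Brute-force illustration (not the paper's CET): return a dict
    {itemset: support} of the closed frequent itemsets in `window`."""
    # Count the support of every non-empty subset of every transaction.
    support = {}
    for transaction in window:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1

    # Keep itemsets whose support reaches the threshold.
    frequent = {s: c for s, c in support.items() if c >= min_support}

    # An itemset is closed if no proper superset has the same support.
    # Such a superset would itself be frequent, so searching `frequent` suffices.
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


class SlidingWindowMiner:
    """Hypothetical helper: keeps the last `window_size` transactions."""

    def __init__(self, window_size, min_support):
        # deque with maxlen drops the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.min_support = min_support

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def closed_frequent(self):
        return closed_frequent_itemsets(self.window, self.min_support)


if __name__ == "__main__":
    miner = SlidingWindowMiner(window_size=4, min_support=2)
    for t in [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}]:
        miner.add(t)
    # Closed frequent itemsets: {c}: 3, {a, b}: 3, {a, b, c}: 2
    print(miner.closed_frequent())

    # Sliding the window by one transaction evicts the oldest one.
    miner.add({"c", "d"})
    # Closed frequent itemsets: {c}: 3, {a, b}: 2, {c, d}: 2
    print(miner.closed_frequent())
```

In this toy run, a single slide of the window makes {c, d} frequent and {a, b, c} infrequent, a small-scale version of the status changes that the abstract describes as crossing the boundary. The enumeration here is exponential in transaction length, so the sketch is only suitable for illustrating the definitions.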

Keywords

Sliding window protocol; Boundary (topology); Data mining; Data stream; Computer science; Window (computing); Set (abstract data type); Data structure; Data stream mining; Tree (set theory); Enumeration; Moment (physics); Space (punctuation); Algorithm; Mathematics

Publication Info

Year
2006
Type
article
Volume
10
Issue
3
Pages
265-294
Citations
173
Access
Closed

Cite This

Yün Chi, Haixun Wang, Philip S. Yu et al. (2006). Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowledge and Information Systems, 10(3), 265-294. https://doi.org/10.1007/s10115-006-0003-0

Identifiers

DOI
10.1007/s10115-006-0003-0