An effective hash-based algorithm for mining association rules

Abstract

In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear together in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate itemsets in early iterations is usually the dominating factor in overall data mining performance. To address this issue, we propose an effective hash-based algorithm for candidate set generation. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
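
The hash-based pruning of candidate 2-itemsets described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bucket count, and the hash function `(x * 10 + y) % num_buckets` are assumptions chosen for illustration, items are assumed to be small integers, and the database trimming step mentioned in the abstract is not shown. The idea is that while single items are counted in the first scan, every pair in each transaction is also hashed into a bucket; a pair is kept as a candidate only if both of its items are large and its bucket count reaches the minimum support.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    # Hypothetical hash function (an assumption): any deterministic
    # mapping from a pair of integer items to a bucket index would do.
    x, y = pair
    return (x * 10 + y) % num_buckets


def candidate_pairs(transactions, min_support, num_buckets=7):
    """Return candidate 2-itemsets that survive hash-bucket filtering."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    # Single scan: count each item and hash every pair into a bucket.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    # A bucket count is an upper bound on the support of each pair in it,
    # so a pair in a bucket below min_support cannot be large.
    return [
        pair
        for pair in combinations(large_items, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_support
    ]


def count_support(transactions, candidates):
    """Second scan: count the exact support of each surviving candidate."""
    wanted = set(candidates)
    counts = Counter()
    for transaction in transactions:
        for pair in combinations(sorted(set(transaction)), 2):
            if pair in wanted:
                counts[pair] += 1
    return counts


# Toy database of four transactions over items 1..5 (illustrative data).
toy_db = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
candidates = candidate_pairs(toy_db, min_support=2)
print(candidates)  # [(1, 3), (2, 3), (2, 5), (3, 5)]
print(count_support(toy_db, candidates))
```

In this toy run, pairing the four large items without the bucket filter would give six candidates; the filter leaves four. Because different pairs can share a bucket, the filter may keep some pairs that are not large, which is why the exact support is still counted in a second scan.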

Keywords

Association rule learning, Computer science, Hash function, Data mining, Set (abstract data type), Bottleneck, Database transaction, Algorithm, Database

Affiliated Institutions

IBM Research - Thomas J. Watson Research Center (US)

Related Publications

Mining association rules between sets of items in large databases

Rakesh Agrawal, Tomasz Imieliński, Arun Swami

We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates a...

1993 14674 citations

Mining sequential patterns

R. K. Agrawal, Ramakrishnan Srikant

We are given a large database of customer transactions, where each transaction consists of customer-id, transaction time, and the items bought in the transaction. We introduce t...

2002 5114 citations

An Efficient Algorithm for Mining Association Rules in Large Databases

Ashoka Savasere, Edward Omiecinski, Shamkant B. Navathe

Mining for association rules between items in a large database of sales transactions has been described as an important database mining problem. In this paper we present an effi...

1995 1598 citations

Mining frequent patterns without candidate generation

Jiawei Han, Jian Pei, Yiwen Yin

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previ...

2000 ACM SIGMOD Record 6285 citations

Limiting privacy breaches in privacy preserving data mining

Alexandre Evfimievski, Johannes Gehrke, Ramakrishnan Srikant

There has been increasing interest in the problem of building accurate data mining models over aggregate data, while protecting privacy at the level of individual records. One a...

2003 819 citations

Publication Info

Year: 1995
Type: article
Pages: 175-186
Citations: 1410
Access: Closed

Cite This

APA Style

Jong Soo Park, Ming-Syan Chen, Philip S. Yu (1995). An effective hash-based algorithm for mining association rules. 175-186. https://doi.org/10.1145/223784.223813

Identifiers

DOI: 10.1145/223784.223813