Integrating association rule mining with relational database systems

Sunita Sarawagi; Shiby Thomas; Rakesh Agrawal

doi:10.1145/276304.276335

Abstract

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on the fly and mining; tight coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that, from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which is a factor of 3 to 4 slower than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, development ease, portability, and interoperability.
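To make the contrast between two of these alternatives concrete, the following is a minimal sketch, not the paper's own formulation. It counts frequent 2-itemsets twice: once with a plain SQL self-join evaluated inside the DBMS, and once in the loose-coupling style, where rows are pulled through a cursor and counted in the application. The `sales(tid, item)` table, the toy rows, the support threshold, and the use of SQLite as a stand-in DBMS are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one (tid, item) row per item in a transaction.
ROWS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"),
]
MIN_SUPPORT = 2  # assumed absolute support threshold


def make_db():
    """Build an in-memory SQLite database holding the toy transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
    return conn


def frequent_pairs_sql(conn, min_support):
    """SQL style: count 2-itemsets in the DBMS with a self-join and GROUP BY."""
    query = """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM sales a JOIN sales b
          ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    return [tuple(row) for row in conn.execute(query, (min_support,))]


def frequent_pairs_cursor(conn, min_support):
    """Loose coupling: fetch rows through a cursor, count in the application."""
    baskets = {}
    for tid, item in conn.execute("SELECT tid, item FROM sales"):
        baskets.setdefault(tid, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return sorted((a, b, n) for (a, b), n in counts.items() if n >= min_support)


if __name__ == "__main__":
    conn = make_db()
    print(frequent_pairs_sql(conn, MIN_SUPPORT))
    print(frequent_pairs_cursor(conn, MIN_SUPPORT))
```

Both functions return the same pairs on this toy input. The difference the abstract is concerned with is where the work happens: the first leaves joining and counting to the query engine, while the second moves every row across the cursor interface before any counting starts.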

Keywords

Computer science; SQL; Stored procedure; Database; Software portability; Association rule learning; Query by Example; SQL/PSM; Data mining; Operating system; Information retrieval

Publication Info

Year: 1998
Type: article
Pages: 343-354
Citations: 304
Access: Closed

Cite This

APA Style

Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal (1998). Integrating association rule mining with relational database systems, 343-354. https://doi.org/10.1145/276304.276335

Identifiers

DOI: 10.1145/276304.276335