Abstract
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case-study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no work previously has evaluated systematically the power of class-linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well---well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes---that is, Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
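The abstract's simplest method is a univariate relational-neighbor classifier combined with collective inference, where an unlabeled node's class estimate is the weighted average of its neighbors' estimates, iterated until the estimates settle. The sketch below illustrates that idea on a toy graph; the graph, seed labels, unit edge weights, and fixed iteration count are illustrative assumptions, not the NetKit implementation.

```python
# Hypothetical sketch of a univariate relational-neighbor classifier with a
# simple relaxation-labeling-style collective inference loop.
from collections import defaultdict

# Toy undirected graph: node -> list of neighbors (assumed for illustration).
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

# Seed labels: the only information used besides link structure.
known = {"a": "pos", "e": "neg"}
classes = ["pos", "neg"]

# Unknown nodes start from the class prior among the known labels;
# known nodes are fixed to their observed class.
prior = {c: sum(1 for l in known.values() if l == c) / len(known) for c in classes}
estimates = {
    n: ({c: 1.0 if known[n] == c else 0.0 for c in classes} if n in known else dict(prior))
    for n in graph
}

# Collective inference: repeatedly set each unknown node's class distribution
# to the normalized sum of its neighbors' current distributions.
for _ in range(20):
    new_estimates = {}
    for node, neighbors in graph.items():
        if node in known:
            new_estimates[node] = estimates[node]
            continue
        totals = defaultdict(float)
        for nb in neighbors:
            for c in classes:
                totals[c] += estimates[nb][c]  # unit edge weights assumed
        norm = sum(totals.values()) or 1.0
        new_estimates[node] = {c: totals[c] / norm for c in classes}
    estimates = new_estimates

for node in sorted(graph):
    print(node, max(estimates[node], key=estimates[node].get), estimates[node])
```

Running the sketch propagates the two seed labels through the links, so nodes nearer the "pos" seed end up with higher "pos" scores; this is the sense in which class linkage alone can carry substantial classification power.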
Publication Info
- Year: 2007
- Type: article
- Volume: 8
- Issue: 34
- Pages: 935-983
- Citations: 447
- Access: Closed
Identifiers
- DOI: 10.5555/1248659.1248693