An algorithm for suffix stripping

Abstract

The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
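The abstract does not spell out the "measure" or the individual rules, so the following is only a minimal sketch of the idea of stepwise, condition-guarded suffix removal. It assumes the commonly cited definition of the measure from the Porter stemmer (the number m of vowel-to-consonant transitions when a stem is viewed as [C](VC)^m[V]). The function names `is_consonant`, `measure`, and `strip_if`, and the two example rules, are illustrative choices and not the paper's BCPL program.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Assumed convention: a, e, i, o, u are vowels; 'y' is a vowel
    only when it directly follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def strip_if(word, suffix, replacement, min_measure):
    """Replace `suffix` only when the remaining stem's measure
    is greater than `min_measure`; otherwise return the word unchanged."""
    if word.endswith(suffix):
        stem = word[: len(word) - len(suffix)]
        if measure(stem) > min_measure:
            return stem + replacement
    return word


# A compound suffix handled as two simple removals in successive steps.
step1 = strip_if("hopefulness", "fulness", "ful", 0)  # "hopeful"
step2 = strip_if(step1, "ful", "", 0)                 # "hope"
```

Making each removal depend on the stem is what keeps short words intact: under these assumptions `strip_if("rational", "ational", "ate", 0)` leaves the word alone because the stem "r" has measure 0, while "relational" becomes "relate".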

Keywords

Suffix, Stripping (fiber), Simple (philosophy), Computer science, Suffix array, Generalized suffix tree, Compressed suffix array, Algorithm, Field (mathematics), Measure (data warehouse), SIMPLE algorithm, Natural language processing, Suffix tree, Artificial intelligence, Mathematics, Data mining, Linguistics, Physics, Pure mathematics, Engineering

Affiliated Institutions

University of Cambridge GB

Related Publications

MUMmer4: A fast and versatile genome alignment system

Guillaume Marçais , Arthur L. Delcher , Adam M. Phillippy +3 more

The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer...

2018 PLoS Computational Biology 2427 citations

A Scalable Hierarchical Distributed Language Model

Andriy Mnih , Geoffrey E. Hinton

Neural probabilistic language models (NPLMs) have been shown to be competitive with and occasionally superior to the widely-used n-gram language models. The main drawback of NPL...

2008 848 citations

Mining frequent patterns without candidate generation

Jiawei Han , Jian Pei , Yiwen Yin

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previ...

2000 ACM SIGMOD Record 6285 citations

Appraisal of a simple arsenic removal method for ground water of Bangladesh

A. H. Khan , Shahriar Bin Rasul , A. K. M. Munir +4 more

Abstract A simple three‐pitcher (locally known as '3‐kalshi') filtration assembly made entirely from readily available local materials is tested for its efficacy in removing ars...

2000 Journal of Environmental Science and ... 147 citations

GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array

Shuji Suzuki , Masanori Kakuta , Takashi Ishida +1 more

DNA sequences are translated into protein coding sequences and then further assigned to protein families in metagenomic analyses, because of the need for sensitivity. However, h...

2014 PLoS ONE 91 citations

Publication Info

Year: 1980
Type: article
Volume: 14
Issue: 3
Pages: 130-137
Citations: 8045
Access: Closed

Citation Metrics

OpenAlex: 8045
Influential: 293
CrossRef: 4437

Cite This

APA Style

Martin Porter (1980). An algorithm for suffix stripping. Program electronic library and information systems, 14(3), 130-137. https://doi.org/10.1108/eb046814

Identifiers

DOI: 10.1108/eb046814
