closed-social
/
gitea

# This fork...

I'm maintaining this fork because the original author was not replying to issues or pull requests.  For now I plan on maintaining this fork as necessary.
## Status

[![Build Status](https://travis-ci.org/blevesearch/go-porterstemmer.svg?branch=master)](https://travis-ci.org/blevesearch/go-porterstemmer)
[![Coverage Status](https://coveralls.io/repos/blevesearch/go-porterstemmer/badge.png?branch=HEAD)](https://coveralls.io/r/blevesearch/go-porterstemmer?branch=HEAD)
# Go Porter Stemmer

A native Go clean room implementation of the Porter Stemming Algorithm.
This algorithm is of interest to people doing Machine Learning orNatural Language Processing (NLP).
This is NOT a port. This is a native Go implementation from the human-readabledescription of the algorithm.
I've tried to make it (more) efficient by NOT internally using string's, butinstead internally using []rune's and using the same (array) buffer used bythe []rune slice (and sub-slices) at all steps of the algorithm.
For Porter Stemmer algorithm, see:
http://tartarus.org/martin/PorterStemmer/def.txt      (URL #1)
http://tartarus.org/martin/PorterStemmer/             (URL #2)
# Departures

Also, since when I initially implemented it, it failed the tests at...
http://tartarus.org/martin/PorterStemmer/voc.txt      (URL #3)
http://tartarus.org/martin/PorterStemmer/output.txt   (URL #4)
... after reading the human-readble text over and over again to try to figure outwhat the error I made was (and doing all sorts of things to debug it) I came to theconclusion that the some of these tests were wrong according to the human-readabledescription of the algorithm.
This led me to wonder if maybe other people's code that was passing these tests hadrules that were not in the human-readable description. Which led me to look at the sourcecode here...
http://tartarus.org/martin/PorterStemmer/c.txt        (URL #5)
... When I looked there I noticed that there are some items marked as a "DEPARTURE",which differ from the original algorithm. (There are 2 of these.)
I implemented these departures, and the tests at URL #3 and URL #4 all passed.
## Usage

To use this Golang library, use with something like:
    package main        import (      "fmt"      "github.com/reiver/go-porterstemmer"    )        func main() {            word := "Waxes"            stem := porterstemmer.StemString(word)            fmt.Printf("The word [%s] has the stem [%s].\n", word, stem)    }
Alternatively, if you want to be a bit more efficient, use []rune slices instead, with code like:
    package main        import (      "fmt"      "github.com/reiver/go-porterstemmer"    )        func main() {            word := []rune("Waxes")            stem := porterstemmer.Stem(word)            fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))    }
Although NOTE that the above code may modify original slice (named "word" in the example) as a sideeffect, for efficiency reasons. And that the slice named "stem" in the example above may be asub-slice of the slice named "word".
Also alternatively, if you already know that your word is already lowercase (and you don't needthis library to lowercase your word for you) you can instead use code like:
    package main        import (      "fmt"      "github.com/reiver/go-porterstemmer"    )        func main() {            word := []rune("waxes")            stem := porterstemmer.StemWithoutLowerCasing(word)            fmt.Printf("The word [%s] has the stem [%s].\n", string(word), string(stem))    }
Again NOTE (like with the previous example) that the above code may modify original slice (named"word" in the example) as a side effect, for efficiency reasons. And that the slice named "stem"in the example above may be a sub-slice of the slice named "word".