roaring [![Build Status](https://travis-ci.org/RoaringBitmap/roaring.png)](https://travis-ci.org/RoaringBitmap/roaring) [![Coverage Status](https://coveralls.io/repos/github/RoaringBitmap/roaring/badge.svg?branch=master)](https://coveralls.io/github/RoaringBitmap/roaring?branch=master) [![GoDoc](https://godoc.org/github.com/RoaringBitmap/roaring?status.svg)](https://godoc.org/github.com/RoaringBitmap/roaring) [![Go Report Card](https://goreportcard.com/badge/RoaringBitmap/roaring)](https://goreportcard.com/report/github.com/RoaringBitmap/roaring)
=============

This is a Go version of the Roaring bitmap data structure.

Roaring bitmaps are used by several major systems such as [Apache Lucene][lucene] and derivative systems such as [Solr][solr] and
[Elasticsearch][elasticsearch], [Metamarkets' Druid][druid], [LinkedIn Pinot][pinot], [Netflix Atlas][atlas], [Apache Spark][spark], [OpenSearchServer][opensearchserver], [Cloud Torrent][cloudtorrent], [Whoosh][whoosh], [Pilosa][pilosa], [Microsoft Visual Studio Team Services (VSTS)][vsts], and eBay's [Apache Kylin][kylin].

[lucene]: https://lucene.apache.org/
[solr]: https://lucene.apache.org/solr/
[elasticsearch]: https://www.elastic.co/products/elasticsearch
[druid]: http://druid.io/
[spark]: https://spark.apache.org/
[opensearchserver]: http://www.opensearchserver.com
[cloudtorrent]: https://github.com/jpillora/cloud-torrent
[whoosh]: https://bitbucket.org/mchaput/whoosh/wiki/Home
[pilosa]: https://www.pilosa.com/
[kylin]: http://kylin.apache.org/
[pinot]: http://github.com/linkedin/pinot/wiki
[vsts]: https://www.visualstudio.com/team-services/
[atlas]: https://github.com/Netflix/atlas

Roaring bitmaps are found to work well in many important applications:

> Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods ([Wang et al., SIGMOD 2017](http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf))

The ``roaring`` Go library is used by

* [Cloud Torrent](https://github.com/jpillora/cloud-torrent): a self-hosted remote torrent client
* [runv](https://github.com/hyperhq/runv): a hypervisor-based runtime for the Open Container Initiative
* [InfluxDB](https://www.influxdata.com)
* [Pilosa](https://www.pilosa.com/)
* [Bleve](http://www.blevesearch.com)

This library is used in production in several systems and is part of the [Awesome Go collection](https://awesome-go.com).

There are also [Java](https://github.com/RoaringBitmap/RoaringBitmap) and [C/C++](https://github.com/RoaringBitmap/CRoaring) versions. The Java, C, C++ and Go versions are binary compatible: e.g., you can save bitmaps
from a Java program and load them back in Go, and vice versa. We have a [format specification](https://github.com/RoaringBitmap/RoaringFormatSpec).

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016-... by the authors.

### References

- Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018. [arXiv:1709.07821](https://arxiv.org/abs/1709.07821)
- Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
- Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549

### Dependencies

Dependencies are fetched automatically by giving the `-t` flag to `go get`.

They include:

- github.com/smartystreets/goconvey/convey
- github.com/willf/bitset
- github.com/mschoch/smat
- github.com/glycerine/go-unsnap-stream
- github.com/philhofer/fwd
- github.com/jtolds/gls

Note that the smat library requires Go 1.6 or better.

#### Installation

- `go get -t github.com/RoaringBitmap/roaring`

### Example

Here is a simplified but complete example:

```go
package main

import (
    "bytes"
    "fmt"

    "github.com/RoaringBitmap/roaring"
)

func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.New()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // computes union of the three bitmaps in parallel using 4 workers
    roaring.ParOr(4, rb1, rb2, rb3)
    // computes intersection of the three bitmaps in parallel using 4 workers
    roaring.ParAnd(4, rb1, rb2, rb3)

    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb := roaring.New()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
        fmt.Println("I wrote the content to a byte stream and read it back.")
    }
}
```

If you wish to use serialization and handle errors, you might want to
consider the following sample of code:

```go
// Fragment from a test inside the roaring package, where t is a *testing.T.
rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
buf := new(bytes.Buffer)
// The returned byte count is discarded; an unused variable would not compile.
_, err := rb.WriteTo(buf)
if err != nil {
    t.Errorf("Failed writing")
}
newrb := New()
_, err = newrb.ReadFrom(buf)
if err != nil {
    t.Errorf("Failed reading")
}
if !rb.Equals(newrb) {
    t.Errorf("Cannot retrieve serialized version")
}
```

Given N integers in [0,x), the serialized size in bytes of
a Roaring bitmap should never exceed this bound:

`` 8 + 9 * ((long)x+65535)/65536 + 2 * N ``

That is, given a fixed overhead for the universe size (x), Roaring
bitmaps never use more than 2 bytes per integer. You can call
``BoundSerializedSizeInBytes`` for a more precise estimate.

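
As a rough illustration, the bound above can be evaluated with plain integer arithmetic. The sketch below assumes the expression is read exactly as written, with `*` and `/` applied left to right as in C or Go. The function name `roaringSizeBound` is hypothetical and is not part of the library.

```go
package main

import "fmt"

// roaringSizeBound evaluates 8 + 9*(x+65535)/65536 + 2*n with integer
// arithmetic, exactly as the expression is written in the text.
// Hypothetical helper for illustration; it is not part of the roaring API.
// n is the number of integers stored, x is the (exclusive) universe size.
func roaringSizeBound(n, x uint64) uint64 {
	return 8 + 9*(x+65535)/65536 + 2*n
}

func main() {
	// 7 integers in [0, 1001), e.g. the values 1, 2, 3, 4, 5, 100, 1000.
	fmt.Println(roaringSizeBound(7, 1001)) // prints 31
	// 1000 integers in [0, 65536).
	fmt.Println(roaringSizeBound(1000, 65536)) // prints 2025
}
```

For small universes the fixed part stays small, so the `2 * N` term dominates as N grows, which matches the "2 bytes per integer" reading given above.
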
### Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

### Goroutine safety

In general, it should not be considered safe to access
the same bitmaps using different goroutines: they are left
unsynchronized for performance. Should you want to access
a Bitmap from more than one goroutine, you should
provide synchronization. Typically this is done by using channels to pass
the `*Bitmap` around (in Go style, so there is only ever one owner),
or by using `sync.Mutex` to serialize operations on Bitmaps.

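
The mutex approach could look roughly like the sketch below. To keep it runnable with the standard library alone, it uses a small map-based set (`intSet`) as a stand-in for `*roaring.Bitmap`. The names `intSet`, `SafeSet` and `fillConcurrently` are hypothetical and do not come from the library; in real code the wrapper would hold a bitmap instead.

```go
package main

import (
	"fmt"
	"sync"
)

// intSet is a hypothetical stand-in for *roaring.Bitmap so that this
// sketch runs without any external dependency.
type intSet map[uint32]struct{}

// SafeSet guards the underlying set with a mutex so that several
// goroutines can share it. Every access goes through the lock.
type SafeSet struct {
	mu  sync.Mutex
	set intSet
}

func NewSafeSet() *SafeSet {
	return &SafeSet{set: make(intSet)}
}

// Add inserts x while holding the lock.
func (s *SafeSet) Add(x uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.set[x] = struct{}{}
}

// Cardinality returns the number of stored values while holding the lock.
func (s *SafeSet) Cardinality() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.set)
}

// fillConcurrently starts one goroutine per worker; worker w adds the
// values [w*perWorker, (w+1)*perWorker) to a shared SafeSet.
func fillConcurrently(workers, perWorker int) *SafeSet {
	s := NewSafeSet()
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Add(uint32(w*perWorker + i))
			}
		}(w)
	}
	wg.Wait()
	return s
}

func main() {
	s := fillConcurrently(4, 1000)
	fmt.Println(s.Cardinality()) // prints 4000
}
```

A wrapper like this trades some throughput for safety. When a single goroutine owns the bitmap at any given time, passing the pointer over a channel avoids the lock entirely.
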
### Coverage

We test our software. For a report on our test coverage, see
https://coveralls.io/github/RoaringBitmap/roaring?branch=master

### Benchmark

Type

    go test -bench Benchmark -run -

To run benchmarks on [Real Roaring Datasets](https://github.com/RoaringBitmap/real-roaring-datasets),
run the following:

```sh
go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
```

### Iterative use

You can use roaring with gore:

- go get -u github.com/motemen/gore
- Make sure that ``$GOPATH/bin`` is in your ``$PATH``.
- go get github.com/RoaringBitmap/roaring

```go
$ gore
gore version 0.2.6 :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"
```

### Fuzzy testing

You can help us test the library further with fuzzy testing:

    go get github.com/dvyukov/go-fuzz/go-fuzz
    go get github.com/dvyukov/go-fuzz/go-fuzz-build
    go test -tags=gofuzz -run=TestGenerateSmatCorpus
    go-fuzz-build github.com/RoaringBitmap/roaring
    go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the number of crashers is > 0, check out the reports in
the workdir, where you should be able to find the panic goroutine stack
traces.

### Alternative in Go

There is a Go version wrapping the C/C++ implementation: https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar

The two versions were written independently.

### Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps