# TODO list

## Release v0.6

1. Review encoder and check for lzma improvements under xz.
2. Fix binary tree matcher.
3. Compare compression ratio with xz tool using comparable parameters
   and optimize parameters
4. Do some optimizations
    - rename operation action and make it a simple type of size 8
    - make maxMatches, wordSize parameters
    - stop searching after a certain length is found (parameter sweetLen)

## Release v0.7

1. Optimize code
2. Do statistical analysis to get linear presets.
3. Test sync.Pool compatibility for xz and lzma Writer and Reader
4. Fuzz optimized code.

## Release v0.8

1. Support parallel goroutines for writing and reading xz files.
2. Support a ReaderAt interface for xz files with small block sizes.
3. Improve compatibility between gxz and xz
4. Provide manual page for gxz

## Release v0.9

1. Improve documentation
2. Fuzz again

## Release v1.0

1. Full functioning gxz
2. Add godoc URL to README.md (godoc.org)
3. Resolve all issues.
4. Define release candidates.
5. Public announcement.
## Package lzma

### Release v0.6

- Rewrite Encoder into a simple greedy one-op-at-a-time encoder
  including
    + simple scan at the dictionary head for the same byte
    + use the killer byte (requiring matches to get longer, the first
      test should be the byte that would make the match longer)

## Optimizations

- There may be a lot of false sharing in lzma.State; check whether this
  can be improved by reorganizing its internal structure.
- Check whether batching encoding and decoding improves speed.

### DAG optimizations

- Use full buffer to create minimal bit-length above range encoder.
- Might be too slow (see v0.4)

### Different match finders

- hashes with 2, 3 characters in addition to 4 characters
- binary trees with 2-7 characters (uint64 as key, use uint32 as
  pointers into an array)
- rb-trees with 2-7 characters (uint64 as key, use uint32 as pointers
  into an array with bit-stealing for the colors)
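
The bit-stealing idea for the rb-tree match finder can be sketched as
follows. This is only an illustration under assumptions: the names
`link`, `node`, `makeLink` and `packKey` are hypothetical and not part
of the package, and the choice of the top bit for the color is
arbitrary.

```go
package main

import "fmt"

// colorBit is the top bit of a uint32 link. The remaining 31 bits hold
// the index of the child node in a node slice.
const colorBit = 1 << 31

// link packs a node index and a red/black flag into a single uint32
// (hypothetical type, for illustration only).
type link uint32

// makeLink combines an index below 2^31 with the color flag.
func makeLink(index uint32, red bool) link {
	l := link(index &^ colorBit)
	if red {
		l |= colorBit
	}
	return l
}

// index returns the node index without the color bit.
func (l link) index() uint32 { return uint32(l) &^ colorBit }

// red reports whether the stolen bit marks the link as red.
func (l link) red() bool { return uint32(l)&colorBit != 0 }

// node is a hypothetical tree node. The key holds 2-7 input bytes.
type node struct {
	key         uint64
	left, right link
}

// packKey packs up to 8 bytes big-endian into a uint64, so that keys of
// equal length compare in the same order as the byte strings.
func packKey(p []byte) uint64 {
	var k uint64
	for _, b := range p {
		k = k<<8 | uint64(b)
	}
	return k
}

func main() {
	nodes := []node{
		{key: packKey([]byte("abc"))},
		{key: packKey([]byte("abd"))},
	}
	nodes[0].right = makeLink(1, true)
	r := nodes[0].right
	fmt.Println(r.index(), r.red(), nodes[0].key < nodes[r.index()].key)
}
```

Storing the color in the link keeps each node at 16 bytes, at the cost
of limiting the array to 2^31 entries.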
## Release Procedure

- execute goch -l for all packages; probably with lower param like 0.5.
- check orthography with gospell
- Write release notes in doc/relnotes.
- Update README.md
- xb copyright . in xz directory to ensure all new files have Copyright
  header
- VERSION=<version> go generate github.com/ulikunitz/xz/... to update
  version files
- Execute test for Linux/amd64, Linux/x86 and Windows/amd64.
- Update TODO.md - write short log entry
- git checkout master && git merge dev
- git tag -a <version>
- git push
## Log

### 2019-02-20

Release v0.5.6 supports the go.mod file.

### 2018-10-28

Release v0.5.5 fixes issue #19 observing ErrLimit outputs.

### 2017-06-05

Release v0.5.4 fixes issue #15, another problem with the padding size
check for the xz block header. I removed the check completely.

### 2017-02-15

Release v0.5.3 fixes issue #12 regarding the decompression of an empty
XZ stream. Many thanks to Tomasz Kłak, who reported the issue.

### 2016-12-02

Release v0.5.2 became necessary to allow the decoding of xz files with
4-byte padding in the block header. Many thanks to Greg, who reported
the issue.

### 2016-07-23

Release v0.5.1 became necessary to fix problems with 32-bit platforms.
Many thanks to Bruno Brigas, who reported the issue.

### 2016-07-04

Release v0.5 provides improvements to the compressor and provides support for
the decompression of xz files with multiple xz streams.

### 2016-01-31

Another compression rate increase, achieved by checking the byte at the
length of the best match first, before checking the whole prefix. This
makes the compressor even faster. We now have a large time budget to
beat the compression ratio of the xz tool. For enwik8 we now have over
40 seconds to reduce the compressed file size by another 7 MiB.
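
The check described in this entry can be sketched like this. It is a
minimal illustration, not the encoder's actual code; `matchLen` and
`improve` are hypothetical names. A candidate can only beat the current
best match of length `best` if it also agrees at offset `best`, so that
single byte is compared before the whole prefix.

```go
package main

import "fmt"

// matchLen returns the length of the common prefix of a and b.
func matchLen(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	return n
}

// improve returns the match length of cand against cur if it is longer
// than best; otherwise it returns best unchanged.
func improve(cur, cand []byte, best int) int {
	// Cheap rejection: a longer match must agree at offset best.
	if best >= len(cur) || best >= len(cand) || cur[best] != cand[best] {
		return best
	}
	// Only now pay for the full prefix comparison.
	if n := matchLen(cur, cand); n > best {
		return n
	}
	return best
}

func main() {
	cur := []byte("abcde")
	best := 0
	for _, cand := range []string{"abcxx", "abcdz", "abzde", "abcdx"} {
		best = improve(cur, []byte(cand), best)
		fmt.Println(cand, best)
	}
}
```

Most candidates fail the one-byte test, which is why the check speeds
up the search without changing which match is selected.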
### 2016-01-30

I simplified the encoder. Speed and compression rate increased
dramatically. A high compression rate also affects the decompression
speed. The approach with the buffer and optimizing for operation
compression rate has not been successful. Going for the maximum length
appears to be the best approach.

### 2016-01-28

The release v0.4 is ready. It provides a working xz implementation,
which is rather slow, but works and is interoperable with the xz tool.
It is an important milestone.

### 2016-01-10

I have the first working implementation of an xz reader and writer. I'm
happy about reaching this milestone.

### 2015-12-02

I'm now ready to implement xz because I have a working LZMA2
implementation. I decided today that v0.4 will use the slow encoder
using the operations buffer to be able to go back, if I intend to do so.

### 2015-10-21

I have restarted the work on the library. While trying to implement
LZMA2, I discovered that I need to resimplify the encoder and decoder
functions. The option approach is too complicated. Using a limited byte
writer, not caring about written bytes at all, and not trying to handle
uncompressed data simplifies the LZMA encoder and decoder a lot.
Processing uncompressed data and handling limits is a feature of the
LZMA2 format, not of LZMA.

I learned an interesting method from the LZO format. If the last copy is
too far away, they move the head by 2 bytes instead of 1 byte to
reduce processing times.
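
A limited byte writer as mentioned in this entry could look roughly
like the following sketch. The names `limitedByteWriter`, `errLimit`
and `writeLimited` are assumptions made for this example and may differ
from what the package uses.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// errLimit is returned once the byte budget is used up (hypothetical name).
var errLimit = errors.New("limitedByteWriter: limit reached")

// limitedByteWriter forwards bytes to w until n bytes have been written.
type limitedByteWriter struct {
	w io.ByteWriter
	n int64 // remaining bytes
}

// WriteByte writes c unless the limit has been reached.
func (l *limitedByteWriter) WriteByte(c byte) error {
	if l.n <= 0 {
		return errLimit
	}
	if err := l.w.WriteByte(c); err != nil {
		return err
	}
	l.n--
	return nil
}

// writeLimited writes data through a limitedByteWriter and returns what
// actually reached the underlying buffer.
func writeLimited(data []byte, limit int64) (string, error) {
	var buf bytes.Buffer
	lw := &limitedByteWriter{w: &buf, n: limit}
	for _, c := range data {
		if err := lw.WriteByte(c); err != nil {
			return buf.String(), err
		}
	}
	return buf.String(), nil
}

func main() {
	s, err := writeLimited([]byte("abc"), 2)
	fmt.Printf("%q %v\n", s, err)
}
```

With such a wrapper the encoder only has to react to an error from
WriteByte and does not need to track the number of written bytes itself.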
### 2015-08-26

I have now reimplemented the lzma package. The code is reasonably fast,
but can still be optimized. The next step is to implement LZMA2 and then
xz.

### 2015-07-05

Created release v0.3. The version is the foundation for a full xz
implementation that is the target of v0.4.

### 2015-06-11

The gflag package has been developed because I couldn't use flag and
pflag for a fully compatible support of gzip's and lzma's options. It
seems to work quite nicely now.

### 2015-06-05

The overflow issue was interesting to research; Henry S. Warren Jr.'s
book Hacker's Delight was very helpful as usual and explained the issue
perfectly. Fefe's information on his website was based on the C FAQ and
quite bad, because it didn't address the issue of -MININT == MININT.
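
The -MININT == MININT problem is easy to reproduce in Go, where signed
integer arithmetic wraps around. The helpers `neg` and `abs` below are
hypothetical and only illustrate the corner case; they are not taken
from the package.

```go
package main

import (
	"fmt"
	"math"
)

// neg returns -x using Go's wrapping two's complement arithmetic.
func neg(x int64) int64 { return -x }

// abs returns |x| and true, or 0 and false if |x| is not representable
// as an int64, which is only the case for math.MinInt64.
func abs(x int64) (int64, bool) {
	if x == math.MinInt64 {
		return 0, false
	}
	if x < 0 {
		return -x, true
	}
	return x, true
}

func main() {
	x := int64(math.MinInt64)
	fmt.Println(neg(x) == x) // true: the negation wraps around
	v, ok := abs(x)
	fmt.Println(v, ok)
}
```

Any overflow check that negates a value to compare magnitudes has to
treat the minimum integer as a special case.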
### 2015-06-04

It has been a productive day. I improved the interface of lzma.Reader
and lzma.Writer and fixed the error handling.

### 2015-06-01

By computing the bit length of the LZMA operations I was able to
improve the greedy algorithm implementation. Using an 8 MByte buffer,
the compression rate was not as good as for xz but already better than
gzip default.

Compression is currently slow, but this is something we will be able to
improve over time.

### 2015-05-26

Checked the license of ogier/pflag. The lzmago binary should include
the license terms for the pflag library.

I added the endorsement clause as used by Google for the Go sources to
the LICENSE file.

### 2015-05-22

The package lzb now contains the basic implementation for creating or
reading LZMA byte streams. It allows the support for the implementation
of the DAG-shortest-path algorithm for the compression function.

### 2015-04-23

Completed the lzbase classes yesterday. I'm a little bit concerned that
using the components may require too much code, but on the other hand
there is a lot of flexibility.

### 2015-04-22

Implemented Reader and Writer during the Bayern game against Porto. The
second half gave me enough time.

### 2015-04-21

While showering this morning I discovered that the design for OpEncoder
and OpDecoder doesn't work, because encoding/decoding might depend on
the current status of the dictionary. This is not exactly the right way
to start the day.

Therefore we need to keep the Reader and Writer design. This time around
we simplify it by ignoring size limits. These can be added by wrappers
around the Reader and Writer interfaces. The Parameters type isn't
needed anymore.

However, I will implement a ReaderState and WriterState type to use
static typing to ensure the right State object is combined with the
right lzbase.Reader and lzbase.Writer.

As a start I have implemented ReaderState and WriterState to ensure
that ReaderState is only used by readers and WriterState is only used
by writers.

### 2015-04-20

Today I implemented the OpDecoder and tested OpEncoder and OpDecoder.

### 2015-04-08

Came up with a new simplified design for lzbase. I have already
implemented the type State that replaces OpCodec.

### 2015-04-06

The new lzma package is now fully usable and lzmago is using it now. The
old lzma package has been completely removed.

### 2015-04-05

Implemented lzma.Reader and tested it.

### 2015-04-04

Implemented baseReader by adapting code from lzma.Reader.

### 2015-04-03

The opCodec was copied to lzma2 yesterday. opCodec has a high number of
dependencies on other files in lzma2. Therefore I had to copy almost
all files from lzma.

### 2015-03-31

Removed only a TODO item.

However, Francesc Campoy's presentation "Go for Javaneros
(Javaïstes?)" contains the idea that by using an embedded field E in a
type T, all the methods of E will be defined on T. If E is an
interface, T satisfies E.

https://talks.golang.org/2014/go4java.slide#51

I have never used this, but it seems to be a cool idea.
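
A small sketch of the embedding idea from this entry, with invented
types (`Greeter`, `english`, `polite`, `loud`) that are not taken from
the talk: a struct that embeds an interface gets the interface's
methods promoted and therefore satisfies that interface itself.

```go
package main

import (
	"fmt"
	"strings"
)

// Greeter plays the role of the embedded interface E.
type Greeter interface {
	Greet() string
}

// english is a concrete Greeter.
type english struct{}

func (english) Greet() string { return "hello" }

// polite plays the role of T: it embeds Greeter, so Greet is promoted
// and polite satisfies Greeter without declaring any method.
type polite struct {
	Greeter
}

// loud also embeds Greeter but overrides Greet and delegates to the
// embedded value.
type loud struct {
	Greeter
}

func (l loud) Greet() string {
	return strings.ToUpper(l.Greeter.Greet()) + "!"
}

// welcome accepts any Greeter, including the embedding structs.
func welcome(g Greeter) string { return g.Greet() }

func main() {
	fmt.Println(welcome(polite{Greeter: english{}}))
	fmt.Println(welcome(loud{Greeter: english{}}))
}
```

Note that calling a promoted method on a struct whose embedded
interface field is nil panics at run time, so the field has to be set
before use.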
### 2015-03-30

Finished the type writerDict and wrote a simple test.

### 2015-03-25

I started to implement the writerDict.

### 2015-03-24

After thinking long about the LZMA2 code and several false starts, I
now have a plan to create a self-sufficient lzma2 package that supports
the classic LZMA format as well as LZMA2. The core idea is to support a
baseReader and baseWriter type that support the basic LZMA stream
without any headers. Both types must support the reuse of dictionaries
and the opCodec.

### 2015-01-10

1. Implemented simple lzmago tool
2. Tested tool against large 4.4G file
    - compression worked correctly; tested decompression with lzma
    - decompression hits a full buffer condition
3. Fixed a bug in the compressor and wrote a test for it
4. Executed full cycle for 4.4 GB file; performance can be improved ;-)

### 2015-01-11

- Release v0.2 because of the working LZMA encoder and decoder