  1. Jul 13, 2018
  2. Apr 27, 2018
    • Improve performance of Buf::get_*() (#195) · 51e435b7
      kohensu authored
      The new implementation tries to get the data directly from bytes() (which
      is possible most of the time); if bytes() does not hold enough data, it
      falls back to the previous code: copy the needed bytes into a temporary
      buffer before returning the data.

      Here are the bench results:
                                     Before                After           x-faster
      get_f32::cursor             64 ns/iter (+/- 0)    20 ns/iter (+/- 0)    3.2
      get_f32::tbuf_1             77 ns/iter (+/- 1)    34 ns/iter (+/- 0)    2.3
      get_f32::tbuf_1_costly      87 ns/iter (+/- 0)    62 ns/iter (+/- 0)    1.4
      get_f32::tbuf_2            151 ns/iter (+/- 18)  160 ns/iter (+/- 1)    0.9
      get_f32::tbuf_2_costly     180 ns/iter (+/- 2)   187 ns/iter (+/- 2)    1.0
      
      get_f64::cursor             67 ns/iter (+/- 0)    21 ns/iter (+/- 0)    3.2
      get_f64::tbuf_1             80 ns/iter (+/- 0)    35 ns/iter (+/- 0)    2.3
      get_f64::tbuf_1_costly      82 ns/iter (+/- 3)    60 ns/iter (+/- 0)    1.4
      get_f64::tbuf_2            154 ns/iter (+/- 1)   164 ns/iter (+/- 0)    0.9
      get_f64::tbuf_2_costly     170 ns/iter (+/- 2)   187 ns/iter (+/- 1)    0.9
      
      get_u16::cursor             66 ns/iter (+/- 0)    20 ns/iter (+/- 0)    3.3
      get_u16::tbuf_1             77 ns/iter (+/- 0)    35 ns/iter (+/- 0)    2.2
      get_u16::tbuf_1_costly      85 ns/iter (+/- 2)    62 ns/iter (+/- 0)    1.4
      get_u16::tbuf_2            147 ns/iter (+/- 0)   154 ns/iter (+/- 0)    1.0
      get_u16::tbuf_2_costly     160 ns/iter (+/- 1)   177 ns/iter (+/- 0)    0.9
      
      get_u32::cursor             64 ns/iter (+/- 0)    20 ns/iter (+/- 0)    3.2
      get_u32::tbuf_1             77 ns/iter (+/- 0)    35 ns/iter (+/- 0)    2.2
      get_u32::tbuf_1_costly      91 ns/iter (+/- 2)    63 ns/iter (+/- 0)    1.4
      get_u32::tbuf_2            151 ns/iter (+/- 40)  157 ns/iter (+/- 0)    1.0
      get_u32::tbuf_2_costly     162 ns/iter (+/- 0)   180 ns/iter (+/- 0)    0.9
      
      get_u64::cursor             67 ns/iter (+/- 0)    20 ns/iter (+/- 0)    3.4
      get_u64::tbuf_1             78 ns/iter (+/- 0)    35 ns/iter (+/- 1)    2.2
      get_u64::tbuf_1_costly      87 ns/iter (+/- 1)    59 ns/iter (+/- 1)    1.5
      get_u64::tbuf_2            154 ns/iter (+/- 0)   160 ns/iter (+/- 0)    1.0
      get_u64::tbuf_2_costly     168 ns/iter (+/- 0)   184 ns/iter (+/- 0)    0.9
      
      get_u8::cursor              64 ns/iter (+/- 0)    19 ns/iter (+/- 0)    3.4
      get_u8::tbuf_1              77 ns/iter (+/- 0)    35 ns/iter (+/- 0)    2.2
      get_u8::tbuf_1_costly       68 ns/iter (+/- 0)    51 ns/iter (+/- 0)    1.3
      get_u8::tbuf_2              85 ns/iter (+/- 0)    43 ns/iter (+/- 0)    2.0
      get_u8::tbuf_2_costly       75 ns/iter (+/- 0)    61 ns/iter (+/- 0)    1.2
      get_u8::option              77 ns/iter (+/- 0)    59 ns/iter (+/- 0)    1.3
      
      The improvement on the basic std::Cursor implementation is clearly visible.
      
      Other implementations are specific to the bench tests and just wrap a static
      slice. The different variants are:
       - tbuf_1: only one call to 'bytes()' is needed.
       - tbuf_2: two calls to 'bytes()' are needed to read more than one byte.
       - _costly variants are implemented with #[inline(never)] on 'bytes()',
         'remaining()' and 'advance()'.
      
      The cases that are slightly slower correspond to implementations that are
      not very realistic: they never expose more than one byte at a time.
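
      A minimal sketch of this fast-path/fallback pattern (a hypothetical free
      function written against the 0.4-era Buf API, where the current chunk is
      exposed by bytes(); this is not the crate's actual code):

      ```
      use std::mem;
      use bytes::Buf;

      // Sketch only: read a big-endian u32, taking it straight from the current
      // chunk when possible and falling back to a temporary buffer otherwise.
      fn get_u32_be<B: Buf>(buf: &mut B) -> u32 {
          const N: usize = mem::size_of::<u32>();
          let mut raw = [0u8; N];
          if buf.bytes().len() >= N {
              // Fast path: the whole value is available in the current chunk.
              raw.copy_from_slice(&buf.bytes()[..N]);
              buf.advance(N);
          } else {
              // Slow path: the value straddles chunk boundaries, so copy it out
              // through a temporary buffer (the previous code path).
              buf.copy_to_slice(&mut raw);
          }
          u32::from_be_bytes(raw)
      }
      ```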
    • Improve performance of Buf::get_*() (#195) · e4447220
      kohensu authored
  3. Mar 12, 2018
    • Fix `copy_to_slice` to use correct increment var · ebe52273
      Carl Lerche authored
      This patch fixes the `copy_to_slice` function to use the correct increment
      variable. The incorrect code did not result in incorrect behavior: the only
      case where `cnt != src.len()` is the final iteration, and since `src.len()`
      is greater than `cnt` there, `off` is incremented by too much, but the
      `off < dst.len()` loop condition still causes the loop to exit.

      The only danger is that `src.len()` could cause an overflow.
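
      A rough reconstruction of the loop in question (mirroring the description
      above and the 0.4-era Buf method names, not the crate's exact code):

      ```
      use std::{cmp, ptr};
      use bytes::Buf;

      // Sketch of the default copy_to_slice loop: `src` is the current chunk,
      // `cnt` is how many bytes actually get copied this iteration, and the fix
      // is to advance `off` by `cnt` rather than by `src.len()`.
      fn copy_to_slice_sketch<B: Buf>(buf: &mut B, dst: &mut [u8]) {
          assert!(buf.remaining() >= dst.len());
          let mut off = 0;
          while off < dst.len() {
              let cnt;
              unsafe {
                  let src = buf.bytes();
                  cnt = cmp::min(src.len(), dst.len() - off);
                  ptr::copy_nonoverlapping(src.as_ptr(), dst[off..].as_mut_ptr(), cnt);
                  // The bug was `off += src.len()`, which overshoots on the final
                  // iteration when the chunk is larger than what is left to copy.
                  off += cnt;
              }
              buf.advance(cnt);
          }
      }
      ```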
    • Remove ByteOrder generic methods from Buf and BufMut (#187) · 025d5334
      Sean McArthur authored
      * make Buf and BufMut usable as trait objects
      
      - All the `get_*` and `put_*` methods that take `T: ByteOrder` have
        a `where Self: Sized` bound added, so that they are only usable from
        sized types. It was impossible to make `Buf` or `BufMut` into trait
        objects before, so this change doesn't break anyone.
      - Add `get_n_be`/`get_n_le`/`put_n_be`/`put_n_le` methods that can be
        used on trait objects.
      - Deprecate the export of `ByteOrder` and methods generic on it.
      
      * remove deprecated ByteOrder methods
      
      Removes the `_be` suffix from all methods, implying that network endian is
      the default people should use.
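
      A minimal sketch of the object-safety pattern being described (hypothetical
      trait and byte-order stand-in, not the crate's actual definitions): the
      method that is generic over a byte-order type gets a `where Self: Sized`
      bound, while a non-generic variant stays callable through a trait object.

      ```
      // Hypothetical stand-in for the role `byteorder::ByteOrder` played.
      trait ByteOrderLike {
          fn combine(first: u8, second: u8) -> u16;
      }

      trait MyBuf {
          fn get_u8(&mut self) -> u8;

          // Non-generic: still callable through a `dyn MyBuf` trait object.
          fn get_u16_be(&mut self) -> u16 {
              ((self.get_u8() as u16) << 8) | self.get_u8() as u16
          }

          // Generic over the byte order: restricted to sized (concrete) types,
          // which keeps the trait as a whole object-safe.
          fn get_u16<O: ByteOrderLike>(&mut self) -> u16
          where
              Self: Sized,
          {
              O::combine(self.get_u8(), self.get_u8())
          }
      }

      // Compiles only because the generic method is excluded from trait objects.
      fn read_header(buf: &mut dyn MyBuf) -> u16 {
          buf.get_u16_be()
      }
      ```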
    • Make Buf and BufMut usable as trait objects (#186) · ce79f0a2
      Sean McArthur authored
      - All the `get_*` and `put_*` methods that take `T: ByteOrder` have
        a `where Self: Sized` bound added, so that they are only usable from
        sized types. It was impossible to make `Buf` or `BufMut` into trait
        objects before, so this change doesn't break anyone.
      - Add `get_n_be`/`get_n_le`/`put_n_be`/`put_n_le` methods that can be
        used on trait objects.
      - Deprecate the export of `ByteOrder` and methods generic on it.
      
      Fixes #163 
  4. Jan 26, 2018
  5. Jun 27, 2017
  6. May 24, 2017
  7. Apr 30, 2017
  8. Mar 19, 2017
    • Clarify when `BufMut::bytes_mut` can return &[] · bed128b2
      Carl Lerche authored
      Closes #79
    • Add inline attributes to Vec's MutBuf methods (#80) · 5a265cc8
      Dan Burkert authored
      I found this significantly improved a
      [benchmark](https://gist.github.com/danburkert/34a7d6680d97bc86dca7f396eb8d0abf)
      which calls `bytes_mut`, writes 1 byte, and advances the pointer with
      `advance_mut` in a pretty tight loop. In particular, it seems to be the
      inline annotation on `bytes_mut` which had the most effect. I also took
      the opportunity to simplify the bounds checking in advance_mut.
      
      before:
      
      ```
      test encode_varint_small  ... bench:         540 ns/iter (+/- 85) = 1481 MB/s
      ```
      
      after:
      
      ```
      test encode_varint_small  ... bench:         422 ns/iter (+/- 24) = 1895 MB/s
      ```
      
      As you can see, the variance is also significantly improved.
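
      For reference, a hypothetical loop with the same shape as that benchmark
      (assuming the 0.4-era BufMut API, where `bytes_mut()` exposes spare
      capacity as a byte slice and `advance_mut` commits the write; this is not
      the benchmark's actual code):

      ```
      use bytes::BufMut;

      // Sketch only: write one byte per iteration through bytes_mut/advance_mut,
      // the pattern the inline annotations speed up.
      fn fill<B: BufMut>(buf: &mut B, n: usize) {
          assert!(buf.remaining_mut() >= n);
          for i in 0..n {
              unsafe {
                  // Write a single byte into the spare capacity...
                  buf.bytes_mut()[0] = i as u8;
                  // ...then advance the write cursor past it.
                  buf.advance_mut(1);
              }
          }
      }
      ```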
      
      Interestingly, I tried to change the last statement in `bytes_mut` from
      
      ```
      &mut slice::from_raw_parts_mut(ptr, cap)[len..]
      ```
      
      to
      
      ```
      slice::from_raw_parts_mut(ptr.offset(len as isize), cap - len)
      ```
      
      but this caused a very measurable perf regression (almost completely
      negating the gains from marking bytes_mut inline).
    • Clarify BufMut::advance_mut docs (#78) · 4fe4e942
      Dan Burkert authored
      Also fixes an issue with a line wrap in the middle of an inline code
      block.
  9. Mar 16, 2017
  10. Mar 07, 2017
    • Remove buf::Source in favor of buf::IntoBuf · 06b94c55
      Carl Lerche authored
      The `Source` trait was essentially covering the same case as `IntoBuf`,
      so remove it.
      
      While technically a breaking change, this should not have any impact due
      to:
      
      1) There are no reverse dependencies that currently depend on `bytes`
      2) Source was not supposed to be implemented externally
      3) IntoBuf provides the same implementations as `Source`
      
      Given these points, the change should be safe to apply.
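
      For context, a small sketch of how `IntoBuf` is used (hedged against the
      0.4-era API, where `into_buf()` converts slices, `Bytes`, and friends into
      a `Buf`; the helper itself is hypothetical):

      ```
      use bytes::{Buf, IntoBuf};

      // Hypothetical helper: accept anything convertible into a Buf, covering
      // the same ground the removed `Source` trait did.
      fn first_byte<T: IntoBuf>(data: T) -> Option<u8> {
          let mut buf = data.into_buf();
          if buf.has_remaining() {
              Some(buf.get_u8())
          } else {
              None
          }
      }

      // Usage: first_byte(&b"abc"[..]) == Some(b'a')
      ```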
    • Provide Debug impls for all types · d70f575a
      Carl Lerche authored
  11. Mar 02, 2017
  12. Mar 01, 2017
  13. Feb 28, 2017
  14. Feb 17, 2017
  15. Feb 16, 2017
  16. Feb 15, 2017
  17. Feb 03, 2017
  18. Nov 22, 2016
  19. Nov 21, 2016
  20. Nov 03, 2016
  21. Nov 02, 2016
    • Remove default for SliceBuf<T> · 11fe277c
      Carl Lerche authored
    • Restructure and trim down the library · 57e84f26
      Carl Lerche authored
      This commit is a significant overhaul of the library in an effort to head
      towards a stable API. The rope implementation as well as a number of buffer
      implementations have been removed from the library and will live at
      https://github.com/carllerche/bytes-more while they incubate.
      
      **Bytes / BytesMut**
      
      `Bytes` is now an atomically ref counted byte slice. As it is contiguous, it
      offers a richer API than before.
      
      `BytesMut` is a mutable variant. It is safe by ensuring that it is the only
      handle to a given byte slice.
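
      A small usage sketch of the relationship between the two types (written
      against the crate's later, stable API; method names at the time of this
      commit may have differed slightly):

      ```
      use bytes::{BufMut, BytesMut};

      fn demo() {
          // BytesMut: the unique, writable handle.
          let mut buf = BytesMut::with_capacity(64);
          buf.put_slice(b"hello world");

          // freeze() turns it into Bytes: immutable and atomically ref counted.
          let bytes = buf.freeze();
          let hello = bytes.slice(0..5); // shares the same underlying storage
          let copy = bytes.clone();      // cheap: bumps a reference count

          assert_eq!(&hello[..], b"hello");
          assert_eq!(&copy[..], b"hello world");
      }
      ```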
      
      **AppendBuf -> ByteBuf**
      
      `AppendBuf` has been replaced by `ByteBuf`. The API is not identical, but is
      close enough to be considered a suitable replacement.
      
      **Removed types**
      
      The following types have been removed in favor of living in bytes-more
      
      * RingBuf
      * BlockBuf
      * `Bytes` as a rope implementation
      * ReadExt
      * WriteExt
  22. Sep 23, 2016
  23. Sep 22, 2016
  24. Sep 20, 2016