Improve performance of Buf::get_*() (#195)
The new implementation tries to read the data directly from bytes(), which is possible most of the time. If bytes() does not hold enough data, it falls back to the previous code: copy the needed bytes into a temporary buffer before returning the data.

Here are the benchmark results (ns/iter):

                            Before          After           x-faster
get_f32::cursor              64 (+/- 0)      20 (+/- 0)     3.2
get_f32::tbuf_1              77 (+/- 1)      34 (+/- 0)     2.3
get_f32::tbuf_1_costly       87 (+/- 0)      62 (+/- 0)     1.4
get_f32::tbuf_2             151 (+/- 18)    160 (+/- 1)     0.9
get_f32::tbuf_2_costly      180 (+/- 2)     187 (+/- 2)     1.0
get_f64::cursor              67 (+/- 0)      21 (+/- 0)     3.2
get_f64::tbuf_1              80 (+/- 0)      35 (+/- 0)     2.3
get_f64::tbuf_1_costly       82 (+/- 3)      60 (+/- 0)     1.4
get_f64::tbuf_2             154 (+/- 1)     164 (+/- 0)     0.9
get_f64::tbuf_2_costly      170 (+/- 2)     187 (+/- 1)     0.9
get_u16::cursor              66 (+/- 0)      20 (+/- 0)     3.3
get_u16::tbuf_1              77 (+/- 0)      35 (+/- 0)     2.2
get_u16::tbuf_1_costly       85 (+/- 2)      62 (+/- 0)     1.4
get_u16::tbuf_2             147 (+/- 0)     154 (+/- 0)     1.0
get_u16::tbuf_2_costly      160 (+/- 1)     177 (+/- 0)     0.9
get_u32::cursor              64 (+/- 0)      20 (+/- 0)     3.2
get_u32::tbuf_1              77 (+/- 0)      35 (+/- 0)     2.2
get_u32::tbuf_1_costly       91 (+/- 2)      63 (+/- 0)     1.4
get_u32::tbuf_2             151 (+/- 40)    157 (+/- 0)     1.0
get_u32::tbuf_2_costly      162 (+/- 0)     180 (+/- 0)     0.9
get_u64::cursor              67 (+/- 0)      20 (+/- 0)     3.4
get_u64::tbuf_1              78 (+/- 0)      35 (+/- 1)     2.2
get_u64::tbuf_1_costly       87 (+/- 1)      59 (+/- 1)     1.5
get_u64::tbuf_2             154 (+/- 0)     160 (+/- 0)     1.0
get_u64::tbuf_2_costly      168 (+/- 0)     184 (+/- 0)     0.9
get_u8::cursor               64 (+/- 0)      19 (+/- 0)     3.4
get_u8::tbuf_1               77 (+/- 0)      35 (+/- 0)     2.2
get_u8::tbuf_1_costly        68 (+/- 0)      51 (+/- 0)     1.3
get_u8::tbuf_2               85 (+/- 0)      43 (+/- 0)     2.0
get_u8::tbuf_2_costly        75 (+/- 0)      61 (+/- 0)     1.2
get_u8::option               77 (+/- 0)      59 (+/- 0)     1.3

The improvement for the basic std::io::Cursor implementation is clearly visible.

The other implementations are specific to the bench tests and just map a static slice. The variants are:
- tbuf_1: only one call to 'bytes()' is needed.
- tbuf_2: two calls to 'bytes()' are needed to read more than one byte.
- _costly: versions implemented with #[inline(never)] on 'bytes()', 'remaining()' and 'advance()'.

The cases that are slightly slower correspond to implementations that are not realistic: 'bytes()' never yields more than one byte at a time.
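
The fast-path/fallback idea described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the trait `MiniBuf`, the method `get_u32_be` and the type `OneByteAtATime` are hypothetical names, standing in for the three Buf methods the message mentions ('bytes()', 'remaining()' and 'advance()'). It also assumes that 'bytes()' returns a non-empty slice whenever 'remaining()' is non-zero.

```rust
/// Hypothetical stand-in for the `Buf` methods named in the commit message.
trait MiniBuf {
    fn remaining(&self) -> usize;
    fn bytes(&self) -> &[u8];
    fn advance(&mut self, cnt: usize);

    /// Read a big-endian u32, preferring a direct read from `bytes()`.
    fn get_u32_be(&mut self) -> u32 {
        const N: usize = 4;
        assert!(self.remaining() >= N, "buffer underflow");

        // Fast path: the current slice already holds all N bytes.
        let fast = self.bytes().get(..N).map(|s| {
            let mut a = [0u8; N];
            a.copy_from_slice(s);
            u32::from_be_bytes(a)
        });
        if let Some(v) = fast {
            self.advance(N);
            return v;
        }

        // Slow path: gather the bytes chunk by chunk into a temporary buffer.
        // Assumes `bytes()` is never empty while `remaining() > 0`.
        let mut tmp = [0u8; N];
        let mut filled = 0;
        while filled < N {
            let n = {
                let chunk = self.bytes();
                let n = chunk.len().min(N - filled);
                tmp[filled..filled + n].copy_from_slice(&chunk[..n]);
                n
            };
            self.advance(n);
            filled += n;
        }
        u32::from_be_bytes(tmp)
    }
}

/// Contiguous case: every read takes the fast path.
impl<'a> MiniBuf for &'a [u8] {
    fn remaining(&self) -> usize {
        self.len()
    }
    fn bytes(&self) -> &[u8] {
        *self
    }
    fn advance(&mut self, cnt: usize) {
        *self = &self[cnt..];
    }
}

/// Worst case (hypothetical): `bytes()` exposes a single byte at a time,
/// so every multi-byte read goes through the slow path.
struct OneByteAtATime<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> MiniBuf for OneByteAtATime<'a> {
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }
    fn bytes(&self) -> &[u8] {
        let end = (self.pos + 1).min(self.data.len());
        &self.data[self.pos..end]
    }
    fn advance(&mut self, cnt: usize) {
        self.pos += cnt;
    }
}
```

Calling `get_u32_be()` on a `&[u8]` exercises only the fast path, while `OneByteAtATime` forces the fallback on every call. The second case resembles the slightly slower rows in the table, where the extra fast-path check is paid without ever succeeding.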