|
Typo in the README code example: sorted was not replaced with shuffled. |
|
I like the idea but the implementation seems pretty intense and not the most obvious so I'm having a bit of trouble figuring out why. What are the advantages of all this over creating a sequence of iterators into the |
|
It costs no extra memory. Every step is just a few simple machine operations (mask, shift). With std::shuffle you need enough memory to hold the entire shuffled set. With this approach we can shuffle, for example, very large files line by line, or rows in large SQL tables (as long as we don't care about the speed of random seeks through them; if we do, we can use Mixed Product, which is sequential). Of course, if all the data to shuffle is already in memory, we have random-access iterators (through dumb_advance), so shuffling will be as fast as std::shuffle. |
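To make the "mask, shift" remark concrete, here is a minimal sketch of the idea, not the code from this pull request. It assumes a fixed 4-bit Galois LFSR with feedback mask 0xC (polynomial x^4 + x^3 + 1), so it only handles up to 15 elements; the names lfsr4_step, for_each_shuffled_index and shuffled_indices are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the cppitertools implementation.
// One step of a 4-bit Galois LFSR with feedback mask 0xC (x^4 + x^3 + 1).
// Starting from any nonzero state it visits all 15 nonzero 4-bit values.
inline std::uint32_t lfsr4_step(std::uint32_t state) {
    const std::uint32_t lsb = state & 1u;  // bit that is shifted out
    state >>= 1;                           // shift
    if (lsb) state ^= 0xCu;                // apply the feedback mask
    return state;
}

// Calls visit(index) exactly once for every index in [0, n), n <= 15,
// in the order produced by the LFSR. Only one integer of state is kept.
template <typename Visit>
void for_each_shuffled_index(std::uint32_t n, std::uint32_t seed, Visit visit) {
    assert(n <= 15 && seed >= 1 && seed <= 15);
    std::uint32_t state = seed;
    for (int i = 0; i < 15; ++i) {         // one full LFSR period
        if (state <= n) visit(state - 1);  // skip states outside the range
        state = lfsr4_step(state);
    }
}

// Demonstration-only wrapper: collecting the order into a vector is
// exactly what a lazy shuffled iterator avoids.
inline std::vector<std::uint32_t> shuffled_indices(std::uint32_t n,
                                                   std::uint32_t seed) {
    std::vector<std::uint32_t> out;
    for_each_shuffled_index(n, seed,
                            [&out](std::uint32_t i) { out.push_back(i); });
    return out;
}
```

With n = 10 and seed = 1, the visiting order is 0, 5, 2, 9, 4, 6, 8, 7, 3, 1. The seed only changes where the cycle starts, so this is a pseudo-random permutation rather than a uniformly random one, unlike std::shuffle.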
|
Well that does sound pretty cool. Can I see the IPv4 shuffling code? |
|
In this test I use an IPv4 pseudo-container and shuffle through it: rators project will be rewritten to use cppitertools::shuffled and cppitertools::mixed_product soon
1. There was a bug. When we round a size up to a power of two, for example 10, we get 16 (2^4), so the register size must be 4. Instead, a register of size 5 was used, which is inefficient. 2. Even when efficient std::distance and operator+(int n) are available, dumb_advance and dumb_size don't use them, so they were replaced with std::advance and std::distance respectively.
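The register-size arithmetic in point 1 can be sketched as follows. This is an illustration under one assumption, namely that a maximal-length LFSR of k bits cycles through 2^k - 1 nonzero states, so k must satisfy 2^k - 1 >= n. The function name register_bits is hypothetical, and the actual code in the branch may compute the width differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest register width k (in bits) such that a
// maximal-length LFSR, which has 2^k - 1 nonzero states, can cover
// n elements. Assumes n >= 1.
inline unsigned register_bits(std::uint64_t n) {
    assert(n >= 1);
    unsigned k = 1;
    // Grow the register until its 2^k - 1 states are enough for n.
    while (k < 64 && ((std::uint64_t{1} << k) - 1) < n) ++k;
    return k;
}
```

For n = 10 this returns 4 (15 states, 5 of them skipped), where a 5-bit register would cycle through 31 states and skip 21 of them. Sizes 15 and 16 fall on either side of the boundary and give 4 and 5 respectively.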
|
IPv4 shuffling with the help of cppitertools and https://github.com/hoxnox/iptools at zero memory cost:
    for (auto i : iter::shuffled(cidr_v4("0.0.0.0/0")))
        void(0);
|
|
I'm sorry this is taking me so long to get through. I'm working mostly on language stuff at the moment. You have a point in your test file to make sure "shuffled not store container inside", but it needs to in the case of an rvalue. Why did you make a point of this? I've made the change on my checkout of the branch, but I want to make sure I'm not missing something. Why do we need the check at the top of What is being gained by having |
|
|
1. Store the shuffled container internally 2. Removed unnecessary checking in operator++ 3. Distance is always uint64_t
|
You convinced me. =) |
|
I tested the code in 2 projects. It successfully runs in production under Gentoo-amd64, Debian-jessie-x64 and Debian-jessie-i386. |
Allows iteration over a sequence in shuffled order. Randomization is implemented with a Linear Feedback Shift Register.
An additional convenient feature is the ability to restore iterator state at zero cost (not in the README; see the tests).
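Why restoring state can be this cheap is easiest to see in a sketch. The following is not the library's API (the tests show the real interface); LfsrCursor and run are hypothetical names, and a fixed 4-bit Galois LFSR with mask 0xC is assumed. The point is only that the whole position in the shuffled order is one integer, so saving and restoring it is a plain copy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cursor, not the cppitertools API. The entire iteration
// state is one integer, so saving and restoring it is a plain copy.
struct LfsrCursor {
    std::uint32_t state;  // current nonzero 4-bit LFSR state

    // Advance one step of a 4-bit Galois LFSR (mask 0xC, period 15).
    void advance() {
        const std::uint32_t lsb = state & 1u;
        state >>= 1;
        if (lsb) state ^= 0xCu;
    }
};

// Advance a copy of the cursor by `steps` and return the resulting state.
inline std::uint32_t run(LfsrCursor c, int steps) {
    assert(c.state >= 1 && c.state <= 15);
    for (int i = 0; i < steps; ++i) c.advance();
    return c.state;
}
```

Running 5 steps, saving the returned state, and later resuming from it for 4 more steps lands on the same state as 9 uninterrupted steps.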