Conversation
| let mut result: Vec<IndexValue<T>> = vec![]; | ||
| for interval in intervals.iter() { | ||
| for index in interval.0..=interval.1 { | ||
| let mut index_records = parse_index_page::<T, PAGE_SIZE>(file, index as u32)?; |
There was a problem hiding this comment.
Hm, it looks like for every Link we are currently seeking at page start and re-reading header and then seeking to data start. I think it's not the best way to do that
| Ok(result) | ||
| } | ||
|
|
||
| fn read_links<DataType, const PAGE_SIZE: usize>( |
There was a problem hiding this comment.
This part also looks suspicious. I think for big tables it will not work
|
@a-rodin I think we can use iterators approach. You need to implement custom ones for this task. I think there will be two iterators: index iterator and data iterator. Index iterator will load one index page and yield links. When page is read then next page is loaded into memory and parsed. So one page load will give data for many next calls. For data iterator you need to add fn that will read multiple data records from single page. It will have vec of Data iterator is harder as I think. The issue now we have is frequent re-read of data and too much seeks. I think data iterator can have next logic. It will use index iterator and will get vec of
|
|
@Handy-caT I have added three iterators, |
the2pizza
left a comment
There was a problem hiding this comment.
Looks good to me with small notes
| use super::{Interval, SpaceInfo}; | ||
|
|
||
| pub fn map_index_pages_to_general<T>(pages: Vec<IndexData<T>>) -> Vec<General<IndexData<T>>> { | ||
| let mut header = &mut GeneralHeader::new(0.into(), PageType::Index, 0.into()); |
There was a problem hiding this comment.
By my compiler warnings it shouldn't be mut
| for index in interval.0..interval.1 { | ||
| let index_page = parse_index_page::<T, PAGE_SIZE>(file, index as u32)?; | ||
| result.push(index_page.inner); | ||
| for index in interval.0..=interval.1 { |
There was a problem hiding this comment.
I'm a bit doubt when I see nested loops. For improving performance I suppose we can merge intervals to reduce amount of iterations.
This PR is a part of work on pathscale/WorkTable#24.