C++ API: when the row I seek to is in the same row group as the current row, why not call skip directly instead of seeking to the row group again and then skipping?

We use ORC as the storage format for our real-time data warehouse. Our online queries do many random reads and frequent seeks, and we found that a lot of time is spent in SeekToRowGroup and Skip.
Across consecutive seeks, many of our target rows fall in the same row group, which leads to the question in the title.
For example, suppose an online query needs to read row 100 and row 130.
The current behavior is:
1. SeekToRowGroup
2. Skip(100)
3. Next(1)
4. SeekToRowGroup
5. Skip(130)
6. Next(1)

Why not do this instead:
1. SeekToRowGroup
2. Skip(100)
3. Next(1)
4. Skip(29)
5. Next(1)

We made a simple change to the code and found that, in our scenario, it improves read performance by at least 50%.
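
To make the proposal concrete, here is a minimal sketch of the decision logic, not the actual ORC reader code. `MockColumnReader`, `SeekingReader`, and the `rowsPerGroup` parameter are hypothetical names invented for illustration; the mock only counts calls so the two strategies can be compared. It also assumes everything lives in a single stripe, so sequential reading can continue across row-group boundaries.

```cpp
#include <cstdint>

// Hypothetical stand-in for a column reader: it only counts calls.
struct MockColumnReader {
  uint64_t seekCalls = 0;    // number of row-group seeks performed
  uint64_t rowsSkipped = 0;  // total rows passed to skip()
  void seekToRowGroup(uint64_t /*rowGroup*/) { ++seekCalls; }
  void skip(uint64_t numRows) { rowsSkipped += numRows; }
};

// Hypothetical wrapper that remembers the current row position and
// avoids re-seeking when the target is ahead in the same row group.
class SeekingReader {
 public:
  SeekingReader(MockColumnReader& reader, uint64_t rowsPerGroup)
      : reader_(reader), rowsPerGroup_(rowsPerGroup) {}

  // Position the reader so that the next read returns targetRow.
  void seekToRow(uint64_t targetRow) {
    const uint64_t targetGroup = targetRow / rowsPerGroup_;
    const bool sameGroup =
        positioned_ && targetGroup == currentRow_ / rowsPerGroup_;
    if (sameGroup && targetRow >= currentRow_) {
      // Forward move inside the current row group: skip the gap only.
      reader_.skip(targetRow - currentRow_);
    } else {
      // Different row group or a backward move: seek, then skip
      // from the start of the target row group.
      reader_.seekToRowGroup(targetGroup);
      reader_.skip(targetRow - targetGroup * rowsPerGroup_);
    }
    currentRow_ = targetRow;
    positioned_ = true;
  }

  // Reading numRows rows advances the current position.
  void next(uint64_t numRows) { currentRow_ += numRows; }

 private:
  MockColumnReader& reader_;
  uint64_t rowsPerGroup_;
  uint64_t currentRow_ = 0;
  bool positioned_ = false;
};
```

With a row group size of 10000, reading row 100 and then row 130 through this wrapper performs one seek and skips 100 + 29 rows, which matches the second sequence above. A backward move (for example to row 50) still falls back to seek plus skip.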

Issue #2084
