Add pop(idx) support to both single and double linked list implementations #2
base: master
…elete function for clearing popped node and references
In my tests, this performs well with smaller lists (faster than the built-in Python list). The built-in list is still faster at larger sizes (around 1 million elements), because the walk time takes so long. I'm currently looking into implementing a compound linked list: a blocked series of linked lists held in an array. For example, with 500 elements and a block size of 100, you'd have 5 linked lists in the array. If you requested element 220, you'd start at block idx=2 and only have to walk 20 items instead of 220. I haven't seen this done before; I may submit it soon if it solves my needs (beating the Python list at random-access popping from large lists).
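To make the block arithmetic concrete, here is a minimal sketch of how a lookup in such a compound list might find its starting block. The function names are hypothetical and not part of llist. The second helper covers the case where earlier pops have left the blocks with uneven lengths, which is an assumption about how such a structure would behave.

```python
def locate(index, block_size):
    """Return (block number, steps to walk inside that block).

    Assumes every block still holds exactly block_size elements.
    """
    return divmod(index, block_size)


def locate_uneven(index, block_lengths):
    """Same lookup when blocks have drifted to different lengths.

    Skips whole blocks by subtracting their lengths, then reports
    how far to walk inside the block that contains the element.
    """
    for block, length in enumerate(block_lengths):
        if index < length:
            return block, index
        index -= length
    raise IndexError("index out of range")


# 500 elements in blocks of 100: element 220 is 20 steps into block 2.
print(locate(220, 100))
# After one pop from block 1, the same index lands one step further in.
print(locate_uneven(220, [100, 99, 100, 100, 100]))
```

With even blocks the lookup is a single divmod; with uneven blocks it costs one pass over the block lengths, which is still far shorter than walking the elements themselves.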
…ight, and call those functions. This allows us to always skip forward one walk iteration, and remove any logic for first or last. Also, on double-linked list pop, walk from the back if we are popping past middle. These changes bring the linked list random-pop benchmark much closer to the python list performance even further out (like 400 list elements, 200 pops they perform the same. Below that linked list impl's now perform better, above that list performs better (because of walk time)
After my updates today, it performs even better. On a 400-element list with 200 random-index pops, the single-linked list and the built-in Python list perform the same, and the double-linked list is faster. Below that size the linked list implementations are faster; above it the Python list becomes faster (because walk time takes up so much of the total).
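The "walk from the back if we are popping past the middle" idea can be sketched as below. This only counts pointer hops and uses a hypothetical helper name; it is not the C code from the patch.

```python
def walk_plan(index, size):
    """Return (starting end, number of hops) to reach `index`.

    In a double-linked list the walk can begin at either end,
    so pick whichever end is closer to the target index.
    """
    if not 0 <= index < size:
        raise IndexError("index out of range")
    from_head = index
    from_tail = size - 1 - index
    if from_head <= from_tail:
        return "head", from_head
    return "tail", from_tail


print(walk_plan(10, 400))   # near the front: start at the head
print(walk_plan(390, 400))  # near the back: start at the tail
```

Under this scheme the worst-case walk is roughly half the list, whereas a single-linked list can only start at the head and may have to walk all of it.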
src/dllist.c
Outdated
self->last = (PyObject*)prev_node;
}

if (self->last_accessed_node == (PyObject*)del_node)
There is a problem with this if statement. last_accessed_node and last_accessed_idx should always refer to the same node. When you remove an item from a list, all nodes behind it are shifted to a lower index, so these two fields can become unsynchronized.
The simplest solution for this issue would be to unconditionally invalidate the cache, as dllist_remove does. Alternatively you could decrement last_accessed_idx if it's larger than the removed index (dllist_popleft does something similar).
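A small Python model of the second option, for illustration only. The helper name is hypothetical and the real fix belongs in the C code; None stands in for an invalidated cache.

```python
def adjust_cached_index(cached_idx, removed_idx):
    """Return the cached index to keep after removing `removed_idx`.

    None means the cache is (or becomes) invalid.
    """
    if cached_idx is None or cached_idx == removed_idx:
        # The cached node itself was removed: nothing left to point at.
        return None
    if cached_idx > removed_idx:
        # The cached node shifted one position toward the front.
        return cached_idx - 1
    # Nodes in front of the removed one keep their index.
    return cached_idx


values = ["a", "b", "c", "d"]
cached_idx = 3                # cache points at "d"
del values[1]                 # remove "b"; "d" now lives at index 2
cached_idx = adjust_cached_index(cached_idx, 1)
print(cached_idx, values[cached_idx])
```

The unconditional alternative is simply to reset the cached index to None on every removal, which is less code at the cost of losing the cache more often.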
index = ((DLListObject*)self)->size + index;

/* Either a negative greater than index size, or a positive greater than size */
if ( index < 0 || index >= ((DLListObject*)self)->size )
You might reuse dllist_get_node_internal here to locate the node to remove. This function can walk the list from the last accessed element, so the following code could be faster:
item = list[idx]  # remembers last accessed item
list.pop(idx)
Hey, Thanks for the pull request. The code looks good and a generalized Also, could you update the documentation of the Thanks, |
…ehow. Fixup 'middle', and have it be used by the pop(idx) function for now. This greatly greatly greatly speeds up double-list on large data sets
…at-optimized-but-hard-to-understand directional mod methods across the board. Cleanup some functions, remove some unused stuff
…o now DLList can extend directly with SLList and SLList can extend directly with a DLList, at a much much higher performance rate than before
ajakubek
left a comment
The last_accessed node and index fields are meant to improve access time for nearby items when they are fetched using the subscript operator (list[index]).
This feature makes dllist extremely fast (80% of the speed of Python's builtin list) as long as the index changes by a small amount between accesses.
Your commit implements midpoint caching, which makes item lookup up to 50% faster in the general case, but at the same time completely removes the former optimization.
Compare results of the following benchmark with and without this commit to see the difference:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
from collections import deque

from llist import dllist, sllist

num = 100000


def measure_iteration(container):
    # Access every element by index, in order.
    for idx in range(len(container)):
        container[idx]


for container in [deque, dllist, sllist]:
    c = container(range(num))
    start = time.time()
    measure_iteration(c)
    elapsed = time.time() - start
    print("Completed %s iteration in \t\t%.8f seconds:\t %.1f ops/sec" % (
        container.__name__,
        elapsed,
        num / elapsed))
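To see why sequential indexing suffers without the last-accessed cache, the following sketch counts pointer hops only (no timings) under two simplified lookup models. The function names and both models are assumptions made for illustration; the actual C implementations may differ in detail.

```python
def steps_with_last_accessed(index, size, last_idx):
    """Hops when the walk may start at head, tail, or the last accessed node."""
    candidates = [index, size - 1 - index]
    if last_idx is not None:
        candidates.append(abs(index - last_idx))
    return min(candidates)


def steps_with_midpoint(index, size):
    """Hops when the walk may start at head, tail, or a cached middle node."""
    middle = size // 2
    return min(index, size - 1 - index, abs(index - middle))


def total_sequential_steps(size):
    """Total hops for visiting indices 0..size-1 in order under both models."""
    cached = 0
    midpoint = 0
    last_idx = None
    for idx in range(size):
        cached += steps_with_last_accessed(idx, size, last_idx)
        midpoint += steps_with_midpoint(idx, size)
        last_idx = idx
    return cached, midpoint


cached, midpoint = total_sequential_steps(1000)
print("last-accessed cache:", cached, "hops; midpoint cache:", midpoint, "hops")
```

In this model the last-accessed cache needs at most one hop per access during in-order iteration, while the midpoint cache still pays a walk proportional to the distance from the nearest of head, middle, or tail.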
while ( (PyObject*)node != Py_None )
{
    if( node->value == value )
        return PyLong_FromSsize_t(idx);
PyLong_FromSsize_t is unavailable in Python 2.5, so we need to drop support for it in README and documentation. Shouldn't be an issue (2.5 was EOLed years ago).
/* NOTE - THIS FUNCTION DOES NOT WORK!!
 *
 * dllist([1, 5, 9]) has the SAME hash as dllist([5, 1, 9])
 * and thus it is NOT a hash function
You are right that we should have chosen a better hashing function. I'll change it to something that detects simple reordering of list items.
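As an illustration of the difference, here is a sketch that contrasts an order-insensitive combination with an order-sensitive one. XOR is used only as an assumed example of how such a collision can arise; the thread does not say which formula the original hash used, and the multiplier 31 is an arbitrary choice.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # keep the running hash within 64 bits


def xor_hash(items):
    """Order-insensitive: XOR is commutative, so reordering changes nothing."""
    h = 0
    for item in items:
        h ^= hash(item)
    return h


def ordered_hash(items):
    """Order-sensitive: each item's contribution depends on its position."""
    h = 0
    for item in items:
        h = (h * 31 + hash(item)) & MASK
    return h


a, b = [1, 5, 9], [5, 1, 9]
print(xor_hash(a) == xor_hash(b))          # True: reordering goes undetected
print(ordered_hash(a) == ordered_hash(b))  # False: reordering is detected
```

Any scheme that mixes the running value before adding the next item would detect simple reordering; collisions between unrelated lists remain possible, as with any hash.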
Ok, to sum up the current state of this merge request. There are several things that I like. The |
…doing a popright on sllist and dllist
Also, move debug function to llist.h
Hey bud,
This patch adds .pop(idx) to both the single-linked and double-linked list implementations.
This follows the same pattern as the standard python list, where an index can be provided (positive or negative) to the pop function to pop a specific element.
The index is optional, and if not provided defaults to popright (so backwards-compatible, and compatible with base list).
Please consider merging and re-releasing with the addition of this patch.
The primary advantage of a linked-list or double-linked-list versus an array is the ability to work on the middle without rebuilding the entire list. You already have support for inserts in the middle of the list.
This adds that benefit with removal (via "pop") to these implementations, and thus allows them to out-perform the base python list for various scenarios.
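For reference, the pop semantics described above (optional index, negative indices allowed, default is the rightmost element) can be sketched in pure Python on a toy single-linked list. This is an illustration with hypothetical class names, not the C implementation from the patch.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class SinglyLinkedList:
    def __init__(self, values=()):
        self.head = None
        self.size = 0
        tail = None
        for value in values:
            node = Node(value)
            if tail is None:
                self.head = node
            else:
                tail.next = node
            tail = node
            self.size += 1

    def pop(self, index=None):
        """Remove and return the element at `index` (default: last)."""
        if self.size == 0:
            raise IndexError("pop from empty list")
        if index is None:
            index = self.size - 1      # same default as list.pop()
        if index < 0:
            index += self.size         # negative indices count from the end
        if index < 0 or index >= self.size:
            raise IndexError("pop index out of range")
        prev, node = None, self.head
        for _ in range(index):         # walk to the target node
            prev, node = node, node.next
        if prev is None:
            self.head = node.next      # removing the first node
        else:
            prev.next = node.next      # unlink without touching other nodes
        self.size -= 1
        return node.value

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


lst = SinglyLinkedList([10, 20, 30, 40, 50])
print(lst.pop(), lst.pop(0), lst.pop(-2), lst.to_list())
```

The unlink step touches only the neighbouring node, which is why removal from the middle avoids the element shifting an array-backed list has to do; the cost that remains is the walk to the target.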
Thanks,
Tim