Add pop(idx) support to both single and double linked list implementations #2

kata198 · 2017-04-01T05:30:05Z

Hey bud,

This patch adds .pop(idx) to both the single-linked and double-linked list implementations.

This follows the same pattern as the standard Python list, where an index (positive or negative) can be passed to the pop function to pop a specific element.

The index is optional, and if not provided defaults to popright (so backwards-compatible, and compatible with base list).
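The behaviour described above can be sketched in pure Python. This is only an illustration, not the C code from the patch: `Node` and `SimpleSLList` are hypothetical names, and the sketch assumes negative indices are normalized by adding the list size, as `list.pop()` does.

```python
class Node:
    """One link in the chain (hypothetical helper)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class SimpleSLList:
    """Hypothetical pure-Python stand-in for a single-linked list."""

    def __init__(self, values=()):
        self.first = None
        self.size = 0
        tail = None
        for v in values:
            node = Node(v)
            if tail is None:
                self.first = node
            else:
                tail.next = node
            tail = node
            self.size += 1

    def pop(self, idx=-1):
        # Default of -1 pops from the right, matching list.pop().
        if idx < 0:
            idx += self.size  # negative indices count from the end
        if idx < 0 or idx >= self.size:
            raise IndexError("pop index out of range")
        # Walk to the node, remembering its predecessor for relinking.
        prev, node = None, self.first
        for _ in range(idx):
            prev, node = node, node.next
        if prev is None:
            self.first = node.next
        else:
            prev.next = node.next
        self.size -= 1
        return node.value

    def to_list(self):
        out, node = [], self.first
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```

Calling `pop()`, `pop(1)` and `pop(-2)` on such an object should return the same values as the same calls on a plain `list` holding the same items.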

Please consider merging and re-releasing with the addition of this patch.

The primary advantage of a linked-list or double-linked-list versus an array is the ability to work on the middle without rebuilding the entire list. You already have support for inserts in the middle of the list.
This adds that benefit with removal (via "pop") to these implementations, and thus allows them to out-perform the base python list for various scenarios.

Thanks,
Tim

kata198 · 2017-04-01T05:51:04Z

In my tests, this performs well with smaller lists (as in, faster than the Python base list).

But the Python base list is still faster for larger sizes (around 1 million elements), because the walk time takes so long.

I'm currently looking into implementing a compound linked list, that is, a blocked series of linked lists in an array. For example, with 500 elements and a block size of 100, you'd have 5 linked lists in an array. If you requested element 220, you'd start at block idx=2 and only have to walk 20 items instead of 220.

I haven't seen such a structure before. I may submit it soon if it solves my needs (beating the Python list at random-access popping from large lists).
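The block arithmetic in the example above can be sketched as follows. Both function names are hypothetical. `locate` assumes every block holds exactly `block_size` items; `locate_uneven` is one possible way to cope with blocks that have shrunk after pops, and is an assumption rather than anything proposed in this thread.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(index, block_size):
    """Return (block number, offset in block) when all blocks are full."""
    return divmod(index, block_size)


def locate_uneven(index, block_sizes):
    """Return (block number, offset in block) for blocks of varying size."""
    ends = list(accumulate(block_sizes))  # cumulative end position per block
    if index < 0 or not ends or index >= ends[-1]:
        raise IndexError("index out of range")
    block = bisect_right(ends, index)     # first block whose end exceeds index
    start = ends[block - 1] if block else 0
    return block, index - start
```

With 5 blocks of 100 items, both functions map index 220 to block 2, offset 20, so only 20 nodes need to be walked.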

…ight, and call those functions. This allows us to always skip forward one walk iteration and removes any special-case logic for the first or last node. Also, on double-linked list pop, walk from the back if we are popping past the middle. These changes bring the linked list random-pop benchmark much closer to Python list performance even at larger sizes: at 400 list elements and 200 pops they perform the same; below that the linked list implementations now perform better, and above that the Python list performs better (because of walk time).

kata198 · 2017-04-02T04:40:12Z

After my updates today, it performs even better. On a 400-element list with 200 random-index pops, the single-linked list and the Python base list perform the same, and the double-linked list is faster. Below that size the linked list implementations are faster; above it the Python list is faster (because walk time takes up so much of the total).
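Why the double-linked list gains from walking backwards can be illustrated by counting node hops. This is a simplified model under the assumption that a walk costs one hop per node; `walk_plan` is a hypothetical helper, not a function from the patch.

```python
def walk_plan(idx, size):
    """Return (starting end, hops) for reaching idx in a double-linked list."""
    if idx < 0 or idx >= size:
        raise IndexError("index out of range")
    from_back = size - 1 - idx  # hops needed when starting at the last node
    if idx <= from_back:
        return "front", idx
    return "back", from_back


def average_hops(size, bidirectional):
    """Average hops over every index, walking one way or from the nearer end."""
    if bidirectional:
        total = sum(walk_plan(i, size)[1] for i in range(size))
    else:
        total = sum(range(size))  # a single-linked list always walks from the front
    return total / float(size)
```

In this model, a 400-element list needs about half as many hops on average when the walk can start from either end.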

ajakubek · 2017-04-02T16:57:28Z

src/dllist.c

        self->last = (PyObject*)prev_node;
    }

+    if (self->last_accessed_node == (PyObject*)del_node)


There is a problem with this if statement. last_accessed_node and last_accessed_idx should always refer to the same node. When you remove an item from a list, all nodes after it shift to a lower index, so these two fields can become unsynchronized.
The simplest solution would be to unconditionally invalidate the cache, as dllist_remove does. Alternatively, you could decrement last_accessed_idx if it's larger than the removed index (dllist_popleft does something similar).
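The second option can be sketched in Python. This models only the index bookkeeping, not the C structures; `cache_after_pop` is a hypothetical helper, and `None` stands for an invalidated cache.

```python
def cache_after_pop(cached_idx, popped_idx):
    """Return the cached index that is still valid after popping popped_idx.

    None means the cache has to be dropped.
    """
    if cached_idx is None or cached_idx == popped_idx:
        return None              # the cached node itself was removed
    if cached_idx > popped_idx:
        return cached_idx - 1    # node moved one slot toward the front
    return cached_idx            # nodes before the popped one keep their index


# Demonstration with a plain list standing in for the linked list.
items = list("abcdef")
cached_idx, cached_value = 4, items[4]   # cache points at "e"
items.pop(1)                             # removes "b"
stale_value = items[cached_idx]          # unadjusted index now names another item
fixed_idx = cache_after_pop(cached_idx, 1)
```

After the pop, the unadjusted index 4 refers to "f" while the adjusted index 3 still refers to "e", which is the desynchronization described above.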

ajakubek · 2017-04-02T17:05:24Z

src/dllist.c

+        index = ((DLListObject*)self)->size + index;
+
+    /* Either a negative greater than index size, or a positive greater than size */
+    if ( index < 0 || index >= ((DLListObject*)self)->size )


You might reuse dllist_get_node_internal here to locate the node to remove. This function can walk the list from the last accessed element, so the following code could be faster:

item = list[idx]  # remembers the last accessed item
list.pop(idx)

ajakubek · 2017-04-02T17:22:41Z

Hey,

Thanks for the pull request. The code looks good, and a generalized pop method can indeed be useful. I'll merge your branch once the problem with the last-accessed-item cache is resolved. Would you like to provide a patch for this issue?

Also, could you update the documentation of the pop method? It's in docs/index.rst; you can build it with make docs.

Thanks,
Adam

ajakubek

The last_accessed node and index fields are meant to improve access time for nearby items when they are fetched using the subscript operator (list[index]).
This feature makes dllist extremely fast (80% of the speed of Python's builtin list) as long as the index changes by a small amount between accesses.

Your commit implements midpoint caching, which makes item lookup up to 50% faster in the general case, but at the same time completely removes the former optimization.

Compare the results of the following benchmark with and without this commit to see the difference:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from collections import deque
from llist import dllist, sllist
import time

num = 100000

def measure_iteration(container):
    # Fetch every item by index, in increasing order.
    for idx in range(len(container)):
        container[idx]

for container in [deque, dllist, sllist]:
    c = container(range(num))
    start = time.time()
    measure_iteration(c)
    elapsed = time.time() - start
    # Parenthesized single argument prints the same on Python 2 and 3.
    print("Completed %s iteration in \t\t%.8f seconds:\t %.1f ops/sec" % (
        container.__name__,
        elapsed,
        num / elapsed))
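The effect this benchmark measures can be approximated without the C extension by counting node hops. The sketch below is an idealized model, assuming each lookup starts from the closest available anchor and costs one hop per node; the function names are hypothetical and the real implementations may choose anchors differently.

```python
def hops_from_ends(idx, size):
    """Hops when only the first and last nodes are available as anchors."""
    return min(idx, size - 1 - idx)


def hops_with_middle(idx, size):
    """Hops when a cached middle node is available as a third anchor."""
    return min(idx, size - 1 - idx, abs(idx - size // 2))


def sequential_hops(size, strategy):
    """Total hops for accessing indices 0..size-1 in order."""
    return sum(strategy(idx, size) for idx in range(size))


def sequential_hops_cached(size):
    """Total hops when the last accessed node is kept as an anchor."""
    total, last = 0, None
    for idx in range(size):
        candidates = [idx, size - 1 - idx]
        if last is not None:
            candidates.append(abs(idx - last))  # distance from cached node
        total += min(candidates)
        last = idx
    return total
```

For sequential access over 1000 items, the model gives 249500 hops with end anchors only, fewer with a middle anchor, and under 1000 with a last-accessed cache, since each step is at most one hop away from the previous one.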

ajakubek · 2017-04-09T18:21:06Z

src/dllist.c

+    while ( (PyObject*)node != Py_None )
+    {
+        if( node->value == value )
+            return PyLong_FromSsize_t(idx);


PyLong_FromSsize_t is unavailable in Python 2.5, so we need to drop Python 2.5 support in the README and documentation. That shouldn't be an issue (2.5 reached end-of-life years ago).

ajakubek · 2017-04-09T18:49:13Z

src/dllist.c

+/* NOTE - THIS FUNCTION DOES NOT WORK!!
+*
+*   dllist([1, 5, 9]) has the SAME hash as dllist([5, 1, 9])
+*     and thus it is NOT a hash function


You are right that we should have chosen a better hashing function. I'll change it to something that detects simple reordering of list items.
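The reordering problem can be reproduced in a few lines of Python. The source does not show the existing hash function, so `xor_hash` is only an assumed example of an order-insensitive combination, and `ordered_hash` is one common order-sensitive alternative, not necessarily the one that will be adopted.

```python
def xor_hash(items):
    """Order-insensitive: XOR is commutative, so reordering changes nothing."""
    h = 0
    for item in items:
        h ^= hash(item)
    return h


def ordered_hash(items):
    """Order-sensitive: each step multiplies the running value before mixing."""
    h = 0
    for item in items:
        h = (h * 31 + hash(item)) & 0xFFFFFFFFFFFFFFFF  # keep within 64 bits
    return h
```

For `[1, 5, 9]` and `[5, 1, 9]`, `xor_hash` returns the same value, while `ordered_hash` returns different values because the position of each item affects the result.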

ajakubek · 2017-04-09T19:41:08Z

OK, to sum up the current state of this merge request: there are several things that I like. The pop, index and rindex functions are definitely useful and I'd be happy to merge them. I can also pull the minor fixes and the refactoring into a generic LListObject.
However, the part that replaces caching of the most recently accessed item with caching of the middle one is problematic. It causes a significant performance regression for use cases that were documented to be fast, so I'm wary of merging such a change.
I can cherry-pick the other commits into my repository and close this PR next weekend. Or you can resubmit the merge request without the removal of index caching, and then I'll just pull your branch.

kata198 added 7 commits April 1, 2017 00:55

Add pop function with an index argument to single-linked list, with t…

28fe14f

…ests

Add pop with index arg to double-linked list

4c1b144

Fix warnings on sllist

8181feb

Fix warnings in dllist

7c5f375

Add copyright note

8f7b965

Make help strings the same for pop function

1acfe14

Clear last_access_node if we need to in idx-pop, and use dllistnode_d…

c8e7184

…elete function for clearing popped node and references

kata198 added 2 commits April 2, 2017 00:16

Add some more tests, test each pop in the full-pop-out, and pop from …

9195502

…both directions

kata198 added 3 commits April 2, 2017 02:04

Add 'middle' to double-linked list

fc16922

We are already relinking at the bottom, so get rid of these.

45a6121

Merge branch 'master' into 0.5branch

ee71daf

ajakubek reviewed Apr 2, 2017

kata198 added 13 commits April 2, 2017 13:28

Remove the last_accessed node and index from dllist. It is broken som…

7898e64

…ehow. Fixup 'middle', and have it be used by the pop(idx) function for now. This greatly greatly greatly speeds up double-list on large data sets

Move out magic number for when middle index begins into a define

7cf6f18

Add benchmark script

4aae917

Add benchmark script

7ab2e91

Remove some noise form llist_test

f3ed019

Add more test for node_at

e308f6f

double-linked list: Use middle on node-access, when available.

020e197

Use consistant generic adjustment method, instead of the maybe-somewh…

4ef4284

…at-optimized-but-hard-to-understand directional mod methods across the board. Cleanup some functions, remove some unused stuff

Add executable bit to files, and change setup.py to using env python

3e2a674

Skip a compare and jump straight, and other micro optimizations in pop

eb71b31

Add inline function to check and conditionally clear middle

8315705

Use check instead of do inline function here

8970d59

Remove errant return

97260aa

kata198 added 4 commits April 7, 2017 15:34

Improve extend and extendleft, higher performance all-around, and als…

d82f362

…o now DLList can extend directly with SLList and SLList can extend directly with a DLList, at a much much higher performance rate than before

Add benchmark_extend

f9f4310

Update ChangeLog

05aa302

Merge branch '0.5branch'

0c84350

ajakubek reviewed Apr 9, 2017

kata198 added 22 commits April 9, 2017 21:37

Dlist implementation of slicing / subscript

8011204

Slist implementation of slicing / subscript

058d829

Move some common stuff to llist.h and llist_types.h

ab85a81

Add tilda files to gitignore

c99fad1

Update Changes

324bb60

Some minor cleanups

febdbf7

Update speed_test - Fix for python3, and fix popleft... was actually …

4acb06a

…doing a popright on sllist and dllist

speed_text: fix columns to always line up and make speed_test executable

d95db50

Implement contains method

2702fa9

Add two more benchmarks

6f00302

Fix warning

3d3d416

Greatly improve performance with extending on dllist

bba344a

Greatly improve performance of extending on sllist

9ea5e9e

Simplify extending by pulling out the first set (removes the conditio…

4d452b8

…nal from loop)

Rename this fork to cllist, bump to version 1.0.0 to prepare for release

593f2cd

Add __version__ and __version_tuple__

f4faf4e

Fix warnings and fix normalize_indexes for python2

6ff909e

Also, move debug function to llist.h

Rename READMEs to match markdown (for github) and rst (for pypi). Als…

3f45f02

…o fixup manifest

Add pydocs

22d953c

Fixup setup.py

0b6b649

Add my name as an author

1156292

Update READMEs

1fe6810
