Description
In the example below, a Deque is created using PersistentDirectoryClient with "mydir" as the storage directory. Three items are added to the deque, and then all of them are printed. Finally, one item ('Zero') is removed using popleft(). The expected result is two remaining items in the Deque ('First' and 'Second'), with two files inside "mydir" representing them.
Before running this test, change the mode at iterables/clients.py#L82 to "w+b" to fix #16.
$ ls -l mydir/
total 0
$ cat disk_issue.py
from functools import partial
from diskcollections.serializers import PickleSerializer
from diskcollections.iterables import Deque, PersistentDirectoryClient
pdc = partial(PersistentDirectoryClient, "mydir")
mydeque = partial(
    Deque,
    client_class=pdc,
    serializer_class=PickleSerializer
)
queue = mydeque()
# Add strings
queue.append("Zero")
queue.append("First")
queue.append("Second")
# Inspect the contents
print("Contents of the deque:")
for item in queue:
    print(f"- {item}")
print("POPPING LEFT")
popped = queue.popleft()
print(f"POPPED: {popped}")
Result:
Contents of the deque:
- Zero
- First
- Second
POPPING LEFT
Traceback (most recent call last):
  File "/Users/utdrmac/pdctest/disk.py", line 27, in <module>
    popped = queue.popleft()
  File "/Users/utdrmac/pdctest/python-disk-collections/src/diskcollections/iterables/iterables.py", line 228, in popleft
    del self[0]
        ~~~~^^^
  File "/Users/utdrmac/pdctest/python-disk-collections/src/diskcollections/iterables/iterables.py", line 184, in __delitem__
    del self.__client[idx]
        ~~~~~~~~~~~~~^^^^^
  File "/Users/utdrmac/pdctest/python-disk-collections/src/diskcollections/iterables/clients.py", line 134, in __delitem__
    file = open(file_path, mode="r+")
FileNotFoundError: [Errno 2] No such file or directory: 'mydir/1'
$ ls -l mydir/
total 8
-rw-r--r-- 1 utdrmac staff 21 May 29 08:03 0
(pdctest) [utdrmac@test1 pdctest]$ cat mydir/0
��
�Second�.
This is incorrect. Only one file remains, and it contains "Second". The file for the second entry, "First", is gone, which means data loss. I believe the issue is in the __delitem__ method of PersistentDirectoryClient in iterables/clients.py:
for i in range(len(self.__files))[::-1]:
    if i < index:
        continue
    self.__files[i].close()
    old_file_path = self.get_file_path(i + 1)
    new_file_path = self.get_file_path(i)
    os.rename(old_file_path, new_file_path)
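This loop processes the remaining files in reverse order, which causes the data loss. After adding the three entries, the directory contains mydir/0, mydir/1, and mydir/2. When popleft() is called, mydir/0 is removed and the length of __files drops to 2, so range(2)[::-1] yields 1, 0. On the first iteration (i = 1), old_file_path is mydir/2 and new_file_path is mydir/1, so mydir/2 is renamed to mydir/1, overwriting the contents of mydir/1 ("First"). On the second iteration (i = 0), mydir/1 is renamed to mydir/0, which leaves a single file, mydir/0, containing "Second". That matches the directory listing above and explains why the open() call in the traceback cannot find mydir/1.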
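To show the ordering problem in isolation, here is a minimal sketch that does not use diskcollections at all. The helpers shift_down and simulate are hypothetical names made up for this illustration, not part of the library. The sketch assumes the popped item's file has already been deleted before the renames start, as described above, and it uses os.replace rather than os.rename so that the overwrite behaves the same on every platform. It is not a proposed patch, only a demonstration that iterating in ascending order preserves the data while descending order does not.

```python
import os
import tempfile


def shift_down(directory, start, count, reverse):
    """Rename files start+1..count to start..count-1 (hypothetical helper)."""
    indices = range(start, count)
    if reverse:
        indices = reversed(indices)
    for i in indices:
        # os.replace silently overwrites the destination, like os.rename on POSIX
        os.replace(
            os.path.join(directory, str(i + 1)),
            os.path.join(directory, str(i)),
        )


def simulate(reverse):
    """Create files 0, 1, 2, drop file 0, shift the rest, return what is left."""
    with tempfile.TemporaryDirectory() as d:
        for i, value in enumerate(["Zero", "First", "Second"]):
            with open(os.path.join(d, str(i)), "w") as f:
                f.write(value)
        os.remove(os.path.join(d, "0"))  # the popped item
        shift_down(d, start=0, count=2, reverse=reverse)
        result = {}
        for name in sorted(os.listdir(d)):
            with open(os.path.join(d, name)) as f:
                result[name] = f.read()
        return result


reverse_result = simulate(reverse=True)
forward_result = simulate(reverse=False)
print(reverse_result)  # {'0': 'Second'}
print(forward_result)  # {'0': 'First', '1': 'Second'}
```

With the descending order, "First" is overwritten and only one file survives. With the ascending order, each destination slot is already free by the time it is written, so both remaining items are kept.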