@n-peugnet
Contributor

Hi @jamesmoss, I'm so glad you merged my previous PR so fast. I have started to implement indexes, but I don't feel the job is finished yet, so I'm opening this PR as a draft so you can give me your opinion.

So far I have only implemented HashIndex, but I plan to add more index types. For example, I'm thinking of a FulltextIndex that could index the words in strings.
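
To make the idea concrete, a hash index can be pictured as a map from each field value to the ids of the documents holding that value. The following is only a minimal sketch under that assumption: the class name `SketchHashIndex` and its methods are hypothetical and are not the classes added in this PR.

```php
<?php
// Hypothetical sketch, not Flywheel's actual HashIndex class.
class SketchHashIndex
{
    /** @var array value => (document id => true) */
    private $data = array();

    /** Re-index one document: drop its old value (if any), then add the new one. */
    public function update($id, $newValue, $oldValue = null)
    {
        if ($oldValue !== null) {
            $oldKey = (string) $oldValue;
            unset($this->data[$oldKey][$id]);
            if (empty($this->data[$oldKey])) {
                unset($this->data[$oldKey]); // no document holds this value anymore
            }
        }
        if ($newValue !== null) {
            $this->data[(string) $newValue][$id] = true;
        }
    }

    /** Ids of the documents whose indexed field loosely equals $value. */
    public function get($value)
    {
        $key = (string) $value;
        return isset($this->data[$key]) ? array_keys($this->data[$key]) : array();
    }
}
```

With this shape, a single `update` call covers adding, changing and removing a document's value, and an equality lookup is one array access instead of a scan over every document file.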

to use an ugly json encoding for the indexes
by removing add and remove functions as only update is required
update and delete functions of the repo were not tested and didn't work
this is now fixed
these two operators are not very efficient so I'm not sure whether to
keep them in the hashindex, but at least they are working
- move $data initialisation into concrete StoredIndex extensions
- add constructor definition to IndexInterface to make sure it is
  compatible with the initialisation in Repository
- completely abstract filesystem from HashIndex (constructor)
- removed operators '===' and '!==' as I don't think we can implement
  them with the current HashIndex data structure. (We could maybe write
  the type in the key to implement this)
- change test accordingly
- add inconsistent data test
- simplify index generation from document files
use an array instead of an stdClass to store the data
+ fix tests for php5.3
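
The idea noted above of writing the type in the key, so that '===' and '!==' could be supported, might look roughly like this. It is a sketch only: `typedKey` is a hypothetical helper and not part of this PR.

```php
<?php
// Hypothetical helper: prefix the key with the value's type so that
// 1 (integer) and "1" (string) land in different buckets.
function typedKey($value)
{
    $text = is_scalar($value) ? (string) $value : json_encode($value);
    return gettype($value) . ':' . $text;
}

// Illustrative index: typed key => list of document ids.
$index = array();
$index[typedKey(1)][] = 'docA';   // stored under "integer:1"
$index[typedKey('1')][] = 'docB'; // stored under "string:1"

// A strict lookup only reads the bucket for the exact type.
$strictMatches = $index[typedKey(1)];
```

The trade-off is that a loose '==' lookup would then have to probe one bucket per candidate type instead of a single key.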
@n-peugnet n-peugnet marked this pull request as ready for review December 2, 2019 19:49
@n-peugnet
Contributor Author

n-peugnet commented Dec 3, 2019

I still have one problem with this approach:

Before, it was possible to edit the values of the document from an external source and still have coherent results. This is no longer possible when a query uses the index. I'm wondering how big a problem that is.
I guess anyone who wants to keep that ability could simply not add any index.
Maybe I could also add a function to regenerate the indexes to make sure they are in sync with the documents.

Or I could compare the modification date of the index file with that of the repository's directory.
edit: This is not going to work in every situation.
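
One likely reason the directory date alone is not enough: on most filesystems a directory's modification time changes when an entry is added, removed or renamed, but not when an existing file is edited in place. A stricter check would also look at every document file. The sketch below assumes the index file lives outside the repository directory and that documents use a `json` extension; `indexIsStale` is a hypothetical helper, not something in this PR.

```php
<?php
/**
 * Hypothetical helper: true when the index file is missing or when the
 * repository directory or any document in it is newer than the index.
 */
function indexIsStale($indexFile, $repoDir, $extension = 'json')
{
    clearstatcache(); // filemtime() results are cached per request
    if (!is_file($indexFile)) {
        return true;
    }
    $indexTime = filemtime($indexFile);
    if (filemtime($repoDir) > $indexTime) {
        return true; // a document was added, removed or renamed
    }
    $files = glob($repoDir . '/*.' . $extension);
    foreach ($files ? $files : array() as $file) {
        if (filemtime($file) > $indexTime) {
            return true; // a document was edited in place
        }
    }
    return false;
}
```

This costs one stat call per document on every check, which eats into the gain from the index, and with one-second timestamps an external edit made in the same second as the index write would go unnoticed.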

Things left to do (maybe)

  • compare dates to regenerate index
  • cache index (I am not sure how to do this) or add a cached bloom filter
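
For the Bloom filter item, the appeal is that a small cached structure can answer "definitely not present" without loading the full index, at the price of occasional false positives. A minimal sketch, assuming a plain PHP array as the bit set and salted md5 hashes; the class `SketchBloomFilter` and its default sizes are hypothetical.

```php
<?php
// Hypothetical sketch; class name and default parameters are illustrative.
class SketchBloomFilter
{
    private $bits;   // array of booleans used as a bit set
    private $size;   // number of bits
    private $hashes; // number of hash functions

    public function __construct($size = 1024, $hashes = 3)
    {
        $this->size = $size;
        $this->hashes = $hashes;
        $this->bits = array_fill(0, $size, false);
    }

    /** Bit positions for a value: one salted md5 hash per round. */
    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashes; $i++) {
            // 7 hex digits = 28 bits, which fits a PHP integer on any platform.
            $positions[] = hexdec(substr(md5($i . ':' . $value), 0, 7)) % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $position) {
            $this->bits[$position] = true;
        }
    }

    /** False means the value was never added; true means it may have been. */
    public function mightContain($value)
    {
        foreach ($this->positions($value) as $position) {
            if (!$this->bits[$position]) {
                return false;
            }
        }
        return true;
    }
}
```

A filter like this cannot forget values, so it would have to be rebuilt when documents are deleted or changed, which runs into the same synchronisation question as the index itself.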

@jamesmoss
Owner

Before, it was possible to edit the values of the document from an external source and still have coherent results.

For me this was a major design decision and a benefit of Flywheel. I know indexes are opt-in right now, which is good, but I'd be keen not to lose that and still get the performance improvements.

Can you think of another approach we could take? It might be impossible, and I might be asking too much here.
