Marko doc #621
base: main
Changes from all commits
8bc43cf
09db35d
85f46fa
250a247
86f6fc7
550d74c
c6b6fec
433d869
b70823e
58d9823
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| { | ||
| "label": "Best Practice", | ||
| "position": 125 | ||
|
Contributor
Why are both folders at position 125? |
||
| } | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,166 @@ | ||||||
| --- | ||||||
|
Contributor
Add an index.md file to both folders. This will serve as an introduction. The user will see it when they click the folder in the side menu, so use it to provide context and tell the reader what to expect in this section. It doesn't have to be long. |
||||||
| sidebar_position: 50 | ||||||
|
Contributor
Why are there multiple sidebar_positions at 50? Please put them in numerical order, preferably using two-digit numbers (10, 20, 30). This makes it easier to add files later. |
||||||
| title: LIMIT and OFFSET vs Cursor API and hasMore | ||||||
| --- | ||||||
|
|
||||||
| Queries that return large volumes of data may require more than one response to provide the complete results. A common method is to use `LIMIT` and `OFFSET` in the query. However, this is not the optimal approach: each time the query is invoked, query processing is performed again. | ||||||
|
Contributor
Use "might" rather than "may" to refer to the possibility that something could happen. "May" suggests permission, as in, "You may submit that PR."
Contributor
This introduction does not give me any idea what to expect as I read on through the text. I feel like I got dropped in the deep end of a pool blindfolded. Can you use simpler language to start out and explain what is going on? I gather that you are talking to someone using queries that return large volumes of data, perhaps something like "Manage Large Query Returns" as a title? |
||||||
|
|
||||||
| A better approach is to use the `cursor` API. The response from the cursor API call contains a boolean attribute, `hasMore`. If `hasMore` is true, the next batch of records will be ready on the server. In subsequent calls, the records are returned from the last position. | ||||||
|
|
||||||
| The `batchSize` parameter can be set in the `cursor` API to configure the number of documents returned per request. The `batchSize` default is `100` and the maximum value is `1000`. | ||||||
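As a rough illustration of how `batchSize` affects the number of round trips, the sketch below estimates the requests needed to read a full result set. It assumes every batch except the last is full; the function name `requests_needed` is hypothetical, and `195` is the `count` value from the sample response in this file.

```python
import math

DEFAULT_BATCH_SIZE = 100   # default stated in the text above
MAX_BATCH_SIZE = 1000      # maximum stated in the text above


def requests_needed(total_results: int, batch_size: int = DEFAULT_BATCH_SIZE) -> int:
    """Return how many cursor requests are needed to read every result."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batchSize must be between 1 and {MAX_BATCH_SIZE}")
    # Even an empty result set costs one request: the call that creates the cursor.
    return max(1, math.ceil(total_results / batch_size))


print(requests_needed(195))      # 2
print(requests_needed(195, 50))  # 4
```

With the default batch size, the 195 matching documents arrive in two responses, which matches the create-cursor and read-next-batch pair shown in this file.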
|
|
||||||
| **Query with `LIMIT` and `OFFSET`** | ||||||
|
Contributor
Do not use bold in place of headings. The h1 will be set by the topic title, so the next one down in the hierarchy is h2. |
||||||
|
|
||||||
| ```sql | ||||||
| FOR car in Cars | ||||||
| FILTER car.type == 'SUV' | ||||||
| SORT car._key DESC | ||||||
| LIMIT 0, 3 | ||||||
| RETURN car | ||||||
| ``` | ||||||
|
|
||||||
| **Query without `LIMIT` and `OFFSET`** | ||||||
|
|
||||||
| ```sql | ||||||
| FOR car in Cars | ||||||
| FILTER car.type == 'SUV' | ||||||
| SORT car._key DESC | ||||||
| RETURN car | ||||||
| ``` | ||||||
|
|
||||||
| **Cursor API Request Example (create cursor)** | ||||||
|
|
||||||
| The request body has attributes for `batchSize`, `bindVars`, `count`, `query`, and `options`. The `options` attribute accepts several key/value pairs. To use `hasMore`, include the `stream` attribute set to `true`. Note that the default value for `stream` is `false`. | ||||||
|
|
||||||
| ``` | ||||||
|
Contributor
Always indicate what kind of code is in the code block. |
||||||
| curl -X 'POST' \ | ||||||
| 'https://api-gdn.paas.macrometa.io/_fabric/_system/_api/cursor' \ | ||||||
| -H 'accept: application/json' \ | ||||||
| -H 'Content-Type: application/json' \ | ||||||
| -H 'Authorization: bearer <token>' \ | ||||||
| -d '{ | ||||||
| "batchSize": 100, | ||||||
| "bindVars": {}, | ||||||
| "options": { | ||||||
| "stream": true | ||||||
| }, | ||||||
| "query": "FOR car in Cars FILTER car.type == '\''SUV'\'' SORT car._key DESC RETURN car", | ||||||
| "ttl": 30 | ||||||
| }' | ||||||
| ``` | ||||||
|
|
||||||
| **Cursor API Response Example (create cursor)** | ||||||
|
|
||||||
| The response from the Cursor API request will contain several attributes. The most important to this example are `hasMore` and `id`. The `id` identifies the cursor during subsequent requests and `hasMore` is a boolean value to indicate whether there are more results to be retrieved. | ||||||
|
|
||||||
| ``` | ||||||
| { | ||||||
| "result": [ | ||||||
| { | ||||||
| "_id": "Cars/377189715", | ||||||
| "_key": "377189715", | ||||||
| "_rev": "_eWFT8Eu--_", | ||||||
| "customer_id": 994, | ||||||
| "make": "Jeep", | ||||||
| "model": "Wagoneer", | ||||||
| "type": "SUV", | ||||||
| "year": 2022 | ||||||
| }, | ||||||
| ... | ||||||
| { | ||||||
| "_id": "Cars/377187243", | ||||||
| "_key": "377187243", | ||||||
| "_rev": "_eWFTXbS--_", | ||||||
| "customer_id": 890, | ||||||
| "make": "Volkswagen", | ||||||
| "model": "Atlas", | ||||||
| "type": "SUV", | ||||||
| "year": 2021 | ||||||
| } | ||||||
| ], | ||||||
| "hasMore": true, // shows if there are more results | ||||||
| "id": "463970894", // identifies the cursor to return the next batch | ||||||
| "count": 195, | ||||||
| "extra": { | ||||||
| "stats": { | ||||||
| "writesExecuted": 0, | ||||||
| "writesIgnored": 0, | ||||||
| "scannedFull": 0, | ||||||
| "scannedIndex": 195, | ||||||
| "filtered": 0, | ||||||
| "httpRequests": 0, | ||||||
| "executionTime": 0.0004, | ||||||
| "peakMemoryUsage": 100 | ||||||
| }, | ||||||
| "warnings": [] | ||||||
| }, | ||||||
| "cached": false, | ||||||
| "error": false, | ||||||
| "code": 201 | ||||||
| } | ||||||
| ``` | ||||||
|
|
||||||
| **Cursor API Request Example (read next batch)** | ||||||
|
|
||||||
| ```bash | ||||||
| curl -X PUT "https://api-gdn.paas.macrometa.io/_fabric/_system/_api/cursor/463970894" \ | ||||||
| -H "Authorization: bearer <token>" | ||||||
| ``` | ||||||
|
|
||||||
| **Cursor API Response Example (read next batch)** | ||||||
|
|
||||||
| When the `hasMore` value is `false`, there are no further results to return from the server. | ||||||
|
|
||||||
| ```json | ||||||
| { | ||||||
| "result": [ | ||||||
| { | ||||||
| "_id": "Cars/377176738", | ||||||
| "_key": "377176738", | ||||||
| "_rev": "_eWFQ7di--_", | ||||||
| "customer_id": 345, | ||||||
| "make": "Toyota", | ||||||
| "model": "RAV4", | ||||||
| "type": "SUV", | ||||||
| "year": 2022 | ||||||
| }, | ||||||
| ... | ||||||
| { | ||||||
| "_id": "Cars/349446110", | ||||||
| "_key": "349446110", | ||||||
| "_rev": "_eWFB8OW--_", | ||||||
| "customer_id": 123, | ||||||
| "make": "Audi", | ||||||
| "model": "Q5", | ||||||
| "type": "SUV", | ||||||
| "year": 2019 | ||||||
| } | ||||||
| ], | ||||||
| "hasMore": false, // when false no more results can be returned | ||||||
|
Contributor
Suggested change
|
||||||
| "id": "463970894", // same cursor id from the initial response | ||||||
|
Contributor
Suggested change
|
||||||
| "count": 195, | ||||||
| "extra": { | ||||||
| "stats": { | ||||||
| "writesExecuted": 0, | ||||||
| "writesIgnored": 0, | ||||||
| "scannedFull": 0, | ||||||
| "scannedIndex": 195, | ||||||
| "filtered": 0, | ||||||
| "httpRequests": 0, | ||||||
| "executionTime": 0.0004, | ||||||
| "peakMemoryUsage": 100 | ||||||
| }, | ||||||
| "warnings": [] | ||||||
| }, | ||||||
| "cached": false, | ||||||
| "error": false, | ||||||
| "code": 201 | ||||||
| } | ||||||
| ``` | ||||||
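The create-cursor and read-next-batch calls above can be combined into one loop on the client. The following is a minimal sketch under stated assumptions: `iter_cursor` and the `send(method, path, body)` transport are hypothetical names, and `fake_send` is an in-memory stand-in for the server so the example runs without network access or a token.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport signature: (method, path, body) -> decoded JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def iter_cursor(send: Send, query: str, batch_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every document of a query, following the cursor while hasMore is true."""
    response = send("POST", "/_api/cursor", {
        "query": query,
        "batchSize": batch_size,
        "options": {"stream": True},
    })
    while True:
        yield from response["result"]
        if not response.get("hasMore"):
            break
        # The id in the response identifies the cursor for the next batch.
        response = send("PUT", f"/_api/cursor/{response['id']}", None)


def fake_send(method, path, body):
    """In-memory stand-in for the server: two batches, then no more results."""
    if method == "POST":
        return {"result": [{"_key": "1"}, {"_key": "2"}], "hasMore": True, "id": "463970894"}
    return {"result": [{"_key": "3"}], "hasMore": False, "id": "463970894"}


docs = list(iter_cursor(fake_send, "FOR car IN Cars RETURN car", batch_size=2))
print([d["_key"] for d in docs])  # ['1', '2', '3']
```

Passing the transport in as a function keeps the paging logic separate from authentication and HTTP details, which are not shown here.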
|
|
||||||
| API Reference Docs | ||||||
|
Contributor
This should be a heading. |
||||||
|
|
||||||
| [Create Query Cursor](https://macrometa.com/docs/api#/operations/createQueryCursor) | ||||||
|
|
||||||
| [Modify Query Cursor](https://macrometa.com/docs/api#/operations/modifyQueryCursor) | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,113 @@ | ||
| --- | ||
| sidebar_position: 50 | ||
| title: Multiple collections vs single large collection | ||
|
Contributor
Use title case in file names, not sentence case. Also, this is a long title. How did it look when you built it locally? |
||
| --- | ||
|
|
||
| Query performance is linked, in part, to the number of documents in the collections and the indexes used. When a single collection contains a large number of complex documents, optimizing for performance becomes difficult. Designing collections around purpose-built documents and indexes for returning specific results makes query writing simpler and improves performance. | ||
|
|
||
| In this example, we have a single collection, `Garage`. Each document contains `account`, `cars`, `orders`, and `staff` attributes, each with further nested attributes. This makes query writing and indexing difficult. Here is an example document for the `Garage` collection. | ||
|
|
||
| ``` | ||
| { | ||
| "_id": "Garage/349351645", | ||
| "_key": "349351645", | ||
| "_rev": "_eUgrDn2--_", | ||
| "account": { | ||
| "first_name": "John", | ||
| "id": 123, | ||
| "joined_date": "2022-01-01", | ||
| "last_name": "Doe", | ||
| "phone": "555-555-5555" | ||
| }, | ||
| "cars": { | ||
| "car_a": { | ||
| "make": "Audi", | ||
| "model": "Q5", | ||
| "year": "2019" | ||
| }, | ||
| "car_b": { | ||
| "make": "Ford", | ||
| "model": "F-150", | ||
| "year": "2021" | ||
| } | ||
| }, | ||
| "orders": { | ||
| "account_id": 123, | ||
| "car_id": "car_b", | ||
| "customer_phone": "555-555-5555", | ||
| "date": "2022-03-14", | ||
| "invoice_number": 456, | ||
| "price": "$100.00" | ||
| }, | ||
| "staff": { | ||
| "first_name": "Jane", | ||
| "last_name": "Smith", | ||
| "tech_id": 789 | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| The next example shows how one might structure the same data as documents in individual collections. This approach makes it easier to create indexes on the right attributes in each collection and reduces the number of records scanned. | ||
|
|
||
| ``` | ||
| //Account Document | ||
| { | ||
| "_id": "Accounts/349491803", | ||
| "_key": "349491803", | ||
| "_rev": "_eUhBHmi--_", | ||
| "car_ids": [ | ||
| "Cars/349434363", | ||
| "Cars/349446110" | ||
| ], | ||
| "first_name": "John", | ||
| "id": 123, | ||
| "joined_date": "2022-01-01", | ||
| "last_name": "Doe", | ||
| "phone": "555-555-5555" | ||
| } | ||
|
|
||
| //Car Document | ||
| { | ||
| "_id": "Cars/349446110", | ||
| "_key": "349446110", | ||
| "_rev": "_eUg1Tl6--_", | ||
| "customer_id": 123, | ||
| "make": "Audi", | ||
| "model": "Q5", | ||
| "year": 2019 | ||
| }, | ||
| { | ||
| "_id": "Cars/349434363", | ||
| "_key": "349434363", | ||
| "_rev": "_eUg1fJe--_", | ||
| "customer_id": 123, | ||
| "make": "Ford", | ||
| "model": "F-150", | ||
| "year": 2021 | ||
| } | ||
|
|
||
| // Order Document | ||
| { | ||
| "_id": "Orders/349454643", | ||
| "_key": "349454643", | ||
| "_rev": "_eUg9dXS--_", | ||
| "account_id": 123, | ||
| "car_ids": [ | ||
| "Cars/349446110" | ||
| ], | ||
| "date": "2022-03-14", | ||
| "invoice_number": 456, | ||
| "price": "$100.00", | ||
| "staff_id": 789 | ||
| } | ||
|
|
||
| // Staff Document | ||
| { | ||
| "_id": "Staff/349422825", | ||
| "_key": "349422825", | ||
| "_rev": "_eUgvNOW--_", | ||
| "first_name": "Jane", | ||
| "last_name": "Smith", | ||
| "tech_id": 789 | ||
| } | ||
| ``` | ||
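To show how the split documents still fit together, here is a small sketch that resolves the references stored on an order. The dictionaries stand in for the `Cars` and `Staff` collections and reuse values from the example documents above. It assumes that `staff_id` on an order matches `tech_id` on a staff document, which the sample values (both `789`) suggest but the text does not state; `describe_order` is a hypothetical helper.

```python
# Sample documents from the split-collection example, trimmed to the fields used here.
# Each dict stands in for one collection.
cars = {
    "Cars/349446110": {"make": "Audi", "model": "Q5", "year": 2019},
    "Cars/349434363": {"make": "Ford", "model": "F-150", "year": 2021},
}
# Assumption: an order's staff_id refers to a staff document's tech_id.
staff_by_tech_id = {789: {"first_name": "Jane", "last_name": "Smith"}}
order = {
    "_id": "Orders/349454643",
    "car_ids": ["Cars/349446110"],
    "invoice_number": 456,
    "staff_id": 789,
}


def describe_order(order, cars, staff_by_tech_id):
    """Resolve the references stored on an order into a readable summary."""
    serviced = [f"{cars[c]['make']} {cars[c]['model']}" for c in order["car_ids"]]
    tech = staff_by_tech_id[order["staff_id"]]
    return {
        "invoice_number": order["invoice_number"],
        "cars": serviced,
        "technician": f"{tech['first_name']} {tech['last_name']}",
    }


print(describe_order(order, cars, staff_by_tech_id))
# {'invoice_number': 456, 'cars': ['Audi Q5'], 'technician': 'Jane Smith'}
```

Each lookup touches only the small document it needs, which is the same property that lets a per-collection index stay narrow.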
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,14 @@ | ||||||
| --- | ||||||
| sidebar_position: 50 | ||||||
| title: Use of indexes for COLLECT operation | ||||||
| --- | ||||||
|
|
||||||
| If there is a `COLLECT` operation in the query, records with the same attribute value are grouped. A persistent index on the attribute used in the `COLLECT` operation helps to optimize the query. In the following example, a persistent index on the `country` attribute helps to optimize the query. | ||||||
|
Contributor
Suggested change
|
||||||
|
|
||||||
| ``` | ||||||
| FOR p IN players | ||||||
| COLLECT country = p.country | ||||||
| RETURN { | ||||||
| "country" : country | ||||||
| } | ||||||
| ``` | ||||||
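One way to see why a sorted index can help grouping is the Python analogy below. It is not how the query engine is implemented; it only illustrates that when values arrive already sorted, equal values are adjacent and each group can be emitted in a single pass. The player records are hypothetical.

```python
from itertools import groupby

# Hypothetical player records; only the `country` attribute matters here.
players = [
    {"name": "A", "country": "IN"},
    {"name": "B", "country": "US"},
    {"name": "C", "country": "IN"},
    {"name": "D", "country": "BR"},
]

# A persistent (sorted) index behaves roughly like this pre-sorted list of values.
country_index = sorted(p["country"] for p in players)

# With sorted input, equal values are adjacent, so one pass yields each group once.
distinct_countries = [country for country, _ in groupby(country_index)]
print(distinct_countries)  # ['BR', 'IN', 'US']
```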
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| --- | ||
| sidebar_position: 50 | ||
| title: Use of composite index | ||
|
Contributor
Why is this topic so extremely short? |
||
| --- | ||
|
|
||
| If multiple attributes are used in the `FILTER` criteria, it’s recommended to create a composite index on all of those attributes. For example, if three attributes are used in `FILTER`, a composite index created on these three attributes gives better query performance than three separate indexes. | ||
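The difference can be pictured with plain Python dictionaries. This is an analogy under stated assumptions, not the database's index implementation: the documents are hypothetical, and a real query optimizer might use only one of the separate indexes and filter the rest rather than intersect them.

```python
from collections import defaultdict

# Hypothetical documents with three filterable attributes.
cities = [
    {"_key": "1", "continent": "ASIA", "country": "CHINA", "type": "RURAL"},
    {"_key": "2", "continent": "ASIA", "country": "CHINA", "type": "URBAN"},
    {"_key": "3", "continent": "ASIA", "country": "INDIA", "type": "RURAL"},
]
attrs = ("continent", "country", "type")

# Three separate indexes: one lookup per attribute, then intersect the candidates.
single = {a: defaultdict(set) for a in attrs}
# One composite index: a single lookup on the combined key.
composite = defaultdict(set)
for doc in cities:
    for a in attrs:
        single[a][doc[a]].add(doc["_key"])
    composite[tuple(doc[a] for a in attrs)].add(doc["_key"])

wanted = ("ASIA", "CHINA", "RURAL")
via_single = set.intersection(*(single[a][v] for a, v in zip(attrs, wanted)))
via_composite = composite[wanted]
print(via_single, via_composite)  # {'1'} {'1'}
```

Both paths find the same document, but the separate indexes produce three candidate sets (of sizes 3, 2, and 2 here) that must be combined, while the composite key goes straight to the match.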
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| --- | ||
| sidebar_position: 50 | ||
| title: Use of SEARCH for array attributes | ||
| --- | ||
|
|
||
| To `FILTER` against an array of values, the `ALL`, `ANY`, and `NONE` operators are used. Array indexes do not help because these operators do not use them. Instead, you can create a search view to optimize these queries. | ||
|
|
||
| To filter attributes against an array of values, you would commonly use the array comparison operators `ALL`, `ANY`, or `NONE` as a prefix to the comparison operator `IN`. However, this is not an optimized approach and will not utilize any indexes. | ||
|
|
||
| The optimized approach uses the `SEARCH` feature. An index is created on the attributes defined in the search view. You can read more about `SEARCH` and search views in [search](https://macrometa.com/docs/search/search). | ||
|
|
||
| ``` | ||
| /* Query on a collection with FILTER */ | ||
|
|
||
| LET carMakes = ["Ford", "Audi", "Mazda"] | ||
| FOR car in cars | ||
| FILTER car.make ANY IN carMakes | ||
| FILTER car.type == "SUV" | ||
| RETURN { car : car} | ||
|
|
||
| /* Query on Search view with SEARCH */ | ||
| /* Search VIEW is created with the required attributes. */ | ||
|
|
||
| LET carMakes = ["Ford", "Audi", "Mazda"] | ||
| FOR car in CARS_VIEW | ||
| SEARCH ANALYZER(car.make ANY IN carMakes, "identity") | ||
| RETURN car | ||
| ``` |
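For readers unfamiliar with the array comparison operators, their meaning can be restated in a few lines of Python. The helper names and the `owned` array are hypothetical; the left-hand side is an array, and each operator says how many of its elements must be found in the right-hand array.

```python
def all_in(left, right):
    """ALL IN: every element of left is in right."""
    return all(x in right for x in left)


def any_in(left, right):
    """ANY IN: at least one element of left is in right."""
    return any(x in right for x in left)


def none_in(left, right):
    """NONE IN: no element of left is in right."""
    return not any(x in right for x in left)


car_makes = ["Ford", "Audi", "Mazda"]
owned = ["Audi", "Jeep"]  # hypothetical array attribute on a document

print(all_in(owned, car_makes))   # False
print(any_in(owned, car_makes))   # True
print(none_in(owned, car_makes))  # False
```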
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| --- | ||
| sidebar_position: 50 | ||
| title: Use of SEARCH for SORT operations | ||
| --- | ||
|
|
||
| Due to known limitations, if a `SORT` operation is specified in the query, indexes are not used for the attributes specified in the `FILTER` part. The alternative is to create a `SEARCH VIEW` with the required attributes and to use the attribute to sort on as the primary sort attribute in the `SEARCH VIEW`. | ||
| Note: Only `1` attribute can be added as a `Primary Sort` attribute. | ||
| ``` | ||
| FOR city in cities | ||
| FILTER city.continent == "ASIA" AND | ||
| city.country == "CHINA" AND | ||
| city.type == "RURAL" AND | ||
| city.population > 40000 | ||
| SORT city.population DESC | ||
| return { city : city} | ||
|
|
||
| /* | ||
| * Query on Search view with SEARCH | ||
| * Search VIEW is created with the required attributes. | ||
| * Add PrimarySort with the required attribute and order | ||
| */ | ||
| FOR city in CITIES_VIEW | ||
| SEARCH ANALYZER(city.continent == "ASIA" AND | ||
| city.country == "CHINA" AND | ||
| city.type == "RURAL" AND | ||
| city.population > 40000, "identity") | ||
| return { city : city} | ||
| ``` |
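The idea behind a primary sort attribute can be sketched in Python as an analogy. It assumes the view keeps its entries ordered by the primary sort attribute; the city records are hypothetical, and this is not the view's actual storage format.

```python
# Hypothetical city records.
cities = [
    {"name": "A", "type": "RURAL", "population": 52000},
    {"name": "B", "type": "URBAN", "population": 900000},
    {"name": "C", "type": "RURAL", "population": 41000},
    {"name": "D", "type": "RURAL", "population": 75000},
]

# Sort once when the view is built (primary sort: population, descending).
view = sorted(cities, key=lambda c: c["population"], reverse=True)

# Filtering a pre-sorted sequence preserves its order, so no sort is needed per query.
rural = [c["name"] for c in view if c["type"] == "RURAL" and c["population"] > 40000]
print(rural)  # ['D', 'A', 'C']
```

The sorting cost is paid once when data is written, not on every query.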
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| --- | ||
| sidebar_position: 50 | ||
| title: Use of Stream Worker for optimization of reporting-related jobs | ||
| --- | ||
|
|
||
| For example, suppose a reporting job is scheduled at the end of each week on a collection with millions of records, and the report is expected to contain records for each day of the week. Running the query on that large collection to get the data for all seven days is not efficient. To tackle this, a `Stream Worker` can be used. A `Stream Worker` can process data on the `Stream` associated with the collection, analyze it, generate the `staged` data, and store that data in a `CACHE` collection. | ||
| For example, to get the number of `GET` requests each day from each `IP Address`, instead of scanning the huge `ACCESS LOG` collection, the `Stream Worker` can analyze the data and store `user information`, the number of `GET` requests, `Date`, and `User name` in the `CACHE` collection. Because the `CACHE` collection has fewer records than the large `ACCESS LOG` collection, query execution is faster. |
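The pattern described above, aggregating as records arrive so the report reads a small pre-computed collection, can be sketched in Python. This is only an analogy: a real stream worker is defined in the platform's own stream worker language, and the event fields and the function `on_access_log_event` are hypothetical.

```python
from collections import Counter

# Small "cache" keyed by (date, ip); it stands in for the CACHE collection.
get_requests_per_day = Counter()


def on_access_log_event(event):
    """Update the cache as each access-log record arrives on the stream."""
    if event["method"] == "GET":
        get_requests_per_day[(event["date"], event["ip"])] += 1


# Hypothetical events, processed one by one as they would arrive.
events = [
    {"date": "2022-03-14", "ip": "10.0.0.1", "method": "GET"},
    {"date": "2022-03-14", "ip": "10.0.0.1", "method": "GET"},
    {"date": "2022-03-14", "ip": "10.0.0.2", "method": "POST"},
    {"date": "2022-03-15", "ip": "10.0.0.1", "method": "GET"},
]
for event in events:
    on_access_log_event(event)

# The weekly report reads the small cache instead of scanning the access log.
print(sorted(get_requests_per_day.items()))
# [(('2022-03-14', '10.0.0.1'), 2), (('2022-03-15', '10.0.0.1'), 1)]
```

Four log records collapse into two cache entries here; with millions of records, the report query scans correspondingly fewer documents.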
What are these best practices for? Queries? Or a bunch of different things?
All of them are for Queries.