4 changes: 4 additions & 0 deletions docs/best-practice/_category_.json
@@ -0,0 +1,4 @@
{
"label": "Best Practice",
Contributor

Suggested change
"label": "Best Practice",
"label": "Best Practices",

Contributor

What are these best practices for? Queries? Or a bunch of different things?

Contributor Author

All of them are for Queries.

"position": 125
Contributor

Why are both folders at position 125?

}
166 changes: 166 additions & 0 deletions docs/best-practice/limit-and-offset-vs-cursor-api-and-hasmore.md
@@ -0,0 +1,166 @@
---
Contributor

Add an index.md file to both folders. This will serve as an introduction. The user will see it when they click the folder in the side menu, so use it to provide context and tell the reader what to expect in this section. It doesn't have to be long.

sidebar_position: 50
Contributor

Why are there multiple sidebar_positions at 50? Please put them in numerical order, preferably using two-digit numbers (10, 20, 30). This makes it easier to add files later.

title: LIMIT and OFFSET vs Cursor API and hasMore
---

Queries that return large volumes of data may require more than one response to provide the complete results. A common method is to use the `LIMIT` and `OFFSET` statements in the query. However, this approach is not optimal: the full query processing is repeated each time the query is invoked.
Contributor

Use "might" rather than "may" to refer to the possibility that something could happen. "May" suggests permission, as in, "You may submit that PR."

Contributor

This introduction does not give me any idea what to expect as I read on through the text. I feel like I got dropped in the deep end of a pool blindfolded. Can you use simpler language to start out and explain what is going on?

I gather that you are talking to someone using queries that return large volumes of data, perhaps something like "Manage Large Query Returns" as a title?


A better approach is to use the `cursor` API. The response from the cursor API call contains a boolean attribute, `hasMore`. If `hasMore` is true, the next batch of records will be ready on the server. In subsequent calls, the records are returned from the last position.

The `batchSize` parameter can be set in the `cursor` API to configure the number of documents returned per request. The `batchSize` default is `100` and the maximum value is `1000`.

**Query with `LIMIT` and `OFFSET`**
Contributor

Do not use bold in place of headings. The h1 will be set by the topic title, so the next one down in the hierarchy is h2.

# h1
## h2
### h3


```sql
FOR car in Cars
FILTER car.type == 'SUV'
SORT car._key DESC
LIMIT 0, 3
RETURN car
```

**Query without `LIMIT` and `OFFSET`**

```sql
FOR car in Cars
FILTER car.type == 'SUV'
SORT car._key DESC
RETURN car
```

**Cursor API Request Example (create cursor)**

The request body has attributes for `batchSize`, `bindVars`, `count`, `query`, and `options`. The `options` attribute can receive several different key/value pairs. To utilize `hasMore` we will need to include the `stream` attribute set to `true`. Note the default value for `stream` is `false`.

Contributor

Always indicate what kind of code is in the code block.

```
curl -X 'POST' \
'https://api-gdn.paas.macrometa.io/_fabric/_system/_api/cursor' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-H 'Authorization: bearer <token>' \
-d '{
"batchSize": 100,
"bindVars": {},
"options": {
"stream": true
},
"query": "FOR car in Cars FILTER car.type == '\''SUV'\'' SORT car._key DESC RETURN car",
"ttl": 30
}'
```

**Cursor API Response Example (create cursor)**

The response from the Cursor API request will contain several attributes. The most important to this example are `hasMore` and `id`. The `id` identifies the cursor during subsequent requests and `hasMore` is a boolean value to indicate whether there are more results to be retrieved.

```
{
"result": [
{
"_id": "Cars/377189715",
"_key": "377189715",
"_rev": "_eWFT8Eu--_",
"customer_id": 994,
"make": "Jeep",
"model": "Wagoneer",
"type": "SUV",
"year": 2022
},
...
{
"_id": "Cars/377187243",
"_key": "377187243",
"_rev": "_eWFTXbS--_",
"customer_id": 890,
"make": "Volkswagen",
"model": "Atlas",
"type": "SUV",
"year": 2021
}
],
"hasMore": true, // shows if there are more results
"id": "463970894", // identifies the cursor to return the next batch
"count": 195,
"extra": {
"stats": {
"writesExecuted": 0,
"writesIgnored": 0,
"scannedFull": 0,
"scannedIndex": 195,
"filtered": 0,
"httpRequests": 0,
"executionTime": 0.0004,
"peakMemoryUsage": 100
},
"warnings": []
},
"cached": false,
"error": false,
"code": 201
}
```

**Cursor API Request Example (read next batch)**

```bash
curl -X PUT "https://api-gdn.paas.macrometa.io/_fabric/_system/_api/cursor/463970894" \
-H "Authorization: bearer <token>"
```

**Cursor API Response Example (read next batch)**

When the `hasMore` value is `false`, there are no further results to return from the server.

```json
{
"result": [
{
"_id": "Cars/377176738",
"_key": "377176738",
"_rev": "_eWFQ7di--_",
"customer_id": 345,
"make": "Toyota",
"model": "RAV4",
"type": "SUV",
"year": 2022
},
...
{
"_id": "Cars/349446110",
"_key": "349446110",
"_rev": "_eWFB8OW--_",
"customer_id": 123,
"make": "Audi",
"model": "Q5",
"type": "SUV",
"year": 2019
}
],
"hasMore": false, // when false no more results can be returned
Contributor

Suggested change
"hasMore": false, // when false no more results can be returned
"hasMore": false, // When false, no more results can be returned

"id": "463970894", // same cursor id from the initial response
Contributor

Suggested change
"id": "463970894", // same cursor id from the initial response
"id": "463970894", // Same cursor ID from the initial response

"count": 195,
"extra": {
"stats": {
"writesExecuted": 0,
"writesIgnored": 0,
"scannedFull": 0,
"scannedIndex": 195,
"filtered": 0,
"httpRequests": 0,
"executionTime": 0.0004,
"peakMemoryUsage": 100
},
"warnings": []
},
"cached": false,
"error": false,
"code": 201
}
```

API Reference Docs
Contributor

This should be a heading.


[Create Query Cursor](https://macrometa.com/docs/api#/operations/createQueryCursor)

[Modify Query Cursor](https://macrometa.com/docs/api#/operations/modifyQueryCursor)
113 changes: 113 additions & 0 deletions docs/best-practice/multiple-collections-vs-single-large-collection.md
@@ -0,0 +1,113 @@
---
sidebar_position: 50
title: Multiple collections vs single large collection
Contributor

Use title case in file names, not sentence case.

Also, this is a long title. How did it look when you built it locally?

---

Query performance is linked, in part, to the number of documents in the collections and the indexes used. When a single collection contains a large number of complex documents, optimizing for performance becomes difficult. Designing collections around purpose-built documents and indexes for returning specific results makes query writing simpler and improves performance.

In this example, we have a single collection, `Garage`. Each document contains `account`, `cars`, `orders`, and `staff` attributes, each with further nested attributes. This makes query writing and indexing difficult. Here is an example document for the `Garage` collection.

```
{
"_id": "Garage/349351645",
"_key": "349351645",
"_rev": "_eUgrDn2--_",
"account": {
"first_name": "John",
"id": 123,
"joined_date": "2022-01-01",
"last_name": "Doe",
"phone": "555-555-5555"
},
"cars": {
"car_a": {
"make": "Audi",
"model": "Q5",
"year": "2019"
},
"car_b": {
"make": "Ford",
"model": "F-150",
"year": "2021"
}
},
"orders": {
"account_id": 123,
"car_id": "car_b",
"customer_phone": "555-555-5555",
"date": "2022-03-14",
"invoice_number": 456,
"price": "$100.00"
},
"staff": {
"first_name": "Jane",
"last_name": "Smith",
"tech_id": 789
}
}
```

The next example shows how one might structure the same data as documents in individual collections. This approach makes it easier to create indexes on the right attributes in each collection and reduces the number of records scanned.

```
// Account Document
{
"_id": "Accounts/349491803",
"_key": "349491803",
"_rev": "_eUhBHmi--_",
"car_ids": [
"Cars/349434363",
"Cars/349446110"
],
"first_name": "John",
"id": 123,
"joined_date": "2022-01-01",
"last_name": "Doe",
"phone": "555-555-5555"
}

// Car Documents
{
"_id": "Cars/349446110",
"_key": "349446110",
"_rev": "_eUg1Tl6--_",
"customer_id": 123,
"make": "Audi",
"model": "Q5",
"year": 2019
},
{
"_id": "Cars/349434363",
"_key": "349434363",
"_rev": "_eUg1fJe--_",
"customer_id": 123,
"make": "Ford",
"model": "F-150",
"year": 2021
}

// Order Document
{
"_id": "Orders/349454643",
"_key": "349454643",
"_rev": "_eUg9dXS--_",
"account_id": 123,
"car_ids": [
"Cars/349446110"
],
"date": "2022-03-14",
"invoice_number": 456,
"price": "$100.00",
"staff_id": 789
}

// Staff Document
{
"_id": "Staff/349422825",
"_key": "349422825",
"_rev": "_eUgvNOW--_",
"first_name": "Jane",
"last_name": "Smith",
"tech_id": 789
}
```
14 changes: 14 additions & 0 deletions docs/best-practice/use-o-indexes-for-collect-operation.md
@@ -0,0 +1,14 @@
---
sidebar_position: 50
title: Use of indexes for COLLECT operation
---

If there is a `COLLECT` operation in the query, the records with similar attribute values are grouped. Persistent index on the attribute value on which `COLLECT` operation is performed helps to optimize the query. In the following example, the persistent index on the `country` attribute will help to optimize the query.
Contributor

Suggested change
If there is a `COLLECT` operation in the query, the records with similar attribute values are grouped. Persistent index on the attribute value on which `COLLECT` operation is performed helps to optimize the query. In the following example, the persistent index on the `country` attribute will help to optimize the query.
If there is a `COLLECT` operation in the query, then the records with similar attribute values are grouped. Persistent indexes on the attribute value on which `COLLECT` operation is performed helps to optimize the query. In the following example, the persistent index on the `country` attribute helps to optimize the query.


```
FOR p IN players
COLLECT country = p.country
RETURN {
"country" : country
}
```
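
One plausible reason a persistent index helps here is that it keeps attribute values in sorted order, so equal values sit next to each other and groups can be read off in a single pass. The Python sketch below only illustrates that idea and does not describe the database's actual implementation; the `players` data and the `country_index` list are hypothetical stand-ins.

```python
from itertools import groupby

# Hypothetical player documents, used only for this illustration.
players = [
    {"name": "A", "country": "NO"},
    {"name": "B", "country": "BR"},
    {"name": "C", "country": "NO"},
    {"name": "D", "country": "BR"},
    {"name": "E", "country": "JP"},
]

# Stand-in for a persistent index on `country`: the values kept in sorted
# order. A database maintains this order on write; here it is built once.
country_index = sorted(p["country"] for p in players)

# Because equal values are adjacent in the index, one pass yields each group.
countries = [{"country": country} for country, _ in groupby(country_index)]
```

Without the sorted order, the same grouping would need an extra sort or a hash table over every document.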
6 changes: 6 additions & 0 deletions docs/best-practice/use-of-composite-index.md
@@ -0,0 +1,6 @@
---
sidebar_position: 50
title: Use of composite index
Contributor

Why is this topic so extremely short?

---

If multiple attributes are used in the `FILTER` criteria, it is recommended to create a composite index that covers all of those attributes. For example, if `3` attributes are used in `FILTER`, a single composite index on these `3` attributes gives better query performance than `3` separate indexes.
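
To see why one composite index can outperform several single-attribute indexes, the following Python sketch models both with plain dictionaries. It is only an illustration of the idea, not how the database implements indexes, and the `cars` data and all function names are made up for the example. With separate indexes, one possible strategy is to look up each attribute on its own and intersect the candidate sets; with a composite index, a single lookup on the combined key returns the matches directly.

```python
from collections import defaultdict

# Hypothetical documents, used only for this illustration.
cars = [
    {"_key": "1", "make": "Audi", "type": "SUV", "year": 2019},
    {"_key": "2", "make": "Audi", "type": "SUV", "year": 2021},
    {"_key": "3", "make": "Ford", "type": "SUV", "year": 2021},
    {"_key": "4", "make": "Audi", "type": "Sedan", "year": 2021},
]
attributes = ("make", "type", "year")

# Three separate indexes: attribute value -> set of document keys.
single = {attr: defaultdict(set) for attr in attributes}
# One composite index: (make, type, year) -> set of document keys.
composite = defaultdict(set)

for car in cars:
    for attr in attributes:
        single[attr][car[attr]].add(car["_key"])
    composite[tuple(car[attr] for attr in attributes)].add(car["_key"])


def find_with_single_indexes(make, type_, year):
    """Look up each attribute separately, then intersect the candidate sets."""
    candidates = [
        single["make"].get(make, set()),
        single["type"].get(type_, set()),
        single["year"].get(year, set()),
    ]
    # Second value: how many index entries had to be read in total.
    return set.intersection(*candidates), sum(len(c) for c in candidates)


def find_with_composite_index(make, type_, year):
    """One lookup on the combined key returns only the matching documents."""
    matches = composite.get((make, type_, year), set())
    return matches, len(matches)


single_result, single_touched = find_with_single_indexes("Audi", "SUV", 2021)
composite_result, composite_touched = find_with_composite_index("Audi", "SUV", 2021)
```

Both lookups return the same document, but the separate indexes read nine entries to find it while the composite index reads one.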
28 changes: 28 additions & 0 deletions docs/best-practice/use-of-search-for-array-attributes.md
@@ -0,0 +1,28 @@
---
sidebar_position: 50
title: Use of SEARCH for array attributes
---

To `FILTER` against an array of values, the `ALL`, `ANY`, and `NONE` operators are used. Array indexes do not help here because these operators do not utilize them. Instead, users can create a `SEARCH VIEW` to optimize these queries.

To filter attributes against an array of values, you would commonly use one of the array comparison operators, `ALL`, `ANY`, or `NONE`, as a prefix in conjunction with the common comparison operator `IN`. However, this is not an optimized approach and does not utilize any indexes.

The optimized approach uses the `SEARCH` feature. An index is created on the attributes defined in the search view. You can read more about `SEARCH` and search views here: [search](https://macrometa.com/docs/search/search).

```
/* Query on a collection with FILTER */

LET carMakes = ["Ford", "Audi", "Mazda"]
FOR car in cars
FILTER car.make ANY IN carMakes
FILTER car.type == "SUV"
RETURN { car : car}

/* Query on Search view with SEARCH */
/* Search VIEW is created with the required attributes. */

LET carMakes = ["Ford", "Audi", "Mazda"]
FOR car in CARS_VIEW
SEARCH ANALYZER(car.make ANY IN carMakes, "identity")
RETURN car
```
28 changes: 28 additions & 0 deletions docs/best-practice/use-of-search-for-sort-operations.md
@@ -0,0 +1,28 @@
---
sidebar_position: 50
title: Use of SEARCH for SORT operations
---

Due to known limitations, if a `SORT` operation is specified in the query, indexes are not used for the attributes specified in the `FILTER` part. The alternative is to create a `SEARCH VIEW` with the required attributes and to set the attribute to sort on as the primary sort attribute of the `SEARCH VIEW`.

Note: Only `1` attribute can be added as a `Primary Sort` attribute.

```
FOR city in cities
FILTER city.continent == "ASIA" AND
city.country == "CHINA" AND
city.type == "RURAL" AND
city.population > 40000
SORT city.population DESC
return { city : city}

/*
* Query on Search view with SEARCH
* Search VIEW is created with the required attributes.
* Add PrimarySort with the required attribute and order
*/
FOR city in CITIES_VIEW
SEARCH ANALYZER(city.continent == "ASIA" AND
city.country == "CHINA" AND
city.type == "RURAL" AND
city.population > 40000, "identity")
return { city : city}
```
@@ -0,0 +1,7 @@
---
sidebar_position: 50
title: Use of Stream Worker for optimization of reporting-related jobs
---

Consider a reporting job that is scheduled for the end of each week and runs on a collection with millions of records. The report is expected to contain records for each day of the week. Running the query against such a large collection to get the data for all seven days is not efficient. To tackle this, a `Stream Worker` can be used. A `Stream Worker` can process data on the `Stream` associated with the collection, analyze it, generate `staged` data, and store that data in a separate `CACHE` collection.

For example, to get the number of `GET` requests each day from each `IP Address`, the `Stream Worker` can analyze the data and store the `user information`, the number of `GET` requests, the `Date`, and the `User name` in a `CACHE` collection, so the query does not have to scan the huge `ACCESS LOG` collection. Because the `CACHE` collection holds far fewer records than the `ACCESS LOG` collection, query execution is faster.
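
The idea of pre-aggregating on the stream can be illustrated with a small Python sketch. It does not use the Stream Worker API or its query language; `on_event`, `cache`, `weekly_report`, and the event fields are hypothetical names chosen for the example. Each incoming access-log event updates one counter per date and IP address, so the weekly report reads a handful of summary rows instead of scanning every log record.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Stand-in for the small cache collection: (date, ip) -> number of GET requests.
cache: Counter = Counter()


def on_event(event: Dict[str, str]) -> None:
    """Process one access-log event as it arrives on the stream."""
    if event["method"] == "GET":
        cache[(event["date"], event["ip"])] += 1


def weekly_report(dates: List[str]) -> List[Tuple[str, str, int]]:
    """Build the report from the cache only; the raw log is never scanned."""
    return sorted(
        (date, ip, count)
        for (date, ip), count in cache.items()
        if date in dates
    )


# Hypothetical events, as they might arrive from the access log stream.
events = [
    {"date": "2022-03-14", "ip": "10.0.0.1", "method": "GET"},
    {"date": "2022-03-14", "ip": "10.0.0.1", "method": "GET"},
    {"date": "2022-03-14", "ip": "10.0.0.2", "method": "POST"},
    {"date": "2022-03-15", "ip": "10.0.0.2", "method": "GET"},
]
for event in events:
    on_event(event)

report = weekly_report(["2022-03-14", "2022-03-15"])
```

The cost of the report then depends on the number of distinct date and IP pairs, not on the number of log records.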