Commit 8da0241

add-mockDataGenerator-module
1 parent fab0ccd commit 8da0241

4 files changed: +368 -0 lines changed


di/mockDataGen/init.q

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
\l ::mockDataGen.q

export:([initschema;mockDataOne;mockData;mockHdb;mockDataR;clearTables])

di/mockDataGen/mockDataGen.md

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
# Mock Data Generator

This module generates realistic mock market datasets. It can generate data for one or many instruments, for a single date or a list of dates, and can write the results to an HDB. The module consists of four main functions, which generate mock datasets based on the following inputs from the user:

- sym/instrument: the symbol(s) to generate data for
- date: the trading date (or list of dates)
- start time and end time: the time range within which data is generated
- row count: the number of rows of data to generate
- start price: the starting price of the instrument/sym
- level: if `1`, generates data for the trades and quotes tables; if `2`, also generates data for the depth table
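
Start and end times bound where generated events fall. As a rough sketch in plain q (run outside the module), the timestamp step in `mockDataOne` draws uniform random times in `[startTime, endTime)`, sorts them and adds the date. The names `d`, `st`, `et` and `n` below are illustrative, not module parameters:

```q
// Standalone sketch (not part of the module): random, sorted timestamps within a session
d:2025.01.10;              / trading date
st:09:30:00.000;           / start time
et:16:00:00.000;           / end time
n:10;                      / number of timestamps to draw
// n?et-st draws n random times in [0, et-st); adding st shifts them into the session,
// asc sorts them, `# drops the sorted attribute (as mockDataOne does), and d+ makes timestamps
ts:d+`#asc st+n?et-st;
ts
```

Trade, quote and depth times are drawn independently in this way, so a trade's timestamp is not tied to the timestamp of the quote it is priced from.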

## Example
Below is an example of loading the module into a session and viewing the functions it exports.

```q
// Load the module into a session
md: use `di.mockDataGen

// View the dictionary of exported functions
md
```
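
## Overview

- **`mockDataOne`** – Generates mock data for a single instrument on a given date.
- **`mockData`** – Generates mock data for multiple instruments on a given date.
- **`mockDataR`** – Generates mock data for multiple instruments over a list of dates.
- **`mockHdb`** – Generates mock data for multiple instruments over a list of dates and writes it to a specified HDB directory, one date partition per date, with the parted attribute applied to `sym`.

The module also exports two helpers: `initschema` creates empty `trades`, `quotes` and `depth` tables in the module namespace, and `clearTables` resets them to empty. `mockDataOne`, `mockData` and `mockDataR` append rows to these tables, so successive calls accumulate data until `clearTables[]` is called.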

## Functions

### ⚙️`mockDataOne`

Generates mock data for a single instrument on a given date.

**Parameters**
- `sym`: Instrument/symbol for which the data is generated.
- `date`: Trading date for which the data is generated.
- `startTime`: Market open time, i.e. the time from which data generation begins.
- `endTime`: Market close time, i.e. the time up to which data is generated.
- `rowCnt`: Number of trades to generate; this is the number of rows added to the `trades` table. The `quotes` table receives `5*rowCnt` rows and, at level 2, the `depth` table receives `25*rowCnt` rows.
- `startPx`: Starting price of the instrument, as a float.
- `level`: Controls the depth of data generation:
  - `1`: Generates trades and quotes tables.
  - `2`: Generates trades, quotes, and depth tables.
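
The table sizes follow directly from `rowCnt` in the `mockDataOne` source. This standalone arithmetic sketch (the dictionary `sizes` is illustrative, not part of the module) shows the expected row counts for `rowCnt` of 3000 at level 2:

```q
// Illustrative sketch: rows produced by one mockDataOne call
rowCnt:3000;
tradeCnt:rowCnt;          / trades: one row per rowCnt
quoteCnt:5*tradeCnt;      / quotes: five rows per trade
depthCnt:25*tradeCnt;     / depth (level 2 only): 25 rows per trade
sizes:`trades`quotes`depth!(tradeCnt;quoteCnt;depthCnt);
sizes                     / trades 3000, quotes 15000, depth 75000
```

Because each call appends to the module tables, the table counts match these figures only when the tables were empty beforehand, for example after `md.clearTables[]`.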
**Examples**

```q
// Function signature:
mockDataOne[sym; date; startTime; endTime; rowCnt; startPx; level]

// Loading the module into a session
md: use `di.mockDataGen

// Level 1: generate trades and quotes only
// for the AAPL instrument on a given trading day:
md.mockDataOne[`AAPL; 2025.01.10; 09:30:00.000; 17:30:00.000; 3000; 22.35; 1]

// Level 2: generate trades, quotes, and depth
// for the AAPL instrument on a given trading day:
md.mockDataOne[`AAPL; 2025.01.10; 09:30; 16:00; 300; 22.35; 2]

// View the generated tables in the module namespace
.m.di.0mockDataGen.trades

time sym src price size
--------------------------------------------------
2025.01.10D09:32:15.619000000 AAPL O 22.34 1283
2025.01.10D09:32:46.924000000 AAPL O 22.38 8105
2025.01.10D09:32:48.758000000 AAPL O 22.34 263
2025.01.10D09:33:30.234000000 AAPL N 22.31 474
2025.01.10D09:34:04.825000000 AAPL N 22.36 131
2025.01.10D09:34:15.211000000 AAPL O 22.33 8281

.m.di.0mockDataGen.quotes

time sym src bid ask bsize asize
--------------------------------------------------------------
2025.01.10D09:30:17.136000000 AAPL L 22.34 22.35 7200 4200
2025.01.10D09:30:41.169000000 AAPL L 22.33 22.36 12000 7800
2025.01.10D09:30:48.010000000 AAPL O 22.31 22.35 8400 6000
2025.01.10D09:30:52.784000000 AAPL L 22.32 22.35 12000 1800
2025.01.10D09:30:55.239000000 AAPL N 22.32 22.37 5400 9000
2025.01.10D09:30:55.556000000 AAPL O 22.35 22.38 3000 9600

.m.di.0mockDataGen.depth

time sym bid1 bsize1 bid2 bsize2 bid3 bsize3 bid4 bsize4 bid5 bsize5 ask1 asize1 ask2 asize2 ask3 asize3 ask4 asize4 ask5 asize5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
2025.01.10D09:30:02.340000000 AAPL 22.34 3600 22.33 4800 22.32 9600 22.31 6000 22.3 4800 22.35 6600 22.36 7800 22.37 8400 22.38 7800 22.39 8400
2025.01.10D09:30:05.040000000 AAPL 22.33 7200 22.32 9600 22.31 7800 22.3 13200 22.29 10800 22.36 4800 22.37 7800 22.38 6000 22.39 5400 22.4 7200
2025.01.10D09:30:05.464000000 AAPL 22.31 9000 22.3 9600 22.29 12600 22.28 10200 22.27 19800 22.35 9600 22.36 10200 22.37 11400 22.38 10800 22.39 12600
2025.01.10D09:30:11.246000000 AAPL 22.32 9000 22.31 12000 22.3 14400 22.29 10200 22.28 19200 22.35 2400 22.36 3000 22.37 4200 22.38 3000 22.39 4800
2025.01.10D09:30:14.423000000 AAPL 22.32 2400 22.31 3600 22.3 7200 22.29 11400 22.28 9000 22.37 3600 22.38 4200 22.39 4800 22.4 4200 22.41 5400
2025.01.10D09:30:19.556000000 AAPL 22.35 10200 22.34 11400 22.33 12600 22.32 16800 22.31 19800 22.38 11400 22.39 13200 22.4 12600 22.41 13800 22.42 13200
```
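
The quotes and depth rows above are built from a mid-price random walk. The following standalone sketch follows the `mockDataOne` source but runs in plain q outside the module; `n` and the fixed start price are illustrative, and the 5-level ladder is built from every quote here rather than from the resampled depth rows the module uses:

```q
// Standalone sketch of how mockDataOne derives quotes and a 5-level book
rnd:{0.01*floor 100*x};                  / truncate to the cent, as in the module
startPx:22.35; n:1000;                   / starting price and number of quotes
// geometric random walk: each step's log-return is uniform in [-0.0005, 0.0005)
mids:rnd startPx*exp sums 0.0005*-1+n?2f;
// top of book sits up to 3 cents either side of the mid
bid:rnd mids-n?0.03;
ask:rnd mids+n?0.03;
// levels 2-5 step one cent further from the top of book on each side
bids:bid-/:0.01*til 5;                   / 5 vectors: bid1..bid5
asks:ask+/:0.01*til 5;                   / 5 vectors: ask1..ask5
```

Since `rnd` floors rather than rounds, prices are truncated to the cent; the bid can equal the ask but never exceed it.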

### ⚙️`mockData`

Generates mock data for multiple instruments on a given date.

**Parameters**
- `syms`: Instruments/symbols for which the data is generated.
- `date`: Trading date for which the data is generated.
- `startTime`: Market open time, i.e. the time from which data generation begins.
- `endTime`: Market close time, i.e. the time up to which data is generated.
- `rowCnts`: Number of rows (trades) to generate for each symbol, passed as a dictionary keyed by symbol, for example: `` `AAPL`MSFT`META!300 500 200 ``
- `startPxs`: Starting price of each instrument, as floats, passed as a dictionary keyed by symbol, for example: `` `AAPL`MSFT`META!22.33 38.34 29.43 ``
- `level`: Controls the depth of data generation:
  - `1`: Generates trades and quotes tables.
  - `2`: Generates trades, quotes, and depth tables.
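
Both dictionaries are keyed by the symbols passed in `syms`, so each instrument's count and price are looked up by name. A small sketch of building them in plain q (no module required); `flatCnts` is a hypothetical convenience for giving every symbol the same count:

```q
// Illustrative sketch: building the per-symbol dictionaries for mockData
syms:`AAPL`MSFT`META;
rowCnts:syms!300 500 200;          / trades to generate per symbol
startPxs:syms!22.33 38.34 29.43;   / starting float price per symbol
// the same value for every symbol: take one copy per symbol with #
flatCnts:syms!count[syms]#300;
rowCnts`MSFT                       / 500
```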
**Examples**

```q
// Function signature:
mockData[syms; date; startTime; endTime; rowCnts; startPxs; level]

// Loading the module into a session
md: use `di.mockDataGen

// Level 1: generate trades and quotes for multiple instruments
// on a single trading day:
md.mockData[`AAPL`MSFT`META; 2025.01.10; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  1]

// Level 2: generate trades, quotes, and depth for multiple instruments
// on a single trading day:
md.mockData[`AAPL`MSFT`META; 2025.01.10; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  2]
```

### ⚙️`mockDataR`

Generates mock data for multiple instruments over a list of dates.

**Parameters**
- `syms`: Instruments/symbols for which the data is generated.
- `datelist`: List of dates for which the data is generated, in chronological order (each date starts from the previous date's closing prices).
- `startTime`: Market open time, i.e. the time from which data generation begins on each date.
- `endTime`: Market close time, i.e. the time up to which data is generated on each date.
- `rowCnts`: Number of rows (trades) to generate for each symbol on each date, passed as a dictionary keyed by symbol, for example: `` `AAPL`MSFT`META!300 500 200 ``
- `startPxs`: Starting price of each instrument on the first date, as floats, passed as a dictionary keyed by symbol, for example: `` `AAPL`MSFT`META!22.33 38.34 29.43 ``
- `level`: Controls the depth of data generation:
  - `1`: Generates trades and quotes tables.
  - `2`: Generates trades, quotes, and depth tables.

**Note**
- For multi-day data generation, price continuity is maintained by using the previous day's last traded price as the starting price for the following day.
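
The carry-over step amounts to taking each symbol's last traded price from the accumulated trades. A minimal sketch on a hypothetical toy table (plain q, no module; rows are assumed to be in time order, as in the generated tables):

```q
// Illustrative sketch of the price carry-over used between dates
trades:([] sym:`AAPL`MSFT`AAPL`MSFT; price:22.30 38.40 22.41 38.35);
// last traded price per symbol becomes the next date's starting price
nextPx:exec last price by sym from trades;
nextPx                     / AAPL maps to 22.41, MSFT to 38.35
```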
**Examples**

```q
// Function signature:
mockDataR[syms; datelist; startTime; endTime; rowCnts; startPxs; level]

// Loading the module into a session
md: use `di.mockDataGen

// Level 1: generate trades and quotes for multiple instruments
// over a list of dates:
md.mockDataR[`AAPL`MSFT`META; 2025.01.10 2025.01.11 2025.01.12; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  1]

// Level 2: generate trades, quotes, and depth for multiple instruments
// over a list of dates:
md.mockDataR[`AAPL`MSFT`META; 2025.01.10 2025.01.11 2025.01.12; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  2]
```

### ⚙️`mockHdb`

Generates mock data for multiple instruments over a list of dates and writes it down to the specified HDB directory, one date partition at a time.

**Parameters**
- `dir`: Target HDB directory, as a file symbol (for example `` `:hdb ``), where the generated data will be written.
- `syms`: List of instrument symbols for which data is generated and saved to the HDB.
- `dates`: List of trading dates for which data will be generated and persisted.
- `startTime`: Market open time, i.e. the time from which data generation begins on each date.
- `endTime`: Market close time, i.e. the time up to which data is generated on each date.
- `rowCnts`: Number of rows (trades) to generate per instrument per date. This must be provided as a dictionary, for example: `` `AAPL`MSFT`META!300 500 200 ``
- `startPxs`: Starting price for each instrument on the first date, as floats. This must be provided as a dictionary, for example: `` `AAPL`MSFT`META!22.33 38.34 29.43 ``
- `level`: Controls the depth of data generation:
  - `1`: Generates and saves trades and quotes tables.
  - `2`: Generates and saves trades, quotes, and depth tables.

**Note**
- Price continuity is maintained by using the previous day's last traded price as the starting price for the following day.
- Each date is saved with `.Q.hdpf`, which writes every table in the root namespace to the date partition (enumerating symbols against the HDB's `sym` file and applying the parted attribute to `sym`) and then empties those tables, so any other root-namespace tables in the session are also saved and cleared.
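
Each per-date save boils down to `.Q.dpft`, which `.Q.hdpf` calls for every root table. A standalone sketch on a hypothetical toy table, assuming a writable example path `/tmp/mockhdb` (plain q, no module):

```q
// Standalone sketch of one per-date save, using .Q.dpft directly on a toy table
trades:([] sym:`MSFT`AAPL`AAPL; price:38.40 22.30 22.41; size:100 200 300i);
// write `trades` to the 2025.01.10 partition, enumerating and parting by sym
.Q.dpft[`:/tmp/mockhdb; 2025.01.10; `sym; `trades];
// load the HDB and query the partition
\l /tmp/mockhdb
select from trades where date=2025.01.10
```

Loaded this way, `trades` becomes a partitioned table with a virtual `date` column, which is why queries against it are normally filtered by date.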
**Examples**

```q
// Function signature:
mockHdb[dir; syms; dates; startTime; endTime; rowCnts; startPxs; level]

// Loading the module into a session
md: use `di.mockDataGen

// Level 1: generate trades and quotes for multiple instruments over a list of dates
// and save them to the hdb directory:
md.mockHdb[`:hdb; `AAPL`MSFT`META; 2025.01.10 2025.01.11 2025.01.12; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  1]

// Level 2: generate trades, quotes, and depth for multiple instruments over a list of dates
// and save them to the hdb directory:
md.mockHdb[`:hdb; `AAPL`MSFT`META; 2025.01.10 2025.01.11 2025.01.12; 09:30:00; 16:00:00;
  `AAPL`MSFT`META!300 500 200;
  `AAPL`MSFT`META!22.33 38.34 29.43;
  2]

// To view the data in the HDB
\l hdb
select time, sym, src, price, size from trades where date=2025.01.10

time sym src price size
--------------------------------------------------
2025.01.10D09:32:15.619000000 AAPL O 22.34 1283
2025.01.10D09:32:46.924000000 AAPL O 22.38 8105
2025.01.10D09:32:48.758000000 AAPL O 22.34 263
2025.01.10D09:33:30.234000000 AAPL N 22.31 474
2025.01.10D09:34:04.825000000 AAPL N 22.36 131
2025.01.10D09:34:15.211000000 AAPL O 22.33 8281
```

di/mockDataGen/mockDataGen.q

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
// create empty trades, quotes and depth tables in the module namespace
initschema:{[]
  .z.m.trades:([] time:`timestamp$(); sym:`g#`$(); src:`g#`$(); price:`float$(); size:`int$());
  .z.m.quotes:([] time:`timestamp$(); sym:`g#`$(); src:`g#`$(); bid:`float$(); ask:`float$(); bsize:`int$(); asize:`int$());
  .z.m.depth:([] time:`timestamp$(); sym:`g#`$(); bid1:`float$(); bsize1:`int$(); bid2:`float$(); bsize2:`int$(); bid3:`float$(); bsize3:`int$(); bid4:`float$(); bsize4:`int$(); bid5:`float$(); bsize5:`int$(); ask1:`float$(); asize1:`int$(); ask2:`float$(); asize2:`int$(); ask3:`float$(); asize3:`int$(); ask4:`float$(); asize4:`int$(); ask5:`float$(); asize5:`int$());
  };

// Utility Functions
// truncate prices down to the cent
rnd:{0.01*floor 100*x};

// reset the module tables to empty
clearTables:{[]
  initschema[];
  };

// function to generate mock data for a single symbol/instrument
mockDataOne:{[sym;date;startTime;endTime;rowCnt;startPx;level]
  tradeCnt:rowCnt;
  quoteCnt:5*tradeCnt;
  hoursinday:endTime-startTime;
  // random, sorted event times within [startTime;endTime) on the given date
  ttimes:date+ `#asc startTime+tradeCnt?hoursinday;
  qtimes:date+ `#asc startTime+quoteCnt?hoursinday;
  // mid-price random walk: log-returns uniform in [-0.0005;0.0005), truncated to cents
  mids:startPx* exp sums 0.0005*-1+quoteCnt?2f;
  mids:rnd mids;
  // bid/ask sit up to 3 cents either side of the mid
  bid:rnd mids-quoteCnt?0.03;
  ask:rnd mids+quoteCnt?0.03;
  bsize:`int$(600*1+quoteCnt?20);
  asize:`int$(600*1+quoteCnt?20);
  // trade i is priced off quote 5*i: buys lift the ask, sells hit the bid
  tradeIdx:til tradeCnt;
  quoteIdx:5*tradeIdx;
  side:tradeCnt?`buy`sell;
  price:rnd ?[side=`buy; ask[quoteIdx]; bid[quoteIdx]];
  tsize:`int$((tradeCnt?1f)*?[side=`buy; asize[quoteIdx]; bsize[quoteIdx]]);
  .z.m.trades,:flip `time`sym`src`price`size!(ttimes;tradeCnt#sym;tradeCnt?`N`O`L;price;tsize);
  .z.m.quotes,:flip `time`sym`src`bid`ask`bsize`asize!(qtimes;quoteCnt#sym;quoteCnt?`N`O`L;bid;ask;bsize;asize);
  if[level=2;
    // 25 depth snapshots per trade; levels 2-5 step one cent away from the top of book
    depthCnt:25*tradeCnt;
    dtimes:date+ `#asc startTime+depthCnt?hoursinday;
    dIdx:(til depthCnt) mod quoteCnt;
    dBid:bid[dIdx];dAsk:ask[dIdx];
    b1:`int$(600*1+depthCnt?20);b2:b1+`int$(600*1+depthCnt?5);b3:b1+`int$(600*1+depthCnt?10);b4:b1+`int$(600*1+depthCnt?15);b5:b1+`int$(600*1+depthCnt?20);
    a1:`int$(600*1+depthCnt?20);a2:a1+`int$(600*1+depthCnt?5);a3:a1+`int$(600*1+depthCnt?5);a4:a1+`int$(600*1+depthCnt?5);a5:a1+`int$(600*1+depthCnt?5);
    .z.m.depth,:flip `time`sym`bid1`bsize1`bid2`bsize2`bid3`bsize3`bid4`bsize4`bid5`bsize5`ask1`asize1`ask2`asize2`ask3`asize3`ask4`asize4`ask5`asize5!(dtimes;depthCnt#sym;dBid;b1;dBid-0.01;b2;dBid-0.02;b3;dBid-0.03;b4;dBid-0.04;b5;dAsk;a1;dAsk+0.01;a2;dAsk+0.02;a3;dAsk+0.03;a4;dAsk+0.04;a5);
    ];
  };

// function to generate the mock data for multiple syms on a given date
mockData:{[syms;date;startTime;endTime;rowCnts;startPxs;level]
  // accept a single symbol or a list of symbols
  syms:$[11h=type syms; syms; enlist syms];
  // accept a dictionary keyed by sym, or a single value applied to every sym
  rc:$[99h=type rowCnts; rowCnts; syms!count[syms]#rowCnts];
  spx:$[99h=type startPxs; startPxs; syms!count[syms]#startPxs];
  // use the carried-over price in .z.m.sp when set (multi-day runs), otherwise the supplied start price
  {[s;rc;spx;date;startTime;endTime;level]
    sp:$[`sp in key .z.m; $[null .z.m.sp[s]; spx[s]; .z.m.sp[s]]; spx[s]];
    mockDataOne[s;date;startTime;endTime;rc[s];sp;level]}[;rc;spx;date;startTime;endTime;level] each syms;
  };

// function to generate mock data for multiple syms for the given date list
mockDataR:{[syms;datelist;startTime;endTime;rowCnts;startPxs;level]
  // first date: start from the supplied prices
  mockData[syms;datelist[0];startTime;endTime;rowCnts;startPxs;level];
  // later dates: start each sym from its last traded price so far (via .z.m.sp),
  // falling back to startPxs for any sym without one, at the requested level
  {[syms;d;startTime;endTime;rowCnts;startPxs;level]
    .z.m.sp:exec last price by sym from .z.m.trades;
    mockData[syms;d;startTime;endTime;rowCnts;startPxs;level]}[syms;;startTime;endTime;rowCnts;startPxs;level] each 1_datelist;
  // reset so later calls start from their own supplied prices
  .z.m.sp:syms!(count syms)#0nf;
  };

// function to write the data down to HDB for the given date list
mockHdb:{[dir;syms;dates;startTime;endTime;rowCnts;startPxs;level]
  // no carried-over prices before the first date
  .z.m.sp:syms!(count syms)#0nf;
  {[dir;syms;d;startTime;endTime;rowCnts;startPxs;level]
    mockData[syms;d;startTime;endTime;rowCnts;startPxs;level];
    // record each sym's last traded price as the next date's starting price
    .z.m.sp:syms!{last exec price from .z.m.trades where sym = x} each syms;
    // expose the module tables under the names .Q.hdpf saves
    `trades set .z.m.trades;
    `quotes set .z.m.quotes;
    `depth set .z.m.depth;
    // save to the date partition with the parted attribute on sym, then reset the module tables
    .Q.hdpf[`:;dir;d;`sym]; clearTables[] }[dir;syms;;startTime;endTime;rowCnts;startPxs;level] each dates;
  .z.m.sp:syms!(count syms)#0nf;
  };
