Module: Fake Data Generation #74
jonnypress
started this conversation in
Ideas
Would like a module to allow us to build up reasonably realistic mock data sets.
We have two samples to work from: feed.q and fakedb.q.
feed.q generates a more realistic price path than fakedb.q, but fakedb.q can generate additional datasets and works in batch, which is what we need (rather than tick-by-tick).
The core of this module should be to build a realistic set of data for a single day. It should support generation of either level 1 or level 2 datasets. Level 1 is:
trade:([]
time:timestamp;
sym:symbol;
exch:symbol;
price:float;
size:int)
quote:([]
time:timestamp;
sym:symbol;
exch:symbol;
bid:float;
bidsize:int;
ask:float;
asksize:int)
If the dataset is level 2, we want to generate the above plus:
depth:([]
time:timestamp;
sym:symbol;
exch:symbol;
bid1:float;
bidsize1:int;
...
bid5:float;
bidsize5:int;
ask1:float;
asksize1:int;
...
ask5:float;
asksize5:int)
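As a rough illustration of the depth layout, the sketch below builds an empty depth table for a given number of levels. `depthschema` is a hypothetical name, and the code assumes the elided columns follow the same bidN/bidsizeN and askN/asksizeN pattern as the ones listed above.

```q
/ Hypothetical helper: empty depth table with n price levels per side.
/ Assumption: elided columns follow the bidN/bidsizeN, askN/asksizeN pattern.
depthschema:{[n]
  lv:string 1+til n;                                         / "1","2",...,"n"
  side:{[lv;s] raze {[s;l] `$(s;s,"size"),\:l}[s] each lv};  / e.g. `bid1`bidsize1`bid2...
  c:`time`sym`exch,side[lv;"bid"],side[lv;"ask"];            / column names in schema order
  ty:`timestamp`symbol`symbol,(4*n)#`float`int;              / price float, size int, alternating
  flip c!ty$\:()}
```

Calling `depthschema 5` should give the five-level layout described here; `meta depthschema 5` shows the column types.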
The key thing is to make sure that the prices line up: for any given time point, the trade data should align with the quote data, which should align with the depth data. The fakedb.q script does this.
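One way to keep prices aligned is to derive every table from a single shared mid-price path. The sketch below is not how fakedb.q does it; it is a minimal illustration in which `mkl1` is a hypothetical name, and the fixed 0.01 half-spread, the uniform random sizes and the roughly one-in-five trade frequency are all assumptions.

```q
/ Hypothetical sketch: derive quote and trade tables from one shared mid-price path.
/ s,e: sym and exchange atoms; t: timestamps; mid: float mids of the same length.
/ Assumptions (not from the spec): fixed 0.01 half-spread, uniform random sizes,
/ about one trade per five quotes, each executed at the prevailing bid or ask.
mkl1:{[s;e;t;mid]
  n:count mid;
  qt:([] time:t; sym:n#s; exch:n#e; bid:mid-0.01; bidsize:100i*1i+n?10i; ask:mid+0.01; asksize:100i*1i+n?10i);
  x:qt where 0=n?5;             / quotes that also produce a trade
  buy:(count x)?0b;             / random side: 1b lifts the ask, 0b hits the bid
  tr:([] time:x`time; sym:x`sym; exch:x`exch; price:?[buy;x`ask;x`bid]; size:?[buy;x`asksize;x`bidsize]);
  `trade`quote!(tr;qt)}
```

Because each trade row is copied from a quote row, every trade price equals the bid or ask quoted at the same timestamp. Depth levels could be layered around the same bid and ask in a similar way.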
The main function should be one that generates data for a single instrument on a single day. It should have inputs of:
instrument: the symbol to generate for
date: the date
start / end time: to allow generation within a range
rowcount : the number of rows of data to generate
start price: the starting price of the instrument
levels: 1 or 2. If level 1, generate trades and quotes. If level 2, generate trades, quotes, depth.
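To illustrate how the date, start / end time and rowcount inputs might combine, the sketch below draws sorted random timestamps inside the requested window. `randtimes` is a hypothetical name, and it assumes the start and end times are passed as timespans.

```q
/ Hypothetical helper: n ascending timestamps on date d between st and et.
/ Assumption: st and et are timespans (e.g. 0D08:00:00), so d+st is a timestamp.
randtimes:{[d;st;et;n] asc (d+st)+`timespan$n?`long$et-st}
```

For example, `randtimes[2024.01.02;0D08:00:00;0D16:30:00;1000]` would return 1000 sorted timestamps within that session.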
For simplicity, let's not make the volatility, day range, trading sizes, etc. configurable. On average, the price should move 3% on any given day. Prices should never go negative. If this is difficult to achieve, we can review.
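A geometric random walk is one simple way to meet both constraints, since exponentiating the cumulative returns keeps every price strictly positive. The sketch below is only an illustration: `rnorm` and `pricepath` are hypothetical names, and it reads "moves 3%" as a 3% standard deviation of the open-to-close log return, which is one possible interpretation.

```q
/ Standard normal draws via Box-Muller; 1-n?1f keeps the log argument in (0,1].
rnorm:{[n] sqrt[-2*log 1-n?1f]*cos 2*acos[-1]*n?1f}

/ Hypothetical helper: n prices starting at p0 along a geometric random walk.
/ Assumption: "moves 3% a day" means a 3% standard deviation of the daily
/ log return, so each step has standard deviation 0.03%sqrt n.
pricepath:{[p0;n] p0*exp sums 0f,rnorm[n-1]*0.03%sqrt n}
```

`pricepath[100f;1000]` would return 1000 prices starting at 100. If the 3% is meant as an average absolute move or a high-low range instead, the 0.03 scaling would need adjusting.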
When we have a function that creates one day of data, we need additional functions to:
Once we have these, we can create a wrapper function which takes input of:
The process here is to iterate through the symbols and generate data for each date. When it has generated data for one symbol and one date, it writes this to the HDB, then moves on to generate data for that same symbol on the second date, using the last price from the first date as the opening price for the second date. When it has generated data for all the symbols across all the dates, it sets the attributes across the HDB.
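The iteration described above could be sketched as a fold over the dates for each symbol, carrying the last price forward. Everything named here is hypothetical: `genrange` is an illustrative wrapper, `gen` stands in for the single-day generator, `wr` stands in for whatever writes a table to the HDB, and `px` is an assumed dictionary of starting prices per symbol. Setting attributes afterwards is left out.

```q
/ Hypothetical sketch of the wrapper loop.
/ gen: function [sym;date;open] returning one day's trade table (assumed)
/ wr:  function [sym;date;table] persisting that table to the HDB (assumed)
/ px:  dictionary of sym -> starting price on the first date (assumed)
genrange:{[gen;wr;dates;px]
  / one day: generate, write, and return the closing price
  day:{[gen;wr;s;p;d] t:gen[s;d;p]; wr[s;d;t]; last t`price};
  / one symbol: fold over the dates, feeding each close in as the next open
  one:{[day;gen;wr;dates;p;s] day[gen;wr;s]/[p;dates]};
  key[px]!one[day;gen;wr;dates]'[value px;key px]}
```

The result is a dictionary of each symbol's final closing price, which could be handy for resuming generation on later dates.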