This code harnesses the AlphaVantage API to download and analyze equity data on any constituent of the following groups:
- the S&P 500 stock index
- the Dow 30 stock index
- the NASDAQ 100 stock index
- the top 100 most-traded ETFs
- the top 25 most-traded mutual funds

These data are readily transformed using technical indicators and processed into features for machine learning.
This code also downloads and analyzes data from the United States SEC's Financial Statements datasets, which supply additional feature data from a fundamental analysis standpoint.

- download.py
  - `load_single` downloads and processes a single symbol from the AlphaVantage API into a file
  - `load_single_drive` loads and processes a single symbol from the local drive into a variable
  - `load_separate` downloads and processes many symbols from the AlphaVantage API into many files
  - `load_combined_drive` loads and processes many symbols from the local drive into one variable
  - command prompt options:
    - `-tickerUniverse`: collection of tickers to download (can also be a CSV of ticker symbols)
    - `-folderPath`: location of the folder in which to store files
    - `-apiKey`: AlphaVantage API key (user-specific)
    - `-function`: distinguishes between intraday, daily, weekly, etc. downloads
    - `-interval`: specifies what kind of intraday (1min, 15min, etc.)
- auto_update.py
  - `update_in_folder` updates all equity files in a folder, using the latest data from AlphaVantage
  - command prompt options:
    - `-folderPath`: location of the folder to look in for files
    - `-apiKey`: AlphaVantage API key (user-specific)
- See the AlphaVantage documentation for more details on their API calls.
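
As a rough illustration of how the `-function`, `-interval`, and `-apiKey` options could translate into an AlphaVantage request, the sketch below assembles a query URL. The helper name `build_query_url` and the choice of CSV output are assumptions made for this example; they are not taken from `download.py`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def build_query_url(symbol, api_key, function="TIME_SERIES_DAILY", interval=None):
    """Assemble an AlphaVantage query URL (hypothetical helper, not from this repo)."""
    params = {
        "function": function,
        "symbol": symbol,
        "apikey": api_key,
        "datatype": "csv",  # assumption: request CSV rather than the default JSON
    }
    # Only intraday requests take an interval such as 1min or 15min.
    if function == "TIME_SERIES_INTRADAY":
        if interval is None:
            raise ValueError("intraday requests need an interval such as '15min'")
        params["interval"] = interval
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Replace the placeholder key with a user-specific AlphaVantage API key.
    print(build_query_url("MSFT", "YOUR_API_KEY", "TIME_SERIES_INTRADAY", "15min"))
```

Building the URL separately from the download step keeps the request logic easy to check without spending API calls.
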
- edgar_load.py
  - `download_unzip` downloads and unzips data directly from the SEC website
  - `proc_in_directory` walks through the download directory and parses each file
  - `post_proc` performs post-processing on files, which makes them smaller
  - `json_build` builds one JSON file for each company chosen
  - command prompt options:
    - `-folderPath`: location of the folder to download data into
    - `-stockFolderPath`: location of the folder to look in for stock data
    - `-financialFolderPath`: location of the folder to load financial data from
    - `-suppressDownload`: suppresses downloading data (e.g. when already downloaded)
    - `-suppressProcess`: suppresses processing data (e.g. when already processed)
- edgar_parse.py
  - `json_parse` provides backup parsing code to clean up the JSON files in edgar_load.py
  - `get_sic_names` scrapes SEC.gov for data on industry codes (SICs)
  - `submission_parse` parses submission files
  - `number_parse` parses number files
  - `presentation_parse` parses presentation files
  - `tag_parse` parses tag files
  - command prompt options:
    - none (does not need any)
- edgar_pull.py
  - `get_unique_tags` returns a list of unique tags in a single company's JSON file
  - `get_data_this_tag` writes data on one chosen tag to an output file
  - command prompt options:
    - pending
- return_calculator.py
  - `get_rolling_returns` calculates a list of rolling returns on an asset or portfolio
  - `overall_returns` calculates the overall return from start to finish
  - command prompt options:
    - none (does not need any)
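
To make the two return calculations concrete, here is a minimal sketch using simple (arithmetic) returns on a list of prices. The function names and the simple-return convention are assumptions for illustration; `return_calculator.py` may define them differently (for example, with log returns).

```python
def simple_overall_return(prices):
    """Return from the first price to the last, as a fraction (0.21 means 21%)."""
    return prices[-1] / prices[0] - 1.0


def simple_rolling_returns(prices, window):
    """Return over each trailing `window`-step span, one value per end point."""
    return [
        prices[i] / prices[i - window] - 1.0
        for i in range(window, len(prices))
    ]


if __name__ == "__main__":
    prices = [100.0, 110.0, 121.0, 115.0]
    print(simple_overall_return(prices))
    print(simple_rolling_returns(prices, window=2))
```
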
- technicals.py
  - `accum_swing` computes the accumulation (running total) of the swing index, which depends on asset data
  - `ad_line` returns the accumulation-distribution line of asset data
  - `adx` returns the average directional (movement) index of asset data
  - `adxr` returns the average directional (movement) index rating of asset data
  - `aroon` returns the Aroon indicator of asset data
  - `aroon_oscillator` returns the Aroon oscillator of asset data
  - `average_price` returns the average price of asset data
  - `average_true_range` returns the average true range of asset data
  - `bollinger` returns the Bollinger bands, and the width thereof, for asset data
  - `chaikin` returns the Chaikin money flow of price data
  - `chaikin_ad_osc` returns the Chaikin accumulation-distribution oscillator of asset data
  - `chaikin_volatility` returns the Chaikin volatility of asset data
  - `chande_momentum_oscillator` returns the Chande momentum oscillator of a price input
  - `dema` returns the "double" exponential moving average of input
  - `detrended_price_osc` returns the de-trended price oscillator of a price input
  - `directional_index` returns the directional indices (+DI and -DI) of asset data
  - `directional_movt_index` returns the directional movement index (based directly on +DI and -DI) of asset data
  - `dynamic_momentum_index` returns the dynamic momentum index (DMI) of price data
  - `ease_of_movt` returns the ease of movement of asset data
  - `exponential_moving_average` returns the exponential moving average of input
  - `general_stochastic` returns the general stochastic indicator of a price input
  - `klinger_osc` returns the Klinger oscillator of asset data
  - `macd` returns the MACD of a price input (same as the price oscillator with a 26-period slow EMA and a 12-period fast EMA)
  - `market_fac_index` returns the market facilitation index of asset data
  - `mass_index` returns the mass index of asset data
  - `median_price` returns the median price of asset data
  - `momentum` returns the momentum of a price input
  - `money_flow_index` returns the money flow index of asset data
  - `negative_volume_index` returns the negative volume index of asset data
  - `normalized_price` returns the baseline-normalized price (a.k.a. performance indicator) of a price input
  - `on_balance_volume` returns the on-balance volume of asset data
  - `parabolic_sar` returns the parabolic SAR of asset data
  - `percent_volume_oscillator` returns the percent volume oscillator of volume data
  - `polarized_fractal_efficiency` returns the polarized fractal efficiency of asset data
  - `positive_volume_index` returns the positive volume index of asset data
  - `price_channel` returns the high and low price channels of a price input
  - `price_oscillator` returns the price oscillator of a price input, which depends on a choice of moving average function
  - `price_rate_of_change` returns the price rate of change of a price input
  - `price_volume_rank` returns the price-volume rank of asset data (with user choice for which price)
  - `price_volume_trend` returns the price-volume trend of asset data
  - `qstick` returns the Q-stick indicator of asset data, which depends on a choice of moving average function
  - `random_walk_index` returns the random walk index of asset data
  - `range_indicator` returns the range indicator of asset data
  - `rel_momentum_index` returns the relative momentum index of a price input (typically closing price)
  - `rel_strength_index` returns the 14-day relative strength index (RSI) of a price input
  - `rel_vol_index` returns the relative volatility index of a price input
  - `simple_moving_average` returns the simple moving average of input
  - `stochastic_momentum_index` returns the stochastic momentum index of asset data
  - `stochastic_oscillator` returns the stochastic oscillator of asset data, which depends on a choice of moving average function
  - `stochastic_rsi` returns the general stochastic of the RSI (dependent on a price input)
  - `swing_index` returns the swing index of asset data
  - `tee_three` and `tee_four` return T3 and T4, respectively, which are generalizations of DEMA, of input
  - `trend_score` returns the trend score of a price input
  - `triangular_moving_average` returns the triangular moving average of input
  - `triple_ema` returns the triple exponential moving average of input
  - `trix` returns the TRIX indicator of a price input
  - `true_range` returns the true range of asset data
  - `true_strength_index` returns the true strength index of a price input
  - `typical_price` returns the typical price of asset data
  - `ultimate_oscillator` returns the ultimate oscillator of asset data
  - `variable_moving_average` returns the variable moving average of a price input
  - `vertical_horizontal_filter` returns the vertical-horizontal filter (VHF) of asset data
  - `vol_adj_moving_average` returns the volume-adjusted moving average of asset data
  - `weighted_close` returns the weighted close of asset data
  - `weighted_moving_average` returns the weighted moving average of a price input
  - `williams_ad` returns the Williams accumulation-distribution indicator of asset data
  - `williams_percent` returns the Williams %R indicator of asset data
  - `zero_lag_ema` returns the "zero-lag" exponential moving average of a price input
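
Several of these indicators build on moving averages. The sketch below shows one common way to compute a simple moving average, an exponential moving average, and the MACD line as the difference between a 12-period fast EMA and a 26-period slow EMA. The short names (`sma`, `ema`, `macd_line`), the smoothing factor `2 / (period + 1)`, and seeding the EMA with the first value are assumptions for this example and may not match `technicals.py`.

```python
def sma(values, period):
    """Simple moving average; the output starts once a full window is available."""
    return [
        sum(values[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(values))
    ]


def ema(values, period):
    """Exponential moving average, seeded with the first value (an assumption)."""
    alpha = 2.0 / (period + 1)
    result = [values[0]]
    for value in values[1:]:
        result.append(alpha * value + (1.0 - alpha) * result[-1])
    return result


def macd_line(values, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA, element by element."""
    fast_ema = ema(values, fast)
    slow_ema = ema(values, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]


if __name__ == "__main__":
    closes = [float(p) for p in range(100, 140)]
    print(sma(closes, 5)[:3])
    print(macd_line(closes)[-1])
```
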
- stats.py
  - `adf` computes the augmented Dickey-Fuller test for mean reversion properties
  - command prompt options:
    - none (for now!)
- feature_build.py
  - `get_features` returns a dataframe of features, with one column for each indicator listed above
  - command prompt options:
    - `-tickerUniverse`: collection of tickers to download (can also be a comma-delimited list of ticker symbols)
    - `-baseline`: selection of symbol to use as baseline asset/index
    - `-startDate`: start date of data to process into features
    - `-endDate`: end date of data to process into features
    - `-function`: distinguishes between intraday, daily, weekly, etc. downloads
    - `-interval`: specifies what kind of intraday (1min, 15min, etc.)
    - `-folderPath`: location of the folder in which to write the files
    - `-plotOnly`: if indicated, plot the heatmaps; otherwise, build from scratch without plots
- strategy.py
  - `hold_clear` builds a simple strategy for buying/selling, holding one's position, and clearing
  - `crossover` builds a strategy for buying when trend crosses below baseline and selling when trend crosses above (or vice versa)
  - `zscore_distance` builds a strategy for buying when trend crosses far below baseline and selling when trend crosses far above (or vice versa), as measured by z-scores
  - command prompt options:
    - none (does not need any)
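
The crossover and z-score ideas can be sketched as signal generators that emit `1` (buy), `-1` (sell), or `0` (no action) per time step. Everything here is illustrative: the function names, the signal encoding, the buy-below/sell-above direction, and the use of a full-sample mean and standard deviation for the z-score (which looks ahead and would need a rolling window in a real backtest) are assumptions, not the logic in `strategy.py`.

```python
from statistics import mean, pstdev


def crossover_signals(trend, baseline):
    """Buy when trend crosses below baseline, sell when it crosses above."""
    signals = [0]  # no crossing can be detected at the first point
    for i in range(1, len(trend)):
        previous = trend[i - 1] - baseline[i - 1]
        current = trend[i] - baseline[i]
        if previous >= 0 and current < 0:
            signals.append(1)   # crossed below the baseline: buy
        elif previous <= 0 and current > 0:
            signals.append(-1)  # crossed above the baseline: sell
        else:
            signals.append(0)
    return signals


def zscore_signals(trend, baseline, threshold=1.0):
    """Buy when the spread is far below its mean, sell when far above."""
    spread = [t - b for t, b in zip(trend, baseline)]
    mu, sigma = mean(spread), pstdev(spread)
    if sigma == 0:
        return [0] * len(spread)  # flat spread: nothing to trade on
    signals = []
    for s in spread:
        z = (s - mu) / sigma
        signals.append(1 if z < -threshold else -1 if z > threshold else 0)
    return signals


if __name__ == "__main__":
    print(crossover_signals([1, 3, 1], [2, 2, 2]))
    print(zscore_signals([0, 0, 0, 0, -10], [0, 0, 0, 0, 0]))
```
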
- portfolio.py
  - `asset_ranker` ranks a group of assets based on a certain criterion, choosing which ones should be bought long or sold short
  - `apply_trades` applies any series of trades to any set of symbols, yielding a portfolio simulation
  - command prompt options:
    - none (does not need any)
- performance.py
  - `beta` calculates asset/portfolio beta
  - `sharpe_ratio` calculates the Sharpe ratio for a given portfolio
    - proxy for risk-free rate is the 3-month US T-bill
  - `treynor_ratio` calculates the Treynor ratio for a given portfolio
    - proxy for risk-free rate is the 3-month US T-bill
  - `returns_valuation` values the portfolio (initial value, final value, and return) against a benchmark (such as an index)
  - command prompt options:
    - none (does not need any)
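
For reference, the textbook definitions behind these three measures can be sketched as follows: beta is the covariance of asset and market returns divided by the market variance, the Sharpe ratio divides mean excess return by its standard deviation, and the Treynor ratio divides mean excess return by beta. The function names, the per-period (non-annualized) convention, and the use of population statistics are assumptions for this example; `performance.py` may differ, and it sources the risk-free rate from the 3-month US T-bill rather than a constant.

```python
from statistics import mean, pstdev


def beta_sketch(asset_returns, market_returns):
    """Covariance of asset and market returns over the market variance."""
    asset_mean = mean(asset_returns)
    market_mean = mean(market_returns)
    covariance = mean(
        (a - asset_mean) * (m - market_mean)
        for a, m in zip(asset_returns, market_returns)
    )
    variance = mean((m - market_mean) ** 2 for m in market_returns)
    return covariance / variance


def sharpe_sketch(returns, risk_free_rate):
    """Mean excess return per unit of total risk (per period, not annualized)."""
    excess = [r - risk_free_rate for r in returns]
    return mean(excess) / pstdev(excess)


def treynor_sketch(returns, market_returns, risk_free_rate):
    """Mean excess return per unit of market risk (beta)."""
    return (mean(returns) - risk_free_rate) / beta_sketch(returns, market_returns)


if __name__ == "__main__":
    market = [0.01, 0.03, -0.02, 0.015]
    portfolio = [0.012, 0.04, -0.03, 0.02]
    risk_free = 0.001  # placeholder per-period rate, not real T-bill data
    print(beta_sketch(portfolio, market))
    print(sharpe_sketch(portfolio, risk_free))
    print(treynor_sketch(portfolio, market, risk_free))
```
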
- command_parser.py
  - `get_generic_from_prompts` gets any non-tickerverse prompt from a list of command prompts
  - `get_tickerverse_from_prompts` returns a tickerverse and its name, from a list of command prompts
  - command prompt options:
    - none (does not need any)
- io_support.py
  - `get_current_symbols` looks for stock ticker symbols in the files within a directory
  - `memory_check` verifies whether a file occupies too much space in RAM
  - `merge_chunked` inner-joins a small dataframe (left) with a large one (right), the latter being read in chunks
  - `write_as_append` writes a dataframe to a file path in append mode
  - command prompt options:
    - none (does not need any)
- bonds.py
  - `periodic_compound` calculates the bond discounting factor under periodic compounding
  - `continuous_compound` calculates the bond discounting factor under continuous compounding
  - `fixed_rate_bond` calculates the initial bond price for zero- and non-zero-coupon fixed-rate bonds
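
The discounting factors and the bond price follow standard formulas: `(1 + r/m)^(-m*t)` for periodic compounding, `exp(-r*t)` for continuous compounding, and the price as the sum of discounted coupons plus the discounted face value. The sketch below is a minimal version under those formulas; the function names and parameters are assumptions for illustration and are not the signatures used in `bonds.py`.

```python
import math


def periodic_discount(rate, years, periods_per_year=1):
    """Discount factor with the rate compounded `periods_per_year` times a year."""
    return (1.0 + rate / periods_per_year) ** (-periods_per_year * years)


def continuous_discount(rate, years):
    """Discount factor under continuous compounding."""
    return math.exp(-rate * years)


def fixed_rate_bond_price(face, coupon_rate, yield_rate, years, periods_per_year=1):
    """Present value of all coupons plus the face value repaid at maturity.

    A coupon_rate of 0 gives the price of a zero-coupon bond.
    """
    n_payments = int(round(years * periods_per_year))
    coupon = face * coupon_rate / periods_per_year
    coupons_value = sum(
        coupon * periodic_discount(yield_rate, k / periods_per_year, periods_per_year)
        for k in range(1, n_payments + 1)
    )
    face_value = face * periodic_discount(yield_rate, years, periods_per_year)
    return coupons_value + face_value


if __name__ == "__main__":
    print(fixed_rate_bond_price(1000.0, 0.05, 0.04, 10, periods_per_year=2))
    print(fixed_rate_bond_price(1000.0, 0.0, 0.05, 10))
```

A quick sanity check on any implementation of this kind: when the coupon rate equals the yield, the bond should price at its face value.
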
- ticker_universe.py
  - `obtain_parse_nasdaq` gets the Nasdaq 100 stocks from stockmonitor.com
  - `obtain_parse_wiki` gets either the S&P 500 or the Dow 30 stocks from Wikipedia
  - `obtain_parse_mutual_funds` gets the top 25 mutual funds from marketwatch.com
  - `obtain_parse_etfs` gets the top 100 ETFs from etfdb.com
  - command prompt options:
    - none (does not need any)
- plotter.py
  - `feature_plot` plots a file of features as a correlation heatmap
  - `candle_plot` plots a single asset in candlestick form
  - `price_plot` plots a single asset against any number of prices, trends, indicators, etc.
  - command prompt options:
    - `-symbol`: symbol of the asset to plot
    - `-folderPath`: location of the folder to look in for files
    - `-function`: distinguishes between intraday, daily, weekly, etc. downloads
    - `-interval`: specifies what kind of intraday (1min, 15min, etc.)
    - `-startDate`: start date of data to plot
    - `-endDate`: end date of data to plot
    - `-column`: choice of price or volume to plot
    - `-candlestick`: choice to use a candlestick plot instead of a typical plot