USEtox/PROVESID

PROVESID

Python 3.8+ | License: MIT

PROVESID, a member of the PROVES family of packages, provides Pythonic access to online services for chemical identifiers and data. The goal is a clean, simple, intuitive, documented, up-to-date, and extensible interface to the most important online databases. We offer interfaces to PubChem, the NCI Chemical Identifier Resolver, CAS Common Chemistry, IUPAC OPSIN, ChEBI, and ClassyFire. We highly recommend that new users jump head-first into the examples folder and get started by playing with the code. We also continue to document old and new functionality here.

Installation

The package can be installed from PyPI by running

pip install provesid

To install the latest development version (for developers and enthusiasts), clone or download this repository, go to the root folder, and install it with

pip install -e .

Examples

PubChem

from provesid.pubchem import PubChemAPI
pc = PubChemAPI()  # Now with unlimited caching!
cids_aspirin = pc.get_cids_by_name('aspirin')
res_basic = pc.get_basic_compound_info(cids_aspirin[0])

which returns

{
  "CID": 2244,
  "MolecularFormula": "C9H8O4",
  "MolecularWeight": "180.16",
  "SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
  "InChI": "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)",
  "InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
  "IUPACName": "2-acetyloxybenzoic acid",
  "success": true,
  "cid": 2244,
  "error": null
}
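The returned dictionary carries `success` and `error` keys alongside the compound fields, so a caller can guard against failed lookups before reading any property. The following is a minimal sketch that uses a trimmed copy of the sample output above as a literal dictionary instead of a live request. The helper name `require_success` is hypothetical and not part of PROVESID, and the shape of a failed response (falsy `success`, message in `error`) is an assumption.

```python
def require_success(result):
    """Return the result unchanged, or raise if the lookup reported a failure."""
    if not result.get("success"):
        # Assumption: failed lookups set "success" to a falsy value and
        # describe the problem in "error".
        raise ValueError(f"Lookup failed: {result.get('error')}")
    return result


# Trimmed copy of the sample output above, standing in for a live API call.
res_basic = {
    "CID": 2244,
    "MolecularFormula": "C9H8O4",
    "MolecularWeight": "180.16",
    "success": True,
    "error": None,
}

info = require_success(res_basic)
# MolecularWeight arrives as a string, so convert it before doing arithmetic.
mol_weight = float(info["MolecularWeight"])
print(info["CID"], mol_weight)
```

Raising an exception here is only one option; returning `None` or logging the error may suit batch lookups better.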

PubChem View for data

from provesid import PubChemView, get_property_table
# cids_aspirin comes from the PubChem example above
logp_table = get_property_table(cids_aspirin[0], "LogP")
logp_table

which returns a table with the reported values of logP for aspirin (including the references for each data point).

Chemical Identifier Resolver

from provesid import NCIChemicalIdentifierResolver
resolver = NCIChemicalIdentifierResolver()
compound = "aspirin"  # any name or identifier the resolver understands
smiles = resolver.resolve(compound, 'smiles')

OPSIN

from provesid import OPSIN
opsin = OPSIN()
methane_result = opsin.get_id("methane")

which returns:

{'status': 'SUCCESS',
 'message': '',
 'inchi': 'InChI=1/CH4/h1H4',
 'stdinchi': 'InChI=1S/CH4/h1H4',
 'stdinchikey': 'VNWKTOKETHGBQD-UHFFFAOYSA-N',
 'smiles': 'C'}
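When results from several services are combined, it can help to sanity-check identifiers before storing them. The sketch below checks only the textual shape of the `stdinchikey` field, using a trimmed copy of the sample output above rather than a live request. The helper `looks_like_inchikey` is hypothetical and not part of PROVESID, and a matching shape does not prove that the key belongs to the intended compound.

```python
import re

# A standard InChIKey is 27 characters: a 14-letter skeleton hash, a hyphen,
# a 10-letter block (8-letter hash plus a standard flag and a version
# character), a hyphen, and a single protonation character.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value):
    """Check the shape of an InChIKey; this does not verify the chemistry."""
    return bool(INCHIKEY_PATTERN.fullmatch(value))


# Trimmed copy of the OPSIN output above, standing in for a live request.
methane_result = {
    "status": "SUCCESS",
    "stdinchikey": "VNWKTOKETHGBQD-UHFFFAOYSA-N",
    "smiles": "C",
}

if methane_result["status"] == "SUCCESS":
    print(looks_like_inchikey(methane_result["stdinchikey"]))  # True
```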

CAS Common Chemistry

# One-time API key setup
from provesid import set_cas_api_key
set_cas_api_key("your-cas-api-key")  # Configure once

# Then use anywhere without specifying API key
from provesid import CASCommonChem
ccc = CASCommonChem()  # Automatically uses stored API key
water_info = ccc.cas_to_detail("7732-18-5")
print("Water (7732-18-5):")
print(f"  Name: {water_info.get('name')}")
print(f"  Molecular Formula: {water_info.get('molecularFormula')}")
print(f"  Molecular Mass: {water_info.get('molecularMass')}")
print(f"  SMILES: {water_info.get('smile')}")
print(f"  InChI: {water_info.get('inchi')}")
print(f"  Status: {water_info.get('status')}")

which returns

Water (7732-18-5):
  Name: Water
  Molecular Formula: H<sub>2</sub>O
  Molecular Mass: 18.02
  SMILES: O
  InChI: InChI=1S/H2O/h1H2
  Status: Success
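Note that the molecular formula in this output contains HTML subscript tags. If a plain-text formula is needed, for example to compare against the `MolecularFormula` field from PubChem, the tags can be stripped with a regular expression. This is a small sketch; `plain_formula` is a hypothetical helper, not a PROVESID function.

```python
import re


def plain_formula(formula):
    """Remove HTML tags such as <sub> from a formula string."""
    return re.sub(r"<[^>]+>", "", formula)


print(plain_formula("H<sub>2</sub>O"))  # H2O
```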

ChEBI

PROVESID provides access to the European Bioinformatics Institute ChEBI (Chemical Entities of Biological Interest) database. See the tutorial notebook.

ZeroPM Global Chemical Inventory

PROVESID now includes access to the ZeroPM global chemical inventory database, which provides information about chemicals listed in regulatory inventories worldwide. The database is automatically downloaded on first use:

from provesid.zeropm import ZeroPM

# Initialize - database downloads automatically if not present
zpm = ZeroPM()

# Query by CAS number
query_id = zpm.query_cas("50-00-0")  # Formaldehyde

# Get SMILES from CAS
smiles = zpm.get_smiles_from_cas("50-00-0")

# Search by chemical name
results = zpm.query_similar_name("formaldehyde", threshold=80)

# Query by regulatory inventory
eu_chemicals = zpm.query_by_inventory(inventory_name="REACH")

# Query by country
us_chemicals = zpm.query_by_country(country_name="United States")

# Get all available inventories
inventories = zpm.get_all_inventories()

# Get database statistics
stats = zpm.get_database_stats()

The database file (~400 MB) is downloaded automatically from GitHub on first use and cached locally. You can also manually download it:

# Manual download (only needed if auto-download fails)
zpm = ZeroPM(auto_download=False)  # Skip auto-download
zpm.download_database()  # Manually trigger download

See the ZeroPM tutorial notebook for more examples.
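Because several of the queries above are keyed on CAS numbers, it can be useful to reject mistyped numbers before querying. A CAS number ends in a check digit: the remaining digits are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the check digit. The sketch below implements that rule with the standard library only; `is_valid_cas` is a hypothetical helper, not part of PROVESID, and a valid check digit does not guarantee that the number is assigned to a substance.

```python
import re


def is_valid_cas(cas):
    """Return True if the string is a well-formed CAS number with a correct check digit."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("50-00-0"))    # True (formaldehyde)
print(is_valid_cas("7732-18-5"))  # True (water)
print(is_valid_cas("50-00-1"))    # False (wrong check digit)
```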

ClassyFire

See the tutorial notebook.

Other tools

Several other packages and code samples, in Python and other languages, are available. We were inspired by them and have tried to improve upon them based on our own experience working with chemical identifiers and data.

TODO list

We will provide Python interfaces to more online services. Please open an issue and let us know what else you would like to have included.