DEVSTORY #1 : Cron based Nymex commodity price ingest into energy data warehouse
A startup natural gas trading desk within a large midstream energy company (Martin Energy Trading, LLC), needed a repository of Nymex settlement prices for a large number of traded products.
Being an upstart trading desk they did not have a budget to procure information from data vendors. The traders, analysts, quants, risk managers, and mid-office personnel required a data warehouse to capture these daily settlement prices published by the Nymex. A simple greenfield solution built on open source technology was called for to meet the sophisticated needs of the trading desk that rivaled capabilities of the investment bank that the principals were accustomed to.
The traders needed the previous day’s settlement to calculate their daily end-of-day P&L estimates. The mid-office personnel needed prices from the previous day, trade inception date, current date, first of the quarter, first of the year, and settlement prices as of a particular date to report on P&L to risk managers, senior management, and accounting. Back office personnel needed the same information to reconcile invoices from brokers. Risk managers, quants, and analysts needed information in bulk as inputs to their analytics models.
To meet this requirement, it was decided to build Ubuntu 16.04 LTS cron driven python centric infrastructure that would retrieve data from the Nymex FTP site, store the files in long term storage, load the raw data into a MySQL table and then extract, transform and load (ETL) the data into an MYSQL 5.5 data warehouse.
Data marts were developed for each individual use case targeting traders, mid-office, back-office, risk managers, fundamental analysts, technical analysts, and business intelligence users. The election was made to put all processing into python scripts rather than use SQL. A significant amount of processing was done using the python pandas library and SQL Alchemy. Data was stored in a multi-model database platform. Analytics platform was a consumer of this data platform.
A single system of record for all Nymex settlement data was established in a few days. This system of record was integrated into spreadsheets using SQL, ETRM systems using a rest interface to the data marts, self serve analytics servers ( Superset and Tableau ), and Tier 1 enterprise reporting systems.
A significant win from this system was the elimination of discrepancies between reported values by various actors within the small firm. A significant number of errors requiring reconciliation were introduced because every actor had different ways of ingesting and processing the Nymex Settlement prices. Since the source was a single system of record those errors were eliminated, streamlining the end-of-day P&L reporting process. This win impacted the Natural Gas, Crude Oil, Refined Products, and NGLs desk across the larger firm.
The next generation of this system EDWIN is being built by Energiewerks Analytics, LLC on state of the art highly available kubernetes cluster based cloud infrastructure