DEVSTORY #15: RDBMS data cleansing: Using pandas/modin instead of SQL to clean ISO data for FTR desk
Clean data makes data scientists more efficient. Using simple data wrangling tools like pandas/Modin, Dask, and Vaex instead of traditional SQL procedures can result in cleaner data and better analytics results.
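One reason the pandas/Modin pairing works well for this kind of cleanup is that Modin exposes the pandas API, so the same script can be parallelized by changing a single import. The sketch below is illustrative only: the column names and values are hypothetical, not taken from the actual FTR pipeline.

```python
import pandas as pd
# Modin is a drop-in replacement; parallelizing this same code is
# typically a one-line change:  import modin.pandas as pd

# Hypothetical nodal data with the usual dirt: a missing node id and an
# unparseable price string.
raw = pd.DataFrame({
    "node": ["A", "B", None, "C"],
    "lmp": ["25.10", "30.40", "28.90", "bad"],
})

clean = (
    raw.dropna(subset=["node"])  # drop rows missing a node id
       # coerce non-numeric price strings to NaN instead of raising
       .assign(lmp=lambda d: pd.to_numeric(d["lmp"], errors="coerce"))
       .dropna(subset=["lmp"])   # then drop the unparseable prices
)
```

The entire cleanup is a short, testable expression, which is the kind of step that replaced the manual spreadsheet passes described below.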
An enormous amount of analysis time was saved once the data was cleansed. Throughput increased immediately, with analysts spending more time on analysis and less on data cleansing. Since the window for submitting bids is limited, the reclaimed time afforded multiple network simulation and analysis passes to identify opportunities.
FTR analysts source a significant amount of nodal data from the ISOs across the US. A large FTR market participant used automated scrapes to assemble data collected at 15-minute, hourly, and daily intervals across the following ISOs: NEISO, NYISO, PJM, ERCOT, MISO, and CAISO. The extraction process was not very efficient, as zip files were downloaded from the various ISOs.
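One small efficiency pandas brings to the extraction step is that zipped CSV downloads can be read directly, with no separate unzip pass. A minimal sketch, using a tiny in-memory archive as a stand-in for an ISO download (the file name and columns are hypothetical):

```python
import io
import zipfile
import pandas as pd

# Build a tiny stand-in for a zipped ISO download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("lmp_20210101.csv", "node,hour,lmp\nA,1,25.1\nB,1,30.4\n")
buf.seek(0)

# pandas reads a single-file zip archive directly -- no manual extraction.
df = pd.read_csv(buf, compression="zip")
```

In the real pipeline the argument would be the downloaded file path rather than an in-memory buffer; the call is the same.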
During ingestion via SQL, the inefficient ETL process produced multiple rows for each node and date, resulting in bad data. The FTR analysts then had to clean the data manually in their spreadsheets to prepare it for analysis. Errors were a daily occurrence and had a direct impact on the analysts' ability to identify trading opportunities.
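The duplicate-row problem described above maps to a single pandas call. A minimal sketch, with hypothetical column names standing in for the ingested schema:

```python
import pandas as pd

# Hypothetical ingested rows: the ETL has inserted duplicate
# (node, date) records, sometimes with conflicting values.
ingested = pd.DataFrame({
    "node": ["A", "A", "B", "B", "B"],
    "date": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02"],
    "lmp": [25.1, 25.1, 30.4, 31.0, 28.9],
})

# Keep the last row per (node, date) -- the equivalent of the manual
# spreadsheet cleanup, done in one call before analysis.
deduped = ingested.drop_duplicates(subset=["node", "date"], keep="last")
```

Whether to keep the first, the last, or to flag conflicting duplicates for review is a policy choice; `keep="last"` here simply assumes the most recent load is authoritative.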
DEVSTORY #15: RDBMS data cleansing: Using pandas/modin instead of SQL details the two-week process that saved a substantial amount of data-cleansing time, freeing analysts to spend that time analyzing the data and identifying opportunities.