Case Study: An Analysis of Unstructured Data

Situation

An organization approached CCNY for help analyzing unstructured data: wide tables with repeating chunks of columns, where neither the number of columns nor the number of repeating chunks was consistent. The organization also wanted to reduce the manual work involved and to reproduce the analysis efficiently whenever needed. Its immediate problem was that it could not load the data into a database management system, let alone analyze it. The organization asked CCNY to (1) find a workaround for loading the data into a database management system, and (2) transform the unstructured data into structured data that could be used for analysis.
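The case study does not say which modifications made the data loadable. One common workaround when rows carry different numbers of fields is to pad every row to the width of the widest row and load everything into a staging table of text columns, so the loader sees a fixed schema. The Python sketch below illustrates that idea with the standard csv and sqlite3 modules. It is an assumption, not necessarily what CCNY did; the staging table raw_import, the extra_<n> names for unlabeled columns, and the sample data are all hypothetical.

```python
import csv
import io
import sqlite3


def pad_ragged_csv(text: str) -> tuple[list[str], list[list]]:
    """Pad every row to the width of the widest row so a loader sees a fixed schema."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # drop blank lines
    width = max(len(r) for r in rows)
    header = rows[0]
    # Hypothetical names for columns that appear in data rows but not in the header.
    header = header + [f"extra_{i + 1}" for i in range(len(header), width)]
    # Missing trailing cells become None, which sqlite3 stores as SQL NULL.
    body = [r + [None] * (width - len(r)) for r in rows[1:]]
    return header, body


def load_padded(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Create a staging table with one TEXT column per padded position and load it."""
    header, body = pad_ragged_csv(text)
    col_defs = ", ".join('"' + c + '" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" * len(header))
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)


# Made-up sample: the header declares one chunk, but the first data row holds two.
raw = "record_id,item_1,qty_1\n1,pens,10,paper,5\n2,ink,2\n"
header, body = pad_ragged_csv(raw)
conn = sqlite3.connect(":memory:")
load_padded(conn, "raw_import", raw)
print(conn.execute("SELECT * FROM raw_import").fetchall())
```

Loading every column as TEXT postpones type decisions until the data is inside the database, where the reshaping step can cast values explicitly.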

Intervention

CCNY’s Research and Evaluation team worked with the organization to obtain a copy of the data set. After making some modifications to the data, the team loaded it using the database management system’s own tools. We then wrote code in SQL, the standard database language, to convert the unstructured data into a structured data set (a long table with fewer columns). The result was a lengthy script that had to be edited by hand every time the import file changed, and because the table structure changed frequently, rewriting it took a long time. Although this approach solved the problem, it was not efficient. We therefore developed a stored procedure that generates code: it scans the input data and automatically builds the transformation code, so that code no longer needs manual editing.
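To illustrate the "code that generates code" idea, the Python sketch below scans a wide table's column list, groups columns into chunks by a numeric suffix, and emits a UNION ALL query that unpivots those chunks into a long table. It is not CCNY's stored procedure: SQLite has no stored procedures, so a Python function plays that role here. The table name survey_wide, the id column record_id, and the <field>_<n> column naming convention are hypothetical assumptions. Column names are interpolated directly into the SQL, so the sketch assumes they come from a trusted source.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical convention: repeating columns are named "<field>_<n>", e.g. item_1, qty_1, item_2.
CHUNK_COL = re.compile(r"^(?P<field>.+)_(?P<n>\d+)$")


def build_unpivot_sql(conn: sqlite3.Connection, table: str, id_col: str) -> str:
    """Scan a wide table's columns and generate a wide-to-long UNION ALL query."""
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

    chunks = defaultdict(set)  # chunk number -> fields present in that chunk
    fields = set()             # every field name seen in any chunk
    for col in cols:
        m = CHUNK_COL.match(col)
        if m and col != id_col:
            chunks[int(m.group("n"))].add(m.group("field"))
            fields.add(m.group("field"))

    field_list = sorted(fields)
    selects = []
    for n in sorted(chunks):
        # Chunks can be incomplete; fill missing fields with NULL so every SELECT lines up.
        parts = [f'"{f}_{n}" AS "{f}"' if f in chunks[n] else f'NULL AS "{f}"'
                 for f in field_list]
        # Keep only rows where this chunk actually holds data.
        has_data = " OR ".join(f'"{f}_{n}" IS NOT NULL' for f in sorted(chunks[n]))
        selects.append(f'SELECT "{id_col}", {n} AS chunk_no, {", ".join(parts)} '
                       f'FROM "{table}" WHERE {has_data}')
    return "\nUNION ALL\n".join(selects)


# Demo with a small, made-up wide table (three chunks, the third one incomplete).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey_wide (record_id INTEGER, item_1 TEXT, qty_1 INTEGER, "
             "item_2 TEXT, qty_2 INTEGER, item_3 TEXT)")
conn.executemany("INSERT INTO survey_wide VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, "pens", 10, "paper", 5, None), (2, "ink", 2, None, None, "tape")])
generated_sql = build_unpivot_sql(conn, "survey_wide", "record_id")
conn.execute(f"CREATE TABLE survey_long AS {generated_sql}")
rows = conn.execute("SELECT * FROM survey_long ORDER BY record_id, chunk_no").fetchall()
print(rows)
```

Because the query is rebuilt from the current column list on every run, adding a chunk or a new field to the import file changes the generated SQL without any hand edits, which is the property the case study highlights.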

Result

The stored procedure reorganized the data. It scans the input data and generates new code which, when executed, transforms the unstructured data into structured data ready for analysis. The organization ran the stored procedure as a single command, validated the results, and found them outstanding. Staff were able to use the data for analysis with minimal effort, saving many hours of work.