Project description: implement a data integration process that aggregates multiple comma-delimited, XML, and JSON files of different structures from several FTP sites, performs transformations, creates a multi-dimensional JSON object, and sends it by email to subscribers.
The customer needed a solution that would allow them to aggregate data from different heterogeneous file-based data sources. They wanted to create a single nested, multi-dimensional JSON object that they could send to subscribers by email. The customer did not want to stage data in a database, so they needed an in-memory solution that could perform set operations such as join, intersect, and union on multiple datasets. They also needed a way to “assemble” a multi-dimensional JSON object from different pieces of information.
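To make the in-memory set operations concrete, here is a minimal Python sketch. It is not the ETL Framework's API; it only assumes that each source (CSV, JSON, or XML) has been parsed into a list of dictionaries, one per row. The function names, the sample data, and the key fields `id` and `customer_id` are hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample inputs standing in for files downloaded from FTP sites.
CUSTOMERS_CSV = "id,name\n1,Acme\n2,Globex\n3,Initech\n"
ORDERS_JSON = '[{"customer_id": "1", "order": "A-100"}, {"customer_id": "3", "order": "C-300"}]'
PARTNERS_XML = ("<partners><partner><id>2</id><name>Globex</name></partner>"
                "<partner><id>4</id><name>Umbrella</name></partner></partners>")

def read_csv(text):
    """Parse comma-delimited text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def read_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)

def read_xml(text, row_tag):
    """Turn the child elements of each <row_tag> element into a dict."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in elem} for elem in root.iter(row_tag)]

def row_key(row):
    """Hashable identity of a row, used to compare rows on all fields."""
    return tuple(sorted(row.items()))

def join(left, right, left_key, right_key):
    """Inner hash join: index the right side by key, then probe with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]

def union(a, b):
    """Rows from either dataset, duplicates removed, first-seen order kept."""
    seen, out = set(), []
    for row in a + b:
        if row_key(row) not in seen:
            seen.add(row_key(row))
            out.append(row)
    return out

def intersect(a, b):
    """Rows of `a` that also appear in `b` (compared on all fields)."""
    keys_b = {row_key(row) for row in b}
    return [row for row in a if row_key(row) in keys_b]

customers = read_csv(CUSTOMERS_CSV)
orders = read_json(ORDERS_JSON)
partners = read_xml(PARTNERS_XML, "partner")

joined = join(customers, orders, "id", "customer_id")
merged = union(customers, partners)
common = intersect(customers, partners)
```

A hash join is used here because it needs only one pass over each side; the framework's actual join strategy is not described in the source.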
The customer used the Data Explorer Pro edition to design and create an ETL scenario. The scenario is executed by the ETL Framework, deployed to the JBoss application server, which handles transactions, scheduling, and error recovery. They used built-in transformations such as “join”, “intersect”, “union”, and “add dimension” to assemble the object. They also relied on the ETL Framework's ability to stream data, which lets it handle large datasets. The framework was embedded in the existing J2EE application, which, among other things, is responsible for emailing files to subscribers.
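The source does not define “add dimension” precisely; one plausible reading is nesting matching rows from one dataset under each row of another, which is how a flat dataset becomes a multi-dimensional JSON object. The Python sketch below assumes that reading and is not the framework's implementation. It also shows where streaming fits: the parent rows are consumed lazily through a generator, while the child rows are grouped in memory. The function names `stream_csv` and `add_dimension`, the sample data, and the field names are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample inputs; in practice these would be file streams.
CUSTOMERS_CSV = "id,name\n1,Acme\n2,Globex\n"
ORDERS_CSV = "customer_id,order\n1,A-100\n1,A-101\n2,B-200\n"

def stream_csv(stream):
    """Yield one row at a time instead of loading the whole file at once."""
    yield from csv.DictReader(stream)

def add_dimension(parents, children, parent_key, child_key, name):
    """Attach matching child rows to each parent as a nested list under `name`.

    Children are grouped by key up front (held in memory); parents are
    consumed lazily, so the parent dataset can be streamed.
    """
    groups = defaultdict(list)
    for child in children:
        # Drop the join key from the nested rows; the parent already carries it.
        groups[child[child_key]].append(
            {k: v for k, v in child.items() if k != child_key})
    for parent in parents:
        # .get() does not insert into the defaultdict; unmatched parents get [].
        yield {**parent, name: groups.get(parent[parent_key], [])}

customers = stream_csv(io.StringIO(CUSTOMERS_CSV))
orders = stream_csv(io.StringIO(ORDERS_CSV))

# The final object is materialized because it is sent as one email attachment.
document = {"customers": list(add_dimension(customers, orders, "id", "customer_id", "orders"))}
payload = json.dumps(document, indent=2)
```

With the sample data, `document["customers"][0]` is the Acme row with an `orders` list of two entries, and the Globex row carries a single order.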