Toolsverse - Data Management Tools

Use Cases

Project description: implement a repeatable migration process which copies 200 tables from a Microsoft SQL Server database to Oracle and HSQLDB databases

Challenge

Customer needed to develop a repeatable, high-performance ETL process to migrate data from the MS SQL Server database to the Oracle and HSQLDB databases. The ETL process copies data into existing Oracle and HSQLDB tables, which include CLOB and BLOB columns. The average source table held on the order of 10^5 (100,000) rows, and roughly 30 tables held 10^6 (1,000,000) rows or more.

Solution

Customer chose to use the ETL Framework and develop the ETL process entirely in Java, with no XML-based language involved. The key factor was the decision to combine the streaming capability of the ETL Framework with batch processing and bind variables. Customer developed an adaptive algorithm which controls how many extract and load threads the ETL Framework creates. Toolsverse helped the customer with examples and best practices, and the ability to use bind variables was added to the framework at this customer's request.
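
For illustration only, here is a minimal plain-JDBC sketch of the general approach (streaming reads, bind variables, and batched writes); it does not use the Toolsverse ETL Framework API, and the connection URLs, credentials, table and column names are placeholders.

import java.sql.*;

// Illustrative sketch only: plain JDBC, not the Toolsverse ETL Framework API.
// Connection URLs, credentials and table/column names are placeholders.
public class StreamingCopy {

    private static final int BATCH_SIZE = 1_000;

    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection(
                 "jdbc:sqlserver://source-host;databaseName=app", "user", "pass");
             Connection dst = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//target-host:1521/ORCL", "user", "pass")) {

            src.setAutoCommit(false);   // some drivers only stream result sets with auto-commit off
            dst.setAutoCommit(false);

            try (Statement read = src.createStatement();
                 PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO customers (id, name, notes) VALUES (?, ?, ?)")) {

                read.setFetchSize(BATCH_SIZE);   // fetch rows incrementally instead of buffering the table
                ResultSet rs = read.executeQuery("SELECT id, name, notes FROM customers");

                int pending = 0;
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));        // bind variables: statement is parsed once
                    write.setString(2, rs.getString("name"));
                    write.setString(3, rs.getString("notes"));
                    write.addBatch();
                    if (++pending % BATCH_SIZE == 0) {
                        write.executeBatch();                  // flush one batch to the target
                        dst.commit();
                    }
                }
                write.executeBatch();                          // flush the final partial batch
                dst.commit();
            }
        }
    }
}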


Project description: migrate a legacy application from a Microsoft SQL Server database to MySQL

Challenge

Customer needed to develop a data migration process to move data from a legacy Microsoft SQL Server application to a cloud-based system with a MySQL database as the backend. One of the main requirements was that the ETL process must create tables in the destination MySQL database on the fly, using the source tables in Microsoft SQL Server as a template. In addition, some destination columns had to be renamed while the values of others had to be calculated.

Solution

Customer used the Data Explorer ETL edition to interactively develop the ETL scenarios for this project in the ETL Designer. They relied on the ability of the ETL engine to create tables on the fly and used built-in JavaScript to define functions which calculate column values. We added several usability enhancements to the ETL Designer at this customer's request.
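
As a rough illustration of the "create tables on the fly" idea (not the actual ETL engine implementation), the sketch below reads column metadata from the source via JDBC and generates a simplified MySQL CREATE TABLE statement, renaming selected columns; the schema, table names and type mapping are assumptions.

import java.sql.*;
import java.util.*;

// Illustrative sketch only: build a MySQL CREATE TABLE statement from SQL Server
// metadata, renaming selected columns. The type mapping is deliberately simplified.
public class TableOnTheFly {

    private static final Map<String, String> RENAMES = Map.of("CUST_NM", "customer_name");

    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection(
                "jdbc:sqlserver://legacy-host;databaseName=app", "user", "pass")) {

            StringBuilder ddl = new StringBuilder("CREATE TABLE customers (");
            DatabaseMetaData meta = src.getMetaData();
            try (ResultSet cols = meta.getColumns(null, "dbo", "CUSTOMERS", null)) {
                boolean first = true;
                while (cols.next()) {
                    String name = cols.getString("COLUMN_NAME");
                    name = RENAMES.getOrDefault(name, name.toLowerCase());  // rename on the fly
                    if (!first) ddl.append(", ");
                    ddl.append(name).append(' ')
                       .append(mysqlType(cols.getInt("DATA_TYPE"), cols.getInt("COLUMN_SIZE")));
                    first = false;
                }
            }
            ddl.append(")");
            System.out.println(ddl);   // in a real run, execute this against the MySQL connection
        }
    }

    // Minimal JDBC-to-MySQL type mapping; a real migration needs a much fuller table.
    private static String mysqlType(int jdbcType, int size) {
        switch (jdbcType) {
            case Types.INTEGER:   return "INT";
            case Types.BIGINT:    return "BIGINT";
            case Types.TIMESTAMP: return "DATETIME";
            case Types.CLOB:      return "LONGTEXT";
            case Types.BLOB:      return "LONGBLOB";
            default:              return "VARCHAR(" + Math.max(size, 1) + ")";
        }
    }
}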


Project description: implement a system which requests data from JSON-based Web services and updates an Oracle database

Challenge

Customer runs a system which requests data from external JSON-based Web services and applies CRUD updates to their cloud-based Oracle database. One response can include multiple updates which must be performed within a single transaction. Customer had been using dedicated code to communicate with each Web service and wanted a more unified approach; most importantly, they wanted to get out of the business of developing and supporting custom ETL code.

Solution

Toolsverse developed a solution which uses the ETL Framework deployed to an AWS instance running Linux, and we created the ETL scenarios for this customer. ETL is scheduled as a periodic cron job which automatically notifies their IT staff by email in case of any failure. We used the ability of the ETL Framework to call Web services over HTTP and its built-in email notifications. JSON is one of the data formats natively supported by the ETL Framework.
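
The sketch below is an illustrative stand-in for the general pattern (not the ETL Framework API or the customer's scenarios): it calls a hypothetical JSON Web service with the JDK HttpClient, parses the response with Jackson, and applies all updates from one response inside a single JDBC transaction. The endpoint URL, JSON shape, table and column names are invented.

import java.net.URI;
import java.net.http.*;
import java.sql.*;
import com.fasterxml.jackson.databind.*;

// Illustrative sketch only: one Web service response carrying several updates is
// applied inside a single transaction. Endpoint, JSON shape and table names are invented.
public class JsonFeedToOracle {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        HttpResponse<String> resp = http.send(
            HttpRequest.newBuilder(URI.create("https://api.example.com/v1/changes")).GET().build(),
            HttpResponse.BodyHandlers.ofString());

        JsonNode updates = new ObjectMapper().readTree(resp.body()).get("updates");

        try (Connection db = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//db-host:1521/ORCL", "user", "pass")) {
            db.setAutoCommit(false);                       // all updates succeed or none do
            try (PreparedStatement ps = db.prepareStatement(
                     "UPDATE accounts SET balance = ? WHERE account_id = ?")) {
                for (JsonNode u : updates) {
                    ps.setBigDecimal(1, u.get("balance").decimalValue());
                    ps.setLong(2, u.get("accountId").asLong());
                    ps.addBatch();
                }
                ps.executeBatch();
                db.commit();
            } catch (SQLException e) {
                db.rollback();                             // keep the database consistent on failure
                throw e;
            }
        }
    }
}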


Project description: implement offline replication between Derby and DB2 databases

Challenge

Customer needed to implement offline, asymmetrical two-way replication between a Derby database (used as a back end for their downtime solution) and a DB2 database used for OLTP. They wanted a batch job executed periodically in their data center and controlled by their IT personnel, which would extract data into XML, send it over the wire, and load it into the designated Derby database. Similarly, they wanted a completely automated and self-contained ETL process which could be initiated by the client to push data back to the server when connectivity is restored.

Solution

Customer used the ETL Framework to implement the replication. They embedded the framework directly into the client part of the application and created a batch job to run in the data center. They used the ability of the ETL Framework to extract and load data separately, as two independent tasks. They chose the Toolsverse XML format for serialization since it includes both data and metadata, such as column data types. They also used the ability of the ETL Framework to create zip files and copy files over FTP. The projected amount of data to be extracted and loaded was in the neighborhood of hundreds of megabytes, so the ability of the framework to extract from multiple sources and load into multiple destinations in parallel threads was critical to the project's success.
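
A hedged sketch of the parallel extract step, assuming plain JDBC and java.util.zip rather than the ETL Framework, and with CSV standing in for the Toolsverse XML format; table names, connection details and the file layout are invented.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.sql.*;
import java.util.List;
import java.util.concurrent.*;
import java.util.zip.*;

// Illustrative sketch only: extract several tables in parallel, each into its own
// zipped file, as a step separate from loading.
public class ParallelExtract {
    public static void main(String[] args) throws Exception {
        List<String> tables = List.of("ORDERS", "ORDER_LINES", "CUSTOMERS");
        ExecutorService pool = Executors.newFixedThreadPool(tables.size());

        for (String table : tables) {
            pool.submit(() -> {
                try (Connection db = DriverManager.getConnection(
                         "jdbc:db2://oltp-host:50000/PROD", "user", "pass");
                     Statement st = db.createStatement();
                     ResultSet rs = st.executeQuery("SELECT * FROM " + table);
                     ZipOutputStream zip = new ZipOutputStream(
                         new FileOutputStream(table + ".zip"))) {

                    zip.putNextEntry(new ZipEntry(table + ".csv"));
                    Writer out = new OutputStreamWriter(zip, StandardCharsets.UTF_8);
                    int cols = rs.getMetaData().getColumnCount();
                    while (rs.next()) {
                        StringBuilder row = new StringBuilder();
                        for (int i = 1; i <= cols; i++) {
                            if (i > 1) row.append(',');
                            row.append(rs.getString(i));
                        }
                        out.write(row.append('\n').toString());
                    }
                    out.flush();
                    zip.closeEntry();
                } catch (Exception e) {
                    throw new RuntimeException("extract failed for " + table, e);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);   // the load step runs later as its own job
    }
}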


Project description: implement a reporting system which produces Excel reports and sends them to a remote SFTP location

Challenge

Customer's IT personnel periodically created reports for upper management from an Oracle data warehouse. They were using Excel and sending the reports directly to management by email. They wanted to make this process completely automated, add a Web-based dashboard showing the created reports and their status, and continue using Excel as the report format.

Solution

Customer used the Data Explorer Enterprise edition, deployed on a Tomcat application server, to develop and schedule an ETL scenario which creates Excel reports by running extracts against the Oracle database. The scenario automatically updates an HTML dashboard directly on the Web server and notifies IT by email in case of any failure. Customer used the ability of the ETL engine to copy files over SFTP, so reports created on Tomcat are pushed to a Windows-based file server where they can be accessed by management. The ability to schedule and monitor ETL tasks was the key factor in choosing the Data Explorer Enterprise edition.
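
For illustration, here is a minimal sketch of the same pipeline using Apache POI for the Excel file and JSch for the SFTP push (the actual solution used the ETL engine's built-in SFTP support); the query, hosts, credentials and file paths are placeholders.

import java.io.FileOutputStream;
import java.sql.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import com.jcraft.jsch.*;

// Illustrative sketch only: run a query, write the result set to an .xlsx file,
// and push the file to a remote server over SFTP.
public class ExcelReportJob {
    public static void main(String[] args) throws Exception {
        String report = "monthly_sales.xlsx";

        // 1. Extract from the warehouse and write an Excel sheet.
        try (Connection db = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dw-host:1521/DWH", "user", "pass");
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT region, total FROM monthly_sales");
             Workbook wb = new XSSFWorkbook()) {

            Sheet sheet = wb.createSheet("Monthly Sales");
            int r = 0;
            while (rs.next()) {
                Row row = sheet.createRow(r++);
                row.createCell(0).setCellValue(rs.getString("region"));
                row.createCell(1).setCellValue(rs.getDouble("total"));
            }
            try (FileOutputStream out = new FileOutputStream(report)) {
                wb.write(out);
            }
        }

        // 2. Push the file to the Windows file server over SFTP.
        Session session = new JSch().getSession("reports", "fileserver.example.com", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no");
        session.connect();
        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        sftp.put(report, "/reports/" + report);
        sftp.disconnect();
        session.disconnect();
    }
}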


Project description: implement a data integration process which aggregates multiple comma-delimited, XML and JSON files of different structures from several FTP sites, performs transformations, and creates a multi-dimensional JSON object which is then sent by email to subscribers

Challenge

Customer needed a solution which would allow them to aggregate data from different heterogeneous file-based data sources. They wanted to create a single nested, multi-dimensional JSON object which they could send to subscribers by email. Customer didn't want to stage data in a database, so they needed an in-memory solution which can perform set operations such as join, intersect and union on multiple datasets. They also needed a way to "assemble" a multi-dimensional JSON object from different pieces of information.

Solution

Customer used the Data Explorer Pro edition to design and create the ETL scenario. The scenario is executed by the ETL Framework deployed to a JBoss application server, which handles transactions, scheduling and error recovery. They used built-in transformations such as "join", "intersect", "union" and "add dimension" to assemble the object, and relied on the ability of the ETL Framework to stream data so it can easily handle large datasets. The framework was embedded in the existing J2EE application, which, among other things, is responsible for emailing files to subscribers.
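
As an illustration of the "assemble a nested object" step (not the built-in "add dimension" transformation itself), the sketch below joins two small in-memory datasets on a key and nests the matching rows under each parent record using Jackson; the data and field names are invented.

import java.util.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.*;

// Illustrative sketch only: join two in-memory datasets on a key and emit one
// nested JSON object per parent record.
public class AssembleJson {
    public static void main(String[] args) throws Exception {
        // Stand-ins for records parsed from the CSV/XML/JSON feeds.
        Map<String, String> customers = Map.of("C1", "Acme", "C2", "Globex");
        List<String[]> orders = List.of(
            new String[] {"C1", "O-100", "250.00"},
            new String[] {"C1", "O-101", "75.50"},
            new String[] {"C2", "O-102", "410.00"});

        ObjectMapper mapper = new ObjectMapper();
        ArrayNode result = mapper.createArrayNode();

        for (Map.Entry<String, String> c : customers.entrySet()) {
            ObjectNode customer = result.addObject();
            customer.put("id", c.getKey());
            customer.put("name", c.getValue());
            ArrayNode dim = customer.putArray("orders");   // the nested "dimension"
            for (String[] o : orders) {
                if (o[0].equals(c.getKey())) {
                    ObjectNode order = dim.addObject();
                    order.put("number", o[1]);
                    order.put("amount", o[2]);
                }
            }
        }
        System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(result));
    }
}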


Project description: implement a repeatable, interactive solution which can be used to read normalized data from an OLTP database and produce denormalized reports

Challenge

A database architect who works for a large university is responsible for creating relatively complicated reports for faculty members. Creating some of the reports involves writing SQL queries and denormalizing the result sets so that multiple rows with the same combination of keys become a single row with multiple columns.

Solution

Customer is using the Data Explorer SQL edition to automate routine denormalization transformations. They save frequently used transformations and reuse them when needed. They run queries from Data Explorer, apply the built-in denormalization, and export the result sets to Excel. They also use the ability of Data Explorer to automatically split data into multiple files during export, so that, for example, a separate file can be created for each month of the year.
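
A minimal sketch of the denormalization itself, assuming hard-coded rows in place of a query result set: rows sharing a key are pivoted into one wide row, with a column per distinct value of the pivot field. The student/term/grade layout is invented for this example.

import java.util.*;

// Illustrative sketch only: "denormalize" rows that share a key into a single wide row.
public class Denormalize {
    public static void main(String[] args) {
        // (student_id, term, grade) -- several rows per student
        String[][] rows = {
            {"1001", "Fall",   "A"},
            {"1001", "Spring", "B"},
            {"1002", "Fall",   "C"},
            {"1002", "Spring", "A"}};

        // key -> (term -> grade); LinkedHashMap preserves the original row order
        Map<String, Map<String, String>> wide = new LinkedHashMap<>();
        Set<String> terms = new LinkedHashSet<>();
        for (String[] r : rows) {
            wide.computeIfAbsent(r[0], k -> new LinkedHashMap<>()).put(r[1], r[2]);
            terms.add(r[1]);
        }

        // One output row per key, one column per distinct term.
        System.out.println("student_id," + String.join(",", terms));
        for (Map.Entry<String, Map<String, String>> e : wide.entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey());
            for (String term : terms) {
                line.append(',').append(e.getValue().getOrDefault(term, ""));
            }
            System.out.println(line);
        }
    }
}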


Project description: implement a repeatable ETL process which copies configuration data from the production database to the QA environment and from the golden schema to the production database

Challenge

One of the tasks which this medium-size development shop frequently faces is the need to copy configuration settings for their application back and forth between different databases. In most cases they need to merge data, because existing referential constraints in the database cannot be ignored.

Solution

Customer developed ETL scenarios for different configuration domains and manually executes them when needed from the Data Explorer Pro edition. When creating scenarios, they use the ability of the framework to execute database-specific code (PL/SQL in this particular case). They also use column-level functions which automatically generate new primary keys from sequences and update the referencing rows accordingly.
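
For illustration, here is a plain-JDBC sketch of the key-remapping idea (not the framework's column-level functions): parent rows receive new primary keys from a target-side sequence, and child rows are inserted with the remapped foreign keys. All table, column and sequence names are placeholders.

import java.sql.*;
import java.util.*;

// Illustrative sketch only: merge parent/child configuration rows into a target Oracle
// schema, assigning new primary keys from a sequence and remapping child foreign keys.
public class MergeConfig {
    public static void main(String[] args) throws SQLException {
        try (Connection src = DriverManager.getConnection("jdbc:oracle:thin:@//golden:1521/GOLD", "u", "p");
             Connection dst = DriverManager.getConnection("jdbc:oracle:thin:@//prod:1521/PROD", "u", "p")) {

            dst.setAutoCommit(false);
            Map<Long, Long> idMap = new HashMap<>();   // old parent id -> new parent id

            try (Statement read = src.createStatement();
                 ResultSet parents = read.executeQuery("SELECT rule_id, rule_name FROM config_rules");
                 PreparedStatement nextId = dst.prepareStatement("SELECT config_rules_seq.NEXTVAL FROM dual");
                 PreparedStatement insParent = dst.prepareStatement(
                     "INSERT INTO config_rules (rule_id, rule_name) VALUES (?, ?)")) {
                while (parents.next()) {
                    long newId;
                    try (ResultSet rs = nextId.executeQuery()) { rs.next(); newId = rs.getLong(1); }
                    idMap.put(parents.getLong("rule_id"), newId);   // remember the mapping for children
                    insParent.setLong(1, newId);
                    insParent.setString(2, parents.getString("rule_name"));
                    insParent.executeUpdate();
                }
            }

            try (Statement read = src.createStatement();
                 ResultSet children = read.executeQuery("SELECT rule_id, param_name, param_value FROM config_params");
                 PreparedStatement insChild = dst.prepareStatement(
                     "INSERT INTO config_params (rule_id, param_name, param_value) VALUES (?, ?, ?)")) {
                while (children.next()) {
                    insChild.setLong(1, idMap.get(children.getLong("rule_id")));   // remapped foreign key
                    insChild.setString(2, children.getString("param_name"));
                    insChild.setString(3, children.getString("param_value"));
                    insChild.executeUpdate();
                }
            }
            dst.commit();
        }
    }
}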


Project description: an off-the-shelf solution which audits which SQL scripts are executed, by whom, when, and with what outcome

Challenge

Customer needed a way to audit when particular SQL scripts are executed in particular environments. They wanted to know who executed a script and when, and to see the output log as it was available at the time of execution. Most importantly, they needed a replacement for the rather large selection of database development tools used across the organization. One of the requirements was that different developers should have different levels of access to different environments, or no access at all. They also needed the ability to maintain a shared library of frequently used SQL scripts.

Solution

The only database development tool on the market with such capabilities is the Data Explorer Enterprise edition. Since the customer didn't need the ETL functionality, they requested a quote for Data Explorer Enterprise without ETL. All editions of Data Explorer share the same components, so configuring the product for their particular needs was as easy as issuing a new license. This saved the customer a considerable amount of money at the time, and they are currently considering upgrading to the full Data Explorer Enterprise edition for a fraction of the list price.


Project description: data feeds from data warehouse

Challenge

Customer needed an easy way to create comma-delimited data feeds from their data warehouse. These data feeds include financial information for a legacy mainframe-based system and are also used to communicate with third-party systems.

Solution

Customer is using the Data Explorer Pro edition to create ETL scenarios, combined with the ETL Framework executed as a standalone batch process in their data center. Writing most ETL scenarios is as easy as writing a single SQL statement, since the framework takes care of the rest, including parsing and creating files, streaming, notifications, etc. The time needed to create and deploy scenarios was reduced significantly, and developer productivity increased because they no longer need to write custom code for each new data feed.
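
As a rough sketch of how little code such a feed needs, the example below runs a single query and writes the result set as a comma-delimited file with a header row; the query, connection URL and output path are placeholders, and real feeds would also handle quoting, streaming and notifications.

import java.io.*;
import java.sql.*;

// Illustrative sketch only: one SQL statement drives the whole feed; the surrounding
// code just writes the rows to a comma-delimited file.
public class CsvFeed {
    public static void main(String[] args) throws Exception {
        String query = "SELECT account_id, period, amount FROM dw.gl_balances";
        try (Connection db = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dw-host:1521/DWH", "user", "pass");
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery(query);
             PrintWriter out = new PrintWriter(new FileWriter("gl_balances.csv"))) {

            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();

            // Header row taken from the query itself.
            for (int i = 1; i <= cols; i++) {
                out.print(meta.getColumnLabel(i));
                out.print(i < cols ? "," : "\n");
            }
            // Data rows.
            while (rs.next()) {
                for (int i = 1; i <= cols; i++) {
                    String v = rs.getString(i);
                    out.print(v == null ? "" : v);
                    out.print(i < cols ? "," : "\n");
                }
            }
        }
    }
}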


Project description: classic data warehouse ETL

Challenge

For years the customer used a home-grown ETL process to populate the data warehouse from the OLTP database, using a combination of SQL scripts, shell scripts and Oracle SQL*Loader scenarios. It worked rather well for them until ETL became too slow due to the amount of data which had to be moved from the OLTP to the OLAP database daily. The designated time window became too small and they faced constant delays. Customer was looking for a solution which would allow them to use commodity hardware and massively parallel processing for ETL.

Solution

Customer developed a solution which uses the ETL Framework together with the Hadoop MapReduce engine. In the end, they managed to cut ETL time in half without investing in expensive new hardware. The transition from separate SQL scripts and SQL*Loader scenarios was relatively easy because they used the ability of the ETL Framework to execute native SQL and Oracle SQL*Loader. They also used the Data Explorer ETL edition to create scenarios in the visual ETL designer.
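
Purely for illustration, here is a small Hadoop MapReduce job of the kind that can run on commodity hardware to aggregate extracted files in parallel; it is not the customer's job or the ETL Framework integration, and the input format, paths and field layout are invented.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative sketch only: aggregate daily extract files (CSV: customer_id,amount)
// into per-customer totals before the warehouse load.
public class DailySalesRollup {

    public static class SalesMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            ctx.write(new Text(fields[0]), new DoubleWritable(Double.parseDouble(fields[1])));
        }
    }

    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text customer, Iterable<DoubleWritable> amounts, Context ctx)
                throws IOException, InterruptedException {
            double total = 0;
            for (DoubleWritable a : amounts) total += a.get();
            ctx.write(customer, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "daily-sales-rollup");
        job.setJarByClass(DailySalesRollup.class);
        job.setMapperClass(SalesMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path("/etl/extracts/daily"));
        FileOutputFormat.setOutputPath(job, new Path("/etl/aggregated/daily"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}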


Project description: universal database development tool

Challenge

Working as a freelancer, this customer was looking for an inexpensive, universal database development tool. He had very specific requirements, such as support for certain databases and the ability to work with packages and stored procedures. Most importantly, since he was dealing with many different databases, he was looking for a tool with built-in any-to-any data migration capabilities. He also needed the ability to view and edit data directly in the tool, including CLOBs and BLOBs, complex data types and arrays. An intuitive and familiar drag-and-drop interface was also a plus.

Solution

Customer selected the Data Explorer SQL edition. There are a lot of great products on the market, so the choice was not easy. The main selling point was the Data Migration Wizard, an easy-to-use any-to-any data migration tool integrated directly into Data Explorer. Customer is now able to perform data conversions without writing a single line of code. The fact that the other requirements were also met certainly helped.


Project description: educational tool

Challenge

As an educational institution, the customer needed a tool for teaching database management courses to undergraduate students. One of the requirements was the ability to view and search metadata such as database objects, constraints, etc. They also needed an easy way to query databases, so the visual query designer was a big plus.

Solution

Customer received a free site license for the Data Explorer Pro edition, which was installed in several labs. Toolsverse used this as an opportunity to beta test many useful database development and database management features which are now available in the product.


Project description: integration testing and development support

Challenge

When developing a data-driven application, creating integration tests can be a problem: data must be in a cohesive state in order to test reliably. Customer experimented with several testing frameworks such as DbUnit, but was still not satisfied. The data preparation process was mostly manual, meaning that developers had to describe all data needed for a test in XML or by other means. Maintaining the tests became a full-time job, since every change in the database structure required changes in many tests as well.

Solution

Customer developed an integration testing framework which uses the ETL Framework to prepare data for the tests. Cohesive sets of data are extracted up front, zipped and later fed to the ETL Framework, which updates data in the designated test database. The extract and load processes are fully automated and require only a small configuration file for each test or test suite; the file includes table names and other parameters. Developers don't need to include or exclude columns, think about dependencies, etc. The test database is flushed before the continuous integration (CI) process runs the entire test suite. Customer has developed a large and growing list of ETL scenarios for different domains; as a positive side effect, developers are now able to quickly get data from the production or QA environments to test and fix issues.
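
A hedged sketch of the test-data preparation idea, assuming a simple properties file and CSV extracts instead of the actual zipped extracts and ETL scenarios; the config keys, file layout and loading logic are invented for this example.

import java.io.FileReader;
import java.nio.file.*;
import java.sql.*;
import java.util.Properties;

// Illustrative sketch only: a per-test-suite properties file lists the tables to refresh;
// the helper flushes them and reloads data from prepared CSV extracts before the suite runs.
public class TestDataLoader {

    public static void prepare(String configPath) throws Exception {
        Properties cfg = new Properties();
        try (FileReader in = new FileReader(configPath)) {
            cfg.load(in);
        }
        String[] tables = cfg.getProperty("tables").split(",");   // e.g. child tables first
        Path dataDir = Paths.get(cfg.getProperty("data.dir"));

        try (Connection db = DriverManager.getConnection(
                 cfg.getProperty("jdbc.url"), cfg.getProperty("jdbc.user"), cfg.getProperty("jdbc.password"));
             Statement st = db.createStatement()) {
            // Delete in the listed order (children first), then load in reverse (parents first).
            for (String table : tables) {
                st.executeUpdate("DELETE FROM " + table.trim());
            }
            for (int i = tables.length - 1; i >= 0; i--) {
                loadCsv(db, tables[i].trim(), dataDir.resolve(tables[i].trim() + ".csv"));
            }
        }
    }

    private static void loadCsv(Connection db, String table, Path csv) throws Exception {
        for (String line : Files.readAllLines(csv)) {
            String[] values = line.split(",");
            StringBuilder sql = new StringBuilder("INSERT INTO " + table + " VALUES (");
            for (int i = 0; i < values.length; i++) sql.append(i > 0 ? ", ?" : "?");
            sql.append(")");
            try (PreparedStatement ps = db.prepareStatement(sql.toString())) {
                for (int i = 0; i < values.length; i++) ps.setString(i + 1, values[i]);
                ps.executeUpdate();
            }
        }
    }
}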