Data Migration – Plan the Plan
Implementing new systems with old data – basic elements to include in your data migration plan
Implementing new systems is part of any IT life cycle. Business must continue to increase productivity and IT systems are a key part of many improvement efforts. It is typical that the new system must accommodate the old data to make the transition work effectively. This requirement means that many projects have a critical and complex sub-project to migrate the old data to the new system. Any migration requires the use of an ETL process (extract, transform and load), but system upgrades or replacements tend to be the most complicated. They typically must be done in a short period of time and require full migration or there is a high risk of data obsolescence and/or extra effort by existing operations. Changing systems is an added stress to an operation so data challenges add to the extra work that is required. Further complicating the process is the need to extract all the rules for the data from the old system and convert it with the appropriate codes, categories, history, formats and other dependencies. Such conversions require an understanding of the old system which may not be fully documented or understood by current staff. The old rules may not easy translate to the new system structure. Just because a system no longer meets the needs of an organization it does not mean that the old data is no longer needed and certainly does not simplify the work required to transfer this information to the new system.
Almost every ETL process needs to ensure scrubbing and validation. Migrations for system upgrades typically require special work to migrate a complete transaction record. As opposed to ETL work to data warehouses used for data analytics or other accountability usage, the data cannot be stored as a translated final result, but rather is converted in the state of the transaction as used by the new system. The new system must be able to complete any workflow remaining (e.g. , accounts receivable) and provide a complete history of all transactional events.
While the job has many challenges, there is a finite set of data to be converted so a plan can be made and the conversion may be tested prior to the go-live. Additionally, the entire process can be scripted; timed and documented so very accurate schedules can be made to assist during the high-stress conversion. The following items are a summary of items that can be included in such a plan.
We recommend that a data migration plan address business rules, data categorization, quality assurance, processing (the ETL functions of the migration), migration optimization and the iteration process typical in migration work.
Experts – Finding the staff that know the rules and getting access to their time is essential to identifying the rules unless there is already a well-documented set of rules and the data complies to them well. Getting the buy-in of their managers as well as the staff themselves is important. As with any project, setting proper expectations and deadlines is helpful to avoid unfocused meetings that consume too much time. Of course, the issue of complicated and ‘fuzzy’ rules makes for situations that can consume a great deal of time. For the sake of your team spirit and the time budget a good facilitator may be needed to document all issues and divide the list into those that can be addressed individually and those that require the team. Bringing sample data as well as defining the business rules may be necessary to address the most complicated issues.
Old rules – Identifying and defining the existing rules is mixture or art and science. The longer the legacy system has been in place, the more challenging rule definitions can be do identify. There is often a series of rules that have been applied in code, databases and procedures. Documentation of course may be missing or incomplete. Access to experienced staff is a great place to begin, but a good data extract sometimes is an equally good place to start. Finding unique codes and identifying relationships are smart ways to ensure the experienced team members have a full set of data categories to examine and explain. Getting a complete scope of work makes planning more reliable.
New rules – It is recommended to create validations on the load process to enforce the rules as a final check to reduce bad data inserts. Many commercial systems provide smart load software to enforce data rules and reduce errors.
Bad Data – Once the old rules are identified and the rules established, then the old data must be reviewed to ensure that it follows the rules as defined. The same data processing that will be used for the migrated data may be applied as the old data is processed. The old system may have had special rules established to handle bad data or the system may have just failed. Rules for what data may be safely discarded must be established or a remediation process must be applied. Documentation started early and well organized will save time in later stages.
Categories – The more that the rules can be classified or categorized the easier it will be to communicate with the team and establish processes for the migration. For example, a Telecom migration of billing would have voice, data, video and security as major service areas. Voice sub categories are residential and business. Then services lines fit within these categories. For example a possible definition may start with an outline and then work the rules. In the example below, older services for a Telecom may have mixed rules as new technologies allow services to be blended together. Identifying old data structures to match services and map to the new structure can assist to identify rules that must defined to migrate the data.
Sample outline of a service category work sheet
Planning the QA
It may seem counter-intuitive, but many teams work on the QA rules before they plan the transformation development. One benefit is to allow the ETL team to fully understand the conversion rules as they will know the tests that must be applied. While some argue that a team gets a better result if the final testing criteria are not provided to the development team, others argue that automation of testing is the best approach to success. Either way, make sure the full testing rules are established along with the experts to ensure all testing is accomplished whether during the migration afterwards or in both ways. Here is an outline of items to consider when draft testing scenarios and tests:
With the business rules identified and the testing scripts and scenarios defined, the migration process is well organized and straight-forward to complete. There are many tools available to use for the process. Our toolset includes Pentaho Data Interchange and custom programming. Most Big Data companies provide ETL and migration tools. Some companies specialize in migration and many large software vendors provide loading tools and programming aids to ensure only good data is loaded. Of course, spreadsheets are a tool any experienced data analyst will use when a quick sample data set needs to be sorted and shared.
As with any good development project, begin with the end in mind. Build in your validation and exception handlers from the start so that it is easy verify success and address data exceptions. Assume that you will find more rules as you go and you need to account easily exceptions as they occur. A little extra work in the beginning of a big migration project saves hours if not weeks before the job is done. When the job is really complex we write our load information into a database and create a quick Pentaho CTools dashboard so we can look at the load results to motivate the ETL team to speed things up.
Sometimes a temporary solution or an alternative to a full migration is used so that historical data is not fully migrated. This approach requires the maintainence of the legacy system in ‘view’ mode to allow users to see the history. Some teams determine that the new system is so different that the complete history is not justified. Within a few years the new system will have better and more useful history. In this situation, the old system is left running in ‘view’ mode so history can be checked as needed or until the full migration can be done.
The faster the data can be converted the less time you need to shut down an operation during a migration. There are lots of ways to speed up a migration. Most teams focus on the business rule development first, and then work on optimization. If your plan calls for lots of testing iterations, you may want to look at steps to optimize early to allow for more testing cycles. For an outline how to approach optimization please see the blog – ETL Performance troubleshooting with Pentaho Data Interchange. While the blog discuss Pentaho PDI, the concepts apply to most ETL or migration efforts.
The conversion of data can be done until perfected or until the amount of data that cannot be migrated is small enough that manual conversion is faster. As with the initial business-rule-definition phase, having a clear scope of work and timeline can help to motivate the team towards a finished result. Each test will yield results that may warrant a partial repetition of the process. That is, results may indicate that business rules have been missed, misunderstood, improperly applied to the transformation or incompletely tested.
It is normal to find exceptions and additional rules during migration; especially as the process moves from the early partial migrations to full migration testing. It is important to categorize and attack the issues holistically. A good spreadsheet with sorting of exceptions can make a massive set of exceptions into palatable set of problems that may warrant a programmatic solution.
For large projects that require a large team to complete the migration, track the status of your key conversion. Break the data into categories and report exceptions, migration times and other key variables. Establish goals for migration execution speed and for exception elimination. Sometimes a basic dashboard can show the team progress and keep them motivated when exception counts feel too high.
Establish the parameters for the project. Break the plan into phases and establish goals for migration execution, data exception and insertion rates over time. Document the rules and decisions that are made and ensure they are kept up-to-date and referenced when group decisions are required. Specify the steps, be specific with the goals and publish the results along the way. Informed team members make good decisions and get results.