Data Migration – Plan the Plan

Implementing new systems with old data – basic elements to include in your data migration plan

Implementing new systems is part of any IT life cycle. Businesses must continually increase productivity, and IT systems are a key part of many improvement efforts. The new system must typically accommodate the old data for the transition to work effectively, which means many projects carry a critical and complex sub-project: migrating the old data to the new system. Any migration requires an ETL process (extract, transform and load), but system upgrades or replacements tend to be the most complicated. They typically must be done in a short period of time and require a full migration, or there is a high risk of data obsolescence and/or extra effort by existing operations. Changing systems is already a stress on an operation, so data challenges add to the extra work required. Further complicating the process is the need to extract all the rules for the data from the old system and convert it with the appropriate codes, categories, history, formats and other dependencies. Such conversions require an understanding of the old system, which may not be fully documented or understood by current staff, and the old rules may not translate easily to the new system's structure. Just because a system no longer meets the needs of an organization does not mean the old data is no longer needed, and it certainly does not simplify the work required to transfer this information to the new system.

Almost every ETL process needs scrubbing and validation. Migrations for system upgrades typically require special work to migrate complete transaction records. Unlike ETL work for data warehouses used for analytics or other accountability purposes, the data cannot be stored as a translated final result; it must be converted in the state of the transaction as the new system uses it. The new system must be able to complete any remaining workflow (e.g., accounts receivable) and provide a complete history of all transactional events.
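
As a concrete illustration, the sketch below validates one in-flight transaction record before it is loaded, so the new system receives it in a workable state. It is a minimal example in Python; the field names (account_id, status, balance) and the status list are invented for illustration, not taken from any particular system.

```python
# Minimal sketch of validating one in-flight transaction record before load.
# The field names and the allowed status values are hypothetical examples.
VALID_STATUSES = {"open", "pending", "closed"}

def validate_transaction(row: dict) -> list:
    """Return a list of rule violations for one migrated record."""
    errors = []
    if not row.get("account_id"):
        errors.append("missing account_id")
    if row.get("status") not in VALID_STATUSES:
        errors.append("unknown status: %r" % row.get("status"))
    if row.get("balance") is None:
        errors.append("missing balance")  # needed to finish the A/R workflow
    return errors

row = {"account_id": "A-1001", "status": "pending", "balance": 125.40}
problems = validate_transaction(row)
print("reject:" if problems else "load ok", problems)
```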

While the job has many challenges, there is a finite set of data to be converted, so a plan can be made and the conversion tested prior to go-live. Additionally, the entire process can be scripted, timed and documented so that very accurate schedules can be made to assist during the high-stress conversion. The following is a summary of items that can be included in such a plan.

We recommend that a data migration plan address business rules, data categorization, quality assurance, processing (the ETL functions of the migration), migration optimization and the iteration process typical in migration work.

Business rules

Experts – Unless there is already a well-documented set of rules that the data complies with, finding the staff who know the rules, and getting access to their time, is essential to identifying them. Getting the buy-in of their managers as well as the staff themselves is important. As with any project, setting proper expectations and deadlines helps avoid unfocused meetings that consume too much time. Of course, complicated and 'fuzzy' rules make for situations that can consume a great deal of time. For the sake of team spirit and the time budget, a good facilitator may be needed to document all issues and divide the list into those that can be addressed individually and those that require the team. Bringing sample data as well as defining the business rules may be necessary to address the most complicated issues.

Old rules – Identifying and defining the existing rules is a mixture of art and science. The longer the legacy system has been in place, the more challenging the rules can be to identify. There is often a series of rules applied in code, databases and procedures, and documentation may be missing or incomplete. Access to experienced staff is a great place to begin, but a good data extract is sometimes an equally good place to start. Finding unique codes and identifying relationships are smart ways to ensure the experienced team members have a full set of data categories to examine and explain. Getting a complete scope of work makes planning more reliable.
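
A simple profiling pass over an extract can produce that list of unique codes. Below is a hedged sketch; the file name legacy_extract.csv and the service_code column are assumptions standing in for a real extract.

```python
# Sketch: profile a legacy extract to surface the distinct codes the
# experts will need to explain. File and column names are assumptions.
import csv
from collections import Counter

codes = Counter()
with open("legacy_extract.csv", newline="") as f:
    for row in csv.DictReader(f):
        codes[row["service_code"]] += 1

# Rare codes often hide forgotten rules; list them first for the review.
for code, count in sorted(codes.items(), key=lambda kv: kv[1]):
    print("%s: %d rows" % (code, count))
```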

New rules – We recommend creating validations in the load process to enforce the rules as a final check and reduce bad-data inserts. Many commercial systems provide smart load software to enforce data rules and reduce errors.
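
One way to express such load-time rules is as a small declarative table of checks, as in this sketch; the fields and rules are illustrative, not the API of any commercial loader.

```python
# Sketch of rule-driven load validation; fields and rules are invented.
from datetime import date

RULES = [
    ("customer_id", lambda v: bool(v), "customer_id is required"),
    ("start_date",
     lambda v: isinstance(v, date) and v <= date.today(),
     "start_date must be a past or present date"),
]

def check_row(row: dict) -> list:
    # Apply every rule; collect the messages for the rules that fail.
    return [msg for field, ok, msg in RULES if not ok(row.get(field))]

print(check_row({"customer_id": "C-9", "start_date": date(2031, 1, 1)}))
# -> ['start_date must be a past or present date']
```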

Bad Data – Once the old rules are identified and the new rules established, the old data must be reviewed to ensure that it follows the rules as defined. The same data processing that will be used for the migrated data may be applied as the old data is reviewed. The old system may have had special rules to handle bad data, or it may simply have failed. Rules for what data may be safely discarded must be established, or a remediation process must be applied. Documentation started early and kept well organized will save time in later stages.
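
A triage step like the sketch below keeps the load moving while capturing bad rows for remediation or a documented discard decision; validate() stands in for whatever rule checks the project defines.

```python
# Sketch of a triage step: instead of failing the whole load, rows that
# break a rule are routed to an exception list for remediation.
def triage(rows, validate):
    good, exceptions = [], []
    for row in rows:
        errors = validate(row)
        if errors:
            exceptions.append((row, errors))  # remediate or safely discard
        else:
            good.append(row)                  # continue to the load step
    return good, exceptions
```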

Categories – The more the rules can be classified or categorized, the easier it will be to communicate with the team and establish processes for the migration. For example, a Telecom billing migration would have voice, data, video and security as major service areas; voice sub-categories are residential and business, and service lines fit within these categories. A possible definition may start with an outline and then work through the rules. In the example below, older services for a Telecom may have mixed rules, as new technologies allow services to be blended together. Identifying old data structures that match services and mapping them to the new structure helps identify the rules that must be defined to migrate the data.

Sample outline of a service category worksheet
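
In code, such a worksheet often becomes a simple mapping table. The sketch below is illustrative only; the legacy service codes and categories are invented for the example.

```python
# Illustrative mapping from legacy Telecom service codes to the new
# category outline; the codes themselves are invented for this example.
CATEGORY_MAP = {
    "RES-POTS":  ("voice", "residential"),
    "BUS-TRUNK": ("voice", "business"),
    "DSL-10":    ("data",  "residential"),
    "CATV-STD":  ("video", "residential"),
}

def categorize(service_code: str):
    # Unmapped codes are exactly the gaps the experts must resolve.
    return CATEGORY_MAP.get(service_code, ("unmapped", "unmapped"))

print(categorize("BUS-TRUNK"))  # ('voice', 'business')
print(categorize("ISDN-BRI"))   # ('unmapped', 'unmapped') -> needs a rule
```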

 Planning the QA

It may seem counter-intuitive, but many teams work on the QA rules before they plan the transformation development. One benefit is that the ETL team fully understands the conversion rules because they know the tests that must be applied. While some argue that a team gets a better result if the final testing criteria are not provided to the development team, others argue that automated testing is the best approach to success. Either way, make sure the full testing rules are established along with the experts, to ensure all testing is accomplished, whether during the migration, afterwards or both. Here is an outline of items to consider when drafting test scenarios and tests:

Data migration test plan outline
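
Many of these tests can be scripted. As one hedged example, the sketch below reconciles row counts and balance totals between the legacy and target databases; the SQLite files, table and column names are assumptions for illustration.

```python
# Minimal reconciliation test of the kind a QA plan might script:
# compare row counts and balance totals between source and target.
# Database files, table and column names are assumptions.
import sqlite3

def totals(conn, table):
    count, balance = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(balance), 0) FROM %s" % table
    ).fetchone()
    return count, round(balance, 2)

old = sqlite3.connect("legacy.db")
new = sqlite3.connect("target.db")
assert totals(old, "invoices") == totals(new, "invoices"), "reconciliation failed"
print("invoices reconcile")
```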

Processing

With the business rules identified and the testing scripts and scenarios defined, the migration process is well organized and straightforward to complete. There are many tools available for the process. Our toolset includes Pentaho Data Integration and custom programming. Most Big Data companies provide ETL and migration tools, some companies specialize in migration, and many large software vendors provide loading tools and programming aids to ensure only good data is loaded. Of course, spreadsheets are a tool any experienced data analyst will use when a quick sample data set needs to be sorted and shared.

As with any good development project, begin with the end in mind. Build in your validation and exception handlers from the start so that it is easy to verify success and address data exceptions. Assume that you will find more rules as you go, and that you will need an easy way to account for exceptions as they occur. A little extra work at the beginning of a big migration project saves hours if not weeks before the job is done. When the job is really complex, we write our load information into a database and create a quick Pentaho CTools dashboard so we can look at the load results and motivate the ETL team to speed things up.
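
As a sketch of that last idea, the Python below writes per-run load metrics to a small table that any dashboard, CTools or otherwise, could read. The schema and values are invented for the example; this is not a built-in Pentaho feature.

```python
# Sketch: write load metrics to a table a dashboard can read.
# The schema and sample values are assumptions for illustration.
import sqlite3, time

conn = sqlite3.connect("migration_metrics.db")
conn.execute("""CREATE TABLE IF NOT EXISTS load_runs (
    run_id TEXT, category TEXT, rows_loaded INTEGER,
    rows_rejected INTEGER, seconds REAL, finished_at TEXT)""")

def record_run(run_id, category, loaded, rejected, seconds):
    conn.execute("INSERT INTO load_runs VALUES (?, ?, ?, ?, ?, ?)",
                 (run_id, category, loaded, rejected, seconds,
                  time.strftime("%Y-%m-%d %H:%M:%S")))
    conn.commit()

record_run("run-042", "voice/residential", 98500, 132, 421.7)
```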

Sometimes a temporary solution, or an alternative to a full migration, is used so that historical data is not fully migrated. This approach requires maintaining the legacy system in 'view' mode so users can see the history. Some teams determine that the new system is so different that migrating the complete history is not justified; within a few years the new system will have better and more useful history. In this situation, the old system is left running in 'view' mode so history can be checked as needed, or until the full migration can be done.

Conversion optimization

The faster the data can be converted, the less time you need to shut down an operation during a migration. There are many ways to speed up a migration. Most teams focus on the business rule development first and then work on optimization; if your plan calls for many testing iterations, you may want to optimize early to allow for more testing cycles. For an outline of how to approach optimization, please see the blog – ETL Performance Troubleshooting with Pentaho Data Integration. While the blog discusses Pentaho PDI, the concepts apply to most ETL or migration efforts.
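
One common and tool-agnostic speed-up is batching inserts and committing per batch rather than per row, timed so the gain can be measured. The sketch below assumes a target table with id and amount columns; it illustrates the idea rather than any specific engine's best practice.

```python
# Sketch: commit inserts in batches rather than row by row, and time the
# load so optimization gains can be measured. Table schema is assumed.
import sqlite3, time

def batch_load(conn, rows, batch_size=5000):
    cur, start = conn.cursor(), time.perf_counter()
    for i in range(0, len(rows), batch_size):
        cur.executemany("INSERT INTO target (id, amount) VALUES (?, ?)",
                        rows[i:i + batch_size])
        conn.commit()                       # one commit per batch
    return time.perf_counter() - start      # seconds, for the run log
```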

Testing iterations

The conversion of data can be repeated until it is perfected, or until the amount of data that cannot be migrated is small enough that manual conversion is faster. As with the initial business-rule-definition phase, having a clear scope of work and timeline can help motivate the team towards a finished result. Each test will yield results that may warrant a partial repetition of the process; that is, results may indicate that business rules have been missed, misunderstood, improperly applied to the transformation or incompletely tested.

It is normal to find exceptions and additional rules during migration, especially as the process moves from the early partial migrations to full migration testing. It is important to categorize and attack the issues holistically. A good spreadsheet with sorted exceptions can turn a massive set of exceptions into a palatable set of problems that may warrant a programmatic solution.
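
Summarizing exceptions programmatically achieves the same effect as the spreadsheet. In this sketch, exceptions is assumed to be a list of (row, errors) pairs such as the triage step above might produce.

```python
# Sketch: sort a mass of exceptions into a short list of problem
# categories; the biggest buckets are candidates for a programmatic fix.
from collections import Counter

def summarize(exceptions):
    counts = Counter(error for _, errors in exceptions for error in errors)
    for error, count in counts.most_common():
        print("%7d  %s" % (count, error))
```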

For large projects that require a large team to complete the migration, track the status of your key conversions. Break the data into categories and report exceptions, migration times and other key variables. Establish goals for migration execution speed and for exception elimination. Sometimes a basic dashboard can show the team progress and keep them motivated when exception counts feel too high.

Summary

Establish the parameters for the project. Break the plan into phases and establish goals for migration execution, data exception and insertion rates over time. Document the rules and decisions that are made and ensure they are kept up-to-date and referenced when group decisions are required. Specify the steps, be specific with the goals and publish the results along the way. Informed team members make good decisions and get results.


Agile Dashboards with Pentaho CTools

Dashboards are very powerful tools for implementing tracking and accountability. They provide a quick and simple means to share data and assess performance objectively – identifying strengths and weaknesses is one of the keys to accountability. Their use is easily justified in large organizations with important goals and often warrants extensive investments of time and money. With an Agile approach, the time and money required can be greatly reduced. And somewhat surprisingly, the faster build often yields more highly tailored solutions that meet even complex tracking and accountability rules.

We have found the following approach allows non-technical business managers to work with highly technical teams to achieve tracking and accountability in a straightforward and cost-effective manner. These steps will work with any dashboard tool that allows changes to the layout, look and feel, data queries and data sources in a flexible environment. Our tool of choice is Pentaho CTools. Whenever possible we start with a template so that the team moves more quickly to touching and using 'real' data to explain their needs and confirm the desired user experience.
Phase I – Getting started – define the dashboard project

  1. Determine the goal
  2. Establish a target date
  3. Determine the team
  4. Establish daily or frequent meeting times, face-to-face or virtual, so the dashboards can be viewed and used
  5. Establish possible sources of data
  6. Write a short summary, confirm it with the team and move into dashboard creation

Phase II – Start with a template

A working dashboard is an easy way to get user buy-in and saves time when explaining the options available to the users. Allowing a user to click and observe is quick and easy. Start with a template when possible. If you have a new team, get help. If you have a similar model, use it. If you do not have a base dashboard, consider purchasing a sample similar to what you need. Identify the changes that need to be made and document them. Be clear – writing it down and getting confirmation saves time in the next phase.

Phase III – Get or create sample data

Whenever possible, build a sample data set to show in your template dashboard. In this way, the users can confirm you understand the dashboard logic and summary points or clarify their needs. It may be possible to create sample data in a spreadsheet so that you can easily create any needed data to test all the filtering, drilldown and summary items. If the template can attach to a spreadsheet, it is often possible to cycle through different data options quickly to keep the user engaged. Avoiding extra meetings or long responses can keep the business users motivated.
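
For instance, a short script can generate a categorically diverse sample set as a CSV the template can attach to. Everything in this sketch, regions, products and revenue figures, is fabricated demo data.

```python
# Sketch: generate a small, categorically diverse sample data set as CSV
# so the template dashboard has filters and drilldowns to exercise.
import csv, random

random.seed(7)  # repeatable demo data
regions, products = ["North", "South", "East", "West"], ["basic", "premium"]
with open("sample_sales.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "product", "month", "revenue"])
    for region in regions:
        for product in products:
            for month in range(1, 13):
                writer.writerow([region, product, month,
                                 round(random.uniform(1000, 9000), 2)])
```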

Phase IV – Identify all the data options that you need

  1. Captions for the filters and columns
  2. Summary rules (total, average, min, max, etc.)
  3. If possible, get your corporate branding so the dashboard looks as professional as your team deserves. With Pentaho CTools, the use of CSS makes design a snap for most web-based developers.
  4. Filter options – get specific with your users. Ask them to define the filter options. Here are typical approaches:
    1.  Dynamic selection based on the data available (see the sketch after this list)
    2.  Pre-determined list – useful when you want to show that some options are not present even though the user knows the data is expected

    3.  Free-form entry for a quick drill-down – so many times a manager needs to get data quickly to address a problem. Make it easy when you can!
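
For the dynamic-selection option, the filter list can simply be whatever values exist in the data at render time, as in this sketch. The sample_sales.db database and sales table are assumptions standing in for a real CTools data source.

```python
# Sketch of the 'dynamic selection' filter option: the filter list is
# whatever values exist in the data, queried when the dashboard renders.
# The database file, table and column names are assumptions.
import sqlite3

conn = sqlite3.connect("sample_sales.db")
options = [row[0] for row in conn.execute(
    "SELECT DISTINCT region FROM sales ORDER BY region")]
print(options)  # e.g. ['East', 'North', 'South', 'West']
```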

Phase V – Rules for invalid data handling

One of the important items that can easily be neglected is 'bad data' handling. Even in the rush to build the dashboard to meet requirements, make sure there is a validation step. Users can be very helpful in defining how to deal with exceptions, as well as what defines an exception. An experienced data developer should also apply a process to ensure all data has been presented and fully included.

Phase VI – Iterate to final

Here lies the fun and the challenge. There is a fine balance between making it great and making it perfect. One must make sure it serves the purpose, but also not get carried away with the power of cyclical development. Review your goals and see if you have achieved them. If so, it may be wise to declare success and start a new project with a new goal. Celebrating a real victory is always helpful.

Summary

Use a template approach and start with a subset of data that is easily manipulated, is representative of the real data to be presented and has enough categorical diversity to demonstrate the desired filters and sorting. Additionally, there needs to be enough detail to enable a reasonable test bed for drill down demonstrations.

Warning: While Agile approaches can greatly improve time to solution, innovation and customer satisfaction, there are times when an Agile approach may not best fit your needs. If developers are unfamiliar with the tools, the fast turn-around that benefits understanding may lead to excessive stress for developers and frustration for non-technical users who must interrupt their daily schedules. New and untested staff may not respond well to the time pressure. Also, an untested development environment and/or the use of production systems may lead to wasted time for the agile team or the production users.
