Terry's TMG Tips

Merging Projects and Data Sets

This page updated 30 Apr 2005

Version note: Applies to TMG 8 & 9

Users occasionally find that they have two Projects that they want to combine, or two Data Sets that they want to combine. The separate Projects or Data Sets may exist because the user initially entered data into them separately. Or, perhaps more commonly, one receives a data file from another researcher and wants to combine that information with one's existing Data Set.

Because the process seems a bit complex, and because users need to use this process only rarely, the subject seems to generate considerable confusion. This article explores how the merge process works, illustrates the steps involved, and provides some cautions and thoughts for consideration. It first covers merging of Projects, then merging Data Sets. If you are not comfortable with the terms "Project" and "Data Set" you may find my article on Projects and Data Sets helpful.

Topics Included in this Article
Merging Projects - An overview of merging Projects
Mechanics of Merging Projects - Merging Projects, step-by-step
Merging Data Sets - An overview of merging Data Sets
Mechanics of Merging Data Sets - Merging Data Sets, step-by-step
Potential Conflicts - Issues that can arise in merging Data Sets
Using Copy Persons - Consider using Copy Persons rather than merging Data Sets
Merging a Good Practice? - Should you? An editorial

While there are a number of reasons for needing to merge Projects and Data Sets, for purposes of the examples here we will assume that we are combining data received from another researcher with our own existing data (but see my editorial below for reasons why you might not really want to do this). The process is the same regardless of the reason, but this assumption helps maintain consistent labeling in the various steps of the examples for clarity.

Merging Projects

We will start with two Projects that we want to merge, perhaps to allow easier analysis of the people they contain, to allow merging of the Data Sets in them, or to allow copying of specific people from one Project into the other. We will assume one is our regular Project and the other is one we received from a cousin.

You would need to work with separate Projects in this case only if you receive (or create) separate Projects. If you receive a GEDCOM file or a file from another genealogy program, you can import it directly into your existing Project and avoid the step of merging Projects. To import into an existing Project, either 1) import from the Data Set Manager, accessed from the File menu, or 2) with your existing Project open, use the File > Import command. Then, at Step 3 of the Import wizard, choose "Add to the current project," which is the default choice.

We start with our two Projects:

[Diagram: two separate Projects, "My Project" and "Cousin Bob's Project," each containing its own Data Set]

After we Merge the two Projects, we find this result:

[Diagram: "My Project" now contains both Data Sets; "Cousin Bob's Project" remains unchanged]

The result may not be exactly what you expected. We still have two Projects. One now contains the Data Sets formerly found in both, but the other remains unchanged. In other words, the Data Set(s) in one Project are copied to the other. The settings you choose in the Merge screen determine which becomes the combined one and which remains unchanged (see the next section for details).

Note that while "My Project" now contains all the people formerly found in both Projects, those people remain in their respective Data Sets. That's good if you want to examine the people from "Cousin Bob's Project" and possibly copy a few into your working data. But if you intend to fully merge and interconnect all the people, you must now merge the Data Sets, as described in a section below.

Since the two Data Sets remain separate, you can examine them together, copy people or information between them if you choose, and delete the second Data Set, leaving your original Data Set untouched. Also, because they remain separate, there can be no conflicts arising from different Tag Types, Source Types, or any other differences in custom settings between the two Data Sets, such as can occur when you merge Data Sets or copy people from one to the other.

Mechanics of Merging Projects

Now we will look at the actual steps required to merge two Projects, as depicted schematically above. We start by opening one of the two Projects. Then open the Merge Projects screen, using File > Merge Projects on the menu:

[Screenshot: Merge Projects screen]

The Project that was already open appears on the upper line, as "Project A." We enter the second Project as "Project B," either by typing in the path and file name, or clicking the [...] button on the right and selecting it. The next step is the key one in defining which Project becomes the combined Project containing all the Data Sets from both. In the box below the Project names, choose "Import B to A" or "Import A to B." I find the comment in parentheses to be helpful in figuring out which to choose – the one that "remains intact" will be unchanged; the other becomes the combined Project. In this case we choose to have Cousin Bob's Project "remain intact" so that we get all the Data Sets copied into our original Project.

After the merge, if we open Data Set Manager (File menu) we see that there are now two Data Sets in "My Project:"

[Screenshot: Data Set Manager]

If we want, we can select the new Data Set and click the Edit button and change the name of the Data Set to better tell us what it is, as I have here. We could also enter a note in the Data Set Memo to describe where the Data Set came from.

Since there is now more than one Data Set in the Project, each person's ID# is now preceded by the Data Set number. We can see this anywhere ID#s are displayed, for example on the Picklist:

[Screenshot: Expanded Picklist]

In this case, both Data Sets contain what would appear to be the same people. Note for example Mary Smith, who appears both as ID# 1:2 and ID# 2:2. This means there is a person named Mary Smith in each of the two Data Sets.

Next we will consider how to merge the Data Sets within a Project.

Merging Data Sets

We will start with a Project with two Data Sets. In our example it will be the Project created above by merging two Projects. But it could just as well be a Project into which we have imported a GEDCOM file we received, or one in which we initially created two Data Sets for entering two different family lines and have now decided we want them combined. Our Project looks like this:

[Diagram: "My Project" containing two separate Data Sets]

After we merge the Data Sets, we have this:

[Diagram: "My Project" after the merge, containing the combined Data Set and the unchanged second Data Set]

Note that, as when we merged Projects, we now have one Data Set that contains all the people previously in both Data Sets, and we still have the second Data Set, unchanged. That is, the people in one Data Set are copied to the other. The settings you choose in the Merge screen determine which becomes the combined one and which remains unchanged (see the next section for details).

All the people formerly in Data Set #2 will appear twice in the Project: once in the combined Data Set and once in their original Data Set. You will likely want to delete the second Data Set (using Data Set Manager) after the merge operation.

Mechanics of Merging Data Sets

Now we will look at the actual steps required to merge two Data Sets, as depicted schematically above. We start by opening the Data Set Manager, from the File menu:

[Screenshot: Data Set Manager]

We see the two Data Sets, as we did above. Note that "My Data Set" is selected. Now we click the Merge button to open the Merge Data Sets screen:

[Screenshot: Merge Data Sets screen]

We would select the Data Sets to be merged from the drop-down lists opened with the buttons on the right, but with only two Data Sets in the Project we don't have to do that, as both are selected automatically, with the one that was selected in the Data Set Manager appearing as Data Set A. As with merging Projects, we then select which Data Set becomes the combined one containing all the people from both. In the box below the Data Set names, choose "Merge B to A" or "Merge A to B." I find the comment in parentheses to be helpful in figuring out which to choose – the one that "remains intact" will be unchanged; the other becomes the Data Set with the combined set of people. In this case we choose to have Cousin Bob's Data Set "remain intact" so that all the people are combined in our original Data Set.

Note that after the Merge, we still see both Data Sets in the Data Set Manager:

[Screenshot: Data Set Manager]

That's because, as we saw in the schematic diagram, both Data Sets remain. After reviewing to see that everything worked as expected, you might want to come back to the Data Set Manager, select Data Set #2, and click the Delete button to delete that Data Set.

What is not evident from the Data Set Manager is that Data Set #1 now contains a copy of every person in Data Set #2. But we can see that by looking at the Picklist:

[Screenshot: Expanded Picklist]

We can see here that each person from Data Set #2 now appears twice. Look, for example, at Mary Smith. We see her as ID# 1:2, which is her record from our original Data Set. We also see her as ID# 2:2, which is her record as found in Data Set #2, the data that came from Cousin Bob. But because we merged the two Data Sets, the Mary Smith from Cousin Bob's data was also copied to Data Set #1, where she appears as ID# 1:6. The other people follow the same pattern, though for them Cousin Bob has entered slightly different given names.

If we decide, as appears likely here, that our Mary Smith and Cousin Bob's Mary Smith are the same person, we could then merge ID# 1:2 and ID# 1:6, because they are in the same Data Set. We cannot merge either of them with ID# 2:2, because you cannot merge people in different Data Sets. Merging people is discussed in my article on Merging People.

Potential Conflicts when Merging Data Sets

When you merge Data Sets TMG must undertake a highly complex process, copying not only the people from one Data Set to the other, with all their attached Tags and Sources, but also dealing with potentially dissimilar Tag Types, Source definitions, Source Types, and Source Elements, among other issues. The user is more likely to obtain the intended result if he or she has a working understanding of some of the issues involved.

Some of the most significant considerations are discussed below. The extent to which these issues affect a specific situation depends mostly on the complexity of the data being merged. If both Data Sets use only default Tag Types, Source Types, and the like, there will be no issue. Even if you have customized some of these items, merging a Data Set created by importing a simple GEDCOM file with no sources into your own Data Set will create no issues, provided you merge the GEDCOM Data Set into your own, and not the reverse. By contrast, merging two TMG Data Sets in which extensive use has been made of TMG's advanced customization features will present the most complications.

In most cases the exact result when you have such conflicts depends on which Data Set is merged into the other. In the discussion below I use the terms "sending," "receiving" and "combined" Data Sets:

Receiving:
Data Set to which the people from the sending Data Set will be copied during the merge. It becomes the combined Data Set.
Sending:
Data Set from which people are copied into the receiving Data Set, to produce the combined Data Set. It is the one that "remains intact," that is, unchanged, after the merge process.
Combined:
Data Set after the merge that contains all the people formerly in both the sending and receiving Data Sets. The receiving Data Set becomes the combined Data Set after the merge.

Which Data Set is the sending one and which is the receiving one is controlled by the choice selected in the lower box on the Merge Data Sets screen, as shown in the screenshot above.

The potential areas of conflict, and how those conflicts are treated, are as follows:

If the Custom Source Category (the default for new Projects created in TMG 5 or later) is used, all Source Types in the sending Data Set that are different in any way from those in the receiving Data Set will be copied to the receiving Data Set and will be applied to the copied Sources. If the two Data Sets have Source Types with the same name but different output templates, a number will be appended to the name of the one copied from the sending Data Set to distinguish it from the other.

If the Evidence (Mills) or Cite Your Sources (Lackey) categories are used in the Project, and any of the default Source Types has been modified, the merge will not be allowed. You must follow the directions on the screen that appears to 1) change the Source Category to Custom, then 2) Initialize to the Evidence or Cite Your Sources category before you can complete the merge operation. It is very important that you do both these steps, or your customizations to the default Source Types will be lost.

If both Data Sets have custom Flags with the same name, only one Flag of that name will remain in the combined Data Set. Each person will retain the Flag values that were set in the original Data Sets. If the allowable values for the Flag are different in the two Data Sets, the values in the receiving Data Set will prevail. However, persons from the other Data Set will retain their previous values, even if those values are no longer "legal" in the combined Data Set.

If one or both Data Sets have custom Flags not present in the other, those Flags will be present in the combined Data Set. People from a Data Set that did not previously have a given Flag will be set to that Flag's default value.
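The Flag rules above amount to a small set of merge rules. The following Python sketch models them purely for illustration; it is not how TMG is implemented. The classes `FlagDef` and `DataSet`, the functions `merge_flags` and `illegal_values`, the storage of allowable values as a string of single-character values, and the Flag names VERIFIED, DNA and BOB_LINE are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlagDef:
    """Hypothetical model of a custom Flag definition."""
    name: str
    allowed: str   # allowable values, one character each (assumption)
    default: str


@dataclass
class DataSet:
    """Hypothetical model: Flag definitions plus each person's Flag values."""
    flags: dict[str, FlagDef]
    people: list[dict[str, str]]


def merge_flags(receiving: DataSet, sending: DataSet) -> DataSet:
    """Combine Flags per the rules above; `sending` is left untouched."""
    # Same-named Flags collapse to one, keeping the receiving definition;
    # Flags found in only one Data Set are carried into the combined one.
    combined_flags = dict(receiving.flags)
    for name, fdef in sending.flags.items():
        combined_flags.setdefault(name, fdef)

    def copy_person(person: dict[str, str],
                    own_flags: dict[str, FlagDef]) -> dict[str, str]:
        result = dict(person)  # existing values are kept, legal or not
        for name, fdef in combined_flags.items():
            if name not in own_flags:  # Flag absent from this person's Data Set
                result[name] = fdef.default
        return result

    people = [copy_person(p, receiving.flags) for p in receiving.people]
    people += [copy_person(p, sending.flags) for p in sending.people]
    return DataSet(combined_flags, people)


def illegal_values(ds: DataSet) -> list[tuple[int, str, str]]:
    """List (person index, Flag name, value) where a value is not allowed."""
    return [
        (i, name, value)
        for i, person in enumerate(ds.people)
        for name, value in person.items()
        if value not in ds.flags[name].allowed
    ]


# Hypothetical example: VERIFIED exists in both Data Sets with different
# allowable values; DNA and BOB_LINE each exist in only one of them.
recv = DataSet(
    flags={"VERIFIED": FlagDef("VERIFIED", "NY", "N"),
           "DNA": FlagDef("DNA", "NY", "N")},
    people=[{"VERIFIED": "Y", "DNA": "Y"}],
)
send = DataSet(
    flags={"VERIFIED": FlagDef("VERIFIED", "NY?", "?"),
           "BOB_LINE": FlagDef("BOB_LINE", "NY", "Y")},
    people=[{"VERIFIED": "?", "BOB_LINE": "N"}],
)
combined = merge_flags(recv, send)
print(combined.people)
print(illegal_values(combined))
```

In this model, the person from Cousin Bob's Data Set keeps VERIFIED = "?" even though the combined definition (taken from the receiving Data Set) allows only N or Y, while each person picks up the default value for the Flag their own Data Set lacked.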

Using Copy Persons Rather than Merging Data Sets

The preceding sections discussed merging two Data Sets, as you might do if you were combining two Data Sets you had created separately, or if you wanted to combine some received data with your own. But if you merge two Data Sets that contain a lot of the same people, you end up with a lot of duplicate people to merge, one-by-one. Or, you may want to include only a few of the people from the second Data Set. In these cases, consider using the Copy Person(s) function instead. It allows you to copy only selected persons from one Data Set to another, while merging Data Sets causes everyone to be added to the receiving Data Set. The Copy Person(s) function is described in my article on Copying or Moving People.

Is Merging Received Data a Good Practice? An Editorial

In this article, I've used as an example a Data Set containing data received from a cousin, showing how it can be merged into one's own Data Set. In fact, I cannot recommend this practice.

I've used it as an example because it is simple and thus serves well to communicate the process, and because it seems that at least some users want to do exactly what the example describes. There are some very good reasons to merge Data Sets, including merging Data Sets one originally created separately and now wants combined. But as a general matter, I believe that merging data received from another researcher into one's own Data Set is a poor practice.

"Why?" you might ask. "Isn't merging a received file a lot easier than inputting all the data manually?" Easier, yes, but not better, in my opinion, nor in the opinions of other experienced users who have commented on the issue on the TMG users e-mail list, TMG-L. Some of the reasons why I think it's better to reenter the data than to use Data Set Merge or Copy Person(s) to incorporate a received file into one's own data:

That is not to say that I advocate manually retyping all the data you receive from another researcher. That is not only a lot of work, but invites new errors. Rather, I use the standard Windows copy and paste functions to transfer names, dates, places and other items from the received record to their proper places in TMG screens, correcting them to meet my data entry standards as I go. This copy and paste technique works equally well whether the received data is in the form of a PDF, word processor or rich text file, a GEDCOM or other genealogy program file, or a TMG Project.

If the received data is any form of genealogy program file, import it or open it in TMG, preferably as a separate Project. Then open a second copy of TMG, with your Project in one and the received data in the other. Arrange the two copies of TMG so that at least some of each is visible on your screen. Then use Windows copy and paste functions to copy the data from the received Project to your own.



Copyright 2000- by Terry Reigel