Genetic Affairs’ AutoCluster now has an “Enriched Surnames” feature!

I swear Dr. Blom’s brain never sleeps. His AutoCluster tool quickly and easily forms shared match clusters from DNA test results without users having to download any software. It should be in everyone’s DNA analysis toolbox. And thanks to that brain of his, AutoCluster now also analyzes the prevalence of all user-entered site profile surnames and locations. This new feature, called Enriched Surnames, is automatically included with any AutoCluster run and works with 23andMe and FTDNA. (It will likely not be implemented for Ancestry because of timing limits and that bear we’re trying not to poke.)

How does this Enriched Surnames feature work?

For each cluster, the user profile surnames are analyzed for number of occurrences in that particular cluster compared to the number of occurrences of that surname amongst all the shared matches included in the chart. From there, the most common surnames in a cluster are displayed in an Enriched Surnames summary chart by order of cluster number and in a Detailed Enriched Surname table.
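For the code-curious, the counting step described above can be sketched in a few lines of Python. This is only an illustration, not AutoCluster’s actual code: the `matches` list, its field names, and the surnames in it are all made up.

```python
from collections import Counter

# Hypothetical data: each match has a cluster number and the surnames
# entered in their profile. None of this comes from a real AutoCluster run.
matches = [
    {"cluster": 1, "surnames": ["Smith", "Jones"]},
    {"cluster": 1, "surnames": ["Smith"]},
    {"cluster": 2, "surnames": ["Smith", "Zerhusen"]},
    {"cluster": 2, "surnames": ["Zerhusen"]},
    {"cluster": 2, "surnames": []},  # a match who entered no surnames
]

def surname_counts(matches):
    """Return (overall, per_cluster) counts of matches listing each surname."""
    overall = Counter()
    per_cluster = {}
    for m in matches:
        names = set(m["surnames"])  # count each match once per surname
        overall.update(names)
        per_cluster.setdefault(m["cluster"], Counter()).update(names)
    return overall, per_cluster

overall, per_cluster = surname_counts(matches)
for cluster, counts in sorted(per_cluster.items()):
    for name, n in counts.most_common():
        print(f"Cluster {cluster}: {name} appears {n} time(s), {overall[name]} in the whole chart")
```

Comparing the “this cluster” count with the “whole chart” count for each surname is the heart of the idea. A surname that shows up mostly in one cluster is a better clue than one sprinkled evenly across the chart.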

Some clusters have no shared surnames or no surnames entered, so the list will not contain surnames for every cluster. Surnames within parentheses, such as (Akin-Atkin) on Cluster 21 in the summary chart below, are grouped together based on close spelling. Because there are no checks on what the user enters for surnames, you may also see words that are merely misplaced notes, such as the (Father-Father’s) entry on Cluster 21.
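How the tool decides that two spellings are “close” is not spelled out, so the following is just one plausible way to do it, using the similarity ratio from Python’s standard `difflib` module. The `group_similar` function and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def group_similar(surnames, threshold=0.8):
    """Greedily group surnames whose spelling is close to a group's first name.

    The threshold is an assumed value, not one taken from AutoCluster.
    """
    groups = []
    for name in surnames:
        for group in groups:
            ratio = SequenceMatcher(None, name.lower(), group[0].lower()).ratio()
            if ratio >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no close spelling found, start a new group
    return groups

groups = group_similar(["Akin", "Atkin", "Stegemann", "Stegman", "Smith"])
labels = ["-".join(g) for g in groups]
print(labels)
```

A purely spelling-based grouping like this also explains why notes such as Father and Father’s end up lumped together: the comparison has no idea whether a word is really a surname.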

The chart also includes user-entered locations if they meet occurrence thresholds. You’ll see Kentucky and Virginia in Cluster 11 and Poland in Cluster 31. Because the surnames and locations are combined into the same field, take care whenever a country that could also be a surname (like England) appears. And ignore all those Uniteds and States (unless of course…).

The summary chart for Enriched Surnames and locations is found between the AutoCluster chart and the AutoCluster match information table. Just keep scrolling.

Image of Enriched Surnames Summary Table

What else do we get with Enriched Surnames?

You also get a Detailed Enriched Surname Table. Scroll down! Here the clusters are listed in order of their “pvalue” as it relates to the overrepresented surnames. This means the surnames most strongly overrepresented in their cluster appear at the top. This data table lists each enriched surname by cluster and includes the total number of matches in the AutoCluster chart that also have this surname listed. In addition, it shows the number of matches with that name in this particular cluster. In the example below, for Cluster 37, the name Stegemann-Stegman appears three times in the AutoCluster chart and three times in Cluster 37 (meaning all of the Stegemann-Stegmans are in the same cluster).
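The post does not say which statistical test produces the “pvalue,” so take this as a sketch of one common choice for this kind of overrepresentation question, the hypergeometric test. The function name is invented, and the chart size (100 matches) and cluster size (5 matches) are made-up numbers; only the “three in the chart, three in the cluster” counts echo the Stegemann-Stegman example.

```python
from math import comb

def enrichment_pvalue(total_matches, total_with_name, cluster_size, cluster_with_name):
    """Chance of seeing at least `cluster_with_name` carriers of a surname in a
    cluster of `cluster_size` drawn at random from the whole chart.

    Assumes a hypergeometric model; the real tool may use something else.
    """
    upper = min(total_with_name, cluster_size)
    tail = sum(
        comb(total_with_name, i) * comb(total_matches - total_with_name, cluster_size - i)
        for i in range(cluster_with_name, upper + 1)
    )
    return tail / comb(total_matches, cluster_size)

# Hypothetical chart of 100 matches, 3 of whom list the surname,
# and all 3 sit in one cluster of 5 matches.
p = enrichment_pvalue(total_matches=100, total_with_name=3,
                      cluster_size=5, cluster_with_name=3)
print(p)  # 1 in 16,170 under these assumed numbers
```

The smaller the value, the less likely it is that the surname piled up in that cluster by chance, which is why a table sorted this way puts the most telling surnames first.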

If there are more members in the “all clusters” column than in the “this cluster” one, the surname will also be found in the user-entered surnames of one or more matches outside this cluster. To help link family branches together, seek out the other instances of the surname. Use Ctrl-F to search within the page, though this may not find them all. The all_matches spreadsheet would be a good place to search too, once Evert-Jan fixes a tiny issue with that.

Hover over the surnames for a pop-up that indicates how many total matches are in the cluster and how many of those matches have surnames and/or locations listed in their profiles. Surnames with multiple spellings are also broken down by the number of occurrences of each spelling.

Detailed Enriched Surname Table Image

Why are Enriched Surnames cool?

A surname and location prevalence table included with an AutoCluster identifies surnames and locations of interest and saves the time and effort of doing that analysis yourself. Plus, quick recognition of known surnames can identify the likely branch for that cluster. But remember, not every tester enters surnames or location information. There may be numerous match descendants of Granny Zerhusen in the AutoCluster chart, but if none of them entered her surname, it cannot appear in the Enriched Surname table. What will Evert-Jan think of next? Stay tuned.


