What the numbers mean, or, what the tools can and cannot do

The search tools can help you find alternate spellings for your favorite names or find names based on spelling and popularity. Skip them if you already know the spelling variants for all the names you want to research or if you want to use the names as we have grouped them. If you do use them, try out a few different settings for the similarity percentage until you are comfortable that you're probably not missing a real variation on your name.

The ranking tools analyze the SSA lists to calculate "combined rank" and percentage occurrence for names and their spelling variations. When we report the "combined rank" of a group of similar names that you have entered, we do not take into account the combined rank of any other groups of similar names. For example, the reported combined rank of "John, Jon, Jonathan" is number one in 2002, with a total of 33521 babies. It is only number one because we are comparing it to the single entry Jacob (30122 babies) on the original list rather than the combination "Jacob, Jake, Jakob" (35722 babies). This disclaimer about combined rank does not apply to the ranking of name groups when you use our groupings, with the caveat that those groupings are arbitrary and subjective. In those cases we are comparing the total counts for all names in a group against the total counts of other groups.

Because it can be difficult to assign real meaning to the combined rankings, it might be more useful to consider the percentages. Parents looking for unique, or at least less common, baby names can use the percentages as indicators of how many similarly named children they will come across in playgroups, schools, sports teams, etc. Knowing that there might be six (1.52% for the single name in 2002) or seven (1.83% for similar names as we have them grouped) Jacobs in your son's graduating class of 400 in 2020 is likely more relevant than knowing that the name is ranked number one.
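
The difference between the two kinds of ranking, and the class-size arithmetic, can be sketched in a few lines of Python. This is only an illustration, not the site's actual code: the function names `rank_of` and `expected_count` are hypothetical, and the competitor lists hold just the one Jacob entry quoted above, where a real list would have a thousand entries.

```python
def rank_of(total, competitor_totals):
    """Return the 1-based rank of `total` among the competing totals."""
    return 1 + sum(1 for other in competitor_totals if other > total)


def expected_count(percent, group_size):
    """Expected number of children with a name in a group of the given size."""
    return percent / 100 * group_size


# Counts for 2002 as quoted in the text above.
john_group = 33521    # "John, Jon, Jonathan" combined
jacob_single = 30122  # the single entry "Jacob"
jacob_group = 35722   # "Jacob, Jake, Jakob" combined

# A user-entered group is compared against single entries on the list.
rank_vs_single_entries = rank_of(john_group, [jacob_single])

# With our groupings, group totals are compared against other group totals.
rank_vs_groups = rank_of(john_group, [jacob_group])

# Percentages applied directly to a class of 400, as in the text.
jacobs_single_name = round(expected_count(1.52, 400))
jacobs_grouped = round(expected_count(1.83, 400))

print(rank_vs_single_entries, rank_vs_groups)
print(jacobs_single_name, jacobs_grouped)
```

The sketch applies the percentage to the whole class, as the example in the text does. If you read the percentage as a share of boys only, pass the number of boys in the class as `group_size` instead.
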
The accuracy of the calculated percentage depends, of course, on our knowledge of the sample population. For the single-year top 1000 lists (1990-2003, for example) the SSA does not publish the total population. To overcome this data deficit, we have extrapolated total population numbers from the population with names on the top 1000 and the ratios derived from the 1990s top 1000 decadal lists (where we do have both the number of babies with top 1000 names and the total number of babies in the SSA's 5% sample population). Those factors are 1.17 for boys and 1.34 for girls. Taking data from 1998, these numbers roughly check with the fact that 3.8M babies received SSNs in FY1998. When we report that babies with the name "Jacob" made up 1.52% of the population in 2002, we mean that they made up 1.52% of the approximately 1.98M boy babies who received SSNs. There were 1.7M boy babies with names on the top 1000 list that year: 1.7*1.17 ~= 1.98. For comparison, according to the National Center for Health Statistics there were 2,057,979 boys born in 2002.

Technical Details of the Name Similarity Tool

The similarity algorithm used for the pink tool on the search page is based on the fstrcmp (fuzzy string comparison) function in GNU diff, which in turn is based on the algorithm described in "An O(ND) Difference Algorithm and its Variations", Eugene Myers, Algorithmica Vol. 1 No. 2, 1986, pp. 251-266. Increase the similarity setting if you find that you are getting too many seemingly dissimilar results. Values of half-mostly (50-70%) have worked well in testing. The search lists consist of the 2327 (boys) and 2883 (girls) unique names that appear on the SSA top 1000 lists for 1900s-2003.

How we determine the trend of a name

The trend score is a weighted average of the yearly increase of a name's (or name group's) population over the past decade. The most recent five years are weighted more heavily than the preceding five years.
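
A weighted average of this kind can be sketched as follows. The exact weights are not stated here, so the 2-to-1 weighting is an assumption for illustration only, as are the function names `yearly_changes` and `trend_score` and the choice to measure each yearly increase as a percent change.

```python
def yearly_changes(populations):
    """Percent change from each year to the next, oldest year first."""
    return [
        100.0 * (curr - prev) / prev
        for prev, curr in zip(populations, populations[1:])
    ]


def trend_score(populations, recent_weight=2.0, older_weight=1.0):
    """Weighted average of yearly percent changes.

    The five most recent changes get `recent_weight`; earlier changes get
    `older_weight`. Both weights are assumed values, not the site's own.
    Needs at least two years of data.
    """
    changes = yearly_changes(populations)
    n_recent = min(len(changes), 5)
    n_older = len(changes) - n_recent
    weights = [older_weight] * n_older + [recent_weight] * n_recent
    return sum(w * c for w, c in zip(weights, changes)) / sum(weights)


# Hypothetical populations over eleven years (ten yearly changes).
late_riser = [100] * 6 + [200, 400, 800, 1600, 3200]   # flat, then doubling
early_riser = [100, 200, 400, 800, 1600, 3200] + [3200] * 5  # doubling, then flat

print(trend_score(late_riser))
print(trend_score(early_riser))
```

Both hypothetical names double five times over the decade, but the one whose growth falls in the most recent five years ends up with the higher score under these assumed weights.
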
If a name is off the list in a given year, we assume a population equal to one less than the population of the last name that did make the list in that year. This allows for a conservative calculation of percent change in the first year of a name's appearance.

Keep in mind that popularity and trend are not necessarily related. The hottest names (with average yearly increases in population greater than 100% in some cases) often get their high trend score by making a first appearance on the lists at a relatively high rank (500 or so). Assuming a position of 1001 in the previous year, this tends to represent a big jump in population. This can even occur for names not on the list in the most recent year, if at some point over the past decade the name was suddenly very hot and then fell back off the lists.

Names tend to drop off the list much more slowly. The most negative scores are typically declines of 20-30%.
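
The off-list assumption can be illustrated with a short Python sketch. All the numbers below are hypothetical, and the function names `fill_off_list_years` and `percent_change` are invented for this example; only the "one less than the last listed name" rule comes from the description above.

```python
def fill_off_list_years(populations, last_listed_populations):
    """Fill in years where a name is off the list (marked None).

    Each missing year is assumed to be one less than the population of the
    last name that did make the list in that year.
    """
    return [
        cutoff - 1 if pop is None else pop
        for pop, cutoff in zip(populations, last_listed_populations)
    ]


def percent_change(previous, current):
    """Percent change from one year's population to the next."""
    return 100.0 * (current - previous) / previous


# Hypothetical name: off the list in year one, debuts in year two with
# 400 babies. The last listed name had 150 babies in year one.
name_populations = [None, 400]
last_listed = [150, 155]

filled = fill_off_list_years(name_populations, last_listed)
first_year_jump = percent_change(filled[0], filled[1])

print(filled)
print(first_year_jump)
```

In this made-up case the name is treated as having had 149 babies in the year before it appeared, so its debut counts as a jump of roughly 168%. The true prior population could only have been lower, which is why the estimate of the increase is conservative.
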