Election Lines and Name Frequencies


I voted this morning. Far more interesting to me than anything on the ballot was the line for voting. Or rather the four lines.

To obtain a ballot you stood in a line devoted to a portion of the alphabet. Last names beginning with A-D, E-K, L-R, and S-Z each had their own line. I stood waiting in my line (L-R) behind 7 or so people, and in the entirety of my time in line not more than a single person entered any of the other lines.

Perhaps the nice volunteer handing out ballots for L-R was just very slow. But another possibility is that the way names were grouped created an imbalance in line lengths. I can think of lots of last names starting with L-R, but fewer for E-K, for example. This could be important because many people may be discouraged from voting if lines are too long.

Proportion of people whose last names start in different parts of the alphabet

To test whether or not the groupings they chose were reasonable, I downloaded some of the 2010 census data on names. They have compiled a file of last names and how many times they occur. This file only includes names that have appeared nationwide at least 100 times, so extremely rare names are not represented.

I counted the number of individuals with last names beginning in each of these four groupings, and the proportions of individuals falling in each grouping are shown in the bar plot. As you can see, there is a clear overrepresentation of people whose last names start with L-R, and an underrepresentation of names starting with S-Z.
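For anyone who wants to repeat the count, here is a minimal Python sketch. It is not the script behind the bar plot. The column names `name` and `count` are an assumption about the layout of the census file, and the helper names (`group_for`, `group_proportions`, `read_census_rows`) are made up for illustration.

```python
import csv
from collections import Counter

# The four lines at the polling place: (label, first letter, last letter).
GROUPS = [("A-D", "A", "D"), ("E-K", "E", "K"), ("L-R", "L", "R"), ("S-Z", "S", "Z")]


def group_for(name):
    """Return the line label for a last name, or None if it has no A-Z initial."""
    first = name.strip()[:1].upper()
    for label, lo, hi in GROUPS:
        if lo <= first <= hi:
            return label
    return None


def group_proportions(rows):
    """Sum the people (not the distinct names) in each group and normalize."""
    totals = Counter()
    for name, count in rows:
        label = group_for(name)
        if label is not None:
            totals[label] += int(count)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label, _, _ in GROUPS}
    return {label: totals[label] / grand_total for label, _, _ in GROUPS}


def read_census_rows(path):
    """Yield (name, count) pairs; assumes header columns called 'name' and 'count'."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row["name"], row["count"]


# Toy input to show the shape of the result; these are NOT census figures.
toy_rows = [("Smith", 4), ("Lopez", 3), ("Adams", 2), ("Evans", 1)]
toy_result = group_proportions(toy_rows)
```

With the real file, something like `group_proportions(read_census_rows(path))` would give the four bar heights. Note that the sum is weighted by how many people carry each name, so one very common name counts for more than many rare ones. If the file contains a summary row that is not an actual surname, it should be skipped before counting.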

But these census data are nationwide; I was unable to find data specific to Durham County (North Carolina). It is possible that the distribution of names in Durham differs from the nationwide one. One reason would be the ethnic composition of the area.

For example, in California, where the Asian population is very high, I would expect the frequency of names beginning with S-Z to be very high, because there are many Chinese surnames that begin with W, X, Y, and Z. Durham County, and especially my precinct, is a largely black community. If the last names of blacks in Durham tend to begin with L-R more than the national average, this could magnify the already existing bias.

The probability of ethnicity given your last name

Whether or not this is true is not an easy question. The 2010 census data show that some names are very indicative of ethnicity (see the table reproduced here). For example, if your last name is Yoder, the chance you are white is 98.1%. There are many such names that correlate closely with ethnicity for whites, Asians, and Hispanics. But among blacks the names are not as telling. Besides Washington (89.9%), Jefferson (75.2%), and Booker (65.6%), there is no name in the dataset where the probability of being black given that you have that name is over 60%. This is not at all true for the other ethnicities. This means that black names are perhaps more likely to represent the national average than are those of other ethnicities.
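The 60% cutoff above is just a filter on the probability of a group given a name. As a rough sketch, assuming the table has been reduced to (name, group, percent) records, it could look like the following. The record layout and the function name `names_above` are hypothetical, and the only figures included are the four quoted in this post.

```python
# (name, group, percent) figures quoted in the post; not the full census table.
RECORDS = [
    ("Yoder", "white", 98.1),
    ("Washington", "black", 89.9),
    ("Jefferson", "black", 75.2),
    ("Booker", "black", 65.6),
]


def names_above(records, group, threshold):
    """Names whose P(group | name), in percent, is over threshold, highest first."""
    hits = [(pct, name) for name, g, pct in records if g == group and pct > threshold]
    return [name for pct, name in sorted(hits, reverse=True)]


over_60 = names_above(RECORDS, "black", 60)
over_80 = names_above(RECORDS, "black", 80)
```

Run over the full table, the same filter would return only those three names for the black category at the 60% cutoff, according to the observation above.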

Perhaps most interesting to me is that this is the kind of problem that would be more easily addressed in an analog fashion than a digital one. If I only had a Durham County phone book, I could simply count the pages of names A-D, E-K, L-R, and S-Z. This would have controlled for geography and demography, and would probably have been faster too!

One Comment
  1. 2012.11.06 09:05

    I’m shocked that you’ve managed to find a use for a phone book. I thought those days were behind us. Congratulations on this seemingly impossible achievement, Chris.
