DIY Big Year: A Geeky Look At Data
Oct 2, 2017 | by Greg Miller
If you know me, then you know I like numbers. A lot. Actually, I love data. It can be so powerful. But it can also be misleading and confusing.
In an early DIY Big Year post I told you about some eBird data that I have been wrangling for over a year now. In this post I want to give you a peek at some different ways to look at the data I have collected.
Let’s have some fun with Top 10 lists. I have data rolled up for 299 Counties in the United States. The Counties are found all over. All 50 States are represented. Checklist data is from eBird (http://ebird.org) from 2006-2016 as of September 2016. Area and population data come from census.gov.
10 Largest Counties
Rank | County | Area (sq mi) |
1 | San Bernardino County, CA | 20,057 |
2 | Coconino County, AZ | 18,619 |
3 | Nye County, NV | 18,182 |
4 | Kenai Peninsula County, AK | 16,075 |
5 | Mohave County, AZ | 13,311 |
6 | Inyo County, CA | 10,181 |
7 | Maricopa County, AZ | 9,200 |
8 | Pima County, AZ | 9,187 |
9 | Kern County, CA | 8,132 |
10 | Yavapai County, AZ | 8,124 |
10 Smallest Counties
Rank | County | Area (sq mi) |
299 | New York County, NY | 23 |
298 | San Francisco County, CA | 47 |
297 | Suffolk County, MA | 58 |
296 | Richmond County, NY | 58 |
295 | Kings County, NY | 71 |
294 | Newport County, RI | 102 |
293 | Queens County, NY | 109 |
292 | Los Alamos County, NM | 109 |
291 | Clarke County, GA | 119 |
290 | Philadelphia County, PA | 134 |
10 Counties with Largest Population
Rank | County | 2016 Population Estimate |
1 | Los Angeles County, CA | 10,137,915 |
2 | Cook County, IL | 5,203,499 |
3 | Harris County, TX | 4,589,928 |
4 | Maricopa County, AZ | 4,242,997 |
5 | San Diego County, CA | 3,317,749 |
6 | Orange County, CA | 3,172,532 |
7 | Miami-Dade County, FL | 2,712,945 |
8 | Kings County, NY | 2,629,150 |
9 | Dallas County, TX | 2,574,984 |
10 | Riverside County, CA | 2,387,741 |
10 Counties with Smallest Population
Rank | County | 2016 Population Estimate |
299 | Cameron Parish, LA | 6,882 |
298 | Custer County, SD | 8,596 |
297 | Brewster County, TX | 9,200 |
296 | Mono County, CA | 13,981 |
295 | San Juan County, WA | 16,339 |
294 | Socorro County, NM | 17,027 |
293 | Mariposa County, CA | 17,410 |
292 | Inyo County, CA | 18,144 |
291 | Los Alamos County, NM | 18,147 |
290 | Teton County, WY | 23,191 |
10 Most Densely Populated Counties
Rank | County | Population per sq mi |
1 | New York County, NY | 71,999 |
2 | Kings County, NY | 37,124 |
3 | Queens County, NY | 21,497 |
4 | San Francisco County, CA | 18,581 |
5 | Suffolk County, MA | 13,486 |
6 | Philadelphia County, PA | 11,692 |
7 | Richmond County, NY | 8,155 |
8 | Cook County, IL | 5,504 |
9 | Nassau County, NY | 4,782 |
10 | Bergen County, NJ | 4,031 |
10 Least Densely Populated Counties
Rank | County | Population per sq mi |
299 | Brewster County, TX | 1.5 |
298 | Inyo County, CA | 1.8 |
297 | Nye County, NV | 2.4 |
296 | Socorro County, NM | 2.6 |
295 | Kenai Peninsula County, AK | 3.6 |
294 | Mono County, CA | 4.6 |
293 | Cameron Parish, LA | 5.4 |
292 | Custer County, SD | 5.5 |
291 | Teton County, WY | 5.8 |
290 | Coconino County, AZ | 7.6 |
10 Counties with Highest Number of Checklists
Rank | County | Total Checklists |
1 | Los Angeles County, CA | 124,721 |
2 | Cook County, IL | 110,781 |
3 | Pima County, AZ | 104,968 |
4 | Tompkins County, NY | 89,995 |
5 | San Diego County, CA | 87,942 |
6 | Middlesex County, MA | 75,238 |
7 | King County, WA | 73,768 |
8 | Essex County, MA | 72,725 |
9 | Harris County, TX | 69,955 |
10 | St. Louis County, MN | 69,352 |
10 Counties with Lowest Number of Checklists
Rank | County | Total Checklists |
299 | Custer County, SD | 2,034 |
298 | Hancock County, MS | 2,097 |
297 | Ward County, ND | 2,238 |
296 | Pulaski County, KY | 3,218 |
295 | Cass County, ND | 3,324 |
294 | Harrison County, MS | 3,527 |
293 | Dodge County, NE | 4,349 |
292 | Nye County, NV | 4,450 |
291 | Benton County, AR | 4,618 |
290 | Washington County, AR | 4,621 |
10 Counties with Highest Number of Species
Rank | County | Total Species |
1 | Los Angeles County, CA | 494 |
2 | San Diego County, CA | 488 |
3 | Santa Barbara County, CA | 448 |
4 | Cochise County, AZ | 440 |
5 | San Francisco County, CA | 439 |
5 | Ventura County, CA | 439 |
7 | Cameron County, TX | 434 |
8 | Pima County, AZ | 431 |
9 | Orange County, CA | 429 |
10 | Humboldt County, CA | 428 |
10 Counties with Lowest Number of Species
Rank | County | Total Species |
299 | Kauai County, HI | 141 |
298 | Hawaii County, HI | 155 |
297 | Honolulu County, HI | 171 |
296 | Spartanburg County, SC | 204 |
295 | Kanawha County, WV | 222 |
294 | Fulton County, GA | 231 |
293 | Chemung County, NY | 236 |
292 | Anchorage County, AK | 236 |
291 | Greenville County, SC | 237 |
290 | Herkimer County, NY | 241 |
10 Counties with Highest Number of Checklists per capita
Rank | County | Checklists per capita |
1 | Brewster County, TX | 1.24 |
2 | Cameron Parish, LA | 1.09 |
3 | Los Alamos County, NM | 1.03 |
4 | Mariposa County, CA | 0.98 |
5 | Addison County, VT | 0.91 |
6 | San Juan County, WA | 0.89 |
7 | Tompkins County, NY | 0.86 |
8 | Santa Cruz County, AZ | 0.85 |
9 | Mono County, CA | 0.76 |
10 | Inyo County, CA | 0.75 |
10 Counties with Lowest Number of Checklists per capita
Rank | County | Checklists per capita |
299 | Clark County, NV | 0.00602 |
298 | Dallas County, TX | 0.00734 |
297 | Shelby County, TN | 0.00743 |
296 | Wayne County, MI | 0.00849 |
295 | Queens County, NY | 0.00867 |
294 | Honolulu County, HI | 0.00958 |
293 | Broward County, FL | 0.00963 |
292 | Tarrant County, TX | 0.00980 |
291 | Tulsa County, OK | 0.01016 |
290 | Providence County, RI | 0.01097 |
There you have it—a preliminary look at the data I am using. The cross section of data is pretty diverse. It has highly populated areas as well as those that are sparsely populated. Some are large in area. Some are small.
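For anyone who wants to check my math, the two derived columns above (population per square mile and checklists per capita) are just simple ratios. Here is a minimal sketch in Python, using only rows where both inputs appear in the tables above. The function and variable names are my own inventions for illustration; this is not the code I used to build the tables.

```python
# Values copied from the tables above. None means the value is not listed there.
# name: (area in sq mi, 2016 population estimate, total checklists)
counties = {
    "Inyo County, CA": (10_181, 18_144, None),
    "Los Angeles County, CA": (None, 10_137_915, 124_721),
    "Custer County, SD": (None, 8_596, 2_034),
}


def density(area_sq_mi, population):
    """People per square mile."""
    return population / area_sq_mi


def checklists_per_capita(checklists, population):
    """Checklists submitted per resident."""
    return checklists / population


inyo_area, inyo_pop, _ = counties["Inyo County, CA"]
inyo_density = round(density(inyo_area, inyo_pop), 1)  # 1.8, as in the density table

_, la_pop, la_lists = counties["Los Angeles County, CA"]
la_rate = checklists_per_capita(la_lists, la_pop)

_, custer_pop, custer_lists = counties["Custer County, SD"]
custer_rate = checklists_per_capita(custer_lists, custer_pop)

print(f"Inyo density: {inyo_density}")
print(f"Los Angeles checklists per capita: {la_rate:.4f}")
print(f"Custer checklists per capita: {custer_rate:.4f}")
```

Notice that Los Angeles County has the most checklists of all 299 Counties and Custer County has the fewest, yet Custer comes out well ahead per capita. Raw totals and ratios can tell very different stories.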
But all of these Counties have one thing in common: they have the most checklists submitted in the United States (or, for some States, the most checklists in the State; see the criteria in my previous posts).
Do the most populated Counties have the most checklists? Not always. Do the Counties with the most checklists have the highest number of species? Not always. Is there a significant correlation between population and total checklists submitted? Nope. How about between checklists submitted and number of species? No again.
So how does this all figure into planning a Big Year? I’m glad you asked. Because I am going to tell you anyway. We live in an age with dizzying amounts of data. Having data can be powerful. But only if you are able to harness the information to help you in a way that makes sense.
What am I talking about? Our goal for a Big Year is to see as many unique species in one calendar year as possible. And remember, I hope to make this affordable and efficient, too. A bigger population means a larger number of checklists submitted, but only up to a point. And a larger number of checklists means a larger number of species, but again, only up to a point.
All of the data above is interesting. But it does not yet answer the questions about where and when to go birding for the greatest number of unique species in the shortest amount of time.
The total number of species listed above is for the whole period from 2006-2016. It encompasses all seasons. So that number really is not a great indicator of where to go and a poor indicator of when to go.
There are 299 Counties. And each County has 4 weeks of data per month. A month is always 4 weeks in eBird. The first week is the 1st through the 7th. The second week is the 8th through the 14th. The third week is the 15th through the 21st. And the fourth (and last) week is the 22nd through the end of the month. So 299 Counties x 48 weeks of data gives one a large number of possibilities of where to go and when. In fact, that number of possibilities is 14,352. (Have I told you how much I like numbers?)
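The week rule above is easy to put into code. Here is a minimal sketch in Python of how a calendar date could be mapped to one of the 48 week slots, along with the count of County-and-week combinations. The function names are hypothetical (they are mine, not eBird's), and this only illustrates the rule as I described it.

```python
from datetime import date

WEEKS_PER_MONTH = 4
MONTHS_PER_YEAR = 12
COUNTIES = 299


def week_of_month(d: date) -> int:
    """Return 1-4: days 1-7, 8-14, 15-21, and 22 through the end of the month."""
    return min((d.day - 1) // 7 + 1, WEEKS_PER_MONTH)


def week_of_year(d: date) -> int:
    """Return 1-48, counting four weeks for every month."""
    return (d.month - 1) * WEEKS_PER_MONTH + week_of_month(d)


# Every County paired with every week slot.
total_slots = COUNTIES * MONTHS_PER_YEAR * WEEKS_PER_MONTH

print(week_of_month(date(2016, 5, 22)))  # fourth week of May
print(week_of_year(date(2016, 12, 31)))  # last slot of the year
print(total_slots)
```

The `min(...)` is what folds days 29 through 31 into the fourth week instead of spilling into a fifth.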
Now you may be asking: out of 14,352 possibilities, where does one even begin to guess where to go and when? Oh, you thought that was complicated? Throw 984 species into the mix. Yep. You will need more than a hand calculator. You could do it in a spreadsheet if you could build a pivot table of 7 million rows and then sort it. Good luck with that.
A database is a perfect solution. It is unparalleled in its power to perform on problems like this. It can make calculations on mind-boggling amounts of data and retrieve the information in a matter of seconds (ok, minutes for some of our questions). And this is what I did. I took all those spreadsheets of downloaded data and loaded them into a database. (I used SQL Server Express.)
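To give a feel for the kind of question a database answers easily, here is a small sketch. It is only an illustration: it uses Python's built-in `sqlite3` module rather than SQL Server Express, the table layout is a guess at one workable shape and not my actual schema, and the rows are invented placeholders ("County A", "Species 1"), not real eBird data.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per species reported in a County during a week slot.
conn.execute(
    """
    CREATE TABLE sightings (
        county  TEXT    NOT NULL,
        month   INTEGER NOT NULL,
        week    INTEGER NOT NULL,
        species TEXT    NOT NULL,
        PRIMARY KEY (county, month, week, species)
    )
    """
)

# Placeholder rows, invented purely for illustration.
rows = [
    ("County A", 5, 1, "Species 1"),
    ("County A", 5, 1, "Species 2"),
    ("County A", 5, 2, "Species 1"),
    ("County B", 5, 1, "Species 1"),
    ("County B", 5, 1, "Species 2"),
    ("County B", 5, 1, "Species 3"),
]
conn.executemany("INSERT INTO sightings VALUES (?, ?, ?, ?)", rows)

# Rank every County-and-week slot by how many species were reported in it.
ranked = conn.execute(
    """
    SELECT county, month, week, COUNT(*) AS species_count
    FROM sightings
    GROUP BY county, month, week
    ORDER BY species_count DESC, county, month, week
    """
).fetchall()

for county, month, week, species_count in ranked:
    print(county, month, week, species_count)
```

The same `GROUP BY` and `ORDER BY` idea carries over to millions of rows, which is exactly where a spreadsheet gives up.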
In my next post I will tell you how I used this data to arrive at some answers to the questions of the best places to go at the best times of year to maximize the total number of species on each trip. You won’t want to miss that one!