My last article outlined approaches to jump-starting or continuing your data science journey. Since then, a perfect opportunity arose for me to take my own advice!
ATLytiCS is a 501(c)(3) that provides Atlanta-based nonprofits with data and insights to help fund humanitarian initiatives within our community. ATLytiCS offers education, mentorship, and data for good initiatives. One of these initiatives is an annual hackathon. This year’s hackathon involved studying food deserts in Atlanta, Georgia.
I jumped at the opportunity despite never having competed in a hackathon. I learned a lot along the way and received an honorable mention.
Below, I’ll share the tools, techniques, and lessons I learned.
2022 Data for Hope™ Competition
Communities across America have long struggled with food deserts, which COVID only exacerbated. A food desert is an area that has limited access to affordable and nutritious food. The hackathon participants were challenged to explore, identify, and possibly predict the status of the at-risk communities’ access to food.
Hackathon participants received extensive data sets to analyze. AnalyticsIQ, a corporate sponsor, provided anonymized individual-level data for Georgia that included everything from diet to type of cell phone. Federal data from the census and USDA were also offered. Participants were allowed to utilize any additional open-source data.
Participating teams had one week to analyze the data and make recommendations to help improve Atlanta’s food deserts. The finish line was a slide deck and a 7-minute recorded presentation. (See our presentation here.)
I partnered with Max Leaming and dove in head first.
Given the time constraint and magnitude of the task, I opted to lean heavily on analytic tools rather than scripting. My first stop was Alteryx.
Alteryx is the Swiss Army knife of analytics, but its bread and butter, in my opinion, is ETL (extract, transform, load). For this reason, Alteryx performed all of our data engineering tasks. It easily consumed and transformed the large CSVs of data. Using the Browse and Summarize tools, I could efficiently discover data quality issues or interesting observations for further analysis. Plus, Alteryx aggregated and blended the numerous data sources into a cohesive view (more on this later!).
The competition was fierce, so finding additional data would help set us apart. The provided data overwhelmingly painted a picture of the socio-economic makeup of Atlanta’s communities. However, the makeup of the food stores serving those communities was not as clear.
To develop this viewpoint, I needed to scrape the location data of food stores throughout Atlanta. I used a Google Chrome extension called Instant Data Scraper. It allowed me to scrape addresses for a dozen food stores efficiently. I grouped the food stores into two categories — grocery stores, which sell fresh produce, and dollar stores, which typically don’t sell fresh produce. Instant Data Scraper assembled the addresses, but ultimately I needed latitude and longitude for mapping. I used Geocod.io to geolocate the addresses.
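For readers who prefer scripting, the grouping step could be sketched in Python roughly as follows. This is a minimal illustration, not what I ran during the hackathon: the chain names, the `CHAIN_CATEGORY` lookup, and the sample rows are all made up, and the field names `name` and `zip` are assumptions about how scraped addresses might be stored.

```python
from collections import Counter

# Hypothetical chain names; the real list of scraped stores is not shown here.
CHAIN_CATEGORY = {
    "fresh mart": "grocery",
    "green grocer": "grocery",
    "dollar stop": "dollar",
}


def categorize(store_name):
    """Return 'grocery', 'dollar', or 'unknown' for a scraped store name."""
    name = store_name.strip().lower()
    for chain, category in CHAIN_CATEGORY.items():
        if chain in name:
            return category
    return "unknown"


def count_by_zip(stores):
    """Count stores per (zip_code, category) pair."""
    counts = Counter()
    for store in stores:
        counts[(store["zip"], categorize(store["name"]))] += 1
    return counts


# Made-up rows standing in for scraped, geocoded addresses.
stores = [
    {"name": "Fresh Mart #12", "zip": "30310"},
    {"name": "Dollar Stop", "zip": "30310"},
    {"name": "Dollar Stop Express", "zip": "30310"},
    {"name": "Green Grocer", "zip": "30305"},
]
store_counts = count_by_zip(stores)
```

Counting by zip code and category gives a quick view of which areas are served mostly by dollar stores rather than grocery stores.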
I opted to base the presentation around maps since food deserts are geographic in nature. Plus, 7 minutes is not long to convey analytics for such a complex challenge.
Tableau is a terrific tool for creating beautiful maps. But first, Tableau needed some help from Alteryx. The data contained multiple levels of geographic aggregation and differing boundaries for metro Atlanta. I resolved these challenges using shapefiles.
Shapefiles are helpful when working with polygons, such as the region comprising a zip code or county. I blended the shapefiles into the primary data set using Alteryx. The shapefiles allowed Tableau to map a consistent boundary for metro Atlanta at multiple levels of aggregation. Census.gov is a good resource for shapefiles.
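To give a feel for what a polygon boundary makes possible, here is a small Python sketch of the classic ray-casting test, which checks whether a latitude/longitude point falls inside a polygon. It is only an illustration under simplifying assumptions: Alteryx and Tableau handled this for us, the rectangular `boundary` below is made up, and real shapefile polygons can have multiple parts and holes that this sketch ignores.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up rectangular boundary standing in for a zip code polygon.
boundary = [(-84.45, 33.70), (-84.35, 33.70), (-84.35, 33.80), (-84.45, 33.80)]

store_inside = point_in_polygon(-84.40, 33.75, boundary)
store_outside = point_in_polygon(-84.30, 33.75, boundary)
```

In practice, a dedicated geospatial library or tool is a better choice than hand-rolled geometry, but the idea is the same: the polygon decides which region each point belongs to.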
With the inclusion of shapefiles, Tableau’s mapping features brought the data to life. Now, hundreds of data features could be dragged onto the map for instant exploration and insights.
Tableau is a popular data visualization product, so I assumed other teams might also use it. Tableau’s native maps are limited and easily recognizable. To stand out, I wanted to level up my map’s aesthetics.
MapBox is a service that provides seemingly unlimited map customization. Fortunately, it easily integrates with Tableau. Even more fortunately, I could borrow Amy Walton’s beautiful MapBox Blueprint design. Here’s a guide to integrating MapBox with Tableau.
As the visual analysis in Tableau wound down, I started to dream up the presentation slides. I opted to use Canva — the Tableau of graphic design — to stand out, figuring participants would use PowerPoint or Google Slides. Canva has a wealth of purpose-designed, professional-quality presentation templates. The templates are effortlessly modified through a drag-and-drop online interface. Canva allowed us to create a high-quality presentation efficiently.
With the slides complete, nothing was left but to record and post our short presentation covering our insights and recommendations to YouTube. (See our presentation here.)
The hackathon presented unique challenges related to data (too much) and time (too little).
Lesson 1: Evaluate your data’s level of aggregation
Data scientists tend to want as much data as they can get their hands on, but this can create issues.
AnalyticsIQ provided 9M rows and 145 columns of anonymized personal data. The data set size was cumbersome to work with, especially given the time constraint. Exploratory data analysis, data cleansing, and building models were prohibitively slow.
After wasting too much time, we abandoned the hope of using individual data. Instead, we aggregated the data to the zip code level, shrinking the data size and saving time.
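We did this aggregation in Alteryx, but the same idea can be sketched in plain Python. The example below streams rows one at a time, so the full file never has to sit in memory, and keeps only a count and a running sum per zip code. The column names `zip` and `household_income` and the sample values are hypothetical, not the actual AnalyticsIQ schema.

```python
import csv
import io
from collections import defaultdict


def aggregate_by_zip(rows, value_field):
    """Stream rows and return {zip: (row_count, mean of value_field)}."""
    totals = defaultdict(lambda: [0, 0.0])  # zip -> [count, running sum]
    for row in rows:
        bucket = totals[row["zip"]]
        bucket[0] += 1
        bucket[1] += float(row[value_field])
    return {z: (n, s / n) for z, (n, s) in totals.items()}


# Tiny made-up sample standing in for a multi-million-row CSV.
sample_csv = io.StringIO(
    "zip,household_income\n"
    "30310,30000\n"
    "30310,50000\n"
    "30305,90000\n"
)
zip_summary = aggregate_by_zip(csv.DictReader(sample_csv), "household_income")
```

With a real file, `csv.DictReader(open(path, newline=""))` would replace the in-memory sample, and the result would have one entry per zip code instead of one per individual.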
Lesson 2: Use the best tool for the job
Initially, I planned to use only Python as an educational opportunity. I quickly realized that scripting everything would be too slow for a one-week deadline. Putting aside what was best for me, I opted to use what was best for the project. Rather than Python, I used Alteryx, Tableau, MapBox, Canva, Geocod.io, and Instant Data Scraper.
In parallel, my teammate used Python to mathematically explore and confirm the insights we were discovering visually in Tableau.
Lesson 3: Leverage force multipliers
Time was limited, and we were competing with larger teams. We needed as much help as we could get! In military science, a force multiplier is a factor, such as technology, that increases the effectiveness of personnel beyond what their numbers alone would achieve.
Our effectiveness — in quantity and quality of output — was significantly increased by leveraging efficient tools and others’ open-sourced work, such as MapBox and Canva templates.
Volunteering with data for good initiatives like ATLytiCS is a win-win opportunity to help others while learning. Along the way, you’re free to pick up new skills or sharpen the ones you have. I enjoyed the project’s creative freedom and open-ended questions.
Set yourself up for success by remembering the adage often attributed to Abraham Lincoln, “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” Invest the time upfront to explore and engineer the data effectively. Then make up the time by using force multipliers like efficient analytic tools and open-source resources.