Diving into Data Science

Jared Carollo
Jul 29, 2022


Photo by Chase Baker on Unsplash

Data science is an enticing career. It’s a growing field that pays well thanks to ever-increasing demand. It’s multidisciplinary, challenging, and highly creative, but it isn’t for everyone. A career in data science is highly technical, demands strong problem-solving skills, and never stops changing and growing.

One of the best things about data science is its accessibility. Though the discipline is exceedingly challenging, you need nothing more than a basic computer to get started. For this reason, data science is easy to take for a test drive to see if it’s the career for you!

If your test drive pans out, you’re already ahead of the game, because learning data science hands-on is an effective approach. In this article, I’ll explain why self-teaching is key and share the resources that helped me.

Challenges

Setting out solo to learn data science understandably presents challenges. Charting your own path can quickly become convoluted and overwhelming. Just knowing where to start can be the hardest part, especially in a field as diverse as data science.

Forging your path requires self-motivation and personal accountability. You won’t have a boss, work deadlines, or the pressure of a team to keep you on track.

Your critical thinking skills will be pushed to the limit as you face obstacle after obstacle, and you’ll often have to rely on yourself to overcome them. This reality can lead to feelings of isolation, as the only friend you may have to turn to is stackoverflow.com.

Benefits

Fortunately, teaching yourself, as opposed to learning in school or on the job, has numerous benefits.

  1. Low stakes: your livelihood, reputation, or GPA is not on the line.
  2. Set your own pace: no one depends on the outcome or is watching over your shoulder.
  3. Indulge your interests: you have a blank canvas to chase after whatever entices you.
  4. Creative freedom: you’re not bound by your boss’s expectations, your audience’s aptitude, your company’s culture, your course syllabus, or your organization’s required house style.
  5. Ownership: your work product belongs entirely to you rather than to your company.

Kaggle

Last year I set my sights on learning Python. I started by watching Udemy videos and a free online course. Before long, I recognized the futility of those approaches. Instead, I opted to dive in by signing up for a Kaggle.com competition.

Kaggle, which Google acquired in 2017, is an excellent resource for learning Python and machine learning. Its primary service is hosting online machine learning competitions. Each competition provides a use case and data, along with a leaderboard that ranks submissions on the competition’s evaluation metric. Not only is Kaggle free, but some competitions offer hefty cash prizes!

My Python learning journey started with Kaggle’s most basic and popular competition: predicting the Titanic’s survivors. The key benefit of this choice was the plethora of online resources. Countless bloggers have written about their approaches and shared their code, and Kaggle itself offers educational resources for the competition.

Along the way, I learned some Python scripting, but more importantly, those blogs taught me different preprocessing techniques, creative feature engineering approaches, and multiple machine learning models. To this day, I still reference my Kaggle Titanic code.
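For readers curious what those Titanic write-ups typically cover, here is a minimal sketch of three common feature-engineering steps: pulling the honorific (Mr, Mrs, Miss, Master) out of the `Name` column, combining `SibSp` and `Parch` into a family size, and filling missing ages with the median age for each title. The column names follow Kaggle’s Titanic `train.csv`, but the passenger rows are made up for illustration, and the helper names (`extract_title`, `engineer_features`) are hypothetical. It is an illustration of the techniques, not a reproduction of any particular notebook, and it sticks to the standard library so it runs anywhere.

```python
import re
from statistics import median

# Hypothetical rows shaped like Kaggle's Titanic train.csv; Age is None where missing.
passengers = [
    {"Name": "Smith, Mr. John", "Sex": "male", "Age": 40.0, "SibSp": 1, "Parch": 1},
    {"Name": "Smith, Mrs. John (Jane Doe)", "Sex": "female", "Age": 38.0, "SibSp": 1, "Parch": 1},
    {"Name": "Smith, Master. Tom", "Sex": "male", "Age": None, "SibSp": 0, "Parch": 2},
    {"Name": "Doe, Miss. Anna", "Sex": "female", "Age": None, "SibSp": 0, "Parch": 0},
    {"Name": "Roe, Miss. Beth", "Sex": "female", "Age": 22.0, "SibSp": 0, "Parch": 0},
    {"Name": "Roe, Mr. Carl", "Sex": "male", "Age": 30.0, "SibSp": 0, "Parch": 0},
]

# The title sits between the comma after the surname and the next period.
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific in a Titanic-style name, e.g. 'Mr', or 'Unknown'."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else "Unknown"


def engineer_features(rows):
    """Add Title, FamilySize and IsAlone, then impute missing Age by title."""
    out = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for row in out:
        row["Title"] = extract_title(row["Name"])
        # siblings/spouses + parents/children + the passenger themself
        row["FamilySize"] = row["SibSp"] + row["Parch"] + 1
        row["IsAlone"] = row["FamilySize"] == 1

    # Median known age per title, falling back to the overall median.
    known_by_title = {}
    for row in out:
        if row["Age"] is not None:
            known_by_title.setdefault(row["Title"], []).append(row["Age"])
    overall = median(age for ages in known_by_title.values() for age in ages)
    title_median = {t: median(ages) for t, ages in known_by_title.items()}

    for row in out:
        if row["Age"] is None:
            row["Age"] = title_median.get(row["Title"], overall)
    return out


features = engineer_features(passengers)
for row in features:
    print(row["Title"], row["FamilySize"], row["IsAlone"], row["Age"])
```

Imputing by title rather than with one global median is a popular choice in those blogs because titles like "Master" (boys) and "Miss" carry age information. In pandas, the same steps are usually written with `str.extract` and `groupby(...).transform("median")`.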

Crypto

After my second Kaggle competition, I knew I needed a different approach to stay engaged. Though those competitions were tremendously beneficial, predicting Titanic survivorship or housing prices in Iowa wouldn’t hold my interest for long. I needed higher stakes!

Last year crypto was on a tear, and I was tired of missing out. I pumped money into a dozen coins and monitored performance by staring at my phone endlessly. I realized that the coins were highly correlated, price movements were chaotic, and my approach was terrible if for no other reason than the need for sleep. A lightbulb went off, and I knew Python was the answer!
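Checking whether coins move together doesn’t require staring at a phone. A minimal sketch, assuming you have closing prices for each coin sampled at the same timestamps: convert prices to period returns, then compute the Pearson correlation for every pair. The coin names and prices below are hypothetical, and the function names are illustrative.

```python
from math import sqrt


def simple_returns(prices):
    """Period-over-period returns: r_t = p_t / p_(t-1) - 1."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_matrix(prices_by_coin):
    """Correlation of returns for every ordered pair of coins."""
    returns = {coin: simple_returns(p) for coin, p in prices_by_coin.items()}
    coins = sorted(returns)
    return {(a, b): pearson(returns[a], returns[b]) for a in coins for b in coins}


# Hypothetical closing prices, aligned on the same timestamps.
prices = {
    "COIN_A": [100, 104, 101, 108, 110, 107],
    "COIN_B": [20, 20.9, 20.1, 21.8, 22.1, 21.3],
    "COIN_C": [5, 4.9, 5.1, 4.8, 4.7, 4.9],
}
matrix = correlation_matrix(prices)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

Correlating returns rather than raw prices is the deliberate choice here: two prices that both trend upward can look strongly correlated even when their period-to-period moves are unrelated.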

What I needed was a rules-based trading bot fueled by analytics. This approach would cure not only my crypto fatigue but also my Kaggle fatigue. What better motivation to learn Python than imagining a money printing press on my desk?

In a bygone era, neighborhood guys would work on their cars in the driveway on the weekends. Now we code Python. I teamed up with my friend and Python expert, Ian Jiang, to take a stab at building a high-frequency crypto trading bot. For the following four months, we collaborated in our spare time to create the data, infrastructure, analytics, trading bot, and backtesting to automate crypto buying and selling.
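The article doesn’t spell out the bot’s trading rules, so the sketch below swaps in a textbook example, a moving-average crossover, purely to show the shape of a rules-based strategy plus a backtest. It holds the coin while a fast simple moving average sits above a slow one and holds cash otherwise, applies each signal only to the next price move to avoid look-ahead bias, and charges a flat fee on every position change. The window lengths, the 0.1% fee, the prices, and the function names are all assumptions for illustration, not details of the bot described above.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


def crossover_signals(prices, fast=3, slow=5):
    """1 (hold the coin) when the fast SMA is above the slow SMA, else 0 (hold cash)."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    return [
        1 if f is not None and s is not None and f > s else 0
        for f, s in zip(fast_ma, slow_ma)
    ]


def backtest(prices, signals, starting_cash=1000.0, fee_rate=0.001):
    """Return the equity curve of trading `signals` against `prices`."""
    equity = starting_cash
    position = 0  # 0 = all cash, 1 = fully invested
    curve = [equity]
    for t in range(1, len(prices)):
        target = signals[t - 1]  # decided with data up to t-1 only
        if target != position:
            equity *= 1 - fee_rate  # flat fee on each buy or sell
            position = target
        if position:
            equity *= prices[t] / prices[t - 1]
        curve.append(equity)
    return curve


# Hypothetical price series for one coin.
prices = [100, 101, 103, 102, 105, 108, 107, 104, 101, 99, 100, 103]
signals = crossover_signals(prices, fast=3, slow=5)
curve = backtest(prices, signals)
print(f"final equity: {curve[-1]:.2f} (started with {curve[0]:.2f})")
```

Keeping the rule (`crossover_signals`) separate from the simulator (`backtest`) makes it easy to swap in other rules and compare them on the same data. A realistic backtest would also need to model slippage and evaluate the rules on data they were not tuned on.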

By setting my own learning path, I could learn in the way that worked best for me. I was able to explore what interested me when it interested me. Most importantly, I stayed engaged long enough to level up my Python skills (see my crypto code here).

Resources

Setting your own path to learning data science is as exhilarating as it is challenging. If this rings true for you, a wealth of resources awaits.

As highlighted above, diving into a subject matter of personal interest is an excellent way to proceed. Fortunately, we’re surrounded by free data.

  • Data.world’s mission is to create the most meaningful, collaborative, and abundant data resource globally. Data.world is home to the world’s largest collaborative data community, which is free and open to the public.
  • FRED (Federal Reserve Economic Data) is hosted by the Federal Reserve Bank of St. Louis. FRED provides over 800,000 data sets from 108 sources, both national and international. This macroeconomic data offers the opportunity for rich analysis, especially for digging into the numbers behind news stories.
  • Snowflake Marketplace hosts 325 free datasets spanning economics, weather, business, sports, and much more.

If external data doesn’t strike your fancy, then perhaps personal data does. For better or worse, we are constantly creating personal data. Why not analyze it?

  • Mint users can download their financial transaction history.
  • Fitbit allows you to download GPS data, community data, or an archive of your account data.
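As a starting point for FRED, here is a minimal sketch, assuming you use FRED’s web API: its `fred/series/observations` endpoint returns a series as JSON when `file_type=json` is set, requires a free API key, and reports missing observations as `"."`. The sketch only builds the request URL and parses a response, so it runs offline. `UNRATE` (the U.S. unemployment rate) is just an example series ID, `YOUR_KEY` is a placeholder, the function names are hypothetical, and the sample payload is trimmed to the fields used, with made-up values.

```python
import json
from urllib.parse import urlencode

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def observations_url(series_id, api_key, start=None):
    """Build a FRED observations request URL that asks for JSON."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # date as YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def parse_observations(payload):
    """Turn a FRED observations JSON document into (date, float or None) pairs."""
    doc = json.loads(payload)
    return [
        (obs["date"], None if obs["value"] == "." else float(obs["value"]))
        for obs in doc["observations"]
    ]


# Made-up payload in the shape of an observations response (extra fields omitted).
sample_payload = """
{"observations": [
  {"date": "2022-01-01", "value": "1.5"},
  {"date": "2022-02-01", "value": "."},
  {"date": "2022-03-01", "value": "2.5"}
]}
"""

print(observations_url("UNRATE", "YOUR_KEY", start="2022-01-01"))
for date, value in parse_observations(sample_payload):
    print(date, value)
```

To fetch live data, pass the URL to `urllib.request.urlopen` and hand the response body to `parse_observations`; mapping `"."` to `None` keeps gaps explicit instead of silently turning them into zeros.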

Sometimes personal interest isn’t the best motivator, and we need something more. Providing analytic services for social good organizations can be a win-win.

  • Viz for Social Good connects volunteers and social good enterprises. The enterprise provides the data and mandate. The volunteers offer data visualization skills. The enterprise can use the visualization for marketing, lobbying, and fundraising. The volunteers can hone visualization skills in a safe and collaborative environment. (check out my submissions!)

Data science becomes more accessible as more free analytic tools become available.

  • Mito is a robust, educational, free Python package I previously wrote about (see article). Users can preprocess, explore, and analyze data through a familiar spreadsheet-style GUI. The best part is that Mito generates the corresponding Python code with each click! Mito pairs perfectly with Kaggle competitions, so you can learn Python while getting your work done!
  • PyCaret is a user-friendly low-code automatic machine learning (AutoML) Python package. In my experience, the model outputs are hard to beat, especially for the minimal effort. Walking through the settings, parameters, and operations of PyCaret allows the user to begin understanding machine learning principles while watching the package do the work!
  • Knime is an open-source, free competitor to Alteryx. I was, and still am, an Alteryx user (see my article). It’s an indispensable analytical tool, especially for extracting, transforming, and loading data (ETL), but it is expensive! Knime Analytics Platform by and large replicates Alteryx’s functionality, for free!
  • Tableau Public provides users with Tableau for free, so long as the work is saved publicly on Tableau’s cloud. This product is a great way to build a public portfolio, especially when combined with Viz for Social Good or the public data sets listed above.

Data scientists often must present their work to business leaders and end users. Explaining highly technical work to coworkers without the same background is both a tremendous challenge and an opportunity. Toastmasters International is an excellent nonprofit that helps people around the world build public speaking skills. This organization helped me break out of my shell and improve my communication skills during graduate school.

Conclusion

Data science is a highly accessible field with free tools and communities that have removed traditional barriers. Though data science is an in-demand field, you won’t know if it’s for you until you try it out.

Everyone has a different learning style. What’s best for one may be the worst for another. Hopefully, my experience of self-teaching offers an option to try. If it is the field for you, then you’re already well on your way to building your future on your terms!
