I literally just dove in and started doing it—with lots of help from YouTube and Google initially. The best YouTube videos that helped me get started with everything were by “sentdex” (his YouTube channel is here: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ). This guy is literally so helpful. He explains everything so crisply and concisely, and even shows you how it’s sometimes okay to “hack” your way through certain solutions.
More concretely, the best way to start is to find a side project that you love. That will motivate you through the tough days: the days where you think something will take 10 minutes, and it ends up taking 6 hours (that describes too many of my days over the years). But I was motivated to keep going because I thought the project was so fascinating.
So what was this side project, you ask? I always wanted to do some sort of analysis of the text of movie reviews. I was interested in seeing whether the reviewers who said bombastic things like, “This is the greatest horror movie ever made. Hell, this is one of the greatest movies ever made. Move over, Casablanca,” or “At one point during the movie I thought to myself, ‘I’d rather gouge my eyes out with a spoon than watch five more minutes of this’” somehow got more notoriety or viewers or something like that. This side project is also related to what I ended up choosing to study in my dissertation, just in a different context (https://www.quora.com/In-your-final-year-as-a-PhD-student-how-different-was-your-approach-to-research-questions-to-your-previous-years/answer/Jaime-Potter-1).
With this goal in mind, I thought I could do something with Rotten Tomatoes, which compiles movie reviews from thousands of different reviewers around the world. Better yet, they actually give you one signature quote from each reviewer, so I could start there.
From there, I just went through the steps…
- Hmmm, what do I need to do first? I guess I need to get the data onto my computer in tabular format or something. What’s that called? Web scraping. Ok, let me search for web scraping videos on YouTube and follow along.
- Cool, step 1 took two months and there are tons of errors, but with every error that I troubleshoot I’m learning a lot. Nice! Now I have a bunch of text data. This looks nothing like the tables of numbers I had in Accounting class. What do I do with this? Oh, what’s that you say? There’s something called Natural Language Processing?! That sounds cool. Let me check out some YouTube videos but also supplement that with some more actual articles from Google, since I actually need to understand what’s happening.
- I’ve stemmed and tokenized and done all that fun NLP stuff. This data is too big to move to Excel, so I guess I have to keep analyzing it in Python (or move it to R, which I’m familiar with, but let’s keep it in Python, I decided). Oh, wow, there’s this great library called pandas. I’ll start practicing with that, because at first some of the cell indexing feels weird (iloc vs loc?! That reminds me, you’ll be using Stack Overflow a LOT! https://stackoverflow.com/questions/31593201/pandas-iloc-vs-ix-vs-loc-explanation).
- I have some initial regressions and stuff. So cool! But I’ve heard people like visuals. Python has a bunch of visualization libraries? Like a lot of them? That’s so cool. Let me check some out. I’ll start with matplotlib and then move to seaborn and Bokeh. Those latter ones feel a bit easier syntactically, and you can make really pretty visuals. Wow!
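To make the “stemmed and tokenized” step in the list above a bit more concrete, here is a minimal standard-library sketch on a made-up review quote. The `tokenize` and `crude_stem` helpers are hypothetical names for illustration, and the suffix stripping is a deliberately crude stand-in, not the Porter stemmer or lemmatizers that NLP libraries such as NLTK provide.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

# Crude suffix rules, checked in order; real stemmers (e.g. Porter) use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(token):
    """Strip the first matching suffix, as long as at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical quote, loosely modeled on the reviews described above.
quote = "At one point I thought: I'd rather gouge my eyes out than keep watching!"
tokens = tokenize(quote)
stems = [crude_stem(t) for t in tokens]
print(stems)

# Different forms of the same word collapse to one stem, so they can be counted together.
print([crude_stem(w) for w in ["watch", "watched", "watching", "watches"]])
# → ['watch', 'watch', 'watch', 'watch']
```

Collapsing word forms like this is what lets “watched” and “watching” count as the same idea; for real work you would reach for a proper stemmer or lemmatizer from an NLP library rather than a hand-rolled suffix list.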
That’s generally the best way to do it. Much better than the way I started, which was as follows:
“Hey PhD-classmate-that-knows-how-to-code, how do I code?”
“Oh cool, you sent me a pdf guide on how to use Python. I’ll just read through this and then I’ll, well, know how to use Python—easy!”
“Jeez, everyone is telling me how useful these dictionaries and lists are, but I just really can’t understand why. Also, regex is the best thing ever? Isn’t it just like Ctrl+F? How could that ever be so useful?”
In short, reading a manual on its own doesn’t work. You need to contextualize your learning to make it stick. It’s really hard for me, even now, to explain why the dictionary data type is so very, very helpful. I could explain in the context of my projects, but it won’t make complete sense until you are stuck on a project somewhere and realize that a dictionary is absolutely vital here! When you are in the throes of a project, you will just see. That contextual knowledge is irreplaceable. Once you have that contextual knowledge from actually going and doing, then you’ll go back to that dense pdf guide months later and actually find it exceedingly helpful.
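For a flavor of why dictionaries and regex end up mattering in a project like the movie-review one, here is a small sketch on made-up data: a regular expression flags superlative phrasing that a literal Ctrl+F-style search misses, and a dictionary groups the flagged quotes by reviewer. The reviewer names, quotes, and the `BOMBASTIC` pattern are all hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical (reviewer, quote) pairs standing in for scraped data.
reviews = [
    ("Reviewer A", "This is the greatest horror movie ever made."),
    ("Reviewer B", "A solid, if forgettable, thriller."),
    ("Reviewer A", "Easily the worst sequel I have ever sat through."),
    ("Reviewer C", "The best film of the year, maybe the best ever."),
]

# A literal search (like Ctrl+F) only finds one exact phrase.
literal_hits = [q for _, q in reviews if "greatest movie ever" in q]

# A regex matches a family of phrasings: a superlative, then "ever" later in the same sentence.
BOMBASTIC = re.compile(r"\b(?:greatest|best|worst)\b[^.!?]*\bever\b", re.IGNORECASE)

# The dictionary maps each reviewer to the list of their bombastic quotes.
bombastic_by_reviewer = defaultdict(list)
for reviewer, quote in reviews:
    if BOMBASTIC.search(quote):
        bombastic_by_reviewer[reviewer].append(quote)

print(len(literal_hits))  # → 0
print(dict(bombastic_by_reviewer))
```

The literal search finds nothing because no quote contains that exact phrase, while the pattern catches three differently worded superlatives; keying the results by reviewer in a dictionary is what makes a per-reviewer comparison a one-line lookup.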
You’ll also go back to this initial code months, or even years, later and think, “Wow, I was really bad at this. I know so many more ‘Pythonic’ ways of writing this code now.” That’s totally normal and to be expected. What matters at the early stages is that you’re getting things to work so that you stay excited to keep learning more. It’s fine if it’s not pretty and the code is not clean. You’ll get better at that. These are the normal stages that everyone goes through as they learn something complex like programming or data science.
Hello BHW members!
I'm looking for a way to get a PDF indexed.
I've seen a lot of "Game Generators" PDFs ranking on the first page of Google after being indexed within minutes. Does anyone have any idea how to do this?