Welcome

Hello everyone and thank you very much for your interest in “Who said text isn’t data? Text analysis for data journalists with R”. I’m really happy that you decided to learn about text analysis.

While I’m still preparing everything so that Thursday’s session isn’t boring, you might be wondering whether you need to install anything on your computer to follow along.

Covid-19 changed a lot of things in our lives, including how this kind of session takes place. Normally, I would run a hands-on session where you could apply what we are learning. But since this is all virtual, I am planning a session where I will talk a little about the theory of text analysis and demonstrate the code on my screen.

You will have my slides, all my code, and the datasets in a repository, so that you can try everything yourself after the session.

But that doesn’t mean you can’t follow along. You can, and that’s why I’m writing to you.

For this session we will be using:

  • R, as the programming language;
  • RStudio, as the IDE (basically, where you will be writing your code);
  • Tidyverse, a set of R packages that are useful for doing data science;
  • Tidytext, as the main package for doing text analysis.

So, if you want to follow along (or if you want to have everything configured correctly so that you can run my examples after the session), here’s what you have to do:

1 - Download and Install R

To download R, go to this website, choose your operating system, download the installer, and run it.

It installs like any other program on your computer.

If you run into trouble, you can read this.
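
Once the installer finishes, a quick way to confirm that R works is to open it and type a couple of lines at the prompt. This is just a small sanity check I'm suggesting, not part of the session material:

```r
# Print the version of R that is currently running.
print(R.version.string)

# Try a tiny calculation to confirm that R evaluates code.
two <- 1 + 1
print(two)
```

If R prints a version string and the number 2, the installation is most likely fine.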

2 - Download and Install RStudio

You should now have a program called “R” on your computer. It basically means you can run R code on your computer. But where do you write that code?

That’s what IDEs (integrated development environments) are made for. The name is basically a fancy way of saying “a program where you write code”.

Think of it as “Microsoft Word” for code. You could be writing on “Notepad”, but a lot of people prefer “Microsoft Word” because it has some features they like.

It’s the same thing for code. RStudio was designed specifically for writing R code.

You can download RStudio here.

3 - Get to know RStudio

Packages are a very important part of R. They are basically code that someone else has already written and that you can install and call to help you.
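
As a minimal sketch of what installing and calling packages looks like, here is one way to get the two packages mentioned earlier, tidyverse and tidytext. I'm assuming you have an internet connection and are happy with the default CRAN mirror; the variable names `needed` and `not_installed` are just ones I picked for illustration. You can type this in the RStudio console:

```r
# Packages this session relies on.
needed <- c("tidyverse", "tidytext")

# Keep only the ones that are not installed on this computer yet.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

# Install whatever is missing (this downloads from CRAN, so it needs internet).
if (length(not_installed) > 0) install.packages(not_installed)

# Load the packages so their functions are available in this R session.
library(tidyverse)
library(tidytext)
```

You only need to install a package once, but you have to load it with `library()` every time you start a new R session.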

If you open your RStudio and click on File > New File > R Script, you will have something like this:

I know all those windows might look confusing, but I think this tweet really helps explain what everything does: