This class will involve a lot of coding, for which you will need some basic tools. Please make sure to set up the following tools before the first day of class.
We will discuss these tools in much more detail in class, so don’t worry if this is all new and perhaps a bit frightening right now.
Python and Jupyter notebook
Python is a free programming language. We will use the distribution called Anaconda (Conda), as it comes with the most essential packages for working with data, statistical computing, and visualization. We will use Anaconda for Python 3.6 throughout the course, so please make sure it is installed on your computer before the first day of class. It works on all platforms.
Anaconda can be downloaded here, for Windows, Mac or Linux. If you want a step-by-step tutorial on how to install Anaconda and Jupyter Notebook on your machine, see the guides here:
- Install Anaconda for Windows by following these steps or watch this video
- Install Anaconda for Mac by following these steps or watch this video
Since the vast majority of our coding will be in Python, we will use an integrated development environment (IDE). IDEs combine features such as text editing, syntax highlighting, and version control, which simplifies the coding process. Our environment of choice is Jupyter Notebook, which is installed automatically with Anaconda. It’s free and modern, and if you’re new to Python it will make it much easier to get started. All Python coding in this course will be done in Jupyter Notebook.
Verifying the installation
After installing Python, please execute a few commands in the shell. You open the shell by launching the Command Prompt (cmd) on Windows or the terminal on Linux/Mac. Note that on Windows you may have to open the Anaconda Prompt instead of the normal cmd. Once the shell is open, type python; this will start Python. Once Python has started, try the following two commands to verify that it’s working.
>>> 1+2
3
>>> print('Welcome to Social Data Science')
Welcome to Social Data Science
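Beyond the two commands above, it can help to confirm which Python the shell actually started. The following is a minimal sketch using only the standard library; the helper name describe_python is made up for illustration and is not part of the course material.

```python
import platform
import sys


def describe_python():
    """Return a short summary of the interpreter that is currently running."""
    return {
        "version": platform.python_version(),  # e.g. "3.6.x"
        "major": sys.version_info.major,       # 3 for any Python 3 release
        "executable": sys.executable,          # path of the running interpreter
    }


info = describe_python()
print("Python version:", info["version"])
print("Interpreter path:", info["executable"])

# The course uses Anaconda for Python 3, so anything older deserves a warning.
if info["major"] < 3:
    print("Warning: this course assumes Python 3.")
```

If the interpreter path does not point into your Anaconda folder, the shell is probably picking up a different Python installation.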
Welcome to open source
Core Python is powerful in itself, but its great potential lies in the huge community of developers and researchers contributing to a shared pool of software packages. A programming language is as powerful as the community that surrounds it. Especially in machine learning, the Python community is leading the way, allowing you to share code with top researchers from the field and from industry, including Google’s top engineering teams. Tapping into these vast resources is made easy by the Conda distribution and the pip package manager. Just open your shell/command-line/terminal and type the following:
conda install [name of package]
or, if conda does not support it directly, use the more generic package manager:
pip install [name of package]
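To see whether a package still needs installing, you can ask Python whether it can find it. This is a small sketch, not a course requirement: the helper missing_packages is a made-up name, and numpy, pandas and matplotlib are only assumed examples of packages you might want.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example names only (an assumption); replace them with the packages you need.
wanted = ["numpy", "pandas", "matplotlib"]

for name in missing_packages(wanted):
    print("Not installed:", name, "(try: conda install " + name + ")")
```

Anything the script reports as missing can then be installed from the shell with conda or pip as shown above.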
Using Jupyter notebook
The Jupyter Notebook App can be launched by clicking on the Jupyter Notebook icon installed by Anaconda in the start menu (Windows only) or by typing the following in either a terminal (on Mac/Linux) or cmd (on Windows):
jupyter notebook
This will launch a new browser window (or a new tab) showing the Notebook Dashboard, a sort of control panel that lets you (among other things) select which notebook to open.
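If nothing launches, one possible cause is that the shell cannot find the jupyter executable. The sketch below checks this from Python with the standard library; find_command is a hypothetical helper name, and the suggested remedy simply repeats the Anaconda Prompt note from the verification step.

```python
import shutil


def find_command(name):
    """Return the full path of `name` if it is on the PATH, otherwise None."""
    return shutil.which(name)


path = find_command("jupyter")
if path is None:
    print("jupyter was not found on the PATH; try the Anaconda Prompt instead.")
else:
    print("jupyter found at:", path)
```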
Getting friendly with Jupyter
Try to spend a little time familiarizing yourself with the Jupyter framework. For instance, try learning a few of your editor’s keyboard shortcuts; see our post here. The point is to be as productive as possible when working with the computer. Karl Broman, a professor of biostatistics and medical informatics at the University of Wisconsin-Madison, gives some great advice for working with code:
The key thing I emphasize to students is they should be using the mouse as little as possible. Every time you move your hands away from the keys, you’re slowing yourself down.
A Git client
Git is a version control system that allows you to track modifications to files and code over time. It also facilitates collaborations so that multiple people can share and edit the same code base.
If you are on Windows, you can install GitHub Desktop, which provides both the command-line tool for git and a graphical user interface. Alternatively, you can install git as an optional package under Cygwin. I recommend the GitHub application, as it makes it easier to interface with GitHub. Likewise, modern versions of Mac OS X have a command-line git client installed by default, but the GitHub Desktop tool is a recommended addition.
A GitHub account
GitHub is a platform that facilitates collaboration on projects that use git. You can use it to host projects, publish them to the web, and share them with other people. Create a free account if you don’t already have one.
Once you have an account, clone the course repository using your local git client. This is most easily done on the command line as follows:
# git clone https://github.com/abjer/sds
Cloning into 'sds'...
remote: Counting objects: 145, done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 145 (delta 40), reused 137 (delta 37)
Receiving objects: 100% (145/145), 454.90 KiB | 594.00 KiB/s, done.
Resolving deltas: 100% (40/40), done.
Checking connectivity... done.
When this is complete, verify that you have a local directory called sds containing a
Afterwards, you can subscribe to updates, small and big, by using the Star button on the GitHub page.
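Returning to the clone step, the check for the local sds directory can also be done from Python. This is a rough sketch: looks_like_git_clone is a made-up helper, and it relies on the fact that a normal (non-bare) clone contains a .git entry at its top level.

```python
from pathlib import Path


def looks_like_git_clone(directory):
    """Return True if `directory` exists and contains a .git entry."""
    root = Path(directory)
    return root.is_dir() and (root / ".git").exists()


# "sds" is the directory name git reports in "Cloning into 'sds'...".
if looks_like_git_clone("sds"):
    print("Found a git working copy in ./sds")
else:
    print("No clone found in ./sds; run this from the folder where you cloned.")
```

Run it from the same folder in which you executed git clone, since the path is relative.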
A text editor (optional)
An important alternative to Jupyter is a decent text editor. A text editor is a program that lets you work with plain-text files. You should pick an editor capable of syntax highlighting, syntax checking (ensuring that brackets and parentheses are properly paired), and handling multiple files. We highly recommend:
Another good option is Sublime Text.