Social Data Science 2018
A summer school at University of Copenhagen. The course is offered by the Center for Social Data Science.
- Instructors: Andreas Bjerre-Nielsen & Snorre Ralund
- Co-instructors: David Dreyer-Lassen, Ulf Aslak, Kristian Urup Larsen & Kristoffer Glavind
- Date: August 13 - September 1, 2018
- Teaching hours: Morning session: 9-12, afternoon session: 13-16
- Place: CSS 35.01.44 (see map)
NB! Students are required to have completed two tutorials at DataCamp before the first day of class - more information below under “Preparation”.
The objective of this course is to learn how to analyze, gather and work with modern quantitative social science data. Increasingly, social data - data that capture how people behave and interact with each other - is available online in new, challenging forms and formats. This opens up the possibility of gathering large amounts of interesting data, to investigate existing theories and new phenomena, provided that the analyst has sufficient computer literacy, while at the same time being aware of the promises and pitfalls of working with various types of data.
The aim of this course is fourfold:
- We will introduce students to the state-of-the-art social science literature using computational methods and social data.
- We will present students with an overview of key benefits and challenges of working with different kinds of social data. We will show how various kinds of data (survey, web-based, experimental, administrative, etc.) can be used to answer different questions within the social sciences. Furthermore, we will discuss ethical challenges related to the use of different types of data.
- We will introduce students to statistical techniques for prediction and classification, known as machine learning, and we will discuss how these methods relate to existing empirical tools within economics such as causal inference and regression.
- We will present modern data science methods needed for working with computational social science and social data in practice. Being an effective economist and data scientist means spending large fractions of our time writing and debugging code. In this section you will learn how to write code that will clean, transform, scrape, merge, visualize and analyze social data.
The course will consist of two weeks of teaching and one week of making the final exam project. Each day is divided into two teaching sessions: a morning session 9-12 and an afternoon session 13-16. Most teaching sessions contain an equal mix of lectures and exercises.
The lectures will focus on the broad topics covered in the course (parts 1-3 listed above). In the exercise classes we will get our hands dirty and present data science methods needed for collecting and analyzing real-world data. In addition to core computational concepts, these classes will focus on the following topics:
- Generating data: We will teach how to “scrape”, i.e. find and collect data, from websites as well as working with APIs.
- Data manipulation tools: Participants will learn how to import, transform, munge and merge data from various sources.
- Visualization tools: We will learn best practices for visualizing data in different steps of a data analysis. Participants will learn how to visualize raw data as well as effective tools for communicating results from statistical models for broader audiences.
- Reproducibility tools: Participants will learn how to use version control and social coding using GitHub and how to effectively communicate the insights of an analysis using markdown.
- Prediction tools: We will cover key implementations of machine learning algorithms and participants will learn how to apply and interpret these models in practice.
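To give a flavor of the first two topics in the list above, here is a minimal sketch of "scrape, then merge" in Python. It is only an illustration and is not taken from the course material: the HTML snippet, the town names and all numbers are made up, and the helper names (`CellParser`, `scrape_population`, `merge_on_key`) are hypothetical. It uses only the standard library so that it runs without any extra installation.

```python
from html.parser import HTMLParser

# Made-up page snippet standing in for a downloaded web page (assumption).
HTML = """
<table>
  <tr><td>Town A</td><td>1000</td></tr>
  <tr><td>Town B</td><td>2500</td></tr>
</table>
"""


class CellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell:
            self.rows[-1].append(data.strip())


def scrape_population(html):
    """Turn the two-column table into a {town: population} dict."""
    parser = CellParser()
    parser.feed(html)
    return {name: int(count) for name, count in parser.rows}


def merge_on_key(left, right):
    """Inner join of two dicts: keep only keys present in both."""
    return {key: (left[key], right[key]) for key in left if key in right}


population = scrape_population(HTML)
area_km2 = {"Town A": 10.0, "Town C": 25.0}  # second made-up data source
merged = merge_on_key(population, area_km2)
print(merged)
```

Only "Town A" appears in both sources, so it is the only row that survives the merge. Deciding what to do with the unmatched rows is a typical data manipulation question.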
Note that an average of three hours of exercises per day is not a large amount of time for learning how to code. We will use some of this time like development meetings: going over assignments, having detailed code reviews of various forms, and discussing blocking issues and potential solutions.
Academic interest in data handling skills is growing. This implies increased demand for skills needed to effectively gather, handle, and analyze data as well as present results to a range of audiences. Therefore, this course will provide you with important tools for future academic study. Furthermore, the skills taught in this course are also widely used in business. Python programming skills in particular are highly valued in fields such as data science, finance and information technology. As this course is focused on general skills for working with social science data such as gathering and visualization, it is equally relevant for students seeking careers outside academia where skills such as the ability to effectively communicate the results of an analysis are in high demand.
This course assumes no knowledge of any particular software or computer program. We will try to demystify the technological side of things so students feel comfortable getting started and thinking like a data scientist, but this will be a technical course, and students should expect to spend a significant amount of time learning these tools. Because the course builds on a wide range of techniques, we do not have any hard requirements to sign up for the course, but students are expected to have an interest in some subset of: statistics, econometrics, linear algebra, and a scripting language (we will use Python in this course).
Before class begins on August 13 all students are required to prepare. We expect every one of you to have completed the following tutorials before the first day of class:
You should use the invitation that will be sent out through the course page or write Andreas an email to sign up for the class site on DataCamp. This will make it easier for us to spot where you have difficulties. Note that each tutorial is expected to take four hours.
We also expect you to bring a laptop with software installed, see this post.
Peergrade and A2
Some of you have had problems submitting to Peergrade. Hopefully these problems are fixed now - if not, a workaround has been proposed here:
The assignment is still due at 23:59 today.
If you still encounter problems handing in, please submit a GitHub issue on the repository.