Forschungspraktikum 1+2: Computational Social Science

Session 01: Welcome & Introduction

Dr. Christian Czymara

Agenda

  • Introduction & course structure
  • Term paper
  • Software, introduction to R (and Python)

Introduction

Lecturer

  • Lecturer at the Department of Sociology with a Focus on Methods of Quantitative Empirical Social Research
  • Research interests: Migration & integration, inter-group conflict, attitudes, mass media, political communication
  • Methods: Classical quantitative methods for social research combined with computational methods and natural language processing
  • More info on my homepage

Office Hours

  • By appointment
  • Office: 3.G152 (PEG)
  • Contact me at cc@soz.uni-frankfurt.de
  • Do not hesitate to write to me if you have questions, comments, doubts, criticism, etc.

General information

  • Thursdays, 14:15 in room PEG 2.G 111 / 116
  • Material available on GitHub
  • Join this course’s mailing list for communication
  • Course on QIS
  • Slides in English, course in German 🙂

Overview

What is CSS?

  • “The intersection of social science and data science is sometimes called computational social science.” (Salganik 2018: xviii)
  • Answer social science questions with computer science methods
  • New, “big” data sources
  • New methods, much more computational power
  • In particular, text mining and natural language processing (NLP) have received increasing attention in the social sciences

Defining big

  • 3 Vs: Volume, Variety, Velocity
  • A lot of data…
  • In a variety of formats…
  • Being created constantly

The boom of CSS and NLP

  • Books Ngram Viewer
  • “[When you enter phrases into the] Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books” (google.com)
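
The idea behind such a graph can be sketched in a few lines of Python. This is only a minimal illustration, not how the Ngram Viewer itself is implemented: the toy corpus and the function name `ngram_share_by_year` are invented for this example, and tokenization is a simple whitespace split.

```python
from collections import Counter

# Hypothetical toy corpus: (publication year, text) pairs, invented for illustration.
corpus = [
    (1990, "social science studies society"),
    (1990, "data about society"),
    (2020, "computational social science uses big data"),
    (2020, "social science meets data science"),
]

def ngram_share_by_year(corpus, phrase):
    """Return {year: share of all n-grams of that length that equal `phrase`}."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits, totals = Counter(), Counter()
    for year, text in corpus:
        tokens = text.lower().split()
        # All consecutive token sequences of length n in this text
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        totals[year] += len(ngrams)
        hits[year] += sum(1 for g in ngrams if g == target)
    # Skip years without any n-gram of this length to avoid dividing by zero
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year] > 0}

shares = ngram_share_by_year(corpus, "social science")
for year, share in shares.items():
    print(year, round(share, 3))
```

Plotting these yearly shares as a line gives the kind of trend curve the viewer shows, here for a corpus of four sentences instead of millions of books.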

Goal of seminar

  • The goal is that students learn how to use computational tools to analyze big and small data
  • The training will focus on R, but we might work with Python from time to time
  • This implies continuous work throughout the semester

You should have…

  • Interest in quantitative social research and programming
  • Good working knowledge of descriptive and inferential statistics (regression etc.)
  • Some knowledge of R or another statistics software / language
  • I will introduce both in this seminar, but the focus will be on computational methods

Your interest

Average motivation: 9.3/10

Your knowledge: Familiarity with CSS

Prior knowledge CSS

Other prior knowledge

  • Percentage who have run a logistic regression before: 84.2%
  • Percentage who have used machine learning before: 26.3%
  • Average experience with code-based data analysis: 3/5
  • Percentage who usually work with R: 52.6%
  • Percentage who usually work with Python: 26.3%
  • Percentage who never use any statistical software: NA

Your expectations (selection)

[1] "I would like the requirements for students to be communicated clearly from the start: which level of R we need to acquire ourselves (possibly even before the course) and how the participation and performance credits are earned."

Your expectations (selection)

[1] "Text analyses of internet sources or text documents (political position papers), provided this is covered by the methods in question. At least that is how I understand it."
[1] "My personal main area of interest lies with text mining / natural language processing methods. I am especially interested in running analyses with something like sentiment analysis, as I learned how to do it in my BA but never got around to doing it in earnest."
[1] "I have already run quite a few linear and logistic regressions and would like to expand my toolbox. Ideally, I would like to keep working in R."
[1] "I have knowledge of statistics and have already gained some experience with Python, but I do not yet know exactly which methods can be used to combine the two. I would therefore be particularly interested in the methodological and sociological point of reference."

Your expectations (selection)

[1] "I am wondering how to get access to \"big\" data. Social science data is often very limited so I am still wondering how to apply machine learning for social phenomena"

What this course will offer

  • A guide to the scraping, preparation, and analysis of different types of digital trace data (focus on text-as-data)
  • The means to conduct your own computational research project
  • Hands-on application of methods in tutorials

What this course will not offer

  • In-depth understanding of mathematical foundation of methods
  • Advanced programming
  • Course is not suited as a general introduction to quantitative social science methods

Literature

Term paper

Term paper

  • Computational analysis of digital data (in the broadest sense) on a research question of your interest
  • You can choose from any topic (… for which you find suitable data)
  • The focus of the paper should be the analysis

Term paper

  • ~15 to 20 pages, incl. tables, graphs, references etc.
  • Also add your code so that I can understand and reproduce what you’ve done
  • DEADLINE IS 01 APRIL 2025!
  • Hand in as PDF and code via e-mail to cc@soz.uni-frankfurt.de

Syllabus

Syllabus overview

Session  Date         Topic
1        17 Oct 2024  Introduction
2        24 Oct 2024  Text-As-Data: Preparation
3        31 Oct 2024  Text-As-Data: Frequency Analysis
4        07 Nov 2024  Sentiment Analysis
5        14 Nov 2024  Machine Learning I
6        21 Nov 2024  Machine Learning II
7        28 Nov 2024  Topic Modeling
8        05 Dec 2024  Advanced Topic Modeling
9        12 Dec 2024  Web scraping
10       19 Dec 2024  Individual data collection
         Christmas break
11       16 Jan 2025  Open Science
12       23 Jan 2025  Large language models
13       30 Jan 2025  Using Google and Amazon Web Services
14       06 Feb 2025  Presentation of term papers
15       13 Feb 2025  Course Wrap-Up

More information about each session on GitHub

Questions or comments?

Data collection

  • Download data sets from websites, online archives, repositories, …
  • Application Programming Interfaces (APIs)
  • Web scraping
  • Collect your own data

Data analysis

  • Sentiment analysis
  • Text summary
  • Machine translation
  • Classification
  • Regression etc.

Text analysis

Boumans & Trilling (2016)

A glimpse into computational social science research

Example 1

Political migration discourses on social media: a comparative perspective on visibility and sentiment across political Facebook accounts in Europe

Example 2

Real-World Developments Predict Immigration News in Right-Wing Media: Evidence from Germany

Example 3

The rise of Jihadist propaganda on social networks

Example 4

Online hatred of women in the Incels.me forum: Linguistic analysis and automatic detection

Software

R

  • You will need R for most tutorials and the term paper (Python use is encouraged as well)
  • To work with R, install the following on your computer:
  • R at cloud.r-project.org
  • RStudio at rstudio.com

Python

  • To work with Python, install the following on your computer:
  • Python at python.org
  • Anaconda at anaconda.com (email required)
  • Important: Avoid spaces in your file path (e.g. not “C:\Users\Christian Czymara\”), since they can cause problems with Python
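
A quick way to check a path for spaces is sketched below. This is an illustrative helper, not part of the course material: the function name `has_space` and the alternative path are made up for this example, and whether a space actually causes trouble depends on the tool you are using.

```python
from pathlib import PureWindowsPath

def has_space(path_str):
    """Return True if any component of a Windows-style path contains a space."""
    return any(" " in part for part in PureWindowsPath(path_str).parts)

risky = r"C:\Users\Christian Czymara"  # path from the slide (trailing backslash dropped)
safe = r"C:\Users\czymara"             # hypothetical alternative without spaces

print(has_space(risky))
print(has_space(safe))
```

`PureWindowsPath` only parses the string and never touches the file system, so the check also runs on macOS or Linux.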

GitHub

  • Material will be uploaded on GitHub
  • github.com/czymara/CSS_WS24
  • You can download files without having an account
  • For advanced users: Feel free to create an account and download GitHub Desktop to synchronize files every week

ChatGPT

  • Not strictly necessary, but can greatly help you code and solve coding-related problems
  • Requires a (free) OpenAI account
  • In particular, check out Code Nerd (requires ChatGPT Plus subscription)

R

R

  • Why “R”?
  • “R is an implementation of the S programming language” (Wikipedia)
  • R is a programming language for data analysis
  • RStudio is a so-called integrated development environment (IDE), which makes writing and running R code a lot easier
  • Overview of stored objects
  • Projects containing multiple files
  • Git connection
  • Etc.

R benefits

  • Free and open source
  • Large and very helpful community
  • Plethora of user-written packages on basically everything
  • Very powerful tools for data manipulation and data visualization
  • In addition to analyzing data, you can write programs, websites, books, and much more with R (and R Markdown)
  • … and integrate with other languages

Google Colab

Python

Google Colab

Tasks for next week

  • Install R and RStudio
  • Familiarize yourself with R
  • Bring one of your term papers or your BA thesis in DOCX format on a USB stick
  • Optional: Install Python and Anaconda

Literature