Date(s) - Apr 09, 2021
12:00 pm - 2:00 pm
NIA Aging Centers’ Collaborative Virtual Methods Workshop Series
The workshop series is co-sponsored by:
– Center for Aging and Policy Studies, Syracuse/Cornell/U-Albany
– Center for Advancing Sociodemographic and Economic Study of Alzheimer’s Disease, USC/Stanford/UT Austin
– Center on Aging & Population Sciences, UT Austin
– Center for the Demography of Health and Aging, UW-Madison
This particular workshop is organized by:
Center for Aging and Policy Studies, Syracuse/Cornell/U-Albany
Workshop Title: Scraping web data for Social Science
Speaker: Chris Hess
Workshop description: “Many data like social media posts, classified ads and search results reside on the internet, but only in a semi-structured form with no clear mechanism for collection like a file form or application programming interface (API). While these data could be useful for quantitative and/or qualitative analyses and may even contain geolocation information that could facilitate merges with other data, conventional approaches to automating web page navigation require considerable programming knowledge in languages like Python that might deter users from pursuing this research. This workshop will illustrate how to use Helena, a novel programming-by-demonstration web scraping tool, for collecting web data in both a one-off and ongoing capacity. After reviewing how to generate structured data using this tool, Chris Hess will discuss how he and his colleagues have used web data for basic and applied research and describe some of the common challenges to using scraped data for social science research.”
Bio: Chris Hess is a postdoctoral associate at Cornell University in the Department of Policy Analysis and Management and a recent PhD graduate of the University of Washington Department of Sociology. His research investigates the housing search process and the changing spatial structure of neighborhood inequalities in the United States through a combination of conventional (longitudinal surveys, census estimates) and novel data sources (scraped ads, administrative records). Over the past three years, he and his colleagues have scaled their web scraping project based on Helena from a single source (Craigslist) in one location (Seattle) to include many major platforms for all core-based statistical areas in the United States.
Registration for this event is now closed.