MADAS Seminar Series 2017/18
Web Scraping Explained: Turn Websites Into Structured Data
Rossano Schifanella – Department of Computer Science (University of Torino)
Thursday 19 April @14:30
The seminar will start with an overview of the main techniques and approaches for deriving actionable knowledge from the World Wide Web, with particular focus on three areas: web content mining, web usage mining, and web structure mining. In the second part, the focus will shift to the concepts and tools for extracting, collecting, structuring, and analyzing data contained in Web pages. In particular, the Web architecture and the main Web technologies (e.g. HTML) will be presented, as well as the standard models and languages for navigating the structure of pages, such as the DOM and XPath. At the end of the seminar, a hands-on session will present some of the widely adopted Python frameworks for scraping data from websites, and real-world case scenarios will be discussed.
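As a rough illustration of the kind of task the abstract describes (turning page markup into structured records by navigating its tree with XPath-style queries), here is a minimal sketch using only the Python standard library. The page fragment, the `seminars` table id, the class names, and the `extract_rows` helper are all hypothetical and are not taken from the seminar material. The sketch also assumes well-formed markup, which real web pages often are not; in practice a dedicated HTML parser or scraping framework would usually be used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <table id="seminars">
      <tr><td class="title">Web Scraping Explained</td><td class="time">14:30</td></tr>
      <tr><td class="title">Another Talk</td><td class="time">16:00</td></tr>
    </table>
  </body>
</html>
"""


def extract_rows(markup):
    """Return one dict per table row, using ElementTree's limited XPath subset."""
    # Parse the markup into an element tree (requires well-formed XML/XHTML).
    root = ET.fromstring(markup.strip())
    rows = []
    # Select every <tr> that is a direct child of the table with id="seminars".
    for tr in root.findall(".//table[@id='seminars']/tr"):
        title = tr.find("td[@class='title']").text
        time = tr.find("td[@class='time']").text
        rows.append({"title": title, "time": time})
    return rows


records = extract_rows(PAGE)
```

Each entry in `records` is a plain dictionary, so the result can be passed directly to a CSV writer or a data-analysis library for the structuring and analysis steps mentioned above.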
Rossano Schifanella is an Assistant Professor in Computer Science at the University of Turin, Italy, where he is a member of the Applied Research on Computational Complex Systems group. He is a visiting scientist at Nokia Bell Labs and a former visiting scientist at Yahoo Labs and at the Center for Complex Networks and Systems Research at Indiana University, where he applied computational methods to model the behavior of individuals and groups and their interactions on social media platforms. His research embraces the creative energy of a range of disciplines across data mining, network analysis, urban informatics, computational social science, and data visualization.