If you would like to learn R from the very beginning, you are invited to join us on January 17, 2019 (Friday) 9:30 A.M. - 12:30 P.M. to explore R basics, including R list, R Matrix, R Data Frame, basic plotting in R, and a project analyzing weather and climate data. If you are interested in getting started with R, don't miss this one. Please RSVP by January 15, 2019 (Wednesday) for this free SSDA training workshop. Please bring your own laptop.
Make pretty, interesting maps in R! This intermediate-level workshop focuses on the visualization of thematic data stored in polygons representing spatial regions. A number of very common data processing and mapping applications will be explored, as well as some less common ones:
1. Importing and basic visualization of spatial data
2. Projections and coordinate systems in R
3. Color/shade ramp design
4. Choropleth mapping
5. Proportional symbols, dot density, and other maps
This hands-on workshop will consist of some theory and a lot of practice. Much of what you learn is directly transferable to non-geographic data visualization. As a bonus, you will receive a library of R functions to make good cartography easier for your own visualization projects. No prior experience working with spatial data in R is required, but participants should have some background using R for basic data analysis and be comfortable running scripts and portions of scripts in an R environment such as RStudio.
We covered R basics, including vectors, matrices, and data frames, as well as basic plotting in R, in the previous workshop. We will continue exploring R in this workshop, including loops, functions, utilities, and basic statistical methods for climate data analysis. If you are comfortable with the very basics of R, you are more than welcome to join us in this free hands-on training workshop. Please RSVP by February 19, 2020 (Wednesday), and please bring your own laptop for this workshop.
Data science gets complicated as projects evolve, datasets grow, and research teams change. Starting a new project is often daunting, and existing projects will get out of hand if not managed properly. Many of these issues can be mitigated by taking an intentional approach to project design. In this workshop, Ezra will share tips and tricks for taking control of your data. Participants will explore a number of real-world scenarios and discuss strategies and techniques that reduce complexity and remove barriers to research.
These are anxious times, and we need to connect with our various communities even as we also take precautions against COVID-19. The data analytics community is forging right along, and CBSA is excited to be a part of the Network Analysis Workshop on Monday, March 23, from 8am until noon. This workshop was originally going to be in-person, but now will be conducted online using Zoom (connection details below). Professor Ken Frank has brought together an engaged group of scholars to discuss intermediate to advanced network analysis topics as they apply to the spread of COVID-19 and the spread of information and human response to COVID-19. Several of us will be giving talks about different dimensions of this. There will be some quick and dirty network analysis hacking, framed around a set of questions. The goal is to develop those research questions and flesh out responses to them. There is interest in the possibility of developing the outcome of the workshop into one or more proposals, perhaps to NSF. There are a couple of presentations planned, if you are interested more in hearing those than in the research question hackathon; below are the tentative times for different aspects of this event:
08:00-08:15 Overview [Ken Frank]
08:15-08:40 Background on coronavirus, social aspects [Nigel Paneth] https://news.northeastern.edu/2020/03/03/one-way-to-predict-the-spread-of-covid-19-follow-the-memes/
08:40-08:50 Set up SNA [Ken Frank]
Quick ideas (being developed for NSF RAPID proposals)
08:50-09:10 How will teams work virtually? [Sinem Mollaoglu]
09:15-09:35 How will people seek resources on Pinterest? [Kaitlin Torphy]
09:35-10:00 Break, side conversations
10:00-10:20 Spatial aspect [Ashton Shortridge]
10:20-10:40 Structure for hackathon [Ken Frank]
Motivating Questions:
Q1) How will individuals react? What systemic structures (patterns in networks) will emerge? Polarization? Politics?
Q2) How will spread and reaction to virus affect other behaviors and systems?
Q3) Other?
Q4) Outside the corona virus?
10:30-end Hackathon: break-out “rooms” by research questions
Zoom Information: To participate on Zoom, click on https://msu.zoom.us/j/783760435
Phone: One tap mobile: +16468769923,,783760435# US (New York); +16699006833,,783760435# US (San Jose)
Dial by your location: +1 646 876 9923 US (New York); +1 669 900 6833 US (San Jose)
Meeting ID: 783 760 435
Tableau is a business intelligence and data visualization tool used for analyzing the data in a graphical format. In this workshop, Sushmasree Gadde, Tableau Developer for MSU IT Services, will introduce you to the benefits of using Tableau for visual analysis. This primer will provide an overview of Tableau’s suite of products, how to navigate Tableau’s user interface, and the advantages of designing visualizations in Tableau. Attendees will download their own copy of Tableau Desktop and build their first visualization.
The workshop will cover latent variable approaches to network analysis. We will touch on distance and block models, but the main focus will be on latent factor models. We will dive into how these models can be used to conduct inference on networks and will also explore how they can be used to map out an underlying social space. The workshop will involve a discussion of the technical details behind these models along with an applied tutorial in R.
RMarkdown is a very slick way to present data and its analysis. By writing a markdown file, which is a plain-text file easily handled in RStudio, one can create webpages and PDF documents, and the results can be made interactive. One can mix prose with R code in a very legible way, which makes showcasing data and results much more attractive than simply running a script and reading inline comments. We don't need to copy-paste plots and numbers from RStudio into our final presentation: these are generated on the fly in the report or presentation. Despite its name, RMarkdown also supports Python, and it even allows R and Python to exchange information in the same presentation!
Machine learning is a powerful branch of data science that uses data-driven approaches and statistical methods to derive patterns and make predictions. In this workshop you will learn the fundamentals of two key methods: classification and regression trees (CART) and random forests. Further, you will see how to use these methods in the R environment, and you will have time to experiment with them on multivariate data. This two-session hands-on workshop will consist of some theory and some practice. You will leave the workshop with a basic understanding of the machine learning model-building practice of training and testing, as well as tools for assessing model performance and interpreting results.
If you would like to learn R from the very beginning, you are invited to join us on June 3 and 4 (Wednesday and Thursday), 10:00 A.M. - 11:30 A.M. This two-day workshop will explore R basics, including R lists, R matrices, R data frames, basic plotting in R, and loops and functions in R, which will get you ready for some interesting intermediate-level R workshops that SSDA will offer this summer.
We are pleased to have Spartans back with us this summer to share their data science stories via Zoom. The first speaker is Sean Law, who received his Ph.D. at MSU and now serves as an advisor on an enterprise A.I. Council at TD Ameritrade. If you are interested in knowing how he became a data scientist and how he transitioned from academia to industry, please join us on Zoom this summer!
Topic modeling is one of those Natural Language Processing techniques that are less mature and constantly evolving. As an unsupervised machine learning model, it will help you get a sense of what your texts talk about without reading through the whole thing. This workshop will introduce you to the basics of topic modeling with tweets as examples, and you will walk away with chunks of code ready for your own exploratory analysis.
Data Carpentry aims to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time and with less pain. This is an introduction to R designed for participants with no programming experience. The lessons assume no prior knowledge of the skills or tools. The workshop will start with some basic information about R syntax, the RStudio interface, and move through how to import CSV files, the structure of data frames, how to deal with factors, how to add/remove rows and columns, how to calculate summary statistics from a data frame, and a brief introduction to plotting.
If you are interested in gaining insights into Kaggle competitions from a Kaggle competition master, or if you are interested in working in industry after graduation, please join our next SSDA invited guest speaker, Nan Du, who will share his data science story with us on July 8, 2020 (Wednesday), on Zoom.
Social network data can be difficult to collect, but they can sometimes be inferred from easy-to-collect bipartite data. For example, we might infer that pairs of legislators are collaborators if they have co-sponsored many bills together (e.g. a bill co-sponsorship network), or that pairs of people are friends if they have co-attended many events together (e.g. an event co-attendance network). These are examples of bipartite projection networks, which can offer a practical way to measure social networks, but which also require special techniques for analysis. In particular, we must decide how many bills two legislators must sponsor together, or how many events two people must attend together before we can count them as collaborators or friends. This workshop will provide an introduction to bipartite projection networks and their analysis using the backbone package for R.
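To make the idea of a bipartite projection concrete, here is a minimal Python sketch (the workshop itself uses R). The bill and legislator names are hypothetical, and the fixed cutoff `min_shared` is only the simplest possible answer to the "how many bills" question above; it is an illustration of the problem, not the method implemented by the backbone package.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite data: each bill maps to the legislators who sponsored it.
sponsorship = {
    "bill_1": {"Ann", "Bob", "Cruz"},
    "bill_2": {"Ann", "Bob"},
    "bill_3": {"Bob", "Cruz", "Dee"},
    "bill_4": {"Ann", "Bob", "Dee"},
}

def project(bipartite):
    """Count, for every pair of legislators, the bills they sponsored together."""
    weights = Counter()
    for sponsors in bipartite.values():
        for pair in combinations(sorted(sponsors), 2):
            weights[pair] += 1
    return weights

def threshold_edges(weights, min_shared):
    """Keep only pairs with at least min_shared shared bills (a naive global cutoff)."""
    return {pair for pair, count in weights.items() if count >= min_shared}

weights = project(sponsorship)
edges = threshold_edges(weights, min_shared=2)
print(sorted(edges))
```

With a cutoff of 2, Ann and Cruz (one shared bill) are not counted as collaborators, while Ann and Bob (three shared bills) are. Changing the cutoff changes the network, which is why the choice needs care.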
Social scientists increasingly analyze text, such as political speeches and social media posts, to understand important phenomena. Unlike many other data sources, however, text is usually available only in its raw form and requires careful cleaning and processing before it can be properly analyzed. Recent research demonstrates that these steps can measurably affect results, but they are rarely discussed in research papers. This workshop will equip participants to prepare messy text for analysis.
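As a rough illustration of what such cleaning steps can look like, here is a short Python sketch. The stop-word list, the example post, and the particular order of steps are assumptions made for illustration, not the workshop's own pipeline; each choice (for example, dropping digits and hashtag symbols) is exactly the kind of decision that can affect downstream results.

```python
import re

# A small, hypothetical stop-word list; real projects usually use a longer one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in"}

def clean_text(raw):
    """Lowercase, strip URLs, mentions, and non-letters, then drop stop words."""
    text = raw.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS]

# Made-up social media post used only as an example input.
tokens = clean_text("The Senate is voting TODAY!! Details: https://example.org @cspan #vote2020")
print(tokens)
```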
How can social scientists leverage the recent explosion of textual data to learn about the social world? This workshop will provide an introduction to a variety of quantitative methods, including dictionaries, topic modeling, and machine learning, that can be used for sentiment analysis and document exploration. Applications to multiple topics will be discussed. This workshop builds on "Cleaning Messy Text Using R: How to Get Your Data Ready for Analysis" to demonstrate a working research pipeline for textual data.
Bayesian statistics is a powerful and flexible framework for parameter estimation and provides an alternative approach to maximum likelihood. In this workshop we will cover the basic theory of Bayesian statistics and approaches to applying this framework using R and other software. This is a two-session workshop. The first session will cover an introduction to Bayesian theory and coding a basic model in JAGS (Just Another Gibbs Sampler). The second session will focus on interpreting parameter estimates and model output. If time remains, there will be a brief introduction to other Bayesian software (RStan, NIMBLE). You should leave the workshop with a basic understanding of Bayesian statistics and the coding required to run a model within this framework.
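For a feel of how Bayesian estimation differs from maximum likelihood, here is a minimal Python sketch. It uses a grid approximation rather than the Gibbs sampling that JAGS performs, and the data (7 successes in 10 trials) and the flat prior are assumptions chosen purely for illustration.

```python
# Hypothetical data: 7 successes observed in 10 trials.
successes, trials = 7, 10

# Grid of candidate values for the unknown success probability theta.
grid = [i / 200 for i in range(1, 200)]

def likelihood(theta):
    """Binomial likelihood of the data, up to a constant factor."""
    return theta ** successes * (1 - theta) ** (trials - successes)

# Flat prior (an assumption): every theta is equally plausible before seeing data.
prior = [1.0 for _ in grid]

# Bayes' rule on the grid: posterior is proportional to likelihood times prior.
unnormalized = [likelihood(t) * p for t, p in zip(grid, prior)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
mle = successes / trials  # maximum likelihood estimate, for comparison

print(round(posterior_mean, 3), mle)
```

The posterior mean is pulled slightly toward 0.5 relative to the maximum likelihood estimate because the flat prior still carries a little weight with only ten observations.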
Pandas is one of the most popular Python libraries for reading, wrangling, and writing data, and it integrates seamlessly with seaborn for data visualization. Seaborn is an easy way to make pretty box plots, bar plots, and linear regressions. This will be a hands-on workshop consisting of both theory and practice. We will first walk quickly through the general philosophy behind pandas, and later see how that philosophy translates into multiple wrangling commands. Understanding the way pandas is designed will make parsing through its documentation easier. As we walk through the most common pandas commands, an R translation will be provided for those already familiar with R but unfamiliar with pandas.
If you would like to learn R from the very beginning and get ready for the new semester, you are invited to join us on August 31 - September 2 (Monday-Wednesday), 4:00 P.M. - 5:30 P.M. This three-day workshop will explore R basics, including R lists, R matrices, R data frames, basic plotting in R, and loops and functions in R, which will get you ready for some interesting intermediate-level R workshops that SSDA offers regularly.
We had Spartans back with us this summer to share their data science stories via Zoom. Our first invited guest speaker in the fall is Sushant More, who now serves as a research scientist at Amazon. If you are interested in knowing how he became a data scientist and how he transitioned from academia to industry, please join us on Zoom on September 18, 2020 (Friday), at 2PM EST!
Is bootstrapping magic? Can you really generate a sampling distribution for nearly any statistical property by thrashing your data with a golf club? How might that be useful? Bootstrapping is a conceptually simple idea for using sampling with replacement to characterize the mean, variance, skew, you name it, for statistics involving your data. In this short workshop I will introduce the basic theory behind bootstrapping and show how to use it in R. Golf club not included, but everything else is.
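The core idea can be sketched in a few lines. The workshop uses R; this is a Python sketch under stated assumptions: the sample values are made up, `bootstrap_statistic` is a hypothetical helper name, and the percentile interval shown is only one of several ways to summarize a bootstrap distribution.

```python
import random
import statistics

def bootstrap_statistic(data, statistic, n_resamples=2000, seed=0):
    """Compute the statistic on n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical sample; the values are made up for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 6.0]

# Bootstrap distribution of the sample mean, sorted for percentile lookups.
boot_means = sorted(bootstrap_statistic(sample, statistics.mean))

# Percentile 95% interval: cut 2.5% off each tail of the bootstrap distribution.
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]

# The spread of the bootstrap distribution estimates the standard error.
standard_error = statistics.stdev(boot_means)

print(lower, upper, standard_error)
```

Passing `statistics.median` or `statistics.variance` instead of `statistics.mean` gives a bootstrap distribution for that statistic with no other changes.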
This is a two-session workshop to introduce R basics to R beginners. Topics include R lists, R matrices, R data frames, functions, and packages in R.
Data science gets complicated as projects evolve, datasets grow, and research teams change. Starting a new project is often daunting, and existing projects will get out of hand if not managed properly. Many of these issues can be mitigated by taking an intentional approach to project design. In this workshop, Ezra will share tips and tricks for taking control of your data. Participants will explore a number of real-world scenarios and discuss strategies and techniques that reduce complexity and remove barriers to research. Generic examples in both R and Python will be shown, but no prior programming experience is necessary for this workshop.
If you would like to learn Python from the very beginning, you are invited to join us on October 19-21, 2020 (Monday - Wednesday) 4:00 P.M. - 5:30 P.M. to explore Python lists, functions, packages, DataFrames, and even plotting in Python using Matplotlib. If you are interested in getting started with Python, don't miss this one.
If you joined our R introductory level training workshops in the past, or if you used R before and would like to refresh your memory on what you have learned, this workshop will be a good fit for you. The workshop will cover basics of R, R lists, R matrices, R data frames, functions, packages, and plotting in R, all packed within 90 minutes. This will be the last R introductory training workshop that SSDA offers this semester. If you would like to get ready to start with R, do not miss it.