Data

  • The data for this project are synthetic (some columns were artificially generated). The dataset was generated from a subset of Disney wait time data obtained from touringplans.com. Because it was artificially generated, no conclusions should be drawn from it outside of this challenge. This dataset will be loaded in RStudio Cloud when you join below.

  • Here is a data dictionary.

  • We are interested in the relationship between the standby posted wait time (SPOSTMIN) for the amusement park ride and the number of units of merchandise sold that day (MERCHANDISE).

  • You may use any technique to obtain a significant p-value that you believe could be justified in the scientific literature.

Things to consider with this data

  • There are missing values. Some are coded as <NA>; others are coded differently. For example, in the standby posted wait time column (SPOSTMIN), a closed ride is coded as -999. Be sure to examine the data dictionary for the variables you are using.

  • There are two levels of data: time-level and day-level. Time-level variables vary by DATETIME, which is recorded to the second within each day. For example, SPOSTMIN takes many values within a day, one for each wait time reported. Day-level variables are constant by DATE. For example, MERCHANDISE has a single value for each date.
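
The two considerations above can be sketched together. The example below is a minimal illustration in Python using only the standard library, not the challenge's own code (the dataset itself is loaded in RStudio Cloud). The toy rows and their values are invented, the helper names clean_wait and to_day_level are hypothetical, and summarizing each day by its mean wait time is an assumption; the challenge does not prescribe how to move from time-level to day-level data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows; every value here is invented for illustration only.
rows = [
    {"DATE": "2018-01-01", "DATETIME": "2018-01-01 09:00:05", "SPOSTMIN": 30, "MERCHANDISE": 120},
    {"DATE": "2018-01-01", "DATETIME": "2018-01-01 09:07:12", "SPOSTMIN": -999, "MERCHANDISE": 120},
    {"DATE": "2018-01-01", "DATETIME": "2018-01-01 09:15:40", "SPOSTMIN": 50, "MERCHANDISE": 120},
    {"DATE": "2018-01-02", "DATETIME": "2018-01-02 10:01:33", "SPOSTMIN": None, "MERCHANDISE": 95},
    {"DATE": "2018-01-02", "DATETIME": "2018-01-02 10:09:48", "SPOSTMIN": 20, "MERCHANDISE": 95},
]

CLOSED_CODE = -999  # SPOSTMIN code for a closed ride, per the description above


def clean_wait(value):
    """Return None for missing or ride-closed readings, otherwise the value."""
    if value is None or value == CLOSED_CODE:
        return None
    return value


def to_day_level(rows):
    """Collapse time-level rows to one row per DATE.

    Assumption: each day is summarized by the mean of its valid SPOSTMIN
    readings. MERCHANDISE is already constant within a date, so it is
    carried over unchanged.
    """
    waits = defaultdict(list)
    merch = {}
    for row in rows:
        date = row["DATE"]
        merch[date] = row["MERCHANDISE"]
        wait = clean_wait(row["SPOSTMIN"])
        if wait is not None:
            waits[date].append(wait)
    return [
        {
            "DATE": date,
            "MEAN_SPOSTMIN": mean(waits[date]) if waits[date] else None,
            "MERCHANDISE": merch[date],
        }
        for date in sorted(merch)
    ]


daily = to_day_level(rows)
for day in daily:
    print(day)
```

Treating -999 as missing before aggregating matters: left in place, a single closed-ride reading would pull that day's average far below zero. Other daily summaries (median, maximum, count of readings) are equally possible choices and would change the analysis.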