Details

We are conducting research on the ways that people use data analysis and data science tools. We will not collect any personally identifiable information about you for the purposes of this research. The potential risks to you are small. The potential benefits to the community of data scientists, developers, and professors are very high: we will be able to learn how people analyze data, which can improve how we use data in business, scientific studies, and other areas.

If you join this study, you will participate in the data analysis experiment. You will be asked to analyze a dataset, and we will record all of the R code you use while performing this analysis in RStudio Cloud. You will then submit this code to us for review. We will also ask you to submit a one-paragraph description of the final model you fit. You will then be asked to answer some questions on SurveyMonkey. The data from all participants will be compiled into a single dataset by the study investigators, analyzed, and released to the community. We plan to release all data collected from this study openly online via sites like GitHub and Figshare.

Read the full consent form here.

Instructions

  1. The Start Challenge button will send you to an RStudio Cloud instance we have set up for the challenge.
  2. After signing into RStudio Cloud, click “Projects” at the top of the page; you should see a project called base-project. Click this.
  3. Save your own copy of this project by clicking “Save a Permanent Copy” in the top right corner.

  4. Click on the instructions.Rmd file and follow the instructions there.

  5. You will be asked to consent before beginning. Please read the consent form here.

  6. To begin your analysis and load the data, run the first code chunk in instructions.Rmd.

    pchallenge::start_challenge()
    
  7. Conduct your analysis, examining the association between the standby posted wait time (SPOSTMIN) for the amusement park ride and how many units of merchandise were sold that day (MERCHANDISE). You may use any technique to obtain a significant p-value that you believe could be justified in the scientific literature. You can add your analysis code to the instructions.Rmd file, open a new script for your analysis, or run it in the RStudio Cloud console. Please do not post your code publicly until after the challenge has been completed.

  8. When you have completed your analysis, run the final code chunk of instructions.Rmd.

    pchallenge::turn_in_challenge()
    

    This will first prompt you to copy your R Report. We will use this to verify the p-value you submit. The file will open automatically. Once it is open, follow the steps for your operating system:

    | If you are working from a PC      | If you are working from a Mac  |
    |-----------------------------------|--------------------------------|
    | Press ctrl + A to select the text | Press ⌘ + A to select the text |
    | Press ctrl + C to copy the text   | Press ⌘ + C to copy the text   |
    | Press save                        | Press save                     |

    Next, a browser window will open with a SurveyMonkey survey where you can enter your analysis results. If you do not have pop-ups enabled, you will get an error message; click “Try Again” and the survey should load.

    The survey will first ask you to paste (ctrl + V, or ⌘ + V on a Mac) your “R Report” into a text box. You will then be asked a few demographic questions. Finally, you will be asked to submit a short description of your analysis along with your final p-value for the association between the standby posted wait time for the amusement park ride and how many units of merchandise were sold that day.

  9. If you choose to enter a username in the SurveyMonkey survey, you can join the p-hack-athon Leaderboard.
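
As a starting point for step 7, one way to examine the association is a simple linear regression. The sketch below is only an illustration, not the expected analysis. The data frame `rides` is simulated stand-in data invented for this example; the real data are loaded by pchallenge::start_challenge(), and the name of the object it creates is not stated here. Only the column names SPOSTMIN and MERCHANDISE come from the instructions above, and a linear model is just one of many techniques you might choose.

```r
# Simulated stand-in data (hypothetical): replace `rides` with the data
# loaded by pchallenge::start_challenge() when working on the challenge.
set.seed(1)
n <- 200
rides <- data.frame(SPOSTMIN = round(runif(n, min = 5, max = 120)))
rides$MERCHANDISE <- round(500 + 0.5 * rides$SPOSTMIN + rnorm(n, sd = 100))

# Simple linear regression of merchandise units sold on posted wait time
fit <- lm(MERCHANDISE ~ SPOSTMIN, data = rides)

# Extract the p-value for the SPOSTMIN slope from the coefficient table
p_value <- summary(fit)$coefficients["SPOSTMIN", "Pr(>|t|)"]
print(p_value)
```

The value stored in `p_value` is the kind of number the final survey asks for, reported alongside a short description of the model that produced it.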