| patient_ID | disease_status | treatment | age_in_years |
|---|---|---|---|
| 4oncnyz0 | tested positive | control | 44 |
| e01i28hz | tested positive | low_dose | 38 |
| gis9nqlo | tested positive | medium_dose | 53 |
| 5gijw0g0 | tested negative | high_dose | 47 |
| 5y72ehlm | tested positive | control | 55 |
| r60yuoz4 | tested negative | low_dose | 58 |
| p3fhvv4q | tested negative | medium_dose | 59 |
| 7dmeyvy5 | tested negative | high_dose | 60 |
1 Biostatistics basics
Note: you do not have to use R to complete this problem set. If you would like to use R for any of the questions, please make use of the RStudio Server app running on the Koa Server. For a reminder on how to use the RStudio Server app check out the Setup and Lab 1 sections from our lab instructions website.
Populations, estimates, sources of variation
In one or two sentences, what is the difference between a sample and a population? [1 point]
If you’re trying to study differences in species richness across locations, taxonomic misidentification can occur, what category of error or source of variation is this? [1 point]
- Bad luck
- Random variation
- Scientific
- Measurement error
What/which is an estimate (choose all that apply)? [1 point]
- A property of a population
- A statistic calculated from a sample
- A best guess of the value of a parameter
- The average body mass of 20 Kōlea birds
Random sampling for a clinical trial
Suppose you are designing a clinical trial for a new drug and you have enough budget to afford a trial with 100 participants randomly selected from the total (finite) population of adults living in the fictional country of El Dorado. Assume no one is added or subtracted (dies) from the population during sampling. El Dorado is divided into 10 regions that vary widely in their population size. To randomly select participants, you are given a list of all living adults in El Dorado arranged by Region and alphabetically by surname (last name) within Region like this:
| Region | Name | ID Number |
|---|---|---|
| A | Aaronson, A | 1 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| A | Zykowski, Z | 65184 |
| B | Aaronson, A | 65185 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| B | Zykowski, Z | 187706 |
| C | Aaronson, A | 187707 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |
| C | Zykowski, Z | 245906 |
Each adult also has unique ID Number determined by their Region and Name.
A. Choose the first 100 individuals in the list
B. Choose the first 10 individuals from each Region
C. Use a random number generator to select 10 ID Numbers from each region
D. Use a random number generator to select 100 ID Numbers from the country population
Which of the sampling procedures (A, B, C, or D) is most likely to produce a biased estimate of the true population response to the drug? [1 point]
After starting the trial on participants, you learn that a technician accidentally used a nonrandom sampling procedure that is likely to bias your estimate. The technician suggests that you ask for more funding to increase the sample size to 200 while continuing to use the same biased sampling procedure. If you adopt their suggestion, what is most likely true about the resulting estimate from the sample of 200? [1 point]
- More precise and less biased
- More precise and equally biased
- Less precise and less biased
- Less precise and equally biased
Randomly assign individuals to “treatments”
- [2 points] As discussed in the Syllabus, the Group Project is a major part of this class. All students in your section will be divided into groups of 4 to 5 students each. For this activity, you will randomly assign students to groups. Each student must be assigned to one and only one group. Assume there are 23 students in your section, and you can simply “name” each student “Student 01”, “Student 02”, …, “Student 23”. You can use any method you want to assign groups, as long as it is truly random. You need to upload your work to show how your randomization procedure works. This could be any of the following:
- a picture of a piece of paper showing randomization
- a Google Sheet/Doc with randomization procedure
- a short video describing and showing your randomization procedure
- an R script
- something else, as long as it’s clear what you did.
Types of variables
Imagine you have a dataset from a different clinical trial that looks like this:
(Assume the above shows only a small subset of the full data)
7a. Which of the following is a categorical variable?
7b. Which of the following is a numerical variable?
7c. Which of the following is a ordinal variable?
patient_IDdisease_statustreatmentage_in_years
Questions 7a-c are worth [3] points in total.