BIO 206: Biostatistics HW 1


Name: ____________________________
Biostatistics – BIO 206
Homework 1
Complete the following assignment. Please be concise. Answers should be typed.
1) Define statistics. Please use your own words. (16%)
– The practice or science of collecting, analyzing, interpreting, and presenting data
- For each of the following statistical methods, list the information below. For type of data,
indicate Quantitative or Categorical:
1) One-sample t test: (done for you)
-Purpose: Test whether a sample mean differs significantly from a given parametric mean.
-Type of data: Quantitative
-Example of application (can be made up): Test whether food companies actually are giving you the
amount of food they list for their products, e.g., 269.3 g of Doritos.
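The one-sample t statistic compares the sample mean with the stated value using t = (Ȳ – μ) / (s / √N), with N – 1 degrees of freedom. The sketch below is a minimal illustration using only Python's standard library. The five bag weights are hypothetical (made up for this example); only the 269.3 g label value comes from the example above. It stops at the t statistic rather than computing a p-value.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return (t, df) for a one-sample t test of the sample mean against mu."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)          # standard error of the mean
    return (mean - mu) / se, n - 1


# Hypothetical weights (g) of five bags labeled 269.3 g
weights = [267.3, 268.3, 269.3, 270.3, 266.3]
t, df = one_sample_t(weights, mu=269.3)
print(f"t = {t:.3f}, df = {df}")  # prints t = -1.414, df = 4
```

For df = 4, the two-tailed critical value of t at α = 0.05 is about 2.776 (from a standard t table). Because |t| ≈ 1.41 is below that, these made-up weights would not show a significant difference from the label. A library routine such as scipy.stats.ttest_1samp can report an exact p-value, but that is outside this sketch.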
2) Two-sample t test: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
3) Mann-Whitney U test: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
4) Analysis of variance (ANOVA): (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
5) Kruskal-Wallis test: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
6) Correlation: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
7) Regression: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
8) Chi-square (χ²) test: (12%)
-Purpose:
-Type of data:
-Example of application (can be made up):
BIO 206: Biostatistics
Lecture 1
Introduction
Instructors
Instructor: Windsor Aguirre
Department of Biological Sciences
waguirre@depaul.edu
Office: 221A McGowan South
Office hours: By Appointment
TA: Fernanda Vivas
FVIVAS@depaul.edu
Office: 221 McGowan South
Office Hours: TBA
Statistics in Biology
Objectives
• Understand common variable types and distributions
• Use descriptive statistics to summarize, graph, and
compare data sets
• Understand experimental design and apply
inferential statistical methods appropriately
• Be able to interpret and critique a statistical
presentation in a published argument
• Have some exposure to software for statistical
analyses
Methods:
• Descriptive statistics
– E.g., Mean, mode, median, standard deviation, variance, coefficient of
variation, z-scores, confidence intervals, standard error of the mean,
graphing, etc.
• Inferential statistics
– t test (one-sample, independent-samples, paired-samples)
– Mann-Whitney U-test
– ANOVA
– Kruskal-Wallis
– Correlation
– Regression
– χ² test
Desire to Learn (D2L)
• D2L will be used heavily in this course
– DePaul Website: Shortcuts
• Check for periodic announcements
• Syllabus, lectures, online quizzes, assignments,
etc., will be posted there
Syllabus
• Please read the syllabus carefully
– Describes what is expected from you for this
course
• Class schedule
– May change depending on our progress
– Changes will be announced in class and via D2L
Lectures
• I value participation and call on students
frequently, so be prepared and review the
material carefully
• Lectures will be posted on D2L
• Context matters: much of what we will talk
about will not be on the slides
Readings
• Your required textbook is Exploring
Statistics: Tales of Distributions
Twelfth edition by Chris Spatz
– See syllabus for reading schedule
– Good read for a statistics book
– Keep up with the reading; it is important to
read each chapter BEFORE it is covered
in class
– Online Quizzes will be based on readings
– Copy placed on reserve in library
Attendance Policy
• You are not graded on attendance, but you are
expected to attend every class
• Worksheets will occasionally be handed out in class
– Some will be unannounced
– Cannot make them up if absent or late
• Talk to me if you know that you will miss class
frequently
Grading Scale
• A ————— 93 – 100
• A- ————— 90 – 92
• B+ ————— 87 – 89
• B ————— 83 – 86
• B- ————— 80 – 82
• C+ ————— 77 – 79
• C ————— 73 – 76
• C- ————— 70 – 72
• D ————— 60 – 69
• F ————— < 60
Course Grade
• Online quizzes ……………………….. 10%
• Assignments ………………………….. 20%
• Stats Project ………………………….. 10%
• Exam I ……………………………………. 10%
• Exam II …………………………………… 20%
• Final Exam..…………………………….. 30%
Online Quizzes
• Based on readings from book
• Eight multiple choice quizzes to be completed on D2L
• See syllabus schedule for due date
– Due by start of class on the due date; strongly recommend
completing quizzes early (computer glitches are not a valid
excuse)
– D2L automatically cuts off access after due time
– Make sure you actually submitted your quiz
• You will be allowed to drop one Online Quiz grade
Assignments: Homework
• You will have six major homework assignments
– Intended to help you practice and master the methods that we
learn
– See syllabus schedule for due dates
• Do not turn in homework late! Homework is due at the
beginning of class on the date listed on the syllabus
• 25% penalty for late assignments
– Assignments will not be accepted after the start of the
following class period
Assignments: Worksheets
• Worksheets to be completed in class will be assigned
periodically and may be unannounced
– Cannot be made up if you are absent or late without valid written
documentation
• Some will be based on individual activities, others will be
based on group activities
• Homework and worksheets will be averaged
– You can drop your lowest Assignment grade (either one homework or
one worksheet)
Statistics Project
• You will design and conduct your own scientific
experiment involving original biological data and write a
report in the format of a scientific paper
– Use your imagination and experience from other classes
– Come up with a problem that you will enjoy researching
• Must employ one of the methods covered in class
• Must be an original study conducted by you
• You may work in groups (up to three students) or
individually
Stats Project
• Introduction: Background, scientific question, justification,
objectives, hypotheses (null & alternative)
• Methods: Data collection, date and locality info, quality
assurance, methods of analysis, equations, software used
• Results: What did you find? Describe the results of your
experiment. Provide summary tables, figures, charts, etc.
• Discussion: Interpret your results. What did you learn?
Why does it matter? Problems? Suggestions for future
research? Things you would have done differently?
Stats Project
• Acknowledgements: Thank anyone that helped,
participated, or made your project possible
• Literature cited: Cite sources used to provide context or
justify your experiment, sources for the software (if used),
and any papers that you cite to interpret your results
Stats Project
• Group Assurance: This section should only be included in
group projects.
– The name of each student in the group and a summary of what
each participant did should be included in this section. The
summary should include specific tasks like data collection,
research on topic, a specific analysis or the creation of specific
charts, etc. At the end of the section, include the following
statement and include your signature after the statement:
– “I certify that to the best of my knowledge the above summary of
the work performed by each group member is correct.”
• Student A: Signature
• Student B: Signature
• Student C: Signature
Stats Project
• Guidelines:
– Should be biological (broadly defined)
– The experiment cannot involve human or vertebrate (fishes,
amphibians, reptiles, birds, or mammals) subjects if it is manipulative
– Do not involve infectious agents or conduct any
experiments that put your own or anyone else’s safety at risk
– You should try to avoid needlessly hurting animals with developed
nervous systems
– Use your common sense and ask me if you have any doubts
PRIOR to collecting data!
• Due dates:
– Oct 4: Group composition due
– Oct 25: Outline of proposed experiment & Peer review
– Nov 15: Stats Project Paper due
Midterms and Final Exams
• The two midterm exams combined are worth 30%
of your final grade, are closed book/notes, and are
NOT cumulative
• The final is worth 30% of your final grade and is
cumulative. It will consist of two sections:
– Section A will be closed book and primarily consist of
definitions, multiple choice, and short answer problems
– Section B will be open book and will consist of long
problems that you have to solve. You will not be told
what method to employ to solve these problems
Participation
• I strongly encourage participation
– It will get pretty boring if I am the only one speaking!
• I frequently ask questions in class
– Be prepared
• Stop me to ask questions
– My job is to help you learn, do not be afraid to ask
questions if something is not clear
– If you don’t ask questions I may assume that I am going
too slow
Be Prepared to Invest Time
• This course is not intended to be difficult, but statistics is
quantitative by nature, and some students have difficulty
with quantitative problems
– We have only ten weeks
• The assignments, readings, and review will take time so
plan ahead
Imagine an “A” in Biostatistics!
• YOU can earn an A in this class
• Study as you go; don’t leave things to the last minute
– Bad habit in school and life
• Do the work, spend time on the class, read, participate
– Time is usually the most limiting factor for academic achievement
Syllabus/Schedule
Aguirre Lab
Tools Lab Employs
Molecular Markers
Morphometrics
Western Ecuador
What Do We Mean by Statistics?
Why Are Statistics Necessary?
Statistics in Biology
• Descriptive Statistics:
– Produce numbers and graphs that help summarize or
describe data
Central Tendency:
-Mean: 70.8
-(Median, Mode)
Variation:
-Standard Deviation
-(Range, variance, coefficient of variation)
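The measures named on this slide can be computed with Python's built-in statistics module. The sketch below uses a small hypothetical sample, because the data behind the slide's mean of 70.8 are not reproduced here. It uses the sample (N – 1) versions of the variance and standard deviation; statistics.pvariance and statistics.pstdev give the N-denominator versions if those are the ones required.

```python
import statistics

# Hypothetical sample (made up for illustration)
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Variation
sd = statistics.stdev(data)        # sample SD, N - 1 in the denominator
var = statistics.variance(data)    # sample variance (= sd squared)
rng = max(data) - min(data)        # range: highest minus lowest score
cv = sd / mean * 100               # coefficient of variation, in percent of the mean

print(f"mean = {mean}, median = {median}, mode = {mode}")
print(f"SD = {sd:.2f}, variance = {var:.2f}, range = {rng}, CV = {cv:.1f}%")
```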
Why Are Statistics Necessary?
• We often must go beyond descriptions and draw
conclusions about unmeasurable populations from
samples
– Does an experimental drug significantly alter the patients’
physiology?
– Does the mean crop yield differ significantly between
two plant populations given different fertilizers?
– How do we decide if a difference seen is biologically
meaningful or just due to random chance?
Why Are Statistics Necessary?
Statistics – Definitions
• Population: every individual in the group that you define
• Sample: a subset of the population (a group of individuals from that
population that has been sampled)
• Parameter:
• Statistic (Estimate):
• Sampling Error:
Samples & Sampling
• Bias can be a major problem when drawing
conclusions from experiments
– Bias:
• What makes a good sample? It is a random sample
from a population
– Equal chance:
– Independence of sampling units:
Variables
• Variable:
• Score:
Types of Variables
• Quantitative Variables (or Numerical Variables)
– Quantitative measurements of individuals that have magnitude
on a numerical scale
– Numerical variables may be either continuous or discrete.
• Categorical Variables (Qualitative or Attribute)
– Characteristics of individuals that do NOT have magnitude or
numerical scale
– Categorical variables may be nominal or ordinal.
Quantitative variables
• Height
• Weight
• Length
• Dose (e.g., in micrograms/gram)
• Longevity (i.e., number of years)
Discrete vs. Continuous
Can be counted
• Number of limbs
• Number of offspring
• Number of petals
Can be measured
• Arm length
• Height
• Weight
– Discrete data come in indivisible units
– Continuous data can take on any real-number value
– Discrete data are often analyzed like continuous data
assuming that there is a large number of possible values
Categorical variables
• Nominal: Qualitative descriptors
– Sex
– Genotype
– Drug treatment (e.g. aspirin vs. ibuprofen)
– State
– Survival (i.e., live or die)
• Ordinal: Qualitative descriptors that can be ranked
– Severity of pain (low, medium, high)
– Life stage (egg, larva, juvenile, adult)
Ordinal vs. Discrete?
• How do you distinguish categorical ordinal data from
numerical discrete data?
Assignments for Next Week
• HW1 due next Thursday
– Define statistics in YOUR OWN WORDS
– Research the main statistical methods that we will
learn in the course
• OQ1 also due next Thursday
– Based on chapters 1-3
– Quizzes are under the “More” Tab on D2L
BIO 206: Biostatistics
Lecture 2
Types of Variables
Frequency Distributions – Types of Graphs
Descriptive Statistics
Announcements
• OQ1 is due Thursday Sep 15 (Ch 1-3)
– D2L: Unlimited time but must be completed in one session
– Closes automatically at start of class Thursday (1pm)
– You get to drop one quiz
• HW1 research on statistical methods also due
Thursday (Submission folder)
– Do not turn in late
• OQ2 due Tuesday Sep 20 (Ch 4 & 5)
Lecture Plan
• Short review last lecture
• Understand variable types, how they can be
identified, and why they matter
• Review simple & grouped frequency distributions
• Review common types of graphs used to illustrate
patterns of variation in data
• Review common descriptive statistics of central
tendency and start on measures of variation
Why Are Statistics Necessary?
• We need methods to efficiently summarize
complex systems in nature
Statistics in Biology
• Raw data (88 scores):
68, 52, 78, 86, 96, 58, 90, 80, 92, 51, 90
50, 86, 68, 86, 68, 96, 80, 60, 68, 96, 87
74, 66, 66, 32, 90, 94, 78, 76, 90, 94, 78
64, 60, 84, 88, 72, 80, 64, 82, 71, 80, 54
84, 76, 58, 46, 92, 52, 80, 62, 92, 52, 80
74, 92, 80, 60, 66, 88, 44, 84, 66, 98, 44
76, 92, 70, 84, 56, 54, 68, 74, 56, 54, 78
78, 72, 66, 54, 68, 66, 54, 68, 68, 66, 54
Types of Variables
• Different types of variables are analyzed using
DIFFERENT statistical methods
• Categorical Variables (compute frequencies)
– χ² test
• Quantitative Variables
– t-test, ANOVA, correlation, regression
Working with Numbers
• Once we have collected data, we need to explore
and try to understand them. How?
12.2, 14.0, 13.5, 17.2, 11.3, 12.5, 15.0, 15.7,
16.7, 18.9, 13.0, 12.3, 12.2, 14.0, 13.5, 17.2,
14.3, 13.5, 16.6, 11.7, 16.1, 17.9, 13.9, 14.3,
12.0, 16.0, 11.5, 17.2, 17.4, 16.5, 14.0, 13.7
Simple Frequency Distribution
-Works OK if you have a
limited number of possible
scores
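A simple frequency distribution just counts how often each distinct score occurs. Applied to the 32 scores listed above, a short sketch with collections.Counter shows why this works poorly here: almost every score is different, so the table is nearly as long as the raw data. Listing scores from highest to lowest is one common layout for frequency tables.

```python
from collections import Counter

# The 32 scores listed above
scores = [12.2, 14.0, 13.5, 17.2, 11.3, 12.5, 15.0, 15.7,
          16.7, 18.9, 13.0, 12.3, 12.2, 14.0, 13.5, 17.2,
          14.3, 13.5, 16.6, 11.7, 16.1, 17.9, 13.9, 14.3,
          12.0, 16.0, 11.5, 17.2, 17.4, 16.5, 14.0, 13.7]

freq = Counter(scores)

# Simple frequency distribution: every distinct score, highest first
for score in sorted(freq, reverse=True):
    print(f"{score:5.1f}  {freq[score]}")

print(f"{len(freq)} distinct scores for N = {len(scores)}")  # prints 24 distinct scores for N = 32
```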
Grouped Frequency Distribution
• Group your data into interval classes
• Reduces the number of classes relative to a
simple frequency distribution, so it is easier to
interpret
Grouped Frequency Distribution
Class Mark
Simplifies evaluation of
data when there are
many scores
-Class Interval:
-Class Mark:
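A grouped frequency distribution assigns each score to a class interval and summarizes each interval by its class mark (the interval's midpoint). The sketch below is one way to build it, assuming half-open intervals [low, low + width). The starting point 11.0 and width 1.0 are arbitrary choices for the 32 scores listed above, and textbooks differ slightly in how they write interval limits, so treat the boundaries as an assumption.

```python
import math
from collections import Counter

scores = [12.2, 14.0, 13.5, 17.2, 11.3, 12.5, 15.0, 15.7,
          16.7, 18.9, 13.0, 12.3, 12.2, 14.0, 13.5, 17.2,
          14.3, 13.5, 16.6, 11.7, 16.1, 17.9, 13.9, 14.3,
          12.0, 16.0, 11.5, 17.2, 17.4, 16.5, 14.0, 13.7]


def grouped_frequencies(data, lower, width):
    """Return (low, high, class_mark, frequency) for each interval [low, high).

    Assumes lower <= min(data); empty intervals inside the range are kept.
    """
    # Interval index k for each score: 0 for [lower, lower + width), and so on
    counts = Counter(math.floor((y - lower) / width) for y in data)
    table = []
    for k in range(max(counts) + 1):
        low = lower + k * width
        high = low + width
        mark = low + width / 2          # class mark = midpoint of the interval
        table.append((low, high, mark, counts.get(k, 0)))
    return table


for low, high, mark, f in grouped_frequencies(scores, lower=11.0, width=1.0):
    print(f"{low:4.1f} to <{high:4.1f}   class mark {mark:4.1f}   f = {f}")
```

Eight intervals replace the 24 distinct scores of the simple distribution, which is the simplification the slide describes.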
Graphing: Harnessing Your Brain Power – Histogram (Numerical)
-What does this
histogram tell
you?
Histogram Shapes
Skewed Distributions
Left Skew
(negative)
Right Skew
(positive)
Histogram Display
Interval width can affect histogram shape!
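To see how interval width changes a histogram's apparent shape, the sketch below prints a sideways text histogram of the same 32 scores twice, once with intervals 1.0 unit wide and once with intervals 2.0 units wide. Both widths and the starting point 11.0 are arbitrary choices made for illustration.

```python
import math
from collections import Counter

scores = [12.2, 14.0, 13.5, 17.2, 11.3, 12.5, 15.0, 15.7,
          16.7, 18.9, 13.0, 12.3, 12.2, 14.0, 13.5, 17.2,
          14.3, 13.5, 16.6, 11.7, 16.1, 17.9, 13.9, 14.3,
          12.0, 16.0, 11.5, 17.2, 17.4, 16.5, 14.0, 13.7]


def text_histogram(data, lower, width):
    """Print one row of '*' per interval [low, low + width); return the counts.

    Assumes lower <= min(data).
    """
    counts = Counter(math.floor((y - lower) / width) for y in data)
    heights = [counts.get(k, 0) for k in range(max(counts) + 1)]
    for k, h in enumerate(heights):
        print(f"{lower + k * width:5.1f} | {'*' * h}")
    return heights


narrow = text_histogram(scores, lower=11.0, width=1.0)
print()
wide = text_histogram(scores, lower=11.0, width=2.0)
```

With width 1.0 the counts are 3, 5, 6, 5, 2, 5, 5, 1, so the histogram dips in the interval starting at 15.0. With width 2.0 those intervals merge into 8, 11, 7, 6, and the dip disappears.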
Categorical Variables
• How do we graphically display categorical
variables?
Categorical Variables
Frequency table showing the ten most common causes of death in Americans
between 15 and 19 years of age in 1999. The total number of deaths is n = 13,778.

Cause of death              Frequency
Accidents                       6,688
Homicide                        2,093
Suicide                         1,615
Malignant tumor                   745
Heart disease                     463
Congenital abnormalities          222
Chronic respiratory disease       107
Influenza and pneumonia            73
Cerebrovascular diseases           67
Other tumor                        52
All other causes                1,653
Bar Graph: Categorical Variables
Why the spaces
between the bars?
Y axis should start at 0
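The sketch below draws a horizontal text bar chart of the cause-of-death table and adds each category's relative frequency. Each bar starts at zero, so bar length stays proportional to frequency, which is the reason the slide asks for the axis to start at 0. The `scale` of 100 deaths per character is an arbitrary display choice. The frequencies sum to the n = 13,778 stated in the table caption.

```python
# Frequencies from the cause-of-death table (ages 15-19, 1999)
deaths = {
    "Accidents": 6688,
    "Homicide": 2093,
    "Suicide": 1615,
    "Malignant tumor": 745,
    "Heart disease": 463,
    "Congenital abnormalities": 222,
    "Chronic respiratory disease": 107,
    "Influenza and pneumonia": 73,
    "Cerebrovascular diseases": 67,
    "Other tumor": 52,
    "All other causes": 1653,
}

n = sum(deaths.values())
scale = 100  # deaths per '#' character (arbitrary display choice)

for cause, f in deaths.items():
    rel = f / n * 100               # relative frequency, in percent
    bar = "#" * round(f / scale)    # bar length measured from zero
    print(f"{cause:28s} {f:6,d} {rel:5.1f}%  {bar}")
```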
Pie Chart: Categorical Variables
-Can be difficult to
interpret when you
have many
categories
(Bar graphs tend to
be better)
-Be careful with color
scheme
Plotting Relationships Between Two Numerical Variables
• Plots used to graph relationships between two
variables (X & Y data collected from every
individual)
• Scatter plot: Many possible Y’s for every X
• Line graph: One Y for every X
Scatter Plot
Very effective for displaying bivariate data
Male guppy ornamentation data
Line graph
-Best when you have one value of Y for each X
-Very popular for displaying time series
Descriptive Statistics
• Measures of central tendency
– Arithmetic mean, median, mode
• Measures of variation:
– Range, interquartile range, standard deviation,
variance, coefficient of variation
Measures of Central Tendency
• Arithmetic Mean: (ΣY) / N
– Y are the observations
– N is the total number of observations
– Used to obtain a single value that is representative of a
sample
• Median: Point that divides an ordered distribution
into two equal parts; the score at position (N+1)/2
– If there is an even number of values, the median
falls between the two central values
– Sum those two central values and divide by two
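Both definitions translate directly into code. The sketch below implements the mean as ΣY / N and the median as the score at position (N+1)/2 of the ordered data, averaging the two central scores when N is even. For illustration it borrows the scores from the range exercise in these slides (N = 8, so the even case applies).

```python
def arithmetic_mean(ys):
    """Mean = (sum of Y) / N."""
    return sum(ys) / len(ys)


def median(ys):
    """Score at position (N + 1) / 2 of the ordered data.

    For even N, the two central scores are averaged.
    """
    ordered = sorted(ys)
    n = len(ordered)
    mid = n // 2                    # 0-based index of the upper-middle score
    if n % 2 == 1:
        return ordered[mid]         # 1-based position (N + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [9, 4, 9, 5, 7, 2, 4, 8]   # N = 8 (even)
print(arithmetic_mean(scores), median(scores))  # prints 6.0 6.0
```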
Comparing the Mean & Median
Data: 2, 3, 4, 5, 7, 8, 10, 50, 52
Outliers: Scores that are very different
from most scores in the sample
Mean = 15.7
Median = 7
(9 scores, so the median is the 5th
score in the ordered series)
*The median is less sensitive to outliers
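Dropping the two outliers (50 and 52) from the data above shows the difference in sensitivity numerically. This sketch uses Python's statistics module. The trimmed sample is only for illustration and is not a recommended way of handling outliers.

```python
import statistics

data = [2, 3, 4, 5, 7, 8, 10, 50, 52]
trimmed = data[:-2]   # the same scores without the two outliers (50, 52)

for label, ys in [("with outliers", data), ("without outliers", trimmed)]:
    print(f"{label:17s} mean = {statistics.mean(ys):5.1f}  "
          f"median = {statistics.median(ys)}")
```

Removing the outliers moves the mean from about 15.7 to about 5.6, while the median only moves from 7 to 5.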
Mean & Median
The Mode
• The mode is the score with the highest
frequency in a frequency distribution
– Or the interval with the highest peak in a grouped frequency
distribution displayed as a histogram
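A distribution can have more than one mode. Python's statistics.multimode (Python 3.8 or later) returns every score tied for the highest frequency. For grouped data, the modal interval is simply the class with the largest count, that is, the tallest bar of the histogram. The interval labels and counts in the second half of the sketch are hypothetical.

```python
import statistics

# Scores borrowed from the range exercise; 4 and 9 each occur twice
scores = [9, 4, 9, 5, 7, 2, 4, 8]
modes = statistics.multimode(scores)   # all scores tied for the highest frequency
print(modes)

# Hypothetical grouped counts: the modal class is the tallest histogram bar
interval_counts = {"11-12": 3, "12-13": 5, "13-14": 6, "14-15": 5}
modal_class = max(interval_counts, key=interval_counts.get)
print(modal_class)
```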
Measures of Central Tendency Are Not Enough!
• Why does variation matter?
• How do you describe variability?
Range
• Y_H – Y_L
• Y_H = High value, Y_L = Low value
• Range = Maximum Value – Minimum Value
– A single number that gives a simple measure
of the spread of the data
– For the numbers [9, 4, 9, 5, 7, 2, 4, 8], what is the
range?
• Advantages?
Problems with the Range
Samples with very
different patterns of
variation can have
similar ranges
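The point about the range can be checked numerically. The two hypothetical samples below share the same range (Y_H – Y_L = 8) but spread their scores very differently. The standard deviation picks up that difference; the range does not.

```python
import statistics


def value_range(ys):
    """Range = Y_H - Y_L (highest score minus lowest score)."""
    return max(ys) - min(ys)


# Hypothetical samples with the same range but different spread
a = [1, 5, 5, 5, 5, 9]   # scores bunched in the middle
b = [1, 1, 1, 9, 9, 9]   # scores piled at the extremes

for name, ys in [("a", a), ("b", b)]:
    print(name, "range =", value_range(ys), " SD =", round(statistics.stdev(ys), 2))
```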
