Title: | Examples from Multilevel Modelling Software Review |
---|---|
Description: | Data and examples from a multilevel modelling software review as well as other well-known data sets from the multilevel modelling literature. |
Authors: | Douglas Bates <[email protected]>, Martin Maechler <[email protected]> and Ben Bolker <[email protected]> |
Maintainer: | Steve Walker <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0-8 |
Built: | 2024-11-20 03:49:54 UTC |
Source: | https://github.com/cran/mlmRev |
The bdf data frame has 2287 rows and 25 columns of language scores from grade 8 pupils in elementary schools in The Netherlands.
data(bdf)
a factor denoting the school.
a factor denoting the pupil.
a numeric vector of verbal IQ scores
a numeric vector of IQ scores.
Sex of the student.
a factor indicating if the student is a member of a minority group.
an ordered factor indicating if one or more grades have been repeated.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector of socioeconomic status indicators.
a factor indicating if the school is a public school, a Protestant private school, a Catholic private school, or a non-denominational private school.
a numeric vector
a numeric vector
a factor with levels 0 and 1
a numeric vector
a numeric vector
a factor indicating if the class is a mixed-grade class.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Snijders, Tom and Bosker, Roel (1999) Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Sage.
summary(bdf)
Scores on the 1997 A-level Chemistry examination in Britain. Students are grouped into schools within local education authorities. In addition, some demographic and pre-test information is provided.
data(Chem97)
A data frame with 31022 observations on the following 8 variables.
Local Education Authority - a factor
School identifier - a factor
Student identifier - a factor
Point score on A-level Chemistry in 1997
Student's gender
Age in months, centred at 222 months or 18.5 years
Average GCSE score of individual.
Average GCSE score of individual, centered at mean.
This data set is relatively large with 31,022 individuals in 2,280 schools. Note that while this is used, illustratively, to fit Normal response models, the distribution of the response is not well described by a Normal distribution.
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Yang, M., Fielding, A. and Goldstein, H. (2002). Multilevel ordinal models for examination grades (submitted to Statistical Modelling).
str(Chem97)
summary(Chem97)
(fm1 <- lmer(score ~ (1|school) + (1|lea), Chem97))
(fm2 <- lmer(score ~ gcsecnt + (1|school) + (1|lea), Chem97))
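The remark that the response is not well described by a Normal distribution can be checked with a quick tabulation. The following is a minimal sketch, assuming the mlmRev package is installed and that the point score is stored in the column score, as in the model formulas above. The name score_counts is introduced here purely for illustration.

```r
library(mlmRev)  # provides the Chem97 data frame

data(Chem97)

# Count how many students obtained each distinct point score.
score_counts <- table(Chem97$score)
print(score_counts)

# A bar plot of the counts shows the shape of the response distribution.
barplot(score_counts,
        xlab = "A-level Chemistry point score",
        ylab = "Number of students")
```

If only a handful of distinct values appear, or the counts are far from bell-shaped, the Normal-response fits are better read as illustrations than as adequate models for these data.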
These data on the use of contraception by women in urban and rural areas come from the 1988 Bangladesh Fertility Survey.
data(Contraception)
A data frame with 1934 observations on the following 6 variables.
Identifying code for each woman - a factor
Identifying code for each district - a factor
Contraceptive use at time of survey
Number of living children at time of survey - an ordered factor. Levels are 0, 1, 2, 3+
Age of woman at time of survey (in years), centred around mean.
Type of region of residence - a factor. Levels are urban and rural
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Steele, F., Diamond, I. and Amin, S. (1996). Immunization uptake in rural Bangladesh: a multilevel analysis. Journal of the Royal Statistical Society, Series A (159): 289-299.
str(Contraception)
summary(Contraception)
(fm1 <- glmer(use ~ urban+age+livch+(1|district), Contraception, binomial))
(fm2 <- glmer(use ~ urban+age+livch+(urban|district), Contraception, binomial))
Cognitive scores of infants in a study of early childhood intervention. The 103 infants from low-income African American families were divided into a treatment group (58 infants) and a control group (45 infants). Starting at 0.5 years of age the infants in the treatment group were exposed to an enriched environment. Each infant's cognitive score on an age-specific, normalized scale was recorded at ages 1, 1.5, and 2 years.
data(Early)
This groupedData object contains the following columns
An ordered factor of the id number for each infant.
A numeric cognitive score.
The age of the infant at the measurement.
A factor with two levels, "N" and "Y", indicating if the infant is in the early childhood intervention program.
Singer, Judith D. and Willett, John B. (2003), Applied Longitudinal Data Analysis, Oxford University Press. (Ch. 3)
str(Early)
A subset of the mathematics scores from the U.S. Sustaining Effects Study. The subset consists of information on 1721 students from 60 schools.
data(egsingle)
A data frame with 7230 observations on the following 12 variables.
a factor of school identifiers
a factor of student identifiers
a numeric vector indicating the year of the test
a numeric vector indicating the student's grade
a numeric vector of test scores on the IRT scale score metric
a factor with levels 0 and 1 indicating if the student has been retained in a grade.
a factor with levels Female and Male indicating the student's sex
a factor with levels 0 and 1 indicating if the student is Black
a factor with levels 0 and 1 indicating if the student is Hispanic
a numeric vector indicating the number of students enrolled in the school
a numeric vector giving the percentage of low-income students in the school
a numeric vector
These data are distributed with the HLM software package (Bryk, Raudenbush and Congdon, 1996). Conversion to the R format is described in Doran and Lockwood (2004).
Doran, Harold C. and Lockwood, J.R. (2004), Fitting value-added models in R, (submitted).
str(egsingle)
(fm1 <- lmer(math~year*size+female+(1|childid)+(1|schoolid), egsingle))
Exam scores of 4,059 students from 65 schools in Inner London.
data(Exam)
A data frame with 4059 observations on the following 9 variables.
School ID - a factor.
Normalized exam score.
School gender - a factor. Levels are mixed, boys, and girls.
School average of intake score.
Student level Verbal Reasoning (VR) score band at intake - a factor. Levels are bottom 25%, mid 50%, and top 25%.
Band of student's intake score - a factor. Levels are bottom 25%, mid 50% and top 25%.
Standardised LR test score.
Sex of the student - levels are F and M.
School type - levels are Mxd and Sngl.
Student id (within school) - a factor
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Goldstein, H., Rasbash, J., et al (1993). A multilevel analysis of school examination results. Oxford Review of Education 19: 425-433
str(Exam)
summary(Exam)
(fm1 <- lmer(normexam ~ standLRT + sex + schgend + (1|school), Exam))
(fm2 <- lmer(normexam ~ standLRT*sex + schgend + (1|school), Exam))
(fm3 <- lmer(normexam ~ standLRT*sex + schgend + (1|school), Exam))
The GCSE exam scores on a science subject. Two components of the exam were chosen as outcome variables: written paper and course work. There are 1,905 students from 73 schools in England.
data(Gcsemv)
A data frame with 1905 observations on the following 5 variables.
School ID - a factor
Student ID - a factor
Gender of student
Total score on written paper
Total score on coursework paper
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Multivariate response models. (2000). In Rasbash, J., et al, A user's guide to MLwiN, Institute of Education, University of London.
str(Gcsemv)
Immunizations received by children in Guatemala.
data(guImmun)
A data frame with 2159 observations on the following 13 variables.
a factor identifying the child
a factor identifying the family.
a factor identifying the community.
a factor indicating if the child received a complete set of immunizations. All children in this data frame received at least one immunization.
a factor indicating if the child was 2 years or older at the time of the survey.
a factor indicating if the mother was 25 years or older.
a factor indicating the child's birth order within the family. Levels are 01 - first child, 23 - second or third child, 46 - fourth to sixth child, 7p - seventh or later child.
a factor indicating the mother's ethnicity. Levels are L - Ladino, N - indigenous not speaking Spanish, and S - indigenous speaking Spanish.
a factor describing the mother's level of education. Levels are N - not finished primary, P - finished primary, S - finished secondary
a factor describing the husband's level of education. Levels are the same as for momEd plus U - unknown.
a factor indicating if the mother had ever worked outside the home.
a factor indicating if the family's location is considered rural or urban.
the percentage of indigenous population in the community at the 1981 census.
These data are available at http://data.princeton.edu/multilevel/guImmun.dat. Multiple indicator columns in the original data table have been collapsed to factors for this data frame.
Rodriguez, Germán and Goldman, Noreen (2001), "Improved estimation procedures for multilevel models with binary response: a case-study", Journal of the Royal Statistical Society, Series A, 164, 339-355.
data(guImmun)
summary(guImmun)
Data on the prenatal care received by mothers in Guatemala.
data(guPrenat)
A data frame with 2449 observations on the following 15 variables.
a factor identifying the birth
a factor identifying the mother or family
a factor identifying the community
a factor indicating if traditional or modern prenatal care was provided for the birth.
an ordered factor of the child's age at the time of the survey.
a factor indicating if the mother was older or younger. The cut-off age is 25 years.
an ordered factor for the birth's order within the family.
a factor indicating if the mother is Ladino, or indigenous not speaking Spanish, or indigenous speaking Spanish.
a factor describing the mother's level of education.
a factor describing the husband's level of education.
a factor describing the husband's employment status.
a factor indicating if there is a modern toilet in the house.
a factor indicating if there is a TV in the house and, if so, the frequency with which it is used.
the percentage of indigenous population in the community at the 1981 census.
distance from the community to the nearest clinic.
These data are available at http://data.princeton.edu/multilevel/guPrenat.dat. Multiple indicator columns in the original data table have been collapsed to factors for this data frame.
Rodriguez, Germán and Goldman, Noreen (2001), "Improved estimation procedures for multilevel models with binary response: a case-study", Journal of the Royal Statistical Society, Series A, 164, 339-355.
data(guPrenat)
summary(guPrenat)
Data from the 1982 study “High School and Beyond”.
data(Hsb82)
A data frame with 7185 observations on students including the following 8 variables.
an ordered factor designating the school that the student attends.
a factor with levels
a factor with levels Male and Female
a numeric vector of socio-economic scores
a numeric vector of Mathematics achievement scores
a numeric vector of mean ses for the school
a factor with levels Public and Catholic
a numeric vector of centered ses values where the centering is with respect to the meanses for the school.
Each row in this data frame contains the data for one student.
Raudenbush, Stephen and Bryk, Anthony (2002), Hierarchical Linear Models: Applications and Data Analysis Methods, Sage (chapter 4).
data(Hsb82)
summary(Hsb82)
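The description of the centered ses values can be checked against the data. The following is a minimal sketch, assuming the mlmRev package is installed and that the relevant columns are named ses, meanses and cses. The name cses does not appear on this page and is an assumption, and recentered is a variable introduced here for illustration.

```r
library(mlmRev)  # provides the Hsb82 data frame

data(Hsb82)

# Recompute the centered score: the student's ses minus the school's mean ses.
recentered <- Hsb82$ses - Hsb82$meanses

# Compare with the stored column (assumed to be called cses).
# all.equal() returns TRUE or a description of the differences.
print(all.equal(recentered, Hsb82$cses))
```

Centering within school in this way separates the within-school effect of ses from the between-school effect carried by meanses.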
Malignant Melanoma Mortality in the European Community associated with the impact of UV radiation exposure.
data(Mmmec)
A data frame with 354 observations on the following 6 variables.
a factor with levels Belgium, W.Germany, Denmark, France, UK, Italy, Ireland, Luxembourg, and Netherlands
Region ID - a factor.
County ID - a factor.
Number of male deaths due to MM during 1971–1980
Number of expected deaths.
Centered measure of the UVB dose reaching the earth's surface in each county.
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Langford, I.H., Bentham, G. and McDonald, A. 1998: Multilevel modelling of geographically aggregated health data: a case study on malignant melanoma mortality and UV exposure in the European community. Statistics in Medicine 17: 41-58.
str(Mmmec)
summary(Mmmec)
(fm1 <- glmer(deaths ~ uvb + (1|region), Mmmec, poisson, offset = log(expected)))
The Oxboys data frame has 234 rows and 4 columns.
This data frame contains the following columns:
an ordered factor giving a unique identifier for each boy in the experiment
a numeric vector giving the standardized age (dimensionless)
a numeric vector giving the height of the boy (cm)
an ordered factor - the result of converting age from a continuous variable to a count so these slightly unbalanced data can be analyzed as balanced.
These data are described in Goldstein (1987) as data on the height of a selection of boys from Oxford, England versus a standardized age.
Pinheiro, J. C. and Bates, D. M. (2000) Mixed-Effects Models in S and S-PLUS, Springer, New York. (Appendix A.19)
data(Oxboys)
The s3bbx data frame has 2449 rows and 6 columns of the covariates in the simulation by Rodriguez and Goldman of multilevel dichotomous data.
data(s3bbx)
This data frame contains the following columns:
a numeric vector identifying the child
a numeric vector identifying the family
a numeric vector identifying the community
a numeric vector of the child-level covariate
a numeric vector of the family-level covariate
a numeric vector of the community-level covariate
http://data.princeton.edu/multilevel/simul.htm
Rodriguez, Germán and Goldman, Noreen (1995) An assessment of estimation procedures for multilevel models with binary responses, Journal of the Royal Statistical Society, Series A 158, 73–89.
str(s3bbx)
A matrix of the results of 100 simulations of dichotomous multilevel data. The rows correspond to the 2449 births for which the covariates are given in s3bbx. The elements of the matrix are all 0, indicating no modern prenatal care, or 1, indicating modern prenatal care. These were simulated with "large" variances for both the family and the community random effects.
data(s3bby)
An integer matrix with 2449 rows and 100 columns.
http://data.princeton.edu/multilevel/simul.htm
Rodriguez, Germán and Goldman, Noreen (1995) An assessment of estimation procedures for multilevel models with binary responses, Journal of the Royal Statistical Society, Series A 158, 73–89.
str(s3bby)
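Because each column of s3bby is one simulated response for the births described in s3bbx, the two objects are meant to be used together. The following is a minimal sketch of pairing them and fitting a model to the first simulated data set, assuming the mlmRev and lme4 packages are installed. The column labels assigned to the covariates (x_child, x_family, x_community) and the names prop_modern, sim1 and fm_sim1 are illustrative choices made here, following the column order given for s3bbx. The model form is an assumption too and is not taken from the original simulation study.

```r
library(lme4)    # provides glmer()
library(mlmRev)  # provides s3bbx and s3bby

data(s3bbx)
data(s3bby)

# Rows of the response matrix line up with rows of the covariate frame.
stopifnot(nrow(s3bby) == nrow(s3bbx))

# Proportion of births with modern prenatal care in each simulated data set.
prop_modern <- colMeans(s3bby)
print(summary(prop_modern))

# Attach the first simulated response to the covariates. The labels
# assigned here are illustrative and follow the documented column order.
sim1 <- setNames(s3bbx, c("child", "family", "community",
                          "x_child", "x_family", "x_community"))
sim1$family <- factor(sim1$family)
sim1$community <- factor(sim1$community)
sim1$y <- s3bby[, 1]

# Random intercepts for communities and for families within communities.
fm_sim1 <- glmer(y ~ x_child + x_family + x_community +
                   (1 | community/family),
                 data = sim1, family = binomial)
print(fm_sim1)
```

Repeating the fit over all 100 columns would give the sampling distribution of the estimates, at a correspondingly higher computing cost.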
Scores attained by 3435 Scottish secondary school students on a standardized test taken at age 16. Both the primary school and the secondary school that the student attended have been recorded.
data(ScotsSec)
A data frame with 3435 observations on the following 6 variables.
The verbal reasoning score on a test taken by the students on entry to secondary school.
The score attained on the standardized test taken at age 16.
A factor indicating the primary school that the student attended.
A factor with levels M and F
The student's social class on a numeric scale from low to high social class.
A factor indicating the secondary school that the student attended.
These data are an example of cross-classified grouping factors.
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
Paterson, L. (1991). Socio economic status and educational attainment: a multidimensional and multilevel study. Evaluation and Research in Education 5: 97-121.
str(ScotsSec)
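The cross-classified structure noted above can be inspected and then modelled with two separate random-effects terms. The following is a minimal sketch, assuming the mlmRev and lme4 packages are installed and that the columns are named attain, verbal, primary and second. These names do not appear on this page and are assumptions, as are the illustrative names feeder, n_second and fm_crossed.

```r
library(lme4)    # provides lmer()
library(mlmRev)  # provides the ScotsSec data frame

data(ScotsSec)

# Cross-tabulate primary school by secondary school.
feeder <- xtabs(~ primary + second, data = ScotsSec)

# Number of different secondary schools attended by pupils of each primary
# school. Values above 1 show that primaries are not nested in secondaries.
n_second <- rowSums(feeder > 0)
print(table(n_second))

# Crossed random intercepts: one term per grouping factor.
fm_crossed <- lmer(attain ~ verbal + (1 | primary) + (1 | second),
                   data = ScotsSec)
print(fm_crossed)
```

Writing the two grouping factors as separate terms, rather than as a nested term such as second/primary, is what tells lmer to treat them as crossed.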
These data come from the British Social Attitudes (BSA) Survey started in 1983. The eligible persons were all adults aged 18 or over living in private households in Britain. The data consist of completed results of 264 respondents out of 410.
data(Socatt)
A data frame with 1056 observations on the following 9 variables.
District ID - a factor
Respondent code (within district) - a factor
A factor with levels 1983, 1984, 1985, and 1986
An ordered factor giving the number of positive answers to seven questions.
Political party chosen - a factor. Levels are conservative, labour, Lib/SDP/Alliance, others, and none.
Self assessed social class - a factor. Levels are middle, upper working, and lower working.
Respondent's sex. (1=male, 2=female)
Age in years
Religion - a factor. Levels are Roman Catholic, Protestant/Church of England, others, and none.
These data are provided as an example of multilevel data with a multinomial response.
http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html
McGrath, K. and Waterton, J. (1986). British Social Attitudes 1983-1986 panel survey. London, Social and Community Planning Research.
str(Socatt)
summary(Socatt)
Data from Tennessee's Student Teacher Achievement Ratio (STAR) project which was a large-scale, four-year study of the effect of reduced class size.
data(star)
A data frame with 26796 observations on the following 18 variables.
id
a factor - student id number
sch
a factor - school id number
gr
grade - an ordered factor with levels K < 1 < 2 < 3
cltype
class type - a factor with levels small, reg and reg+A. The last level indicates a regular class size with a teacher's aide.
hdeg
highest degree obtained by the teacher - an ordered factor with levels ASSOC < BS/BA < MS/MA/MEd < MA+ < Ed.S < Ed.D/Ph.D
clad
career ladder position of the teacher - a factor with levels NOT, APPR, PROB, PEND, 1, 2 and 3
exp
a numeric vector - the total number of years of experience of the teacher
trace
teacher's race - a factor with levels W, B, A, H, I and O representing white, black, Asian, Hispanic, Indian (Native American) and other
read
the student's total reading scaled score
math
the student's total math scaled score
ses
socioeconomic status - a factor with levels F and N representing eligible for free lunches or not eligible
schtype
school type - a factor with levels inner, suburb, rural and urban
sx
student's sex - a factor with levels M and F
eth
student's ethnicity - a factor with the same levels as trace
birthq
student's birth quarter - an ordered factor with levels 1977:1 < ... < 1982:2
birthy
student's birth year - an ordered factor with levels 1977:1982
yrs
number of years of schooling for the student - a numeric version of the grade gr with Kindergarten represented as 0. This variable was generated from gr and does not allow for a student being retained.
tch
a factor - teacher id number
Details of the original data source and the process of conversion to this representation are given in the vignette.
http://www.heros-inc.org/data.htm
str(star)
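The relationship between gr and yrs described above can be reproduced from the ordered factor. The following is a minimal sketch, assuming the mlmRev package is installed. The name yrs_from_gr is introduced here for illustration, and the recoding shown is one plausible way to obtain yrs, not necessarily how the column was originally built.

```r
library(mlmRev)  # provides the star data frame

data(star)

# The ordered levels K < 1 < 2 < 3 have integer codes 1 to 4, so
# subtracting 1 maps Kindergarten to 0 and grade 3 to 3.
yrs_from_gr <- as.integer(star$gr) - 1L

# Cross-tabulate the grade against the recoded value.
print(table(grade = star$gr, years = yrs_from_gr))

# Compare with the stored column. all.equal() returns TRUE or a
# description of the differences.
print(all.equal(as.numeric(star$yrs), as.numeric(yrs_from_gr)))
```

Because yrs is a direct recoding of gr, a student who repeated a grade has the same yrs value in both years, which is the limitation the description points out.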