Midterm Study Notes
An operational definition of a quantity is a specific
process whereby it is measured. For example, the weight of an object
can be operationally defined by using a balance and standard weights.
What is evaluation?
The systematic determination of the merit, worth, value, or
significance of something, based on standards and performance data. What gets evaluated: Program, Policy,
Performance, Product, Process, and Personnel.
G&U: social work research is “a systematic
and objective inquiry that utilizes the research method to
solve human problems and creates new knowledge that is generally applicable, (p.
19).”
RESEARCHERS must:
- be aware: of your biases and values (how will
you handle it if the results or conclusions of your study don’t match your
beliefs, values, or what you expected? E.g., your research showed a
higher proportion of homosexuals with mental health diagnoses.)
- be skeptics: Question everything. Especially
findings. Know how to read and critique research articles and reports.
- share findings with others: for replication and
for them to critique
- be honest: don’t fiddle with data. What counts as dishonesty is often vague, so
consult others: supervisors, experts, the literature, or governing bodies.
Dishonesty is a deliberate intention to deceive, for example when researchers allow values or
preconceived notions to influence methods of data collection, analysis, and
interpretation. Recognize and report your values!
BE ABLE TO SUMMARIZE SOME
OF THE KEY DIFFERENCES BETWEEN APPLIED AND BASIC RESEARCH—POSSIBLE ESSAY
QUESTION.
Basic:
goal—to develop theory and expand knowledge base.
Applied:
goal—to develop solutions for problems and applications in practice.
Applied and basic are more similar
than different. They are similar in theory, methodology, and ethics. The
difference between the two is in degree.
| Dimension | Applied Research | Basic Research |
| PURPOSE | | |
| Knowledge Type | Knowledge Use: improve understanding of a problem to contribute to the solution of a problem. Is it making a difference? | Knowledge Production: to expand knowledge (id universal principles that contribute to understanding of how the world operates). E.g., NASA research. |
| Question Type | Broad in Scope: complex 'fuzzy' issues, multiple broad research questions, research in a messy, uncontrolled environment. | Narrow in Scope: specific topic, fundamental issues, tightly focused question. E.g., what is the effect of cocaine use on fine motor coordination? Conducted in a controlled environment (laboratory) to reduce measurement error or eliminate noise. |
| Significance | Practical & Statistical Significance: asks whether effects are large enough to be meaningful. Level of outcome matters to audience and interest groups. E.g., a new drug showed statistical significance but not practical significance: not enough change for the individual. | Statistical Significance: asks whether causal relationships exist and whether they are statistically significant. |
| Theory | Opportunism: use theory instrumentally by identifying variables and concepts that will likely produce practical results. Which theory will be useful? Does theory help solve the problem? Will combine theories. | Purity: underlying theory is critical. Controlled environment. Researcher controls variables to represent the theoretical constructs. E.g., study deals with only anger and not frustration, boredom, fatigue, etc. |
| CONTEXT | | |
| Environment | Open: diverse environments, permission needed to obtain access to the data. Limited by resources, political/bureaucratic barriers, and time. | Often conducted in universities and academic environments. Laboratories. |
| Initiator of Research | Client Initiated: the research question often comes from the client and is often poorly framed or incompletely understood. Client in control; much negotiation of scope, cost, time frame, etc. Trade-offs. | Researcher Initiated: the idea for the study and the approach to executing it come from the researcher. More flexibility in question and design. |
| No. of Researchers | Multidisciplinary research teams; often include community collaboration. | Individual Researcher: autonomous, sets scope and approach, usually smaller teams (if more than 1 researcher), less collaboration. |
| Stakeholders | Multiple stakeholders | Few stakeholders |
| Results | Results & publishing negotiated | More free to use the results immediately |
| METHODS | | |
| Validity | External Validity Emphasized: the extent to which the study results are generalizable. | Internal Validity Emphasized: the extent to which a causal relationship can be soundly established. Both validities are important in both types of research. |
| Construct | Construct of Effect: Is it/are you credible? Valid outcome measures that accurately measure the variable of interest. Multiple outcomes and multiple measures to assess the construct. | Construct of Cause (Cause and Effect): the independent variable must be clearly explicated and not confounded with any other variables. |
| Levels of Analysis | Multiple Levels of Analysis: specific problem at more than one level of analysis (individual, group, organization, society); use multiple research methods and triangulation. Quasi-experimental design. | Single Level of Analysis: multiple levels of analysis are not as needed b/c of the control on the other variables. Experimental design. |
| Commitment to Research Design | Iterative design | Research conducted exactly as designed |
KNOW THE DIFFERENCE
BETWEEN THE TWO APPROACHES. POSSIBLE ESSAY QUESTION.
Approaches:
Quantitative:
relies on quantification (putting into numbers) in collecting and analyzing
data and uses descriptive and inferential statistics. (more in chapter 5)
Qualitative: descriptive methods of data collection.
Data in the form of words, diagrams, drawings, and observations; a prime example is
ethnography. (more in chapter 6)
Three Contextual Factors that Shape Social Work Research
Studies
- Social Service Program (private sector, public social
service setting) (a) accountability (b) all research has evaluative potential
(c) accountability creates market for research (d) programs exist in hostile
environments (e) programs have scarce financial resources (f) programs have
client files
- Social work profession (a) professional values and
ethics (b) profession’s beliefs and practices, and (c) the rewards for doing
research
- Social workers themselves (a) social workers are people
oriented (b) social workers have a vested interest in practice (c) social
workers need research
KNOW AND BE ABLE TO
DISCUSS:
Researchers’ enemies: bias,
intervening (nuisance) variables, and chance (random error).
BE ABLE TO DISCUSS SOME OF THESE ETHICAL CONCERNS
The 7 main issues that social workers must be concerned
about during the actual research project or activity:
- Ethical Aspects of Research Designs: (use random
assignment; do not withdraw or reintroduce an
intervention without consultation!)
- Use of Deception: (clients get vague info on
intervention)
- Privacy, Confidentiality, Anonymity: Privacy:
persons’ interest in controlling the access of others to themselves;
Confidentiality: extension of privacy, an agreement between researcher and
participant on how data are to be handled in keeping with the subjects’
interest in controlling the access of others to info about themselves;
Anonymity: the names and other unique identifiers of subjects are never
attached to the data or known to the researcher. (protection of data,
individual’s privacy, plan if participants become upset during data collection
or other research activities).
As social workers, when are you
obligated by NASW and Michigan law to breach confidentiality at any time, inside and
outside of work? Suspected homicide, suicide, child abuse.
- Conflicts of Interest: (participants are or were
clients, results affect you directly or indirectly)
- Reporting of Results: (protect confidentiality
and report accurately, protect clients from harm; obligated to be honest and
accurate)
- Disclosure of Results to Research Participants:
(generally share research data and results with clients, colleagues, or
public; may withhold results if doing so protects participants. Determine the “right to
know” audience.)
- Acknowledgment of Credit: (collaborators and
contributors: conversations, presentations, conferences, classes, manuals, web
resources, program materials, published and unpublished references, other
forms of media (radio, television)). When you find a resource, write down the citation
right away: name of author, publisher name, publication date, publisher
location, name of book/article, the date you got it, and the website when
applicable; keep a copy for your records and resources.
http://owl.english.purdue.edu/handouts/research/r_apa.html and
http://webster.commnet.edu/apa/index.htm are good Internet sources for APA
citations.
Structure in Research Interviewing
A) Structured Interviews: exact directions and sequencing of the
interview. Open- and closed-ended questions.
B) Semistructured (focused) Interviews: selected topics and
hypotheses, but specific items are not entirely predetermined. Requires a skilled,
trained interviewer. Questions to consider: What is it that is to be learned and how much is already
known about it? To what extent are the interviewers trained, prepared, and able
to elicit data on their own from their research participants (interviewees)? To
what extent is the simplicity of coding responses (implications for validity) to
be a determining factor?
C) Unstructured Interviews: Only the general problem area is determined in advance.
Freedom to discuss wider range and depth. Usually neutral questions. Difficult
to code and analyze.
Again, the purpose of a lit review is to:
- Provide you with existing theory
- Develop a justification for your study (how your work will
address a need or contribute to an unanswered question)
- Inform your decisions about methods, alternative
approaches, or potential problems with your plan
- Be a source of data to test or modify your theories
- Help you in generating your own theory
Random Sampling: means the sampling method is free
of human judgment; every member of the population has an equal opportunity of being selected for
the sample.
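As a rough illustration of the definition above, the sketch below draws a simple random sample with Python's standard library, so that no human judgment enters the selection. The frame of 2,500 ID numbers, the sample size of 200, and the name `simple_random_sample` are assumptions made for this example only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame.

    random.Random.sample picks without replacement, and every unit in the
    frame has the same chance of ending up in the sample.
    """
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: ID numbers 1 through 2500.
frame = list(range(1, 2501))
sample = simple_random_sample(frame, 200, seed=42)
print(len(sample), len(set(sample)))  # 200 200
```

Fixing the seed is a convenience for replication; in practice the key point is that the computer, not the researcher, decides who is selected.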
Longitudinal Designs
- Data at two or more periods
- Trend Studies: use data from surveys conducted at
different periods in time on samples drawn from a particular population.
Unemployment stats from US Dept of Labor.
- Cohort Studies: focus on specific groups. A
cohort is a set of people who go through an experience at the same time (e.g., high
school grads who enter college in 2005). Take several random samples from the
population of HS grads who entered college in 2005 and monitor how their
characteristics change over time. Other examples: Baby Boom generation studies (1946-62);
children who entered the KPS special education system the same year (there may
be 2500; we will draw a sample of 200 (starts with 1/2500)).
- Panel Studies: follow the same set of
individuals over time and collect data regularly. Trend studies and cohort
studies use a series of random samples from the population (once someone is
sampled, his/her data are sent back into the pool and he/she may or may not be
selected again), BUT in a panel study the same sample is kept and data are
collected from this one sample.
- Panel Study by Mary Jo Bane (1986) studied episodes of
poverty among children in the sample. She found that most poor children are
not always poor but instead live in families that transition in and out of
poverty. She also showed that Caucasian children tended to transition into
poverty as a result of becoming part of a female-headed household; and African
American children tended to be born into a poor family.
- Trend, cohort, and panel studies allow us to monitor
service histories of clients and assessment measures over time. Often much
more useful than the average condition at a particular point in time, or
cross-sectional studies. But these designs are more costly, time-consuming,
and complex than cross-sectional studies.
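The sampling difference between the designs above can be sketched in a few lines of Python. This is only an illustration: the population of 2,500 IDs, the three waves, and the function names `trend_waves` and `panel_waves` are hypothetical.

```python
import random


def trend_waves(population, n, waves, seed=0):
    """Trend/cohort style: draw a fresh random sample at every wave."""
    rng = random.Random(seed)
    return [rng.sample(population, n) for _ in range(waves)]


def panel_waves(population, n, waves, seed=0):
    """Panel style: draw one sample once and re-measure it at every wave."""
    rng = random.Random(seed)
    panel = rng.sample(population, n)
    return [list(panel) for _ in range(waves)]


population = list(range(1, 2501))  # hypothetical cohort of 2500 IDs
trend = trend_waves(population, 200, waves=3)
panel = panel_waves(population, 200, waves=3)

# In the panel design the same people appear in every wave.
print(panel[0] == panel[1] == panel[2])  # True
```

In the trend and cohort versions a person sampled in one wave goes back into the pool, so the waves usually contain different people.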
BE ABLE TO DESCRIBE SOME
OF THE PROS AND CONS OF GROUP-ADMINISTERED SURVEYS, MAIL SURVEYS, AND TELEPHONE
SURVEYS.
Group-Administered Surveys
- Group-administered survey (a tautology, but it’s when you
administer the survey to a group). The group members complete the survey individually, but
often a researcher is present. E.g. of cover letter (Fig. 16.3, p. 262).
(Remember certain agencies and all universities have explicit policies on
research, particularly when using human subjects: IRBs or HSIRBs.)
- Group-admin. surveys are cheaper than interviewing and have a
much better response rate than mail surveys. But you are unable to make many
generalizations to a greater population. The survey sample only represents the
people that showed up. An exception is if the boss requests a random sample of all his
employees and has a high response rate (people showed up), but that raises other questions
about bias introduced by the boss giving a survey to his employees (even if
it’s anonymous), providing incentives, or mandating compliance with the
survey. It’d be very difficult to conduct a group-admin. survey of all Kzoo
Co. residents b/c you’d likely have difficulty getting the people who were
selected for the sample to come in at a set time to take a
survey.
Mail Surveys
- E.g., national census conducted every 10 years w/
roughly 290 million in the entire population (some face-to-face interviews,
but most is by mail). This is an unusual choice of data collection b/c the
Census is trying to gather info on the entire population rather than a
representative sample.
- What are the pros and cons of using a mail survey
compared to interviewing and a telephone survey?
- One of the most frequently used data collection methods
(along w/ telephone).
- Low cost and Large N: no interviewer because it’s
self-administered, and no long distance calls. With a mail survey there can be
more options listed than an interviewee would be able to come up with
him/herself or remember without the options listed; also can add graphic
illustrations, or pictorial scales.
- But the participants can’t get their questions about the
questionnaire answered, and there may be reading and language barriers or physical limitations.
Low Response Rates
- Easy to forget or refuse
- If you only have 10% of the surveys returned you have
issues of external validity (generalizability) b/c it’s likely that those
who did not return the survey differ from the ones who
did, so the returned surveys may not represent the population. Additionally, you’d have to worry about the survey measuring what it says
it’s measuring, or measurement validity.
- Before you start a mail survey decide if mail surveys
are appropriate for gathering data on the population (homeless, immigrants).
- Sometimes a letter is mailed before the survey explaining the
survey’s purpose, anonymity, etc., and telling recipients to expect a survey in
the mail that they should complete and return. Sometimes follow-up letters can
be sent, or, if the survey is anonymous, postcards that participants return to
indicate that they have sent the survey back.
- There is no accepted standard for mail survey response
rates. The greater the better, keeping in mind that efforts to raise the rate should not bias
the participants. Anything below 30% generally can be considered unacceptable.
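The response-rate arithmetic in the bullets above can be sketched as follows. The counts (500 surveys mailed, 50 returned) are made up for illustration, the function names are hypothetical, and the 30 percent floor is read from the notes' rule of thumb rather than from any formal standard.

```python
def response_rate(returned, mailed):
    """Percentage of mailed surveys that came back."""
    if mailed <= 0:
        raise ValueError("mailed must be a positive count")
    return 100.0 * returned / mailed


def is_acceptable(rate_pct, floor_pct=30.0):
    """Apply the rule of thumb that rates below the floor are unacceptable."""
    return rate_pct >= floor_pct


# Hypothetical mailing: 500 surveys sent, 50 returned.
rate = response_rate(50, 500)
print(rate, is_acceptable(rate))  # 10.0 False
```

A rate that clears the floor still says nothing about whether the people who responded resemble those who did not.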
BE ABLE TO NAME OR LIST
AND DISCUSS SOME OF THESE TIPS
Tips
- Good cover letter. In simple language, write to the audience, with lots
of white space and as short as possible
- Date of mailing of questionnaire
- Identify yourself and your institution
- Give brief synopsis of the purpose of the study
- Give potential benefits to policy, practice, and the participant
- Explain the importance of the participant’s participation
- Slightly over-estimate the time to take the questionnaire (estimated through pilot
testing)
- Tell how info will be kept confidential and how it will be used
- Explain how to return the form
- Give contact person for questions or comments about the survey
Costs
- Minimize cost; maximize clarity, appearance, and readability.
- Stamped, addressed envelopes; business reply mail; postcards
Confidentiality through a participant coding system: ID # on the
survey. Anonymity through no identifying information on the survey (you don’t know who
returned the survey and who didn’t).
If the survey is confidential, send a follow-up letter and
another copy of the survey to the participants that haven’t returned the survey.
Telephone reminders can also be used, but do not necessarily offer the survey over the phone;
that would change the nature of your data collection methods and would introduce
a potential bias in the responses.
Telephone Surveys
- What are the pros and cons of a telephone survey compared to
interviewing and mail surveys?
- Have many of the benefits of face-to-face interviewing but are quicker
to implement and less expensive than face-to-face: no transportation involved,
local calls are inexpensive, no printing expenses. All things considered, most
telephone surveys end up being more expensive than mail surveys though.
- But they allow for interviewer bias, there is no visual contact,
rapport-building is unlikely, and the surveys must usually be shorter or respondents
may fatigue.
- Sometimes cover letters and the survey are mailed first to be used
as a reference for the interviewee during the phone survey.
- Consider the proportion of people without telephones or hard to reach by phone (poor, unlisted numbers,
cell phone exclusively, in rural areas), or even without addresses (homeless, precariously
housed, doubled occupancy). Be aware of the likelihood of measurement error
(construct validity: how well the instrument measures the theoretical construct, i.e., how
well does your survey on depression measure depression and not anxiety, fatigue,
boredom, frustration, etc.).
Sampling in Phone Surveys
- Random Digit Dialing (RDD). Not totally random because it
sometimes goes by phone prefixes (269). I live in Kzoo, but I still have a Conn.
number and don’t plan to change it. Other complications: toll-free lines, caller ID. Not appropriate
for surveying students from the KPS graduating class of 1995 by using a list of
names of the graduates and the (269) area code (b/c many moved and have totally
new addresses and area codes; your sample would only be representative of those
that remained in Kzoo).
- Some surveys will have a machine or recorded voice tell you to
call a number to complete a survey. This is not survey research, as the results
are not generalizable.
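A toy version of random digit dialing shows why the method is "not totally random": the area code and exchange are fixed in advance and only the last four digits are randomized. This is a simplified sketch, not how any particular survey firm does it; the exchange `555` and the function name `rdd_numbers` are placeholders.

```python
import random


def rdd_numbers(area_code, exchanges, n, seed=None):
    """Generate n distinct phone numbers with random last four digits.

    Only numbers inside the given area code and exchanges can ever be
    drawn, so anyone with a number from elsewhere is never reached.
    """
    if n > len(set(exchanges)) * 10000:
        raise ValueError("not enough possible numbers for the requested n")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        exchange = rng.choice(exchanges)
        line = rng.randrange(10000)  # 0000 through 9999
        numbers.add(f"({area_code}) {exchange}-{line:04d}")
    return sorted(numbers)


calls = rdd_numbers("269", ["555"], 5, seed=1)
print(len(calls))  # 5
```

A resident who kept an out-of-state number has no chance of selection here, which is the coverage problem described in the notes.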
Computerized Data Collection
- Coding is the translating of data to a form that is readable to a
computer, for recording and analyzing data (Microsoft Excel; also Statistical
Package for the Social Sciences (SPSS), SAS, Microsoft Access, among others).
Can also include audio and video data.
- The more steps to coding, the more likely data errors will occur
(misspelling changes the mean, adapting a response but getting the intended
meaning wrong, putting data into the wrong cell, missing an entry or adding it twice, typing in numbers
incorrectly, limited computer experience of the person who performs
the coding, etc.).
- Problems with computer or email surveys: access to and ownership of
computers (socioeconomics, ethnicity). The digital divide leads to problems with the
representativeness of your sample. Who is it representative of? (People who have
enough access to and background in computers to be able to answer the survey.) There is also
the question of who is actually responding. Also, the survey is likely to go right to the junk
mail folder as coming from an unrecognized email address. It is hard to convince people of the
legitimacy of the project through email.
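To make the idea of coding concrete, here is a minimal sketch that translates written answers into numeric codes and flags entries that do not match the codebook, which is one way to catch the kinds of entry errors listed above. The codebook, the sample answers, and the name `code_responses` are all hypothetical.

```python
# Hypothetical codebook mapping answer text to numeric codes.
CODEBOOK = {"never": 0, "sometimes": 1, "often": 2}


def code_responses(raw_answers):
    """Return (codes, errors); unrecognized answers are coded as None."""
    codes, errors = [], []
    for position, answer in enumerate(raw_answers):
        key = answer.strip().lower()  # tolerate stray spaces and capitals
        if key in CODEBOOK:
            codes.append(CODEBOOK[key])
        else:
            codes.append(None)
            errors.append((position, answer))  # flag for a human to review
    return codes, errors


codes, errors = code_responses(["Often", "sometimes ", "oftn", "never"])
print(codes)   # [2, 1, None, 0]
print(errors)  # [(2, 'oftn')]
```

Flagging rather than guessing keeps a misspelled entry from silently turning into the wrong code.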
A positivist’s assumptions
- To prove existence it must be measurable.
- Objectivity: the research must be as objective as
possible. Not influenced by the observer.
- Reduce uncertainty:
- Strive toward duplication:
- Strive toward using standardized procedures (for
credibility and is why you’re taking this class)
- Quantitative approach generally refers to the portion of
research represented in the form of numbers. Analyzed by descriptive and
inferential statistics. (Interested in cause and effect). Quantitative
research can be used for idiographic studies (single-subject).
- With quant research, many of the big decisions are made
before the study occurs and the researcher often knows the study’s limitations
ahead of time.
- First step using a quant approach is to identify a
general problem to research and then develop a research question that can be
answered or a hypothesis that can be tested.
A quasi-experiment: no random selection or assignment of
participants to groups. Treatment is applied to naturally occurring groups that
may differ for various reasons other than the particular treatment. It uses
classification variables; the researcher must establish that the groups don’t differ
on the accidental characteristics that may affect the final score.
Quasi-experimental design: participants are not randomly assigned, but a similar comparison
group is used with which to compare results. Carefully matching the tx and control
groups aids in the elimination of rival explanations.
- What are variables? Non-uniform
characteristics of observational units. A variable assumes a range of values
and has a name. A categorical variable is a variable that doesn’t
change for a person but varies across people (e.g., gender, ethnicity, place
of birth). Variable labels describe a variable. What are the variable labels
for depression? (changes in sleeping patterns, feeling down, suicidal
thoughts, withdrawal, tone of voice, body language, changes in eating
patterns, tearfulness, etc.). All variables must be measurable: record
the variable’s frequency, duration, and/or magnitude (intensity).
Is age measurable? Years.
Months. Is depression measurable?
- Data collection must be objective: reflect the
“real” world and not biased by the researcher(s). In a quantitative study,
only the research participant will produce the data! You will collect only the
data that the research participant gives you on the variables you requested.
- Data collection must be able to be duplicated.
Your data collection procedures must be clear enough for other researchers to
replicate. Must ensure all participants are being measured in the exact same
way.
BE ABLE TO IDENTIFY AN
INDEPENDENT AND DEPENDENT VARIABLE IN A RESEARCH QUESTION AND HYPOTHESIS
Independent variable is age—it affects the dependent
variable. Dependent variable is depression—it is the one affected.
- Univariate: looking at one dep. variable at a
time; Multivariate: multiple dep. variables at one time. Our study is
univariate: only looking at one variable, depression, but we could look at
depression and anxiety to make the study multivariate.
Nuisance variables: factors that influence the value of the
dependent variable other than the treatment of interest (sleep, IQ,
socioeconomic status, time of day of the test, room temperature extremes, etc.).
Does depression affect ethnicity? Having depression could never
influence whether you were Caucasian or Asian. Restated: What effect does ethnicity
(independent variable) have on the likelihood of depression (dependent
variable)? Not how does depression affect ethnicity.
Good hypotheses have…
- Relevance: relevant to the knowledge base and
research questions.
- Completeness: fully expresses what you intend it
to mean
- Specificity: reader can understand each variable
and their hypothesized relationships. It should be clear what relationships
you’re suggesting
- Potential for testing: How easy will it be for
the truth of the hypothesis to be verified.
BE ABLE TO DISCUSS SOME OF
THE METHODS FOR INFERRING CAUSATION. POSSIBLE ESSAY QUESTION.
Eight Strategies for Inferring Causation
- Ask the observers. What experiences in undergrad
were the most important? What elements from your undergrad in social work are
relevant to you today? What has improved your sadness and depression? (Ask
the people directly affected and ask the people who observed the effects on
the participants.)
- Check whether content matches outcome: If an
alcohol reduction program is truly reducing relapses in drinking, then we
expect that alcoholics who avoid relapsing used the strategies from
treatment, not strategies from prior knowledge or from other sources or programs. Look for
strategies not learned (counterexamples). (Edison, 2000 ways to learn how not
to make a light bulb).
- Other patterns: modus operandi. Detective
searching for clues. OR start with the suspect and trace down the causal chain
to see what it impacts. Every time evidence is consistent with the expected
“trace” left by the suspect, confidence in the suspect
increases. On the other hand, if evidence is contradictory, it reduces confidence in that
causal chain (or suspect). Missing evidence makes the explanation more
doubtful. In empirical studies we want almost 100% confidence that your sample
is representative of the population and there is no error. In evaluation, the standard is
beyond a reasonable doubt.
- Check whether the timing of outcomes makes sense.
Did the outcome occur at the same time as or after whatever caused it? For distal outcomes (far downstream
in the causal chain): (a) Is it too early to expect change? Is it unrealistically
quick? The expected time for change may be in the relevant literature. (b) Is
the timing of the outcome better or more logically attributable to other causes?
(c) Outcomes should not occur out of sequence (e.g., in a health promotion program, blood
pressure drops prior to changes in eating and exercise).
- Check whether dose is related to response (the higher
the dose, the greater the response, up to a point: the point of diminishing returns
or ceiling effect). Is the magnitude of change logical for the duration
and intensiveness of the treatment? Better to give multiple doses and make
multiple response observations, also in different contexts.
- Make Comparisons with a control or comparison group.
Experimental (control grp.) or quasi-experimental design (comparison grp).
Sample sizes must be large enough.
- Control Statistically for extraneous variables.
If you were examining a new teaching technique for math instruction, even if
you had a control group, you might want to be sure that prior aptitude was not
causing your results to look better or worse.
- Identify and check the underlying causal mechanism.
How do we know cigarette smoking may increase the likelihood for lung cancer?
How do we know they’re not just correlated (or just share some type of
relationship but one’s not causing the other)? Researchers identified several
substances known to be carcinogenic in cigarette smoke (thru experimental
design); so it’s harder to argue that there’s not some type of causal
relationship.
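One simple way to "control statistically" for an extraneous variable, as in the math-instruction example above, is to compare treatment and control within levels of that variable. The sketch below uses stratification rather than a regression model, and every score in it is invented purely for illustration; `stratified_differences` and `raw_difference` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def raw_difference(records):
    """Mean treatment score minus mean control score, ignoring aptitude."""
    treated = [score for group, _, score in records if group == "treatment"]
    control = [score for group, _, score in records if group == "control"]
    return mean(treated) - mean(control)


def stratified_differences(records):
    """Treatment-minus-control difference within each aptitude stratum."""
    cells = defaultdict(list)
    for group, stratum, score in records:
        cells[(stratum, group)].append(score)
    strata = sorted({stratum for stratum, _ in cells})
    return {
        stratum: mean(cells[(stratum, "treatment")]) - mean(cells[(stratum, "control")])
        for stratum in strata
        if (stratum, "treatment") in cells and (stratum, "control") in cells
    }


# Invented data: (group, prior aptitude stratum, math score).
records = [
    ("treatment", "low", 60), ("treatment", "low", 62),
    ("treatment", "high", 80), ("treatment", "high", 82), ("treatment", "high", 84),
    ("control", "low", 58), ("control", "low", 60),
    ("control", "high", 79),
]

print(stratified_differences(records))  # {'high': 3, 'low': 2}
```

In this made-up data the raw difference is close to 8 points, but within each aptitude level it is only 2 or 3, because the treatment group happened to contain more high-aptitude students.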
Qualitative Research
- Based on an interpretive way of thinking and
reality is defined by the research participants. Subjective reality.
Answers in words not numbers. The thing being observed changes when
being observed. The realities are constantly changing. Focuses on in-depth
understanding of a few cases rather than general understanding of many; social
phenomenon in natural setting. Knowledge is created by the researcher
in the field.
- Qual Research Designs best suited for learning about…
- personal perspective
- context
- finding and understanding unexpected things, or
impacts
- understand the how questions/questions about
processes
- develop causal explanations
- Similarities between quant and qual research: (a)
careful, diligent process with systematic procedures and plans (b) both can
study a particular social problem (c) both have standards for rigor
Phases
1. Problem Identification: move from a part to a whole; id key concepts and
loosely defined variables
2. Question Formulation: general research question or working hypothesis
3. Research Design: ethnography, goal-free evaluation
4. Collecting Data: All data are processed through the researcher(s).
You are a tool in data collection. Principles in data collection: (1) be aware
of your own biases; your notes are sources of data (2) participants tell you
their stories and you tell them your understanding or interpretation of their
stories (3) data collection has multiple sources and multiple methods
(triangulation). Limitations of triangulation: resources, and measurement
instruments that aren’t comparable (commensurate measures are measured on the same
scale).
5. Analyzing Data: iterative
6. Interpreting Data: iterative
7. Presentation of Findings: generally lengthier than quant reports
8. Dissemination of Findings
Principles of Observation
- Instrument of observation is the person and any
instruments he/she uses. Record systematically, with rules and standards
- Context and Circumstances: Observations are made long
enough to obtain data to defend your conclusion or find out that the
observations aren’t producing enough or the right type of data (see trends or
changes). Determine which of the alternative available observational
situations may present the optimal conditions for the research.
- Observe everything relevant: the people you expect to
see the change (dependent variable) and others directly or indirectly
influencing person or process.
- Self-monitoring and self-observation. A) variables that are
easily defined (number of cigarettes a day, logs of daily food intake,
number of put-downs a parent says when working on homework) B) subjective
experiences (critical thoughts, suicidal thoughts, times you thought about
contributing to a conversation but didn’t for a day).
- Observations by the researcher: what questions need the
most technical attention? Easier to observe: physical objects, non-verbal
behaviors, facial expressions, gestures, social interactions. Harder to
observe: ideas, meanings, subjective experience, and other intangibles that are
inferred from observations.
BE ABLE TO DEFINE
REACTIVITY AND GIVE AN EXAMPLE OR RECOGNIZE AN EXAMPLE OF REACTIVITY. POSSIBLE
ESSAY QUESTION.
- Reactivity: things being observed or measured are
affected by being observed or measured. (Limitation of the positivist
approach) How much and what types of interactions will you likely have or that
you did have? Acceptable for the observer to be affected by observed but not
the other way around.
- Observational Measurement Instrument: (a)
existing measurement instruments developed by other researchers—already tested
for reliability and validity, comparability with other studies; (b) design
your own instrument—designed for the requirement of your study; (c) adapt an
existing measurement instrument.
- (1) Specify the purpose of the observation
and the research question the observation is supposed to inform. Define
the context or circumstances under which the observations are being made.
- (2) Unstructured observation: observe and record
everything. Don’t interpret, just look at what’s possible and important to
observe. Remember your biases will lead you to expect to see certain things or
want to see certain things, so use multiple observers (two sets of eyes are better than
one; two observers are likely to observe different things). It is also like measuring
a person’s height: measure the kids three times using a different measurement
instrument each time (a 12 in ruler on a wall, a tape measure, and a yardstick).
All can measure pretty accurately, but with more instruments you’ll get slightly
different measurements (measurement error). The more measures you take and
the more reliable instruments you use, the more likely your observations will
be reliable (differences in data are due to true differences; scores by the
same student on the same test will be the same).
- Time: Time being observed is representative of
time not observed. Or randomize observation times.
- Recording data: Checklist, drawings/diagrams,
countings, listings, writings. Investigating whether the variable of interest did
or did not occur and how it occurred or didn’t occur.
- Be aware of calibration slippage: observers
unintentionally alter their standards (become jaded or desensitized).
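When two observers record the same sessions, one simple check on whether they are seeing the same things (and on whether one of them is drifting in standards) is percent agreement. The notes do not name a specific statistic, so this is offered only as one common, basic option; the checklist data and the name `percent_agreement` are invented.

```python
def percent_agreement(observer_a, observer_b):
    """Share of observation intervals on which two observers agree."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("both observers need the same non-zero number of records")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100.0 * matches / len(observer_a)


# Hypothetical checklists: 1 = behavior occurred in the interval, 0 = it did not.
observer_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
observer_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

print(percent_agreement(observer_a, observer_b))  # 80.0
```

Percent agreement does not correct for agreement that would happen by chance, so it is best read as a rough screening number.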
Participant Observation
- For small groups, organizations, community life. The
observer becomes a participant in the thing she/he’s observing and takes the emic
(“insider”) perspective. Often includes observation and intensive
interviews, plus documents, artifacts, and archives.
- 3 Types of Participant Observation
- active participant: has a job or social role in
the setting in addition to being the researcher
- privileged observer: known and trusted, with
access to insider info
- limited observer: no role other than researcher;
builds trust over time (the ethnographer).
Why measure things:
- Correspondence: connect the real world with the
world of concepts—connect theory with reality. What is freedom? What is
self-esteem?
- Objectivity of Standardization: measurement
reduces the guesswork. What if I told everyone in the room to eyeball how tall
everyone was? How accurate do you think you’d be? Do you think STUDENT A could
disagree with STUDENT B?
- Quantification:
- Increases objectivity and the ability to describe things
precisely. How would you tell someone how far it was to Chicago without using
a standard system of measurement?
- Also can determine < and > relationships among
variables. Instead of asking “do you eat pizza?” (yes/no), you can ask about frequency
(a few x month; 1 x week; 2 or 3 x week; 4 or more x week): 1 x
week is < 2 or 3 x week, and 4 or more x week is > 2 or 3 x week.
- Interval-level measurement goes further than rank order: it specifies an
exact distance (1 is 1 away from 2, 2 is 1 away from 3, etc.)
- Measurement allows for statistical analysis
- The more objective, precise and defined the measurement
procedures the easier to replicate your study and communicate your findings.
So people can confirm or refute your results.
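The pizza example can be sketched in code to show what quantification buys you. This is only an illustration: the category names and the five respondents are hypothetical, and the numeric codes are assumed to be ranks (they say which answer is more frequent, not how much more).

```python
from enum import IntEnum


class PizzaFrequency(IntEnum):
    """Ordered response categories; the numbers are ranks, not amounts."""
    FEW_TIMES_A_MONTH = 1
    ONCE_A_WEEK = 2
    TWO_OR_THREE_TIMES_A_WEEK = 3
    FOUR_OR_MORE_TIMES_A_WEEK = 4


# Hypothetical answers from five respondents.
responses = [
    PizzaFrequency.ONCE_A_WEEK,
    PizzaFrequency.FOUR_OR_MORE_TIMES_A_WEEK,
    PizzaFrequency.FEW_TIMES_A_MONTH,
    PizzaFrequency.TWO_OR_THREE_TIMES_A_WEEK,
    PizzaFrequency.ONCE_A_WEEK,
]

# Because the categories are ordered, < and > comparisons are meaningful.
eats_more_than_weekly = [r for r in responses if r > PizzaFrequency.ONCE_A_WEEK]
ranked = sorted(responses)

print(len(eats_more_than_weekly), "respondents eat pizza more than once a week")
print("Lowest to highest:", [r.name for r in ranked])
```

A plain yes/no question would support none of these comparisons; it could only split respondents into two groups.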
UNDERSTAND AND BE ABLE TO
DEFINE VALIDITY AND RELIABILITY
Measurement Validity and Reliability
- Validity is the degree to which an instrument
measures what it is supposed to measure.
- Reliability is the degree of accuracy or precision of
a measurement instrument: it consistently yields similar results.
Target example:
(1) (upper left) no reliability, no validity
(2) (upper right) reliability, no validity
(3) (bottom) reliability and validity
Validity
Measures what it is intended to measure. The extent
to which answers correspond to some hypothetical “true value” of what we are
trying to describe or measure. The scores that you get reflect true
differences, not error in measurement. (1) Does the instrument measure the
variable? (2) If so, how accurately?
Types of Validity (Content, Criterion, Construct)
1. Content Validity: the extent to which the content
of a measuring instrument reflects the concept that is being measured and in
fact measures that concept and not another. All variables have operational
definitions. Data gathered must be directly relevant to these variables. The
items should be a logical sample of questions from the universe of questions.
- Face Validity: instrument has self-evident
meaning and measures what it appears to measure. How does it appear to the
respondents?
2. Criterion Validity: the scores obtained on a
measuring instrument are comparable with scores from an external criterion
believed to measure the same concept, e.g., the Beck Depression Inventory (BDI) and
the Hamilton Rating Scale for Depression (HAM-D), or miles and kilometers.
- Concurrent validity: ability of the instrument to
predict accurately an individual’s current status. E.g., an individual who gives full
effort in school scores high on the MEAP.
- Predictive validity: ability to predict future
performance or status. Will the participants vary in some way in the future
on the variables measured? Does success on the MEAP have a relationship with
SAT, ACT, HS or college graduation, future income, etc.?
3. Construct Validity: the instrument successfully
measures a theoretical construct; the degree to which explanatory concepts
account for variance in the scores of an instrument. Examines concepts (unmeasurable
ideas). What is the instrument measuring, and how and why does it operate the way it
does?
- What concepts might account for performance on an
instrument? (E.g., MEAP--literacy, confidence)
- Derive hypotheses from the theory surrounding the
concepts
- Test these hypotheses empirically.
Convergent Validity: the
degree to which different measures of a construct yield similar results, or
converge. Evidence from different sources and collected in different ways leads
to the same or similar measure of the concept; and if given to people in two
different states, it should yield similar results in both groups. (E.g., two different
ways of measuring opinions on George Dubya: an opinion mail survey and email
correspondence from within the White House.)
Discriminant Validity: a
concept can be empirically differentiated from other concepts. E.g., prior research or
statistical procedures have demonstrated a measurable difference
separating anger from frustration, or bereavement from clinical depression.
Examine and weigh the various approaches to instrument
validation and ask yourself:
A) How well does this instrument measure what it should measure?
B) How well does this instrument compare with one or more external criteria
that claim to measure the same thing?
C) What does this instrument mean? What is it in fact measuring? How and why
does it operate the way it does?
BE ABLE TO DEFINE AND
DISCUSS SOURCES OF MEASUREMENT ERROR.
Sources of Measurement Error
Measurement Error: variation in responses on a
measurement instrument that cannot be attributed to the variable being measured.
A goal is to minimize error.
- Systematic Error (G&U call it constant error):
constantly affects the variable being measured. Common sources of systematic
error: demographic characteristics and personal style.
- Demographic variables: What are some of the
demographic variables that could systematically influence a person’s
response? Intelligence, education, socioeconomic status, race, culture, religion.
- Personal Styles or Response Sets: personal styles
of the respondents act as indicators of personality traits. To reduce this error, develop subtle or
socially neutral questions, incorporate various response-set or faking
indicators, and conceal the instrument’s true purpose (don’t have
leading questions). Train observers and use multiple observers and observation
measures.
- Error from Personal Styles
- Social desirability: tend to give favorable
impression of one’s self
- Acquiescence: tend to agree (many studies have
shown that people tend to agree)
- Deviation: tend to give unusual or uncommon
responses
- Error from Reactions to Observers
- Contrast Error: tend to rate others as opposite
to oneself in regard to a particular trait or characteristic
- Halo Effect: tend to be influenced by a single
favorable trait, or to let one’s general impression affect the rating of a single
trait or characteristic
- Error of Leniency: tend to rate too high or
always give favorable reports
- Error of Severity: tend to rate too low…
- Error of Central Tendency: tend to rate in the
middle, avoid any extreme positions (many studies show that, when a neutral
category is provided, people tend toward the middle)
BE ABLE TO DEFINE AND
DISCUSS SOURCES OF RANDOM ERROR.
Random Error
- Unknown or uncontrolled factors affecting the variable being
measured and the process of measurement in an inconsistent fashion. They over-
and under-estimate the true differences.
- The larger the sample, the more random errors cancel each other out
and scores tend toward the population mean.
- Types of random error:
- Transient qualities of the respondent (vary day to day, moment to
moment)
- Situational factors in measurement (seating arrangement, work
space, noise, lighting, presence of recording device, social setting)
- Factors related to the administration of the instrument
(uniformity of instrument applications: adding or omitting material, changing the wording of
questions or instructions, using different criteria or info to classify behaviors).
Standardization helps minimize subjectivity.
- The administrator’s demeanor, appearance, and demographics can affect how
an individual responds. Build rapport (create interest and cooperation, spend time
getting acquainted, increase motivation, reduce anxiety, make sure respondents are capable
of completing tasks). Provide an environment conducive to the type of response, clear
standardized instructions, and trial runs of the instrument.
- Other types of error specifically in surveys:
- Sample selection bias: sampling from an incomplete (out-of-date) list, or
sampling from the wrong place. E.g., to study why people may or may not use a city park,
don’t sample only from people at the park; you should also sample from those not using the park. Use
random sampling to draw from the population you want to study!
- Respondents are usually biased in some way, so the best solution
is as high a response rate as possible. With a high response rate, the
non-responders would have to be very different from the responders to affect
your overall estimates.
- Item nonresponse error: failure of respondents to answer
individual questions (blank questions, accidentally skipped items, not following
instructions, writing marginal comments that can’t be equated with printed
categories).
- Response error: respondents misunderstand the wording. Make sure all
respondents understand the items in the same way and can provide answers for
every item (mutually exclusive response options). Make items clear and do not go
beyond what is reasonable to expect people to know or remember.
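The difference between random and systematic (constant) error can be illustrated with a small simulation. This is a sketch under made-up assumptions, not a method from G&U: every respondent is given the same hypothetical true score of 50, random error is modeled as normally distributed noise, and systematic error as a constant +3 added to every measurement.

```python
import random
from statistics import mean


def mean_observed_score(true_score, n, random_sd, systematic_bias=0.0, seed=0):
    """Average of n simulated measurements of one true score.

    Each measurement = true score + constant (systematic) bias + random noise.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = [
        true_score + systematic_bias + rng.gauss(0, random_sd)
        for _ in range(n)
    ]
    return mean(observed)


TRUE_SCORE = 50.0  # hypothetical true value

for n in (5, 50, 5000):
    random_only = mean_observed_score(TRUE_SCORE, n, random_sd=5.0)
    with_bias = mean_observed_score(TRUE_SCORE, n, random_sd=5.0, systematic_bias=3.0)
    print(f"n={n:5d}  random error only: {random_only:6.2f}   with +3 bias: {with_bias:6.2f}")
```

With larger samples the random-error-only average should settle near the true score, while the biased average settles near the true score plus the bias. A bigger sample helps with random error but does nothing for systematic error.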
What is validity? The measurement instrument actually measures
what it intends and does so accurately. What were the three main types? Content,
Criterion, and Construct. You must ask yourself: what is it valid for, and for whom?
Content Validity in Standardized Measurement Instruments
- Each item must represent an aspect of the variable being
measured
- Questions should be empirically related to the construct being
measured
- The instrument should discriminate between individuals
at low and high extremes, and middle
- Double-barrel questions or vague interpretations should
be avoided
- Some questions should be worded positively and others
negatively (yes half, no other half). Alternating positive and negative wording
for questions breaks up the social desirability response set
- Questions should be short
- Avoid negative questions
- Avoid biased questions (derogatory statements, slang
terms, and prejudicial or leading questions)
Criterion Validity
What is Criterion Validity? Can anyone come up with
examples? The process of comparing scores on a measurement instrument with an
external criterion. E.g., A) school grades, credits; B) contrasting groups, or
groups that are assumed to be different; C) psychiatric diagnoses; D) other
instruments; E) different observers.
Construct Validity
What is construct validity? The degree to which an instrument
measures a theoretical construct or an unobservable characteristic or trait.
Ways to assess it:
A) predict developmental changes
B) use other measurement instruments with proven construct validity to validate
new instruments (the new instrument should correlate with the old instruments)
C) convergent-discriminant validation. Convergent Validity:
the degree to which different measures of a construct yield similar results, or
converge. Evidence from different sources and collected in different ways leads
to the same or similar measure of the concept; and if given to people in two
different states, it should yield similar results in both groups. Discriminant
Validity: a concept can be empirically differentiated from other concepts.
D) pretest-posttest
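Convergent-discriminant validation is usually checked with correlations. The sketch below is illustrative only: the three sets of scores are invented, the scale names are hypothetical, and Pearson's r is assumed as the correlation measure. A new scale should correlate strongly with an established measure of the same construct (convergent) and weakly with a measure of a different construct (discriminant).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need two lists of the same length with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Correlation is undefined when one variable never varies")
    return cov / sqrt(var_x * var_y)


# Invented scores for the same six respondents on three instruments.
new_scale = [10, 14, 18, 22, 26, 30]          # new instrument being validated
established_scale = [12, 15, 17, 24, 25, 31]  # proven measure of the same construct
unrelated_scale = [3, 1, 2, 2, 1, 3]          # measure of a different construct

convergent_r = pearson_r(new_scale, established_scale)
discriminant_r = pearson_r(new_scale, unrelated_scale)

print(f"Convergent evidence (want high):  r = {convergent_r:.2f}")
print(f"Discriminant evidence (want low): r = {discriminant_r:.2f}")
```

How high is "high enough" and how low is "low enough" is a judgment call that depends on the field and the instruments; no cutoff is implied here.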
Construction of Standardized Instruments
Question Selection
To enhance content validity…
A) Rational-Intuitive method: choose questions in a logical
manner.
B) Empirical method: statistics used to select questions.
Response Category Selection
Possible response options devised (Likert scale)
Number of Categories
How many response categories? What is the problem with too
much variance? And too little?
A) Large enough to allow for some variance but small enough so that
discrimination between levels can be made.
B) Odd or even number of response categories
The response-value continuum: decisions about how
respondents should be rated, e.g., on frequencies or on agree-disagree dichotomies.
A yes/no question can be followed by a scale question.
Determination of Instrument Length
Typically, the longer the instrument, the greater the
reliability. However, a longer instrument is more difficult to administer, and
respondents are more likely to satisfice.
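The claim that longer instruments tend to be more reliable can be illustrated with the Spearman-Brown formula. These notes do not mention it; it is a standard psychometric formula brought in here as an illustration, and it assumes the added questions are comparable in quality to the existing ones. The starting reliability of 0.60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing instrument length.

    reliability:   current reliability coefficient (0 to 1)
    length_factor: new length / old length (2.0 = twice as many questions)
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


current = 0.60  # hypothetical reliability of a 10-question instrument

for factor in (0.5, 1.0, 2.0, 3.0):
    predicted = spearman_brown(current, factor)
    print(f"{factor:>3} x length -> predicted reliability {predicted:.2f}")
```

The gains shrink as the instrument grows, which is one reason not to lengthen it indefinitely; the formula also says nothing about respondent fatigue.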
“Satisfice” is the
phenomenon that occurs when people settle for satisfactory solutions to problems
rather than seeking optimal ones in a variety of domains. Given a constant
stimulus, respondent burden increases with the motivational and cognitive
demands of the survey. With an increase in cognitive demands, a respondent is
likely to make less effort, evident by less variation in her or his ratings.
When respondents grow impatient, fatigued, or disinterested, the cognitive
demands may make them particularly susceptible to “satisficing.” Additionally, a
respondent is most likely to satisfice when the costs of optimizing are high,
that is, with difficult or demanding questions rather than easier ones.
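Since satisficing tends to show up as less variation in a respondent's ratings, one rough screening approach is to flag respondents whose answers barely vary. The sketch below is an assumption-laden illustration: the respondent IDs and ratings are invented, and the 0.5 cutoff is arbitrary. Low variation is only a possible warning sign; a respondent may genuinely hold the same view on every item.

```python
from statistics import pstdev


def flag_low_variation(ratings_by_respondent, min_sd=0.5):
    """Return IDs of respondents whose ratings vary less than min_sd.

    min_sd is an arbitrary, illustrative cutoff on the standard deviation.
    """
    flagged = []
    for respondent_id, ratings in ratings_by_respondent.items():
        if len(ratings) > 1 and pstdev(ratings) < min_sd:
            flagged.append(respondent_id)
    return flagged


# Invented ratings on six 5-point items.
survey = {
    "R1": [4, 2, 5, 1, 3, 4],  # varied answers
    "R2": [3, 3, 3, 3, 3, 3],  # same answer throughout
    "R3": [4, 4, 4, 4, 5, 4],  # almost no variation
}

print("Possible satisficers:", flag_low_variation(survey))
```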
KNOW THE TYPES OF
MEASUREMENT INSTRUMENTS
Basic Types of Measurement Instruments:
Rating scales, summated scales, modified scales
- Rating scales: use judgments by self or others to assign
a person a single score in relation to the variable being measured.
- Questionnaire-type (summated) scales combine the responses of all
the questions to form a single overall score for the variable being measured.
- Modified scales do not fit in either classification.
Rating Scales
Rating of individuals, objects or events on various traits
or characteristics at a point on a continuum or a position in an ordered set of
response categories. Numerical values are assigned to each category.
- Someone is evaluated by him/herself or by someone else.
Self-ratings are advantageous because a person evaluates her own thoughts,
feelings, and behaviors accurately, provided she is self-aware and willing to
be truthful.
- Four types of rating scales in G&U (remember all
questions must be mutually exclusive)
- Graphic: variable described on a continuum from one
extreme to the other. The points are ordered in equal intervals and assigned
numbers. Examples on p. 119.
- Itemized: series of statements ranking different
positions on the variable being measured.
- Comparative: compare an individual, or object, being
rated with others. (reduces satisficing)
- Self-anchored: Respondent rates herself on a continuum
(usually a 7-9 point scale). The specific referents for each point are
defined by the respondent. [measures of your perceptions of your emotions]
- Summated Scales: multiple questions that the respondent
is asked to answer and a total of all the questions indicates the person’s
position on the variable of interest. (Often used for assessing individual and
family problems; needs assessment, evaluation) Respondents indicate the degree
of agreement or disagreement with a statement. Figure 9.1 p. 121.
- Modified Scales:
- Semantic Differential Scale: rates the respondent’s
perception of 3 dimensions of the concept under examination: 1) evaluation
(good vs bad), 2) potency (weak vs strong), and 3) activity (slow vs fast).
Several questions per dimension, scored on a 7-11 pt continuum on which
only the extremes are labeled, i.e., Horrible 1 2 3 4 5 6 7 Wonderful.
The Semantic Differential Scale has comparability issues and raises the question
of whether the 3 dimensions are the most appropriate variables.
- Goal Attainment Scaling (GAS): Figure 0.3 on p. 124.
Often used in evaluating preordinate (pre-determined) program goals. Whose
goals are they?
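Scoring a summated scale can be sketched as follows. This is a minimal illustration, assuming a hypothetical 4-item agree-disagree instrument on a 1-5 scale in which two items are worded negatively; those items are reverse-scored before summing so that a high total always points in the same direction. The item names and answers are invented.

```python
def summated_score(responses, reverse_items, scale_min=1, scale_max=5):
    """Total score across all items, reverse-scoring negatively worded ones."""
    total = 0
    for item, value in responses.items():
        if not scale_min <= value <= scale_max:
            raise ValueError(f"{item}: {value} is outside {scale_min}-{scale_max}")
        if item in reverse_items:
            # Flip the scale: on a 1-5 scale, 1 becomes 5, 2 becomes 4, and so on.
            value = scale_max + scale_min - value
        total += value
    return total


# Invented answers: 1 = strongly disagree ... 5 = strongly agree.
answers = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
negatively_worded = {"q2", "q4"}

print("Total score:", summated_score(answers, negatively_worded))
```

Forgetting the reverse-scoring step is a common scoring mistake when positive and negative wording are alternated.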
Selection of a Standardized Instrument
- 3 Considerations in Selecting a Measurement Instrument:
measurement need, finding instruments capable of measuring the variables, and
assessing alternative instruments
- Determining Measurement Need
A) Why will the measurement occur?
- Research
- Assessment/diagnosis
- Evaluation
B) What will be measured?
- Specify________
C) Who is appropriate for making the most direct observations?
- Research participant/client
- Practitioner or researchers
- Relevant other
D) Which type of format is acceptable?
- Inventories and surveys
- Indexes
- Scales
- Checklists and rating systems
E) Where will the measurement occur?
- General setting
- Situation-specific environment
F) When will the measurement occur?
- Random
- Posttest only
- Repeated over time