Published online 2014 Dec 22. doi: 10.1177/2050312114562579
The developmental milestones from the Denver II Developmental Screening Test include cognitive and language/communication skills, fine and gross motor skills and social/emotional skills. Every year of a child’s life from birth to the age of five has a set of expectations for each category (Shahshahani, Vameghi, Azari, Sajedi, & Kazemnejad, 2010). This can be a challenge, since very few developmental screening tools are developed or tested with linguistically or culturally diverse samples of children. Further, practitioners may lack the technical training to review and compare complex psychometric information on the quality of developmental screening tools.
PMID: 26770755
Abstract
Objective:
To demonstrate a method of evaluating accuracy of developmental screening modeled on the evidence-based medical literature.
Method:
A retrospective review was performed on 418 children screened with the Denver II by a trained technician. Two models for analyzing screening data were examined, using predictive values and likelihood ratios (LR+ and LR−).
Results:
The technician, working at 20% time, screened 44% of eligible children. There were 129/418 (31%) children with Suspect Denver II results, 115/418 who were referred, 81/115 (70%) who were evaluated by Early Intervention, and 64/81 (79%) who qualified for services. The uncorrected positive predictive value for the Denver II alone (44%) was insufficient to meet the preset standard of 60%, but the LR+ of 4.16 indicated a significant contribution of test information to improving predictive value. Combining test results with information from the parent–technician conference to achieve a referral decision resulted in an uncorrected predictive value of 56%, which rose with correction for children referred but not evaluated to 72% (LR+ 10.33). Negative predictive values and likelihood ratios of a negative test and a non-referral decision achieved recommended levels. Parents who expressed concern were significantly more likely to complete recommended evaluation than those who did not (82% vs 58%, p < .01). Results were in the same range as in published studies with other screening tests but showed three areas for improvement: screening more children, more carefully supervising some referral decisions, and getting more children to evaluation.
Conclusion:
![Screening Screening](/uploads/1/2/4/7/124781582/793186510.png)
Levels of predictive accuracy above 60% can be obtained by combining different types of information about development to make decisions about referral for more complete evaluation. Systematic study of such combinations could lead to improved predictive accuracy of screening programs and support attempts to close the gap between referral and evaluation.
Keywords: Developmental screening, child development, Denver II, pediatrics, evaluation of screening, predictive value, likelihood ratio
For at least 20 years, the medical literature has emphasized the use of predictive values (PVs) and likelihood ratios (LRs) in evaluating accuracy of diagnostic and screening tests in clinical practice.1 This approach has been largely ignored in the developmental screening literature, where accuracy of developmental screening has traditionally been addressed as a question of test “validity” assessed by calculating sensitivity and specificity. While that information may be helpful in selecting a screening procedure, it is not very helpful in answering questions about how well a clinical program is working. In clinical practice, the question is “How well is this screening program working in this clinical setting, with this population, with this test, these referral and follow-up procedures, and these definitions of eligibility?” Specific characteristics of the setting will vary from one program to another: For example, a successful program where the prevalence is high and the standards for eligibility are liberal may not work in a community where prevalence is low and eligibility is more restricted. What is needed is a methodology that can be applied in and adapted to various different settings.
Altman has summarized the position adopted in the evidence-based medical literature to answer these questions. He states,
We need to know the probability that the test will give us the correct diagnosis. Sensitivity and specificity do not give us this information. Instead, we must approach the data from the direction of the test results, using predictive values.2
Those tell us the probability that either a positive or negative screening test will lead to a diagnosis. The clinician must decide what are acceptable standards for those values and then determine whether his actual results meet those standards.
Camp3 has described a model for evaluating developmental screening based on this alternative approach. Three terms are used extensively in this model: prevalence, PV, and LR. Understanding the meaning of “prevalence” is essential. It is often assumed to mean the percentage of delayed children in the general population rather than recognizing that it is also applied to the sample under study. The latter is important in evaluating a specific program because it represents the role that chance or the pretest probability plays in arriving at the PV in a specific sample. Positive predictive value (PPV) represents the percentage of children in the sample with positive screening results who have the disorder in question and is the same as post-test probability. The traditional meaning of negative predictive value (NPV) is the percentage with negative screening results who do not have the disorder; more useful in our context is the percentage of children who do have the disorder (1 − NPV), which we will refer to as PV of a negative. LR represents the ability of a screening test to increase PV.
The first step in Camp’s model is to set standards for what will be acceptable levels for PVs—the thresholds above and below which no further information will be sought before making a decision. In developmental screening, she suggests setting the threshold for the PPV at 60% or better, even though Aylward,4 for example, would accept 50%. LRs (LR+ and LR−) use the information represented by sensitivity and specificity to describe the ability of a screening procedure to improve the post-test probability over the pretest probability. Camp suggests a LR for a positive test (LR+) of 2.0 or higher. For the threshold below which no further information needs to be obtained before making a decision, that is, the PV of a negative (1 − NPV), she suggests accepting the practice common in medicine of using 10% or less. She also asks that the LR of a negative (LR−) be below .50 to indicate that negative test information improves the PV of a negative over chance. She asks this because the prevalence is below 10% in many studies of developmental screening. If the observed values do not meet these standards, further evaluation steps include correction for verification bias and adding additional information. Those steps help the clinician decide what needs to be done to improve the PV to the desired level.
Because Camp’s model includes examining the data for evidence of under-referral as well as over-referral, and not all children get evaluated, some method for estimating errors of under-referral needed to be developed. This latter is particularly important because it is seldom addressed in reports of developmental screening. Although the question of missing delays is often belabored, it is usually assumed that the question can be answered simply by accepting large numbers of over-referrals.
In interpreting a screening test, it is important to consider other information along with the test result. Previous writers have recommended this. Frankenburg et al.5 wrote that “the Denver II is a screening test, the results of which should be integrated with everything else that one knows about the child.” Lipkin and Gwynn6 “recommend the incorporation of parent-completed questionnaires or directly administered screening tests into the process of surveillance and screening. However, their results should be combined with attention to parental concerns and the pediatrician’s opinion, rather than replacing them.”
In previous studies, clinicians chose not to refer some children with abnormal test results7,8 and did refer some children with normal test results.8,9 These clinical judgments presumably considered more than the test result. One study compared the results of a screening test and clinicians’ ratings as separate procedures,9 but there has been little study of ways in which a clinician combines test data with clinical data to make a decision.
An important problem in other studies has been children who are referred but not evaluated (RNE). In a community-based program, after an average of seven contacts with the family, only 43% of children (mostly Latino and African-American with low family incomes) received services after referral for developmental evaluation or other services.10 In two clinic-based programs, the proportion of referrals evaluated was 51%8 and 33%.11
The goal of this study was to demonstrate a method of evaluating accuracy of developmental screening modeled on the evidence-based medical literature. Our questions were as follows:
- What numbers and what proportions of children were screened, showed evidence of delay, were referred, were evaluated, and qualified for services?
- Did screening and referral results meet previously recommended criteria for PVs and LRs?
- How did the program combine screening results with clinical information in deciding whether to refer? Did combining the information strengthen prediction?
- How did prediction change when we corrected for children RNE (verification bias)?
- How many children with evidence of delay were not referred or not evaluated?
- What proportion of children referred got to evaluation?
- How did the program compare to other ones in the literature? How could it be improved?
Methods
Study population
Children screened in this study were attending two community health centers in Colorado. Screening aimed to identify young children who might qualify for Early Intervention (EI) through the local school districts. (EI refers to Part C of the Individuals with Disabilities Education Act.) Clinicians referred children for screening routinely at age 2 and at older ages if they had not been screened. They referred children under 2 years if they were concerned (18 cases). There were 422 children who kept appointments for screening between 22 August 2005 and 11 January 2008. Four children were dropped (2 for missing charts, 2 for Denver IIs not done), leaving 418 for review.
![Printable denver developmental screening test Printable denver developmental screening test](http://helpserver.biz/onlinehelp/kid/datakind/3.0/denver.jpg)
Previous surveys (2003 and 2005) of 179 parents from different families in the main clinic showed that the median family income was 81% of the federal poverty level and parents’ median educational attainment was ninth grade. Among the last 95 children in this study, half of whom went to a second clinic, parents’ educational level was also ninth grade. The primary language at home was Spanish (75%), English (13%), both languages (8%), and other (4%).
Parents’ speaking Spanish, English, or both languages was not associated with significant differences in the proportions of children referred, evaluated, or qualifying for services.
Because the health center did the study for quality assessment, following the Health Insurance Portability and Accountability Act (HIPAA) confidentiality standards, parents were not asked to sign informed consent.
Measures
Denver II
The Denver II is the 1992 revision of the Denver Developmental Screening Test. There are English and Spanish versions of the Denver II, each containing 125 items in four developmental domains. An examiner administers an age-appropriate sample of items, although some can also be passed by parental report. The number and pattern of items failed yield a result of Normal (Within Normal Limits (WNL)), Suspect, or Untestable. Each item is scored as pass, fail, or refused. Items that can be completed by 75%–90% of children at the child’s age but are failed are called cautions; those that can be completed by 90% of children but are failed are called delays. A normal score means no delays and no more than one caution; a suspect score means one or more delays or two or more cautions; a score of untestable means enough refused items that the score would be suspect if they had been delays.12
The Denver II was administered during a separate appointment by a trained technician who was bilingual and who had completed the recommended training for paraprofessionals12 under the supervision of the senior author. The technician had 40-min appointments and could spend time listening to parents and explaining her findings. In a parent–technician conference, she considered all information available, including the Denver II results, the child’s history, her own observations, and the parents’ opinions. If she decided to refer and the parents agreed, she faxed the score sheet of the Denver II and all identifying information to the local EI agency.
The EI program in the school districts started with a home visit by a bilingual Mexican teacher, offered transportation if needed, offered evaluations in Spanish or English in a single office visit, and followed up on cases. EI evaluated children within 6 weeks, determined eligibility, and implemented services. Policies and procedures of EI and other agencies fostered communication and follow-up. Communication between the clinics and EI was easy because it involved just a few people and they all knew one another.
Chart review
We reviewed the 418 medical charts between May 2008 and December 2010. Using a special form, two public-health nurses and the senior author reviewed all charts independently, and the senior author reviewed and entered all data.
The main information recorded during chart review was whether a child was referred to EI, was actually evaluated by EI, and, if evaluated, qualified for services (including monitoring in one case). That decision was made by the EI teams of the two local school districts, each consisting of five professionals. The senior author contacted the school district to obtain a report if there was none in the chart.
Parents of 13 children with Suspect Denver II results refused a referral to EI. Those were counted as RNE. Seven children were already receiving services at the time of screening. Three of them with Suspect Denver II scores were counted as referred, evaluated, and qualified for services; three with Untestable scores were counted as not referred but evaluated and qualified; one child, not delayed but receiving services solely because of prematurity, was counted as normal, not referred, and not qualified. Eight children with Suspect or Untestable scores were rescreened with normal results.
When children were not referred to EI despite Suspect or Untestable Denver II results, the technician wrote comments about why the children were not referred. The senior author reviewed the test items and those comments and then noted his agreement or disagreement with the decision not to refer. This information was used to estimate the number of possibly delayed children in these two groups who might have been missed.
Data analysis
The number of children qualifying for EI services was the major outcome variable of interest. PVs for both positive and negative screening results are reported as the percentage of children with a given screening result who qualified for services. The goal of screening is to maximize the PV of a positive screen and minimize the PV of a negative screen.
LRs (LR+ or LR−) were used to evaluate the ability of the screening information to raise or lower the PV over chance. These were calculated by changing probability percentages to odds using the formula p/(1-p) and then calculating the ratio of post-test odds (PV) to pretest odds (prevalence). Note that LR is always positive. The + and − signs refer to the screening result.
For evaluating the significance of LRs, we used the standards described by Furukawa et al.13 If the LR+ is between 2.0 and 5.0, positive screening results “generate small (but sometimes important) changes in probability”; if between 5.0 and 10.0, they “generate moderate shifts.” If LR− is between .20 and .50, a negative screening result contributes a “small (but sometimes important) shift” in reducing the PV; if it is between .10 and .20, it contributes moderately.
Camp’s recommended threshold for an acceptable positive screening result, stated above, was a PPV of at least 60% and an LR+ of at least 2.0. For a negative screening result, her recommended threshold was that a negative screen lead to a positive result (1 − NPV) in 10% or fewer of cases and that the LR− be below .50. When the PV is unacceptable, but the LR meets the minimum standard, adding information often improves accuracy to acceptable levels.
We calculated PVs and LRs for two methods of defining positive and negative screening results. One method used the Denver II alone using a Suspect score as a positive screening result and a Non-Suspect (Untestable and WNL) score as a negative screening result. The second method combined Denver II results with information from the parent–technician conference. When this resulted in a referral to EI for further evaluation, the screening result was considered positive. Classification as a non-referral was considered a negative screening result.
Two types of corrections were also examined arising from the fact that only ~20% of the total population of 418 was evaluated by EI. The first correction was for verification bias. This correction is recommended when children are referred but some fail to be evaluated.14,15 Verification bias can be corrected statistically using a weighting procedure to estimate how many children would be expected to qualify for services if all of those referred had been evaluated. Failure to make this correction assumes that all non-evaluated referrals were normal. Some authors assume that the proportion of children qualifying among those not evaluated is the same as among those evaluated; we used Camp’s3 more conservative assumption that the proportion of children qualifying is the same among those not evaluated as among those referred. The formula used to correct the number of children qualifying was (number of known qualifiers) + ((number of known qualifiers/number referred) × (number of RNE)).
The second correction was to estimate the number of non-referred children who might have qualified for EI services. We examined the reasons Suspect and Untestable children were not referred to identify those whose screening was incomplete or who the senior author thought should have been referred but were not. Treating these children as though they were RNE, we estimated the number who might have qualified if they had been referred. These were added to the number of non-referred children known to qualify for EI services to achieve a corrected estimate of the number missed by non-referral.
Results
Overall results
Table 1 shows the numbers of children with each type of Denver II result. Overall, 51% (212/418) of children had a normal Denver II, 31% (129/418) Suspect, and 18% (77/418) Untestable. There were 115/418 (28%) referred to EI. Of those, 70% (81/115) were evaluated by EI, and 79% (64/81) qualified for EI services. The total number who qualified for EI services (referred plus non-referred) was 67, which yields a prevalence of 16% (67/418).
Table 1.
WNL (212) | Referred | 9 | Evaluated | 8 | Qualified | 4 |
Not qualified | 4 | |||||
Not evaluated | 1 | Qualified | 0 | |||
Not qualified | 1a | |||||
Not referred | 203 | Evaluated | 0 | Qualified | 0 | |
Not qualified | 0 | |||||
Not evaluated | 203 | Qualified | 0 | |||
Not qualified | 203a | |||||
Suspect (129) | Referred | 96 | Evaluated | 68 | Qualified | 57 |
Not qualified | 11 | |||||
Not evaluated | 28 | Qualified | 0 | |||
Not qualified | 28a | |||||
Not referred | 33 | Evaluated | 0 | Qualified | 0 | |
Not qualified | 0 | |||||
Not evaluated | 33 | Qualified | 0 | |||
Not qualified | 33a | |||||
Untestable (77) | Referred | 10 | Evaluated | 5 | Qualified | 3 |
Not qualified | 2 | |||||
Not evaluated | 5 | Qualified | 0 | |||
Not qualified | 5a | |||||
Not referred | 67 | Evaluated | 3 | Qualified | 3 | |
Not qualified | 0 | |||||
Not evaluated | 64 | Qualified | 0 | |||
Not qualified | 64a |
aDevelopmental status of these children is unknown because they were not evaluated by Early Intervention.
Speech therapy accounted for 69% of services for which children qualified. The speech sector of the Denver II by itself would have identified 77% (49/64) of those who qualified. Children under 18 months qualified in more cases than older children for occupational or physical therapy.
Accuracy of the Denver II alone
PVs and LRs are shown in Table 2 for the Denver II alone. Normal and Untestable children are reported together as Non-Suspect. The PV of a Suspect Denver II was 44% (57/129) with LR+ = 4.16; the PV of a Non-Suspect Denver II was 3% (10/289) with LR− = .16. Correction for verification bias was achieved by recalculating prevalence, PV, and LR after adjusting the number qualifying to include a percentage of children who were RNE. The adjusted number qualifying was calculated as ((number qualifying/number referred) × number RNE) + number qualifying. Using values in Table 1, this was ((57/96) × 28) + 57 = 73.6 for a Suspect Denver II. Corrected results are also presented in Table 2.
Table 2.
Uncorrected | Corrected for RNE (referred but not evaluated) | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Test result | n | Prevalence | Qualified | PV | Odds | LR | Prevalence | Qualified (A) | PV | Odds | LR |
Suspect | 129 | .16 | 57 | .44 (57/129) | .79 (.44/.56) | 4.16 (.79/.19) | .20 | 73.5 | .57 (73.5/129) | 1.33 (.57/.43) | 5.32 (1.33/.25) |
CI (.13–20) | CI (.35–.53) | CI (3.30–5.21) | CI (.17–25) | CI (.49–.66) | CI (3.99–6.65) | ||||||
Non-Suspect | 289 | .16 | 10 | .03 (10/289) | .03 (.03/.97) | .16 (.03/.19) | .20 | 12 | .04 (12/289) | .04 (.04/.96) | .16 (.04/.25) |
CI (.13–20) | CI (.01–.05) | CI (.11–.33) | CI (.17–25) | CI (.02–.06) | CI (.10–.29) | ||||||
Untestable | 77 | .16 | 6 | .08 (6/77) | .09 (.08/.92) | .47 (.09/.19) | .20 | 7.5 | .10 (7.5/77) | .11 (.10/.90) | .44 (.11/.25) |
CI (.13–20) | CI (.02–.14) | CI (.17–.25) | CI (.04–.16) | ||||||||
WNL | 212 | .16 | 4 | .02 (4/212) | .02 (.02/.98) | .13 (.02/.16) | .20 | 4.44 | .02 (4.44/212) | .02 (.02/.98) | .08 (.02/.25) |
CI (.13–20) | CI (0–.04) | CI (.17–.25) | CI (0–.04) |
PV: predictive value; LR: likelihood ratio; CI: confidence interval; WNL: Within Normal Limits; Q(A): qualified—adjusted for referred but not evaluated.
With correction for verification bias, the PV of a Suspect Denver II rises to 57% (confidence interval (CI): .49–.66) with LR+ = 5.32 and a Non-Suspect Denver II rises to .04 with LR− = .16. The PV for a Suspect Denver II still falls short of the 60% goal but the 95% CI contains the goal. Both the corrected and uncorrected PVs of a Non-Suspect Denver II (4%) and LR− (.16) meet the preset standards.
Accuracy of the decision to refer
Table 3 shows the PVs and LRs for referrals, which reflect the combination of the screening test with other information from the parent–technician conference. The raw PV of a referral was 56% (64/115) with LR+ = 6.68. After correction for verification bias, the PV of a referral became 72% (82.5/115) with LR− = 10.3. The 95% CIs for raw and corrected PVs overlap, indicating that the differences are not statistically significant. Table 3 also shows the PV and LR for the combination of each Denver II score and parent–technician conference. Referral of a child with a Suspect Denver II yields an uncorrected PV of 59% (57/96) with LR+ = 7.57 which corrects to 77% (73.5/96) with LR+ = 13.4, both above the goal of 60%. Referral of a child with a Non-Suspect Denver II (PPV = 37%) falls short of the acceptable threshold but is not low enough to dismiss the need for referral.
Table 3.
Decision | Group | n | Raw | Corrected for Children Referred but Not Evaluated | ||||||
---|---|---|---|---|---|---|---|---|---|---|
Q | Prevalence | PV | LR | Prevalence | Qa | PV | LR | |||
Referral | Suspect Denver II | 96 | 57 | .16 | .59 (57/96) | 7.57 (1.44/.19) | .20 (85.5/418) | 73.5 | .77 (73.6/96) | 13.4 (3.35/.25) |
CI (.13–.20) | CI (.49–.69) | CI (.17–.25) | CI (.69–.81) | |||||||
Non-Suspect Denver II | 19 | 7 | .16 | .37 (7/19) | 3.11 (.59/.19) | .20 (85.5/418) | 9 | .47 (9/19) | 3.56 (.89/.25) | |
CI (.13–.20) | CI (.26–.48) | CI (.17–.25) | CI (.36–.58) | |||||||
Total referrals | 115 | 64 | .16 | .56 (64/115) | 6.68 (1.27/.19) | .20 (85.5/418) | 82.5 | .72 (82.5/115) | 10.3 (2.57/.25) | |
CI (.13–.20) | CI (.47–.65) | CI (5.07–8.52) | CI (.17–.25) | CI (.64–.80) | CI (7.1–13.7) | |||||
Non-referral | Total non-referrals | 303 | 3 | .16 | .01 (3/303) | .05 (.02–.16) | .20 (85.5/418) | 3 | .01 (3/303) | .04 (.01/.25) |
CI (.13–.20) | CI (.004–.02) | CI (.17–.25) | CI (.004–.02) | CI (.01–.12) |
Q: qualified; Qa: qualified—adjusted for referred but not evaluated; PV: predictive value; LR: likelihood ratio; CI: confidence interval.
Accuracy of the decision not to refer
In Table 3, the PV for a non-referral is 1% (3/303) with LR− = .05, both meeting Camp’s standards for dismissing the need to refer. These results assume that all of the non-referred children would have failed to qualify for services. In some ways, this is a reasonable assumption because there was no evidence that any other children in the population were identified as delayed during the later chart review. Nevertheless, we sought to determine how estimates of children missed by non-referral would affect the PVs and LRs.
The 303 children who were not referred included 33 children with Suspect Denver II scores and 67 with Untestable. Three of the Untestable children were already receiving services, and screening was insufficient in three Suspect and eight Untestable cases. The senior author questioned the technician’s decision not to refer 13 cases (10 Suspect, 3 Untestable). In the remaining 73, the senior author agreed with the technician because evidence of delay was minor or transient (20 Suspect, 53 Untestable—see Table 4). The total number of possible additional referrals among children with suspect scores was 13; among untestables, the number was 11.
Table 4.
Reason | Number | Example: Suspect | Example: Untestable |
---|---|---|---|
Child passed older items in same sector | 19 | (age 66 months) Failed to count one block but passed 7 older speech items. Refused to draw person with 3 parts and failed to copy + but did two older fine motor items. | (age 39 months) Shy. Refused to wiggle thumb, jump over paper, or jump up. Refused to show she knew two adjectives but passed 4 older speech items. |
Child failed only one item | 14 | (age 24 months) Failed to stack 4 blocks but stacked 2. | (age 25 months) Refused to kick ball. |
Parent said child could do items at home | 15 | (age 31 months) Failed to put on clothes. Refused to stack 8 cubes (did 4), jump up, or throw or kick ball. Mother said he could do those things at home. | (age 25 months) Refused to kick ball but reportedly could do so at home. |
Child’s speech seemed good | 5 | (age 50 months) Didn’t count one block or name 4 colors but did 7 other speech items. Speech was clear. | (age 30 months) Refused to dump raisin from bottle and stack cubes. Parents said child could do the items at home. Child had clear speech with 3-word sentences. |
Combination of reasons | 6 | (41 months) Failed to know 2 actions and name 1 color but passed 4 other language items. Stacked 6 cubes but not 8. Speech was 80% clear. | (age 63 months) Failed to count 5 blocks but passed one older speech item. Said clear sentences in English and Spanish. Refused to draw a person or tandem walk. |
Items themselves seem minor | 14 | (29 months) Failed to wash and dry hands and to build tower of 6 cubes. | (age 27 months) Refused to dump raisin from bottle. |
Total | 73 | 20 | 53 |
We then estimated the number of children who might have qualified for services among the subgroups. We treated the 13 children with suspect scores as though they were RNE. Multiplying 13 by 59% (the unadjusted PPV of a referred Suspect Denver II) indicates that 8 might have qualified. A similar procedure indicated that 4 children might have qualified from the 11 Untestable children who had not been evaluated (11 × 37%, the PV for a referred non-suspect child). Using the rule of three,16 we estimated that a maximum of three additional children might have qualified from among the WNL group, bringing the total number of potentially missed children to 15/303. Adding the new 15 to the 3 non-referred Untestables already receiving services brings the potential total to 18.
Table 5 shows the PVs (1 − NPV) and LRs for non-referrals before and after adjusting for hypothetically missed children. The overall PV of a non-referral rises to .05 with LR− = .16, which remains acceptably low. However, the PV (.18) and LR− (.79) for a non-referred child with a Suspect Denver II are unacceptably high. In Table 5, the PV of a non-referred Untestable rises to 10% with LR− = .45, barely meeting the preset standards for acceptability. The majority of these cases were children whose screening was insufficient because parents failed to return for rescreening.
Table 5.
Non-referred group | n | Raw result | Corrected for possibly missed cases | ||||||
---|---|---|---|---|---|---|---|---|---|
Q | Prevalence | PV | LR | Qa | Prevalence | PV | LR | ||
Suspect Denver II | 33 | 0 | .16 | NA | NA | 8a | .18b (82/418) | .24 (8/33) | 1.45 (.32/.22) |
CI (.13–.20) | CI (.15–.22) | CI (.11–.42) | |||||||
Odds .19 | Odds = (.18/.82) = .22 | Odds .24/.76 = .32 | |||||||
Untestable | 67 | 3 | .16 | .04 (3/67) | .21 (.04/.19) | 7a | .18 | .10 (7/67) | .50 (.11/.22) |
CI (.13–.20) | CI (0–.08) | CI (.02–.16) | |||||||
Odds = .1/.9 = .11 | |||||||||
WNL | 203 | 0 | .16 | NA | NA | 3c | .18 | .015 (3/203) | .07 (.015/.22) |
CI (.13–.20) | CI = .006–.024 | ||||||||
Odds .015 | |||||||||
Total | 303 | 3 | .16 | .01 (3/303) | .05 (.01/.19) | 18 | .18 | .09 (18/303) | .45 (.10/.22) |
CI (.13–.20) | CI (0–.02) | Odds (.09/.91 = .10) |
PV: predictive value; Q: qualified; Qa: qualified, corrected for referred but not evaluated; LR: likelihood ratio—predictive value odds/prevalence odds; CI: confidence interval; WNL: Within Normal Limits.
Prevalence: raw 67/418 = .46, corrected 82/418 (includes 15 new).
aCorrected for children who should have been referred but were not.
bPrevalence is based on hypothetical total (99.5) qualifying for services in study group (418).
Analysis of reasons for non-referral also provided information about whose decisions led to children not being evaluated. As noted above, the technician chose questionably not to refer 13 children and follow-up was incomplete in 3, resulting in a total of 16 children who probably should have been evaluated. Parents failed to return for rescreening (8), refused referral (10), and accepted referral but failed to keep the appointment (24) for a total of 42 who probably should have been evaluated. Review of children’s charts showed that some parents expressed concern about development during the well-child visit. Of those who did, 82% (37/45) completed evaluation of their children; of those who did not, 58% (41/71) completed evaluation (p = .01).
Cost
The technician earned US$5096 in 2005. Including costs of materials, we estimate the costs of the project at US$5169 a year, US$13,579 for the 29 months of the study, US$32 per Denver II, and US$212 per child qualified. This estimate omits the indirect costs of supervision, medical records, the appointment system, and other items. The medical director of the clinic said she supported the program because of its results and its low cost. The Colorado Medicaid program does pay for developmental screens; we estimated that its reimbursements could have paid the technician’s salary.
Discussion
PVs and LRs for Denver II alone
Traditionally, studies of developmental screening in clinical settings report only the percentage of children who were evaluated who met a gold-standard criterion for delay. In this study, this was 79%, which compares favorably with results of studies with the Ages and Stages Questionnaires (ASQs).17 In those studies, that percentage varied from 34%18 to 55%19 to 63%8 to 68%11 to 77%.9,11
This value is an overestimate of the PV of a positive test when it fails to take into account children who were not evaluated. A more appropriate estimate of the PPV is the percentage of children with positive screening results who qualify for services. In this study, this value was 44% (61/129) for the Denver II alone, which is in the same range as reported for other screening tests administered to children aged 15–36 months. For example, PPVs of 38%17 and 33%8 have been reported for the ASQs, 46% for the Parents’ Evaluation of Developmental Status (PEDS).20 The uncorrected LR+ of a Suspect Denver II was 4.16; the corresponding figures calculated from other studies were 4.93 and 5.0 for the ASQs and 3.76 for the PEDS. All of those values that fail to correct for children RNE will now be underestimates of PV and LR, as in this study, because of verification bias.14,15
PVs and LRs for combined results—referral
Even with correction, all the tests have the same limitations, namely, a minimal to moderate LR+ and a PPV below .50 in most populations. Evaluating the strength of test evidence of delay and basing referrals on both information from the screening test and information from the parent–technician conference provides nearly acceptable PVs without correction (PPV 56%, LR+ 6.68) and values above the acceptable threshold with it (PPV 72%, LR+ 10.3). In two other clinic-based studies using the ASQs, the PPV of a referral was 33% (86/261)8 and 23% (26/115).11 Correcting for children RNE, the figures rise to 50% (130/261) and 39% (45/115), respectively. Although some have argued that all children with any hint of delay should be referred,6 clinicians seldom do so,7 and they sometimes successfully identify delayed children despite normal test results.8,9 The current findings demonstrate the value of combining information as urged by previous writers.5,6,21 Systematic study of how to combine information from different screening tests and/or other information should improve our ability to develop clinical decision rules that improve PV. Those in turn would support vigorous efforts of both clinicians and parents to follow up on recommended referrals.
Identifying subgroups failing to meet overall standards
Even when the overall performance of a program meets standards for referral and non-referral, some subgroups may fail to do so. Identifying and analyzing the reasons for failure in those subgroups can lead to further improvement. In this study, two subgroups stood out as needing additional attention. In the subgroup of 19 children with Non-Suspect Denver II, referred because of information from the parent–technician conference, 7/19 (37%) qualified for services. This PV failed to reach the threshold for successful referral, but it was too high to dismiss the possibility of delay. A clinician seeing these data might still decide to refer those children, creating some over-referrals but not very many. An alternative would be to gather more information about each child, for example by rescreening later or administering a different screening test.
In the other subgroup, non-referred children with Suspect Denver II, corrected PV (24%, 8/33), and LR− (1.45) were too high to accept non-referral. The majority (10/13) of children in this group would have been referred if the senior author had reviewed decisions not to refer at the time they were made.
Follow-up after screening
The proportion of children referred who got to evaluation in this study was 70% (81/115). This percentage is at the high end of the range reported in other studies, where the percentage has varied from 33%11 to 51%8 to 76%.18 Three factors probably increased this number: the ability of the technician to spend time with parents, careful follow-up, and communication between her and EI. In addition, significantly more parents who expressed concern during the well-child visit followed through than did parents who did not express concern. A recent report showed reasons parents decide not to follow through: misunderstanding the pediatrician, preferring to wait and see how the child develops, considering themselves expert in their child’s development, and facing practical challenges like reaching people on the telephone.22 The matter deserves further study, especially among parents who appear unconcerned.
Proportion of children screened
Despite success in screening outcomes, the program reached only 47% (242/512) of the targeted 2-year-olds who had a well-child visit during the project period in the main clinic and 19% (373/1935) of all children in the age range 6–76 months. Previous studies using the ASQs have screened varying proportions of eligible children: 25%,18 54%,9 62%,23 and 93%.11 The number screened in this study was low because the technician worked only 20% time. When a child had a well appointment and she was not working, a Denver II appointment was made later on and parents were less likely to come. If her time had been expanded to full time, her salary would have cost more, but she would have had perhaps half time available for other work.
Implications for other screening programs
Although the Denver II was used in this study, the purpose of this article was not to justify its use. However, aspects of the Denver II often criticized, for example, date of norms (1992) and a Colorado standardization sample, did not adversely affect its usefulness in these Colorado clinics. Indeed, viewed from the standpoint of PVs and LRs, the current results with the Denver II equal or exceed published reports from comparable studies using parent questionnaires. The convenience of parent questionnaires makes them ideal for maximizing the number of children who get screened in routine pediatric settings. Although less efficient, screening tests involving direct observation such as the Denver II are still particularly useful when one is not confident that answers to questionnaires are dependable. Choice of test usually rests on such factors.
A more general problem is that despite a significant overlap in what is measured by the different screening tests, each procedure contributes something unique. This is a common problem in differential diagnosis where different types of information must be combined to reach the correct diagnosis.24 Study of how different types of information can be combined to predict a given outcome can lead to development of clinical decision rules such as the Ottawa ankle rules for deciding when to order an X-ray for an injured ankle.25
This study demonstrates that combining the results of the Denver II with information obtained during the parent–technician conference can improve PPVs to acceptable levels without significantly increasing under-referrals. Presumably, similar results could be obtained with other screening tests with systematic study. Such study of how to combine information from different screening tests and/or other information should improve our ability to develop clinical decision rules that improve efficiency. Those in turn would support vigorous efforts of both clinicians and parents to follow up on recommended referrals.
Limitations of study
A major limitation in evaluating success of a clinical program is that the quality of outcome data is not the same for all children. Using a lack of evidence of delay in later chart review as evidence of normal functioning is questionable. There is, however, precedent for this in the medical literature.13 We also recognize that the efforts to adjust for children RNE are, at best, informed guesses. Although not ideal, both procedures provide useful information.
Summary
The approach to evaluation described here clarified successful aspects of the screening program and identified areas for improvement. The program achieved higher levels of predictive accuracy when screening test results were combined with other information from the parent–technician conference. A clinician in these clinics could expect that 59% (minimum) to 77% (corrected) of children with a suspect score on the Denver II and a confirmatory family conference would qualify for EI services. Likewise, he or she could expect that, at worst, fewer than 10% of children not referred would qualify for services if they were evaluated.
Evaluation showed three areas for improvement: (1) Failure to return for rescreening and to obtain evaluations of children referred to EI, primarily due to parents not following through, emerged as a major problem for this and most screening programs. (2) Closer supervision of the technician’s decisions not to refer children with Suspect results could have increased referrals and increased confidence in the decision not to refer. (3) Having the technician available full time would have enabled her to see children at all well-child appointments. Further systematic study of combining information is recommended to improve efficiency of developmental screening programs.
Acknowledgments
The authors thank Lydia Sofia Delgado for performing the Denver II; Mona Reeves, RN, Ginny Strange, RN, and Sonia D. Martinez for gathering data; and Sara Harper, RN, for analyzing the data. The latter three volunteered their time.
Footnotes
Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, or publication of this article.
Funding: A small private grant of US$500 supported the work of nurses in collecting data. The authors received no other financial support for the research, authorship, or publication of this article.
References
1. Guyatt T, Rennie D, Meade MO, et al. Users’ guides to the medical literature: a manual for evidence-based clinical practice. 2nd ed.New York: McGraw-Hill, 2008. [Google Scholar]
2. Altman DG, Bland JM.Statistics notes: diagnostic test 2: predictive values. BMJ1994; 309: 102. [PMC free article] [PubMed] [Google Scholar]
3. Camp BW.Applying Bayesian analysis to evaluation of developmental screening. J Dev Behav Pediatr2009; 30: 583–592. [PubMed] [Google Scholar]
4. Aylward GP.Conceptual issues in developmental screening and assessment. J Dev Behav Pediatr1997; 18: 340–349. [PubMed] [Google Scholar]
5. Frankenburg WK, Dodds J, Archer P, et al. A major revision and restandardization of the Denver Developmental Screening Test. Pediatrics1992; 89: 91–97. [PubMed] [Google Scholar]
6. Lipkin PH, Gwynn H.Improving developmental screening: combining parent and pediatrician opinions with standardized questionnaires (Commentary). Pediatrics2007; 119: 655–656. [PubMed] [Google Scholar]
7. King TM, Tandon SD, Macias MM, et al. Implementing developmental screening and referrals: lessons learned from a national project. Pediatrics2010; 125: 350–360. [PubMed] [Google Scholar]
8. Guevara JPGM, Localio R, Huang YV, et al. Effectiveness of developmental screening in an urban setting. Pediatrics2013; 131: 30–37. [PubMed] [Google Scholar]
9. Hix-Small H, Marks K, Squires J, et al. Impact of implementing developmental screening at 12 and 24 months in a pediatric practice. Pediatrics2007; 120: 381–389. [PubMed] [Google Scholar]
10. McKay K, Shannon A, Vater S, et al. ChildServ: lessons learned from the design and implementation of a community-based developmental surveillance program. Infant Young Child2006; 19: 371–377. [Google Scholar]
11. Talmi A, Bunik M, Asherin R, et al. Improving developmental screening documentation and referral completion. Pediatrics2014; 134: e1181–e1188. [PubMed] [Google Scholar]
12. Frankenburg WK, Dodds J, Archer P, et al. Denver II training manual. 2nd ed.Denver, CO: Denver Developmental Materials, Inc, 1992. [Google Scholar]
13. Furukawa TA, Strauss S, Bucher HC, et al. Diagnostic tests. In: Guyatt G, Rennie D, Meade MO, et al., editors. (eds) Users’ guides to the medical literature: a manual for evidence-based clinical practice. 2nd ed.New York: McGraw-Hill, 2008, pp. 419–438. [Google Scholar]
14. Begg CB.Biases in the assessment of diagnostic tests. Stat Med1987; 6: 411–423. [PubMed] [Google Scholar]
15. Bates AS, Margolis PA, Evans AT.Verification bias in pediatric studies evaluating diagnostic tests. J Pediatr1993; 122: 585–590. [PubMed] [Google Scholar]
16. Hanley JA, Lippman-Hand A.If nothing goes wrong, is everything all right?JAMA1983; 249: 1743–1745. [PubMed] [Google Scholar]
17. Squires J, Potter L, Bricker D.The ASQ User’s Guide for the Ages and Stages Questionnaires. 2nd edBaltimore, MD: Paul H. Brookes, 1999. [Google Scholar]
(Redirected from DENVER II)
Denver Developmental Screening Tests | |
---|---|
Medical diagnostics | |
Purpose | identify young children with developmental issues |
The Denver Developmental Screening Test was introduced in 1967 to identify young children, up to age six, with developmental problems. A revised version, Denver II, was released in 1992 to provide needed improvements. The purpose of the tests is to identify young children with developmental problems so that they can be referred for help.
The tests address four domains of child development: personal-social (for example, waves bye-bye), fine motor and adaptive (puts block in cup), language (combines words), and gross motor (hops). They are meant to be used by medical assistants or other trained workers in programs serving children. Both tests differ from other common developmental screening tests in that the examiner directly tests the child. This is a strength if parents communicate poorly or are poor observers or reporters. Other tools, for example the Age and Stages Questionnaires, depend on parent report.
- 2Denver II
Denver Developmental Screening Test[edit]
The test was developed in Denver, Colorado, by Frankenburg and Dodds.[1] As the first tool used for developmental screening in normal situations like pediatric well-child care, the test became widely known and was used in 54 countries and standardized in 15.[2] The Denver Developmental Screening Test was published in 1967. During its first 25 years of use, one study found it to be insensitive to language delays.[3] Other concerns arose: that norms might vary by ethnic group or mother's education, that norms might have changed, and that users needed training.
Denver II[edit]
Research Basis[edit]
The Denver Developmental Screening Test was revised in order to increase its detection of language delays, replace items found difficult to use, and address the other concerns listed.[4] There are 125 items over the age range from birth to six years. An examiner administers the age-appropriate items to the child, although some can be passed by parental report. Each item is scored as pass, fail, or refused. Items that can be completed by 75%-90% of children but are failed are called cautions; those that can be completed by 90% of children but are failed are called delays. A normal score means no delay in any domain and no more than one caution; a suspect score means one or more delays or two or more cautions; a score of untestable means enough refused items that the score would be suspect if they had been delays. The Denver II is available in English and Spanish. Videotapes and two manuals describe 14 hours of structured instruction and recommend testing a dozen children for practice. Beyond this a professional degree is not required. As with all developmental testing, one must follow the instructions in detail.
The standardization sample of 2,096 children was selected to represent the children of the state of Colorado. The test has been criticized because that population is slightly different from that of the U.S. as a whole. However, the authors found no clinically significant differences when results were weighted to reflect the distribution of demographic factors in the whole U.S. population. Significant differences were defined as differences of more than 10% in the age at which 90% of children could perform any given item [5]. Separate norms were provided for the 16 items whose scores varied by race, maternal education, or rural-urban residence.
Interpretation[edit]
The author of the test, William K. Frankenburg, likened it to a growth chart of height and weight and encouraged users to consider factors other than test results in working with an individual child. Such factors could include the parents’ education and opinions, the child’s health, family history, and available services. Frankenburg did not recommend criteria for referral; rather, he recommended that screening programs and communities review their results and decide whether they are satisfied. [6]
In 2006 the American Academy of Pediatrics Council on Children with Disabilities; Section on Developmental Behavioral Pediatrics published a list of screening tests for clinicians to consider when selecting a test to use in their practice. This list includes the DENVER II among its choices.[7] The chairman of the committee wrote: “In the practice of developmental screening and surveillance, we recommend the incorporation of parent-completed questionnaires or directly administered screening tests into the process of surveillance and screening. However, their results should be combined with attention to parental concerns and the pediatrician’s opinion, rather than replacing them, to augment the screening process and increase identification of children with developmental disorders”.[8]
Studies in Practice[edit]
One study evaluated the Denver II in terms of how its results matched those of a psychologist in five child-care centers: two serving the children of college-educated white parents and three serving low-income African-American children. The psychologist evaluated 104 children, of whom 18 were judged to be delayed [9]). All but two of the 18 came from the low-income centers but no mention is made regarding use of separate norms for African-American children. Results of the Denver II, using an older scoring method, included 33% questionable tests, in between normal and abnormal. If their scores were considered normal, too many children with delays would be missed (low sensitivity); if their scores were considered abnormal, too many children would be referred (low specificity). On the basis of this study, the Denver II fell into disfavor, and it is now seldom mentioned in reviews. Materials may no longer be purchased in hard copy, but they are available at no charge.
Another study evaluated the Denver II in the screening program of a community health center.[10] Here the criterion for abnormality was the eligibility of children for Early Intervention, according to the judgment of speech-language pathologists and other professionals in two suburban school districts. This study included 418 children in all and 64 who needed EI. The success of the screening program was judged in terms of predictive value: the probability that a child, if referred, would be eligible for services. The predictive value was 56%; allowing for children who were referred but not evaluated, it was 72%; this compared favorably with two studies using the Ages and Stages Questionnaire in clinics, which found comparable predictive values of 50% and 38%.[11] The study showed the value of taking into account other information besides the test result because the screener increased the predictive value from 44% to 56% by using her judgment not to refer some children with minor delays.
In a study of two-stage screening, children were prescreened with Frankenburg’s Revised Prescreening Developmental Questionnaire [12] and 421 with suspect scores were given the Denver II and evaluated by independent examiners.[13] In children under 18 months the prevalence of abnormality was 0.19 on diagnostic tests, and the Denver II had a positive predictive value of 0.36, a negative predictive value of 0.90, a sensitivity of 0.67, and a specificity of 0.72. The authors concluded that a suspect Denver II “should lead to careful monitoring and rescreening unless provider or parental concern suggests the need for immediate referral.” Among children 18–72 months old, the prevalence of abnormality was 0.43 and the positive predictive value of the Denver II was 0.77, negative predictive value of 0.89, sensitivity 0.86, and specificity of 0.81. The authors concluded that in their program a suspect Denver II should usually result in referral. (Positive predictive value meant the probability that a child with a suspect Denver II would be diagnosed as abnormal when evaluated; negative predictive value meant the probability that a child with a normal Denver II would be diagnosed as normal when evaluated.)
A study of 3389 children under five in Brazil has produced a continuous measure of child development for population studies.[14] The measure was based on the Denver Developmental Screening Test but can be used with the Denver II.
See also[edit]
- Child Development Stages,
- Developmental Disability,
- Early Childhood Intervention,
References[edit]
- ^Frankenburg, W.K. (1967). 'The Denver Developmental Screening Test'. The Journal of Pediatrics. 71 (2): 181–191. doi:10.1016/S0022-3476(67)80070-2.
- ^Frankenburg, W.K.; Dodds, J.; Archer, P. (1990). Denver II Technical Manual. Denver Developmental Materials, Inc. p. 1.
- ^Borowitz, K.C.; Glascoe, F.P. (1986). 'Sensitivity of the Denver Developmental Screening Test in Speech and Language Screening'. Pediatrics. 78: 1075–1078.
- ^Frankenburg, W.K.; Dodds, J.; Archer, P. (1990). Denver II Technical Manual. Denver Developmental Materials, Inc. p. 1.
- ^Frankenburg, W.K.; Dodds, J.; Archer, P. (1990). Denver II Technical Manual. Denver Developmental Materials, Inc. p. 6,18–19.
- ^Frankenburg, W.K.; Dodds, J.; Archer, P. (1990). Denver II Technical Manual. Denver Developmental Materials, Inc. p. 20–22.
- ^American Academy of Pediatrics, Council on Children with Disabilities; Section on Developmental Behavioral Pediatrics; Bright Futures Steering Committee; Medical Home Initiatives for Children with Special Needs Project Advisory Committee. Identifying infants and young children with developmental disorders in the medical home: an algorithm for developmental surveillance and screening. Pediatrics, 2006;118:405–420
- ^Lipkin, P.H.; Gwynn, H. (2007). 'Improving developmental screening: Combining parent and pediatrician opinions with standardized questionnaires'. Pediatrics. 119: 655–56.
- ^Glascoe, F.P.; Byrne, K.E.; Ashford, L.G. (1992). 'Accuracy of the Denver II in developmental screening'. Pediatrics. 89: 1221–1225.
- ^Dawson, P.; Camp, B.W. (2014). 'Evaluating developmental screening in clinical practice'. SAGE Open Medicine. 2: 205031211456257. doi:10.1177/2050312114562579. PMC4712749. PMID26770755.
- ^Guevara, J.P.; Gerdes, M.; Localio, R. (2013). 'Effectiveness of developmental screening in an urban setting'. Pediatrics. 131 (1): 30–37. doi:10.1542/peds.2012-0765. PMID23248223.
- ^Frankenburg, W.K. (1987). 'Revision of the Denver Prescreening Questionnaire'. J. Pediatr. 110: 653–57. doi:10.1016/S0022-3476(87)80573-5.
- ^Burgess, D.; Camp, B.W.; Spicer, C. (1996). 'Accuracy of the Denver II in a clinical developmental screening protocol'. Abstract Presented at the Society for Developmental-Behavioral Pediatrics.
- ^De Lourdes Drachler, M.; Marshall, T.; de Carvalho Leite, J.C. (2007). 'A continuous-scale measure of child development for population-based epidemiological surveys: A preliminary study using item-response theory for the Denver test'. Paediatric and Perinatal Epidemiology. 21 (2): 138–153. doi:10.1111/j.1365-3016.2007.00787.x. PMID17302643.
External links[edit]
- Developmental and Behavioral Pediatrics at American Academy of Pediatrics
- HealthyChildren.org American Academy of Pediatrics
Retrieved from 'https://en.wikipedia.org/w/index.php?title=Denver_Developmental_Screening_Tests&oldid=899868698'