Structural validation of the “Teacher’s report form” attention problems scale in a Brazilian sample

Background: Attention deficit hyperactivity disorder is one of the most common childhood neurodevelopmental syndromes. Although clinical evaluation is considered the gold standard for diagnosing psychiatric disorders, it is rarely used in epidemiological studies for practical and financial reasons. Instead, psychometric instruments are used to screen for the disorders, so it is essential to investigate whether these instruments are suitable for measuring the intended construct. This study aims to assess the structural validity of the Attention Problems Scale of the Teacher’s Report Form (TRF). Methods: A random sample of 445 TRF forms completed by teachers of children from São Gonçalo/RJ was selected. Confirmatory factor analysis was applied to test factor structures that have been proposed in the literature. In addition, structural equation models (SEM) were used to examine the relationship between the validated factor structure and some comorbidities. Results: The bifactor model was the most suitable to explain the factor structure of the TRF Attention Problems Scale. It presented better fit indices in the confirmatory factor analysis than the other tested structures. Although it presented good indicators of structural validity, some symptoms could be reassessed to obtain a more consistent instrument. Used as an explanatory structure in SEM, the bifactor model was able to predict important mental health outcomes. These results provide additional validation of the bifactor model. Conclusions: The results suggest the validity of the TRF’s Attention Problems Scale. The instrument’s factor structure was also appropriate because it corroborated most of the assumed associations between subtypes of attention problems and other aspects of mental health. The existence of screening scales adapted to Brazilian Portuguese can substantially benefit many children who have difficulty learning. Also, screening scales can be a useful tool for the health sector, facilitating referral to a professional who can make the diagnosis.


Background
Attention deficit hyperactivity disorder (ADHD) is the most common neurodevelopmental disorder in the early stages of life (Faraone, Sergeant, Gillberg, & Biederman, 2003). Although there are concerns about the true prevalence of ADHD (Singh, 2008), a meta-analysis confirms the stability of this diagnosis over the past three decades (Polanczyk, Willcutt, Salum, Kieling, & Rohde, 2014). Some of the symptoms associated with ADHD are lack of attention, constant changes in activities, unfinished tasks (even if they are critical), difficulty organizing tasks, and impulsiveness (American Psychiatric Association, 2013).
Studies based on clinical diagnosis show that ADHD prevalence varies between 3 and 7%, whereas estimates from studies using screening tools range between 2.3 and 19.8% (Ayano, Yohannes, & Abraha, 2020; Nigg, 2006; Thomas, Sanders, Doust, Beller, & Glasziou, 2015). Clinical evaluation is considered the gold standard for diagnosing psychiatric disorders; however, in epidemiological studies, this evaluation is rarely used for practical and financial reasons (Buitelaar, 2002). Psychometric instruments help to screen large population groups, identifying those who need specialized services (Singh, Yeh, & Blanchard, 2017). As a positive aspect, the standardization provided by epidemiological instruments allows comparison with other studies that have the same objective, even when they are conducted in populations with different cultural contexts. Therefore, it is vital to validate the instrument properly.
An instrument's validity can be established by assessing whether the scale is appropriate for its intended objective (Streiner, Norman, & Cairney, 2015). This study adopts the conceptualization and definition of validity given by the COnsensus-based Standards for the Selection of Health Measurement Instruments (COSMIN). This initiative developed a taxonomy and reached a consensus on definitions of the measurement properties. COSMIN defines validity as a domain, construct validity as a measurement property, and structural validity as an aspect of that measurement property. Construct validity is the degree to which the instrument's scores are consistent with what the instrument proposes to measure (Mokkink, Prinsen, Bouter, de Vet, & Terwee, 2016). Structural validity is an essential step in validating an instrument; it concerns the degree to which the scores of an instrument adequately reflect the dimensionality of the construct investigated (Mokkink et al., 2010).
We checked the structural validity of the Attention Problems Scale of the Teacher's Report Form (TRF) (Achenbach & Rescorla, 2001; Edelbrock & Achenbach, 1984). The TRF is a screening tool completed by teachers. The original version of the scale has good psychometric properties (Achenbach & Rescorla, 2001; Bordin et al., 2013). Some studies have assessed the psychometric quality of the Brazilian version of the Youth Self-Report (YSR), with good results; however, there are still no published studies on the structural validation of the TRF (Bordin et al., 2013).
The original structure of the YSR was defined using exploratory factor analysis (EFA). The Attention Problems Scale has two specific factors (inattention and hyperactivity/impulsivity) (Achenbach & Rescorla, 2001). Analysis of the factorial structure should be one of the stages in establishing psychometric equivalence between the Brazilian version and the original scale.
Some studies have examined the structural validity of the TRF using confirmatory factor analysis (CFA) and EFA. Groot, Koot, and Verhulst (1996) used EFA to evaluate the TRF in a Dutch sample and compared their results, using CFA, with the Achenbach structure (Achenbach & Rescorla, 2001). Social Problems and Attention Problems were the scales with the worst results, with more items with loadings smaller than 0.30; however, the Attention Problems scale was not assessed in a model separate from the other syndrome scales. Dumenci, McConaughy, and Achenbach (2004) evaluated the factor structure of the TRF Attention Problems scale, comparing US samples from the general population and from mental health services, and were the first to introduce the bifactor structure in the structural validation of this scale. They tested three models: (1) a single general factor related to attention problems, (2) two correlated factors (inattention and hyperactivity/impulsivity), and (3) a hierarchical structure with two specific factors (inattention and hyperactivity/impulsivity) and a general attention problems factor. The authors showed that Attention Problems are better conceptualized by the last model (the hierarchical structure). Ivanova et al. (2007) used data from 20 different countries to evaluate the factorial structure of the TRF using CFA. For the Attention Problems scale, that study evaluated the hierarchical model with a general factor and two specific factors, as validated by Dumenci et al. (2004). The results showed that the model fits well for most of the 20 countries analyzed. The root mean square error of approximation (RMSEA) was < 0.08 for each country, the comparative fit index (CFI) varied between 0.942 and 0.979, and the Tucker-Lewis index (TLI) between 0.981 and 0.993, which also indicates a good fit. In Greece, Lebanon, and Turkey, all 26 items of the Attention Problems scale had significant factor loadings. Denmark had 13 items without significant factor loadings, 12 of them related to hyperactivity/impulsivity. In Portugal, the only nonsignificant item was 'whining'. Only 3 countries had all items load significantly, many countries had a few items that did not load significantly, and only a few countries had many items that did not load significantly, which suggests the need for further studies on the validity of the scale. Brazil was not one of the 20 countries studied by Ivanova et al. (2007). The hierarchical structure of the TRF Attention Problems scale has also been shown in other studies (Toplak et al., 2012; Ullebø, Breivik, Gillberg, Lundervold, & Posserud, 2012; Wagner et al., 2016). Campos, Santacana, Olmos, and Cebollero (2006), applying EFA to the Spanish version of the TRF (20 items), found a third factor, named other inattention, besides the two factors traditionally found (inattention and hyperactivity/impulsivity).
Some studies show that comorbidities are associated with ADHD subtypes and with the general factor, and that there is an association between attention problems without hyperactivity and internalizing problems. For example, Power et al. (2004) show that patients with the combined type of ADHD and those with the inattentive type have anxiety and depression as comorbidities. Kuntsi et al. (2004) relate low IQ to ADHD. Olson, Schilling, and Bates (1999) indicate that impulsive behavior can predict externalizing problems. Arias, Ponce, and Núñez (2018) mention the possibility of using a structural equation model with the dimensions of ADHD as predictors, which is a critical step in analyzing the importance of each subtype.
Besides the importance of validating the structure of instruments used to screen for behavioral problems, from a clinical perspective it is essential to examine differences between structural patterns of manifestation of ADHD and how they may suggest different forms of treatment (Dumenci et al., 2004). Chen, West, and Sousa (2006) pointed out that a bifactor structure is relevant because it allows constructs to be analyzed both in each specific domain and in general terms.
Thus, this work aimed to validate an ADHD structure through the TRF's Attention Problems Scale and analyze the relevance of each component through its relationship with other mental health outcomes.

Sample
The data came from a longitudinal study with schoolchildren in São Gonçalo, Rio de Janeiro, Brazil, which started in 2005 (Pires, da Silva, & de Assis, 2013). The sample was drawn from a total of 6,589 2nd-year students, aged 6 to 11, enrolled in the public education system in 2005. We used three-stage cluster sampling to select the sample. In the first stage, 25 schools were selected using systematic sampling with probability proportional to school size. In the second stage, two classes were randomly selected in each school, and finally, ten students were randomly selected in each class, totaling 500 students. Approximately 40% of the initially drawn students were replaced, mainly due to inconsistencies in the class diary, which listed students who were not enrolled.
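The three-stage design described above can be illustrated with a minimal sketch. The sampling frame below is hypothetical (80 schools with invented class sizes), and the function names are ours rather than anything used in the study; the sketch only shows how systematic selection with probability proportional to size, followed by random selection of classes and students, yields 25 x 2 x 10 = 500 students.

```python
import bisect
import itertools
import random


def systematic_pps(sizes, n, rng):
    """Pick n unit indices by systematic sampling with probability proportional to size."""
    cumulative = list(itertools.accumulate(sizes))
    interval = cumulative[-1] / n
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    for k in range(n):
        point = start + k * interval
        # The unit whose cumulative-size range contains the point is selected.
        index = bisect.bisect_right(cumulative, point)
        picks.append(min(index, len(sizes) - 1))  # guard against float rounding
    return picks


def three_stage_sample(schools, n_schools=25, n_classes=2, n_students=10, seed=0):
    """schools: list of schools, each a list of classes, each a list of student ids."""
    rng = random.Random(seed)
    sizes = [sum(len(cls) for cls in school) for school in schools]
    sample = []
    for s in systematic_pps(sizes, n_schools, rng):
        for c in rng.sample(range(len(schools[s])), n_classes):
            for student in rng.sample(schools[s][c], n_students):
                sample.append((s, c, student))
    return sample


# Hypothetical sampling frame (not the study's data): 80 schools,
# each with 2 to 4 classes of 25 to 35 students.
frame_rng = random.Random(42)
schools = [
    [list(range(frame_rng.randint(25, 35))) for _ in range(frame_rng.randint(2, 4))]
    for _ in range(80)
]
sample = three_stage_sample(schools)
print(len(sample))  # 25 schools x 2 classes x 10 students
```

In this sketch a school larger than the sampling interval could be selected more than once; the hypothetical frame is built so that this cannot happen, but a real frame would need that case handled explicitly.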
A total of 36 children did not have the teacher's questionnaire and were excluded from the study. Another 18 were excluded because they obtained low scores (less than 69) on the Wechsler Intelligence Scale, which was used to assess children's intellectual level. One student was excluded for not completing this test. After these exclusions, the final sample was 445 children.
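As a quick check of the sample flow, the exclusions reported above can be subtracted from the 500 students initially drawn. The dictionary keys are our own labels for the three reasons given in the text.

```python
# Students drawn by the three-stage design: 25 schools x 2 classes x 10 students.
drawn = 25 * 2 * 10

# Exclusions as reported in the text (labels are ours).
exclusions = {
    "no teacher questionnaire": 36,
    "Wechsler score below 69": 18,
    "Wechsler test not completed": 1,
}

final_n = drawn - sum(exclusions.values())
print(final_n)  # 445
```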

Measures
The TRF is a psychometric instrument completed by teachers that assesses behavioral problems over the last two months in children and adolescents aged 6-18 years. It is a questionnaire with 113 items, of which 26 belong to the Attention Problems Scale. This scale has two dimensions: 14 inattention items and 12 hyperactivity/impulsivity items. The items describe situations that may arise in schools and are linked to symptoms of attention problems, including: can't sit still, restless, or hyperactive; impulsive or acts without thinking; fails to finish things he/she starts; can't concentrate, can't pay attention for long. For each item, the teacher rates whether the behavior (symptom) is absent, sometimes present, or frequently present in the child's repertoire (Achenbach & Rescorla, 2001). Table 1 shows the TRF symptoms used to assess ADHD.
Externalizing and internalizing problems were also measured using the TRF. The internalizing problems construct comprised the Somatic Complaints, Anxiety/Depression, and Withdrawal/Depression subscales, whereas externalizing problems were measured by the Rule-Breaking Behavior and Aggressive Behavior subscales. Another instrument used in the analyses was the Wechsler Intelligence Scale for Children. In the structural equation model (SEM), the intelligence construct was measured by verbal and non-verbal IQ.

Data analysis
CFA is a factor analysis with restrictions on the model's parameters; e.g., when a symptom is hypothesized to load on a specific factor, only that loading is estimated, and its loadings on the other factors are fixed at zero. These restrictions reflect the structural hypothesis about the instrument (Kaplan, 2009). As the data are categorical, we used the weighted least squares mean- and variance-adjusted (WLSMV) estimator to estimate the CFA parameters. Figure 1 depicts the structures proposed for the analysis. In Fig. 1a, attention problems are defined as a single factor. The structure in Fig. 1b specifies a model with two correlated specific factors (inattention and hyperactivity/impulsivity). In Fig. 1c, the structure is specified as a bifactor model with two specific factors and a general factor; in this structure, all factors are orthogonal to each other (no correlations).
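The three structures in Fig. 1 can be written compactly as measurement equations. The notation below is our own sketch rather than the paper's: it assumes, as is usual for categorical indicators, a continuous latent response x_j^* underlying each ordinal item j, writes s(j) for the specific factor (inattention, I, or hyperactivity/impulsivity, HI) to which item j is assigned, and uses ξ for factors and ε for errors.

```latex
% x_j^*: continuous latent response assumed to underlie ordinal item j (j = 1, ..., 26)
% s(j): the specific factor (I or HI) to which item j is assigned
\begin{align}
\text{(a) one factor:} \quad
  & x_j^* = \lambda_j \xi_{AP} + \varepsilon_j \\
\text{(b) two correlated factors:} \quad
  & x_j^* = \lambda_j \xi_{s(j)} + \varepsilon_j,
  \qquad \operatorname{cov}(\xi_I, \xi_{HI}) = \phi \\
\text{(c) bifactor:} \quad
  & x_j^* = \lambda_{gj} \xi_g + \lambda_{sj} \xi_{s(j)} + \varepsilon_j,
  \qquad \operatorname{cov}(\xi_g, \xi_I) = \operatorname{cov}(\xi_g, \xi_{HI})
  = \operatorname{cov}(\xi_I, \xi_{HI}) = 0
\end{align}
```

In (c), each item loads on the general factor and on exactly one specific factor, and the zero covariances express the orthogonality stated in the text.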
The models were evaluated using the following indices: root mean square error of approximation (RMSEA), Tucker-Lewis index (TLI), and comparative fit index (CFI). A good fit is indicated by an RMSEA close to or below 0.06, with the upper limit of its 90% confidence interval close to or below 0.10, and by CFI and TLI close to or greater than 0.95 (Brown, 2006).
After fitting the CFA, it is essential to inspect the restrictions imposed on the model. Under the theoretical restriction, each indicator loads on only one factor (its loadings on the other factors are fixed at zero); if this restriction does not hold, the loading should be freely estimated, and the symptom then loads on more than one factor. The same review applies to the errors of the factor structure, which are also restricted: the correlations between them are fixed at zero. Modification indices are used to assess such restrictions. They compare nested models, in this case a model in which the parameter is fixed with another in which it is freely estimated, and indicate whether freeing the parameter would significantly improve the fit. Modification indices above 10 suggest that removing the restriction imposed on a given parameter (a factor loading or a covariance between errors) is significant.
An additional step in the analysis of construct validity was to investigate, using SEM, how each ADHD subtype relates to other problems or comorbidities. The factors of the bifactor model were used as predictors in three models with the following outcomes: (1) internalizing problems, measured by somatic complaints, withdrawn, and anxious/depressed; (2) externalizing problems, measured by delinquent and aggressive behavior; and (3) intelligence quotient, measured by verbal and nonverbal intelligence.
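To make the fit criteria concrete, the sketch below computes RMSEA, CFI, and TLI from model and baseline chi-square statistics using their textbook definitions, and then applies the cut-offs quoted above. It is an illustration under stated assumptions: the chi-square values are hypothetical, the `tol` margin used to represent "close to" is our own choice, and software fitting WLSMV models typically reports scaled versions of these statistics, which this sketch does not reproduce.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Textbook RMSEA, CFI, and TLI from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    cfi = 1.0 - d_model / d_base if d_base > 0 else 1.0
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2_model / df_model) / (ratio_base - 1.0)
    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


def is_good_fit(rmsea, rmsea_upper, cfi, tli, tol=0.01):
    """Apply the cut-offs quoted in the text; tol is a hypothetical margin for 'close to'."""
    return (
        rmsea <= 0.06 + tol
        and rmsea_upper <= 0.10 + tol
        and cfi >= 0.95 - tol
        and tli >= 0.95 - tol
    )


# Hypothetical chi-square values for illustration only (not the study's results).
indices = fit_indices(chi2_model=600.0, df_model=299, chi2_base=9000.0, df_base=325, n=445)
print({name: round(value, 3) for name, value in indices.items()})
print(is_good_fit(indices["rmsea"], rmsea_upper=0.055, cfi=indices["cfi"], tli=indices["tli"]))
```

Some programs use N rather than N - 1 in the RMSEA denominator, so small differences from reported values are to be expected.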
The analyses were performed in R version 2.14.2. For the CFA, the lavaan package (Rosseel, 2012) was used.

Results
The bifactor model (Fig. 1c) provided the best fit among the three models. The upper limit of the 90% confidence interval for the RMSEA is below 0.10, and the point estimate, despite being greater than 0.06, is close to it, indicating a good fit. The CFI and TLI also indicate a good fit, with values above 0.95. An important finding that supports the introduction of the general factor was the high correlation of 0.731 between the inattention and hyperactivity/impulsivity factors in the two-factor model (Fig. 1b). Such a high correlation between specific factors suggests that a factor is missing from the model.
The bifactor model was superior, and the introduction of a general factor improved the model structure.
The standardized loadings (Table 2) show some items with negative loadings, such as HI73 (Behaves irresponsibly), HI109 (Whining), and I22 (Difficulty following directions). In addition, the loading of I22 was low. The majority of items showed higher loadings on the general factor; the item with the lowest factor loading was I80 (Stares blankly). However, some of the items with low loadings on their specific factor had higher loadings on the general factor (e.g., HI73 and I22). This indicates that some symptoms may be more associated with general ADHD (the general factor) than with the specific problems (hyperactivity/impulsivity and inattention). The items HI7 (Daydreams or gets lost in his/her thoughts), HI109 (Whining), I1 (Acts too young for his/her age), and I13 (Confused or seems to be in a fog) showed the highest variability (ɛ > 50).
Table 3 shows the modification indices with values higher than 10. The modification index for item I22 (Difficulty following directions) was 87.412 on the hyperactivity/impulsivity construct, and item HI73 (Behaves irresponsibly) had an index of 85.128 on the inattention construct, indicating that these loadings should be estimated on the opposite factors. Table 3 also shows that the residual covariance between HI73 and I22 has a high modification index (36.669), indicating considerable covariation between the errors of these indicators. This covariance may be related to the high modification indices for the cross-loadings. Thus, we can choose to estimate either the cross-loadings or the correlation between the residuals of these indicators. The two symptoms I17 (Daydreams or gets lost in his/her thoughts) and I80 (Stares blankly) were the ones with the largest residual covariation, which can be interpreted as an overlap between these two symptoms. Item I80 (Stares blankly) is also correlated with other items (I49, has difficulty learning, and I61, not working up to potential). A substantial correlation was also noted between the residuals of items I49, I61, and I92 (has difficulty learning; poor schoolwork; underachieving, not working up to potential). These items are related to symptoms of poor school performance. The modification indices therefore suggest that inattention could be split into symptoms related to school performance (i.e., I49, difficulty learning; I61, poor schoolwork; and I92, underachieving) and other inattention symptoms. Fitting a CFA with this additional factor, we observed an RMSEA of 0.077 (0.072 to 0.082), with CFI and TLI very close to those of the previous model. Thus, the proposal to add a new factor was not supported.
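The screening rule applied to the modification indices can be sketched as a simple filter. The first three rows below use the values quoted in the text; the fourth row is a hypothetical entry added only to show a restriction that would be retained. The `=~` and `~~` labels follow lavaan-style notation for loadings and covariances, and the data structure itself is our assumption, not the layout of Table 3.

```python
MI_THRESHOLD = 10.0

# (parameter, kind, modification index). The first three values are quoted in the text;
# the last row is a hypothetical entry showing a restriction that would be kept.
modification_indices = [
    ("HI =~ I22", "cross-loading", 87.412),
    ("I =~ HI73", "cross-loading", 85.128),
    ("HI73 ~~ I22", "residual covariance", 36.669),
    ("I =~ HI10", "cross-loading", 2.5),  # hypothetical
]

# Keep only parameters whose restriction is flagged, largest index first.
flagged = sorted(
    (row for row in modification_indices if row[2] > MI_THRESHOLD),
    key=lambda row: row[2],
    reverse=True,
)
for parameter, kind, mi in flagged:
    print(f"{parameter:<12} {kind:<20} {mi:8.3f}")
```

A flagged parameter is only a candidate for release; whether it is freed should still depend on its theoretical meaning.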
The structural dimensionality of the TRF was further analyzed by verifying how the ADHD subtypes are associated with other aspects of mental health. This analysis can also be seen as a hypothesis-testing step in construct validation. Three SEMs were fitted, with the bifactor model explaining internalizing problems, externalizing problems, and IQ. The three models showed a good fit: the RMSEA was 0.067, 0.058, and 0.056, with none of the upper limits of the 90% confidence intervals greater than 0.10, and CFI and TLI were higher than 0.95 in all models. The general factor was positively correlated with externalizing (ξ_g → ξ_extern = 0.874, p value < 0.001) and internalizing problems (ξ_g → ξ_intern = 0.547, p value < 0.001) and weakly and inversely correlated with IQ (ξ_g → ξ_iq = −0.136, p value = 0.070). The specific inattention factor was positively correlated with internalizing problems (ξ_i → ξ_intern = 0.590, p value < 0.001) and inversely with IQ (ξ_i → ξ_iq = −0.330, p value < 0.001), but not significantly with externalizing problems (ξ_i → ξ_extern = −0.062, p value = 0.280). The specific hyperactivity factor was positively correlated with externalizing problems (ξ_hi → ξ_extern = 0.623, p value < 0.001), weakly and inversely correlated with internalizing problems (ξ_hi → ξ_intern = −0.213, p value = 0.060), and not correlated with IQ (ξ_hi → ξ_iq = −0.050, p value = 0.593).
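The nine structural paths reported above can be tabulated and screened in a few lines. The coefficients and p values are taken from the text; the 0.05 significance level is our assumption (the text does not state the level used), and paths reported as p < 0.001 are stored as 0.001, an upper bound.

```python
ALPHA = 0.05  # assumed significance level; not stated in the text

# (predictor, outcome, standardized coefficient, p value) as reported in the text.
# Entries reported as "p < 0.001" are stored as 0.001, an upper bound.
paths = [
    ("general", "externalizing", 0.874, 0.001),
    ("general", "internalizing", 0.547, 0.001),
    ("general", "iq", -0.136, 0.070),
    ("inattention", "internalizing", 0.590, 0.001),
    ("inattention", "iq", -0.330, 0.001),
    ("inattention", "externalizing", -0.062, 0.280),
    ("hyperactivity", "externalizing", 0.623, 0.001),
    ("hyperactivity", "internalizing", -0.213, 0.060),
    ("hyperactivity", "iq", -0.050, 0.593),
]

significant = [(p, o, b) for p, o, b, pval in paths if pval < ALPHA]
for predictor, outcome, beta in significant:
    direction = "positive" if beta > 0 else "inverse"
    print(f"{predictor} -> {outcome}: {beta:+.3f} ({direction})")
```

Under this assumed level, five of the nine paths are retained: the general factor with externalizing and internalizing problems, inattention with internalizing problems and IQ, and hyperactivity with externalizing problems.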

Discussion
This paper aimed to evaluate the dimensional structure of the Attention Problems Scale of the Brazilian version of the TRF. Like the structure proposed in the original validation study of the YSR (Achenbach & Rescorla, 2001), the bifactor model was the model with the best fit. This model comprises a general factor and two specific factors related to inattention and hyperactivity/impulsivity. The fit indices for this model were well above those of the other models tested and can be considered good according to the criteria set by Brown (2006).
The hierarchical structure of ADHD has already been considered in other studies (Dumenci et al., 2004; Ivanova et al., 2007; Martel, Von Eye, & Nigg, 2010). This structure is more consistent with the DSM-IV (American Psychiatric Association, 2013) criteria for identifying ADHD, whose subtypes are defined as predominantly inattentive, predominantly hyperactive/impulsive, and combined inattention and hyperactivity/impulsivity (Dumenci et al., 2004). Thus, the results obtained in the present study indicate that the Attention Problems Scale of the Brazilian version of the TRF has a structure similar to that found in other societies, which supports the validity of this form in Brazil.
Some items that were problematic in Ivanova et al. (2007) were also problematic in our study (negative loadings). For HI109, the median factor loading across the 20 countries was very low, and the loading was not significant in nine countries, including Portugal; HI73 had a low median loading, and its loading was not significant in eight countries. These results suggest that these symptoms may not be relevant to the scale. The items 'difficulty following directions', initially allocated as an inattention symptom, and 'behaves irresponsibly' (a symptom of hyperactivity) showed high modification indices for cross-loadings. This indicates that, in the Brazilian version of the form, these symptoms may not be understood in the way defined by the authors of the original version of the TRF (Achenbach & Rescorla, 2001): the symptom describing hyperactivity can also capture inattention, and vice versa. The items difficulty learning (I49), poor schoolwork (I61), and underachieving, not working up to potential (I92) represent symptoms of inattention directly related to academic performance. The modification indices also indicate a high covariation between these items, which may reflect the existence of a third factor. A three-factor model was reported for the Spanish version of the TRF (Campos et al., 2006). However, despite showing a good fit, the introduction of this factor does not add information to explain the ADHD construct. When evaluating the modification indices, not all suggested changes should be incorporated into the model; only those with a theoretical meaning should be included in a new model and tested.
Therefore, based on this analysis, some symptoms are more consistent in discriminating the specific problems of inattention and hyperactivity/impulsivity, whereas others have more impact when describing the ADHD syndrome in its combined form. It is essential to highlight that other studies analyzing the TRF's factorial structure did not assess the modification indices (Dumenci et al., 2004; Ivanova et al., 2007). This omission can be a problem in evaluating any instrument because, even in structures with good fit indices (RMSEA < 0.06 and CFI/TLI > 0.95), the structural model may need further restructuring, which can only be identified if modification indices are examined.
The scores for the general factor were positively correlated with externalizing and internalizing problems and inversely correlated with IQ (Kuntsi et al., 2004). The specific inattention factor was positively correlated with internalizing problems (Power et al., 2004) and inversely with IQ, and the hyperactivity factor was positively correlated only with externalizing problems (Olson et al., 1999). These associations show the importance of each component of the bifactor model for the definition of ADHD.
Meanwhile, people with the inattentive subtype have more academic problems. The combined subtype appears to be the most impairing, involving comorbid externalizing and internalizing problems, lower IQ scores, and increased demand for care and treatment, which shows the importance of defining the problem correctly. A correct definition may lead to better identification of the syndrome and to the planning of the corresponding treatment.
A well-defined instrument that specifies the valid symptoms of the syndrome is vital for a better assessment. Knowing the correct structure of these disorders is essential for targeting the right treatment for the individual (Martel et al., 2010). The present study contributes to a better understanding of the structure of the scale and of the structural patterns of manifestation of ADHD, considering the combined manifestation of the different symptoms. The Attention Problems Scale of the Brazilian version of the TRF presented good indicators of structural validity. However, some symptoms could be reassessed in order to obtain a more consistent instrument. Because two symptoms do not relate clearly to their original domain and some symptoms overlap, further studies are required to evaluate the relocation or deletion of items. Also, cultural differences, combined with problems in the translation of specific items of the scale in different nations, need to be considered and require attention when making comparisons. Differences in cultural, political, educational, or health systems, together with the number of children rated by each teacher, were pointed out by Ivanova et al. (2007) as factors that may have affected their results.
A limitation of our study is that the sample includes only public school students; it would be relevant to evaluate the instrument among private school students. Although the sample is large enough for the analysis, it comes from only one city and does not represent the diversity and regional differences of the Brazilian population. Reichenheim, Hökerberg, and Moraes (2014) proposed a seven-step roadmap to examine structural validity. We completed four steps in our study; in this paper we were not able to evaluate item discrimination and intensity regarding the latent trait spectrum, to examine raw scores as latent factor score proxies, or to assess the dimensional structure and measurement invariance across groups. These remaining steps are essential aspects to be analyzed in the future.

Conclusion
The results suggest the validity of the Attention Problems Scale of the Brazilian-Portuguese version of the TRF. The instrument's factor structure was also appropriate because it corroborated most of the assumed associations between subtypes of attention problems and other aspects of mental health. Besides assisting researchers who can use this form for epidemiological and other types of research, the existence of screening scales adapted to Brazilian Portuguese can have a substantial impact on the mental health sector, since they can facilitate referral to professionals who are able to make the proper diagnosis and choose the best line of treatment to support the child's development.