Mathematics Assessment At Key Stages Two and Three:

Social class, gender, equity and National Curriculum tests in maths

Barry Cooper & Máiréad Dunne #(1)

Institute of Education, University of Sussex

Abstract

The paper draws on data from a recent ESRC project which has explored children's interpretation of and performance on the English National Curriculum tests in maths at Key Stages 2 and 3. Using data concerning more than 100 test items taken by 10-11 year-olds the paper shows that 'realistically' contextualised test items produce greater social class and gender differentiation than 'esoteric' items (i.e. those not contextualising mathematical operations in ‘everyday’ settings). The results are employed in a simulated selection process of children for secondary school places (either for a selective school or for a top set within a school). This simulation shows that, on the basis of our data, a test consisting of 'esoteric' items might be expected, all other things being equal, to select many more working class children than a test consisting of 'realistic' items.

Background

The 1988 Education Act introduced a national curriculum (NC) and assessment of children at the end of four Key Stages (KS) for England and Wales. There has been considerable debate and conflict about the nature of the curriculum and its assessment. As a result, the initial proposals for assessment mainly by teachers in their classrooms have been replaced by a stress on testing via group tests (DES/WO, 1988; Brown, 1992). While there has been a restatement recently of the importance of teacher-made assessment (Dearing, 1993), England now has an institutionalised pattern of annual national testing of children in maths at ages 7, 11, 14, and 16. In 1997 the first national league tables were published for 11 year olds. #(2)

One other key point must be made. Maths, though it centrally concerns number, space, measure, etc., is not fixed and unchanging. During periodic re-negotiations of what counts as school maths the cognitive demands made on children change (Cooper, 1983, 1985a, 1985b, 1994a). These demands have typically been differentiated by measured ‘ability’ and/or social class in England, as the case of SMP well illustrates (Cooper, 1985b; Dowling, 1998). In England in recent years, such re-negotiation has led to an apparent weakening of the boundary between ‘everyday’ knowledge and ‘esoteric’ mathematical knowledge both in the curriculum and in its assessment, perhaps especially so for children deemed ‘less able’ (e.g. Dowling, 1998). While in the 1960s and early 1970s the preferred version of school maths tended to favour ‘abstract’ algebraic approaches #(3) , the dominant orthodoxy since the time of the Cockcroft Report of 1982 has favoured the teaching and learning of maths within ‘realistic’ settings (Cockcroft, 1982; Dowling, 1991; Boaler, 1993a, 1993b). This preference within the world of maths educators has been reflected in the national tests (Cooper, 1992, 1994b). It has been argued, drawing on the work of Bernstein (1990, 1996) and Bourdieu (1986), that test items contextualising mathematical operations within ‘realistic’ settings might be expected to cause problems of interpretation for certain students. Working class children may experience more difficulty than others in choosing ‘appropriately’ between using ‘everyday’ knowledge and ‘esoteric’ mathematical knowledge when responding to items (Cooper, 1992, 1994b). This may lead to underestimation of their mathematical capacities in cases where a rational ‘everyday’ response is ruled out as ‘inappropriate’ by the marking scheme but is ‘chosen’ by the child in place of an alternative ‘esoteric’ response (Cooper, 1996, 1998a&b). 
Similar arguments have been advanced in respect of gender, with girls seen as likely to be disadvantaged by ‘realistic’ assessment items (Boaler, 1994). In summary, performance on ‘realistic’ items may not reflect underlying competence #(4) . It is upon this possible threat to valid and fair assessment that our research has focused.

While the assessment literature has many useful discussions of item bias and differential validity (e.g. Wood, 1991, p.177; Gipps & Murphy, 1994) these tend not to draw on relevant sociological insights concerning the relation between culture and cognition (e.g. Bernstein, 1996; Bourdieu, 1986, 1990a,b&c). Discussions of bias are frequently technical if not empiricist in tone (e.g. Camilli & Shepard, 1994). While purely quantitative methods can identify items, or classes of items, which some groups of test-takers find more or less difficult than other groups, they are less good at increasing our understanding of why such items ‘behave’ in the way they do. To advance our understanding in this area, a more qualitative concern with children’s cognitive strategies and processes is needed, coupled with the use of relevant theoretical insights from outside the area of assessment itself #(5) . It is this more explanatory problem to which our research has been addressed - in the belief that a better understanding of the ways culture, cognition and test performance interact should inform test design (e.g. Cooper, 1998b; Cooper & Dunne, 1998). It would then be possible to avoid more easily those items which cause unnecessary and construct-irrelevant difficulty to some test takers (Messick, 1989, 1994). However, here we intend to show how differently contextualised NC item types are associated with different relative performances by certain social groups. Our focus will therefore be on some of our quantitative data.

Research Focus and Methods

Our research has explored children’s interpretation of and performance on the NC tests, relating this to social class and gender. A subsidiary goal of the research has been to explore what might follow from taking the work of Bernstein seriously in analysing children’s responses to maths test items. This has become a central concern of the research as it became clear that the ‘appropriate’ negotiation of the boundary between the ‘everyday’ and the ‘esoteric’ is difficult for many children. The work of Bernstein and his collaborators is very helpful here, especially the theorising of ‘recognition and realisation rules’ in relation to children’s cognitive strategies (Bernstein, 1990, 1996; Holland, 1981). For an account of how his (and Bourdieu’s) theoretical concepts might be put to use in this area see Cooper (1996, 1998b).

We have employed both quantitative and qualitative methods. The basic strategy has been to use initially statistical analysis of children’s performance on items in test situations to generate insights concerning broad classes of test items (e.g. items which embed mathematical operations in ‘everyday’ and ‘esoteric’ contexts respectively). This has involved coding test items on a number of dimensions #(6). Analyses of the relationships between social class, gender, measured ability, item type and performance have been carried out. Some of these use the child as the case for analysis, others use the item itself (see below and Cooper, Dunne & Rogers, 1997). Alongside this we have used more qualitative analyses of children’s responses to particular items in both the tests and subsequent clinical interviews to generate understanding of why, for example, ‘realistic’ and ‘esoteric’ items seem to be differentially difficult for children from different socio-cultural backgrounds (e.g. Cooper & Dunne, 1998). This has involved the coding of children’s responses on various dimensions, especially the child’s use, whether ‘appropriate’ or not, of ‘everyday’ knowledge in responding to items. In parallel, informing and being informed by this work, a model of the way culture, cognition and performance on ‘realistic’ test items interact has been developed (Cooper, 1996, 1998b).

In each of three primary and secondary schools, Year 6 and Year 9 children took three group tests in maths. Two of these were the actual May 1996 Key Stage national tests. The third, taken some four months earlier, comprised a test put together by us, drawing on previous NC items. Our tests were designed to cover a variety of item types and four Attainment Targets (ATs) #(7) . Our secondary test, like the May 1996 test, was tiered by NC level. Our tests were marked according to the NC marking schemes. Between the administration of the first test and May 1996 we interviewed all of the Year 6 children and a 25% sample of the Year 9 children while they worked individually through a selection of items from the first test. This allowed access to children’s interpretations of the items and their methods. Furthermore, and this has been a crucial part of our approach, it was possible to allow children to reconsider their approach and answer in cases where they had initially chosen an ‘inappropriate’ ‘everyday’ reading of the meaning and requirement of the item. This has allowed us to explore the ways in which the use of a certain class of ‘realistic’ item can lead to the underestimation of children’s actually existing knowledge and understanding (Cooper, Dunne & Rogers, 1997; Cooper & Dunne, 1998). In order to allow an examination of social class effects we have also collected information on parental occupations. The issue of parental occupations was a sensitive one, especially in the secondary schools. Two of the three schools required parental permission before children were allowed to supply this information. The third required that the question go directly to the home with the result that we gained this information for only 43% of the sample in this school #(8) . We also have children’s scores on the three Nelson Cognitive Ability tests. 
We have also interviewed teachers, concentrating on the school’s approach to maths, and on teachers’ perspectives on NC assessment and the pupils in their schools. The nature of the samples and the project’s activities are set out in Table 1.

Table 1: The primary and secondary school samples

	Children Tested (n)	Children Interviewed (n)	Teachers Interviewed (n)	Lessons Observed (n)
Key Stage 2
School A	63	63	4	4
School B	44	44	3	4
School C	29	29	6	5
Total KS2	136	136	13	13

Key Stage 3
School D	254	50	6	10
School E	102	37	5	5
School F	117	36	4	5
Total KS3	473	123	15	20

Results

In this paper we will report on Key Stage Two #(9) . We begin with an intrinsic discussion of one item in order to illustrate the type of issues which arise when mathematical operations are contextualised within ‘realistic’ settings. This happens to be a KS3 item, though similar items appear at KS2.

Figure 1: ‘Realistic’ items and ambiguity: an illustration (from SEAC, 1992).

Statement of Attainment: "Solve number problems with the aid of a calculator, interpreting the display" (2.4d)

The marking scheme (Band 1 - 4, Paper 1) gives as "approximate evidence" of achievement: "Gives the answer to the division of 269 by 14 as 20, indicating that they have interpreted the calculator display to select the most appropriate whole number in this context. Do not accept 19 or 19.2."

The item in Figure 1 is one of a type much discussed in mathematical education circles (e.g. Verschaffel, De Corte, & Lasure, 1994). The key point is that the child’s answer must not be fractional. The lift cannot go up (and down) 19.2 times. The child is required therefore to introduce a ‘realistic’ consideration into his or her response. In fact the child must manage much more than this. S/he must introduce only a small dose of realism - ‘just about enough’. S/he must not reflect that the lift might not always be full; or that some people might get impatient and use the stairs; or that some people require more than the average space - e.g. for a wheelchair. Such considerations - ‘too much realism’ - will lead to a problem without a single answer, and no mark will be gained #(10) . There is a certain irony here. Many reformers have argued for the use of ‘ill-structured’ items in maths teaching, learning and assessment contexts (e.g. Pandey, 1990). This item, however, is unintentionally ill-structured. Children’s and schools’ interests now hinge on managing the resulting ambiguities in a legitimate manner.

The child is asked to exercise some ‘realistic’ judgement and, in doing so, might be presumed to be undertaking a ‘realistic’ application of some mathematical (or at least arithmetical) knowledge. But on whose account of ‘applying’? The lift item essentially concerns queuing behaviour. A mathematics of queuing exists. We might turn for some insight to an elite disciplinary source. Let’s try Newer Uses of Mathematics #(11) , edited in 1978 by Sir James Lighthill, FRS, then Lucasian Professor of Applied Maths at Cambridge #(12). This edited collection includes a paper on methods of operational analysis by Hollingdale (former Head of Maths Dept at the Royal Aircraft Establishment) which discusses queuing. An edited extract follows:

Everyone, nowadays, is only too familiar with queues - at the supermarket, the post office, the doctor's waiting room, the airport, or on the factory floor. Queues occur when the service required by customers is not immediately available. Customers do not arrive regularly and some take longer to serve than others, so queues are likely to fluctuate in length - even to disappear for a time if there is a lull in demand…. The shopper leaving the supermarket, for example, desires service; the store manager wants to see his cashiers busy most of the time. If customers have to wait too long, some will decide to shop elsewhere; … The essential feature of a queuing situation, then, is that the number of customers (or units) that can be served at a time is limited so there may be congestion. …. Queuing problems lend themselves to mathematical treatment and the theory has been extensively developed during the last seventy years. …The raw materials of queuing theory are mathematical models of queue-generating systems of various kinds. The objective is to predict how the system would respond to changes in the demands made on it; in the resources provided to meet those demands; and in the rules of the game, or queue discipline as it is usually called. Examples of such rules are: 'first come, first served'; 'last come, first served', as with papers in an office 'in-tray'; service in an arbitrary order; or priority for VIPs or disabled persons. To analyse queuing problems, we need information about the input (the rate and pattern of arrival of customers), the service (the rate at which customers are dealt with either singly or in multiple channels), and the queue discipline…. (Hollingdale, in Lighthill, pp. 244-245)

The question which then arises is whether any of these models would deliver the ‘correct’ answer according to the producers of marking schemes for National Curriculum tests #(13) . If not, why not, and what approach does? Can the ‘required’ approach be specified via teachable ‘rules of engagement’ for such items? If not, why not? Should they be?

Various writers have employed the notion of educational ground rules to capture what is demanded of children in cases like the lift item (Mercer & Edwards, 1987). There is clearly some affinity between this concept and those of recognition and realisation rules as employed by Bernstein (1996). However, it can be seen that it would be quite difficult - if not impossible - to write a set of rules which would enable the child to respond as required to the lift question. Certainly, the rule - in the sense of a mandated instruction - to employ ‘realistic’ considerations would not do, since ‘how much’ realism is required remains a discretionary issue. It is this problem that has led to a range of attacks on the use of rules to model human activities (e.g. Taylor, 1993) and, in particular, has led Bourdieu to reject a rule-based account of cultural competence (see Bourdieu, 1990a). His concept of habitus aims to capture the idea of a durable socialised predisposition without reducing behaviour to strict rule-following (Bourdieu, 1990c). Bourdieu sometimes describes what habitus captures as ‘a feel for the game’ and we can see that this describes fairly well what is required by the lift problem and others like it #(14) . Both Bernstein and Bourdieu have shown that members of the working class are more likely to respond to test-like situations by drawing on ‘local’ and/or ‘functional’ rather than ‘esoteric’ and/or ‘formal’ perspectives #(15) . We have shown elsewhere that this can lead to the relative underestimation of these children’s mathematical capacities when test items are superficially ‘realistic’ but actually demand an ‘esoteric’ response (Cooper, 1996, 1998b; Cooper & Dunne, 1998). Because of lack of space, we will not present any findings concerning the lift item here, nor will we be able to present the explanatory perspective. We move instead to present a statistical overview of children’s relative performance on ‘realistic’ and ‘esoteric’ items at KS2.
We have already described our simple coding of ‘realistic’ and ‘esoteric’ items. The lift item can serve as an exemplar of the former #(16) . The following is an example of the latter:

Figure 2: coded as ‘esoteric’ (Key Stage 2: SCAA, 1996)

Quantitative Analysis: an Overview

Each separately marked item or sub-item #(17) of the three tests taken by 10/11 year olds was coded on a variety of dimensions including a two-fold division into what we have termed ‘realistic’ or ‘esoteric’ #(18) items using a rule which is simple to state though not always easy to operationalise. An item has been categorised as ‘realistic’ if it contains either persons or non-mathematical objects from ‘everyday’ settings #(19) . Otherwise it is coded as ‘esoteric’. For each child the percentages of total marks scored on the two categories of items #(20) were calculated, giving a ‘realistic’ and an ‘esoteric’ percentage for each child. Then, for each child a ratio was created by dividing the ‘realistic’ percentage by the ‘esoteric’ percentage achieved. Table 2, Table 3 and Table 4 show the distribution by social class and sex of the two percentages and the resulting ratio for the primary school children for whom we have full relevant information #(21) . Our social class categories are set out in Appendix 1.
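The per-child calculation described above can be sketched in Python. This is a minimal illustration, not the project's analysis code: the function names, the record layout and the marks for the example child are all hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One record per separately marked item or sub-item:
# (item_type, marks_scored, marks_available). This layout is an assumption.
Record = Tuple[str, float, float]


def percentage(records: Sequence[Record], item_type: str) -> float:
    """Percentage of the available marks scored on items of one type."""
    scored = sum(s for t, s, _ in records if t == item_type)
    available = sum(a for t, _, a in records if t == item_type)
    return 100.0 * scored / available


def realistic_esoteric_ratio(
    records: Sequence[Record],
) -> Tuple[float, float, Optional[float]]:
    """Return (realistic %, esoteric %, r/e ratio) for one child."""
    r = percentage(records, "realistic")
    e = percentage(records, "esoteric")
    # The ratio is undefined for a child who scores nothing on 'esoteric' items.
    ratio = r / e if e > 0 else None
    return r, e, ratio


# Invented marks for a single hypothetical child.
child = [
    ("realistic", 1, 2),
    ("realistic", 2, 2),
    ("esoteric", 3, 4),
    ("esoteric", 1, 1),
]
r_pct, e_pct, ratio = realistic_esoteric_ratio(child)
print(r_pct, e_pct, ratio)
```

For this invented child the sketch gives a ‘realistic’ percentage of 75.0, an ‘esoteric’ percentage of 80.0 and a ratio of 0.9375.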

Table 2: Percentage score achieved on KS2 ‘realistic’ items on the three tests by class and sex

	Female	Female	Male	Male	Total	Total
Class	Mean	Count	Mean	Count	Mean	Count
Service class	57.74	26	60.33	34	59.21	60
Intermediate class	55.68	13	55.04	17	55.32	30
Working class	47.34	13	51.07	20	49.60	33
Total	54.62	52	56.46	71	55.68	123

Table 3: Percentage score achieved on KS2 ‘esoteric’ items on the three tests by class and sex

	Female	Female	Male	Male	Total	Total
Class	Mean	Count	Mean	Count	Mean	Count
Service class	71.07	26	70.10	34	70.52	60
Intermediate class	70.35	13	69.98	17	70.14	30
Working class	65.71	13	64.69	20	65.09	33
Total	69.55	52	68.54	71	68.97	123

Table 4: Ratio of KS2 ‘realistic’ percentage to ‘esoteric’ percentage by class and sex

	Female	Female	Male	Male	Total	Total
Class	Mean	Count	Mean	Count	Mean	Count
Service class	.81	26	.88	34	.85	60
Intermediate class	.79	13	.79	17	.79	30
Working class	.71	13	.79	20	.76	33
Total	.78	52	.83	71	.81	123

Ratios such as these have properties that can make them difficult to interpret. In particular, a ratio of percentages will have an upper bound set by the size of its denominator. If, for example, a child scores 50% as their ‘esoteric’ subtotal then their highest possible r/e ratio will be 100/50 or 2. If another child, on the other hand, scores 40% as their ‘esoteric’ subtotal their highest possible ratio will be 100/40 or 2.5. Since service class children, on average, do better than others on the ‘esoteric’ subsection of the tests their potential maximum r/e ratio is lower than that for the working class children who score lower on the ‘esoteric’ subsection. Notwithstanding this, Table 4 shows that the service class children have the highest ratios of any group.
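The bound can be stated compactly. The symbols r and e are introduced here only for illustration; they stand for a child's ‘realistic’ and ‘esoteric’ percentages, with 0 ≤ r ≤ 100 and 0 < e ≤ 100.

```latex
\[
\frac{r}{e} \;\le\; \frac{100}{e},
\qquad\text{so}\qquad
e = 50 \;\Rightarrow\; \frac{r}{e} \le 2,
\qquad
e = 40 \;\Rightarrow\; \frac{r}{e} \le 2.5 .
\]
```

The ceiling falls as e rises, which is why a group with higher ‘esoteric’ scores has less room for a high ratio.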

There is a clear relation of this ratio to social class background, with its value ranging from 0.85 for the service class, through 0.79 for the intermediate grouping, to 0.76 for the working class for boys and girls taken together #(22) . Service class children as a whole have a better performance on ‘realistic’ items in relation to ‘esoteric’ items than do working class children. The relation of the ratio to class is particularly clear in the case of girls. Looking at sex, the r/e ratio is higher for boys in both the service and working class groups, though it is identical for girls and boys in the intermediate grouping #(23) . The class effect is illustrated in Figure 3, where two linear regression lines have been fitted to capture the ‘realistic’-‘esoteric’ relation for these two class groupings. What this finding suggests is that, all other things being equal, the higher the proportion of ‘realistic’ items in a test, the greater will be the difference in outcome between service and working class children.

It is important to stress that these class differences are differences ‘on average’, not differences of kind. There is much overlap in the three distributions of these ratios by social class, as the charts in Cooper et al (1997) demonstrate clearly. However, it is also worth noting that, given the many other dimensions on which these test items differ within the categories ‘realistic’ and ‘esoteric’, it is possible that these results underestimate the importance of the effect of ‘realistic’ versus ‘esoteric’ contextualisation. It is perhaps surprising that the effect appears at all amidst all this ‘noise’ #(24) .

Figure 3: The distribution of KS2 ‘realistic’ percentages by ‘esoteric’ percentages by child (service class and working class only #(25) )
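Fitting a separate line for each class grouping, as described for Figure 3, might look like the following sketch. It assumes ordinary least squares with the ‘esoteric’ percentage on the horizontal axis and the ‘realistic’ percentage on the vertical axis; the data points are invented for illustration and are not the project's data.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (esoteric %, realistic %) values, one list pair per class group.
groups = {
    "service": ([50.0, 70.0, 90.0], [40.0, 60.0, 80.0]),
    "working": ([50.0, 70.0, 90.0], [35.0, 50.0, 65.0]),
}

lines = {name: fit_line(e, r) for name, (e, r) in groups.items()}
for name, (slope, intercept) in lines.items():
    print(f"{name}: realistic = {intercept:.1f} + {slope:.2f} * esoteric")
```

With these invented points the service class line lies above the working class line over the range shown, which is the kind of pattern the two fitted lines are meant to make visible.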

Social class may, of course, only appear to be a causal factor here. It might be the case, for example, that ‘ability’, some concomitant of school attended such as curriculum coverage, and/or systematic differences in the easiness of the ‘realistic’ versus ‘esoteric’ items are the real underlying causes of the results in Table 4. We have tried to approach these problems from two directions. First, we have used logistic regression to examine the associations between school, ‘ability’, sex, class and the ratio. Secondly, concerning curriculum topic/area we have looked at how the ratio varies within Attainment Targets. The regression analysis (Cooper, Dunne & Rodgers, 1997) with our ‘realistic’/ ‘esoteric’ ratio as dependent variable and social class, sex, school and non-verbal ‘ability’ as independent variables, suggests that class and sex are statistically significant here and that school and non-verbal ‘ability’ are not #(26) . Details of the analysis by Attainment Target are set out in the following section of the paper.

The differences within Attainment Targets

How do these class and gender differences in the r/e ratio behave within attainment targets, i.e. in relation to broad topic areas within maths? #(27) In fact, Tables 5-7 show that the class and gender differences continue to appear within ‘number’, ‘algebra’ and ‘shape and space’.

Table 5: Ratio of ‘realistic’ percentage to ‘esoteric’ percentage by class and sex (number)

	Female	Female	Male	Male	Total	Total
	Mean	Count	Mean	Count	Mean	Count
Service class	.79	26	.83	34	.81	60
Intermediate class	.82	13	.81	17	.81	30
Working class	.78	13	.79	20	.78	33
Total	.79	52	.81	71	.80	123

Table 6: Ratio of ‘realistic’ percentage to ‘esoteric’ percentage by class and sex (algebra)

	Female	Female	Male	Male	Total	Total
	Mean	Count	Mean	Count	Mean	Count
Service class	.69	26	.88	34	.80	60
Intermediate class	.66	13	.71	17	.69	30
Working class	.57	13	.56	20	.56	33
Total	.66	52	.75	71	.71	123

Table 7: Ratio of ‘realistic’ percentage to ‘esoteric’ percentage by class and sex (shape & space)

	Female	Female	Male	Male	Total	Total
	Mean	Count	Mean	Count	Mean	Count
Service class	1.17	26	1.17	34	1.17	60
Intermediate class	1.04	13	1.13	17	1.09	30
Working class	1.04	13	1.19	20	1.13	33
Total	1.10	52	1.17	71	1.14	123

The patterns are less clear than they were in Table 4 but are nevertheless there. In each case an overall service/working class comparison of the r/e ratio favours the service class against the working class. In parallel with this, an overall male/female comparison of the r/e ratio consistently favours the boys. These differences are particularly marked in the case of algebra. It is also interesting to note that, in the case of ‘shape and space’, the children found the ‘realistic’ items generally easier than the ‘esoteric’ ones. Nevertheless, the r/e ratio remains highest in the case of the service class taken as a whole, and boys have a higher ratio than girls. We are not able to present a table for the case of data handling since all of the items under this heading have been coded as ‘realistic’. However, some idea can be gained of the ‘behaviour’ of the latter items in relation to class by examining their position in Table 8. Here we show how children from each class group performed on each of the seven attainment target - context coding combinations. Table 9 shows comparable calculations for boys and girls. Comparing the service class with the working class, and boys with girls, there appear to be similar class and gender effects across attainment targets, suggesting that the differences in the r/e ratio in Table 4 are not ‘spurious’ topic effects.

Table 8: Mean percentage scores by class for each existing attainment target/context combination

	Service class	Intermediate class	Working class	Total	Service mean / working mean	Number of separately coded items & sub-items
Number - ‘esoteric’	78.81	77.94	75.61	77.74	1.04	21
Number - ‘realistic’	64.05	63.57	59.96	62.83	1.07	22
Algebra - ‘esoteric’	68.46	66.92	61.54	66.23	1.11	10
Algebra - ‘realistic’	50.60	44.05	30.74	43.67	1.65	11
Shape & space - ‘esoteric’	66.79	66.19	57.58	64.17	1.16	11
Shape & space - ‘realistic’	72.50	66.33	60.00	67.64	1.21	9
Handling data - ‘esoteric’	n/a	n/a	n/a	n/a	n/a	0
Handling data - ‘realistic’	62.86	55.42	49.34	57.42	1.27	26
n (children)	60	30	33	123		110
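The ‘service mean / working mean’ column can be recomputed directly from the class means in Table 8. The short sketch below does so; the means are copied from the table, while the dictionary layout and function name are merely illustrative choices.

```python
# (service class mean, working class mean), copied from Table 8.
table8 = {
    "Number - esoteric": (78.81, 75.61),
    "Number - realistic": (64.05, 59.96),
    "Algebra - esoteric": (68.46, 61.54),
    "Algebra - realistic": (50.60, 30.74),
    "Shape & space - esoteric": (66.79, 57.58),
    "Shape & space - realistic": (72.50, 60.00),
    "Handling data - realistic": (62.86, 49.34),
}


def class_ratio(service_mean: float, working_mean: float) -> float:
    """Service class mean divided by working class mean, to two decimals."""
    return round(service_mean / working_mean, 2)


ratios = {label: class_ratio(s, w) for label, (s, w) in table8.items()}
for label, value in ratios.items():
    print(f"{label}: {value:.2f}")
```

Within each of the three attainment targets that have both item types, the recomputed ratio is larger for the ‘realistic’ items than for the ‘esoteric’ ones.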

Table 9: Mean percentage scores by sex for each existing attainment target/context combination

	Girls	Boys	Total	Boys’ Mean / Girls’ Mean	number of separately coded items & sub-items
Number - ‘esoteric’	76.63	78.27	77.56	1.02	21
Number - ‘realistic’	60.71	63.88	62.51	1.05	22
Algebra - ‘esoteric’	68.23	64.36	66.03	0.94	10
Algebra - ‘realistic’	42.06	44.37	43.37	1.05	11
Shape & space - ‘esoteric’	63.49	64.08	63.89	1.01	11
Shape & space - ‘realistic’	65.19	68.87	67.28	1.06	9
Handling data - ‘esoteric’	n/a	n/a	n/a	n/a	0
Handling data - ‘realistic’	55.67	58.19	57.10	1.05	26
n (children)	54	71	125		110

Another possibility which needs to be addressed is that it is because the ‘esoteric’ items are, in general, found easier in this data set, coupled with class related differences in typical educational achievement, that the r/e ratio patterns by class are as they are. Perhaps working class children just perform less well on harder items? In fact, however, statistical analyses employing items rather than the child as the case have shown that broad social class differences in a relative of this ratio remain (though are reduced in importance #(28) ) when examined within four categories of items ordered by average difficulty levels #(29) . The means in Table 10 derive from a variable constructed by dividing, for each item, the service class mean score by the working class mean score #(30) . It can be seen that, within each category of items, from the most easy to the most difficult, the service class children perform relatively better than working class children on ‘realistic’ items as compared to ‘esoteric’ items #(31) .

Table 10: Ratios of service class mean score to working class mean score for an item by observed item difficulty and nature of item (count is of items)

	Realistic Items		Esoteric Items		Total Items
Item difficulty levels	Mean	Count	Mean	Count	Mean	Count
1. Most difficult quartile	1.62	21	1.37	6	1.56	27
2. Second quartile	1.42	18	1.20	9	1.35	27
3. Third quartile	1.21	14	1.10	12	1.16	26
4. Most easy quartile	1.06	15	1.03	15	1.04	30
Totals	1.35	68	1.14	42	1.27	110
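The item-level variable behind Table 10 can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run: the field names, the use of an overall facility value to order items by difficulty, and the four example items are all hypothetical, and every item is assumed to have a non-zero working class mean.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def mean_ratio_by_band(items: List[dict], n_bands: int = 4) -> Dict[Tuple[int, str], float]:
    """Mean service/working ratio for each (difficulty band, item kind).

    Band 1 holds the most difficult items (lowest facility).
    """
    ranked = sorted(items, key=lambda item: item["facility"])
    grouped = defaultdict(list)
    for rank, item in enumerate(ranked):
        band = rank * n_bands // len(ranked) + 1
        # Per-item ratio: service class mean score / working class mean score.
        ratio = item["service_mean"] / item["working_mean"]
        grouped[(band, item["kind"])].append(ratio)
    return {key: mean(values) for key, values in grouped.items()}


# Four invented items, split into two bands to keep the example small.
items = [
    {"kind": "realistic", "facility": 0.3, "service_mean": 0.5, "working_mean": 0.25},
    {"kind": "esoteric", "facility": 0.4, "service_mean": 0.625, "working_mean": 0.5},
    {"kind": "realistic", "facility": 0.7, "service_mean": 0.75, "working_mean": 0.5},
    {"kind": "esoteric", "facility": 0.8, "service_mean": 0.75, "working_mean": 0.75},
]
result = mean_ratio_by_band(items, n_bands=2)
print(result)
```

Comparing the ‘realistic’ and ‘esoteric’ means within the same band, rather than across the whole test, is what allows item difficulty to be held roughly constant.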

These effects may appear small. However, in the world of educational practice, where decisions are often taken on the basis of thresholds being achieved or not by children, differences of this size can have large effects. To illustrate this, we have developed a simulation of what would happen to children from different social class backgrounds if a selection process were to occur on the basis of three differently composed tests: one comprising items which behave like our ‘esoteric’ items, one of items which behave like our ‘realistic’ items, and one comprising an equal mixture of the two #(32) . This process might be realised as a selection exam for secondary school or for set placement within the first year of secondary school. A summary of the results is shown in Table 11 and Figure 4. It can be seen that, using our results as the basis for predicting outcomes, the proportion of working class children in this sample who would be selected by an ‘esoteric’ test is double that which would be selected by a ‘realistic’ test. The two tests lead to quite different outcomes, mainly for intermediate and working class children #(33) .

Table 11: Percentage outflow selected from classes under three simulated testing regimes (KS2)

	Esoteric Test (26% selected in total)	Mixed Test (½ & ½) (26.8% selected in total)	Realistic Test (27.6% selected in total)
Percentage selected
Service Class	30.0	33.3	33.3
Intermediate Class	20.0	23.3	33.3
Working Class	24.2	18.2	12.1
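The procedure used for the simulation is not spelled out in the text above, so the following is only a sketch of one way such a selection could be modelled. It assumes that children are ranked on a percentage score, that everyone at or above the score of the last child within the target fraction is selected (so ties are all admitted), and that the mixed test is the simple average of the two percentages. The children, field names and function name are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def simulate_selection(children: List[dict], score_key: str, fraction: float) -> Dict[str, float]:
    """Percentage of each social class selected when the top `fraction` is taken."""
    scores = sorted((c[score_key] for c in children), reverse=True)
    n_target = max(1, math.ceil(fraction * len(children)))
    cutoff = scores[n_target - 1]  # ties at the cut-off are all selected
    totals = Counter(c["social_class"] for c in children)
    selected = Counter(c["social_class"] for c in children if c[score_key] >= cutoff)
    return {cls: 100.0 * selected[cls] / totals[cls] for cls in totals}


# Invented 'realistic' and 'esoteric' percentages for eight hypothetical children.
children = [
    {"social_class": "service", "realistic": 80, "esoteric": 85},
    {"social_class": "service", "realistic": 75, "esoteric": 70},
    {"social_class": "service", "realistic": 60, "esoteric": 72},
    {"social_class": "service", "realistic": 50, "esoteric": 65},
    {"social_class": "working", "realistic": 55, "esoteric": 90},
    {"social_class": "working", "realistic": 45, "esoteric": 68},
    {"social_class": "working", "realistic": 40, "esoteric": 60},
    {"social_class": "working", "realistic": 35, "esoteric": 55},
]
for c in children:
    # Assumed definition of the mixed test: equal weight to each item type.
    c["mixed"] = (c["realistic"] + c["esoteric"]) / 2

outcomes = {
    test: simulate_selection(children, test, fraction=0.25)
    for test in ("esoteric", "mixed", "realistic")
}
for test, by_class in outcomes.items():
    print(test, by_class)
```

In this toy data the hypothetical working class group has one child selected under the ‘esoteric’ and mixed tests and none under the ‘realistic’ test, which illustrates how a small shift in relative performance can change who crosses a fixed threshold.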

Figure 4: Percentage of children selected from each social class under three simulated testing regimes (KS2)

A similar simulation for sex does not show such large effects, reflecting the smaller differences in the realistic/esoteric ratio in Table 4. While in the case of class, a move from ‘realistic’ through mixed to ‘esoteric’ composition linearly increases the proportion of working class children selected, any pattern for sex is less clear (see Table 12 and Figure 5).

Table 12: Percentage outflow selected from sexes under three simulated testing regimes

	Esoteric Test (26% selected in total)	Mixed Test (½ & ½) (26.8% selected in total)	Realistic Test (27.6% selected in total)
Percentage selected
Girls	22.2	18.5	22.2
Boys	28.2	32.4	31.0

Figure 5: Percentage of boys and girls selected under three simulated testing regimes

Given the small cell sizes which would result, we will not present a simulation for the six sex/class groups.

Discussion

Considering the marked class effect, a key issue begs to be explored. Is there any evidence that ‘realistic’ items, for various reasons, are underestimating working class capacities relatively more than those of children from other class backgrounds? Might they be differentially valid in general? Or is it the case that ‘realistic’ items happen to demand ‘legitimately’ some mathematical capacities which are more social class-related than those required by ‘esoteric’ items? We have presented evidence elsewhere suggesting that part of the social class effect found is due to the social class distribution of children’s ‘choice’ of an ‘illegitimate’ and ‘inappropriate’ ‘everyday’ response mode rather than to their lack of mathematical capacity per se (Cooper, 1996, 1998b; Cooper, Dunne & Rogers, 1997; Cooper & Dunne, 1998). We had hoped to address these explanatory issues here, but space makes this impossible. We have discussed the use of Bernstein and Bourdieu’s ideas to make sense of these findings in these papers and we must refer the reader to these. We are, in a current ESRC project, exploring similar arguments concerning gender. However, whatever the best explanation of these findings is, one thing is clear. Serious equity issues seem to be raised, probably unintentionally, by the continuing emphasis on the ‘realistic’ contextualisation of maths, especially when this is carried over into national test contexts. Darling-Hammond (1994), amongst others, has raised similar concerns about performance assessment in the USA. Whether these are problems that can be addressed successfully by teachers remains to be established.

Acknowledgements

This work was mainly funded by the ESRC (Project: R000235863, 1995-1997). Nicola Rodgers worked as a Research Assistant on the project for seven months in 1996. We would like to thank her for her contribution. We would like also to thank all of the teachers and children in the six schools for putting up with our constant demands over most of a year; and also Beryl Clough, Hayley Kirby and Julia Martin-Woodbridge for their work in so patiently transcribing interviews.

Appendix 1: Occupational Groupings

(combined from Goldthorpe & Heath, 1992, and Erikson & Goldthorpe, 1993)

1 Service class, higher grade: higher grade professionals, administrators and officials; managers in large industrial establishments; large proprietors.
2 Service class, lower grade: lower grade professionals, administrators and officials; higher grade technicians; managers in small industrial establishments; supervisors of non-manual employees.
3 Routine non-manual employees
4 Personal service workers
5 Small proprietors with employees
6 Small proprietors without employees
7 Farmers and smallholders
8 Foremen and technicians

9 Skilled manual workers
10 Semi- and unskilled manual workers
11 Agricultural workers

We have collapsed 1&2 into a service class, 3-8 into an intermediate class, and 9-11 into a working class.
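The collapsing of the eleven occupational groups into three classes amounts to a simple lookup. A minimal sketch in Python follows; the function name `collapse_class` and the label strings are illustrative assumptions rather than part of the original coding scheme.

```python
def collapse_class(group: int) -> str:
    """Map an occupational group (1-11, as listed in Appendix 1) to a class."""
    if group in (1, 2):
        return "service"
    if 3 <= group <= 8:
        return "intermediate"
    if 9 <= group <= 11:
        return "working"
    raise ValueError(f"unknown occupational group: {group}")


# Example: group 4 (personal service workers) and group 9 (skilled manual).
examples = [collapse_class(g) for g in (2, 4, 9)]
```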

Notes

(1) b.cooper@sussex.ac.uk and mairead.dunne@sussex.ac.uk
(2) For fuller accounts of the policy background, see Ball (1990, 1994); Brown (1992, 1993); Cooper (1994a, 1994b).
(3) Though, at the same time, dominant versions of school maths also incorporated newer applications of maths (Cooper, 1985a).
(4) For discussions of similar issues in the case of science see Morais et al (1992) for class; and Murphy (1996) for gender. For a useful discussion of the competence/performance distinction see Wood & Power (1987). The distinction clearly begs many theoretical questions. It does, however, allow a particular critical perspective to be taken on the validity of test items.
(5) For earlier examples of this strategy see Mehan (1973) and Bourdieu (1984).
(6) These have included type of contextualisation, ‘wordiness’, difficulty levels, attainment target, type of response required, and use of pictorial representation.
(7) It is important to note that, in order to maintain comparability across the years 1992-1996, we have worked within the 1991 NC framework for maths, which comprised 5 Attainment Targets, one of which, Using And Applying Mathematics, was not assessed via the tests. The other 4 were Number, Algebra, Shape And Space, and Handling Data. For the same reason, and because we wished to throw light on the use of Statements of Attainment (SoA), we have also used the SoAs allocated to items where appropriate and, in some analyses, have allocated SoAs to items in years where the official rubric had not (mainly coded by examination of comparable previous items).
(8) For discussion of issues of definition and coding re class, see Cooper & Dunne (1998).
(9) Work on Key Stage Three is reported in the Project Report to the ESRC (October 1997, currently being peer-reviewed) and forms part of a book currently in preparation.
(10) See Cooper (1992, 1994b) for a fuller discussion.
(11) In the preface, Lighthill says, “…we want to outline some of the many ways of using mathematics for significant practical purposes …”
(12) Other holders of this post have included Sir Isaac Newton and Stephen Hawking.
(13) These producers of the items and marking schemes exist in a field distinct from that of Hollingdale, of course.
(14) For a qualitative comparison across a range of test items of two children who differ markedly in their ‘feel for the game’ see Cooper (1996, 1998b).
(15) For examples of such research, see Holland (1981), and Bourdieu (1986).
(16) Clearly, ‘realistic’ items differ amongst themselves in numerous ways. In particular, some require ‘realistic’ considerations to be taken into account; others do not. The latter typically embed a ‘hidden’ mathematical structure in the ‘noise’ of the ‘realistic’ (see Cooper, 1992, and Cooper & Dunne, 1998, for discussion of some examples).
(17) In a few cases there is some dependency of one sub-item on another.
(18) Given that in some cases a person appeared just to introduce the item we experimented with a threefold category system, putting such items into a category we termed ‘ritualistically’ ‘realistic’. However, in the end, we decided not to pursue this as we felt unable to judge, when coding items prior to analysis of data, whether what to us might seem ‘ritualistic’ might seem the same to a child.
(19) Clearly, it is possible to raise questions here about whose ‘everyday’ and whose ‘esoteric’. We wish ‘everyday’ here to refer to such activities as shopping, sport, etc. of which we can assume most children have some knowledge and personal experience. Solving x² - 3 = 6 might well be describable as ‘everyday’ by reference to some group’s behaviour in some setting, but we assume here that such items are recognisably different from those which embed maths in shopping etc. The purpose of our distinction is not to legislate on what ultimately counts, in some universalistic way, as ‘everyday’ or ‘esoteric’, but to enable empirical analysis of important issues to get off the ground.
(20) The final handful of items from our ‘mock’ test were omitted from this analysis in order to only include items which all or very nearly all children had definitely attempted. 110 separate items or part-items entered the analysis. Two-thirds of the items come from the 1996 tests and a third from earlier incarnations of the NC tests.
(21) We are providing tests of significance in footnotes, though we have some doubts about their value. Our samples are not the sort of simple random samples which the maths of significance testing generally assumes (e.g. Hoel, 1971). Neither are the members of our samples selected independently of one another, given the decision (the only practical one) to select schools as our basic unit. We tend to see the relationships discussed as features of these particular groups of Year 6 and Year 9 children. Whether the relationships are likely to generalise to larger populations is, for us, as much a matter of theoretical plausibility as of the application of significance testing to the data.
(22) An analysis of variance (simple factorial) of the r/e ratio by social class finds the differences between classes to be statistically significant (p=0.005).
(23) An analysis of variance (simple factorial) of the r/e ratio by sex finds the differences between sexes to be statistically significant (p=0.027). A further analysis of variance including both class and sex finds both independent variables significantly related to the r/e ratio (class: p=0.003; sex: p=0.048) and finds the class/sex interaction to be non-significant. R-squared is 13.7% (adjusted R-squared 10%).
(24) Furthermore it is a common error of empiricism to move from the absence of an effect, or the small size of it, to the absence of a mechanism, forgetting that the effects of a real mechanism may be hidden by other factors at work. See, e.g., Bhaskar (1979).
(25) The line for the intermediate class falls between these two with a similar slope.
(26) Logistic regression, employing backward elimination. It should be noted, however, that statistical significance is difficult to interpret when procedures such as logistic regression are applied to samples such as ours which are not simply random. See, e.g., Gilbert (1993), pp. 77-78.
(27) Early versions of the English national curriculum assumed that each test item could be associated with one statement of attainment - a form of behavioural objective within each of the Attainment Targets of ‘number’, etc. We are not believers in the idea that an item can assess just one statement of attainment from within an attainment target. However, we are following the early ‘official’ practice of the national curriculum assessors in coding each item (or part-item) as belonging to one AT. Clearly, any item is likely actually to demand a cluster of skills and understandings for its solution. More recently, the National Curriculum test papers have dropped the labelling of each item by one statement of attainment. We have taken the ‘official’ coding where it exists and have tried to simulate it in the case of more recent items where it does not. Some of these codings are difficult for the very reason mentioned above.
(28) An analysis of variance of this service/working class ratio by difficulty level and nature of the item (‘realistic’ v. ‘esoteric’) finds both independent variables significant (difficulty: p=0.001; nature of item: p=0.051), with the interaction term non-significant. R-squared is 25.7% (adjusted R-squared is 20.5%).
(29) Similarly, the findings hold when the ‘wordiness’ of items is controlled for.
(30) Differences in measured ‘ability’ are automatically controlled for in this approach, as in the use of the realistic/esoteric ratio earlier.
(31) The nature of this ratio is such that it is constrained to be smaller as difficulty level falls.
(32) Because of ties in the data, it has been necessary to select very slightly different overall proportions of children in the three cases: 26% for the ‘esoteric’ simulation, 27.6% for the ‘realistic’ simulation, and 26.8% for the mixed test. These are small differences in relation to the size of the resulting effects.
(33) The findings would also have implications for any comparison of schools via league tables based on the three simulated tests discussed here. We have not attempted to apply significance testing to these models. It should be recalled that they employ social class and sex differences previously shown to be statistically significant in the treatment of the ‘realistic’/‘esoteric’ ratio.

References

Ball, S.J. (1990) Politics and Policy Making in Education, London, Routledge.

Ball, S.J. (1994) Education Reform, Open University, Buckingham.

Bernstein, B. (1990) The Structuring of Pedagogic Discourse, London, Routledge.

Bernstein, B. (1996) Pedagogy, Symbolic Control and Identity: Theory, Research, Critique, Taylor & Francis, London.

Bhaskar, R. (1979) The Possibility of Naturalism, Sussex, Harvester.

Boaler, J. (1993a) "The role of contexts in the mathematics classroom: do they make mathematics more 'real'?", For the Learning of Mathematics, 13, 2, 12-17.

Boaler, J. (1993b) "Encouraging the transfer of 'school' mathematics to the 'real world' through the integration of process and content, context and culture", Educational Studies in Mathematics, 25, 341-373.

Boaler, J. (1994) "When do girls prefer football to fashion? An analysis of female underachievement in relation to 'realistic' mathematics contexts", British Educational Research Journal, 20, 5, 551-564.

Bourdieu, P. (1984) Homo Academicus, Paris, Éditions de Minuit.

Bourdieu, P. (1986) Distinction: A social critique of the judgement of taste, RKP, London.

Bourdieu, P. (1990a) "From rules to strategies", in his In Other Words, Cambridge, Polity.

Bourdieu, P. (1990b) In Other Words, Cambridge, Polity Press.

Bourdieu, P. (1990c) The Logic of Practice, Oxford, Blackwell.

Bourdieu, P. (1994) Raisons Pratiques: sur la théorie de l’action, Paris, Seuil.

Brown, M. (1992) "Elaborate nonsense? The muddled tale of Standard Assessment Tasks at Key Stage 3", in Gipps, C. (Ed) Developing Assessment for the National Curriculum, Kogan Page & London University Institute of Education, pp. 6-19.

Brown, M. (1993) Clashing Epistemologies: the Battle for Control of the National Curriculum and its Assessment, Professorial Inaugural Lecture, King's College, London.

Camilli, G. & Shepard, L. (1994) Methods for Identifying Biased Test Items, London, Sage.

Cockcroft, W.H. (1982) Mathematics Counts, London, HMSO.

Cooper, B. (1983) "On explaining change in school subjects", British Journal of Sociology of Education, 4(3), pp. 207-22.

Cooper, B. (1985a) Renegotiating Secondary School Mathematics: a Study of Curriculum Change and Stability, Basingstoke, Falmer Press.

Cooper, B. (1985b) "Secondary school mathematics since 1950: reconstructing differentiation", in Goodson, I.F. (Ed) Social Histories of the Secondary Curriculum, Barcombe, Falmer, pp.89-119.

Cooper, B. (1992) "Testing National Curriculum Mathematics: Some critical comments on the treatment of 'real' contexts for mathematics", in The Curriculum Journal, pp. 231-243.

Cooper, B. (1994a) "Secondary mathematics education in England: recent changes and their historical context", in Selinger, M. (Ed) Teaching Mathematics, London, Routledge, pp. 5-26.

Cooper, B. (1994b) "Authentic testing in mathematics? The boundary between everyday and mathematical knowledge in National Curriculum testing in English schools", in Assessment in Education: Principles, Policy and Practice, 1, 2, pp. 143-166.

Cooper, B. (1996) "Using Data From Clinical Interviews To Explore Students’ Understanding of Mathematics Test Items: Relating Bernstein and Bourdieu on Culture to Questions of Fairness in Testing", Paper presented to the Symposium: Investigating Relationships Between Student Learning and Assessment in Primary Schools, American Educational Research Association Conference, New York, April 1996.

Cooper, B. (1998a) "Assessing National Curriculum Mathematics in England: Exploring children’s interpretation of Key Stage 2 tests in clinical interviews", Educational Studies in Mathematics, 35, 1, 19-49.

Cooper, B. (1998b, forthcoming) "Using Bernstein and Bourdieu to understand children’s difficulties with ‘realistic’ mathematics testing: an exploratory study", in International Journal of Qualitative Studies in Education, 11, 4.

Cooper, B. & Dunne, M. (1998) "Anyone for tennis? Social class differences in children’s responses to national curriculum mathematics testing", The Sociological Review, 46, 1.

Cooper, B., Dunne, M., & Rodgers, N. (1997) "Social class, gender, item type and performance in national tests of primary school mathematics: some research evidence from England", paper presented at the Annual Meeting of the American Educational Research Association, Chicago, March 1997.

Darling-Hammond, L. (1994) "Performance-based assessment and educational equity", Harvard Educational Review, 64, 1, pp. 5-30.

Dearing, R. (1993) The National Curriculum and its Assessment: Final report, SCAA.

Department of Education and Science/Welsh Office (1988) National Curriculum: Task Group on Assessment and Testing: A Report, DES/WO.

Dowling, P. (1991) "A touch of class: ability, social class and intertext in SMP 11-16", in Pimm, D. & Love, E. (Eds) Teaching and Learning School Mathematics, London, Hodder & Stoughton.

Dowling, P. (1998) The Sociology of Mathematics Education: Mathematical Myths/Pedagogic Texts, Falmer Press.

Erikson, R. & Goldthorpe, J.H. (1993) The Constant Flux: A Study of Class Mobility in Industrial Societies, Oxford, Clarendon.

Gilbert, N. (1993) Analysing Tabular Data: Loglinear And Logistic Models For Social Researchers, University College London Press.

Gipps, C. & Murphy, P. (1994) A Fair Test? Assessment, Achievement and Equity, Open University Press.

Goldthorpe, J. & Heath, A. (1992) "Revised class schema 1992", Working Paper 13, Nuffield College Oxford.

Hoel, P.G. (1971) Introduction to Mathematical Statistics, 4th Edition, New York, Wiley.

Holland, J. (1981) "Social class and changes in orientation to meaning", in Sociology, 15, 1, 1-18.

Lighthill, J. (1978) (Ed) Newer Uses of Mathematics, Penguin Books.

Mehan, H. (1973) "Assessing children’s school performance", in Dreitzel, H.P. (Ed) Childhood and Socialisation, Canada, Collier-Macmillan.

Mercer, N. & Edwards, D. (1987) Common Knowledge: The Development of Understanding in the Classroom, London, Methuen.

Messick, S. (1989) "Validity", in Linn, R. (Ed) Educational Measurement, 3rd Edition, London, Collier Macmillan.

Messick, S. (1994) "The interplay of evidence and consequences in the validation of performance assessments", Educational Researcher, 23, 2, 13-23.

Morais, A., Fontinhas, F. & Neves, I. (1992) "Recognition and realisation rules in acquiring school science: the contribution of pedagogy and social background of students", British Journal of Sociology of Education, 13, 2, 247-270.

Murphy, P. (1996) "Assessment practices and gender in science", in Parker, L.H. et al (Eds) Gender, Science and Mathematics, Kluwer.

Pandey, T. (1990) "Power items and the alignment of curriculum and assessment", in Kulm, G. (Ed) Assessing Higher Order Thinking in Mathematics, Washington, AAAS.

SCAA - School Curriculum and Assessment Authority (1995a) Mathematics Tests Key Stage 2 1995, London, Dept. for Education.

SCAA - School Curriculum and Assessment Authority (1995b) Mathematics Tests Key Stage 3 1995, London, Dept. for Education.

SCAA - School Curriculum and Assessment Authority (1996) Key Stage 2 Tests 1996, London, Dept. for Education and Employment.

SEAC - School Examinations and Assessment Council (1992) Mathematics Tests, 1992, Key Stage 3, SEAC/University of London.

SEAC - School Examinations and Assessment Council (1993a) 1993 Key Stage 3 Mathematics Tests, DES/WO.

SEAC - School Examinations and Assessment Council (1993b) Pilot Standard Tests: Key Stage 2: Mathematics, SEAC/University Of Leeds.

Taylor, C. (1993) "To follow a rule …", in Calhoun, C., Lipuma, E. & Postone, M. (Eds) Bourdieu: Critical Perspectives, Cambridge, Polity.

Verschaffel, L., De Corte, E. & Lasure, S. (1994) "Realistic considerations in mathematical modelling of school arithmetic word problems", Learning and Instruction, 4, 273-294.

Wood, R. & Power, C. (1987) "Aspects of the competence-performance distinction: educational, psychological and measurement issues", in Journal of Curriculum Studies, 19, 5, 409-424.

Wood, R. (1991) Assessment and Testing, Cambridge, Cambridge University Press.