I was initially given a set of secondary data: the answers given by 1600 students to a number of questions. Of these students, 721 were in KS2, 642 in KS3 and 237 in KS4.
The first problem I faced was deciding which pieces of data to use, as 25 questions were asked. I need to use data that is continuous, to get the best use out of my results. For example, if I were to use car colour and shoe size, I would not be able to do very much with the information I had. Car colour can only be red, blue, white, etc., and shoe size can only take set values, such as sizes 1 to 12.
However, with time and height, the data is continuous, as a person can be 125.5 cm tall, not just 125 or 126, for example. After considering this, I have decided to investigate foot size and height.
To investigate any relationship between height and foot size in adolescents, aged 7 to 16 years.
The taller a person is, the longer their feet are.
Before starting my investigation, there are a few points I need to clarify.
For example, if my measurements were in different units, would they be suitable to investigate? I need to ensure that both variables are measured in centimetres, so that I can compare them fairly. If I am going to talk about taller or shorter, I need to state what is “tall” and what is “short”. The heights I am using range from 124cm to 174cm, which is a range of 50cm. I will use the midpoint of this range, 149cm, as the dividing line. This means that 149cm and above is “tall”, and anything below 149cm is “short”.
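The dividing line between “tall” and “short” can be checked with a short calculation. This is only a rough sketch in Python; the function name `classify_height` is my own, made up for illustration, and the cut-off is taken to be the middle of the range given above.

```python
# Shortest and tallest heights in the data, in centimetres (from the text above).
MIN_HEIGHT_CM = 124
MAX_HEIGHT_CM = 174

height_range = MAX_HEIGHT_CM - MIN_HEIGHT_CM      # size of the range
midpoint = (MIN_HEIGHT_CM + MAX_HEIGHT_CM) / 2    # middle of the range


def classify_height(height_cm):
    """Label a height as 'tall' or 'short', using the midpoint as the cut-off."""
    return "tall" if height_cm >= midpoint else "short"
```

For example, `classify_height(149)` gives "tall" and `classify_height(148)` gives "short".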
As my population is so large (1600), I need to take a sample of it. To be fair, unbiased and effective, this sample needs to be representative of the whole population. It also needs to be large enough for me to draw any fair conclusions from it. It also must be large enough so I can discount any member of the population who seems anomalous, like someone over 3m tall, as they have probably entered false details.
After taking this into consideration, I have decided to take a systematic sample.
I decided to take the first member of the population, then every 50th member after that, up to the 1551st, giving me 32 pieces of data in my sample.
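The positions picked out by this rule can be listed with a short script. This is a minimal sketch in Python, assuming the population is numbered from 1 to 1600; the variable names are my own.

```python
POPULATION_SIZE = 1600
STEP = 50

# Positions are counted from 1, so the sample is members 1, 51, 101, and so on.
sample_positions = list(range(1, POPULATION_SIZE + 1, STEP))

sample_size = len(sample_positions)
last_position = sample_positions[-1]
```

Here `sample_size` is 32 and `last_position` is 1551, which matches the figures above.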
However, two members of this sample did not answer the “height” question, which is needed for my investigation. So, I have decided not to include these results, taking my sample down to 30. If I were to include these pieces of data, many of my calculations could be wrong and biased. For instance, if I were to calculate the mean, these two results would bring it down considerably, as they would count as zero.
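The effect of the missing answers on the mean can be shown with a small example. This is a sketch in Python only; the heights below are made up for illustration and are not taken from my sample.

```python
from statistics import mean

# Made-up heights in cm, for illustration only. None marks an unanswered question.
heights = [150.0, 140.0, None, 160.0, None]

# Option 1: leave out the students who did not answer.
answered = [h for h in heights if h is not None]

# Option 2: count each unanswered question as a height of zero.
zero_filled = [h if h is not None else 0.0 for h in heights]

mean_answered = mean(answered)          # mean of the real heights only
mean_zero_filled = mean(zero_filled)    # blanks counted as zero pull the mean down
```

With these made-up values, leaving out the blanks gives a mean of 150cm, while counting them as zero gives 90cm, which is not a sensible height for this age group.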
Now that I have obtained my sample, I can go on to start my actual investigation, to determine whether there is a real relationship between height and foot size. I suspect there is.
Comparing height with foot size
The first thing I did to start my investigation was to draw a scatter graph, using the sample I have already taken.
This graph is as follows:
I have also added a line of best fit to this graph. From this line, you can see that there is a positive correlation, although it is not very strong.
I decided to calculate the actual value of the correlation (r²). This came out to be 0.4201, which confirms that the correlation is not very strong.
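A correlation value like this can be worked out by hand or with a short script. The sketch below, in Python, calculates the product-moment correlation coefficient r and its square. The function name `pearson_r` and the data values are my own, made up for illustration; they are not my real sample, so the result will not be 0.4201.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up heights and foot lengths in cm, for illustration only.
heights_cm = [130.0, 142.0, 151.0, 160.0, 168.0]
foot_lengths_cm = [20.0, 23.0, 22.0, 25.0, 24.0]

r = pearson_r(heights_cm, foot_lengths_cm)
r_squared = r ** 2
```

A value of r close to 1 means a strong positive correlation, a value close to 0 means little or no correlation, and a value close to -1 means a strong negative correlation.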
From this graph, it would appear that height and foot size are related, but we cannot rely on this graph to be rock solid evidence, for a number of reasons.
1) This is only a small sample of the total population: 30 out of 1600. For a different section of the population, the correlation could be completely different.
2) The sample includes data from both boys and girls, of a wide range of ages. An 8-year-old is, on average, shorter than a 12-year-old, so this has to be kept in mind.
Looking at the number of boys and girls
My sample contains data from both boys and girls, so I need to look at the sample, to see if the number of boys compared to girls is roughly equal.
This table shows that there are four more girls than boys. This is probably not enough to make a difference to the initial correlation, although it may do, so the sample is slightly biased. I also need to take into account that boys may grow faster, or slower, than girls, which would affect the overall result.
So, I need to change my sample, so it is unbiased. After considering all possibilities, I have decided to only use boys in my second sample.