Students were first introduced to the idea through a few examples, including the 1854 Broad Street cholera outbreak. They were then introduced to the term data science and asked what they thought data was and what science was. We followed up with more personal examples they could relate to. Cholera is probably not so relatable, but predicting one’s monthly expenditure and then asking for an allowance accordingly was a good example. My personal aim was for them to understand the data-driven approach to decision making, which they could adopt in their daily lives without necessarily making a career out of it. I’ve personally seen my decision making change since I started working with data at Aspiring Minds and elsewhere. A data-driven approach nurtures an organized and structured way of thinking, which I think is a very valuable skill. So we talked about a few examples from their lives and emphasized how they could start keeping records of their personal lives and be more organized.
You can read about the basic structure of the experiment here.
There was a mentor for every two students, and I was mentoring Tanish (right) and Vaibhav (left), who were in 5th and 8th grade respectively.
Above: Vaibhav (left) and Tanish (right) donning the personalized data-flavoured ID cards with poems and TILs. Click here to read the story behind making them.
The initial training set was built by having the students rate flashcards based on whether or not they would befriend a particular person. The features visible to them were the person’s name, the person’s face and an activity the person liked.
Tanish, the fifth grader, was not really looking at the names at all.
‘I don’t look at names before making friends,’ said Tanish, so innocently.
He was probably only concentrating on the activities.
I asked him why he hadn’t given a high rating on the befriend scale to a person who boxed, and he said,
‘What if he punches me?’
But then he contradicted himself by giving a high rating to a person who knew Judo. In his defence he said, ‘Judo is okay. I am good at Judo.’
He had trouble understanding a few activities on the flashcards, like calligraphy and snooker, which was understandable. Maybe easier words could be used if we plan to focus on his age group in the future.
Vaibhav was done with his grading pretty quickly. He wanted to know what I did at work and what sort of research I worked on. I was a little stumped by the question. How do I explain it to an 8th grader? But before I could form an answer, Tanish was done with grading his cards too, so I decided to tell Vaibhav about my work later. If time didn’t permit, he could always ping me on email.
After the grading was done, we removed the last few sheets and kept them in an envelope to make the test split. I told them we wouldn’t be looking at them initially, to which Tanish replied, ‘Why? Are they personal?’
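For readers who want to see the envelope trick in code, here is a minimal Python sketch of holding out the last few graded cards as a test split. The function name `holdout_split`, the card IDs and the split size are all assumptions made for illustration; the post does not say how many cards each student graded or how many went into the envelope.

```python
def holdout_split(cards, n_test):
    """Set aside the last n_test cards as the test split (the "envelope")."""
    if not 0 < n_test < len(cards):
        raise ValueError("n_test must be between 1 and len(cards) - 1")
    return cards[:-n_test], cards[-n_test:]


# Hypothetical card IDs; the real stack size is not stated in the post.
cards = list(range(1, 11))
train, test = holdout_split(cards, n_test=3)

print(train)  # cards we are allowed to look at
print(test)   # cards that stay in the envelope until the end
```

The point is the same one I made to the kids: the test cards are not personal, they are just kept unseen so that any pattern found in the training cards can be checked honestly later.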
One thing to notice here was that Vaibhav took hardly a second to grade a flashcard, but Tanish took more than a few seconds. Obviously age and maturity had an effect on it, but could it also mean that Tanish was actually spending time thinking about the people portrayed in the flashcards and capturing the underlying features, while Vaibhav was just quickly reading through the activities and grading based on that alone?
The next part was the tough part. Varun was trying to teach them the intuition behind using the gradings to draw inferences. Vaibhav could come up with a rough idea of how it could be done and how we could analyze the people who had been graded higher to figure out preferences. The other students did catch up and understood the concept. To be honest, it is a very basic and simple concept, but thinking of it yourself for the first time is a little tough. The briefing ended with an exchange of datasets, and the students dispersed to their mentors again.
Once back from the briefing, we started by analyzing the overall grading of the student whose data was given to us. Let’s call the person foo for anonymity reasons. I explained to them that to know how friendly foo was, we would need to know how many cards foo had given each grade. Tanish instinctively grabbed his paper and pen to start counting. This was a good opportunity to show him the usefulness of Excel, which he had never used before. Surprisingly, even Tanish knew what bar graphs were. I told them to draw the graphs they expected with pen and paper. Tanish asked me what he should make the graph about, to which I replied that he should make it about foo’s friendliness, given that the counts in each category were in front of us. He got confused. He knew how to draw bar graphs but couldn’t see how this information could be portrayed in one. I told him to draw any bar graph he wanted and later showed him how it could be transformed into a plot of our information. Vaibhav drew exactly the plot that was required and even made sure his units were correct! After the paper graphs, I showed them how Excel drew them in 2 clicks, and they were certainly amused.
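The counting Tanish reached for with pen and paper can also be sketched in a few lines of Python. This is only an illustration: the grades below are invented, the 1 to 5 scale is my assumption, and `text_bar_chart` is a hypothetical helper rather than anything we used at the camp (we used Excel).

```python
from collections import Counter

# Invented grades on an assumed 1-5 scale; this is NOT foo's real data.
grades = [1] * 5 + [2] * 6 + [3] * 8 + [4] * 5 + [5] * 4

# Count how many cards received each grade.
counts = Counter(grades)


def text_bar_chart(counts, scale=range(1, 6)):
    """Draw one row of '#' marks per grade, like a sideways bar graph."""
    rows = []
    for grade in scale:
        n = counts.get(grade, 0)
        rows.append(f"{grade} | {'#' * n} ({n})")
    return "\n".join(rows)


print(text_bar_chart(counts))
```

Each row is one bar, so the tallest bar is simply the longest row. That is essentially the transformation I showed Tanish on paper: a list of grades becomes counts, and counts become bars.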
We tried to analyze the plot and draw some inferences about foo from it. There were 11 grades below 3 and 9 grades above 3. Both kids agreed that foo was not really a friendly person, even though the two categories differed by only 2. They didn’t really take into consideration how significant that difference between the two counts was.
They couldn’t comment on the fact that the middle bar was the highest, or on what it could possibly mean. Did it mean foo was not a quick decision maker and was generally confused?
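One way to put a number on how small a gap of 11 versus 9 really is: a two-sided sign test, sketched below. This is my own after-the-fact illustration, not something we did at the camp. It assumes each of the 20 non-neutral cards was equally likely to land on either side of 3, and the function name `two_sided_sign_test` is made up for this example.

```python
from math import comb


def two_sided_sign_test(low, high):
    """Chance of a split at least this lopsided if every card were a fair coin flip."""
    n = low + high
    k = max(low, high)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


p = two_sided_sign_test(low=11, high=9)
print(f"{p:.2f}")  # prints 0.82
```

Under that assumption, a split at least as uneven as 11 to 9 would show up about 82% of the time purely by chance, so the counts alone give little reason to call foo unfriendly.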
We later converted the grade variable into a binary variable for further plots. One has to be really careful with the terms one uses in front of kids. The moment I used the word binary, all hell broke loose. But explaining it in simpler terms restored order. Even while explaining concepts, it is quite easy to end up using words that are common in one’s own day-to-day life but are not really in the kids’ vocabulary. We plotted the number of male friends and female friends based on the new binary feature.
Tanish wanted to call this plot a donut chart. We did similar plots for the other three features. Again, it was a little difficult to explain the significance of differences in percentages and counts. A difference of one or two was not considered significant, keeping in mind that the total number of friends was small.
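The binary conversion and the friends-per-category count can be sketched like this. The threshold is an assumption (I am treating grades above 3 as "would befriend"; the post does not state the exact cut-off), and the cards below are invented purely for illustration.

```python
from collections import Counter

FRIEND_THRESHOLD = 3  # assumption: a grade above 3 means "would befriend"


def to_binary(grade, threshold=FRIEND_THRESHOLD):
    """Turn a grade into a yes/no answer: friend or not."""
    return grade > threshold


# Invented (gender, grade) pairs; this is NOT foo's real data.
cards = [
    ("male", 5), ("female", 2), ("male", 4),
    ("female", 4), ("male", 1), ("female", 3),
]

# Count friends per gender using the yes/no feature.
friends_by_gender = Counter(
    gender for gender, grade in cards if to_binary(grade)
)

print(dict(friends_by_gender))
```

The same pattern works for any other feature on the flashcards: swap the first element of each pair for that feature's value and count again.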
We couldn’t move ahead to the ‘naiver’ naive Bayes model that we had prepared for them because of the lack of time. Varun gave a good closing talk, which put things back into perspective.
I wish we had more time and we could have finished the entire exercise. I wish I had interacted with more children. I tried my best to tell them how they could use simple recording of stats in their day to day life. We talked about grades, pocket money etc. It’s just an approach to organizing life and aids in decision making, and that is personally the only thing I wanted them to take home from this camp.
I asked them how they would be using what they learned today.
Vaibhav - ‘I would like to predict my spending. I would like to predict what subjects I study most, how frequently I study a specific subject.’
Tanish - ‘Exams. I would take my exam results, from the report card of every year. And then I will make it on excel and then I will remember the grades and the one I get more grades I will take a gift.’
Using data to justify gift requests to parents. We’ve got a genius!
It was a really enriching experience, and I hope we can do a follow-up session soon!
For science :)