Session Two: Statistics/Chart (Behrouz)

In LON-CAPA, we are involved with two kinds of large data sets:

1)    Educational resources such as web pages, demonstrations, simulations, and individualized problems designed for use on homework assignments, quizzes, and examinations;

2)    Information about users who create, modify, assess, or use these resources.

Usually, instructors or course coordinators wish to assess the students' educational progress, or to evaluate the problems that have been presented in the course, after the students have used the educational materials. Two main modules in LON-CAPA provide statistical information for instructors and course coordinators: lonchart.pm and lonstatistics.pm. Once an instructor's access is authorized, he/she can find useful reports about the course, covering its maps, the problems included in each map, and the students who tried to solve the problems. When an instructor selects a course, the remote control offers two buttons for obtaining statistical information: "chart" and "stat".

lonchart.pm

The "chart" button in the remote control calls lonchart.pm, which gives an instructor a quick review of students' tries on the different problems of a course. The instructor can monitor the number of tries of every student on each map and its problems. The number of solved problems in a map is shown at the end of that map in a different color (green). The overall number of solved problems and the total number of problems are shown at the end of each student's line in another color (blue). A sample chart is shown in Fig. 3.2.1.

xxxxxxx1:msu ! 001 ! 1*11*121 8 1231.31423212 12  2111211284.131 13 ...  231113112221 12  162 / 188
xxxxxxx2:msu ! 003 ! 12113162 8 1+11  1  21x11110 11211322246132 14 ... ############    0  149 / 188

1..9: correct by student in 1..9 tries

*: correct by student in more than 9 tries

+: correct by override

-: incorrect by override

.: incorrect attempted

#: ungraded attempted

' ': not attempted

x: excused

Fig. 3.2.1: Chart of maps, problems, and students' tries, with quick statistics of solved problems
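The legend above can be applied mechanically. The following is a minimal Python sketch, not code from lonchart.pm; the names LEGEND, decode_symbol and count_solved are illustrative assumptions. It decodes a single chart character and counts the solved problems in one map, which should reproduce the per-map count printed after each group in the sample chart.

```python
# Legend copied from Fig. 3.2.1; all names here are illustrative, not LON-CAPA identifiers.
LEGEND = {
    "*": "correct in more than 9 tries",
    "+": "correct by override",
    "-": "incorrect by override",
    ".": "incorrect attempted",
    "#": "ungraded attempted",
    " ": "not attempted",
    "x": "excused",
}

def decode_symbol(ch):
    """Return the meaning of one chart character."""
    if ch in "123456789" and ch != "":
        return "correct in %s %s" % (ch, "try" if ch == "1" else "tries")
    return LEGEND.get(ch, "unknown symbol")

def count_solved(cells):
    """Count problems marked correct (digits 1..9, '*' or '+') in one map."""
    return sum(1 for ch in cells if ch in "123456789*+")

print(count_solved("1*11*121"))       # first map of the first sample row
print(count_solved("1231.31423212"))  # second map of the first sample row
```

For the first sample row, the two calls give 8 and 12, which match the counts printed after those maps in the chart.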

1. Once an instructor has loaded the chart on his/her machine, its data is cached locally. If he/she runs the chart again, the course chart is loaded very quickly from the cache.

2. An instructor can sort the chart by user name, by last name, or by the section to which the student belongs.


3. The instructor can select "expired" students (who dropped the course earlier), "active" students, or "any" (all) students.



lonstatistics.pm

The "stat" button in the remote control provides a menu with three options for the instructor:

1)    Problem stats

2)    Problem Analysis

3)    Student Assessment

 

 

Problem Stats

The "Problem Stats" button provides a table [1] with statistical information about every problem, as shown in Fig. 3.2.2. The function ExtractStudentData in the lonstatistics Perl module fetches all the data from a particular student's .db file into a large hash on the local machine. It uses the dump function, which communicates via lonc/lond to get the student data from the student repository (data server); all versions of the student's submissions are then evaluated for every problem. The results are stored in an array in memory. Before the students' tries on a particular problem are computed, the different parts of the problem are distinguished using the metadata that is provided for every problem.

Homework Set 1

P#  Problem Title                #Stdnts  Tries  Mod  Mean  #YES  #yes  %Wrng  DoDiff  S.D.  Skew.  Dis.F. 1st  Dis.F. 2nd
 1  Calculator Skills                256    267    3  1.04   256     0    0.0    0.04   0.2    5.7        0.03        0.00
 2  Numbers                          256    414   17  1.62   255     0    0.4    0.38   1.6    5.7        0.11        0.02
 3  Speed                            256    698   13  2.73   255     0    0.4    0.63   2.2    1.9        0.06        0.02
 4  Perimeter                        256    388    7  1.52   255     0    0.4    0.34   0.9    2.4       -0.00        0.02
 5  Reduce a Fraction                256    315    4  1.23   256     0    0.0    0.19   0.5    2.3        0.01        0.00
 6  Calculating with Fractions       256    393    7  1.54   255     0    0.4    0.35   0.9    2.0        0.15        0.02
 7  Area of a Balloon                254    601   12  2.37   247     0    2.8    0.59   1.8    1.8       -0.05       -0.02
 8  Volume of a Balloon              252    565   11  2.24   243     0    3.6    0.57   1.9    2.0       -0.06       -0.03
 9  Numerical Value of Fraction      256    268    4  1.05   256     0    0.0    0.04   0.2    3.4        0.01        0.00
10  Units                            256   1116   20  4.36   246     0    3.9    0.78   4.2    1.9        0.18        0.03
11  Vector versus Scalar             254    749   11  2.95   251     0    1.2    0.66   2.2    1.1       -0.05       -0.05
12  Adding Vectors                   253   1026   20  4.06   250     0    1.2    0.76   3.6    1.8        0.14        0.00
13  Proximity                        249    663   19  2.66   239     1    3.6    0.64   2.3    2.8        0.11       -0.10

Fig. 3.2.2: Statistics table with general statistics for every problem of the course

Every part of a multi-part problem is treated as a separate problem. Each instance of a multi-instance problem is also considered separately, because a particular problem, or one part of it, might be used in different maps. Finally, the array that holds the computed information for all students is sorted according to the problem order within the homework-set order. At this step the following statistical information can be computed:

1.     #Stdnts:      Total number of students who have looked at the problem (let #Stdnts be equal to n).

2.     Tries:         Total number of tries to solve the problem, $\sum_{i=1}^{n} x_i$, where $x_i$ denotes the number of tries of student $i$.

3.     Mod:         Mode, the maximum number of tries for solving the problem.

4.     Mean:        Average number of tries, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.

5.     #YES:        Number of students who solved the problem correctly.

6.     #yes:          Number of students who solved the problem by override.

Sometimes a student gets a correct answer after talking with the instructor. This type of correct answer is called "correct by override".

7.     %Wrng:    Percentage of students who tried to solve the problem but are still incorrect.

8.     S.D.:          Standard deviation of the students' tries.

9.     Skew.:       Skewness of the students' tries.

10.  DoDiff:      Degree of Difficulty of the problem.

The Degree of Difficulty is always between 0 and 1. It is a useful indicator for an instructor of whether a problem is difficult and how difficult it is. For this reason, the DoDiff of each problem is saved in its metadata.
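The sample values in Fig. 3.2.2 are consistent with simple closed forms for three of the columns. The sketch below is an inference from those sample values, not the code of lonstatistics.pm. It assumes Mean = Tries / #Stdnts, %Wrng = 100 (#Stdnts - #YES - #yes) / #Stdnts, and DoDiff = 1 - (#YES + #yes) / Tries; the function name problem_stats is hypothetical. Under this assumed form, DoDiff stays between 0 and 1 because every correct answer costs at least one try.

```python
def problem_stats(n_students, tries, n_yes, n_override):
    """Recompute Mean, %Wrng and DoDiff for one problem.

    The three formulas are assumptions inferred from the sample table,
    not taken from the LON-CAPA source.
    """
    if n_students <= 0 or tries <= 0:
        raise ValueError("need at least one student and one try")
    solved = n_yes + n_override
    mean = tries / n_students
    pct_wrong = 100.0 * (n_students - solved) / n_students
    do_diff = 1.0 - solved / tries
    return round(mean, 2), round(pct_wrong, 1), round(do_diff, 2)

# Row 2 ("Numbers") and row 13 ("Proximity") of the sample table.
print(problem_stats(256, 414, 255, 0))
print(problem_stats(249, 663, 239, 1))
```

The two calls return (1.62, 0.4, 0.38) and (2.66, 3.6, 0.64), which agree with the Mean, %Wrng and DoDiff entries of those rows.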

11.  Dis.F.:       The Discrimination Factor [2] is a standard measure of how well a problem discriminates between the upper and the lower students. First, all of the students are sorted according to a criterion. Then the upper 27% and the lower 27% of the sorted students are selected. Finally, the Discrimination Factor is obtained from the following difference:

(criterion applied to the upper 27% of students) - (the same criterion applied to the lower 27% of students).

The Discrimination Factor is a number in the interval [-1, 1]. If it is close to 1, only the upper students have solved the problem. If it is close to 0, the upper and lower students are approximately equally successful in solving the problem. If it is negative, the lower students were more successful, and the problem discriminates very poorly between upper and lower students.
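The upper/lower 27% procedure can be sketched as follows. This is only an illustration under stated assumptions: the ranking score is supplied by the caller, and the criterion applied to each group is taken to be the fraction of the group that solved the problem, which keeps the result in [-1, 1]. The actual criteria used by lonstatistics.pm are not reproduced here, and the name discrimination_factor is hypothetical.

```python
def discrimination_factor(students, solved, score, fraction=0.27):
    """Difference in solve rate between the top and bottom groups.

    students: list of student ids
    solved:   dict id -> True/False for the problem under study
    score:    dict id -> number used to rank students (assumed criterion)
    """
    if not students:
        raise ValueError("no students")
    ranked = sorted(students, key=lambda s: score[s], reverse=True)
    k = max(1, int(fraction * len(ranked) + 0.5))  # size of each group
    upper, lower = ranked[:k], ranked[-k:]

    def solve_rate(group):
        return sum(1 for s in group if solved[s]) / len(group)

    return solve_rate(upper) - solve_rate(lower)

ids = list(range(10))
ranking = {s: s for s in ids}            # student 9 ranks highest
only_top = {s: s >= 7 for s in ids}      # only the top three solved it
print(discrimination_factor(ids, only_top, ranking))
```

With ten students, each group holds three students, so the example prints 1.0: every upper student solved the problem and no lower student did.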

In lonstatistics.pm we compute the Discrimination Factor from two criteria:

1st Criterion for Sorting the Students:

2nd Criterion for Sorting the Students:

•       Change the stats table sorting

As shown in Fig. 3.2.2, all headers of the stats table are buttons that change the order of the table. Every column can be sorted in increasing or decreasing order: the user first selects the "ascending" or "descending" option and then clicks the header of the column of interest. When the table is re-sorted this way, all information is shown in a single table in which each row corresponds to a particular problem. If the user selects the first column, "homework set order", the information is shown in separate tables, one per homework set.

•       Graphical chart

Two important columns of this page can also be viewed as graphical charts: the "%wrong" column and the "degree of difficulty of problems" column, as shown in Fig. 3.2.3 and Fig. 3.2.4 for homework set 1 of course PHY183 SS02. These charts are produced dynamically by calling a CGI script (graph.gif) located in /home/httpd/cgi-bin/.

Fig. 3.2.3: Degree of difficulty graph

Fig. 3.2.4: %Wrong graph

 

Problem Analysis

Conceptual option response problems, in which each student is given several randomly assigned concepts, are more difficult than simple numerical problems. Instructors usually want to see the students' tries for each concept separately. The "Problem Analysis" button lists all option response problems in one table, as shown in Fig. 3.2.5.

Total number of students: 263

Select number of intervals

Option Response Problems in course PHY183 SS02:

#   Problem Title                 Resource Address
1   Numbers                       /res/msu/physicslib/msuphysicslib/01_Math_1/msu-prob10.problem
2   Speed                         /res/msu/physicslib/msuphysicslib/03_Units_Scaling/msu-prob22.problem
3   Units                         /res/msu/physicslib/msuphysicslib/03_Units_Scaling/msu-prob17.problem
4   Vector versus Scalar          /res/msu/physicslib/msuphysicslib/06_Vectors_Scalars/msu-prob07.problem
5   Adding Vectors                /res/msu/physicslib/msuphysicslib/06_Vectors_Scalars/msu-prob10.problem
6   Traveling Car                 /res/msu/physicslib/msuphysicslib/05_1D_Motion/msu-prob16.problem
7   Atwood Machine                /res/msu/kashy/Testing/randomlabel/atwood3T2M.problem
8   Sliding mass concepts         /res/msu/physicslib/msuphysicslib/10_Motion_W_Friction/msu-prob32.problem
9   Work, Power, Energy Concepts  /res/msu/physicslib/msuphysicslib/12_Work_Power_Energy/msu-prob27.problem
10  Bead on a Wire                /res/msu/physicslib/msuphysicslib/13_EnergyConservation/msu-prob32.problem
11  Atwood Machine                /res/msu/physicslib/msuphysicslib/20_Rot2_E_Trq_Accel/msu-prob23.problem
12  Flinstone Bowling             /res/msu/physicslib/msuphysicslib/21_Rot3_AngMom_Roll/msu-prob38.problem
13  Boat on Pond                  /res/msu/physicslib/msuphysicslib/32_Fluids1_Pascal_Arch/msu-prob12.problem
... ...                           ...

Fig. 3.2.5: Option response problems in course PHY183 SS02

Fig. 3.2.5 shows a table with the title of every option response problem in the first column; the title links to the original HTML page of the problem. The second column shows the source address of the problem. The third column contains a button for analyzing the students' data on that particular problem. When this button is clicked, all data about the problem is restored for every student, and the different versions of the students' submissions are evaluated. The results are presented in a graphical chart as well as a numerical table. For example, consider the analysis of /res/msu/kashy/Testing/randomlabel/atwood3T2M.problem, whose text reads:

A frictionless, massless pulley is attached to the ceiling, in a gravity field g. Mass Ma is greater than mass Mb. The tensions Tx,Ty, Tz, and the constant g are magnitudes. (For each, select: Greater than, Less than, Equal to, True, or False)

 

 
 


Fig. 3.2.6: Atwood Machine option response problem in HW3

Fig. 3.2.7: Graphical chart of student tries for the Atwood Machine problem according to every concept in one time interval.

In addition, the data on the students' tries is shown in a table, as in Fig. 3.2.8. The last row of the table gives the time interval covered by the data and the overall numbers of correct and wrong answers. If an instructor wants to see the students' tries over several time intervals, he/she can set the number of intervals (from 1 to 7) and then recompute the analysis.

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    1342     433
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  585      1190
3  The top tension is equals the two bottom tensions. (massless pulley)                         1263     512
4  Tension holding the two masses are equal if mass of pulley=0                                 1087     688
5  Sub-System accelerates upwards or downwards accordingly                                      757      1018
6  Center of mass accelerates downward                                                          1354     421
   From:[Thu Jan 24 00:46:22 2002] To: [Mon Feb 4 23:59:59 2002]                                6388     4245

Fig. 3.2.8: Table of student tries for Atwood Machine Problem according to every concept in one time interval.

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    124      98
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  44       178
3  The top tension is equals the two bottom tensions. (massless pulley)                         142      80
4  Tension holding the two masses are equal if mass of pulley=0                                 125      97
5  Sub-System accelerates upwards or downwards accordingly                                      64       158
6  Center of mass accelerates downward                                                          151      71
   From:[Thu Jan 24 00:46:22 2002] To: [Wed Jan 30 00:23:10 2002]                               650      682

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    1218     335
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  541      1012
3  The top tension is equals the two bottom tensions. (massless pulley)                         1121     432
4  Tension holding the two masses are equal if mass of pulley=0                                 962      591
5  Sub-System accelerates upwards or downwards accordingly                                      693      860
6  Center of mass accelerates downward                                                          1203     350
   From:[Wed Jan 30 00:23:11 2002] To: [Mon Feb 4 23:59:59 2002]                                5738     3563

Fig. 3.2.9: Tables of student tries for the Atwood Machine problem according to every concept in two time intervals.

In Fig. 3.2.9, the tables of students' tries and the graphical chart are shown for two different time intervals. An instructor can check whether the students gave more wrong answers during the first days after the homework set opened, and how many students tried during the first or the second interval. Since the problems are individualized, he/she may also be able to see how many students tried to solve the problem after communicating with each other and understanding the concept. In Fig. 3.2.10 the charts and tables of students' tries are shown for three time intervals. If the homework is to be done in one week, an instructor can observe the distribution of students' tries for every day separately by choosing 7 time intervals.

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    31       30
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  8        53
3  The top tension is equals the two bottom tensions. (massless pulley)                         44       17
4  Tension holding the two masses are equal if mass of pulley=0                                 32       29
5  Sub-System accelerates upwards or downwards accordingly                                      20       41
6  Center of mass accelerates downward                                                          42       19
   From:[Thu Jan 24 00:46:22 2002] To: [Mon Jan 28 00:30:53 2002]                               177      189

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    692      257
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  321      628
3  The top tension is equals the two bottom tensions. (massless pulley)                         690      259
4  Tension holding the two masses are equal if mass of pulley=0                                 590      359
5  Sub-System accelerates upwards or downwards accordingly                                      399      550
6  Center of mass accelerates downward                                                          703      246
   From:[Mon Jan 28 00:30:54 2002] To: [Fri Feb 1 00:15:25 2002]                                3395     2281

#  Concept                                                                                      Correct  Wrong
1  Two masses have same acceleration if the two the string does not stretch.                    619      147
2  Weight of the two masses is greater than the tension of the string attached to the ceiling.  256      510
3  The top tension is equals the two bottom tensions. (massless pulley)                         529      237
4  Tension holding the two masses are equal if mass of pulley=0                                 465      301
5  Sub-System accelerates upwards or downwards accordingly                                      338      428
6  Center of mass accelerates downward                                                          609      157
   From:[Fri Feb 1 00:15:26 2002] To: [Mon Feb 4 23:59:59 2002]                                 2816     1775

Fig. 3.2.10: Tables of student tries for the Atwood Machine problem according to every concept in three time intervals.
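The interval boundaries printed in the tables above are consistent, to within a second, with splitting the whole period into equal-width intervals. The following Python sketch illustrates that reading; it is an assumption drawn from the printed timestamps, not the code of lonstatistics.pm, and the names interval_bounds and count_by_interval are hypothetical.

```python
from datetime import datetime

def interval_bounds(start, end, k):
    """Split [start, end] into k equal-width (from, to) pairs."""
    if k < 1 or end <= start:
        raise ValueError("need k >= 1 and end after start")
    width = (end - start) / k
    return [(start + i * width, start + (i + 1) * width) for i in range(k)]

def count_by_interval(submissions, start, end, k):
    """Count correct and wrong submissions per interval.

    submissions: iterable of (timestamp, is_correct) pairs.
    Returns one [correct, wrong] pair per interval.
    """
    bounds = interval_bounds(start, end, k)
    width = (end - start) / k
    counts = [[0, 0] for _ in bounds]
    for when, is_correct in submissions:
        if start <= when <= end:
            index = min(int((when - start) / width), k - 1)
            counts[index][0 if is_correct else 1] += 1
    return counts

opened = datetime(2002, 1, 24, 0, 46, 22)
closed = datetime(2002, 2, 4, 23, 59, 59)
for begin, finish in interval_bounds(opened, closed, 2):
    print(begin, "to", finish)
```

With two intervals, the computed boundary falls at 00:23:10.5 on 30 January 2002, between the "To" time of the first table and the "From" time of the second table in Fig. 3.2.9.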

Student Assessment

This option provides reports about the current educational situation of every student, as shown in Fig. 3.2.11. A 'Y' shows that the student has solved the problem and an 'N' shows that he/she has not. A '-' denotes an unattempted problem. The numbers in the right-hand column show the student's total number of tries on the corresponding problems.

Total number of students : 263


Select   Map        
Select Section
Select Student

#   Set Title                         Results              Tries
1   msu/mmp/phy183.sequence
2   msu/mmp/kap1/calckap1.sequence    YYYYYYYYYYYYY        1,1,1,2,1,1,1,1,1,7,9,3,2
3   msu/mmp/kap2/calckap2.sequence    YYNYYYYNNYYYYYYYY    10,1,0,1,1,2,1,4,5,1,1,1,3,2,1,1,1
4   msu/mmp/kap3/calckap3.sequence    YYYYYYYYNYYYYYYYYYY  4,3,5,1,1,8,2,3,20,1,1,1,1,1,2,3,2,2,3
5   msu/mmp/kap4/calckap4.sequence    NYYYYYYYYYYYYYYY     20,1,1,1,3,3,2,3,4,2,3,2,1,1,2,5
6   msu/mmp/kap5/calckap5.sequence    YYYYYYYYYYY-YY       5,2,1,9,12,1,3,12,1,2,1,,1,3
7   msu/mmp/kap6/calckap6.sequence    YYYYYYYYYYYYYY       3,2,4,2,1,1,2,1,1,9,2,3,2,2
8   msu/mmp/kap7/calckap7.sequence    YYYYYYYYYYYYYYYYYY   4,1,3,1,10,4,1,1,2,1,1,2,1,2,1,3,1,3
9   msu/mmp/kap8/calckap8.sequence    YYYYYYYYYYYYYYY      4,3,1,2,3,3,4,3,3,1,1,4,1,1,7
10  msu/mmp/kap9/calckap9.sequence    YYYYN-NYYNY          1,1,1,2,1,,2,2,1,6,4
11  msu/mmp/kap10/calckap10.sequence  YYYYYYYYYYYY         2,1,1,1,1,1,1,1,1,1,1,1
12  msu/mmp/kap11/calckap11.sequence  -----------------    0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

Fig. 3.2.11: A sample of a student's homework results and tries
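The Results and Tries strings in the sample above are easy to summarize per homework set. The sketch below is a minimal illustration, assuming that an empty entry in the Tries list means zero tries; the function name summarize_set is hypothetical and is not part of lonstatistics.pm.

```python
def summarize_set(results, tries):
    """Summarize one row of the student assessment table.

    results: string of 'Y', 'N' and '-' characters, one per problem
    tries:   comma-separated tries, one entry per problem (may be empty)
    """
    tries_per_problem = [int(t) if t.strip() else 0 for t in tries.split(",")]
    if len(tries_per_problem) != len(results):
        raise ValueError("results and tries describe different numbers of problems")
    return {
        "solved": results.count("Y"),
        "failed": results.count("N"),
        "unattempted": results.count("-"),
        "total_tries": sum(tries_per_problem),
    }

# Row 10 of the sample table (msu/mmp/kap9/calckap9.sequence).
print(summarize_set("YYYYN-NYYNY", "1,1,1,2,1,,2,2,1,6,4"))
```

For that row the summary is 7 solved, 3 failed and 1 unattempted problem, with 21 tries in total.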

We plan to present several other reports on this page as well, including some student classification reports.

On the "problem stats" and "student assessment" pages, an instructor can limit the information to a particular section or a particular homework set. An instructor can therefore restrict the statistics table, or the student tries in the "student assessment" table, to a particular map or section. After changing the map or section, he/she can see the results by recalculating on that page.

Future work: using activity.log to classify the students

The question is whether we can find good features for classifying students. If so, we would be able to identify a predictor for any individual student after he/she has done a couple of homework sets. With this information, we would be able to help a student use the resources better. As a first step in data mining, we want to make an initial effort to classify the students.

Preprocessing the data, finding the useful student data, and segmenting it may be a difficult task. Internally, one part of this data is stored in a student directory:

 /home/httpd/lonUsers/domain/1st.char/2nd.char/3rd.char/username/

For example: /home/httpd/lonUsers/msu/m/i/n/minaeibi/
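The directory layout above can be written as a small helper. This is a sketch based only on the pattern and the example shown; the function name user_directory is hypothetical, and the behaviour for user names shorter than three characters is not described in this document, so the sketch rejects them.

```python
def user_directory(domain, username, root="/home/httpd/lonUsers"):
    """Build the student directory path from the domain and user name."""
    if len(username) < 3:
        # The text does not say how short names are handled.
        raise ValueError("user name needs at least three characters")
    first, second, third = username[0], username[1], username[2]
    return "%s/%s/%s/%s/%s/%s/" % (root, domain, first, second, third, username)

print(user_directory("msu", "minaeibi"))
```

The call reproduces the example path /home/httpd/lonUsers/msu/m/i/n/minaeibi/.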

Since the spring semester of 2002, LON-CAPA has logged every activity of every student who has used online educational resources, recording their paths through the web. So another part of the data is stored in "activity.log", which is located in the course directory.

The student data is restored from the .db files in the student directory and fetched into a hash table. The special hash keys "keys", "version" and "timestamp" are evaluated from the hash. The version is equal to the total number of versions of the data that have been stored. The timestamp attribute is the UNIX time at which the data was stored. keys is available in every historical section and lists which keys were added or changed at a specific historical revision of the hash. We extract the features from structured homework data, which is stored under particular URLs. For example, the result of a student's attempt at a homework problem can be extracted from resource.partid.solved, the total number of the student's tries on the problem can be extracted from resource.partid.tries, and so forth.

Every entry stored in activity.log includes the user name, the time, and the resource URL. We can divide these data into six types of URLs, listed below according to their importance for data mining:

1. Problems, which are the most useful data, e.g. msu/mmp/kap14/kap14.sequence___33___msu/mmp/kap14/problems/cd418a.problem
2. HTML pages that are linked from the problems, e.g. msu/mmp/kap14/kap14.sequence___5___msu/mmp/kap14/cd396.htm
3. Images that are loaded in the above HTML pages, e.g. /res/msu/mmp/kap14/picts/backsoun.gif
4. LON-CAPA routines, e.g. /adm/navmaps, /adm/roles, or /adm/logout
5. Data posted by students, e.g. resource.0.11.submission=27.11
6. Remote control GIF files, e.g. /res/adm/pages/v.gif
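The six types listed above can be told apart from the entry text alone. The rules in the following sketch are assumptions derived from the examples in the list (path prefixes and file extensions), not rules taken from LON-CAPA; the function name classify_entry is hypothetical.

```python
def classify_entry(entry):
    """Assign a logged entry to one of the six types listed above."""
    if "=" in entry and "/" not in entry:
        return "posted data"                 # e.g. resource.0.11.submission=27.11
    if entry.startswith("/adm/"):
        return "loncapa routine"             # e.g. /adm/navmaps
    if entry.startswith("/res/adm/pages/") and entry.endswith(".gif"):
        return "remote control gif"          # e.g. /res/adm/pages/v.gif
    if entry.endswith(".problem"):
        return "problem"
    if entry.endswith((".htm", ".html")):
        return "html page"
    if entry.endswith((".gif", ".jpg")):
        return "image"
    return "other"

for sample in ("/adm/navmaps",
               "/res/adm/pages/v.gif",
               "/res/msu/mmp/kap14/picts/backsoun.gif",
               "resource.0.11.submission=27.11"):
    print(sample, "->", classify_entry(sample))
```

The remote control check is placed before the generic image check because both kinds of entry end in .gif.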

Consequently, activity.log grows quickly as students access the educational resources more often. A sample of the different types of data logged in activity.log, after a preprocessing phase, is shown below:

144) 1010955846: studentX --> /adm/navmaps
145) 1010955205: studentX --> /res/msu/mmp/kap14/picts/beta_eqn.gif
146) 1010955685: studentX --> /adm/navmaps
147) 1010955988: studentX --> /adm/navmaps
148) 1010955998: studentX --> msu/mmp/kap14/kap14.sequence___5___msu/mmp/kap14/cd396.htm
149) 1010955999: studentX --> /res/msu/mmp/kap14/picts/velocity_eqn3.gif
150) 1010956000: studentX --> /res/msu/mmp/kap14/picts/time_eqn.gif
151) 1010954609: studentX --> /res/adm/pages/grds.gif
152) 1010954611: studentX --> /res/msu/mmp/wordproc.gif
153) 1010954626: studentX --> /res/adm/pages/i.gif
154) 1010955717: studentX --> msu/mmp/kap14/kap14.sequence___1___msu/mmp/kap14/cd392.htm
155) 1010955717: studentX --> /res/msu/mmp/kap14/picts/backsoun.gif
156) 1010955920: studentX --> msu/mmp/kap14/kap14.sequence___3___msu/mmp/kap14/cd394.htm
157) 1010955921: studentX --> /res/msu/mmp/gifs/demo.gif
158) 1010956113: studentX --> /res/adm/pages/v.gif
159) 1010954629: studentX --> /res/adm/pages/eval.gif
160) 1010954631: studentX --> /res/adm/pages/back.gif
161) 1010954632: studentX --> /res/adm/pages/b.gif
162) 1010954633: studentX --> /res/adm/pages/r.gif
163) 1010955754: studentX --> msu/mmp/kap14/kap14.sequence___2___msu/mmp/kap14/cd393.htm
164) 1010955756: studentX --> /res/msu/mmp/kap14/picts/asound.jpg

165) 1010955762: studentX --> /res/msu/mmp/kap14/picts/sensor.jpg
166) 1010955999: studentX --> /res/msu/mmp/gifs2/example.gif
173) 1010955687: studentX --> /res/adm/pages/u.gif
174) 1010955688: studentX --> /res/adm/pages/s.gif
175) 1010955688: studentX --> /res/adm/pages/e.gif
176) 1010956528: studentX --> msu/mmp/kap14/kap14.sequence___33___msu/mmp/kap14/problems/cd418a.problem
177) Sent data
178) 1010956536: studentX --> msu/mmp/kap14/kap14.sequence___33___msu/mmp/kap14/problems/cd418a.problem
179) Sent data HWVAL11=27.11
180) 1010956536: studentX --> msu/mmp/kap14/kap14.sequence___33___msu/mmp/kap14/problems/cd418a.problem
181) Sent data resource.0.11.submission=27.11
182) 1010956702: studentX --> msu/mmp/kap14/kap14.sequence___33___msu/mmp/kap14/problems/cd418a.problem
183) Sent data
184) 1010955921: studentX --> /res/msu/mmp/kap14/picts/areal.gif

185) 1010955731: studentX --> /res/adm/pages/n.gif
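Lines in the preprocessed sample above follow two shapes: an access line ("index) time: user --> URL") and a "Sent data" line. The sketch below parses both. It is based only on the sample shown, and it treats the number between the two "___" separators as a position inside the sequence, which is an assumption; the names ACCESS, SENT and parse_line are hypothetical.

```python
import re

ACCESS = re.compile(r"^(\d+)\)\s+(\d+):\s+(\S+)\s+-->\s+(\S+)$")
SENT = re.compile(r"^(\d+)\)\s+Sent data\s*(.*)$")

def parse_line(line):
    """Parse one preprocessed log line into a dict, or return None."""
    line = line.strip()
    match = ACCESS.match(line)
    if match:
        url = match.group(4)
        record = {"kind": "access", "index": int(match.group(1)),
                  "time": int(match.group(2)), "user": match.group(3), "url": url}
        parts = url.split("___")
        if len(parts) == 3:  # sequence___position___resource (assumed meaning)
            record["sequence"] = parts[0]
            record["position"] = int(parts[1])
            record["resource"] = parts[2]
        return record
    match = SENT.match(line)
    if match:
        return {"kind": "sent", "index": int(match.group(1)), "data": match.group(2)}
    return None

print(parse_line("144) 1010955846: studentX --> /adm/navmaps"))
print(parse_line("181) Sent data resource.0.11.submission=27.11"))
```

Keeping the time as an integer makes it easy to sort a student's entries chronologically, since the sample is not listed in time order.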

•       Feature Extraction

From these two types of student data stored by the LON-CAPA system, the following features may be considered for classifying the students:

1.     Total number of correct answers.

2.     Total number of tries for doing homework.

3.     Time at which the student got the problem correct. Usually better students get the homework completed earlier.

4.     Reading the material before attempting the homework vs. attempting it first and then reading up on it.

5.     Submitting a lot of attempts in a short amount of time without looking up material in between vs. giving it one try, reading up, submitting another one, etc.

6.     Getting the problem right on the first try vs. needing a high number of tries.

7.     Giving up on a problem vs. continuing to try up to the deadline.

8.     Participating in the communication mechanisms, vs. those working alone.
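Some of the features above can be computed directly from per-problem records and submission times. The sketch below covers features 1, 2 and 6, plus a simple proxy for feature 5. The record layout, the 60-second threshold and the names basic_features and rapid_resubmissions are assumptions made for illustration, not part of LON-CAPA.

```python
def basic_features(records):
    """records: list of (solved, tries) pairs, one per problem."""
    return {
        "total_correct": sum(1 for solved, _ in records if solved),
        "total_tries": sum(tries for _, tries in records),
        "first_try_correct": sum(1 for solved, tries in records if solved and tries == 1),
    }

def rapid_resubmissions(times, max_gap=60):
    """Count consecutive submissions at most max_gap seconds apart.

    times: UNIX timestamps of one student's submissions to one problem.
    """
    ordered = sorted(times)
    return sum(1 for earlier, later in zip(ordered, ordered[1:])
               if later - earlier <= max_gap)

print(basic_features([(True, 1), (True, 3), (False, 5), (False, 0)]))
print(rapid_resubmissions([1010956528, 1010956536, 1010956702]))
```

The timestamps in the second call are the three distinct access times logged for problem cd418a in the activity.log sample; only the first pair is less than a minute apart.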


It might be interesting to group students by the time of their first log-on (beginning of the assignment, middle of the week, last minute) and to correlate this with the number of tries or the number of solved problems. A student who gets all answers correct will not necessarily be in the successful group if he/she took an average of 5 tries per problem, but this should be verified by this research.

We hope to find similar patterns of use in the data gathered from LON-CAPA, and eventually be able to make predictions as to the most-beneficial course of studies for each learner based on a limited number of variables for each individual student. Based on the current state of the learner in a learning sequence, the system could then make suggestions to the learner as to how to proceed.




[1] To export the statistics table data to Excel, the instructor can select the checkbox "Output CSV format" at the top of the statistics table.

[2] This name was obtained from the administration office of Michigan State University, where it is used for evaluating exam problems. Here we have extended the term to homework problems as well.