Tuesday, March 1, 2016

Does HPV have more VAERS events than any other vaccine?

tl;dr - No it doesn't, but here's why this fact is important.

It began with a post someone linked to on vaccineimpact.com, which reads:
There are over 80 vaccines approved for use in the United States. HPV vaccines account for nearly 25% of the entire Vaccine Adverse Event Reporting System (VAERS) database. This is particularly disturbing because the VAERS system was established in 1990 and HPV vaccines were not introduced until mid-2006.

Now this isn't a very widely spread meme - Google only displays about eight sites carrying close-to-verbatim copies of it (some of those might just be mirrors of other anti-vaccination sites).  One of them, rather entertainingly, is a record from the UK House of Lords, where the Countess of Mar uses this completely incorrect factoid.
The Countess of Mar (CB): My Lords, I am grateful to the noble Lord, Lord Patel of Bradford, for bringing this question to our attention this evening. I am afraid that I do not share his enthusiasm or that of my noble friend Lady Gould for HPV vaccines.... In the USA, HPV vaccines account for nearly 25% of the entire Vaccine Adverse Event Reporting System, or VAERS, a system that was established in 1990—and HPV vaccines were not introduced before mid-2006. 
 ...and about another six which reference a similar meme, apparently started by sanevax, claiming that HPV vaccines are responsible for "60% of the entire VAERS database of adverse events".

So what does the data really say?  

See for yourself...

Do we see HPV leading the pack?  Nope! FLU3, the trivalent influenza vaccine, is far and away the leader in VAERS-reported adverse events, but even holding that title it covers only 11.8% of the events.  HPV4 (quadrivalent vaccines like Gardasil) is responsible for a mere 4.2%, and HPV9 (nonavalent vaccines like Gardasil 9) sits at a puny 1%.  There is also an HPVX code in VAERS for when the type of HPV vaccine was unknown; it accounts for just under 1%.  Even adding all of those together puts us at a mere 6%.

But wait! We don't exactly know when that report was written.  Isn't it possible that for some interval between 1990 and 2016 it WAS true?  Well, here's a plot of HPV (HPV4, HPV9 and HPVX) and FLU3 showing their percentage of the total number of VAERS events up to each point in time.

As you can see, the answer is a big, fat NOPE there too!  So not only is this not true now, it has NEVER been true at any point EVER!

So why is this important?

Personally, I'd break conversations with people who are critical of vaccination down into a number of different categories, but two big ones are:

Knowledge gap - Sometimes you simply can't get them to understand something, like how a large number of people can have a serious problem shortly after vaccination without it being related.  People make these kinds of statistical errors all the time; we seem almost designed to take coincidence as a sign.  So to me, it's not unreasonable that some people would make that same kind of mistake with events that happen close to vaccination.

Trust gap - Some people simply can't bring themselves to trust some information.  Sometimes it's information from pharmaceutical companies, or the government.  Sometimes the position is very contrived and they simply don't trust any information that doesn't support their position, but at least the pretense there is about trust.

Here we have an example of something that anyone can understand - literally everyone with a computer has access to the VAERS data and could do a similar (if much less fancy) analysis to the one above.  Not only that, it requires absolutely zero trust (assuming you already trust the VAERS data, which I assume these people do).  You can download the data, put it into a spreadsheet and figure these things out on your own.  IMHO anyone could have fact-checked this in a matter of minutes.
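In case you'd rather script that fact check than use a spreadsheet, here's a minimal sketch in R.  The function name and the toy input below are mine; it just tallies the VAX_TYPE column you'd get from reading the VAERSVAX csv files:

```r
# Share of VAERS adverse-event rows per vaccine type, given the
# VAX_TYPE column from the VAERSVAX csv files.
vaxShare <- function(vax_type) {
  counts <- table(vax_type)
  # Convert raw counts to each type's fraction of all rows, largest first.
  sort(counts / sum(counts), decreasing = TRUE)
}

# Toy example; with the real data, HPV4 comes nowhere near 25%.
vaxShare(c("FLU3", "FLU3", "HPV4", "MMR"))
```

Feed it the concatenated VAX_TYPE columns from all the yearly files and the claim falls apart in one line.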

Yet these claims are still there, and I would defy anyone to try to get them taken down.  I've already made a request to at least one site, which has gone unanswered and unfulfilled.

I'm not sure if this is just a "what we post online is forever" effect or if there's just a culture of expediency around those who are critical of vaccines.

Sources, Reference and Data:

You can get all the VAERS data in easy-to-process CSV files right here.  Annoyingly, it's separated by year, so if you want to work on the entire dataset you're stuck downloading 26 files.

My R code for generating the above charts can be found here.  If you want to use this code yourself, please link back to my blog.  If you have any trouble with it, or want to adapt it, feel free to leave a comment.

# Make sure your working directory is set to wherever you put the VAERSVAX csv files.
# You can set this with setwd("c:/where/you/put/them")
library(ggplot2)

allData <- data.frame()
for (year in 1990:2016) {
  temp <- read.csv(paste(year, "VAERSVAX.csv", sep = ""))
  temp <- cbind(temp[, 1:2], year)
  allData <- rbind(temp, allData)
}

# Each vaccine type's share of all adverse-event rows.
newData <- data.frame(table(allData$VAX_TYPE))
newData <- cbind(newData, newData$Freq / sum(newData$Freq))
names(newData) <- c("Vaccine", "Count", "Frequency")
newData$Vaccine <- factor(newData$Vaccine,
  levels = newData[order(newData$Frequency, decreasing = TRUE), "Vaccine"])
allvac <- ggplot(newData, aes(x = Vaccine, y = Frequency, fill = Vaccine)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 11)) +
  scale_fill_hue(guide = FALSE) +
  labs(x = "Vaccine", y = "Percent of Adverse Events")

# HPV and FLU events as a share of all events reported up to each year.
hpv <- data.frame()
for (testyear in 1990:2016) {
  temp <- allData[allData$year %in% 1990:testyear, ]
  total <- length(unique(temp[, 1]))  # unique report IDs up to testyear
  temprow <- data.frame(vaccine = 'HPV', year = testyear,
    frequency = length(temp[temp$VAX_TYPE %in% c('HPV9', 'HPV4', 'HPVX', 'HPV'), 1]) / total)
  hpv <- rbind(temprow, hpv)
  temprow <- data.frame(vaccine = 'FLU', year = testyear,
    frequency = length(temp[temp$VAX_TYPE %in% c('FLU3', 'FLU4'), 1]) / total)
  hpv <- rbind(hpv, temprow)
}
hpvflu <- ggplot(data = hpv, aes(x = year, y = frequency, group = vaccine, colour = vaccine)) +
  geom_line(size = 4) + geom_point(size = 4, shape = 21, fill = "white")

Tuesday, September 18, 2012

The Math of Khan Part 2 - Going through Khan Academy's Math program in an hour a day

Hours 3,4,5 and 6: Multiplication and Division

This unit was pretty long and somewhat boring, with no real problems with the material - or any accolades, for that matter.  It was pretty much having students grind their way through multiplication and division problems with increasing numbers of digits.  The skill list looks like this:

  1. Basic Multiplication
  2. Multiplying 1-Digit Numbers
  3. Basic Division
  4. Multiplying 3-Digits by 1 Digit
  5. Multiplication with Carrying
  6. 1 Digit Division
  7. Multiplying 3-Digits by 2 Digits
  8. Multi-Digit Multiplication
  9. Multi-Digit Division Without Remainders
  10. Multiplication and Division Word Problems
  11. Division without Remainders
  12. Division with Remainders
  13. Multi-Digit Division
  14. Multiplication and Division Word Problems 2
  15. Multiplying and Dividing Negative Numbers
  16. Decimals on the Number line 1


Generally I found doing 3 x 2 digit multiplication in my head a little difficult, although division wasn't that bad.  For anything more complicated I resorted to scratch paper.  Again I wonder how necessary it is to be tested on large numbers, as the work was more tedious than difficult and I didn't feel like I was applying any extra knowledge.  As I stated before, I don't see how students would view this topic as anything other than a grind.  That said, I don't know how you can avoid this kind of work (other than cutting back a bit on the number of digits), so I'd still rank this topic as:


To give you an idea of the amount of work needed to complete the unit: I answered about 350 questions in 4 hours to achieve mastery here.  So I was averaging about 41 seconds per question, implying that multiplication and division are four or five times harder for me than addition and subtraction.  Who'd a thunk it.
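The back-of-the-envelope version of that comparison, using the roughly 400 questions in an hour from the first session:

```r
# ~350 questions in 4 hours for this unit, versus ~400 questions
# in about an hour for the earlier addition/subtraction session.
mult_div_rate <- (4 * 60 * 60) / 350  # ~41 seconds per question
add_sub_rate  <- (1 * 60 * 60) / 400  # ~9 seconds per question
mult_div_rate / add_sub_rate          # roughly 4.6x slower
```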

Next week I'll start looking at fractions and decimals.

Thursday, September 13, 2012

The Math of Khan Part 1 - Going through Khan Academy's Math program in an hour a day

MOOC - Massive Open Online Course - is the acronym of Summer 2012.  As someone who works in an educational institution, it's rare for a day to go by without someone talking about distance learning of some kind, specifically the "free" kind - whether it's "passive" systems like MIT's OpenCourseWare and iTunes U or "active" systems like Udacity, edX, Coursera and Khan Academy.  What this means is that almost every day someone wants to tell you their implicit presuppositions about education through a comment on how MOOCs are great, horrible, the future of education and/or its demise.

I admit some of the courses certainly look appealing, but as soon as I start thinking about taking one of them, the voices start.  Not the ones that tell me where to hide the bodies of vendors who make mathematical errors so horrible they border on insulting (perhaps that's a subject for another post).  No, the first voice to speak up is my technical criticism: "These things are mostly video, aren't they?  Don't you already loathe how video is massively overused as a delivery medium?"  This is, in fact, true (perhaps also a topic for another post).  I can think of nothing more dull than having to sit through a series of video lectures.

Right after that my "shouldn't you be doing something more productive" voice speaks up and reminds me that I have, at this moment, about 79,642 other endeavors on the go, any of which is probably a better use of my time than taking a course out of pure interest.  This usually talks me out of it, and from there I get back to surfing the web, which for reasons unexplained both voices consider a neutral activity.

Delta, a part-time math teacher over at Angry Math recently reviewed Udacity's statistics 101 course and found it to be somewhere between bad and awful.  Some of these criticisms are squarely leveled at whomever developed the course materials. Others might be more generally applied to the medium as a whole.  Delta's critique started me wondering if there was a way to sample material from a MOOC without taking a whole course and wading through hours of *shudder* video lectures.

Then it struck me: what if I just took the exams and tests?  While this wouldn't help evaluate the teaching material, it *might* be a reasonable proxy for evaluating the skills we can expect students to gain from the course.  From there it was an easy leap to Khan Academy.  Khan Academy has a lot of material aimed at the elementary to high-school curriculum (which means it should be easy), but it's also known for having a pretty sophisticated testing engine and methodology.

How testing works at Khan:

As far as I can see there is no document describing their testing methodology in detail, but here's what I've gathered from my first few hours on their site.  Mathematics is broken up into a series of topics which are displayed in a "knowledge map", giving students an idea of how the topics are interrelated.

For example, addition and subtraction is shown as a prerequisite for multiplication.  Each topic is broken up into a series of skills, which represent the individual elements to be tested on.  To give you an idea, I've reproduced the list of skills for the Addition and Subtraction topic below:

  1. 1 Digit Addition
  2. Number Line 1
  3. Representing Numbers
  4. 2 Digit Addition
  5. 1 Digit Subtraction
  6. Number Line 2
  7. 2 and 3 Digit Subtraction
  8. Number Line 3
  9. Subtraction with Borrowing
  10. Addition with Carrying 
  11. Ordering Negative Numbers
  12. Addition and Subtraction Word Problems
  13. 4 digit Addition with Carrying
  14. 4 digit Subtraction with Borrowing
  15. Adding Negative Numbers
  16. Adding and Subtracting Negative Numbers
  17. Negative Number Word Problems

Each of these skills has at least one kind of test question associated with it.  When a student chooses to practice a topic they are given a "stack" - a series of questions which test their knowledge of some subset of the skills in the topic.  Primarily this will contain skills that the student has yet to reach "mastery" in (or has started to show some need to review); however, it appears that there is also some minimum stack size, so as you master the majority of the topic's skills you will still be tested on some skills you have already mastered.  Alternatively, a student can choose to focus on practicing one specific skill.

Mastery appears to occur when the student has collected enough "leaves" in a particular topic.  Up to three of these "leaves" are awarded for answering a question on a particular topic correctly.  The scoring system appears to go like this: three leaves are awarded for answering the question correctly on the first attempt.  If they answer incorrectly they may continue to attempt the same question; if they then succeed, two leaves are awarded.  I'm not sure what grants you only one leaf, but there is a button on each question page where a student can request a "hint".  Presumably that drops your score.
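As a sketch, the scoring rules as I've pieced them together look something like this - note that the one-leaf-for-a-hint rule is my guess, not something Khan documents:

```r
# Leaves awarded for a single question, per the rules inferred above.
# first_try: answered correctly on the first attempt
# used_hint: student requested a hint (ASSUMPTION: this caps the award at one leaf)
leavesAwarded <- function(first_try, used_hint = FALSE) {
  if (used_hint) return(1)  # assumed: hints drop you to one leaf
  if (first_try) return(3)  # correct on the first attempt
  return(2)                 # correct on a later attempt
}
```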

Anyway, at the end of the "stack" the student's leaves are tallied and their progress in each skill is displayed.  It's worth noting that if students have a lot of trouble with a particular skill they can *lose* mastery of it, in which case the skill shows up as "needing review".  According to the Khan documentation this can also happen after a long period of inactivity.

How I will be blogging about this:

Starting from the "Telling Time" topic, I will do nothing but consecutive testing for a full hour each workday until I achieve mastery in a topic.  Once I do, I will move on to another topic.

Unless specifically instructed otherwise, or provided an opportunity by the Khan software, I will be doing these calculations in my head or on paper.

From there I'll post some highlights about what is needed to pass these tests, some limited commentary on their pedagogy, and perhaps some contrasts with how I teach (or might teach) math to my kids at home.  I'll try to give a "thumbs up", "neutral" or "thumbs down" score in the following two areas:

Accuracy: On one hand I'll be asking myself if the material is ambiguous enough to cause confusion with a related concept.  For example, Delta at Angry Math correctly noted that the Udacity course didn't clearly differentiate between the symbols for the population mean and the sample mean, which is a huge thing.  The other side of that coin is asking whether the student is being taught an overly strict definition that may cause confusion later on, taking into account how far away the related concept is in the student's education.  E.g. teaching someone that they can't take the square root of -1 will be looked at differently if they're five years away from learning about complex numbers than if they're going to learn that in the next topic.

Practicality: This is where I ask, "Is the concept being taught applicable to a real-world situation?"  By "real world" I may mean "in a particular career".  For example, I went through junior high in the aftermath of the New Math, which meant I was taught "thou shalt know how to do math in an arbitrary radix (sometimes called a "base")".  E.g. 0F + 01 = 10 in base 16 (or radix 16).  Knowing how to count in and convert between base 16, base 10, base 8 and base 2 is still fairly important to a career in computer programming, so I'd consider those useful skills.  However, I can remember a couple of classes where we spent time on arbitrary bases like base 12, or doing inter-base multiplication (e.g. what is 423 in base 6 multiplied by 12 in base 3?), which are somewhere between uncommon and non-existent in computing today.
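That base-16 sum is easy to check in R, where strtoi converts a string in a given radix to an integer and sprintf can format the result back as hex:

```r
# 0F + 01 in base 16: convert to integers, add, format back as hex.
x <- strtoi("0F", base = 16L) + strtoi("01", base = 16L)
sprintf("%X", x)  # "10" in base 16, i.e. 16 in base 10
```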

Hour One:  Telling time.

While this isn't strictly a math skill, it is something commonly taught as math to elementary school students.  Khan's tests consist of exactly two kinds of question: i) here is a clock, type the time in a box,

and ii) here is a clock, set it to this time.
It's hard to come up with much in the way of criticism here, but one thing I found a little interesting was how the writer of the courseware focused on the hour hand moving proportionally to the minute hand.  So in the type ii) questions they demand you set the hour hand in pretty close to exact proportion, and the following answer would be considered wrong:
The proportional movement of the hour hand *is* an accurate description of how the vast majority of chronographs work, but it's worth noting some notable exceptions.  Some tower clocks and high-end multi-time-zone watches use a "stepping hour hand" mechanism, where the hour hand jumps from one hour mark to the next (not to be confused with a "jumping hour" mechanism, where the hour hand is replaced by a dial which displays the hour).  So, thinking about my criteria: is this a practical skill?  Absolutely.  Is the topic handling too loose?  Nope.  Too strict?  Only in the mildest possible way.  It's clearly important for a student to be able to read clocks where the hour hand moves proportionally to the minute hand, as this is the predominant form of timepiece; however, this "feature" is simply an artifact of the mechanism the designers used to move the hands.  While it doesn't seem likely, if this *were* to somehow teach students that the configuration of hands on a multi-time-zone watch is incorrect, that would be a bad outcome.  As I said, I doubt that happens much, so I'd still give this unit:


Still Hour One: Addition and Subtraction.

While working through this unit it became clear that Khan's testing software does not employ a purely static set of questions - you get different questions each time you ask for a "stack".  More interestingly, it also doesn't appear to simply be drawing random questions from a large pool of static ones.  What actually appears to be happening is that each question is derived from a template: the basic form of the question, e.g. "Sally has ___ apples and eats ___. How many apples does Sally have left?", is defined, and the blanks are filled in with randomly picked numbers.  The reason I think this is that some of the word problems involve situations one would not readily think up, like sledding at -40 C (or F :-) ).  I suppose it's also possible that the questions are static but were generated from a template; the result in either case is pretty much the same.  It's exceedingly rare in some topics to get the same question twice, which is nice when compared to a lot of online courseware or even textbook test banks.  It will be interesting to see how this holds up with more complicated subjects - e.g. when you do integration by hand, your professor has hand-picked the functions you are integrating because not all are integrable by the same technique.
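A template-based question generator of that sort could be sketched like this - the template text is the apples example above, and the number ranges are my own invention for illustration:

```r
# Fill a word-problem template with randomly drawn numbers,
# returning both the question text and its expected answer.
makeQuestion <- function() {
  have  <- sample(5:20, 1)      # assumed range for the first blank
  eaten <- sample(1:have, 1)    # second blank never exceeds the first
  list(
    text = sprintf("Sally has %d apples and eats %d. How many apples does Sally have left?",
                   have, eaten),
    answer = have - eaten
  )
}
```

Generating from ranges like this means the pool of distinct questions is large enough that repeats are rare, which matches what I saw in practice.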

There's little to complain about or laud here.  "Ordering Negative Numbers" seemed like something of an odd duck compared to the other skills, but I suppose negative numbers might be counter-intuitive for someone just learning them.  Doing four-digit addition and subtraction was a little tedious and IMHO doesn't teach you anything you didn't learn with two and three digits.  That said...


Not Yet Hour Two: Absolute Value.

The main thing to praise here is that the topic is short.  I was taught absolute value in high school, and I even recall having a TA go over the notation in my first calculus tutorial.  The idea that anyone would need much more of a lesson than "throw the minus sign away" is mystifying.  Anyway: one unit, about 15 test questions, finished in under eight minutes.



So that wraps up our first day.  Insofar as Addition, Subtraction, Absolute Value and Telling Time go, Khan Academy doesn't appear to be doing anything crazy.  In case you're interested in exactly how much work it takes to achieve "mastery" in these subjects: I took a look at my "Skill Progress" page and found I had answered just under 400 questions - that's about one every 9 seconds!  Our next unit is multiplication and division, which I expect to take more than a single hour since it involves multi-digit multiplication that I will have to do by hand.  *sigh*

Monday, September 10, 2012

A funny thing happened to me on the way to my blog...

Actually a few things happened to me - maybe a few dozen...dozen...or so - but as we've probably just met I won't bore you with all the details right away.  About two years ago I decided that the internet was completely without any form of blog highlighting skepticism...or at least one focusing on math...or perhaps a math-focused skepticism blog written by me.

So I boldly set up this blog...and then left it alone for a few years.  During that time I'd occasionally observe something interesting and think "I should write about that on my blog!", so I'd set to work writing an article, and then within a day or so I'd say "Actually, that's really not that interesting."  Fast forward to February 2012, and one of two things happened: either I got a really interesting idea, or my standards for calling something "interesting" had dropped so low as to let this one survive being written up.  While I was working on it I had been telling people I knew about my subject - an algorithm for using fonts in a particular way - and I'd usually wind up my anecdote by saying "It wasn't really significant enough to make into a journal article or anything."  Then someone showed me a journal whose call for papers might just encompass what I was writing about.  So I probably couldn't put that on my blog...at least until I get rejected formally.

Well, as life goes on, my actual job got busy enough to sidetrack me from that work, and just as I was about to get around to getting around to organizing myself to write up another post, I got an unexpected piece of email from the UK telling me that my Raspberry Pi had just shipped!  That event set the course of my summer: I went headlong into finding ways to make a smaller, faster Linux distribution and wrestling with profilers and ANSI C to create tighter ARM assembly code.  I was even thinking of posting some of that work here just so I'd have something up.  Enter the last couple of firmware releases, which broke the work I was doing, effectively putting it on hold until I get around to rewriting a substantial part of it.

Today though, I had an idea that was somewhat relevant, math-related and relatively self-contained in its time commitment.  It should be up in the next two days, or years - you never know with me.