I’ve trained a Linear Regression model with R caret. I’m now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(pred, testing$Final) : the data and reference factors must have the same number of levels
EnglishMarks <- read.csv("E:/Subject Wise Data/EnglishMarks.csv", header=TRUE)
inTrain <- createDataPartition(y=EnglishMarks$Final, p=0.7, list=FALSE)
training <- EnglishMarks[inTrain,]
testing <- EnglishMarks[-inTrain,]
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)
modFit <- train(Final~UT1+UT2+HalfYearly+UT3+UT4, method="lm", data=training)
pred <- format(round(predict(modFit, testing)))
confusionMatrix(pred, testing$Final)
The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are given below; they should be the same. Any help would be greatly appreciated, as it's driving me crazy!
> str(pred)
 chr [1:148] "85" "84" "87" "65" "88" "84" "82" "84" "65" "78" "78" "88" "85" "86" "77" ...
> str(testing$Final)
 int [1:148] 88 85 86 70 85 85 79 85 62 77 ...
> levels(pred)
NULL
> levels(testing$Final)
NULL
Run table(pred) and table(testing$Final). You will see that there is at least one number in the testing set that is never predicted (i.e. never present in pred). This is what is meant by “different number of levels”. There is an example of a custom-made function to get around this problem here.
However, I found that this trick works fine:
table(factor(pred, levels=min(test):max(test)), factor(test, levels=min(test):max(test)))
It should give you exactly the same confusion matrix as with the function.
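As a rough illustration of the trick above, here is a small sketch on made-up data. The vectors test and pred are toy stand-ins for testing$Final and pred from the question, and the names never_predicted, never_observed, lv and cm are only illustrative.

```r
# Toy data standing in for the question's vectors (made-up values).
test <- c(88L, 85L, 86L, 70L, 85L)         # observed integer marks
pred <- c("85", "84", "87", "70", "85")    # rounded predictions as strings

# Values that were observed but never predicted, and the reverse.
never_predicted <- setdiff(as.character(test), pred)
never_observed  <- setdiff(pred, as.character(test))

# One shared level set, so both factors end up with identical levels.
lv <- min(test):max(test)
cm <- table(predicted = factor(pred, levels = lv),
            actual    = factor(test, levels = lv))

print(never_predicted)
print(dim(cm))
```

Note that with levels taken from min(test):max(test), any prediction outside the observed range becomes NA and is dropped by table() by default, so it may be safer to build the range from both vectors if that can happen.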
Whenever you try to build a confusion matrix, make sure that both the true values and prediction values are of factor datatype.
Here both pred and testing$Final must be of type factor. Instead of checking the levels, check the type of both variables and convert them to factor if they are not. testing$Final is of type int. Convert it to a factor and then build the confusion matrix.
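A minimal sketch of that conversion, assuming toy vectors in place of pred and testing$Final (the values and the names truth, lv, pred_f and truth_f are made up for illustration). The level set is built from both vectors, which is one way to guarantee matching levels; the caret call is guarded so the snippet still runs where caret is not installed.

```r
# Toy stand-ins for pred and testing$Final (made-up values).
pred  <- c("85", "84", "87", "70", "85")
truth <- c(88L, 85L, 86L, 70L, 85L)

# Build one level set from every value seen in either vector.
lv <- sort(union(as.integer(pred), truth))

pred_f  <- factor(as.integer(pred), levels = lv)
truth_f <- factor(truth, levels = lv)

# Both are now factors with identical levels.
stopifnot(is.factor(pred_f), is.factor(truth_f),
          identical(levels(pred_f), levels(truth_f)))

# caret is optional here; the call is skipped if it is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  cm <- caret::confusionMatrix(data = pred_f, reference = truth_f)
  print(cm$table)
}
```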
I had the same issue. I guess it happened because the data argument was not cast to a factor as I expected. Try:
Hope it helps.
Something like the following seems to work for me. The idea is similar to that of @nayriz:
confusionMatrix(factor(pred, levels = 1:148), factor(testing$Final, levels = 1:148))
The key is to make sure the factor levels match.
I had this problem due to NAs for the target variable in the dataset. If you’re using the tidyverse, you can use the drop_na function to remove rows that contain NAs. Like this:
iris %>% drop_na(Species) # Removes rows where Species column has NA
iris %>% drop_na() # Removes rows where any column has NA
For base R, it might look something like:
iris[!is.na(iris$Species), ] # Removes rows where Species column has NA
na.omit(iris) # Removes rows where any column has NA
You are using regression and trying to generate a confusion matrix. I believe a confusion matrix is meant for classification tasks. For regression, people generally use metrics such as R^2 and RMSE.
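For reference, a small sketch of how those two metrics could be computed in base R. The obs and pred values are made up, and R^2 is taken here as the squared correlation between predictions and observations, which is one common definition rather than the only one. The optional caret::postResample call reports comparable numbers if caret is installed.

```r
# Made-up observed marks and unrounded model predictions.
obs  <- c(88, 85, 86, 70, 85)
pred <- c(85.2, 84.1, 87.3, 68.9, 84.6)

# Root mean squared error.
rmse <- sqrt(mean((obs - pred)^2))

# R^2 as the squared correlation between predictions and observations.
r2 <- cor(pred, obs)^2

print(c(RMSE = rmse, Rsquared = r2))

# Optional: caret's own summary, skipped if caret is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::postResample(pred = pred, obs = obs))
}
```

With the model from the question, this would mean passing predict(modFit, testing) without rounding or formatting, and comparing it against testing$Final.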