Beware of floats
I am currently volunteering as an instructor to teach young women about Data & SQL. Right in the first module we covered core data types as shown here:
For this blog post I’d like to write about FLOAT and DECIMAL and why in most cases I think that DECIMAL is a much better choice.
According to the MySQL documentation float (and double) represent
approximate numeric data values
This is the reason where the general problem comes from. But what on earth does that mean?
If you use a float you can’t store the value exactly. Let’s take a monetary value as an example. We have a purchase price of 5.32 €. According to this website this is actually stored as 5.32000000000000028421709430404007434844970703125
(ok I am cheating a bit as this is not a float but a double but you get the impression).You can read about the mathematical foundation here.
So you probably ask yourself….well I just want to store 5.32 € and obviously 5.32000000000000028421709430404007434844970703125 is very close to that….does it actually matter? It does, as I will point out later on.
Well know that we know that float represents exact numbers…is there another datatype storing exact values? Yes of course….it is called DECIMAL or NUMERIC.
In MySQL and SQL Server they are actually identical. According to the SQL-Standard there can be tiny differences. But let’s just say that they are identical for the purpose of that blog post.
As a difference to approximate types like float with a DECIMAL you actually specify the precision and the scale.
DECIMAL(5,2) means that you can store 5 digits and 2 digits are reserved for the part after the comma (for example the cents of a price in euro). So the maximum number you could store within that definition would be
999.99 (5 digits in total and 2 of them after the comma).
Problems with approximate values
If you use approximate values (FLOAT or DOUBLE) you will run into problems with summing up values and also with comparing values.
Here is a small repro script for SQL server to populate a test table with float and numeric datatypes.
DROP DATABASE IF EXISTS floattest; CREATE DATABASE floattest; USE floattest; DROP TABLE IF EXISTS dbo.test; CREATE Table dbo.test(floatval FLOAT, numval NUMERIC(6,2)); INSERT INTO dbo.test SELECT a, a FROM ( /* generate a random number between 0 and 9999.99*/ SELECT ROUND(RAND()*10000,2) AS a ) src /* insert 10k rows*/ GO 10000 /* compare the sums */ SELECT SUM(floatval), SUM(numval) FROM dbo.test ;
The numbers involved are random…so you might end up with different results. In my testrun I got the following result:
The numbers look quite close but are not identical. So where does the imperfection of the float sum come from? Again this is because floats are only approximate values….so the difference between both sums is tiny but there is a difference.
Which leads us to the second issue with floats…comparing values.
This is particularly iffy. In this example we would like to store some parts of a car invoice like the varnish sum and the spare part sum as well as the overall sum (for the sake of brevity) in one table.
DROP TABLE IF EXISTS dbo.carInvoice; CREATE TABLE dbo.carInvoice ( id INTEGER, varnish FLOAT, sparePart FLOAT, total FLOAT ); INSERT INTO dbo.carInvoice ( id, varnish, sparePart, total ) VALUES ( 1, 2155.83, -- varnish - FLOAT 4192.98, -- sparePart - FLOAT 6438.81 -- total - FLOAT ), ( 2, 2500.20, 499.80, 2999.0 ) ; SELECT * FROM dbo.carInvoice
The SELECT statement will return the following:
Eagle eyed readers might have already spotted that the calculation for the total for id 2 is wrong. Let’s imagine we have just discovered that we have a bug in our code and would like to determine which totals are wrong. So we would fire off a SELECT like the following:
SELECT * FROM dbo.carInvoice WHERE total <> varnish + sparePart
How many rows will come back? You thought one? Think again 😀.
In fact both rows are returned. Why? Because the sum of varnish and sparePart for id 1 differ just a tiny bit.
SELECT id, varnish + sparePart - total AS diff FROM dbo.carInvoice
You can work around issues like this by rounding the float values. However this is not straight forward and will lead to your code being more cumbersome. In general, I have the strong opinion that it is a good thing just to stay away from floats and avoid issues like this. If you are already stuck with floats and it is too much of an effort to refactor them be aware of the caveats and how to deal with them.
You could perhaps safely use floats for things like geocodes (latitude and longtitude) which you are not very likely to do math on.
It seems that storagewise you could save 1 byte (numeric takes at least 5 byte of storage while float takes just 4 byte) if you use float and they would need less CPU time . However I have not come across a scenario where these gains would outweigh the disadvantages around summing and comparing.
The Mysql documentation even has an own article regarding Problems with floating point values.