## Floats and Doubles

This week we will look at two commonly used data types
for storing numeric data, **float** and **double**.
These are the two data types that are used for storing
non-whole numbers (i.e., numbers that can have
a value after the decimal point).
Numbers stored as a **float** have about 7 digits
of precision. This means that a number like 1234567 or
1.234567 can be stored accurately. Note that the placement
of the decimal point is not important. A number with more digits,
such as 123456789 or 1.23456789, would lose precision
if stored as a **float**. By contrast, numbers stored using
the **double** type have about 16 digits of precision.
Let's explore these two data types using the **gasctrysmall**
data file.
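Before opening the data file, the 7-digit limit can be seen directly with the **float()** function, which rounds a number to float precision. The following is only a minimal sketch; the particular numbers are chosen for illustration and do not come from the dataset.

```stata
* 2^24 = 16777216 is stored exactly as a float
display %12.0f float(16777216)
* 2^24 + 1 is not, and is rounded to a neighboring value
display %12.0f float(16777217)
* a nine-digit whole number loses its last digits
display %12.0f float(123456789)
* the same literal at float precision and at double precision
display %20.18f float(1.23456789)
display %20.18f 1.23456789
```

The second command should show 16777216 and the third 123456792. The last two lines show where the float and double versions of the same typed number start to disagree.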

    . use gasctrysmall

Using the describe command we can
see that the variables **gas** and
**infl** are stored as type **double**
and the variables **ctry** and **year**
are stored as type **float**.

    . describe

    Contains data from gasctrysmall.dta
      obs:             8
     vars:             4                          26 Jan 2010 14:28
     size:           256 (99.9% of memory free)
    ----------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    ----------------------------------------------------------------
    ctry            float  %9.0g                  Country ID
    year            float  %9.0g                  year
    gas             double %9.0g                  Gas price
    infl            double %9.0g                  Inflation factor
    ----------------------------------------------------------------
    Sorted by:  ctry

This is such a small dataset that we can list out the entire file, as shown below.

    . list

         +--------------------------+
         | ctry   year   gas   infl |
         |--------------------------|
      1. |    1   1974   .78   1.32 |
      2. |    1   1975   .83    1.4 |
      3. |    2   1971   .69   1.15 |
      4. |    2   1972   .77    1.2 |
      5. |    2   1973   .89   1.29 |
         |--------------------------|
      6. |    3   1974   .42   1.14 |
      7. |    4   1974   .82   1.12 |
      8. |    4   1975   .94   1.18 |
         +--------------------------+

Suppose that we use the **generate** command
to make a copy of the variable **infl**,
calling it **infl2**. Note that the original
variable is of type **double**, but since
the default data type is **float**, the copy
is created as a **float**.

    . generate infl2 = infl

    . describe infl infl2

                  storage  display     value
    variable name   type   format      label      variable label
    ----------------------------------------------------------------
    infl            double %9.0g                  Inflation factor
    infl2           float  %9.0g

Let's list these variables out side by side.

    . list ctry infl infl2

         +---------------------+
         | ctry   infl   infl2 |
         |---------------------|
      1. |    1   1.32    1.32 |
      2. |    1    1.4     1.4 |
      3. |    2   1.15    1.15 |
      4. |    2    1.2     1.2 |
      5. |    2   1.29    1.29 |
         |---------------------|
      6. |    3   1.14    1.14 |
      7. |    4   1.12    1.12 |
      8. |    4   1.18    1.18 |
         +---------------------+

These look identical. Let's list out the cases where these two variables are equal to each other.

    . list if infl == infl2

Oh dear! This is kind of perplexing.
I would have expected all of the observations
to have been displayed, but actually none
are displayed. Let's try listing
the variables **infl** and **infl2** again,
this time showing many more digits after
the decimal point.

    . format infl infl2 %25.20f

    . list ctry infl infl2

         +--------------------------------------------------------+
         | ctry                     infl                    infl2 |
         |--------------------------------------------------------|
      1. |    1   1.32000000000000010000   1.32000005245208740000 |
      2. |    1   1.39999999999999990000   1.39999997615814210000 |
      3. |    2   1.14999999999999990000   1.14999997615814210000 |
      4. |    2   1.20000000000000000000   1.20000004768371580000 |
      5. |    2   1.29000000000000000000   1.28999996185302730000 |
         |--------------------------------------------------------|
      6. |    3   1.13999999999999990000   1.13999998569488530000 |
      7. |    4   1.12000000000000010000   1.12000000476837160000 |
      8. |    4   1.17999999999999990000   1.17999994754791260000 |
         +--------------------------------------------------------+

Now we see why no observations were displayed.
When we list these out with 20 digits after the decimal
point, we can see that these numbers are not
exactly the same. This is the nature of
storing fractional numbers using computers.
Such fractional numbers are rarely stored
with perfect precision. There is usually
a little bit of slop, but it is so tiny
that it is not a problem.
Consider the **display** command
below that shows the value of **1/10**.
There is a tiny amount of imprecision.

    . display %25.20f 1/10
       0.10000000000000001000

The above number is shown using double
precision. But consider the amount of imprecision
if the number is displayed as a **float**
using the **float()** function.

    . display %25.20f float(1/10)
       0.10000000149011612000
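The two displayed values of 1/10 agree only to about eight decimal places, which is enough to make a test of equality fail. As a minimal sketch (these commands are illustrative and were not part of the original session), the comparison can be tried directly with **display**, which shows 1 for true and 0 for false.

```stata
* a typed literal such as 0.1 is held in double precision
display 0.1 == float(0.1)
* rounding both sides to float precision makes them match
display float(0.1) == float(0.1)
* the size of the gap between the float and double versions of 0.1
display %25.20f float(0.1) - 0.1
```

The first command should show 0 and the second 1. The third shows how small the gap is, yet any nonzero gap is enough for **==** to report that the values differ.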

This imprecision in the **double** or
**float** values is not a problem, except
when we try to compare two values that are
stored with different precision.
When you compare 0.1 (with double precision)
to 0.1 (with float precision), the
two values are not the same.
What if we compare the value of a variable
to a specific number? For example, let's list out
the observations where **infl** is equal
to 1.12.

    . list ctry infl if infl == 1.12

         +-------------------------------+
         | ctry                     infl |
         |-------------------------------|
      7. |    4   1.12000000000000010000 |
         +-------------------------------+

Now let's try the same comparison for
**infl2**.

    . list ctry infl infl2 if infl2 == 1.12

When we type a number (like 1.12 above),
this is represented as a double
precision value. So, this compares the value 1.12 (stored as
double) with the value of 1.12 (stored
as float), and none of the observations
meet this condition. Instead, let's make
this comparison by asking for 1.12 to
be represented using float precision,
by specifying **float(1.12)**.

    . list ctry infl infl2 if infl2 == float(1.12)

         +--------------------------------------------------------+
         | ctry                     infl                    infl2 |
         |--------------------------------------------------------|
      7. |    4   1.12000000000000010000   1.12000000476837160000 |
         +--------------------------------------------------------+

Now, the variable **infl2** is represented
with float precision, and 1.12 is represented
with float precision, and the comparison
successfully finds the equal value.
Likewise, we can compare **infl2** to
**float(infl)** and we see that
all of these are equal.

    . list ctry infl infl2 if infl2 == float(infl)

         +--------------------------------------------------------+
         | ctry                     infl                    infl2 |
         |--------------------------------------------------------|
      1. |    1   1.32000000000000010000   1.32000005245208740000 |
      2. |    1   1.39999999999999990000   1.39999997615814210000 |
      3. |    2   1.14999999999999990000   1.14999997615814210000 |
      4. |    2   1.20000000000000000000   1.20000004768371580000 |
      5. |    2   1.29000000000000000000   1.28999996185302730000 |
         |--------------------------------------------------------|
      6. |    3   1.13999999999999990000   1.13999998569488530000 |
      7. |    4   1.12000000000000010000   1.12000000476837160000 |
      8. |    4   1.17999999999999990000   1.17999994754791260000 |
         +--------------------------------------------------------+
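If you do not have the data file handy, the same check can be reproduced from scratch. The sketch below retypes the eight **infl** values from the listings into a fresh dataset (note that **clear** drops whatever data are in memory) and uses **count** and **assert** in place of **list**. The storage type is written out as **float** so the result does not depend on your current **set type** setting.

```stata
clear
input double infl
1.32
1.4
1.15
1.2
1.29
1.14
1.12
1.18
end

* copy infl into a float variable, as generate does by default
generate float infl2 = infl

* double compared with float: no observations match
count if infl2 == infl

* both sides at float precision: every observation matches
count if infl2 == float(infl)
assert infl2 == float(infl)
```

The first **count** should report 0 and the second 8. The **assert** command stays silent when the condition holds for every observation, which makes it handy for checks like this inside a do-file.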

Rather than demoting the precision of
the created variable (**infl2**), we
could have specified that we
wanted the copy of **infl** to be
stored as a double. Below we create **infl3** that
is stored with double precision.

    . generate double infl3 = infl

Now, let's show the observations where **infl**
is equal to **infl3**.

    . format %25.20f infl3

    . list ctry infl infl3 if infl3 == infl

         +--------------------------------------------------------+
         | ctry                     infl                    infl3 |
         |--------------------------------------------------------|
      1. |    1   1.32000000000000010000   1.32000000000000010000 |
      2. |    1   1.39999999999999990000   1.39999999999999990000 |
      3. |    2   1.14999999999999990000   1.14999999999999990000 |
      4. |    2   1.20000000000000000000   1.20000000000000000000 |
      5. |    2   1.29000000000000000000   1.29000000000000000000 |
         |--------------------------------------------------------|
      6. |    3   1.13999999999999990000   1.13999999999999990000 |
      7. |    4   1.12000000000000010000   1.12000000000000010000 |
      8. |    4   1.17999999999999990000   1.17999999999999990000 |
         +--------------------------------------------------------+
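As a small self-contained sketch of the same idea, the commands below build a two-observation dataset (the values are taken from the listing above, and **clear** drops any data in memory) and make two exact copies of **infl**. One uses **generate double**. The other uses **clonevar**, which copies the storage type of the original variable along with its values. The name **infl4** is made up for this illustration.

```stata
clear
input double infl
1.32
1.12
end

* an explicit storage type overrides the default type
generate double infl3 = infl

* clonevar keeps the storage type (double here) of the original
clonevar infl4 = infl

* both copies are exactly equal to the original in every observation
assert infl3 == infl
assert infl4 == infl
describe infl infl3 infl4
```

Both **assert** commands should pass silently, and **describe** should report all three variables as type **double**.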

Or, we could specify that we want variables
to be created using double precision
using the **set type double** command.

    . set type double

After issuing this command, subsequent variables
that are created (for the duration of this
Stata session) would be created using
type **double**.
If we wanted to adopt this as a permanent setting,
we could add the **permanently** option.

    . set type double, permanently
    (set type preference recorded)

You can return to the default setting with the following command.

    . set type float, permanently
    (set type preference recorded)
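To see the effect of the setting without changing it for good, a do-file can save the current default, switch to **double**, and switch back afterwards. This is only a sketch: the variable **x** and the local macro **oldtype** are names made up for the illustration, and **clear** drops any data in memory.

```stata
* remember the current default type (float or double)
local oldtype `c(type)'

set type double
clear
set obs 1

* no storage type is given, so x takes the default, now double
generate x = 0.1
describe x

* x is a double, so it matches the double-precision literal 0.1
assert x == 0.1

* restore whatever default was in effect before
set type `oldtype'
```

Because local macros exist only while a do-file or program is running, these lines should be run together from a do-file rather than typed one at a time.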

You might be rightly concerned that
storing all variables using double
precision might be overkill. Does
a dummy (0/1) variable need to
be stored with 16 digits of precision?
Does birth year (a whole number) need
to be stored with 16 digits of precision?
The answer, of course, is no. But,
the **compress** command is
very handy for taking
variables and storing them with the
smallest storage type that will not
lead to any loss of information.
Let's apply this to the current
dataset.

    . compress
    ctry was float now byte
    year was float now int

The variable **ctry** was stored
as a **float** but is now a **byte**, and
the variable **year** was a **float**
and is now an **int**.
Using **set type double, permanently**, I think,
is a great way to store variables with the
highest level of precision possible.
Then, later, you can use the **compress**
command to identify and convert variables
to a more frugal method of storage.
For more details, you can see **help data types**
and **help set type**.
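The "create as double, then compress" workflow can be tried on a toy dataset. In the sketch below, the variables **dummy**, **byear**, and **price** are hypothetical and are created as **double** on purpose (**clear** drops any data in memory).

```stata
clear
set obs 4

* everything starts out as double
generate double dummy = mod(_n, 2)
generate double byear = 1970 + _n
generate double price = 0.1 * _n

* demote each variable to the smallest type that loses nothing
compress
describe

* the stored values are unchanged by compress
assert dummy == mod(_n, 2)
assert byear == 1970 + _n
assert price == 0.1 * _n
```

After **compress**, **dummy** should be a **byte** and **byear** an **int**, while **price** stays a **double** because its fractional values cannot be held in a smaller type without loss.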
You can download the example data files from this tidbit
(as well as all of the other tidbits) as shown below.
These commands will download all of the example data files into
the current folder on your computer. (If you have done this
before, then you may need to specify **net get stowdata, replace**
to overwrite the existing files.)

    . net from http://www.MichaelNormanMitchell.com/storage/stowdata
    . net get stowdata

If you have thoughts on this Stata Tidbit of the Week, you can post a comment. You can also send me an email at MichaelNormanMitchell and then the at sign and gmail dot com. If you are receiving this tidbit via email, you can find the web version at http://www.michaelnormanmitchell.com/ .

## Reader Comments (3)

Readers may also want to try the statement -clonevar infl3=infl- and see what -list if infl == infl3- returns.

Other useful sources on this issue are:

William Gould (2006) "Mata matters: precision", The Stata Journal, 6(4): 550-560. http://www.stata-journal.com/article.html?article=pr0025

Nicholas J. Cox (2006) "Stata tip 33: Sweet sixteen: Hexadecimal formats and precision problems", The Stata Journal, 6(2): 282-283. http://www.stata-journal.com/article.html?article=dm0022

Jean Marie Linhart (2008) "Mata matters: Overflow, underflow and the IEEE floating-point format", The Stata Journal, 8(2): 255-268. http://www.stata-journal.com/article.html?article=pr0038

http://www.stata.com/support/faqs/data/prec.html

http://www.stata.com/support/faqs/data/float.html

http://www.ats.ucla.edu/stat/stata/faq/longid.htm

Dear Martin - Indeed, that is a good suggestion, and foreshadows the tidbit for next week :) .

Dear Maarten - Thank you so much for the great list of additional resources on the issues of precision and doubles vs. floats. I would encourage readers to check out these great links.

Thanks!

Michael Mitchell