[C++] Lightweight float to string conversion

This article presents a simple algorithm for converting floats to string. It's designed to have a small code and limited RAM usage, making it a perfect fit for embedded systems. As a matter of fact, it's part of ArduinoJson 5.10.

It features:

  1. Small code size
  2. Small memory usage
  3. No dynamic allocation
  4. Up to nine digits after the decimal point
  5. Fairly good precision
  6. Exponentiation of large and small values

As far as I know, no such an algorithm has been publicly described.

A. Introduction

A.1. Existing algorithms

There is not much information the Internet covering float to string conversion. The best document I could find is this piece of gold detailing the current state of the art in a language that I could understand:

Printing Floating-Point Numbers by Ryan Juckett

In his article, Ryan Juckett presents the following algorithms:

  • Dragon4
  • Bellerophon
  • Grisu

However, they didn't meet my requirement because:

  1. they use arbitrary precision arithmetic
  2. the code is massive
  3. the memory usage is likely to be high

A.2. A lightweight alternative

The algorithm I'm going to describe is inferior to the because:

  • it supports a limited number of decimal places
  • it doesn't correct rounding error that could happen when reading back the value
  • it doesn't allow custom formatting

But, as you'll see it does the job and fits in less than 1KB of AVR code and uses just a few dozens of bytes of stack memory.

A.3. Overall description

The algorithm is very simple. It consists in splitting the floating-point value into multiple parts:

  1. The sign (+ or -)
  2. The integral part (32-bit unsigned integer)
  3. The decimal part (32-bit unsigned integer)
  4. The exponent part (16-bit signed integer)

Example

-12.34e56 becomes -, 12, 34 and 56.

B. The steps in detail

In this section, we'll see the various steps of the algorithm. The next section will give the complete implementation in C++.

B.1. Be positive

The first thing we need to do is to check the sign of the input float.

If it's positive, great!

If it's negative, we print a minus sign and negate the value so the rest of the algorithm doesn't have to worry about negative values.

B.2. Normalization

We use 32-bit integers, meaning that each part cannot exceed the capacity of a 32-bit integer.

Example

We'll never be able to output 123456789123 because the integral part exceeds the capacity of a 32-bit integer.

However, we'll be able to output 1.234567891e11 which represents the same value with an acceptable loss of accuracy.

This means that every value that exceeds the capacity of a 32-bit integer must be normalized so that the integral part goes from 1 to 9 and the exponent part compensates the difference.

In my implementation, this threshold is 1e7 but you can choose any value up to 1e9.

Similarly, when the input is very small, i.e. very close to zero, we normalize it to avoid printing a lot of leading zeros. This is very important as this algorithm only allows nine digits after the decimal point.

Example

If the input is 0.00000123, then the output is 1.23e-6

B.3. Binary exponentiation

As we saw, the normalization is the process of transforming 123456 to 1.23456e5.

This could be implemented with log() but this function is not always available on embedded platforms, and when it's present, it's fat and slow.

Now, we could implement the logarithm with a loop that divides by ten and counts the iterations. However, each floating point operation introduces an error, so each iteration magnifies the error. This could be OK for small values below 1000, as they require only 4 iterations; but doubles range up to 1.79e+308, requiring up to 308 iterations!

That's why my implementation uses a technique called "binary exponentiation" which minimizes the number of operations. It works by decomposing the input value into factors, just like you would do with "prime factor decomposition", except that each factor is a binary power of end.

Example

1e308 == 1e256 * 1e32 * 1e16 * 1e4

Only 4 operations are required to extract the 4 factors.

The biggest value that can be stored in a double is 1.79769e+308. As a consequence, the biggest exponent that we have to handle is 308. This means that every positive exponent can be composed with the following factors: 1e1, 1e2, 1e4, 1e8, 1e16, 1e32, 1e64, 1e128 and 1e256.

So, to compute the logarithm of the input value, we just need to extract the composing factors one after the other. The algorithm goes like this.

  1. compare the input with a given factor (e.g. 1e4)
  2. if it's greater, then divide the value and increment the exponent (e.g. exponent += 4)
  3. repeat for each factor

Example

  1. Suppose input is 1234
  2. 1234 is smaller than 1e4, so we pass
    • value is 1234
    • exponent is 0
  3. 1234 is bigger than 1e2, so we divide and increment the exponent:
    • value is now 12.34
    • exponent is now 2
  4. 12.34 is bigger than 1e1, so we divide and increment the exponent:
    • value is now 1.234
    • exponent is now 3

The end result is: 1.234e3

B.4. Integral and decimal part

Extracting the integral part is easy: we just need to cast the floating-point value to an integer. We know that the value will fit in a 32-bit integer because bigger values have been normalized.

To extract the decimal part, we compute the remainder of the previous operation, multiply by 1e9 and cast to an integer.

Example

  • Suppose input is 12.34
  • First, we cast to integer and get 12
  • The remainder is 0.34
  • We multiply by 1e9 and get 340000000

The end result is 12.340000000

B.5. Rounding

After extracting the decimal part, we can compute the remainder.

If it's smaller than 0.5, then great!

If it's greater or equal to 0.5, we need to round up.

To round up, we just need to increment the decimal part by one unit. If the decimal part is now equal to 1e9, then we need to increment the integral part and clear the decimal part.

Example

  • Suppose input is 0.9999999999
  • Then integral part is 0
  • And decimal part is 999999999
  • And remainder is 0.9 so rounding is needed
  • We increment the decimal part: 1000000000
  • As it's equal to 1e9, we increment the integral part
  • The results are 1 for the integral part and 0 for the decimal part

B.6. Writing the integral part

To convert the integral part to a string, we extract digits one by one in reverse order, i.e. from right to left. This is done by taking the modulus 10 and dividing by 10 to move to the next digit. We stop when the remainder is 0

Example

  • Suppose input is 12
  • 12 % 10 gives 2, our right most digit.
  • We divide by 10, this gives 1
  • 1 % 10 gives 1, our second digit.
  • We divide by 10, this gives 0, so we stop

Since digits are extracted in reverse order, we need to store them is a temporary buffer before outputting them. The trick is to fill the buffer starting from the end so that they are in the right order in memory. We start from the end of the buffer and decrement a pointer so as to point to the previous character. We can add a null terminator at the end so it's a valid C string.

Example

  • Suppose an uninitialized buffer: { ???, ???, ??? }
  • We add the null terminator { ???, ???, '\0' }
  • Then add the first digit { ???, '2', '\0' }
  • Then the seconds digit { '1', '2', '\0' }
  • We call puts() to print the digits in the right order

B.7. Writing the decimal part.

To print the decimal part, we use a process very similar to the previous one, except that:

  1. We remove the trailing zeros
  2. We don't stop when the remainder is 0, but when the 9 digits have been extracted (or removed)

To remove the trailing zeros, we compute the modulus 10, if it's 0 we divide by 10 and start again.

Example

  • Suppose input is 3400
  • 3400 % 10 gives 0, so we divide by ten
  • 340 % 10 gives 0, so we divide by ten
  • 34 % 10 gives 4, so we stop

To print a fixed number of digits, we use a counter initialized to 9 and decrement it each time a digit is extracted. We stop when the counter reaches 0.

B.8. The rest

We've done the biggest part of the job, here is what remains:

  1. Print the exponent part
  2. Handle NaN
  3. Handle +Inf and -Inf

These are trivial tasks that don't require explanation.

C. The implementation

Here is the main function:

void writeFloat(double value) {  
  if (isnan(value)) return puts("nan");

  if (value < 0.0) {
    putchar('-');
    value = -value;
  }

  if (isinf(value)) return puts("inf");

  uint32_t integralPart, decimalPart;
  int16_t exponent;
  splitFloat(value, integralPart, decimalPart, exponent);

  writeInteger(integralPart);
  if (decimalPart) writeDecimals(decimalPart);

  if (exponent < 0) {
    puts("e-");
    writeInteger(-exponent);
  }

  if (exponent > 0) {
    puts('e');
    writeInteger(exponent);
  }
}

Here is the function that extracts the integral part, the decimal part and the exponent part of a float:

void splitFloat(double value, uint32_t &integralPart,  
                uint32_t &decimalPart, int16_t &exponent) {
  exponent = normalizeFloat(value);

  integralPart = (uint32_t)value;
  double remainder = value - integralPart;

  remainder *= 1e9;
  decimalPart = (uint32_t)remainder;  

  // rounding
  remainder -= decimalPart;
  if (remainder >= 0.5) {
    decimalPart++;
    if (decimalPart >= 1000000000) {
      decimalPart = 0;
      integralPart++;
      if (exponent != 0 && integralPart >= 10) {
        exponent++;
        integralPart = 1;
      }
    }
  }
}

Here is the function that normalizes the values between 1e-5 and 1e7 and returns the exponent:

int normalizeFloat(double& value) {  
  const double positiveExpThreshold = 1e7;
  const double negativeExpThreshold = 1e-5;
  int exponent = 0;

  if (value >= positiveExpThreshold) {
    if (value >= 1e256) {
      value /= 1e256;
      exponent += 256;
    }
    if (value >= 1e128) {
      value /= 1e128;
      exponent += 128;
    }
    if (value >= 1e64) {
      value /= 1e64;
      exponent += 64;
    }
    if (value >= 1e32) {
      value /= 1e32;
      exponent += 32;
    }
    if (value >= 1e16) {
      value /= 1e16;
      exponent += 16;
    }
    if (value >= 1e8) {
      value /= 1e8;
      exponent += 8;
    }
    if (value >= 1e4) {
      value /= 1e4;
      exponent += 4;
    }
    if (value >= 1e2) {
      value /= 1e2;
      exponent += 2;
    }
    if (value >= 1e1) {
      value /= 1e1;
      exponent += 1;
    }
  }

  if (value > 0 && value <= negativeExpThreshold) {
    if (value < 1e-255) {
      value *= 1e256;
      exponent -= 256;
    }
    if (value < 1e-127) {
      value *= 1e128;
      exponent -= 128;
    }
    if (value < 1e-63) {
      value *= 1e64;
      exponent -= 64;
    }
    if (value < 1e-31) {
      value *= 1e32;
      exponent -= 32;
    }
    if (value < 1e-15) {
      value *= 1e16;
      exponent -= 16;
    }
    if (value < 1e-7) {
      value *= 1e8;
      exponent -= 8;
    }
    if (value < 1e-3) {
      value *= 1e4;
      exponent -= 4;
    }
    if (value < 1e-1) {
      value *= 1e2;
      exponent -= 2;
    }
    if (value < 1e0) {
      value *= 1e1;
      exponent -= 1;
    }
  }

  return exponent;
}

Here is the function that prints the integral part and the exponent:

void writeInteger(uint32_t value) {  
  char buffer[16];
  char *ptr = buffer + sizeof(buffer) - 1;

  *ptr = 0;
  do {
    *--ptr = (char)(value % 10) + '0';
    value = (uint32_t)(value / 10);
  } while (value);

  puts(ptr);
}

Here is the function that prints the decimal part:

void writeDecimals(uint32_t value) {  
  int width = 9;

  // remove trailing zeros
  while (value % 10 == 0 && width > 0) {
    value /= 10;
    width--;
  }

  char buffer[16];
  char *ptr = buffer + sizeof(buffer) - 1;

  // write the string in reverse order
  *ptr = 0;
  while (width--) {
    *--ptr = value % 10 + '0';
    value /= 10;
  }
  *--ptr = '.';

  // and dump it in the right order
  puts(ptr);
}

Here is the link to the code in ArduinoJson

D. Possible variations

You could use 64-bit integers and get much more digits. You could also repeat the decimal extraction a second time to get 9 more digits. However, I don't think it's worth it, as 9 digits are sufficient in most cases.

You could be less picky with the normalization and instead of reducing the integral part from 1 to 9, you could accept it to be from 1 to 9999 for example. That would significantly reduce the size of the normalize() function.