Floating point binary notation lets us represent real numbers (numbers with a fractional part) over a much wider range than fixed point notation can in the same fixed number of bits, at the cost of some precision.
Before jumping into how to convert, it is important to understand the format of a floating point binary number. Firstly, both parts of the number, the mantissa and the exponent, are stored as two’s complement numbers, so that each can be either positive or negative. Secondly, every normalised number starts with two opposite bits: a positive mantissa begins 0.1 and a negative mantissa begins 1.0.
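As a quick illustration of the normalisation rule, the sketch below checks whether a mantissa, written as a bit string such as "0.1110100", starts with two opposite bits. The function name and the string format are our own assumptions, chosen only for illustration.

```python
def is_normalised(mantissa: str) -> bool:
    """Return True if the mantissa's first two bits differ (01 or 10)."""
    bits = mantissa.replace(".", "")  # ignore the binary point if present
    return len(bits) >= 2 and bits[0] != bits[1]


print(is_normalised("0.1110100"))  # positive, starts 01
print(is_normalised("1.0110000"))  # negative, starts 10
print(is_normalised("0.0111010"))  # starts 00, so not normalised
```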
The value of a normalised floating point number is calculated using the formula M × B^E.
In this formula, the mantissa (M) holds the significant digits of the number, the base (B) is 2 because binary is a base 2 number system, and the exponent (E) shows how many places the binary point is moved. In the example below, we convert the denary number 7.25.
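To see how M × 2^E turns stored bits back into a value, here is a minimal Python sketch. It assumes, purely for illustration, an 8-bit two’s complement mantissa with the binary point implied after its first bit and a 4-bit two’s complement exponent; the helper names are hypothetical.

```python
def twos_complement_int(bits: str) -> int:
    """Interpret a bit string as a signed two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":            # a leading 1 means the number is negative
        value -= 1 << len(bits)   # subtract 2^n to apply the negative MSB weight
    return value


def float_value(mantissa: str, exponent: str) -> float:
    """Evaluate M x 2^E, with the binary point implied after the mantissa's first bit."""
    # Read the mantissa as a signed integer, then scale it so the point sits after bit 0.
    m = twos_complement_int(mantissa) / (1 << (len(mantissa) - 1))
    e = twos_complement_int(exponent)
    return m * 2 ** e


print(float_value("01110100", "0011"))  # 0.1110100 x 2^3
```

Here the mantissa 0.1110100 is 0.90625 and the exponent 0011 is 3, so the result is 0.90625 × 8 = 7.25.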
The first step is to convert 7.25 into its fixed point binary form. Note that the most significant bit has a place value of -8 because the number is in two’s complement. Since 7.25 = 4 + 2 + 1 + 0.25, it is written as 0111.01:
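The fixed point step can be sketched in Python as follows, assuming 4 integer bits (so the most significant bit has a place value of -8) and 2 fractional bits. The function name is hypothetical, and this version only accepts values that fit exactly in the chosen number of bits.

```python
def to_fixed_point(value: float, int_bits: int, frac_bits: int) -> str:
    """Two's complement fixed point bits; the MSB has weight -2^(int_bits - 1)."""
    scaled = value * (1 << frac_bits)  # shift the fractional part into whole numbers
    if scaled != int(scaled):
        raise ValueError("value is not exactly representable with frac_bits")
    n = int(scaled)
    total = int_bits + frac_bits
    if not -(1 << (total - 1)) <= n < (1 << (total - 1)):
        raise ValueError("value is out of range")
    # Masking to `total` bits yields the two's complement pattern for negatives too.
    bits = format(n & ((1 << total) - 1), f"0{total}b")
    return bits[:int_bits] + "." + bits[int_bits:]


print(to_fixed_point(7.25, 4, 2))
```

With these assumptions, to_fixed_point(7.25, 4, 2) returns "0111.01", matching the hand conversion.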
Next, we move the binary point so that it sits immediately to the right of the most significant bit. Here, 0111.01 becomes 0.11101, so the point moves three places to the left:
The number of places the binary point has moved is the exponent. If the point moves left, the exponent is positive; if it moves right, the exponent is negative. The point moved three places to the left, so the exponent is 3 and we can now place the binary number into the formula above: 0.11101 × 2^3.
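The normalisation step can be sketched in Python as below. It is a minimal illustration, assuming the fixed point value arrives as a string such as "0111.01"; the function name is hypothetical. It first drops redundant leading sign bits (so the first two bits differ), then counts how far the point moves to sit after the first bit.

```python
def normalise(fixed_point: str) -> tuple[str, int]:
    """Normalise a two's complement fixed point string, returning (mantissa, exponent)."""
    int_part, frac_part = fixed_point.split(".")
    bits = int_part + frac_part
    point = len(int_part)  # number of bits to the left of the binary point
    if "1" not in bits:
        raise ValueError("zero cannot be normalised")
    # Drop redundant leading sign bits until the first two bits differ;
    # each dropped bit brings the point one place closer to the front.
    while len(bits) > 1 and bits[0] == bits[1]:
        bits = bits[1:]
        point -= 1
    # Putting the point just after the first bit moves it (point - 1) places left;
    # a negative result means it actually moved right.
    exponent = point - 1
    mantissa = bits[0] + "." + (bits[1:] or "0")
    return mantissa, exponent


print(normalise("0111.01"))
```

For 7.25 this gives the mantissa 0.11101 and the exponent 3. A small value such as 0000.01 (0.25) gives 0.1 with an exponent of -1, because the point has to move right.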
Next, we convert the exponent into a two’s complement binary number (with a 4-bit exponent, for example, 3 is 0011):
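Converting the exponent is ordinary integer two’s complement. The sketch below assumes the exponent width is passed in (4 bits in the usage line, as an illustrative choice); the function name is hypothetical.

```python
def to_twos_complement(n: int, width: int) -> str:
    """Encode integer n as a two's complement bit string of the given width."""
    if not -(1 << (width - 1)) <= n < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    # Masking with 2^width - 1 wraps negative numbers to their two's complement pattern.
    return format(n & ((1 << width) - 1), f"0{width}b")


print(to_twos_complement(3, 4))
```

In a 4-bit field this accepts exponents from -8 to 7; to_twos_complement(3, 4) returns "0011".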
Finally, we combine the mantissa and the exponent. This also lets us drop the binary point, as its position is now implied (it is sometimes left in for exam questions):
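The final step can be sketched in Python as below, assuming (for illustration only) an 8-bit mantissa followed by a 4-bit exponent. The function name is hypothetical. The binary point is removed because its position after the first mantissa bit is implied.

```python
def combine(mantissa: str, exponent: int, m_width: int = 8, e_width: int = 4) -> str:
    """Join a normalised mantissa (e.g. "0.11101") and an integer exponent into one bit string."""
    m_bits = mantissa.replace(".", "")  # the point is implied after the first bit
    if len(m_bits) > m_width:
        raise ValueError("mantissa too long; precision would be lost")
    m_bits = m_bits.ljust(m_width, "0")  # pad unused fractional places with 0s
    if not -(1 << (e_width - 1)) <= exponent < (1 << (e_width - 1)):
        raise ValueError("exponent out of range")
    # Two's complement encoding of the exponent, as in the previous step.
    e_bits = format(exponent & ((1 << e_width) - 1), f"0{e_width}b")
    return m_bits + e_bits


print(combine("0.11101", 3))
```

With these widths, 7.25 is stored as 01110100 0011 (printed without the space). Padding the mantissa with trailing zeros does not change its value, because those bits sit in the fractional places.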