Quantity vs Numbers
An important distinction must be made between "quantities" and "numbers". A quantity is simply some amount of "stuff"; five apples, three pounds, and one automobile are all quantities of different things. A quantity can be represented in many different ways. For example, tick-marks on a piece of paper, beads on a string, or stones in a pocket can all represent some quantity of something. One of the most familiar representations is the base-10 (or "decimal") number system, which uses 10 digits, from 0 to 9. When more than 9 objects need to be counted, we make a new column with a 1 in it (which represents a group of 10), and we continue counting from there.
Computers, however, do not count in decimal. Computer hardware represents values internally as a series of voltage levels. For example, in many digital circuits, a voltage near +5V represents a "1" digit, and a voltage near 0V represents a "0" digit. There are no other digits possible! Thus, computers must use a numbering system that has only two digits (0 and 1): the "binary", or "base-2", number system.
Binary Numbers
Understanding the binary number system is difficult for many students at first. It may help to start with a decimal number, since that is more familiar. It is possible to write a number like 1234 in "expanded notation," so that the value of each place is shown:
\[ 1234 = (1 \times 10^3) + (2 \times 10^2) + (3 \times 10^1) + (4 \times 10^0) \]
Notice that each digit is multiplied by successive powers of 10, since this is a decimal, or base-10, system. The "ones" digit ("4" in the example) is multiplied by \(10^0\), or 1. Each digit to the left of the "ones" digit is multiplied by the next higher power of 10, and the products are added together.
Now, do the same with a binary number; but since this is a "base-2" number, replace powers of 10 with powers of 2:
\[ 10011010010_2 = (1 \times 2^{10}) + (0 \times 2^9) + (0 \times 2^8) + (1 \times 2^7) + (1 \times 2^6) + (0 \times 2^5) + (1 \times 2^4) + (0 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (0 \times 2^0) \]
The subscripts indicate the base. Note that in the above equations:
\[ 1234_{10} = 10011010010_2 \]
Binary numbers are the same as their equivalent decimal numbers; they are just a different way to represent a given quantity. To be very simplistic, it does not really matter if you have \(1234_{10}\) or \(10011010010_2\) apples, you can still make a pie.
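The expanded notation above can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the helper names `expand` and `value` are made up for illustration. It multiplies each digit by the matching power of the base and adds the products, first for decimal 1234 and then for the binary form of the same quantity.

```python
def expand(digits, base):
    """Return (digit, power) pairs, most significant digit first."""
    top = len(digits) - 1
    return [(int(d, base), top - i) for i, d in enumerate(digits)]


def value(digits, base):
    """Add up digit * base**power over every position."""
    return sum(d * base ** p for d, p in expand(digits, base))


print(expand("1234", 10))        # [(1, 3), (2, 2), (3, 1), (4, 0)]
print(value("1234", 10))         # 1234
print(value("10011010010", 2))   # 1234
```

Both calls to `value` give the same quantity, which is the point of the paragraph: only the representation differs.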
Bits
The term "bit" is short for "binary digit". Each bit is a single binary value: 1 or 0. Computers generally represent a 1 as a positive voltage (5 volts or 3.3 volts are common values), and a 0 as 0 volts.
Most Significant Bit and Least Significant Bit
In the decimal number 48723, the "4" digit represents the largest power of 10 (\(10^4\)), and the "3" digit represents the smallest power of 10 (\(10^0\)). Therefore, in this number, 4 is the most significant digit and 3 is the least significant digit. Consider a situation where a caterer needs to prepare 156 meals for a wedding. If the caterer makes an error in the least significant digit and accidentally makes 157 meals, it is not a big problem. However, if the caterer makes a mistake on the most significant digit, 1, and prepares 256 meals, that will be a big problem!
Now, consider a binary number: 101011. The Most Significant Bit (MSB) is the left-most bit, because it represents the greatest power of 2 (\(2^5\), or 32). The Least Significant Bit (LSB) is the right-most bit and represents the least power of 2 (\(2^0\), or 1).
Notice that MSB
and LSB are not the same as the notion of "significant figures" that
is used in other sciences. The decimal number 123000 has only 3 significant
figures, but the most significant digit is 1 (the left-most digit), and the
least significant digit is 0 (the right-most digit).
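As a small illustration, the MSB and LSB of the six-bit number 101011 can be picked out with shifts and masks. This is a sketch in Python; the function names `msb` and `lsb` are chosen here for illustration, and the bit width has to be supplied because an integer on its own does not record how many bits it is stored in.

```python
def lsb(value):
    """Right-most bit: the bit with weight 2**0."""
    return value & 1


def msb(value, width):
    """Left-most bit of a width-bit pattern: the bit with weight 2**(width - 1)."""
    return (value >> (width - 1)) & 1


n = 0b101011
print(msb(n, 6), lsb(n))    # 1 1
print(1 << 5, 1 << 0)       # weights of the MSB and LSB: 32 1
```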
Standard Sizes
Summary
| Name | Length (bits) |
| Bit | 1 |
| Nibble | 4 |
| Byte | 8 |
| Word | 16 |
| Double-word | 32 |
| Quad-word | 64 |
| Machine Word | Depends |
Nibble
A nibble is 4 bits long. Nibbles can hold values from 0 to 15 (in decimal).
Byte
A byte is 8 bits long. Bytes can hold values from 0 to 255 (in decimal).
Word
A word is 16 bits, or 2 bytes, long. Words can hold values from 0 to 65535 (in decimal). There is occasionally some confusion between this definition and that of a "machine word". See Machine Word below.
Double-word
A double-word is 2 words, or 4 bytes, long, which makes it 32 bits. Double-words are also known simply as "DWords". 32-bit computers therefore manipulate data that is the size of DWords.
Quad-word
A quad-word is 2 DWords, 4 words, or 8 bytes long. Quad-words are known simply as "QWords". QWords are 64 bits long, and are therefore the default data size in 64-bit computers.
Machine Word
A machine word is the standard data size of a given machine. For instance, a 32-bit computer has a 32-bit machine word. Likewise, 64-bit computers have a 64-bit machine word. Occasionally the term "machine word" is shortened to simply "word", leaving some ambiguity as to whether we are talking about a regular "word" or a machine word.
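The value ranges quoted above all follow the same rule: n bits can hold unsigned values from 0 to 2^n - 1. A short Python sketch of that rule follows; the `SIZES` table and the `max_unsigned` helper are names invented for this example.

```python
# Bit lengths of the standard sizes listed above.
SIZES = {"nibble": 4, "byte": 8, "word": 16, "double-word": 32, "quad-word": 64}


def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return (1 << bits) - 1


for name, bits in SIZES.items():
    print(f"{name:12} {bits:2} bits: 0 to {max_unsigned(bits)}")
```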
Negative Numbers
It would seem logical that to create a negative number in binary, the reader would only need to prefix the number with a "–" sign. For instance, the binary number 1101 can become negative simply by writing it as "–1101". This seems fine until you realize that computers and digital circuits do not understand a minus sign. Digital circuits only have bits, and so bits must be used to distinguish between positive and negative numbers. With this in mind, there are a variety of schemes that are used to make binary numbers negative or positive: Sign and Magnitude, One's Complement, and Two's Complement.
Sign and Magnitude
Under a Sign and Magnitude scheme, the MSB of a given binary number is used as a "flag" to determine if the number is positive or negative. If the MSB = 0, the number is positive, and if the MSB = 1, the number is negative. This scheme seems simple, except for one fact: arithmetic on numbers under this scheme is hard. Let's say we have 2 nibbles: 1001 and 0111. Under sign and magnitude, we can translate them to read: –001 and +111. In decimal, these are the numbers –1 and +7.
When we add them together, the sum –1 + 7 = 6 should be the value that we get. However, plain binary addition of the two nibbles, with the carry out of the top bit discarded, gives:
 1001
+0111
-----
 0000
And that isn't right. What we need is a decision-making construct to determine if the MSB is set or not; if it is set, we subtract, and if it is not set, we add. This is a big pain, and therefore sign and magnitude is rarely used for integer arithmetic.
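The problem can be reproduced in a few lines. The sketch below, in Python, decodes a nibble as sign and magnitude and then adds the two raw bit patterns the way a plain binary adder would, keeping only four bits. The name `sm_decode` and the 4-bit default width are assumptions made for this example.

```python
def sm_decode(pattern, width=4):
    """Interpret a bit pattern as sign and magnitude."""
    sign = (pattern >> (width - 1)) & 1            # MSB is the sign flag
    magnitude = pattern & ((1 << (width - 1)) - 1)  # remaining bits
    return -magnitude if sign else magnitude


a, b = 0b1001, 0b0111
print(sm_decode(a), sm_decode(b))    # -1 7

# Plain binary addition of the raw nibbles; the carry out of bit 3 is dropped.
naive = (a + b) & 0b1111
print(format(naive, "04b"), sm_decode(naive))   # 0000 0, not the expected 6
```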
One's Complement
Let's now examine a scheme where we define a negative number as being the logical inverse of a positive number. We will use the "!" operator to express a logical inversion on multiple bits. For instance, !001100 = 110011. As unsigned numbers, 110011 is binary for 51, and 001100 is binary for 12. But in this case, we are saying that 001100 = –110011, or that 110011 (binary) = –12 (decimal). Let's perform the addition again:
 001100 (12)
+110011 (-12)
-------
 111111
We can see that if we invert \(000000_2\) we get the value \(111111_2\), and therefore \(111111_2\) is negative zero! What exactly is negative zero? It turns out that in this scheme, positive zero and negative zero represent the same value.
However, one's complement notation suffers because it has two representations for zero: all 0 bits, or all 1 bits. As well as being clumsy, this also causes problems when we want to check quickly whether a number is zero. This is an extremely common operation, and we want it to be easy, so we create a new representation: two's complement.
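A short Python sketch of one's complement may make the "two zeros" problem easier to see. The helper names `invert` and `oc_decode` are invented for this example, and the bit width is passed explicitly because the inversion only makes sense for a fixed number of bits.

```python
def invert(pattern, width):
    """Logical inversion (the "!" operator) of a width-bit pattern."""
    return ~pattern & ((1 << width) - 1)


def oc_decode(pattern, width):
    """Interpret a bit pattern as one's complement."""
    if pattern >> (width - 1):            # MSB set: the value is negative
        return -invert(pattern, width)
    return pattern


print(format(invert(0b001100, 6), "06b"))       # 110011
print(oc_decode(0b110011, 6))                   # -12
print(format(0b001100 + 0b110011, "06b"))       # 111111
# Two different patterns decode to zero:
print(oc_decode(0b000000, 6), oc_decode(0b111111, 6))   # 0 0
```

A zero test therefore has to compare against both patterns, which is the inconvenience described above.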
Two's Complement
Two's complement is a number representation that is very similar to one's complement. We find the negative of a number X using the following formula:
-X = !X + 1
Let's do an example. If we have the binary number 011001 (which is 25 in decimal; the leading 0 keeps the sign bit clear), and we want to find the representation for –25 in two's complement, we follow two steps:
1. Invert the bits: 011001 → 100110
2. Add 1: 100110 + 1 = 100111
Therefore –011001 = 100111. Let's do a little addition:
 011001
+100111
-------
 000000
Now, there is a carry from adding the two MSBs together, but this is digital logic, so we discard the carry. It is important to remember that digital circuits have capacity for a certain number of bits, and any extra bits are discarded.
Most modern computers use two's complement.
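The formula -X = !X + 1 can be tried out directly. The sketch below uses Python and assumes a six-bit width, so that 25 is written with a leading 0 and the sign bit stays clear. The names `tc_negate` and `tc_decode` are hypothetical helpers, and the mask plays the role of the circuit's fixed capacity by discarding any carry.

```python
def tc_negate(pattern, width):
    """Two's complement negation: invert every bit, add 1, drop any carry."""
    mask = (1 << width) - 1
    return ((~pattern & mask) + 1) & mask


def tc_decode(pattern, width):
    """Interpret a bit pattern as two's complement."""
    if pattern >> (width - 1):            # MSB set: the value is negative
        return pattern - (1 << width)
    return pattern


x = 0b011001                              # 25
neg = tc_negate(x, 6)
print(format(neg, "06b"), tc_decode(neg, 6))    # 100111 -25
print(format((x + neg) & 0b111111, "06b"))      # 000000, carry discarded
```

Unlike one's complement, negating zero gives zero again, so there is only one zero pattern to test for.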
Below is a diagram showing the representation held by these systems for all four-bit combinations:
Signed vs Unsigned
One important fact to remember is that computers are dumb. A computer doesn't know whether a given set of bits represents a signed number or an unsigned number (or, for that matter, any number of other data objects). It is therefore important for the programmer (or the programmer's trusty compiler) to keep track of this data for us. Consider the bit pattern 100110:
· Unsigned: 38 (decimal)
· Sign+Magnitude: -6
· One's Complement: -25
· Two's Complement: -26
See how the
representation we use changes the value of the number! It is important to
understand that bits are bits, and the computer doesn't know what the bits
represent. It is up to the circuit designer and the programmer to keep track of
what the numbers mean.
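To see that the same bits really do yield all four values, here is a minimal Python sketch that reads one six-bit pattern under each scheme. The function name `interpretations` and its dictionary keys are made up for this example.

```python
def interpretations(pattern, width):
    """Read one bit pattern under four different number schemes."""
    mask = (1 << width) - 1
    negative = bool(pattern >> (width - 1))     # is the MSB set?
    magnitude = pattern & (mask >> 1)           # every bit except the MSB
    return {
        "unsigned": pattern,
        "sign+magnitude": -magnitude if negative else magnitude,
        "one's complement": -(~pattern & mask) if negative else pattern,
        "two's complement": pattern - (1 << width) if negative else pattern,
    }


for scheme, value in interpretations(0b100110, 6).items():
    print(f"{scheme}: {value}")
```

The bits never change; only the rule used to read them does.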
Character Data
We've seen how binary numbers can represent unsigned values, and how they can represent negative numbers using various schemes. But now we have to ask ourselves, how do binary numbers represent other forms of data, like text characters? The answer is that there exist different schemes for converting binary data to characters. Each scheme acts like a map to convert a certain bit pattern into a certain character. There are 3 popular schemes: ASCII, Unicode, and EBCDIC.
ASCII
The ASCII code (American Standard Code for Information Interchange) is the most common code for mapping bits to characters. ASCII uses only 7 bits, although since computers generally store data in 8-bit bytes, ASCII characters are usually stored with an unused 8th bit as the MSB. ASCII codes 0-31 are "control codes", which are characters that are not printable to the screen and are used by the computer to handle certain operations. Code 32 is a single space (hit the space bar). The character code for the character '1' is 49, '2' is 50, and so on. Notice that in ASCII '2' = '1' + 1 (the character '1' plus the integer number 1). This is difficult for many people to grasp at first, so don't worry if you are confused.
Capital letters run from 'A' = 65 to 'Z' = 90. The lower-case letters run from 'a' = 97 to 'z' = 122.
Almost all the rest of the ASCII codes are different punctuation marks.
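The difference between a character and the number it depicts is easier to see in code. The sketch below uses Python's built-in `ord` and `chr`, which convert between a character and its code; `digit_value` is a hypothetical helper added for illustration.

```python
print(ord("1"), ord("2"))      # 49 50
print(chr(ord("1") + 1))       # 2  (the character '1' plus the integer 1)
print(ord(" "))                # 32
print(ord("A"), ord("Z"))      # 65 90
print(ord("a"), ord("z"))      # 97 122


def digit_value(ch):
    """Convert an ASCII digit character to the integer it depicts."""
    return ord(ch) - ord("0")


print(digit_value("7"))        # 7
```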
Extended ASCII
Since computers use data that is the size of bytes, it seemed wasteful to have ASCII use only 7 bits (which allows a maximum of 128 character codes). Many companies therefore incorporated the extra bit into an "Extended ASCII" code set. These extended sets have a maximum of 256 characters. The first 128 characters are the original ASCII characters, but the next 128 characters are platform-defined. Each computer maker could define its own characters to fill in the last 128 slots.
Unicode
When computers began to spread around the world, other languages began to be used by computers. Before too long, each country had its own character code sets to represent its own letters. It is important to remember that some writing systems in the world have more than 256 characters! Therefore, the Unicode standard was proposed. There are several different encodings of Unicode. Some of them, such as UTF-16, are built on 2-byte units, while others, such as UTF-8, use a variable number of bytes per character. The first 128 characters of the Unicode set are the original ASCII characters.
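As a rough illustration of how the representations differ, the Python sketch below encodes a few characters with three widely used Unicode encodings and counts the resulting bytes. The helper name `encoded_sizes` and the sample characters are choices made for this example; the codec names are the ones built into Python.

```python
def encoded_sizes(ch):
    """Number of bytes one character occupies in three Unicode encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# "A", e with acute accent (U+00E9), and the euro sign (U+20AC)
for ch in ("A", "\u00e9", "\u20ac"):
    print(hex(ord(ch)), encoded_sizes(ch))

# For the first 128 characters, UTF-8 produces the same byte as ASCII.
print("A".encode("utf-8") == "A".encode("ascii"))    # True
```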
EBCDIC
EBCDIC (Extended Binary Coded Decimal Interchange Code) is a character code that was originally proposed by IBM, but was passed over in favor of ASCII. IBM, however, still uses EBCDIC in some of its supercomputers, mainframes, and server systems.
Octal
Octal is just like decimal and binary in that once one column is "full", you move on to the next. It uses the numbers 0−7 as digits, and because the number of digits is a power of two (\(8 = 2^3\)), it has the useful property that it is easy to convert between octal and binary numbers. Consider the binary number 101110000. To convert this number to octal, we must first break it up into groups of 3 bits: 101, 110, 000. Then we simply add up the values of the bits in each group:
\[ 101_2 = 4 + 0 + 1 = 5, \quad 110_2 = 4 + 2 + 0 = 6, \quad 000_2 = 0 + 0 + 0 = 0 \]
And then we string all the octal digits together:
\[ 101110000_2 = 560_8 \]
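The grouping procedure translates almost word for word into code. Here is a minimal Python sketch; the function name `binary_to_octal` is made up, and padding with leading zeros is an added step for inputs whose length is not a multiple of 3.

```python
def binary_to_octal(bits):
    """Convert a binary string to octal by reading it 3 bits at a time."""
    # Pad on the left so the length is a multiple of 3.
    padded = bits.zfill(-(-len(bits) // 3) * 3)
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    # Each 3-bit group is worth exactly one octal digit.
    return "".join(str(int(group, 2)) for group in groups)


print(binary_to_octal("101110000"))    # 560
print(oct(0b101110000))                # 0o560, Python's built-in conversion agrees
```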
Hexadecimal
Hexadecimal is a very common data representation. It is more common than octal, because each hexadecimal digit represents four binary digits, and many digital circuits use multiples of four as their data widths.
Hexadecimal uses a base of 16. However, there is a difficulty in that it requires 16 digits, and the common decimal number system only has ten digits to play with (0 through 9). So, to get the necessary number of digits, we use the letters A through F in addition to the digits 0-9. After the units column is full, we move on to the "16's" column, just as in binary and decimal.
| Hex | Decimal | Octal | Binary |
| 0 | 0 | 0 | 0000 |
| 1 | 1 | 1 | 0001 |
| 2 | 2 | 2 | 0010 |
| 3 | 3 | 3 | 0011 |
| 4 | 4 | 4 | 0100 |
| 5 | 5 | 5 | 0101 |
| 6 | 6 | 6 | 0110 |
| 7 | 7 | 7 | 0111 |
| 8 | 8 | 10 | 1000 |
| 9 | 9 | 11 | 1001 |
| A | 10 | 12 | 1010 |
| B | 11 | 13 | 1011 |
| C | 12 | 14 | 1100 |
| D | 13 | 15 | 1101 |
| E | 14 | 16 | 1110 |
| F | 15 | 17 | 1111 |
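The rows of this table can be regenerated with Python's format specifiers, and the same four-bits-per-digit idea converts longer binary strings to hexadecimal. This is only a sketch; `table_row` and `binary_to_hex` are names invented for the example.

```python
def table_row(n):
    """One row of the table: hex, decimal, octal, and 4-bit binary."""
    return f"{n:X} | {n} | {n:o} | {n:04b}"


def binary_to_hex(bits):
    """Convert a binary string to hexadecimal by reading it 4 bits at a time."""
    padded = bits.zfill(-(-len(bits) // 4) * 4)   # pad to a multiple of 4
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


for n in range(16):
    print(table_row(n))

print(binary_to_hex("1010101000010001"))    # AA11
```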
Hexadecimal Notation
Depending on the source code you are reading, hexadecimal may be notated in one of several ways:
· 0xaa11: ANSI C notation. The 0x prefix indicates that the remaining digits are to be interpreted as hexadecimal. For example, 0x1000 is equal to 4096 in decimal.
· \xaa11: "C string" notation.
· 0aa11h: Typical assembly language notation, indicated by the h suffix. The leading 0 (zero) ensures the assembler does not mistakenly interpret the number as a label or symbol.
· $aa11: Another common assembly language notation, widely used in 6502/65816 assembly language programming.
· #AA11: BASIC notation.
· $aa11$: Business BASIC notation.
· \(\mathrm{aa11}_{16}\): Mathematical notation, with the subscript indicating the number base.
Both uppercase
and lowercase letters may be used. Lowercase is generally preferred in a Linux,
UNIX or C environment, while uppercase is generally preferred in a mainframe or
COBOL environment.
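All of these notations describe the same value once the decoration is stripped away. The Python sketch below shows one way to do that; `parse_hex` is a hypothetical helper written for this example, not a standard function, and it handles only the textual prefixes and suffixes listed above. The final conversion uses Python's built-in `int(text, 16)`, which accepts upper- and lowercase digits alike.

```python
def parse_hex(text):
    """Strip a known prefix or suffix and read the remaining digits as base 16."""
    t = text.strip()
    if t.lower().startswith("0x") or t.startswith("\\x"):
        t = t[2:]                 # C and "C string" notation
    elif t.startswith("$"):
        t = t.strip("$")          # $aa11 and $aa11$
    elif t.startswith("#"):
        t = t[1:]                 # #AA11
    elif t.lower().endswith("h"):
        t = t[:-1]                # 0aa11h
    return int(t, 16)


for sample in ("0xaa11", "\\xaa11", "0aa11h", "$aa11", "#AA11", "$aa11$"):
    print(sample, parse_hex(sample))    # each line ends in 43537

print(parse_hex("0x1000"))              # 4096
```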