Saturday, April 28, 2012

How is data stored on a CD (Compact Disc) and read back from it?


©2012 Ganti Sree Rajiv

It is hard to imagine the world without storage media like the CD/DVD. You put a CD into the CD drive, open some software on your computer, select the files to be stored, and finally click on ‘Burn’. Within minutes, all the data gets copied to the CD and you are happy.

But what’s happening behind the scenes?

Before getting into it, it helps to know the difference between ‘Analog recording’ and ‘Digital recording’ and the way they work, as you will encounter these words often.

An analog signal varies continuously with time. A common example is an ECG machine. The ECG machine has a sharp, very sensitive pen which moves according to the electrical signals coming from your heart. A strip of graph paper is moved along by a small motor, and as the pen moves it draws a pattern on the paper that follows the way your heart beats over time. This is analog storage, and the wave that is drawn is called an analog wave.



On the other hand, a Digital signal is made of ‘Bits’ (in binary, these bits are stored as a sequence of ‘0’s and ‘1’s) and is not a continuous wave. The CD/DVD stores such Digital data.

Digital recording first converts the Analog wave into a stream of bits (Digital) and then records these bits. The analog wave is converted to Digital using an ‘Analog to Digital converter (ADC)’. To play back the recorded music, the stored Digital bits have to be converted back to an Analog wave. This is done by a ‘Digital to Analog converter (DAC)’.

In practice there is a complication: along with the original (intended) signal, noise from the surroundings also adds to the wave, which leads to errors in the digital data. However, we will not discuss this issue in detail here.

How is the analog wave converted to Digital bits?
Let us look at this process in brief.

Let the ‘Red’ colored wave represent a part of the original Analog wave (as an example). The first step in converting this analog wave into digital is that it has to be ‘SAMPLED’. In simple terms, the time axis of the analog wave is divided into equal intervals (you can see 0-8 in the above figure). Vertical lines (‘Green’ in color) are then drawn from those points, and these vertical lines meet the original analog wave at some points (‘Yellow’ in color).
Now all these YELLOW colored points are connected (you can see a ‘Black’ line connecting these ‘Yellow’ points).

The wave formed by these sampled points (we call it the ‘Representing wave’ here) looks like this:
As the next step, you can see from the above figure that the whole amplitude range of this 'representing wave' is divided equally into regions called ‘Decision thresholds’. Each of these decision thresholds is given a unique code in binary (0000, 0001, 0010, 0011, 0100, ...). If a sampled point of the representing wave (follow the ‘Yellow’ points) falls in a particular decision threshold region, then that point is assigned the code of that region as its equivalent Digital representation.


(Please note that the binary codes assigned to the Decision thresholds don't necessarily follow a particular order. They may be assigned in any way, depending on the coding mechanism.)

For the above 'representing wave', the equivalent digital representation is therefore: 
0001 0001 0101 0011 0010 0110 0110 0010 0111

So, this is the final digital (binary) equivalent of that particular Analog wave, and it is what finally gets stored on the storage medium.

The ‘Digital to Analog converter (DAC)’ uses the reverse mechanism to reproduce the analog wave from these digital codes, using the same threshold code words that the ADC used.
This brings us to the definition of one common term, “Sampling rate”, defined as the number of samples taken per second.

Note: It is very important that the Analog wave reproduced from the Digital codes matches the original analog wave. If the two differ a lot, there is no point in using the reproduced wave. More samples per second and more decision thresholds give a closer match. For the reproduced wave to closely represent the original sound, the audio CD uses a sampling rate of 44,100 samples/second and 65,536 decision thresholds.
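To make the sampling and threshold steps concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the wave is a made-up sine wave, the function names are hypothetical, and the codes are assigned in plain counting order (which, as noted above, is just one possible choice). It will not reproduce the exact codes from the figure.

```python
import math

def sample(wave, duration, n_samples):
    """Read the wave at equally spaced instants (the 'yellow' points)."""
    step = duration / (n_samples - 1)
    return [wave(i * step) for i in range(n_samples)]

def quantize(value, lo, hi, bits):
    """Return the index of the equal-width amplitude region containing value."""
    levels = 2 ** bits
    band = (hi - lo) / levels
    index = int((value - lo) / band)
    return min(max(index, 0), levels - 1)  # keep the top edge inside the last region

def encode(samples, lo, hi, bits):
    """ADC side: one fixed-width binary code per sample."""
    return [format(quantize(s, lo, hi, bits), f"0{bits}b") for s in samples]

def decode(codes, lo, hi, bits):
    """DAC side: map each code back to the middle of its region."""
    band = (hi - lo) / 2 ** bits
    return [lo + (int(c, 2) + 0.5) * band for c in codes]

# Assumed example wave: one cycle of a sine wave between -1 and 1
wave = lambda t: math.sin(2 * math.pi * t)

samples = sample(wave, duration=1.0, n_samples=9)    # points 0-8
codes = encode(samples, lo=-1.0, hi=1.0, bits=4)     # 16 regions, 4-bit codes
rebuilt = decode(codes, lo=-1.0, hi=1.0, bits=4)

print(" ".join(codes))
```

With only 16 regions, each rebuilt value can be off by up to half a region. Raising `bits` to 16 gives 65,536 regions and shrinks that error accordingly.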

I’m not going to confuse you any further with this. These are just the basics of how an analog wave is converted into its Digital (Binary) representation.


What is the Capacity of a CD?
You might have seen “74min/700MB” printed on the label of a CD. What does it mean?
It means that the CD can store a maximum of 74 minutes of continuous sound, and in terms of digital data it can store 700MB.

But did you ever think about how this number ‘700MB’ is determined by the CD manufacturer?

Before calculating the CD capacity, let us note some points used in this calculation.
1.       1 MB = 1024 KB, 1 KB = 1024 Bytes, and 1 Byte = 8 Bits.
2.       As mentioned above, 65,536 decision thresholds are used so that the reproduced signal accurately represents the original analog signal. Since 65,536 = 2^16, each sample needs 16 bits, i.e. 2 Bytes.
3.       Also as mentioned above, a sampling rate of 44,100 samples/sec is used so that the reproduced signal accurately represents the original analog signal. These 44,100 samples/sec are for 1 channel, but a CD stores 2 channels (one for each speaker in a stereo speaker system).

Now, we arrive at the calculation of CD capacity:

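44100 samples/sec/channel * 2 channels * 2 Bytes (per sample) * 74 minutes * 60 seconds (per minute) = 783,216,000 bytes, which is about 747 MB and so in the same range as the ‘700 MB’ on the label. (The figure for computer data comes out lower than the raw audio figure because a data disc sets aside part of each block for extra error correction.) This is how the CD capacity is determined.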
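The same arithmetic can be checked with a few lines of Python. The constant names are my own, and the conversion to MB assumes 1 MB = 1024 * 1024 bytes.

```python
SAMPLE_RATE = 44_100        # samples per second, per channel
CHANNELS = 2                # stereo
BYTES_PER_SAMPLE = 2        # 65,536 levels = 2**16, i.e. 16 bits
MINUTES = 74

assert 2 ** (8 * BYTES_PER_SAMPLE) == 65_536

bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
total_bytes = bytes_per_second * MINUTES * 60
total_mb = total_bytes / (1024 * 1024)

print(bytes_per_second)     # 176400
print(total_bytes)          # 783216000
print(round(total_mb))      # 747
```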



Now, coming to our main topic: how is data stored on a CD?

We calculated above that a CD can store approximately 700MB of data.
But a CD looks like just a piece of plastic, and in cross-section it is just 1.2mm thick and 12cm in diameter. So how are these 700MB of data stored in such a small device?

As discussed above, the CD stores digital data (binary 1’s and 0’s). In theory that is fine, and everyone knows it. But how exactly are these 1’s and 0’s stored on a disc?

To answer this question, let us look at the different layers the Disc is made of.


This image shows the layers of a CD-R (CD-R means CD-Recordable, i.e., a disc that can be recorded only once). The base is a plastic substrate (an injection-molded piece of clear polycarbonate plastic). The layer above it is filled with an organic dye material, and over that is a metal layer (usually Aluminum) which is used for reflection.
When you ask your computer to save your data on a disc, it converts your data into digital bits and signals the CD drive to write the data accordingly. The CD drive has a laser unit (the components of a CD drive are discussed below). During recording, the laser in the CD drive heats the organic dye layer, which makes a hole (called a ‘pit’). These pits are made according to the Digital binary data sequence which the computer sends. (For example, if a binary ‘1’ is received, the laser makes a pit (hole) in the dye layer of the disc, and for a binary ‘0’ it doesn’t.)


Note: we sometimes use the words ‘pits’ and ‘bumps’ for the same thing. If you make a hole (pit) on one side of a surface, then from the other side of the same surface it looks like a lump (bump).


If you look deep inside the disc at the surface of the dye after recording the data, you can find bumps.
A bump is usually 0.5 micron wide, and its length varies depending on the sequence of digital data. (As per the example discussed above, the longer the series of 1’s sent, the longer the bump.)
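Following the simplified picture used in this post (a ‘1’ makes a pit/bump, a ‘0’ leaves the surface untouched), the way a bit sequence turns into bumps of different lengths can be sketched like this. The function name and the bit string are made up for illustration, and lengths are counted in bit positions rather than microns.

```python
from itertools import groupby

def bump_runs(bits):
    """Group a bit string into runs of (is_bump, length_in_bit_positions)."""
    return [(symbol == "1", len(list(run))) for symbol, run in groupby(bits)]

runs = bump_runs("1110010111")
for is_bump, length in runs:
    kind = "bump" if is_bump else "flat"
    print(kind, length)
```

Here the three consecutive 1’s at the start and at the end each become one long bump, while the single 1 in the middle becomes a short one.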

That is the case for a CD-R (one-time recording). But how does this work in a CD-RW (re-writable CD)?
In a CD-RW (re-writable) disc, instead of the organic dye layer used in a CD-R, a special alloy (containing silver, indium, antimony and tellurium) is used. The CD-RW drive has a laser that works at 2 different temperatures (instead of only one, as in a CD-R drive). These two temperatures are usually about 200 and 600 degrees Celsius.
When the CD-RW drive has to record data on the disc, it uses the higher temperature of about 600 degrees. The spot of alloy under the laser ‘liquefies’ and then cools so quickly that it solidifies as a dull, less reflective mark, which plays the same role as a pit (bump) on a CD-R. When it has to erase previously stored data, the drive uses the lower temperature of about 200 degrees. This does not melt the alloy, but it is enough for the marks to settle back into the original reflective state, so the old pattern is lost and the data is erased. The drive can then write new data on the disc in the same way as before.



Now comes the question,

How is data read back from a CD?
The CD has a single track which is spiral in shape. The bumps (discussed above) are made along this spiral track.

The spiral starts at the centre (inside) of the disc and winds towards the outside.

Before understanding further, let us first look at the components of a CD drive.

The 'disc drive motor' rotates the disc usually at 200-500 revolutions/minute (the speed varies from 200-500 depending on other factors discussed later).
The 'laser unit' focuses on the bumps (how it works is explained later).
The 'tracking drive and motor' move the laser unit so that it follows the spiral path and reads the bumps.

How does the CD drive read data back from the Disc?
This mechanism is very interesting. In simple words, you can say the laser searches for the presence of bumps along the spiral. This is correct, but let us see in detail how it works:
The laser unit shines a laser beam onto the disc. This beam passes through the polycarbonate layer. If there is a bump (which means some data is stored there), the beam is reflected back from that bump and falls exactly on the opto-electronic device, which then gives a logical high voltage (indicating the presence of data).






If there is no bump present, the laser light passes through and falls on the Aluminium layer, where it gets reflected.

This time, because the reflection angle is different, the reflected beam no longer falls on the opto-electronic device, and hence the device gives a logical low voltage (indicating no data).
In this way, the data stored on the CD is read back in binary (digital) form. The DAC (Digital to Analog converter) then converts this back to analog (refer to the top of this post).
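The reading step described above can be mimicked in a few lines of Python. This is only a toy model: the voltage values, the 0.5 threshold and the function names are assumptions for illustration, and the 4-bit grouping simply mirrors the 4-bit codes used in the earlier example.

```python
def read_bits(voltages, threshold=0.5):
    """Logical high (bump seen) becomes '1', logical low becomes '0'."""
    return "".join("1" if v > threshold else "0" for v in voltages)

def split_codes(bits, width=4):
    """Cut the bit stream back into fixed-width code words for the DAC."""
    return [bits[i:i + width] for i in range(0, len(bits), width)]

# Made-up readings from the opto-electronic device, one per bit position
voltages = [0.1, 0.0, 0.2, 0.9, 0.1, 0.8, 0.2, 0.9]

bits = read_bits(voltages)
codes = split_codes(bits)
print(bits)    # 00010101
print(codes)   # ['0001', '0101']
```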


The 'tracking motor' moves the laser unit from the centre of the disc towards the outer edge, while the 'disc drive motor' rotates the disc, so that the laser covers the entire spiral. Don’t get confused. See this illustration video…

As you can see, when the laser unit is near the centre of the CD, the disc motor rotates the CD fast, at about 500 revolutions/minute. When the laser unit reaches the outer part of the CD, the disc motor rotates the CD more slowly, at around 200 revolutions/minute, so as to read all the bumps without missing any.
(Please note that this is needed because one turn of the spiral near the outer edge is much longer than one turn near the centre. Slowing the disc down keeps the bumps passing under the laser at the same speed everywhere.)
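The link between the laser's position and the rotation speed can be sketched as below. The numbers are assumptions for illustration and are not taken from this post: a track speed of about 1.3 metres/second under the laser, a spiral that begins about 25 mm from the centre and ends about 58 mm from the centre.

```python
import math

def rpm_for_radius(radius_m, linear_speed_m_s=1.3):
    """Revolutions/minute needed so the track passes the laser at a fixed speed."""
    circumference = 2 * math.pi * radius_m      # length of one turn at this radius
    return linear_speed_m_s / circumference * 60

inner = rpm_for_radius(0.025)   # laser near the centre
outer = rpm_for_radius(0.058)   # laser near the outer edge

print(round(inner), round(outer))
```

Under these assumptions the result falls inside the 200-500 revolutions/minute range mentioned earlier, with the higher speed at the centre.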


Some more Information:
1.       Suppose you have stored several songs on a disc and you wish to listen to a particular song. Instead of listening to all the songs on the disc until you reach the desired one, you would prefer to see the list of all the songs first and then play the one you want. How does the CD player do this for you?
For the laser to be able to move between songs, data describing each song needs to be encoded along with the music, telling the drive “where it is” on the disc. This data is called ‘sub-code data’.
2.       Sometimes the laser may misread a bump, which could completely change the bit sequence and the output analog wave. To make sure this doesn’t happen even if the laser misreads a bump, 'error-correcting codes' are used.
3.       Suppose there is a very large file stored on the disc. The whole data is not stored sequentially. Its pieces are interleaved (stored out of order) around one of the disc’s circuits. This way, a small error while searching for bumps (data) damages only a little of each piece instead of a long continuous stretch, so it does not grow into a big error.
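Point 3 can be illustrated with a toy example. This is a plain block interleaver written for illustration, not the actual scheme a CD uses (which is more elaborate); the function names and the sample data are made up, and ‘?’ marks symbols that could not be read.

```python
def interleave(data, depth):
    """Write row by row into `depth` columns, then read column by column."""
    rows = len(data) // depth
    return "".join(data[r * depth + c] for c in range(depth) for r in range(rows))

def deinterleave(stored, depth):
    """Undo interleave(): put every symbol back in its original position."""
    rows = len(stored) // depth
    return "".join(stored[c * rows + r] for r in range(rows) for c in range(depth))

data = "AAAABBBBCCCCDDDD"                 # four pieces of four symbols each
stored = interleave(data, depth=4)        # order in which symbols go on the disc

# A scratch wipes out four neighbouring symbols on the disc
damaged = stored[:4] + "????" + stored[8:]
recovered = deinterleave(damaged, depth=4)

print(stored)      # ABCDABCDABCDABCD
print(recovered)   # A?AAB?BBC?CCD?DD
```

After de-interleaving, the four lost symbols are spread out so that each piece is missing only one symbol, which is the kind of small error an error-correcting code can repair.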




