Three different sets of data are available on internet penetration and use in India: from TRAI, from the census and from IAMAI-IMRB. But even read together, the data fails to provide a comprehensive picture of digital inclusion in India. The absence of meaningful data cannot be overemphasized as we set out to achieve Digital India goals.
The first four headlines that the internet search engine Google spits out when you type ‘internet use India’ are ‘Internet users in India to cross 300 mn by Dec: Report’, ‘India to have 213 million mobile Internet users by June: IAMAI-IMRB report’, ‘India will have 40m new mobile internet users in 6 months’ and ‘India to have more internet users than US by December end: IAMAI’. When you take a look at the datelines, they are November 19, 2014 and January 13 and 14, 2015. The source for all the stories is the same: research conducted jointly by an industry body, the Internet and Mobile Association of India (IAMAI), and a market research agency, IMRB International.
The first pit stop, in these days of internet-based research, for finding out just how many people in India, or in any city in the country, are on the internet is, then, this report. An additional benefit is that, as an annual exercise, it offers comparative data. But before sharing the data it delivers, let us understand how it arrives at this data.
The report relies on sampling the population. It samples the population in 35 cities and also has a separate section on rural India. Since our interest is in the cities, primarily Pune and, for some comparative purposes, the seven other large metros, we will try to understand the data from these cities. Once the cities are selected, the sampling method used is called quota sampling. A quota sample is, as any basic research textbook will tell you, a nonprobability sample or one that is not representative of the entire population. Such methods are not the best for generalising to the population. However, by choosing randomly within each of the sections (quotas), the validity can become more robust, which is what the report says the researchers have done. The quotas used for this research were households from SEC A, B, C, D, and E. Now SEC, or Socio-Economic Classification, is a form of market segmentation that uses education of the chief earner and ownership of 11 consumer durables to divide or segment the population. This classification is primarily used as a yardstick by the media industry when it tries to reach out to advertisers. The actual stretch is from SEC A1 all the way to SEC E3 (A1, A2, A3, B1, B2, C1, C2, D1, D2, E1, E2 and E3). It should be noted that while it is called Socio-Economic, it is primarily an economic classification since the ownership of consumer durables is the primary yardstick mapped to the education of the chief earner in the household. The assumption behind the naming, presumably, is that possession of consumer durables and the education of the chief earner are valid proxies for social indicators. So when we look at the data below, we need to remember that the population is sampled in proportion to the spread of these segments in the general population. Unlike the 2012 and 2013 reports, the 2014 Internet in India report does not tell us the actual sample size. Understanding the method used is one part of understanding the research report.
The other important part that concerns us is what the people were asked. So there are two labels that we need to understand: Claimed Users and Active Users. Claimed Users, the report tells us, are those who replied with a ‘yes’ to the question: Have they used the internet ever (on a PC, mobile phone, tablet)? The Active Internet User is one who answered ‘Yes’ to each of the following three questions: Have they used a PC? Have they used the internet ever (on a PC, mobile phone, tablet)? Have they accessed the internet in the last one month (on a PC, mobile phone, tablet)? One of the challenges of doing research can be seen in how the active user is defined. If you ask a young urban man or woman whether s/he is active on the internet and then ask what the term ‘active’ means in this context, s/he might reply that it means checking Facebook at least three times a day. For another person, getting on the internet once a week to check email may be active. But for this study, a person who has accessed the internet once in the last one month is active.
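The two labels can be written down as a small decision rule. What follows is only a sketch of the definitions as quoted above, not the survey's own instrument; the field names and the `classify` function are hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical field names; the report's questionnaire schema is not published here.
    used_pc: bool                   # "Have they used a PC?"
    used_internet_ever: bool        # "Have they used the internet ever?"
    used_internet_last_month: bool  # "Have they accessed the internet in the last one month?"


def classify(r: Response) -> str:
    """Return the most specific label under the definitions quoted in the text."""
    # Read literally, an Active User must answer yes to all three questions,
    # including the PC question.
    if r.used_pc and r.used_internet_ever and r.used_internet_last_month:
        return "active"
    # A Claimed User has answered yes to "used the internet ever".
    if r.used_internet_ever:
        return "claimed"
    return "non-user"
```

Every active user is also a claimed user, so the function reports only the narrower label. Note that a single use in the last month is enough to move a respondent from "claimed" to "active".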
Pune, by this yardstick, has 40 lakh people who have used the internet at least once in the last month. This number goes up to 60 lakh for those who have used the internet ever. The table below gives the yearly increase over the last three years.
Table 1. Claimed and Active Internet User Estimates* 2012-2014 (IAMAI & IMRB)
* The word ‘estimates’ has been inserted by the author to show the data for what it is – estimates drawn from the sample.
The yearly increase of those who have used the internet ever in Pune is from 20 lakh in 2012 to 30 lakh in 2013 to 60 lakh in 2014. The figure for those who have used the internet at least once in the last month has increased from 12 lakh in 2012 to 30 lakh in 2013 to 40 lakh in 2014. In case you have not noticed, there are three cities for which, in 2013, there was no difference between those who said they had used the internet once in the last month and those who had used it ever. At best, you can conclude that the problem arises because the figures have been rounded off; at worst, that extrapolation from the sample to the entire population has resulted in little difference between claimed and active users. Be that as it may, we now have a research document that gives us the estimated number of people accessing the internet at least once a month and an estimate of how many people have accessed the internet ever.
Now, let us see what these numbers mean vis-à-vis some other data that is available. The Telecom Regulatory Authority of India (TRAI) provides a quarterly report on telecom, which includes a section on internet subscription (see Table 2 below). Based on data provided by the telecom companies and the other internet service providers, the TRAI data is given by service area, which is the telecom parlance for how the country is divided for distribution of spectrum. The problem for anyone trying to compare data is three-fold. First, other than Mumbai, Delhi and Kolkata, all the places come under the state service areas. Even Chennai, which was earlier a separate service area, has been merged with Tamil Nadu. Second, even for these three cities, while TRAI provides data by service area, IAMAI-IMRB uses the urban agglomeration defined by the census as the city geography for their sampling.(i) For example, the Delhi UA used by IAMAI-IMRB does not include Gurgaon, Faridabad, Ghaziabad and Noida, while the Delhi Service Area includes all these areas. Third, and most crucially, while IAMAI-IMRB tries to estimate the number of users based on those who have been on the internet at least once in the last month, the TRAI data is on the number of internet subscribers. Since there are individuals with multiple SIM cards (each counted as a subscriber by the telecom companies), the number of individuals who subscribe may be less than the number of subscribers reported. At the same time, it is quite possible that in a price-conscious country like India no one would be willing to subscribe to the internet and not use it. So the number of subscribers at least signals an absolute commitment towards using the internet, which single use per month does not.
So the number of subscribers reported by telecom companies cannot be compared to the at-least-once-a-month user estimates based on the survey of the population. With those three provisos, let us look at the service area-wise internet subscribers that include the eight metros. (See Table 2 below.)
Table 2: Service Area-wise number of Internet Subscribers (Source: TRAI, Sept 2014)
Total Internet Subscribers (million)
|Tamil Nadu (incl. Chennai)|
Now, let us turn to a third data source and see if that offers some way of seeing the access that people have to the internet. The third data source is the 2011 Census. The Census is not a sample and hence the snapshot that we get for 2011 is that of the total population. Just to see how this data set can be seen in conjunction with the TRAI or IMRB data, we will only look at the Pune numbers for illustrative purposes. In Table 3 (see below) we have the reported numbers for the six constituent units of the Pune Urban Agglomeration. Since the schedule provides the percentage of households and not absolute numbers, we need to calculate the absolute numbers by using the household data provided separately. So within the Pune Municipal limits there were 1.63 lakh households (22% of the 7.43 lakh total households) that had a computer with an internet connection. In the Pimpri Chinchwad MC, the number was more than 47,000 households (11.2% of the 4.27 lakh total households) with computers connected to the internet.
Table 3. Percentage of Households having computers and telephones in Pune (Source: Census 2011)
|Dehu Road (CB)|
|Pune (M Corp.)|
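The household counts quoted above for Pune and Pimpri Chinchwad follow from multiplying the total households by the reported percentage. A minimal sketch of that arithmetic, using only the figures given in the text (the function name is illustrative):

```python
LAKH = 100_000


def households_with_internet(total_households: float, percent: float) -> float:
    """Absolute households = total households x reported percentage."""
    return total_households * percent / 100


# Figures as quoted in the text (Census 2011).
pune_mc = households_with_internet(7.43 * LAKH, 22.0)
pimpri_chinchwad_mc = households_with_internet(4.27 * LAKH, 11.2)

print(f"Pune (M Corp.): about {pune_mc / LAKH:.2f} lakh households")
print(f"Pimpri Chinchwad MC: about {pimpri_chinchwad_mc:,.0f} households")
```

Because the published percentages are themselves rounded, the results are approximations rather than exact counts.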
So what we have here are three data sets that provide some information about the internet in India. The newspapers that headlined increasing internet use were using numbers from a survey that estimated internet use by measuring those who said they used the internet at least once a month. The second set of numbers often used when discussing the internet is the total number of subscribers as reported by telecom companies, which includes those with a wireline or a wireless internet connection, including individuals who may be multiple subscribers. The third set of numbers comes from the census, which measures only those who have a computer with an internet connection and thus ignores the vast numbers, as suggested by the TRAI data, who connect to the internet via their mobile devices.
Given the problems with each of these ways and the difficulty of comparing these numbers, let us try to see what stacking some numbers of just Pune UA tells us about the reliability and validity of these numbers.
So let us go back in time and read the 2011 census data in conjunction with the 2012 IAMAI-IMRB data, since at least the time periods are the same.(ii) In 2012, according to the IAMAI-IMRB data, there were 20 lakh people in the Pune UA who said they had used the internet at some point of time. So it was estimated that about 40% (20 lakh is 39.54% of the UA population) of the people had accessed the internet in the Pune UA. Those who had accessed the internet at least once in the last month were just a little below 24% (12 lakh is 23.72% of the UA population). The census tells us that in 2011 there were 2.16 lakh households with a computer that had an internet connection. With a simple assumption of persons per household based on total population, we get a figure of about 9 lakh people in the city who had access to a computer with an internet connection in their homes.
Table 4. Number of HHs and population with access to computer with internet at home (Census 2011)
HH with Internet
Population with access to internet at home
|Dehu Road (CB)|
|Pune (M Corp.)|
For the same period the IAMAI-IMRB estimates were that 12 lakh people had accessed the internet at least once in the last month. It then appears that only 3 lakh more than the census data would suggest were reported as accessing the internet at least once in the month. This figure appears on the lower side. There are a number of questions that may be raised about such data comparison. One, we are assuming that everyone in the family is a computer user if there is a computer with internet connection at home. Two, the census only counts those with computers with internet at home while IAMAI-IMRB estimate all those who used the internet at least once in a month whether on a computer or on a mobile device. The point, however, is that we are not comparing the data sets, but reading them in conjunction. It appears that the IAMAI-IMRB estimates are on the lower side because if 9 lakh people in the city have access to wireline internet connection, then the total numbers of internet users at least once a month cannot be just 12 lakh. It should be more. After all, the IAMAI-IMRB data includes those connecting from offices and those using mobile devices. What is being suggested here is the difficulty of even using one set of numbers and reading them in conjunction, not comparison, with another set of numbers for a given area in the country.
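The figures being read in conjunction here can be laid side by side in a few lines. This is a rough sketch using only numbers quoted in the article; the UA population is back-derived from the statement that 20 lakh is 39.54% of it, not taken from the census tables, and the persons-per-household value is merely what the rounded 9 lakh figure implies.

```python
LAKH = 100_000

# IAMAI-IMRB 2012 estimates for the Pune UA, as quoted in the text.
claimed_2012 = 20 * LAKH
active_2012 = 12 * LAKH

# Census 2011 figures, as quoted in the text.
households_with_internet = 2.16 * LAKH
home_access_population = 9 * LAKH  # the article's approximate figure

# Assumption: UA population implied by "20 lakh is 39.54% of the UA population".
ua_population = claimed_2012 / 0.3954

active_share = active_2012 / ua_population * 100
implied_persons_per_household = home_access_population / households_with_internet
gap = active_2012 - home_access_population

print(f"Implied UA population: {ua_population / LAKH:.1f} lakh")
print(f"Active users as share of UA population: {active_share:.2f}%")
print(f"Implied persons per household: {implied_persons_per_household:.2f}")
print(f"Active users minus home-access population: {gap / LAKH:.0f} lakh")
```

The last line is the 3 lakh gap discussed above: everyone who used the internet in the last month from an office, a cyber café or a mobile device, without a connected computer at home, would have to fit inside it.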
So, at the risk of some repetition, let us be clear about what numbers are available to us. One, absolute numbers (census) that, however, do not speak of internet use but merely of possession by the household of a computer with an internet connection. The census data ignores that part of the population that may be accessing the internet from their offices and those who may be accessing the internet via mobile devices. Two, internet service provider data via TRAI that gives us the total number of subscribers to the internet, but does not tell us anything about discrete connections and also does not reveal data by city or town, but by service areas. And three, user data from a marketing agency and industry body combine that estimates users based on an annual survey that defines active users as those who have accessed the internet once in a month.
The conundrum is what purchase these numbers have when we want to get a sense of internet use vis-à-vis digital inclusion. The short answer is: not much. The data gives a fleeting glimpse of what is happening. This is not to doubt the work behind data collection and presentation. It is that the purpose of such data gathering and collection is quite different from measuring digital inclusion. A once-in-a-month use cannot be an indicator of digital inclusion. The data on computers with internet at home is a more robust number, but only at the level of homes. It does not count those with access to an internet connection at their offices or in cyber cafés or their friends’ homes. It does not give us those using mobile devices for internet use. Also, till the next census comes along, we do not have any newer number to rely on. And the industry data on the number of subscribers, and not users, is an indication, but because it is reported by service areas, it becomes difficult to go beyond a general trend across regions.
With such data it is not really possible to answer the question of internet penetration, what that really means and how we use it. Is it access? Is it use? Is it access by device? Or is downloading a social networking app on the phone enough to qualify the person as an internet user? With enough stakeholders in the game, from government to marketers to telcos, the questions that each wants to ask are different and the uses to which they put the data that they generate are also different. And none of them helps us answer the question of how prevalent the use of the internet is in India.
What is even more intriguing is that a cogently articulated guideline for measuring ICT is not being used by these national-level data gathering efforts. The International Telecommunication Union has been creating manuals for indicators of ICT in households and among individuals. The latest manual was launched at the 11th World Telecommunication/ICT Indicators Symposium held in Mexico City in December 2013.
Titled Manual for Measuring ICT Access and Use by Households and Individuals, 2014, it provides 16 core indicators, and seven of them are directly related to use of the internet. Not only does this manual provide reasons why these indicators are important, but it also provides sample question/s for each indicator and can be used as a good reference manual for those researching ICT. The seven indicators that should serve as measures for internet use are:
HH6: Proportion of households with internet
HH7: Proportion of individuals using the internet
HH8: Proportion of individuals using the internet, by location
HH9: Proportion of individuals using the internet, by type of activity
HH11: Proportion of households with internet, by type of service
HH12: Proportion of individuals using the internet, by frequency
HH14: Barriers to household internet access
Indicators HH6, HH11 and HH14 are related to access, and indicators HH7, HH8, HH9 and HH12 are related to use. The model questions will give a clear picture of what these indicators are telling us:
HH6: Does this household have internet? Yes/No
HH7: Have you used the internet from any location in the last three months? Yes/No
HH8: Where did you use the internet in the last three months? [Respondents should select all locations. List of locations include Home, Work, Place of education, Another person’s home, Community Internet access facility (typically free of charge), Commercial Internet access facility (typically not free of charge) and In mobility.]
HH9: For which of the following activities did you use the internet for private purposes (from any location) in the last three months? [There are 29 activities included in the list of activities.]
HH11: What type/s of internet services are used for internet access at home? [The types of services include six types divided by wireline and wireless and by speed.]
HH12: How often did you typically use the internet during the last three months (from any location)? Respondents can only respond to one category (see above). [The categories are 1) At least once a day: once a working day for respondents who only (or most frequently) use the internet from work or school etc., 2) At least once a week but not every day and 3) Less than once a week.]
HH14: Why does this household not have internet access? [The reasons included are: Do not need the internet (not useful, not interesting, lack of local content), Have access to the internet elsewhere, Lack of confidence, knowledge or skills to use the internet, Cost of the equipment is too high, Cost of the service is too high, Privacy or security concerns, Internet service is not available in the area, Internet service is available but it does not correspond to household needs (eg quality, speed) and Cultural reasons (eg exposure to harmful content)]
From indicators HH6, HH7 and HH12, we can see the suggested measures for household access, use and frequency. HH6, however, is different from the 2011 Census measure because while HH6 would count internet access at home by any means, the census question is restricted to a computer or laptop with internet. HH7 suggests a measure of use, which is someone who has used the internet in the last three months. The IAMAI-IMRB question on claimed internet use is for someone who has used the internet ever. At least once in three months is a more robust measure of internet use. HH12 is a more granular way of understanding internet use with three frequency categories: at least once a day, at least once a week and less than once a week. The IAMAI-IMRB survey uses a once-a-month measure for the active user, which certainly is at variance with what the ITU categories suggest may be considered an active user. While the manual does not use the word active and provides for a more objective measure of frequency, it seems that if we were to combine the everyday user and the once-a-week user, we would have a more valid measure of an active user. Incidentally, “at least once in a month user” is an indicator used by social media and app makers to arrive at what they consider active users of the app or the social networking site. Facebook now provides both the at-least-once-a-month user and the at-least-once-a-day user. But while that may work for a website or a social networking app or a game where stickiness is important, it cannot be a yardstick used to measure internet use.
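The difference between the two notions of an active user can be made concrete. The sketch below encodes the three HH12 frequency categories listed earlier and contrasts the article's suggested combination (daily plus weekly users) with the survey's once-a-month rule. The enum and function names are hypothetical, and treating "daily or weekly" as "active" is the article's suggestion, not a definition from the ITU manual.

```python
from enum import Enum


class HH12(Enum):
    # The three frequency categories of indicator HH12, as listed in the text.
    DAILY = "at least once a day"
    WEEKLY = "at least once a week but not every day"
    LESS_THAN_WEEKLY = "less than once a week"


def active_daily_or_weekly(freq: HH12) -> bool:
    """Article's suggested measure: combine everyday and once-a-week users."""
    return freq in (HH12.DAILY, HH12.WEEKLY)


def active_once_a_month(used_in_last_month: bool) -> bool:
    """Survey's measure as described in the text: any use in the last month."""
    return used_in_last_month


# Illustrative respondent: went online once in the last month, less than weekly.
freq, used_in_last_month = HH12.LESS_THAN_WEEKLY, True
print(active_once_a_month(used_in_last_month))  # counted as active by the survey rule
print(active_daily_or_weekly(freq))             # not counted under the stricter rule
```

The illustrative respondent is counted as active under one rule and not under the other, which is the variance the paragraph above describes.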
Yet the headlines in the Indian newspapers used “ever used” and “once a month use” to announce the number of internet users in India. In the absence of data that is more specific and of a better quality, it is difficult to make meaningful policy decisions either by the government or the industry. The absence of such data cannot be overemphasized as we set out to achieve Digital India goals.
Aloke Thakore is an independent journalist, researcher, newsroom coach and teacher. He serves as the Hon. Director of the JM Foundation for Excellence in Journalism and has been associated, over the last three years, with a number of research projects on telecom and internet access. He is also the founder-director of Font & Pixel Media Pvt Ltd, a media and education enterprise.
(i) Personal communication with Mr Abheek Biswas of IMRB.
(ii) Personal communication with Mr Abheek Biswas of IMRB where he mentioned that the 2012 numbers were reconciled with the provisional 2011 census data since the provisional population tables started coming in 2012.