Tuesday, February 23, 2016

Data, the Giver

Data on data is a rare commodity here. It is difficult to say how much information we have been able to store, transmit or digitize in Nepal.



Traditional data in the form of scrolls and records, in one of the government offices in Nepal. Photo by Prabhat Kiran.

Dharma Adhikari
One example of how a word from your own vernacular can become foreign beyond imagination is 'data'. The word is used so frequently today that it is among the most common terms in the corpus of contemporary English.

Unsurprisingly, you can trace it to the Latin "datum", the singular of data, meaning a "thing given". But the "um" in datum sounded Sanskritic enough for me to consult a dictionary, and to my amusement, datum in Sanskrit means "to give". Notice also the kinship between donum (gift) in Latin and danum (gift) in Sanskrit. And "donor" is what we often mean by the widely used Nepali words data (giver) and datrí (also giver), which can likewise be traced to Sanskrit and Latin.

The essence of data owes much to our own civilization, which predates ancient Rome. The "something given" implies some form of gift, given freely, with some measure of generosity. Today, we replace "to give" with "to transmit" or "to store", and that "something" with "information", which has apparently become everything in our data-driven age.

Function has taken over meaning. With a data deluge unprecedented in human history, and with economic logic demanding more urgent information management, there is new emphasis on storage, transmission and processing. The givers and the takers are in for a digital battle.

It starts with your personal device(s). I am drowning in information overload. I refuse to throw anything away, hoping that someday the data could yield some patterns, perhaps something of research value! But to be honest, the data also includes sentimental clutter that I simply cannot let go of. And now the smartphone adds to that junk, and to my woes.

My moment of worst luck could come at any instant: data deletion, theft, file corruption, virus infection, drive failure, or hardware damage or loss, whether through human error or natural causes. At the institutional level, failure and neglect in storage, transmission and processing cause far more damage. Reports of data breaches and hacking are becoming all too frequent.

Globally, the volume of storage looks astounding. The Digital Universe Study (2014) by EMC Corporation, a US-based company, reported that digital information is doubling every two years and will increase 10-fold between 2013 and 2020, from 4.4 trillion gigabytes to 44 trillion gigabytes. To put that into perspective, according to the study, by 2020 the world's digital data would fill a stack of iPad Air tablets stretching 6.6 times the distance to the moon.

This exponential increase is fueled by the growth of smartphones, the internet of things and wearable devices. Despite all the hype about cyberia and social media, much of the 1.5 billion gigabytes of information on the internet remains hidden in the so-called Deep/Dark Web. We ask Google a question and, true to its character of responding, of giving, it suggests hundreds of options. But Google has so far indexed only 200,000 gigabytes of information, and that, according to Eric Schmidt, Google's former CEO*, constitutes only 0.004 percent of the entire internet. Compare that with the 13 million gigabytes of data archived at the Internet Archive.

Data preservation is a key challenge. Fortunately, emerging storage technologies, such as quartz crystal and DNA-based storage, promise far more durable, compact and reliable media. With big data, meaning information that exceeds a petabyte (1 million gigabytes), new data architectures are revolutionizing business intelligence and analytics. Yet much of the data remains raw, unstructured and unused, and increasingly insecure. Data is many things: something dumped, discarded or thrown away; something given freely or used meaningfully; something kept safe or hidden; and something stolen.

Looking homeward, I find it frustrating that data on data is a rare commodity here. It is difficult to say how much information we have been able to store, transmit or digitize so far in the country. Government agencies and data centers seem obsessed with issues of legislation, infrastructure and marketing. We learn little about the storage, processing or preservation of data.

Nonetheless, some conservative estimates are possible. The TU Central Library, with its 60,000 volumes, is about one four-hundredth the size of the Library of Congress, whose printed materials amount to 15 terabytes, or 15,000 gigabytes; by the same measure, the TU collection would come to roughly 37.5 gigabytes. We have a dozen or so national libraries of comparable size. Other traditional data, including ancient scrolls, newspapers and magazines, will add to that volume. The Government Integrated Data Centre (GIDC), a state-of-the-art facility, maintains 16 terabytes (16,000 gigabytes) of storage and hosts servers for 13 ministries and 40 other government agencies. Many of the country's thousands of agencies have yet to enter the digital universe.

About 255 gigabytes of content from .np domains (16 GB of it audio and 4 GB video), covering 11,812,123 URLs, has been archived at the Internet Archive, according to Jefferson Bailey, Director of Web Archiving Programs there. Regionally, we trail Bhutan (15,686,595 URLs) and India (1,292,822,592 URLs).

For the 1,200 or so movies produced so far, each about 2.5 hours long, allocate 5 terabytes of storage. Radio and TV broadcast content, music, CCTV surveillance videos, photographs and social media content will also greatly enlarge the data volume. Banks, hotels and airlines, as well as the health, education, agriculture and customs sectors, will also yield enormous amounts of data.

In terms of transmission or usage by the country's major mobile networks and internet service providers, the best guesstimates by Oval Analytics put the figure at 140 terabytes a day for September-October 2015.

Altogether, our total volume of storage must have entered the petabyte range, without any fanfare. Mum's the word on data volume, even in the Nepal Telecommunications Authority's regular management information system reports. It's data, the not-given. The culture of information control lingers. A large number of government agencies haven't appointed public information officers, as mandated by the RTI Act 2007. Moreover, turf wars between government agencies render them dysfunctional. For instance, the ICT Policy 2015 ran into major obstacles amid a battle over jurisdiction between the Ministry of Information and Communication and the Ministry of Science and Technology.

In spite of such hurdles, "Digital Nepal" is moving ahead, slowly and without much coordination. There are fewer sightings of our sacks of ancient scrolls and records in offices. One remarkable development following the elections in 2013 and the earthquake last year was the launch of a number of data centers, including OpenNepal and the Kathmandu Lab. The public and private sectors boast many small, disparate and disjointed islands of digitization.

I was unable to ascertain the volume of data at the Central Bureau of Statistics. An official there told me that much of its data has already been digitized and preserved in raw form, and that 45 survey documents have been uploaded to its online National Data Archive. He cited the lack of a centralized data system, a clear government policy and adequate manpower as major challenges. It is difficult to retain technical manpower, he said, because they get better pay in the private sector.

Only a uniform and standardized reporting format can help eliminate data inconsistencies and omissions. A more pressing issue is the tendency among government officials and businesses to underrate and even trivialize data. As Hemanta Shrestha, CEO of Oval Analytics, put it, "The mood is 'forget about data, I will do on my own'". There are givers and no takers. Shrestha observed that there is no interest in innovation and research because there is no competition; decisions remain ad hoc, not data-driven; and businesses are reactive rather than proactive.

As we achieve total mobile penetration, and as internet access broadens rapidly, digital coverage is no longer the major concern. We now have to worry less about infrastructure and more about the quantity and quality of the data we consume, and the impact of our addictive devices on our lives. Will our electronic democracy enable a secure, empowered and digitally competent citizenry? Give it some thought. Increasingly, the issue is less the digital divide than the digital knowledge divide.

* The published version inadvertently identified Eric Schmidt as "chairman of Alphabet Inc, Google's parent company". The error is regretted.

Published in Republica, 23 February 2016