
Analysing Tourism Data with Kotlin

Tue 09 January 2018 | tags: kotlin

Here's a brief example of some of the functional processing capabilities of Kotlin.

To get some raw data I headed over to http://data.gov.in.

The original data set can be found at Foreign Tourist Arrivals In India From Top 15 Source Countries

The modified CSV with the continents added is here

The modified CSV has the following column structure:

  • Column 1: Country of Tourist
  • Column 2: Continent
  • Columns 3 onwards: Number of tourists visiting India in each year from 2001 to 2015
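To make the column layout concrete, here is a minimal sketch of turning one row into a continent and its country data. The `parseRow` helper and the sample row are hypothetical: the figures are made up and only three yearly columns are shown instead of fifteen. It also assumes that no cell contains a quoted comma.

```kotlin
data class CountryData(val name: String, val visitors: List<Int>)

// Hypothetical helper: parse one CSV row into a (continent, CountryData) pair.
// Assumes plain comma-separated cells with no quoting.
fun parseRow(row: String): Pair<String, CountryData> {
    val cells = row.split(",")
    // cells[0] is the country, cells[1] the continent, the rest are yearly counts
    return cells[1] to CountryData(cells[0], cells.drop(2).map { it.trim().toInt() })
}

fun main() {
    // Made-up row, not taken from the real data set
    val (continent, data) = parseRow("Exampleland,Asia,100,150,225")
    println("$continent: ${data.name} ${data.visitors}")
    // prints: Asia: Exampleland [100, 150, 225]
}
```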

The problem statement I made up is somewhat contrived so that it demonstrates some interesting coding aspects. In any case, the statement is as follows:

Read the file. Group all the data by continent. For each continent, compute the following:

  • Total number of tourists from all countries in that continent in 2015
  • Percentage growth in the total number of tourists from that continent between 2001 and 2015
  • Country from which the maximum number of tourists visited India in 2015 from that continent

Display the results in descending order of each continent's percentage growth.
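The percentage growth in the statement above can be worked through on small numbers before touching the file. This is only a sketch: the function takes the same name as the `pctGrowth` field used in the code further down, and the inputs are made up. Because it uses `Int` arithmetic, the fractional part of the result is truncated, and the intermediate product can overflow once the difference exceeds roughly 21 million.

```kotlin
// Percentage growth from `first` to `last` using integer arithmetic.
// The division truncates, so 33.33...% becomes 33.
fun pctGrowth(first: Int, last: Int): Int = (last - first) * 100 / first

fun main() {
    println(pctGrowth(200, 500)) // (300 * 100) / 200 = 150
    println(pctGrowth(3, 4))     // (1 * 100) / 3 = 33 after truncation
}
```

If fractional growth matters, converting to `Double` before dividing is one option.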

The resulting Kotlin code is as follows:

import java.io.File

data class CountryData(val name: String, val visitors: List<Int>)
data class Result(val tourists2015: Int, val pctGrowth: Int, val maxCountry: String)

fun main(args: Array<String>) {
  // Open file
  File("tourists-to-india.csv")
      // Read all lines
    .readLines(Charsets.US_ASCII)
      // Drop the first line (column headers)
    .drop(1)
      // Drop the last line (file totals)
    .dropLast(1)
      // For each remaining row in the file
    .map { row ->
      // split into cells using a comma as the delimiter
      row.split(",")
          // for the array of cells in each row
          .let { array ->
            // create a pair. The first value is the continent name (array[1])
            // The second value is the CountryData, i.e. the country name and the
            //    list of tourists from that country each year starting 2001
            array[1] to CountryData(array[0], array.drop(2).map { it.toInt() })
          }
    }
      // collate all country data for each continent into a list of countrydata
      // for that continent
    .groupBy({ it.first }, { it.second })
      // for each continent
    .map { (continent, countriesData) ->
      // compute tourists from across the continent in 2001 (index 0)
      // (sumOf replaces sumBy, which is deprecated since Kotlin 1.5)
      val tourists2001 = countriesData.sumOf { it.visitors[0] }
      // compute tourists from across the continent in 2015 (index 14)
      val tourists2015 = countriesData.sumOf { it.visitors[14] }
      // compute percentage growth (integer division truncates the fraction)
      val pctGrowth = (tourists2015 - tourists2001) * 100 / tourists2001
      // now we want to find out which country in that continent sent
      // the maximum number of tourists
      val maxCountry = countriesData
          // sort data based on the element at index 14, i.e. visitors for 2015
          .sortedByDescending { it.visitors[14] }
          // take the first country data and specifically its name attribute
          .first().name
      // construct a pair of continent to result
      continent to Result(tourists2015, pctGrowth, maxCountry)
    }
      // sort the continent result pairs based on the percentage growth
    .sortedByDescending { it.second.pctGrowth }
      // display the results
    .forEach { println(it) }
}
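To try the same group, aggregate, and sort steps without the CSV file, the pipeline can be run over a few in-memory rows. The sketch below is an illustration rather than the original program: the `summarise` function and the rows are hypothetical, each row has only two yearly columns (so `first()` and `last()` stand in for indices 0 and 14), and `sumOf` requires Kotlin 1.4 or later.

```kotlin
data class CountryData(val name: String, val visitors: List<Int>)
data class Result(val touristsLast: Int, val pctGrowth: Int, val maxCountry: String)

// Hypothetical helper: same steps as the listing above, but over in-memory rows.
fun summarise(rows: List<String>): List<Pair<String, Result>> =
    rows
        // row -> (continent, CountryData)
        .map { row ->
            val cells = row.split(",")
            cells[1] to CountryData(cells[0], cells.drop(2).map { it.toInt() })
        }
        // continent -> list of CountryData for that continent
        .groupBy({ it.first }, { it.second })
        .map { (continent, countries) ->
            val first = countries.sumOf { it.visitors.first() }
            val last = countries.sumOf { it.visitors.last() }
            // country with the most visitors in the last year
            val top = countries.sortedByDescending { it.visitors.last() }.first().name
            continent to Result(last, (last - first) * 100 / first, top)
        }
        // highest growth first
        .sortedByDescending { it.second.pctGrowth }

fun main() {
    // Made-up rows: country, continent, first-year visitors, last-year visitors
    val rows = listOf(
        "A,Asia,100,300",
        "B,Asia,100,200",
        "C,Europe,200,300"
    )
    summarise(rows).forEach { println(it) }
    // prints:
    // (Asia, Result(touristsLast=500, pctGrowth=150, maxCountry=A))
    // (Europe, Result(touristsLast=300, pctGrowth=50, maxCountry=C))
}
```

Keeping the pipeline in a function that takes a list of lines, rather than reading the file inline, also makes it straightforward to test against small inputs like these.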
