An interesting contest caught my eye today. There’s a site Ruby Learning by Pune’s Satish Talim (Twitter : @IndianGuru) which organises regular Ruby Programming Challenge for Newbies and it introduced the 13th challenge earlier yesterday : RPCFN: Economics 101 (#13) by Dr. Bruce Scharlau.
While neither being a regular rubyist nor being a newbie, I thought it made for a decent exercise and a diversion for a little while, albeit in python. As an added interest I wrote the solutions twice. Once in a very procedural way and once leveraging the functional programming constructs.
Here’s the brief problem as stated in the challenge :
The file cia-1996.xml(links back to rublylearning.org) is the data from the CIA World Factbook of 1996 in XML format. It has details about 260 countries across five continents. Your challenge, should you choose to accept it, is to uncover the following details buried within this file:
What is the population of the country with the most people? Yes, we know it’s China, but just how many people lived there in 1996?
What are the five countries with the highest inflation rates, and what were those rates in 1996?
What are the six continents in the file and which countries belong to which continent? Can you also produce them in alphabetical order?
I used python 2.6 with the lxml xml parser for this exercises.
Parsing the xml
Since the source file is a zip file, one needs to open the zip, and extract the xml out of it. Since I am using python 2.6 and not python 2.7 I couldn’t use the with construct which would’ve not required the explicit zipfile close() statement.
importzipfilefromlxmlimportobjectifyfromoperatorimportitemgetterimportitertools# Unzip file, extract xml, convert to object zip=zipfile.ZipFile("../cia-1996.zip")xmlfile=zip.open("cia-1996.xml")root=objectify.fromstring("".join(lineforlineinxmlfile.readlines()))zip.close()
This opens the zip file, extracts the xml out of it, concatenates all the lines as a single string and converts the data into a single object referred to as root
Note: the itemgetter and itertools imports are used subsequently.
Procedural : Find the country with the highest population
# Country with highest populationmaxp=0maxc=''forcountryinroot.country:ifint(country.get('population',0))>=maxp:maxc,maxp=country.get('name'),int(country.get('population',0))print(maxc,maxp)# Output is : ('China', 1210004956)
Procedural : Top 5 countries with highest inflation rates
# Top 5 countries with highest inflation ratesinflation_country_tuples=forcountryinroot.country:inflation_country_tuples.append((float(country.get('inflation',0.0)),country.get('name')))inflation_country_tuples=sorted(inflation_country_tuples,key=itemgetter(0),reverse=True)printinflation_country_tuples[0:5]# Output : ((244.0, 'Belarus'), (94.0, 'Turkey'), (85.0, 'Azerbaijan'), (83.299999999999997, 'Malawi'), (71.299999999999997, 'Yemen'))
This extracts the inflation, name tuple from each country and creates a list out of it, sorts the list using inflation in a descending order and then prints the first five elements.
Procedural : Sorted continents, each associated with all their sorted countries
# Countries by continentcontinent_country_tuples=forcountryinroot.country:continent_country_tuples.append((country.get('continent',''),country.get('name')))continent_country_tuples=sorted(continent_country_tuples)current_continent=Nonecountries_of_continent=Nonecontinent_grouped_countries=forcontinent,countryincontinent_country_tuples:ifcontinent!=current_continent:ifcurrent_continent:continent_grouped_countries.append((current_continent,countries_of_continent))countries_of_continent=current_continent=continentcountries_of_continent.append(country)continent_grouped_countries.append((current_continent,countries_of_continent))printcontinent_grouped_countries# Output : too long to include here
Functional Programming Solutions
Incidentally the same problem can also be solved using very functional programming constructs as follows. This shows an interesting contrast of solutions in both the procedural and functional programming ways. The logic used across both the sets is virtually the same thought the constructs are different.
# Country with highest populationprintreduce(lambda(maxc,maxp),c:(c.get('name'),int(c.get('population',0)))ifint(c.get('population',0))>=maxpelse(maxc,maxp),root.country,('',0))# Top 5 countries with highest inflation ratesprinttuple(itertools.islice(sorted(((float(country.get('inflation',0.0)),country.get('name'))forcountryinroot.country),key=itemgetter(0),reverse=True),5))# Countries by continentprinttuple((continent,tuple(countryforcountryincountries))forcontinent,countriesinitertools.groupby(sorted((country.get('continent',''),country.get('name'))forcountryinroot.country),itemgetter(0)))
I further structured the functional programming approach code. This code has no list comprehensions (for loops) at all. The code is as follows :
# Find the country with the maximum populationprintreduce(lambdacountry_pop_max,country_pop_next:country_pop_nextifcountry_pop_next>country_pop_maxelsecountry_pop_max,map(lambdacountry:(country.get('name'),int(country.get('population',0))),root.country),('',0))printtuple(itertools.islice(sorted(map(lambdacountry:(country.get('name'),float(country.get('inflation',0.0))),root.country),key=itemgetter(1),reverse=True),5))printmap(lambda(continent,continent_country_tuples):(continent,map(lambda(continent,country):country,continent_country_tuples)),itertools.groupby(sorted(map(lambdacountry:(country.get('continent'),country.get('name')),root.country)),itemgetter(0)))
Since the above is likely to be a little too cryptic and confusing, here’s the detailed commented code (only comments and whitespace added)
printreduce(# Comparator to decide the country with the maximum populationlambdacountry_pop_max,country_pop_next:country_pop_nextifcountry_pop_next>country_pop_maxelsecountry_pop_max,# get a sequence of tuples of (country name, country population) map(lambdacountry:(country.get('name'),int(country.get('population',0))),root.country),# initial seed('',0))printtuple(# take the top 5 itemsitertools.islice(# sort the listsorted(# get a sequence of tuples of (country name, inflation) map(lambdacountry:(country.get('name'),float(country.get('inflation',0.0))),root.country),# sorting to be done using the second element of the tuplekey=itemgetter(1),# sort to be done in the descending orderreverse=True),# count of elements to be sliced5))printmap(# function to flatten the continent_country_tuple tuple into a country tuple lambda(continent,continent_country_tuples):# first element of the tuple is continent(continent,# as the second element, return only the country from the continent country tuple to form a tuple of countries map(lambda(continent,country):country,continent_country_tuples)),# group the continent country tuples by continentitertools.groupby(# sort the continent country tuplessorted(# extract a continent country tuple from the countrymap(lambdacountry:(country.get('continent'),country.get('name')),root.country)),# this is the function to extract the key to sort byitemgetter(0)))