Take me to the python code

Question:

Are there typical ways countries trade with each other?

Findings:

There are fewer than 10 typical trading relationships.

Meaning:

Summarizing trading relationships into a handful of types makes the dynamics of trade and trade policy easier to understand.


The Overall Process


Final design used


1. The Data: Bilateral Product Level Trade Baskets by Year

I pull the HS_M_0010 data set from the World Trade Organization’s API. This is an annual, bilateral, product-level import data set. Products are classified into 96 categories under the Harmonized System. Values are in millions of nominal USD. The documentation does not state this explicitly, but I assume imports are valued at cost, insurance and freight (CIF) prices. “The Harmonized System is a standardized numerical method of classifying traded products. It is used by customs authorities around the world to identify products when assessing duties and taxes and for gathering statistics.”

I clean the data so each value is expressed as a share of the total value of the trade basket. This is important for two reasons:

  1. The analysis is agnostic to the size of the trade relationship; results reflect only the composition of trade baskets
  2. This normalizes the data to [0, 1], which is important for the models used


Toy Example of what the data looks like with 3 Products

| Importer | Exporter | Year | Product 1 | Product 2 | Product 3 |
| --- | --- | --- | --- | --- | --- |
| Country 1 | Country 2 | 2012 | 10% | 70% | 20% |
| Country 1 | Country 3 | 2012 | 60% | 20% | 20% |
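As a minimal sketch of the normalization step, the snippet below converts one trade basket from raw values into shares of the basket total. The function name `to_shares` and the raw values are hypothetical; they were chosen only so the result matches the first row of the toy example.

```python
def to_shares(basket):
    """Convert raw import values into shares of the basket total."""
    total = sum(basket.values())
    if total == 0:
        # A relationship with no trade has no composition; keep zeros.
        return {product: 0.0 for product in basket}
    return {product: value / total for product, value in basket.items()}


# Hypothetical raw values (millions of USD) for one importer-exporter-year.
raw = {"Product 1": 5.0, "Product 2": 35.0, "Product 3": 10.0}
shares = to_shares(raw)
print(shares)
```

Because every basket sums to 1 after this step, a small and a large trading relationship with the same mix of products look identical to the models.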


2. Dimension Reduction: Creating 2 numbers that behave like 96

In high-dimensional space, distance metrics stop behaving the way clustering methods need them to. Put more plainly, the longer two sequences of numbers are, the harder it is to judge how similar they are. This motivates dimension reduction that preserves information from the original data. Principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP) are widely used in dimension reduction applications.
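The distance problem can be illustrated with a small experiment on random data (not the trade data). The sketch below measures how much farther the farthest point is than the nearest point from a query point, relative to the nearest distance. The function name `relative_contrast`, the uniform random points, and the sample size are all assumptions made for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# Compare a 2-dimensional space with 8 and 96 dimensions.
contrast_by_dim = {dim: relative_contrast(dim) for dim in (2, 8, 96)}
for dim, contrast in contrast_by_dim.items():
    print(dim, round(contrast, 2))
```

As the number of dimensions grows, the contrast shrinks: nearest and farthest neighbors end up at nearly the same distance, which leaves a distance-based clustering method little to work with.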

PCA is linear and deterministic, which is very desirable for interpreting results. However, it fails to capture more complex non-linear aspects of the data. PCA is also less appropriate as a dimension reduction method on its own here, but it is often used as a preprocessing step before t-SNE or UMAP. t-SNE can capture non-linear relationships but does not preserve global structure and has many hyper-parameters. UMAP is a newer approach that is non-linear like t-SNE but does a better job of preserving global structure and is more stable.

| Method | Linear? | Deterministic? | Approach | Local / Global |
| --- | --- | --- | --- | --- |
| PCA | Linear | Yes | Matrix factorization | - |
| t-SNE | Non-linear | No | Graph based | Local |
| UMAP | Non-linear | No, but fairly stable | Graph based | Both |

There were no clear distinctions between trade baskets when applying t-SNE or UMAP directly to the original 96-dimensional data set. This motivated using an autoencoder before applying t-SNE or UMAP. Three autoencoders of increasing depth were tested. t-SNE applied to the deepest autoencoder’s output produced the most discernible clusters.

Deep autoencoder design:

96 -> 88 -> 80 -> 72 -> 64 -> 56 -> 48 -> 40 -> 32 -> 24 -> 16 -> 8 (and back to 96 symmetrically)
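The layer widths above follow a simple pattern: step down by 8 units per layer to the 8-dimensional bottleneck, then mirror back up. The sketch below only computes that list of widths; it does not build or train a network, and the function name `symmetric_layer_sizes` is hypothetical.

```python
def symmetric_layer_sizes(n_inputs=96, bottleneck=8, step=8):
    """Layer widths for an autoencoder that shrinks by `step` and mirrors back."""
    # Encoder: 96, 88, ..., 16, 8
    encoder = list(range(n_inputs, bottleneck - 1, -step))
    # Decoder: the encoder reversed, without repeating the bottleneck layer
    decoder = encoder[-2::-1]
    return encoder + decoder


sizes = symmetric_layer_sizes()
print(sizes)
```

The same list could be passed to whichever deep learning library is used to stack the dense layers; the encoder half is what produces the 8-dimensional representation.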

96 to 8 dimensions using trained autoencoder


8 to 2 dimensions using t-SNE; colors are DBSCAN clusters

t-SNE is only reliable locally: groups that are far apart are not necessarily more different than groups that are close together. Applying DBSCAN, a density-based clustering method, to t-SNE output therefore has important implications for interpreting the cluster results. Cluster membership is meaningful, but the distance between clusters is not.

3. Clustering: Identifying types of trading relationships

While it was worth exploring several methods for dimension reduction, I used only one method for clustering: density-based spatial clustering of applications with noise (DBSCAN). It has two hyper-parameters:

  1. eps: the radius within which two observations count as neighbors
  2. minimum points: the minimum number of observations needed to create a cluster
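To make the roles of eps and minimum points concrete, here is a compact, unoptimized sketch of the DBSCAN idea in plain Python. It is an illustration only, not the implementation used in this analysis; the function name `dbscan`, the `NOISE` constant, and the toy points are assumptions. As in common implementations, a point counts as its own neighbor.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point; NOISE (-1) marks unclustered points."""

    def neighbors(i):
        # All points within eps of point i, including point i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels


# Two tight squares of points and one isolated point.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
toy_labels = dbscan(toy_points, eps=1.5, min_pts=3)
print(toy_labels)
```

A larger eps merges nearby groups, while a larger minimum points value pushes sparse groups into the noise label.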

Nice aspects of DBSCAN:



A grid sweep, testing every combination of hyper-parameters, was used to look for the best DBSCAN model. Eps was swept from 2 to 9 and the minimum number of observations needed to create a cluster from 5 to 35. The results were evaluated with the silhouette score, penalized by the share of observations attributed to noise. The plot below shows the best-performing DBSCAN model, with eps of 8, minimum points of 30, and a noise penalty scalar of 2.
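The sweep itself can be sketched as a loop over every pair of hyper-parameters. In the sketch below the step sizes (1 for eps, 5 for minimum points) are assumptions, since only the ranges are stated, and `toy_score` is a hypothetical stand-in for fitting DBSCAN and computing the penalized silhouette score. Its peak is arbitrary and says nothing about the real results.

```python
from itertools import product


def grid_sweep(score_fn, eps_values, min_pts_values):
    """Score every (eps, min_pts) pair and return the best pair and its score."""
    best_params, best_score = None, float("-inf")
    for eps, min_pts in product(eps_values, min_pts_values):
        score = score_fn(eps, min_pts)
        if score > best_score:
            best_params, best_score = (eps, min_pts), score
    return best_params, best_score


eps_values = range(2, 10)         # 2..9; the step of 1 is an assumption
min_pts_values = range(5, 36, 5)  # 5..35; the step of 5 is an assumption


def toy_score(eps, min_pts):
    # Placeholder score with an arbitrary peak at (5, 20) so the example runs.
    return -((eps - 5) ** 2) - ((min_pts - 20) / 5) ** 2


best_params, best_score = grid_sweep(toy_score, eps_values, min_pts_values)
print(best_params, best_score)
```

In a real run, `score_fn` would fit the clustering model on the 2-dimensional embedding and return the penalized score for that pair.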

The silhouette score does not know the difference between a noise group and a regular group. In some instances, the highest-scoring hyper-parameter pairs used the noise label like a regular label. For example, group 0 below could be labeled the noise group and the silhouette score would still be high because the group is clearly defined. The penalty helps filter out models with this issue. To “guide” DBSCAN in other work, I’ve also played around with penalizing for too few or too many groups.


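\[ \begin{aligned} {\sf score} = {\sf silScore} \times (1 - {\sf percentNoise}) ^ {\sf noisePenalty} \end{aligned} \]

Here percentNoise is the share of observations labeled as noise, so a model with no noise keeps its full silhouette score.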

A simple demonstration of how the score is increasingly penalized as the noise penalty scalar increases. The selected dimension reduction output visibly does not have noise, so a high value of 2 is used. If visual inspection does show noise, a smaller value should be used so that models attributing observations to noise are not filtered out.
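The effect of the penalty scalar can be reproduced with a few lines of code. This sketch assumes the penalty multiplies the silhouette score by the share of observations not labeled noise, raised to the penalty scalar, which is what the description of the penalty implies. The function name `penalized_score` and the silhouette value of 0.8 are hypothetical.

```python
def penalized_score(sil_score, noise_fraction, noise_penalty):
    """Silhouette score shrunk by the share of observations labeled noise."""
    # noise_fraction is in [0, 1]; a penalty of 0 leaves the score unchanged.
    return sil_score * (1.0 - noise_fraction) ** noise_penalty


# Hypothetical silhouette score of 0.8 at three noise levels.
for penalty in (0, 1, 2):
    row = [round(penalized_score(0.8, noise, penalty), 3)
           for noise in (0.0, 0.1, 0.5)]
    print(penalty, row)
```

With a penalty of 2, a model that sends half of its observations to noise keeps only a quarter of its silhouette score, while a model with no noise is unaffected.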

4. Explaining Characteristics of the Clusters

+ products whose presence contributes to the cluster

- products whose absence contributes to the cluster
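One plausible way to arrive at "+" and "-" lists like the ones below is to compare each product's average share inside a cluster with its average share across all trade baskets. This is an assumption for illustration, not necessarily how the lists in this post were produced. The function name `product_contributions` and the third basket are hypothetical; the first two baskets come from the toy example.

```python
def product_contributions(baskets, labels, cluster):
    """Cluster mean share minus overall mean share, per product."""
    members = [b for b, label in zip(baskets, labels) if label == cluster]
    diffs = {}
    for product in baskets[0]:
        overall = sum(b[product] for b in baskets) / len(baskets)
        within = sum(b[product] for b in members) / len(members)
        diffs[product] = within - overall
    return diffs


baskets = [
    {"Product 1": 0.1, "Product 2": 0.7, "Product 3": 0.2},
    {"Product 1": 0.6, "Product 2": 0.2, "Product 3": 0.2},
    {"Product 1": 0.2, "Product 2": 0.6, "Product 3": 0.2},  # hypothetical
]
labels = [0, 1, 0]  # hypothetical cluster assignments
diffs = product_contributions(baskets, labels, cluster=0)
print(diffs)
```

Products with a clearly positive difference would be candidates for the "+" list, and products with a clearly negative difference for the "-" list.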


Cluster 0:

+ hydrocarbons, industrial machinery, vehicles, fish, clothing, precious metals, tea, medical equipment

- electrical machinery, rare-earth metals, iron and steel, industrial agriculture products

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Japan | 0.0194359 | Venezuela, Bolivarian Republic of | 0.0151695 | Sub-Saharan Africa | 0.2815833 | Sub-Saharan Africa | 0.2782650 | Upper middle income | 0.3057597 | High income | 0.3017303 |
| Korea, Republic of | 0.0175397 | Qatar | 0.0149324 | East Asia & Pacific | 0.2171131 | Latin America & Caribbean | 0.1938848 | High income | 0.2851387 | Upper middle income | 0.2623845 |
| Uganda | 0.0158805 | Japan | 0.0146954 | Latin America & Caribbean | 0.2107134 | Europe & Central Asia | 0.1789524 | Lower middle income | 0.2325196 | Lower middle income | 0.2365489 |
| Chinese Taipei | 0.0156435 | Nigeria | 0.0137473 | Europe & Central Asia | 0.1211187 | Middle East & North Africa | 0.1590424 | Low income | 0.1654420 | Low income | 0.1737379 |
| Thailand | 0.0156435 | Saudi Arabia, Kingdom of | 0.0135103 | Middle East & North Africa | 0.0836691 | East Asia & Pacific | 0.1147191 | 0 | 0.0111401 | 0 | 0.0255985 |
| Macao, China | 0.0156435 | Bangladesh | 0.0125622 | South Asia | 0.0471676 | South Asia | 0.0421901 | | NA | | NA |
| United States of America | 0.0156435 | Kuwait, the State of | 0.0123252 | North America | 0.0274947 | 0 | 0.0293909 | | NA | | NA |
| Switzerland | 0.0151695 | Libya | 0.0123252 | 0 | 0.0111401 | North America | 0.0035553 | | NA | | NA |
| Hong Kong, China | 0.0139844 | Bahamas | 0.0120882 | | NA | | NA | | NA | | NA |
| Burkina Faso | 0.0130363 | Bahrain, Kingdom of | 0.0120882 | | NA | | NA | | NA | | NA |


Cluster 1:

+ iron and steel, misc chemicals, tobacco, cocoa, food industry waste, toys

- industrial machinery, vehicles, hydrocarbons, electrical machinery, pharmaceutical products, clothes, wood, cereals, coffee

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Switzerland | 0.0200383 | World | 0.0199376 | Latin America & Caribbean | 0.2414661 | Europe & Central Asia | 0.3243379 | Upper middle income | 0.3289699 | High income | 0.4158695 |
| Hong Kong, China | 0.0193334 | China | 0.0197362 | East Asia & Pacific | 0.2107542 | East Asia & Pacific | 0.1627228 | High income | 0.3228275 | Upper middle income | 0.2728829 |
| India | 0.0171181 | Turkey | 0.0162119 | Sub-Saharan Africa | 0.2104521 | Latin America & Caribbean | 0.1519484 | Lower middle income | 0.2217299 | Lower middle income | 0.1829624 |
| South Africa | 0.0167153 | Spain | 0.0141980 | Europe & Central Asia | 0.1406706 | Sub-Saharan Africa | 0.1430873 | Low income | 0.1162018 | Low income | 0.0912295 |
| United States of America | 0.0167153 | Hong Kong, China | 0.0140973 | Middle East & North Africa | 0.0988823 | Middle East & North Africa | 0.1029101 | 0 | 0.0102709 | 0 | 0.0370557 |
| New Zealand | 0.0159098 | Chinese Taipei | 0.0138959 | South Asia | 0.0556842 | South Asia | 0.0481321 | | NA | | NA |
| Canada | 0.0151042 | France | 0.0134931 | North America | 0.0318196 | 0 | 0.0432988 | | NA | | NA |
| Japan | 0.0148021 | India | 0.0132917 | 0 | 0.0102709 | North America | 0.0235626 | | NA | | NA |
| Dominican Republic | 0.0147014 | Malaysia | 0.0131910 | | NA | | NA | | NA | | NA |
| Australia | 0.0143994 | Netherlands | 0.0127882 | | NA | | NA | | NA | | NA |


Cluster 2:

+ electrical machinery, hydrocarbons, plastics

- organic chemicals, fertilisers, food industry waste

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Colombia | 0.0163797 | Sweden | 0.0125997 | Latin America & Caribbean | 0.2444351 | Europe & Central Asia | 0.2763545 | Upper middle income | 0.3317934 | High income | 0.3443931 |
| Chinese Taipei | 0.0159597 | Sudan | 0.0121798 | Sub-Saharan Africa | 0.2410752 | Sub-Saharan Africa | 0.2141957 | High income | 0.2931541 | Upper middle income | 0.2658547 |
| Indonesia | 0.0159597 | Iran | 0.0121798 | East Asia & Pacific | 0.2116758 | Latin America & Caribbean | 0.1852163 | Lower middle income | 0.2364553 | Lower middle income | 0.2179756 |
| Hong Kong, China | 0.0155397 | Tunisia | 0.0117598 | Europe & Central Asia | 0.1230575 | East Asia & Pacific | 0.1180176 | Low income | 0.1301974 | Low income | 0.1474171 |
| Brazil | 0.0151197 | Ireland | 0.0117598 | Middle East & North Africa | 0.1012180 | Middle East & North Africa | 0.1146577 | 0 | 0.0083998 | 0 | 0.0243595 |
| Saudi Arabia, Kingdom of | 0.0146997 | Malta | 0.0113398 | South Asia | 0.0482990 | South Asia | 0.0482990 | | NA | | NA |
| Australia | 0.0146997 | Bangladesh | 0.0109198 | North America | 0.0218396 | 0 | 0.0340193 | | NA | | NA |
| Canada | 0.0146997 | Ethiopia | 0.0104998 | 0 | 0.0083998 | North America | 0.0092398 | | NA | | NA |
| Jordan | 0.0146997 | Romania | 0.0096598 | | NA | | NA | | NA | | NA |
| South Africa | 0.0142797 | Iceland | 0.0096598 | | NA | | NA | | NA | | NA |


Cluster 3:

+ fertilisers, minerals, beverages, iron and steel, precious metals

- industrial machinery, vehicles, clothes

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hong Kong, China | 0.0222116 | Germany | 0.0186291 | Latin America & Caribbean | 0.2409840 | Europe & Central Asia | 0.3365178 | Upper middle income | 0.3238596 | High income | 0.4160497 |
| Mexico | 0.0171961 | Sweden | 0.0183903 | Sub-Saharan Africa | 0.2366850 | Sub-Saharan Africa | 0.1731550 | High income | 0.3152615 | Upper middle income | 0.2486267 |
| United States of America | 0.0169572 | Hungary | 0.0138524 | East Asia & Pacific | 0.2101743 | Latin America & Caribbean | 0.1729162 | Lower middle income | 0.2171005 | Lower middle income | 0.2085025 |
| Chinese Taipei | 0.0160019 | Korea, Republic of | 0.0133747 | Europe & Central Asia | 0.1263434 | East Asia & Pacific | 0.1289706 | Low income | 0.1356580 | Low income | 0.1081920 |
| Philippines | 0.0155242 | Finland | 0.0128971 | Middle East & North Africa | 0.0941008 | Middle East & North Africa | 0.1038930 | 0 | 0.0081204 | 0 | 0.0186291 |
| South Africa | 0.0152854 | Ireland | 0.0121806 | South Asia | 0.0525436 | South Asia | 0.0437067 | | NA | | NA |
| Australia | 0.0152854 | Philippines | 0.0119417 | North America | 0.0310485 | 0 | 0.0267495 | | NA | | NA |
| Thailand | 0.0140912 | Russian Federation | 0.0112252 | 0 | 0.0081204 | North America | 0.0140912 | | NA | | NA |
| Canada | 0.0140912 | Czech Republic | 0.0102699 | | NA | | NA | | NA | | NA |
| Korea, Republic of | 0.0138524 | Latvia | 0.0102699 | | NA | | NA | | NA | | NA |


Cluster 4:

+ industrial machinery, vehicles, wood, coffee, clothes, precision machinery, cereals, foods

- iron and steel, fish, beverages

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Colombia | 0.0188857 | Japan | 0.0198300 | Sub-Saharan Africa | 0.3267233 | Sub-Saharan Africa | 0.3484419 | Upper middle income | 0.3186969 | Lower middle income | 0.2648725 |
| Egypt | 0.0179415 | Comoros | 0.0188857 | Latin America & Caribbean | 0.2582625 | Latin America & Caribbean | 0.1831917 | Lower middle income | 0.2596789 | Upper middle income | 0.2549575 |
| Mauritius | 0.0165250 | Burundi | 0.0188857 | East Asia & Pacific | 0.1628895 | Europe & Central Asia | 0.1647781 | High income | 0.2181303 | Low income | 0.2322946 |
| Dominican Republic | 0.0165250 | Gabon | 0.0179415 | Europe & Central Asia | 0.1090652 | East Asia & Pacific | 0.1326723 | Low income | 0.1940510 | High income | 0.2119924 |
| Mozambique | 0.0165250 | Myanmar | 0.0146364 | Middle East & North Africa | 0.0849858 | Middle East & North Africa | 0.0816808 | 0 | 0.0094429 | 0 | 0.0358829 |
| Mongolia | 0.0146364 | New Zealand | 0.0141643 | South Asia | 0.0354108 | South Asia | 0.0467422 | | NA | | NA |
| Peru | 0.0146364 | Madagascar | 0.0132200 | North America | 0.0132200 | 0 | 0.0406043 | | NA | | NA |
| Uganda | 0.0141643 | Central African Republic | 0.0132200 | 0 | 0.0094429 | North America | 0.0018886 | | NA | | NA |
| Tunisia | 0.0136922 | Zambia | 0.0127479 | | NA | | NA | | NA | | NA |
| New Zealand | 0.0136922 | Ethiopia | 0.0118036 | | NA | | NA | | NA | | NA |


Cluster 5:

+ electrical machinery, hydrocarbons, plastics, iron and steel

- stone, food oils, seeds, and fruits, fertilisers

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Argentina | 0.0170334 | Saint Kitts and Nevis | 0.0163236 | Sub-Saharan Africa | 0.2845990 | Sub-Saharan Africa | 0.2952449 | Upper middle income | 0.3151171 | High income | 0.2824698 |
| Burkina Faso | 0.0170334 | Malta | 0.0134847 | Latin America & Caribbean | 0.2285309 | Europe & Central Asia | 0.2065295 | Lower middle income | 0.2505323 | Upper middle income | 0.2576295 |
| Mali | 0.0163236 | Mozambique | 0.0134847 | East Asia & Pacific | 0.2079489 | Latin America & Caribbean | 0.1894961 | High income | 0.2441448 | Lower middle income | 0.2271114 |
| Chinese Taipei | 0.0163236 | Rwanda | 0.0134847 | Europe & Central Asia | 0.1149752 | Middle East & North Africa | 0.1178141 | Low income | 0.1781405 | Low income | 0.1937544 |
| Philippines | 0.0156139 | Sudan | 0.0134847 | Middle East & North Africa | 0.0787793 | East Asia & Pacific | 0.1135557 | 0 | 0.0120653 | 0 | 0.0390348 |
| Fiji | 0.0156139 | Sierra Leone | 0.0134847 | South Asia | 0.0553584 | 0 | 0.0432931 | | NA | | NA |
| Thailand | 0.0156139 | Andorra | 0.0120653 | North America | 0.0177431 | South Asia | 0.0283889 | | NA | | NA |
| Hong Kong, China | 0.0149042 | Mali | 0.0120653 | 0 | 0.0120653 | North America | 0.0056778 | | NA | | NA |
| Bolivia, Plurinational State of | 0.0149042 | Slovak Republic | 0.0113556 | | NA | | NA | | NA | | NA |
| Nicaragua | 0.0141945 | Guatemala | 0.0106458 | | NA | | NA | | NA | | NA |


Cluster 6:

+ industrial machinery, clothes, cereals, aluminum

-

| Top.Importers | Importers.Percent | Top.Exporters | Exporters.Percent | Top.Import.Regions | Import.Region.Percent | Top.Export.Regions | Export.Region.Percent | Top.Import.Incomes | Import.Income.Percent | Top.Export.Incomes | Export.Income.Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Uruguay | 0.0251142 | Lao People’s Democratic Republic | 0.0273973 | Latin America & Caribbean | 0.3219178 | Sub-Saharan Africa | 0.3333333 | Upper middle income | 0.3721461 | Lower middle income | 0.2968037 |
| Peru | 0.0251142 | Tajikistan | 0.0251142 | Sub-Saharan Africa | 0.2853881 | Latin America & Caribbean | 0.1780822 | Lower middle income | 0.2625571 | Upper middle income | 0.2397260 |
| Bolivia, Plurinational State of | 0.0251142 | Haiti | 0.0228311 | East Asia & Pacific | 0.1415525 | Europe & Central Asia | 0.1621005 | Low income | 0.1849315 | Low income | 0.2351598 |
| Burundi | 0.0228311 | Bahrain, Kingdom of | 0.0205479 | Europe & Central Asia | 0.1027397 | East Asia & Pacific | 0.1210046 | High income | 0.1666667 | High income | 0.1940639 |
| Costa Rica | 0.0228311 | Lesotho | 0.0205479 | Middle East & North Africa | 0.0821918 | Middle East & North Africa | 0.1187215 | 0 | 0.0136986 | 0 | 0.0342466 |
| Madagascar | 0.0228311 | Sao Tomé and Principe | 0.0182648 | South Asia | 0.0502283 | South Asia | 0.0525114 | | NA | | NA |
| Tunisia | 0.0205479 | Bhutan | 0.0182648 | 0 | 0.0136986 | 0 | 0.0342466 | | NA | | NA |
| Fiji | 0.0205479 | Maldives | 0.0159817 | North America | 0.0022831 | | NA | | NA | | NA |
| Burkina Faso | 0.0205479 | Angola | 0.0159817 | | NA | | NA | | NA | | NA |
| Indonesia | 0.0205479 | Saint Kitts and Nevis | 0.0159817 | | NA | | NA | | NA | | NA |

References:

Data References