I was watching a talk given by Ryan Peterson, CEO of Flexport where he mentioned something interesting...
In his experience freight forwarding, the sector he disrupted, was operated in a very oldschool manner.
As a disruptor he was asked: Where to look to disrupt?
His reponse, paraphrased: ~"Look at components of GDP and find where programming has been applied the least"
Estimating programming applied to sectors could maybe be done with scraping tech job postings by sector,
or something else clever... but to be quick about this let's use a productivity proxy such as output per employee
Cross country data is too high level... OECD isn't very interesting
The Bureau of Labor Statistics does has some granular enough data to spend some time.
https://www.bls.gov/productivity/
https://www.bls.gov/productivity/tables/labor-productivity-detailed-industries.xlsx
# data manipulation
from matplotlib.pyplot import hist, legend
import pandas as pd
# utility functions written by me for this
from utils.productivity_mpl import *
raw_data= pd.read_excel(r'C:\Users\yosty\Downloads\labor-productivity-detailed-industries.xlsx', sheet_name='MachineReadable')
# from
YEAR_1= 2010
# to
YEAR_2= 2021
# number of industries to show
N= 10
# level of detail for industry classification, appendix for more info
digit='4-Digit'
# calculate change in output per worker
output_per_worker_change_data= output_per_worker_change(raw_data, YEAR_1, YEAR_2)
output_per_worker_change_data.nlargest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Wireless Telecommunications Carriers (except Satellite) | 75.874 | 307.224 | 3.049134 |
Travel arrangement and reservation services | 102.119 | 285.299 | 1.793790 |
Oil and gas extraction | 99.166 | 266.247 | 1.684862 |
Cable and other subscription programming | 84.782 | 188.119 | 1.218855 |
Mining | 106.253 | 218.056 | 1.052234 |
Wired and Wireless Telecommunications Carriers | 88.241 | 170.836 | 0.936016 |
Radio and television broadcasting | 91.723 | 171.120 | 0.865617 |
Broadcasting, except Internet | 90.252 | 161.896 | 0.793822 |
Publishing | 95.656 | 170.262 | 0.779941 |
Drycleaning and laundry services | 91.846 | 145.017 | 0.578915 |
output_worker_gains_losses(raw_data, YEAR_1, YEAR_2, N, large=True)
Are these sectors increasing output faster than the number of new employees?
Or are they reducing headcount faster than output decreases?
measure= 'Employment'
unit= 'Thousands of jobs'
measure_change_chart(raw_data, output_per_worker_change_data.nlargest(columns='change', n=N),
measure, unit)
measure= 'Sectoral output'
unit= "Millions of current dollars"
measure_change_chart(raw_data, output_per_worker_change_data.nlargest(columns='change', n=N),
measure, unit)
output_per_worker_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Beverages and tobacco products | 102.330 | 56.961 | -0.443360 |
Beverages | 99.619 | 56.428 | -0.433562 |
Couriers and messengers | 108.090 | 63.774 | -0.409992 |
Manufacturing and reproducing magnetic and optical media | 97.075 | 57.745 | -0.405151 |
Computer and peripheral equipment | 120.033 | 82.479 | -0.312864 |
Pharmaceuticals and medicine | 107.192 | 76.994 | -0.281719 |
Warehousing and storage | 89.955 | 65.147 | -0.275782 |
Railroad rolling stock manufacturing | 83.685 | 63.167 | -0.245181 |
Other transportation equipment | 121.055 | 92.918 | -0.232432 |
Other leather and allied product manufacturing | 129.903 | 101.339 | -0.219887 |
output_worker_gains_losses(raw_data, YEAR_1, YEAR_2, N, large=False)
Are these sectors increasing output faster than the number of new employees?
Or are they reducing headcount faster than output decreases?
measure= 'Employment'
unit= 'Thousands of jobs'
measure_change_chart(raw_data, output_per_worker_change_data.nsmallest(columns='change', n=N),
measure, unit)
measure= 'Sectoral output'
unit= "Millions of current dollars"
measure_change_chart(raw_data, output_per_worker_change_data.nsmallest(columns='change', n=N),
measure, unit)
which industries had the largest changes in employement in absolute terms
measure= 'Employment'
unit= 'Thousands of jobs'
measure_change_data= measure_change(raw_data,
YEAR_2, YEAR_1,
measure, unit, True)
measure_change_data.nlargest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Restaurants and other eating places | 8568.8 | 9880.4 | 1311.6 |
Food services and drinking places | 9593.4 | 10847.7 | 1254.3 |
Limited-service eating places | 4050.5 | 5255.1 | 1204.6 |
Warehousing and storage | 647.0 | 1653.3 | 1006.3 |
General warehousing and storage | 543.9 | 1524.3 | 980.4 |
Accommodation and food services | 11382.6 | 12342.7 | 960.1 |
Retail trade | 15367.9 | 16208.4 | 840.5 |
Couriers and messengers | 562.4 | 1228.3 | 665.9 |
General Merchandise Stores, including Warehouse Clubs and Supercenters | 1524.8 | 2162.8 | 638.0 |
General merchandise stores, including warehouse clubs and supercenters | 1524.8 | 2162.8 | 638.0 |
measure_change_chart(raw_data, measure_change_data.nlargest(columns='change', n=N),
measure, unit)
measure_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Department stores | 1485.2 | 937.0 | -548.2 |
Clothing and clothing accessories stores | 1461.1 | 1135.9 | -325.2 |
Traveler accommodation | 1710.2 | 1409.5 | -300.7 |
Accommodation | 1789.2 | 1495.0 | -294.2 |
Clothing stores | 1093.8 | 805.2 | -288.6 |
Newspaper, periodical, book, and directory publishers | 527.9 | 271.6 | -256.3 |
Hotels (except casino hotels) and motels | 1419.7 | 1180.6 | -239.1 |
Wired and Wireless Telecommunications Carriers | 785.8 | 587.3 | -198.5 |
Newspaper publishers | 266.7 | 113.2 | -153.5 |
Mining | 678.5 | 532.7 | -145.8 |
measure_change_chart(raw_data, measure_change_data.nsmallest(columns='change', n=N),
measure, unit)
which industries had the largest changes in employement in percent
measure_change_data= measure_change(raw_data,
YEAR_2, YEAR_1,
measure, unit, False)
measure_change_data.nlargest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
General warehousing and storage | 543.9 | 1524.3 | 1.802537 |
Warehousing and storage | 647.0 | 1653.3 | 1.555332 |
Couriers and messengers | 562.4 | 1228.3 | 1.184033 |
Software publishers | 260.9 | 543.4 | 1.082790 |
Electronic shopping and mail-order houses | 311.1 | 588.7 | 0.892318 |
Beverages | 170.6 | 296.3 | 0.736811 |
Truck, utility trailer, and RV (recreational vehicle) rental and leasing | 54.6 | 93.0 | 0.703297 |
Beverages and tobacco products | 187.1 | 306.9 | 0.640299 |
Motor vehicles | 153.2 | 250.4 | 0.634465 |
Motor vehicle bodies and trailers | 109.0 | 164.5 | 0.509174 |
measure_change_chart(raw_data, measure_change_data.nlargest(columns='change', n=N),
measure, unit)
measure_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Video Tape and Disc Rental | 67.1 | 3.9 | -0.941878 |
Apparel knitting mills | 22.6 | 7.9 | -0.650442 |
Newspaper publishers | 266.7 | 113.2 | -0.575553 |
Photofinishing | 15.0 | 6.7 | -0.553333 |
Coal mining | 81.0 | 36.6 | -0.548148 |
Manufacturing and reproducing magnetic and optical media | 25.4 | 12.6 | -0.503937 |
Newspaper, periodical, book, and directory publishers | 527.9 | 271.6 | -0.485509 |
Book stores and news dealers | 130.9 | 68.4 | -0.477464 |
Motion picture and video exhibition | 131.1 | 68.7 | -0.475973 |
Office supplies and stationery stores | 149.3 | 79.7 | -0.466175 |
measure_change_chart(raw_data, measure_change_data.nsmallest(columns='change', n=N),
measure, unit)
which industries had the largest changes in output in absolute terms
measure= 'Sectoral output'
unit= "Millions of current dollars"
measure_change_data= measure_change(raw_data,
YEAR_2, YEAR_1,
measure, unit, True)
measure_change_data.nlargest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Food services and drinking places | 466920.000 | 876336.067 | 409416.067 |
Restaurants and other eating places | 401956.000 | 773828.513 | 371872.513 |
Chemicals | 365892.922 | 685276.901 | 319383.979 |
Software publishers | 151541.523 | 429423.492 | 277881.969 |
Publishing | 262521.031 | 516115.126 | 253594.095 |
Truck transportation | 240447.891 | 469437.625 | 228989.734 |
Transportation equipment | 546846.339 | 745861.940 | 199015.601 |
Limited-service eating places | 203257.000 | 391431.605 | 188174.605 |
Food manufacturing | 543506.238 | 730972.943 | 187466.705 |
Full-service restaurants | 198699.000 | 382523.324 | 183824.324 |
measure_change_chart(raw_data, measure_change_data.nlargest(columns='change', n=N),
measure, unit)
measure_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Petroleum and coal products | 584084.315 | 556703.174 | -27381.141 |
Newspaper, periodical, book, and directory publishers | 110979.508 | 86732.842 | -24246.666 |
Semiconductors and other electronic components | 105315.641 | 83485.127 | -21830.514 |
Coal mining | 38640.254 | 17319.542 | -21320.712 |
Computer and electronic products | 294600.003 | 273805.047 | -20794.956 |
Computer and peripheral equipment | 33536.909 | 17568.160 | -15968.749 |
Newspaper publishers | 33715.115 | 23352.400 | -10362.715 |
Periodical publishers | 32107.520 | 24607.359 | -7500.161 |
Communications equipment | 39373.848 | 32430.206 | -6943.642 |
Air transportation | 122073.293 | 117844.801 | -4228.492 |
measure_change_chart(raw_data, measure_change_data.nsmallest(columns='change', n=N),
measure, unit)
which industries had the largest changes in output in percent
measure_change_data= measure_change(raw_data,
YEAR_2, YEAR_1,
measure, unit, False)
measure_change_data.nlargest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Veneer, plywood, and engineered wood product manufacturing | 12661.091 | 35945.155 | 1.839025 |
Software publishers | 151541.523 | 429423.492 | 1.833702 |
Motor vehicle bodies and trailers | 22719.064 | 57416.890 | 1.527256 |
General freight trucking, local | 32045.995 | 76699.262 | 1.393412 |
Wood products | 60976.210 | 137307.661 | 1.251823 |
Sawmills and wood preservation | 20841.630 | 45425.547 | 1.179558 |
Travel arrangement and reservation services | 32986.345 | 71447.718 | 1.165979 |
Couriers and messengers | 67626.855 | 146259.736 | 1.162746 |
Truck, utility trailer, and RV (recreational vehicle) rental and leasing | 14265.457 | 30605.804 | 1.145449 |
Other wood products | 31076.610 | 65392.300 | 1.104229 |
measure_change_chart(raw_data, measure_change_data.nlargest(columns='change', n=N),
measure, unit)
Large growth in electronic shopping and nonstore retailers
increasing the employement and output of warehousing and courier services
measure_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Manufacturing and reproducing magnetic and optical media | 3183.136 | 1005.882 | -0.683997 |
Coal mining | 38640.254 | 17319.542 | -0.551775 |
Apparel knitting mills | 1685.543 | 832.122 | -0.506318 |
Computer and peripheral equipment | 33536.909 | 17568.160 | -0.476154 |
Newspaper publishers | 33715.115 | 23352.400 | -0.307361 |
Leather and hide tanning and finishing | 1524.395 | 1145.729 | -0.248404 |
Periodical publishers | 32107.520 | 24607.359 | -0.233595 |
Newspaper, periodical, book, and directory publishers | 110979.508 | 86732.842 | -0.218479 |
Semiconductors and other electronic components | 105315.641 | 83485.127 | -0.207287 |
Railroad rolling stock manufacturing | 9896.094 | 8091.918 | -0.182312 |
measure_change_chart(raw_data, measure_change_data.nsmallest(columns='change', n=N),
measure, unit)
measure_change_data.nsmallest(columns='change', n=N)
Year | 2010 | 2021 | change |
---|---|---|---|
Industry | |||
Manufacturing and reproducing magnetic and optical media | 3183.136 | 1005.882 | -0.683997 |
Coal mining | 38640.254 | 17319.542 | -0.551775 |
Apparel knitting mills | 1685.543 | 832.122 | -0.506318 |
Computer and peripheral equipment | 33536.909 | 17568.160 | -0.476154 |
Newspaper publishers | 33715.115 | 23352.400 | -0.307361 |
Leather and hide tanning and finishing | 1524.395 | 1145.729 | -0.248404 |
Periodical publishers | 32107.520 | 24607.359 | -0.233595 |
Newspaper, periodical, book, and directory publishers | 110979.508 | 86732.842 | -0.218479 |
Semiconductors and other electronic components | 105315.641 | 83485.127 | -0.207287 |
Railroad rolling stock manufacturing | 9896.094 | 8091.918 | -0.182312 |
It looks like the data is in a long format... super nice thank you BLS!
The data has several dimensions
Industry, Sector, Sector Detail (NAICS Digit), Measure, Units, Year
print(f'The data has {raw_data.shape[0]} rows and {raw_data.shape[1]} columns')
raw_data.head()
The data has 521440 rows and 9 columns
Sector | NAICS | Industry | Digit | Basis | Measure | Units | Year | Value | |
---|---|---|---|---|---|---|---|---|---|
0 | Mining | 21 | Mining | 2-Digit | All workers | Labor productivity | % Change from previous year | 1988 | 2.3 |
1 | Mining | 21 | Mining | 2-Digit | All workers | Labor productivity | % Change from previous year | 1989 | -0.6 |
2 | Mining | 21 | Mining | 2-Digit | All workers | Labor productivity | % Change from previous year | 1990 | -2.3 |
3 | Mining | 21 | Mining | 2-Digit | All workers | Labor productivity | % Change from previous year | 1991 | 1.6 |
4 | Mining | 21 | Mining | 2-Digit | All workers | Labor productivity | % Change from previous year | 1992 | 10.3 |
I see a few productivity related measures that can be used
raw_data['Measure'].unique()
array(['Labor productivity', 'Real sectoral output', 'Hours worked', 'Output per worker', 'Unit labor costs', 'Sectoral output', 'Sectoral output price deflator', 'Hourly compensation', 'Labor compensation', 'Employment', 'Total factor productivity', 'Combined inputs', 'Combined inputs costs', 'Combined inputs price deflator', 'Capital input', 'Capital costs', 'Capital share', 'Capital productivity', 'Capital intensity', 'Labor share', 'Intermediate inputs', 'Intermediate inputs costs', 'Intermediate inputs share', 'Intermediate inputs productivity', 'Intermediate inputs intensity', 'Contribution of capital intensity to labor productivity', 'Contribution of intermediate inputs intensity to labor productivity'], dtype=object)
count= (raw_data[['Industry', 'Sector']]
.drop_duplicates()
.groupby(['Sector'])
.size()
.sort_values(ascending=False)
)
count.plot(kind='barh', title='Number of Industries per Sector')
<AxesSubplot:title={'center':'Number of Industries per Sector'}, ylabel='Sector'>
count= pd.DataFrame(raw_data[['Digit', 'Industry', 'Sector']]
.drop_duplicates()
.groupby(['Digit', 'Sector'])
.size()
.unstack()
).plot(kind='bar', stacked=True)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
<matplotlib.legend.Legend at 0x18a2054e730>
The charts get quite messy with too much detail...
Though as we gain more detail the top indutry changes which is worth noting
In the "Custom" section is hospitals... which pay a lot in the US!
digit_list= ['2-Digit', '3-Digit', '4-Digit', '5-Digit', '6-Digit', 'Custom']
for digit in digit_list:
total_comp_chart(raw_data, digit)
Charts make these html files get too big to fit on a single page
Making this into a dashboard would be better to get more info into a single html page
right now only one sector fits
for digit in digit_list:
output_per_worker_chart(raw_data, digit, 'Retail Trade')
No handles with labels found to put in legend.