V03. The Tsar's Trans-Atlantic Voyagers
What it is
This is an open-access dataset intended for research and analysis. Its source is a dataset provided by the U.S. National Archives consisting of about half a million passenger arrival records and ship manifests across six decades, from 1834 to 1897. Our edition consists of 11 related tables (CSV), 6 spatial data files (shapefiles and GeoJSON), and 2 metadata files (ReadMe; codebook). Together the files describe 527,394 passengers, 10,761 voyages, 781 ships, 681 occupations, 182 last known residences, 150 routes, and 78 ports.
Why it matters
The original dataset contains passenger records with name, age, town of last residence, destination, and codes for sex, occupation, literacy, country of origin, transit and/or travel compartment. It also contains manifest records (think of them as voyage records), which include ship name, arrival date, and arrival port.
As unique and vast as it is, the original data is not easy to use. The records are full of inconsistencies and ambiguities, and are not easily ingested by visualization and GIS tools. They are, like most historical sources, a wonderful example of messy data. Our work to generate an enhanced and usable edition fell into four buckets:
- We cleaned and tidied, documenting our process every step of the way.
- We reorganized the data according to a relational model.
- We added calculated fields as well as context fields drawn from other sources.
- We created vector data (port locations, route lines, and locations of last known residences) to facilitate spatial analysis.
Fear not: inconsistencies and ambiguities still turn up everywhere, but we are relatively confident that with this new edition you will find them productive and stimulating rather than frustrating.
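To give a feel for what the relational model makes possible, here is a minimal sketch of joining passenger records to voyage records using only Python's standard library. The two tiny tables, their column names (passenger_id, voyage_id, arrival_port_id, and so on), and the sample values are all hypothetical stand-ins invented for illustration; the actual file and field names are those given in the codebook and may well differ.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-ins for two of the edition's tables. The column
# names and values are invented for illustration; consult the codebook
# for the real ones.
PASSENGERS_CSV = """passenger_id,voyage_id,age,occupation_id
1,10,34,7
2,10,29,7
3,11,41,3
"""

VOYAGES_CSV = """voyage_id,ship_id,arrival_port_id,arrival_date
10,500,1,1892-05-14
11,501,2,1893-08-02
"""


def read_table(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_passengers_to_voyages(passengers, voyages):
    """Attach each passenger's voyage fields via the shared voyage_id key."""
    voyages_by_id = {v["voyage_id"]: v for v in voyages}
    joined = []
    for p in passengers:
        voyage = voyages_by_id.get(p["voyage_id"])
        if voyage is None:
            # In this sketch, passengers without a matching voyage are skipped.
            continue
        joined.append({**p, **voyage})
    return joined


passengers = read_table(PASSENGERS_CSV)
voyages = read_table(VOYAGES_CSV)
rows = join_passengers_to_voyages(passengers, voyages)

# Count passengers per (hypothetical) arrival port identifier.
passengers_per_port = Counter(row["arrival_port_id"] for row in rows)
print(dict(passengers_per_port))
```

With the real files, the same pattern would apply by opening each CSV from disk instead of from an in-memory string. Note that csv.DictReader returns every value as text, so ages and identifiers need explicit conversion before any numeric work.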