Matching Electronic Health Records from Different Sources
An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins whose features are unmapped and only partially overlap is a prerequisite for developing and testing robust, generalizable algorithms, especially in health care. The correspondence between features is usually established from metadata, which may be unavailable or ambiguous in a large database. We design and evaluate methods for mapping features between databases without relying on metadata.