News

In this paper, we discuss the deficiencies of existing datasets and present a new one. The new dataset, which is publicly available to the research community, is composed of 1608 30-second music clips ...
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. We are happy to receive feedback and contributions. Deequ depends on ...