Migrating to Parquet
I work at a data-as-a-service (DaaS) company that delivers PBs of geospatial data to customers across a variety of industries. We build and manage a central data lake, housing years of data, and operationalize that data to solve our customers’ problems. I recently gave a talk about the specifics of file formats at Spark+AI Summit 2020 that generated a lot of questions about my company’s migration from CSV to Apache Parquet. As CTO of a DaaS company, I saw firsthand how this migration had a drastic effect for all of our customers. This session will drill into the operational burden of transforming the storage format in an ecosystem and its impact on the business. ...