Designing Data Pipelines — with Interactivity

The data pipeline has become a fundamental component of the data science, data analysis, and data engineering workflow. Pipelines serve as the glue that links together various components of the data cleansing, data validation, and data transformation process. However, despite its importance to the data ecosystem, constructing the optimal data pipeline is generally an afterthought - if it’s considered at all. This makes any changes to the central pipeline highly error-prone and cumbersome. With the ever-growing demand for new kinds of data, especially from external vendors, constructing pipelines that are scalable and that allow for monitoring is pivotal for the safe and continued use of data. ...

June 17, 2022 · 1 min · 167 words · Vinoo Ganesh

Apache Parquet Website

The Parquet website was a bit dated, especially given its heavy usage. I took the opportunity to rebuild the website using Hugo. Check it out and please let me know if you have any feedback! Code https://github.com/apache/parquet-site/commit/3563721676b364b767058a953f2bcc3e2c0c4b09 Link http://parquet.apache.org

March 25, 2022 · 1 min · 40 words · Vinoo Ganesh

Ask a CISO: S3 Bucket Permissions and IAM Audits

Data is the most valuable resource in the world and more prized than oil, The Economist declared in 2017. Today, at least 97% of organizations use data to power their business opportunities, and we are accumulating data at a rate never before seen in history. The big question then is: how do we secure all this data and ensure that we can make optimal use of it? Link https://www.horangi.com/blog/s3-buckets-permissions-and-iam-audits

March 16, 2022 · 1 min · 68 words · Vinoo Ganesh

Designing Data Pipelines — with Interactivity

The data pipeline has become a fundamental component of the data science, data analysis, and data engineering workflow. Pipelines serve as the glue that links together various components of the data cleansing, data validation, and data transformation process. However, despite its importance to the data ecosystem, constructing the optimal data pipeline is generally an afterthought - if it’s considered at all. This makes any changes to the central pipeline highly error-prone and cumbersome. With the ever-growing demand for new kinds of data, especially from external vendors, constructing pipelines that are scalable and that allow for monitoring is pivotal for the safe and continued use of data. ...

March 10, 2022 · 1 min · 167 words · Vinoo Ganesh

O'Reilly Radar: Data & AI

O’Reilly Radar: Data & AI will showcase what’s new, what’s important, and what’s coming in the field. It includes two keynotes and two concurrent three-hour tracks—designed to lay out for tech leaders the issues, tools, and best practices that are critical to an organization at any step of their data and AI journey. You’ll explore everything from prototyping and pipelines to deployment and DevOps to responsible and ethical AI. Link https://www.oreilly.com/videos/oreilly-radar-data/0636920654667/ https://www.businesswire.com/news/home/20210909005792/en/O%E2%80%99Reilly-Announces-O%E2%80%99Reilly-Radar-Data-AI-to-Help-Tech-Leaders-Drive-Innovation-and-Successful-Implementation

October 14, 2021 · 1 min · 72 words · Vinoo Ganesh