Let’s mangle the production script.

“When something fails, how do you rerun from the point of failure?”

Over time, you end up building a bunch of pieces that Airflow provides out of the box.

But what makes one come alive as a data engineer: is it fine-tuning logging and making sure that the basic overhead of your pipeline works, or is it getting trustworthy data to the people you’re working with?

Airflow solves those same problems, but in a publicly verifiable and trusted way: it provides a common interface by which data teams can get on the same page about overall data pipeline health. And that common interface is configured in code + version-controlled.

That pipeline above included a plethora of data transformation jobs, built in various ways. They were often written in naked Python scripts that only ran a SQL query + wrote data to BigQuery. These stored procedure-like SQL scripts required:

- Writing boilerplate DDL (CREATE TABLE etc. * 1000).
- Managing schema names between production and dev environments.
- Manually managing dependencies between scripts.

Again, this is pretty easy to set up, but it doesn’t get to the heart of the matter: getting trusted data to the people that you care about. It kind of works, but you need it to really work in a way that’s publicly observable + verifiable via testing. I have never encountered a client writing a script to auto-generate DDL, or writing boilerplate tests for SQL; no one wants these to be their job.
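To make that hand-rolled overhead concrete, here is a minimal sketch of the chores named for those SQL scripts: wrapping queries in boilerplate DDL, switching schema names between production and dev, ordering scripts by their dependencies, and working out what to rerun after a failure. It uses only the Python standard library and is not the Airflow or dbt API. Every script name, schema name, and helper function in it is a hypothetical stand-in, assumed purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical scripts mapped to the scripts each one depends on.
DEPENDENCIES = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}

# Hypothetical mapping from environment to target schema.
SCHEMAS = {"prod": "analytics", "dev": "analytics_dev"}


def run_order(dependencies):
    """Return script names ordered so each runs after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def qualified_name(table, env):
    """Prefix a table with the schema for the given environment."""
    return f"{SCHEMAS[env]}.{table}"


def create_table_ddl(table, env, select_sql):
    """Wrap a SELECT in the DDL that each script would otherwise repeat."""
    return f"CREATE OR REPLACE TABLE {qualified_name(table, env)} AS\n{select_sql}"


def rerun_plan(failed, dependencies):
    """Return the failed script plus everything downstream of it, in run order."""
    order = run_order(dependencies)
    affected = {failed}
    # Walking in topological order means upstream scripts are seen first,
    # so one pass is enough to collect all downstream scripts.
    for name in order:
        if dependencies.get(name, set()) & affected:
            affected.add(name)
    return [name for name in order if name in affected]


if __name__ == "__main__":
    print(run_order(DEPENDENCIES))
    print(rerun_plan("orders_enriched", DEPENDENCIES))
    print(create_table_ddl("daily_revenue", "dev", "SELECT 1 AS revenue"))
```

Even this small version leaves out logging, retries, alerting, and tests, which is where the homegrown approach tends to keep growing.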