Machine learning happens in Python, and that shapes how data platforms are built. ML needs a lot of data, and that data must be easily available in Python. The result is an ecosystem of libraries that can handle quite heavy loads in-process, without a separate backend. Over time it has become fairly complete, so we sometimes call it the Composable Data Stack. I have spent the last two years building an OSS library (dlt) that stitches these pieces together and is used by thousands of engineers to build data platforms. I’ll show you how to create a production-grade data lake with a script that fits on two slides. First we’ll take a look at the ecosystem itself: columnar data and storage (Arrow and Parquet), table formats (Delta, Iceberg), query engines (DuckDB, DataFusion, Ibis) and transformation engines (SQLMesh). Then we’ll briefly look at the data lake example, its production deployment at PostHog, and our headway in filling the missing components of the composable data stack.
This session is about the new major release of Airflow, which we plan to ship early in 2025. It is the first major release since 2021, when we released Airflow 2, and it is the result of 4 years of improvements delivered as minor releases, as well as a lot of listening to our users and to a changing industry. Airflow remains the most important and strongest ETL/data orchestrator in use. At the same time, LLM/GenAI workloads are becoming a mainstream part of data orchestration, and a wealth of workflows and tooling now specialises in them. Airflow 3 aims to become the only truly open-source, open-governance, enterprise-grade orchestration solution for all your batch processing workflow needs. This talk will cover the basic principles and plans that will make Airflow even better suited to most of your data pipeline needs.
Marcin started coding at the age of 8 and still loves it to this day. In 2010, he co-founded Xyo (an ML-heavy search engine) and later Priori Data (app store analytics). After that, he served as Head of Technology at Digital Turbine before co-founding Neufund, a top 10 Ethereum project. Now he's CTO at dltHub, where he writes a lot of OSS code that moves data in Python.
Independent Open-Source Contributor and Advisor, Freelance. Jarek is an engineer with broad experience in many areas: Open-Source, Cloud, Mobile, Robotics, AI, Backend, and Developer Experience. He also has a lot of non-engineering experience under his belt: running a company, being a CTO, organizing big international community events, technical sales support, PR and marketing advisory, the legal aspects of licensing, and building open-source communities. With experience in very small and very big companies and everything in between, Jarek found his place in the Open-Source world, where his individual-contributor drive can be used to its full potential.
Sunscrapers is a technology consultancy where the most driven and experienced engineers come together to solve meaningful challenges using software, data, and AI. We’re a team that values excellence, ambition, and trust, combining deep industry expertise with a passion for engineering to create high-impact software for finance and healthcare. At Sunscrapers, you’ll be part of a close-knit team of 40, where your work is valued and your growth is supported. We actively contribute to the engineering community through events, open-source projects, and our in-house R&D lab, giving you the chance to explore the latest technologies. We’re proud to maintain an eNPS above 70 and a 5/5 rating on Glassdoor, reflecting our commitment to creating an environment where everyone thrives. Join us, and be part of a culture that’s as focused on learning and growth as it is on delivering world-class solutions.