A few words about the project
We’re looking for a Senior Python Developer (Warsaw or remote) with strong SQL and database knowledge and experience.
The project is open source and involves building data science tools (some of them are already in production, but much remains to be done). Our client, based near Los Angeles, California, USA, specializes in developing its own products: data-related tools that help organisations with data loss prevention, diffing, optimization, monitoring, testing, and migrations. Your tasks will centre on an open-source project that is live but still needs a lot of development. It is a command-line tool and Python library that efficiently diffs rows across two different databases (e.g. PostgreSQL -> Snowflake): it works on tables with tens of billions of rows, verifies 25M+ rows in under 10 seconds and 1B+ rows in about 5 minutes, and bridges column types of different formats and levels of precision (e.g. double ⇆ float ⇆ decimal).
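To give a flavour of how a diff over billions of rows can stay that fast, here is a minimal, hypothetical sketch of the checksum-and-divide idea in plain Python (this is an illustration of the general technique, not the project's actual implementation; all names are invented, and dicts stand in for database tables): hash each key range on both sides, and only split ranges whose hashes disagree, so matching segments are never compared row by row.

```python
import hashlib

def checksum(rows, lo, hi):
    """Hash all rows with key in [lo, hi) into a single digest."""
    h = hashlib.md5()
    for key in sorted(k for k in rows if lo <= k < hi):
        h.update(f"{key}:{rows[key]}".encode())
    return h.hexdigest()

def diff_keys(a, b, lo, hi, threshold=4):
    """Return keys in [lo, hi) whose row differs or exists on one side only."""
    if checksum(a, lo, hi) == checksum(b, lo, hi):
        return []                       # whole segment matches: skip it entirely
    if hi - lo <= threshold:            # small segment: compare row by row
        return [k for k in range(lo, hi) if a.get(k) != b.get(k)]
    mid = (lo + hi) // 2                # hashes disagree: split and recurse
    return diff_keys(a, b, lo, mid, threshold) + diff_keys(a, b, mid, hi, threshold)

src = {i: f"row-{i}" for i in range(1000)}
dst = dict(src)
dst[617] = "corrupted"                  # one modified row
del dst[42]                             # one missing row

print(diff_keys(src, dst, 0, 1000))     # → [42, 617]
```

In a real cross-database setting, the checksums would be computed by each database itself (one aggregate query per segment), which is why only the mismatching rows ever travel over the network.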
There is no overlap requirement: you can work Polish working hours (flexible), with no need to adjust to US working hours.
What does the recruitment process look like?
1. Technical interview, which is also a good moment for your initial questions about the project (1 hour).
2. Meeting with the Project Lead (1 hour).
3. Meeting with the CTO (1 hour).
All steps are planned online, of course :)
You will be responsible for...
- Answering issues and pull requests on GitHub, and questions on Slack
- Reaching out to existing/potential users to assist in adoption.
- Implementing new features, fixing bugs, suggesting improvements
- Writing more tests, anticipating more edge cases
- Improving the CI flow on GitHub to support testing against more databases
- Assisting in the development of new modules (e.g. same-db data-diff)
- Improving documentation and writing tutorials
Common use cases of the tool:
- Verifying that all data was copied during a critical data migration, for example from Heroku PostgreSQL to Amazon RDS.
- Verifying data pipelines: moving data from a relational database to a warehouse/data lake with Fivetran, Airbyte, Debezium, or another pipeline tool.
- Alerting and maintaining data-integrity SLOs: you can create and monitor an SLO of, say, 99.999% data integrity, and alert your team when data goes missing.
- Debugging complex data pipelines: when data gets lost in a pipeline that spans half a dozen systems, it is extremely difficult to track down where a row went missing without verifying each intermediate datastore.
- Detecting hard deletes in an updated_at-based pipeline: if you copy data to your warehouse based on an updated_at-style column, you will miss hard deletes, which data-diff can find for you.
- Making replication self-healing: you can use data-diff's output to write or update the missing rows in the target database.
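The last two use cases can be sketched in a few lines of plain Python. This is a toy, in-memory illustration only, not data-diff's API: the dicts stand in for real database tables, and the function names and the '+'/'-' sign convention are invented for this sketch. The idea is simply that a diff feed of missing, stale, and hard-deleted rows is enough to drive the target back into sync with the source.

```python
def diff(source, target):
    """Yield ('+', key) for rows missing or stale in target, ('-', key) for hard-deleted rows."""
    for key, value in source.items():
        if target.get(key) != value:
            yield ('+', key)            # missing or out of date in the target
    for key in target:
        if key not in source:
            yield ('-', key)            # hard-deleted in the source, still in the target

def heal(source, target):
    """Apply the diff to the target so it converges to the source (self-healing replication)."""
    for sign, key in list(diff(source, target)):
        if sign == '+':
            target[key] = source[key]   # upsert the missing/stale row
        else:
            del target[key]             # propagate the hard delete

source = {1: "a", 2: "b", 3: "c"}
target = {1: "a", 2: "stale", 4: "ghost"}   # 3 missing, 2 stale, 4 hard-deleted

heal(source, target)
print(target == source)  # → True
```

In practice the diff feed would come from the data-diff tool itself rather than a hand-written function, and the "heal" step would issue upserts and deletes against the target database.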
What’s important for us?
- Senior level and at least 7 years of commercial experience.
- Strong knowledge of and experience with Python programming (ideally for data solutions, but that is not a must-have).
- Strong knowledge of and experience with SQL and databases (a must-have).
- Readiness to work independently and take ownership of assigned tasks.
- Experience in writing technical documentation.
- Fluent English (C1).
- Ideally, you have a one-month notice period or are available ASAP, but we can also wait longer for you.
- Nice to have: experience in working directly with customers and experience in open-source projects.
What do we offer?
- Flexible working hours and remote work possibility
- Multisport card
- Private medical care
- In-house workshops and tech talks
- Free access to the best tools and software to develop your skills and work effectively
- Comfortable office in central Warsaw, equipped with all the tools you need (MacBook Pro, external screen, ergonomic chairs), if working on site
Does this sound like the perfect place for you? Don’t hesitate to click Apply and submit your application today!
Check out our workplace: Facebook / Instagram