alnoda-workspaces/workspaces/notebook-old-workspace/README.md

80 lines
2.9 KiB
Markdown
Raw Normal View History

2022-05-30 19:24:06 +12:00
# Data Workstation
```sh
docker build -t data-workstation-base:3.8 --build-arg docker_registry=rg.fr-par.scw.cloud/dgym .
docker run -p 3000:3000 -p 8001:8000 -p 3012:3012 -p 8092:8092 -p 8448:8448 -p 9992:9992 -p 8085:8085 -p 8086:8086 -p 8082:8082 -p 8084:8084 data-workstation-base:3.8
docker run -p 3000:3000 -p 8001:8000 -p 3012:3012 -p 8092:8092 -p 8448:8448 -p 9992:9992 -p 8085:8085 -p 8086:8086 -p 8082:8082 -p 8084:8084 rg.fr-par.scw.cloud/dgym/python-workstation:3.8
```
## Luigi
Useful links:
- [Luigi Github Repo](https://github.com/spotify/luigi)
- [A Tutorial on Luigi, the Spotifys Pipeline](https://towardsdatascience.com/a-tutorial-on-luigi-spotifys-pipeline-5c694fb4113e)
- [Create your first ETL in Luigi](http://blog.adnansiddiqi.me/create-your-first-etl-in-luigi/)
- [Luigi on PyPi](https://pypi.org/project/luigi/)
## DBT
Useful links:
- [DBT main page](https://docs.getdbt.com/)
- [dbt(Data Build Tool) Tutorial](https://www.startdataengineering.com/post/dbt-data-build-tool-tutorial/)
- [DBT on PyPi](https://pypi.org/project/dbt/)
- [Analytics Engineering with dbt and PostgreSQL](https://dsotm-rsa.space/post/2019/09/01/analytics-engineering-with-dbt-data-build-tool-and-postgres-11/)
```sh
dbt init simple_dbt_project --adapter postgres
```
## Great expectations
Useful links:
- [Great Expectations main page](https://greatexpectations.io/)
- [Great Expectations documentation](https://docs.greatexpectations.io/en/latest/)
- [Great Expectations on PyPi](https://pypi.org/project/great-expectations/)
- [Understanding Great Expectations and How to Use It](https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4)
- [Know Your Data Pipelines with Great Expectations](https://medium.com/hashmapinc/know-your-data-pipelines-with-great-expectations-tool-b6d38a2e6f06)
https://www.startdataengineering.com/post/ensuring-data-quality-with-great-expectations/
https://medium.com/hashmapinc/understanding-great-expectations-and-how-to-use-it-7754c78962f4
https://docs.greatexpectations.io/en/stable/guides/tutorials/how_to_create_expectations.html
## Papermill
- [Papermill Report GitHub](https://github.com/ariadnext/papermill_report)
- [Automated Report Generation with Papermill: Part 1](https://pbpython.com/papermil-rclone-report-1.html)
- [Automated Report Generation with Papermill: Part 2]https://pbpython.com/papermil-rclone-report-2.html)
## Prefect
https://docs.prefect.io/core/getting_started/installation.html
## ADVANCED DATA
https://www.datacouncil.ai/blog/25-hot-new-data-tools-and-what-they-dont-do
## PREFECT
RUN pip install prefect==0.14.20
```
[program:prefect]
directory=/home/
command=/bin/sh -c " prefect backend server; prefect server start --ui-port 8095; prefect agent local start "
stderr_logfile = /var/log/prefect-stderr.log
stdout_logfile = /var/log/prefect-stdout.log
logfile_maxbytes = 1024
```
-p 8095:8095