Here is a code example of joining multiple S3 data sources in SQLake and applying simple enrichments to the data. Run the following code in SQLake.

First, ingest the data by creating an S3 connection:

    /* Ingest data */
    CREATE S3 CONNECTION airflow_alternative_pipelines_samples
       AWS_ROLE = 'arn:aws:iam::949275490180:role/samples_role'
       EXTERNAL_ID = 'AIRFLOW_ALTERNATIVE_SAMPLES';

Next, create empty tables to use as staging for the orders and sales data:

    CREATE TABLE default_glue_catalog.database_a137bd.orders_raw_data();
    CREATE TABLE default_glue_catalog.database_a137bd.sales_info_raw_data();

Then create streaming jobs to ingest the raw orders and sales data into the staging tables:

    CREATE SYNC JOB load_orders_raw_data_from_s3
       ...
       INTO default_glue_catalog.database_a137bd.orders_raw_data;

    CREATE SYNC JOB load_sales_info_raw_data_from_s3
       ...
       INTO default_glue_catalog.database_a137bd.sales_info_raw_data;

For comparison, here is a deployment report from an Airflow user running a long-lived ingestion task.

What happened: I run a single BashOperator for a long-running task; we have to download data for 8+ hours initially from the rate-limited data source API, then download more each day in small increments. Currently only the long task is running and everything else is queued, even though we have more resources available: we're only using 3% CPU and 2 GB of memory (out of 64 GB), yet the scheduler is unable to run any other simple task at the same time.

Versions of Apache Airflow Providers: apache-airflow-providers-postgres==2.3.0

Deployment: Virtualenv installation

Deployment details: installed with Virtualenv / Ansible on Linux (Ubuntu Server)
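To make the report concrete, here is a minimal sketch of the setup it describes, assuming the common pattern of one DAG holding the long download next to an unrelated small task; the DAG id, task ids, script path, and schedule are invented for the sketch, not taken from the report.

    # Minimal sketch of the reported setup: one long-running BashOperator
    # alongside a trivial task that should be able to run in parallel.
    # DAG id, task ids, and the script path are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="rate_limited_ingest",       # hypothetical name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Runs 8+ hours on the first pass against the rate-limited API,
        # and only minutes on the daily incremental passes.
        download = BashOperator(
            task_id="download_increment",
            bash_command="python /opt/pipelines/download.py",  # hypothetical path
        )

        # An independent, trivial task with no dependency on the download.
        # With a multi-process executor (e.g. LocalExecutor) and free pool
        # slots it runs in parallel; with SequentialExecutor it waits for
        # the download to finish, however idle the machine is.
        smoke_test = BashOperator(
            task_id="smoke_test",
            bash_command="echo scheduler is alive",
        )

Whether the small task actually runs alongside the download is governed less by CPU and memory than by scheduler settings: the executor choice ([core] executor), the global parallelism limit, the per-DAG max_active_tasks cap, and available pool slots. An idle machine with a full queue, as in the report, usually points at one of these rather than at resource exhaustion.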
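The download script itself does not appear in the report. Purely as an illustration of the workload described (an 8+ hour initial backfill against a rate-limited API, then small daily increments), here is one way such a downloader is often structured; the endpoint, rate limit, state file, and start date are all invented for the sketch.

    # Hypothetical rate-limited, incremental downloader of the kind the
    # long-running BashOperator wraps. Endpoint, limits, and paths are
    # assumptions, not details from the report.
    import json
    import time
    from datetime import date, timedelta
    from pathlib import Path

    import requests

    API_URL = "https://api.example.com/records"      # assumed endpoint
    STATE_FILE = Path("/var/lib/downloader/state.json")
    REQUESTS_PER_MINUTE = 30                         # assumed rate limit


    def fetch_day(day: date) -> list:
        """Fetch one day's records, sleeping first to respect the rate limit."""
        time.sleep(60 / REQUESTS_PER_MINUTE)
        resp = requests.get(API_URL, params={"date": day.isoformat()}, timeout=60)
        resp.raise_for_status()
        return resp.json()


    def run() -> None:
        # One code path serves both the initial multi-hour backfill and the
        # small daily increments: always resume from the last completed day.
        state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
        day = date.fromisoformat(state.get("last_day", "2015-01-01")) + timedelta(days=1)
        while day < date.today():
            records = fetch_day(day)
            # ... persist `records` to disk or a staging table here ...
            STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
            STATE_FILE.write_text(json.dumps({"last_day": day.isoformat()}))
            day += timedelta(days=1)


    if __name__ == "__main__":
        run()

Because progress is checkpointed per day, the first run performs the whole backfill and every later run only fetches the days added since, which matches the "8+ hours initially, then small daily increments" shape of the reported task.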