- Current Version: 0.3.3
This package provides utilities for data engineering on Ingenii's Azure Data Platform. It can be used both for local development and within the Ingenii Databricks Runtime.
Import the package to use the functions within:

```python
import ingenii_data_engineering
```

Part of this package validates dbt schemas to ensure they are compatible with Databricks and the wider Ingenii Data Platform. This validation runs when a data pipeline that ingests a file is triggered, to make sure the file is ingested correctly. Full details of how to set up your dbt schema files in your Data Engineering repository can be found in the Ingenii Data Engineering Example repository.
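To illustrate the kind of compatibility check such schema validation performs, here is a minimal sketch. It is not the package's actual code: the function name, the schema dictionary shape, and the exact character set are assumptions for illustration, based on the characters Databricks Delta tables reject in column names.

```python
# Illustrative sketch only: the real validation lives inside
# ingenii_data_engineering and covers far more than this check.
# Delta tables reject column names containing these characters.
INVALID_CHARS = set(" ,;{}()\n\t=")

def validate_column_names(table_schema: dict) -> list:
    """Return error messages for column names Databricks cannot accept.

    `table_schema` is assumed to look like a parsed dbt schema entry:
    {"name": "my_table", "columns": [{"name": "col_a"}, ...]}
    """
    errors = []
    for column in table_schema.get("columns", []):
        name = column["name"]
        bad_chars = INVALID_CHARS.intersection(name)
        if bad_chars:
            errors.append(
                f"Column '{name}' in table '{table_schema['name']}' "
                f"contains invalid character(s): {sorted(bad_chars)}"
            )
    return errors
```

A check like this runs before ingestion so that a bad schema fails fast, rather than part-way through creating Delta tables.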
This package contains code to facilitate the pre-processing of files before they are ingested by the data platform. This allows users to transform any data into a form that is compatible. See details of working with pre-processing functions in the Ingenii Data Engineering Example repository.
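As an illustration of what a pre-processing step might look like, the sketch below flattens a file of JSON objects (one per line) into a CSV. This is a hypothetical example, not the hook signature the platform requires; see the Ingenii Data Engineering Example repository for the real interface.

```python
import csv
import json

# Hypothetical pre-processing step, for illustration only.
def flatten_json_to_csv(input_path: str, output_path: str) -> None:
    """Rewrite a file of JSON objects (one per line) as a flat CSV."""
    # utf-8-sig transparently strips a UTF-8 BOM if one is present
    with open(input_path, encoding="utf-8-sig") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    # Use the union of all keys as the header, so sparse rows still fit
    fieldnames = sorted({key for row in rows for key in row})
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Missing keys are written as empty cells, so every output row matches the header.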
This package also contains the code to turn the pre-processing scripts into a package, ready to be uploaded and used by the Data Platform. Once this package is installed, the command

```
python -m <package name> <command> <folder with pre-processing code>
```

for example:

```
python -m ingenii_data_engineering pre_processing_package pre_process
```

will generate a `.whl` file in a folder called `dist/`. For more details, see the Ingenii Data Engineering Example repository.
- A working knowledge of git SCM
- Python 3.7.3 installed
- Complete the 'Getting Started > Prerequisites' section
- For Windows only:
  - Run `make setup` to copy the `.env` into place (`.env-dist` > `.env`)
- Complete the 'Getting Started > Set up' section
- From the root of the repository, in a terminal (preferably in your IDE), run the following commands to set up a virtual environment:

  ```
  python -m venv venv
  . venv/bin/activate
  pip install -r requirements-dev.txt
  pre-commit install
  ```

  or for Windows:

  ```
  python -m venv venv
  . venv/Scripts/activate
  pip install -r requirements-dev.txt
  pre-commit install
  ```

- Note: if you get a `permission denied` error when executing the `pre-commit install` command, you'll need to run `chmod -R 775 venv/bin/` to recursively update permissions in the `venv/bin/` directory
- The following checks are run as part of the pre-commit hooks: `flake8` (note that unit tests are not run as a hook)
- Complete the 'Getting Started > Set up' section
- Run `make build` to create the package in `./dist`
- Run `make clean` to remove dist files
- Complete the 'Getting Started > Set up' and 'Development' sections
- Run `make test` to run the unit tests using pytest
- Run `flake8` to run lint checks using flake8
- Run `make qa` to run the unit tests and linting in a single command
- Run `make clean` to remove pytest files
- 0.3.3: Deprecated path for dbt
- 0.3.2: Further bugfix for JSON UTF-8 BOM
- 0.3.1: Remove unnecessary functions specific to Databricks
- 0.3.0: Create pre-processing package using the module
- 0.2.1: Handle JSON read UTF-8 BOM
- 0.2.0: Pre-processing happens all in the 'archive' container
- 0.1.5: Better functionality for column names in .csv files
- 0.1.4: Handle JSON files
- 0.1.3: Adding pre-processing utilities
- 0.1.2: Rearrangement and better split of work with the Databricks Runtime. Better validation
- 0.1.1: Minor bug fixes
- 0.1.0: dbt schema validation, pre-processing class