(Snowflake Series) Introduction to Data Modelling Part 1

Data ingestion and data engineering are important aspects of being a good AI Engineer. Data is fundamental to today's AI/ML projects.

Snowflake is an important tool for AI Engineering and other data analytics projects.


In this article, we will look into:

- What is Snowflake?
- What is data modelling, and why is it important?
- What are the layers in data modelling?
- What is data normalisation, and what are the steps in it?
- What is database design?

What is Snowflake?

Snowflake is a cloud-based data warehouse platform crafted for modern analytics. Its strengths:

- Handles both structured and semi-structured data

- Separates compute and storage for scalability and cost efficiency

Data can reach Snowflake via:

1. Batch loads (e.g., COPY INTO from cloud storage)

2. Continuous ingestion (Snowpipe)

3. ETL/ELT tools (Fivetran, dbt, Informatica)

4. Direct connectors (APIs, JDBC/ODBC, Power BI/Tableau integrations)
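The batch-load pattern (option 1) can be sketched in miniature. This is an illustrative stand-in using Python's built-in sqlite3 rather than Snowflake itself; the file contents and table name are hypothetical, and the `executemany` call plays the role of Snowflake's `COPY INTO <table> FROM @stage`.

```python
import csv
import io
import sqlite3

# A tiny in-memory CSV standing in for a staged cloud-storage file (hypothetical data).
staged_file = io.StringIO("order_id,customer,amount\n1,Alice,120.50\n2,Bob,75.00\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")

# Batch load: read the whole staged file and insert it in one bulk operation.
reader = csv.DictReader(staged_file)
rows = [(r["order_id"], r["customer"], r["amount"]) for r in reader]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(loaded)  # 2 rows loaded
```

In Snowflake the same idea is a single `COPY INTO` statement pointed at a stage; the point here is only the shape of batch ingestion: many rows land in one load, not row by row.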

What is data modelling?


Data modelling is the process of defining and structuring data for efficient storage, retrieval, and analysis. At its core, data modelling is about organising data in a way that supports both the immediate business needs (reporting, analytics) and long-term data integrity.

Good data models help:
- Organise and optimise data
- Facilitate efficient data management
- Enable high-quality, accurate analysis

The Importance of Data Modelling

Effective data modelling ensures high-quality, consistent, and accurate data by eliminating redundancy and enforcing integrity rules. It also optimises performance for fast, efficient storage and retrieval—crucial for scalable solutions like Snowflake.

What are the Layers in Data Modelling?


Conceptual Data Model
Focuses on high-level entities and their relationships.
Answers: What are the main things (entities) we need to store? How do they relate?
Example: Customers, Orders, Products.

Logical Data Model
Adds detail by defining attributes, keys, and relationships more precisely.
Example: Customer Name, Order Date (attributes); which table each attribute belongs to; primary and foreign keys.

Physical Data Model
Implements the logical model in a specific database (like Snowflake).
Specifies data types, indexes, and constraints for performance and data integrity.
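As a sketch of how a physical model pins down concrete types and constraints, here is a minimal example using Python's sqlite3 (Snowflake DDL differs in its type names and syntax; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite only enforces FKs when this is on

# Physical model: concrete data types, primary keys, and a foreign key
# realising the Customer-places-Order relationship from the logical model.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date  TEXT NOT NULL        -- ISO date string; Snowflake would use DATE
);
""")

# The schema itself now documents the model: column names, types, and keys.
cols = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(cols)  # ['order_id', 'customer_id', 'order_date']
```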

ER diagrams are an essential tool for visually representing the entities, their attributes, and the relationships between them.


What is Data Normalisation in Database Design?


Normalisation is a database design technique that reduces redundancy and improves integrity via structured rules:

First Normal Form (1NF): Each cell contains only one value; no repeating groups.
Second Normal Form (2NF): Non-key attributes depend fully on the primary key (not partially).
Third Normal Form (3NF): No transitive dependencies—non-key columns depend only on the primary key.

Steps: 1NF → 2NF → 3NF
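To make the progression concrete, here is a small hypothetical example in Python/sqlite3: a flat table repeats each customer's email on every order row (a transitive dependency on the customer, not the order key), so we split it into `customers` and `orders` tables where every non-key column depends only on its own table's key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalised input: customer details repeat on every order row (hypothetical data).
flat = [
    (1, "Alice", "alice@example.com", "Laptop"),
    (2, "Alice", "alice@example.com", "Mouse"),
    (3, "Bob",   "bob@example.com",   "Monitor"),
]

# Normalised target: customer facts stored once, orders reference them by key.
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product TEXT);
""")

customer_ids = {}  # name -> customer_id, deduplicating the repeated details
for _, name, email, _ in flat:
    if name not in customer_ids:
        cur = conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", (name, email))
        customer_ids[name] = cur.lastrowid

for order_id, name, _, product in flat:
    conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer_ids[name], product))

# Alice's email now lives in exactly one row instead of two.
n_customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(n_customers)  # 2 distinct customers, 3 orders
```

If Alice's email changes, one row is updated rather than every order she has ever placed, which is exactly the duplication and integrity benefit normalisation is after.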

Why Normalise?
1. To avoid data duplication
2. To achieve clean, efficient data storage
3. To ensure high data integrity

What is Database Design?


Database design is a methodical process that creates databases optimised for storage, reliability, and retrieval. It involves:
- Identifying entities (things like Customers, Orders)
- Defining their attributes (e.g., customer name, order date)
- Establishing relationships (e.g., Customers place Orders)
- Applying normalisation rules
- Implementing constraints like primary and foreign keys
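Constraints are what make a design enforceable rather than merely documented. A quick sketch (again with sqlite3 as a stand-in; table names illustrative) shows a foreign key rejecting an order for a customer that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for sqlite to enforce FKs
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (10, 1)")       # valid: customer 1 exists

rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (11, 99)")  # invalid: no customer 99
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: the database refused the orphan order
```

The same relationship (Customers place Orders) that appears in the conceptual model ends up as a machine-checked rule in the physical database.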

Conclusion:

Data modelling is just the first step in the data engineering concepts I am covering as I start learning Snowflake. There are different types of data models and different schemas for them, which we will look at in future articles.














