Comments (5)
The first link you provided only lists the most important fields of the transmitted data object.
This documentation page on Segment University mentions two more pieces of data:
- Group
- Alias
from ormus.
The source layer is where data originates and is ingested into the system.
It's the entry point for data collection and serves as a crucial component in the data pipeline. Developing the source layer involves integrating with various sources of data, such as websites, server libraries, mobile SDKs, and cloud applications, to collect data and send it to the CDP. Here are some key aspects to consider when developing the source layer:
1. Data Collection:
- Identify the data sources: Determine which data sources you want to collect data from. These could include websites, mobile apps, IoT devices, server applications, and more.
- Data collection methods: Implement collection methods such as JavaScript libraries for web tracking, mobile SDKs for Android and iOS, or server-side libraries to capture data from various sources.
- Data validation: Validating data in the source layer of a CDP is crucial to ensure that the collected data is accurate, consistent, and adheres to the expected format.
- Data deduplication: Prevent duplicate entries by identifying and removing redundant data points based on unique identifiers, timestamps, or other criteria.
2. Data Transformation (which belongs to the data-plane layer):
- Data formatting: Ensure that data collected from different sources is properly formatted and standardized for ingestion into the CDP.
- Data transformation: Offer capabilities to clean, reformat, or harmonize data from various sources to ensure consistency and accuracy.
- Data enrichment: You may need to enrich the data by adding context or metadata to make it more valuable.
Here are some features to consider for source layer development:
- Data Format Validation: We have implemented data format validation to ensure that incoming data adheres to the expected structured formats, such as JSON, XML, or other predefined schemas. This helps maintain data integrity and structure.
- Rate Limiting and Throttling: Rate limiting and throttling mechanisms are in place to prevent brute-force attacks and excessive data submissions from a single source.
- Data Completeness: We ensure that all required fields are present in the incoming data; missing critical fields could lead to incomplete or unusable data.
- Data Consistency: We check for consistency within the data, such as maintaining relationships between data elements (e.g., product IDs and the corresponding products in the database).
- IP Address Validation: We verify that incoming IP addresses are valid and not associated with malicious sources.
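A minimal sketch of the format-validation and completeness checks described above, assuming a hypothetical `rawEvent` schema in which `event_name` and `user_id` are the required fields (the actual ormus schema may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawEvent mirrors the fields the source layer requires; the field set
// here is illustrative, not the project's actual schema.
type rawEvent struct {
	EventName string `json:"event_name"`
	UserID    string `json:"user_id"`
}

// ValidateEvent checks that the payload is well-formed JSON (format
// validation) and that the required fields are present (completeness).
func ValidateEvent(payload []byte) error {
	var e rawEvent
	if err := json.Unmarshal(payload, &e); err != nil {
		return fmt.Errorf("malformed JSON: %w", err)
	}
	if e.EventName == "" {
		return fmt.Errorf("missing required field: event_name")
	}
	if e.UserID == "" {
		return fmt.Errorf("missing required field: user_id")
	}
	return nil
}

func main() {
	fmt.Println(ValidateEvent([]byte(`{"event_name":"signup","user_id":"u1"}`))) // <nil>
	fmt.Println(ValidateEvent([]byte(`{"event_name":"signup"}`)))
}
```

For production use, a schema-validation library would scale better than hand-written field checks, but the control flow would look the same.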
As for the entities and models we need for our platform, we need an Event model, which in general could look something like this:
type Event struct {
    EventName  string                 `json:"event_name"`
    UserID     string                 `json:"user_id"`
    Properties map[string]interface{} `json:"properties"`
    Timestamp  time.Time              `json:"timestamp"`
}
Hello everyone,
I appreciate your thorough and valuable research. I would like to introduce another topic that also requires our attention.
Data Security and Compliance:
We should consider implementing an additional strategy for data encryption and protection. However, this is just a preliminary idea. Please share your feedback and thoughts on this matter.
Implementing additional data encryption and protection is a commendable idea. It's crucial for safeguarding sensitive data. I suggest further exploring encryption methods and compliance with industry standards. Assess the potential impact on performance and user experience to strike the right balance. Continuous improvement in this area is essential.
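As one possible direction rather than a settled design, here is a sketch of symmetric encryption with AES-256-GCM from Go's standard library; key management (storage, rotation, KMS integration) is deliberately out of scope and would need its own discussion:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// Encrypt seals plaintext with AES-GCM; the random nonce is prepended
// to the ciphertext so Decrypt can recover it.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the prepended nonce and opens the ciphertext,
// failing if the data was tampered with (GCM is authenticated).
func Decrypt(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ciphertext := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // demo only; use a real key source in production
	ct, _ := Encrypt(key, []byte("user PII"))
	pt, _ := Decrypt(key, ct)
	fmt.Println(string(pt))
}
```

GCM also authenticates the data, which addresses integrity as well as confidentiality; the performance impact mentioned above would mostly come from where in the pipeline encryption happens, not from the cipher itself.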
Great idea, Taha John!
I think that by combining the Event name, user ID, and timestamp, we can effectively validate each request to identify duplicate data.