This project aims to build a data pipeline that:
- processes network trace data
- models communication between devices (senders and receivers) as a graph in Neo4j,
- establishes relationships between these devices based on the protocols used (e.g., TCP, UDP, ARP, ICMP),
- enables the visualization and analysis of network traffic to gain insights into network behavior and performance.
Python is chosen as the primary tool to build the data pipeline due to its extensive libraries for handling data (such as pandas for data manipulation), network trace analysis (e.g., scapy for parsing packet data), and graph database interaction (such as py2neo or neo4j-driver for connecting to Neo4j). The code is written in the Jupyter Notebook and the results can be visualized in Neo4j using Cypher SQL.