Bellingcat's Online Investigation Toolkit

Gephi

Open-source network analysis and visualization software

Last updated 6 days ago

URL

https://gephi.org (version 0.10.1 as of May 2025)

Description

Gephi is a free, open-source tool for network visualization and analysis, widely used to explore and represent relationships in large datasets, such as social networks, links between documents, or web structures. Gephi allows users to create customizable network graphs, analyze metrics (like centrality and clustering), and identify patterns within complex datasets. The tool supports importing various data formats (CSV, GEXF) and offers plugins for advanced functionality. It can be used in journalism and open source research to visually analyze and reveal hidden connections in data, such as by examining online misinformation networks.
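
As an illustration of the CSV import route, the sketch below prepares a node table and an edge table from a small list of interactions. It assumes that Gephi's spreadsheet importer recognizes the column names `Id` and `Label` for nodes and `Source`, `Target`, and `Weight` for edges; the sample data and the helper names `edges_csv` and `nodes_csv` are hypothetical.

```python
import csv
import io

# Hypothetical sample data: who mentioned whom, and how often.
mentions = [
    ("alice", "bob", 3),
    ("alice", "carol", 1),
    ("bob", "carol", 2),
]


def edges_csv(rows):
    """Return an edge table as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)
    return buffer.getvalue()


def nodes_csv(rows):
    """Return a node table as CSV text with Id and Label columns."""
    # Collect every name that appears as a source or a target.
    ids = sorted({name for source, target, _ in rows for name in (source, target)})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Label"])
    writer.writerows((node_id, node_id.title()) for node_id in ids)
    return buffer.getvalue()


edge_table = edges_csv(mentions)
node_table = nodes_csv(mentions)
print(edge_table)
print(node_table)
```

Saving the two strings as separate `.csv` files keeps node attributes and relationships apart, so either table can be corrected in a spreadsheet before it is imported.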

Gephi can create detailed, interactive, and visually compelling network graphs. This visual appeal helps to highlight complex relationships within data, making it easier for journalists to uncover hidden links between entities like individuals, companies, or groups. This is particularly valuable for investigative stories, where a clear visual representation of connections can be crucial for audience understanding.

Core Social Network Analysis Metrics

Gephi includes built-in support for computing key metrics that help identify important nodes in a network. Three core metrics commonly used are degree centrality, betweenness centrality, and closeness centrality:

  1. Degree Centrality

    • What it indicates: Nodes with higher degree centrality can be influencers or hubs that directly reach many others.

    • Example: In a Twitter network, a user with connections to many others (through follows or mentions) would have high degree centrality.

  2. Betweenness Centrality

    • What it indicates: Such nodes connect different clusters or sections of the graph; they may not have the most connections, but they control information or resource flow by being on the paths that link others. A higher betweenness means a greater brokerage role. They act as gatekeepers or intermediaries.

    • Example: In a criminal network, a person who links two otherwise separate groups (even with only a few connections themselves) likely has high betweenness; remove that person and the network might fragment.

  3. Closeness Centrality

    • What it indicates: This can identify nodes that are centrally positioned overall (not in a geographical sense, but in network topology). Such nodes could quickly disseminate information to the entire network.

    • Example: In a social network, someone at the “center” of the friend-of-friend graph (even if they aren’t connected to everyone directly) will have a high closeness score, meaning they are on average a short distance from anyone in the network.
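
To make the three centrality measures concrete, here is a minimal pure-Python sketch on a hypothetical toy network: two triangles joined by a single bridge node `D`. It is not Gephi's implementation; it assumes an undirected, unweighted, connected graph, uses Brandes' algorithm for betweenness, and normalizes closeness as (n - 1) divided by the sum of distances, which may differ from the values Gephi reports.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Number of direct neighbours per node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of distances); assumes a connected graph."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: value / 2 for v, value in score.items()}


# Hypothetical network: triangle A-B-C and triangle E-F-G, bridged by D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G")]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
betweenness = betweenness_centrality(adj)
for node in sorted(adj):
    print(node, degree[node], round(closeness[node], 3), betweenness[node])
```

In this toy network `D` has only two connections, fewer than `C` or `E`, yet it has the highest betweenness because every path between the two triangles passes through it. This is the broker pattern described above.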

Further important concepts and metrics available in Gephi

Core Concepts

  • Weighted vs. Unweighted Metrics: Many of these measures (degree, clustering coefficient, path length, centralities) can be computed in both unweighted (treating all edges equally) and weighted modes (if your edges have an associated weight).

  • Directed vs. Undirected Graphs: For directed graphs (e.g., Twitter follow networks), some metrics like in-degree/out-degree, PageRank, and HITS become crucial. In undirected graphs (e.g., co-appearance networks), you only have a single “degree” measure.
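
The difference between these modes can be sketched in a few lines of Python. The edge list below is hypothetical; each tuple is read as (source, target, weight), and the total degree is taken as in-degree plus out-degree, which is an assumption of this sketch rather than a statement about Gephi's internals.

```python
from collections import defaultdict

# Hypothetical directed, weighted edges: (source, target, weight).
edges = [
    ("a", "b", 2.0),
    ("a", "c", 1.0),
    ("b", "c", 4.0),
    ("c", "a", 0.5),
]

in_degree = defaultdict(int)
out_degree = defaultdict(int)
weighted_in = defaultdict(float)
weighted_out = defaultdict(float)

for source, target, weight in edges:
    # Unweighted mode: every edge counts as 1.
    out_degree[source] += 1
    in_degree[target] += 1
    # Weighted mode: every edge counts with its weight.
    weighted_out[source] += weight
    weighted_in[target] += weight

nodes = sorted(set(in_degree) | set(out_degree))
for node in nodes:
    degree = in_degree[node] + out_degree[node]
    weighted_degree = weighted_in[node] + weighted_out[node]
    print(node, in_degree[node], out_degree[node], degree, weighted_degree)
```

Node `a` has the most outgoing edges, while node `b` has the largest weighted degree because of the single heavy edge from `b` to `c`. Which reading matters depends on what the weights represent in the investigation.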

Graph-Level Metrics

  1. Average Degree

    • What it is: The mean number of connections (edges) each node has.

    • Why it matters: It quickly shows how well-connected the network is on average.

  2. Network Diameter

    • What it is: The longest shortest path in the network (i.e., the greatest distance between any two nodes when traversing via the shortest route).

    • Why it matters: The diameter indicates how “spread out” or “deep” the network is; a large diameter suggests that it takes many hops to travel from some nodes to others.

  3. Graph Density

    • What it is: The ratio of actual edges in the graph to the maximum possible edges if every node were connected to every other node.

    • Why it matters: Reveals how close the graph is to being fully connected (1.0 = complete graph).

  4. Connected Components

    • What it is: Identifies distinct sub-networks (components) in the graph where each node is reachable from any other node within the same component.

    • Why it matters: Shows whether the network is all in one piece or if it breaks into multiple isolated clusters.

  5. Average Path Length

    • What it is: The mean number of steps along the shortest paths between all pairs of nodes.

    • Why it matters: It gives a sense of how easily (in how many hops) information or influence can spread across the network.

  6. Average Clustering Coefficient

    • What it is: A measure of how often nodes form tightly knit groups (where neighbors of a node are also neighbors with each other). Gephi can calculate both global (average) and node-level clustering.

    • Why it matters: High clustering indicates the presence of local “communities” or “cliques” in the network.

  7. Modularity (Community Detection)

    • What it is: A method that partitions the network into modules (clusters) where nodes within the same cluster have more connections to each other than to other clusters. Gephi computes a modularity score and assigns each node a “community” label.

    • Why it matters: It helps reveal sub-communities or tightly connected groups and is useful for identifying factions, interest groups, or hidden structures.
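
The graph-level metrics above can be reproduced by hand on a small example, which helps when sanity-checking what Gephi reports. The following is a minimal sketch, assuming an undirected, unweighted graph without self-loops; the example edges and all function names are hypothetical. Nodes with fewer than two neighbours are counted as clustering 0, and `modularity` only scores a partition that is passed in; it does not search for communities the way Gephi's community detection does.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def connected_components(adj):
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def path_stats(adj, component):
    """Diameter and average shortest-path length within one component."""
    lengths = [d for source in component
               for target, d in bfs_distances(adj, source).items()
               if target != source]
    if not lengths:  # single-node component
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


def average_clustering(adj):
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue  # contributes 0 in this sketch
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def modularity(adj, communities):
    """Modularity score of a given partition (list of node sets)."""
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    score = 0.0
    for community in communities:
        internal = sum(1 for v in community for w in adj[v] if w in community) / 2
        degree_sum = sum(len(adj[v]) for v in community)
        score += internal / m - (degree_sum / (2 * m)) ** 2
    return score


# Hypothetical graph: a triangle with a tail, plus one isolated pair.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("X", "Y")]
adj = build_adjacency(edges)
n = len(adj)
m = sum(len(neighbours) for neighbours in adj.values()) / 2
average_degree = 2 * m / n
density = 2 * m / (n * (n - 1))
components = connected_components(adj)
largest = max(components, key=len)
diameter, average_path_length = path_stats(adj, largest)
clustering = average_clustering(adj)
q = modularity(adj, components)
print(average_degree, density, len(components), diameter,
      average_path_length, clustering, q)
```

Diameter and average path length are computed on the largest component only, because distances between disconnected nodes are undefined. It is worth checking how any tool handles this case before comparing numbers across networks.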


Node-Level Metrics

  1. Degree / Weighted Degree

    • What it is: The count of direct connections each node has. In weighted graphs, edges can have a “weight,” and Weighted Degree sums those edge weights.

    • Why it matters: Nodes with higher degree may be more influential or have more direct relationships.

  2. Degree Distribution

    • What it is: Shows how degrees (numbers of connections) are distributed across all nodes. Although not a single numeric metric, Gephi can compute and plot a distribution chart.

    • Why it matters: Helps identify whether a few nodes dominate in connectivity (e.g., a power-law distribution) or if most nodes have similar degree.

  3. Betweenness Centrality

    • What it is: Counts how often a node lies on the shortest paths between other nodes.

    • Why it matters: Captures “broker” or “bridge” nodes that can control information flow across different parts of the network.

  4. Closeness Centrality / Harmonic Closeness

    • What it is: Measures how close a node is to all others (based on the sum of shortest path distances). “Harmonic closeness” is a variant that handles disconnected graphs more gracefully.

    • Why it matters: A higher closeness value means the node can reach the rest of the network more quickly (in fewer hops).

  5. Eigenvector Centrality

    • What it is: Measures a node’s influence based not just on its connections but also on the importance of the nodes it connects to.

    • Why it matters: A node connected to other high-scoring (influential) nodes will have a higher eigenvector centrality, reflecting second-order influence.

  6. PageRank

    • What it is: A well-known algorithm used initially by Google Search to rank web pages. It assigns higher scores to nodes with inbound links from other high-scoring nodes.

    • Why it matters: Useful for directed networks (e.g., Twitter mention/follow graphs), where it identifies nodes with influential incoming connections.

  7. HITS (Hubs & Authorities)

    • What it is: The Hyperlink-Induced Topic Search algorithm calculates two scores per node: a Hub score (links to many good authorities) and an Authority score (linked from many good hubs).

    • Why it matters: In a directed graph (like web links), hubs are nodes pointing to strong authorities, while authorities are nodes receiving links from good hubs. Helps identify specialized roles in the network.

  8. Eccentricity

    • What it is: The greatest distance from a node to any other node in the same component. In other words, how far is the furthest node?

    • Why it matters: Nodes with lower eccentricity can be seen as more “centrally” located (they’re never too far from anyone). This is another perspective on centrality, complementing closeness or betweenness.
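
As an illustration of how link-based scores such as PageRank differ from simple counts, the sketch below runs a basic power iteration on a hypothetical follow graph. The damping factor of 0.85, the fixed number of iterations, and the uniform handling of nodes without outgoing links are assumptions of this sketch and not necessarily the settings Gephi uses.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Basic PageRank by power iteration.

    out_links maps every node to the list of nodes it points to;
    every target must also appear as a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is shared by everyone.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = dict.fromkeys(nodes, base)
        for v in nodes:
            for w in out_links[v]:
                # Each node splits its rank evenly among its targets.
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


# Hypothetical follow graph: "a" and "b" follow "c", "c" follows "d",
# and "d" follows "a".
follows = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["a"]}
scores = pagerank(follows)
in_degree = {v: sum(v in targets for targets in follows.values()) for v in follows}

for node in sorted(follows):
    print(node, in_degree[node], round(scores[node], 4))
```

Nodes `a` and `d` both have exactly one incoming link, but `d` scores higher because its link comes from the top-ranked node `c`. In-degree alone cannot express that difference.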

After running the statistical analysis functions, results can be used to visually style the graph (e.g., sizing nodes by centrality values). In sum, Gephi visualizes networks and quantifies network structure with built-in measures of centrality (degree, betweenness, closeness, etc.), which can be helpful for investigative analysis.

Gephi in Investigative Journalism

Cost

Level of difficulty

Learning Curve

Requirements

No account is needed, but Java installation is required.

Limitations

Ethical Considerations

Using Gephi to visualize networks from sensitive or personal data requires ethical handling, particularly regarding privacy and consent, and careful interpretation to avoid misrepresenting the connections shown.

Data integrity is crucial for users of Gephi, as the accuracy and reliability of network visualizations depend directly on the quality of input data. For investigative journalism, any insights or patterns revealed through Gephi's analysis are only as trustworthy as the data provided. Poor data quality — such as incomplete records, unverified sources, or outdated information — can lead to misleading visualizations that misrepresent relationships or inflate the importance of specific network nodes. To ensure meaningful results, Gephi users must verify data sources, validate accuracy, and cross-check information before visualizing it. Maintaining high data integrity not only strengthens the credibility of the analysis but also allows for responsible storytelling, helping to prevent the spread of misinformation and ensuring that network insights are grounded in factual, well-vetted data.

Guides

Official Wiki

Complete Beginners

General / Advanced / Multi-Language

Videos

Journalism-Specific

Books

Cherven, K. (2015). Mastering Gephi Network Visualization. Packt Pub Ltd.

Open Datasets

Comparison with similar software

Tool provider

Gephi Consortium (open-source community, CTO: Mathieu Bastian)

Advertising Trackers

Page maintainer

Martin Sona

Degree Centrality measures how many direct connections (edges) a node has. A node with a high degree centrality has many links to others, making it well-connected. It’s essentially a count of immediate neighbors.

Betweenness Centrality measures how often a node lies on the shortest paths between other nodes. In other words, a node with high betweenness centrality is a critical broker or bridge in the network.

Closeness Centrality measures how “close” a node is to all others in the network, typically defined as the reciprocal of the total distance from that node to all other nodes. A node with high closeness centrality can reach all others quickly (in few hops on average).

Plugins & Experimental Metrics: Gephi’s plugin repository may offer additional statistical measures or variants (for example, advanced community detection algorithms, Hierarchical Edge Bundling, timeline-based metrics, or new centrality formulas). Be sure to check the Gephi Plugin Center if you need specialized metrics.

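Social network analysis has been used to investigate political influence through campaign contributions, social media manipulation (e.g., election interference via coordinated accounts), and even the tracking of criminal or extremist networks. Gephi's network analysis features allow journalists to trace these relationships systematically. Noteworthy examples of the use of network analysis in high-profile cases include: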

Panama Papers: The ICIJ’s Panama Papers investigation (2016) involved analyzing a massive trove of offshore financial records. Reporters used network analysis tools, including Gephi, to visualize and explore the web of offshore entities and connections. By converting people and companies into “nodes” and their relationships (e.g., directorships or client links) into “edges,” Gephi helped journalists uncover hidden connections in the data. An article citing the case shows how graph visualization enabled the team to trace complex ownership networks and find key intermediaries in the offshore schemes. (Note: ICIJ also used graph databases like Neo4j and a web interface, but Gephi was used for certain analyses and for producing visualization graphics.)

9/11 Terrorist Network Analysis: Shortly after the 2001 attacks, analyst Valdis Krebs mapped the connections between the hijackers and associates to show how they were interlinked. Krebs’s paper “Mapping Networks of Terrorist Cells” (2002) demonstrated that even though no single terrorist was connected to all others, there were focal points (connectors) in the network. This analysis pre-dated Gephi (Krebs used the SNA tools available at the time), but it is precisely the kind of investigation Gephi excels at today. Modern journalists and researchers have replicated such network mapping using Gephi to illustrate terrorist cell structures and identify key influencers. Brant Houston (Univ. of Illinois journalism professor) points to Krebs’s 9/11 network mapping as a tutorial example for anyone learning social network analysis.

They Rule Project: They Rule (2004–2005) is an investigative data visualization project by artist Josh On, which mapped the interlocking directorates of major U.S. corporations. It provided an interactive web interface for exploring how corporate board members overlap between companies, revealing tight networks of corporate governance. While They Rule wasn’t built with Gephi (it was a custom web app), it has been cited in the same breath as network journalism examples for its visualization of power networks. The project showed, for instance, that 87 of the top 100 US companies shared board directors, concentrating power within a small elite. An investigative journalist could use Gephi to achieve a similar analysis by importing board membership data and visualizing those connections. So while not a Gephi case per se, it is a relevant example of network visualization in journalism.

Due to its extensive features, Gephi has a moderate learning curve. Still, beginners can start with basic tutorials and sample datasets to understand the interface and critical functions like layouts, filters, and metrics. A good strategy is to focus on one feature at a time: experiment with layouts to arrange nodes, use filters to simplify complex networks, and apply basic metrics like centrality to interpret relationships. As they become comfortable, users can explore plugins and advanced features like time-based visualizations for more tailored analyses.

Gephi has an active user community that can provide help and share tips. The primary hub in recent years has been the Gephi Facebook Group, which serves as the main place to ask questions and get support. This Facebook group effectively replaced the older official forum. (The legacy Gephi Forum still exists, but as of 2018–2019 it saw declining activity, and new questions are directed to the Facebook group.) Additionally, Gephi’s developers and power users monitor the GitHub issue tracker.

Gephi can be run on most modern computers, but computing requirements increase with graph size. It can be less intuitive for beginners, and certain advanced functions may require plugins or scripting knowledge.

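Gephi Wiki. https://github.com/gephi/gephi/wiki

Levallois, C. (2017, January 20). Simple Gephi Project from A to Z. https://seinecle.github.io/gephi-tutorials/generated-html/simple-project-from-a-to-z-en.html

Levallois, C. (2024, November 27). Gephi Tutorials. https://seinecle.github.io/gephi-tutorials/

Grandjean, M. (2024). Gephi. Retrieved November 30, 2024, from https://www.martingrandjean.ch/gephi/ (Tutorials incl. 30 Gephi examples)

Grandjean, M. (2022, September 21). GEPHI - Introduction to Network Analysis and Visualization (Tutorial) [Video recording]. https://www.youtube.com/watch?v=GXtbL8avpik

Global Investigative Journalism Network (Director). (2023, September 30). GIJC23: Using Social Network Analysis for Investigations [Video recording]. https://www.youtube.com/watch?v=-D8E8JY86b4

Gephi Cookbook. (n.d.). Packt. Retrieved November 10, 2024, from https://www.packtpub.com/en-us/product/gephi-cookbook-9781783987405?type=print

Barabási, A.-L. (2016). Network Science. http://networksciencebook.com/ (this is excellent)

Datasets. GitHub. Retrieved November 30, 2024, from https://github.com/gephi/gephi/wiki/Datasets

ASNR - Animal Network Data. Retrieved November 30, 2024, from https://bansallab.github.io/asnr/data.html (ASNR aims to assemble and provide a comprehensive index of real-world animal interaction data sets across all taxa. Only high-value peer-reviewed data.)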

NodeXL: NodeXL is an add-in for Microsoft Excel that provides network analysis and visualization within a spreadsheet interface. It is Windows-only (as it hooks into Excel) and comes in a free “NodeXL Basic” version and a paid Pro version. It allows users to import edge lists into Excel and generates graphs from those tables. This approach makes it simple to edit data (for example, Excel formulas can be used for node attributes). Brant Houston explained that NodeXL is integrated with Excel, making it very simple for beginners who are comfortable with spreadsheets. It’s suitable for quick, small to medium-sized network analysis; however, it may struggle with large graphs. Also, advanced visualization customizations and real-time manipulation are more limited than those of Gephi. NodeXL offers a gentler learning curve and even has built-in data importers for social media (in the Pro version) but lacks the visual polish and plugin extensibility of Gephi. (One might use NodeXL to gather or preprocess data and then use Gephi to fine-tune the visualization, as some workflows suggest.)

Palladio: Palladio is a web-based network visualization tool developed at Stanford’s Humanities + Design lab. It runs entirely in the browser, with no installation required, and is geared towards historians and humanists exploring complex historical datasets. Palladio is described as a “simple but powerful exploratory data visualization tool” that focuses on ease of use. You can upload spreadsheet data (nodes and links) and interactively create network views, maps, and timelines. It’s great for quickly visualizing a dataset and finding patterns without coding. However, Palladio has notable limitations: since it’s in-browser and meant for lightweight use, it can become slow or unstable with very large datasets. It also hasn't seen active development in a few years, but it still works in digital humanities classrooms for introducing network analysis before moving to more comprehensive tools. Compared to Gephi, Palladio is less feature-rich: it doesn’t compute advanced network metrics or offer extensive styling options.

PyVis: PyVis is a Python library for interactive network visualization. It allows you to generate network graphs in Python and output them as an HTML page (using the JavaScript library vis.js under the hood). Essentially, PyVis is a wrapper that brings the interactivity of vis.js to Python users, so you can script the creation of a network visualization and then view it in a web browser. PyVis is not a GUI tool; it requires writing Python code. It works well with Jupyter notebooks: you can create a Network object, add nodes and edges, and then display an interactive network within the notebook or export it to an HTML file. The result is a web-based visualization where you can pan, zoom, and even click on nodes for details. PyVis offers flexibility for developers (since you can automate tasks and integrate with data analysis pipelines in Python), but it’s less user-friendly for non-coders. It also depends on the browser for rendering, so extremely large networks may be hard to handle, as with any web-based visualization. Gephi might handle larger networks better performance-wise (using OpenGL), whereas PyVis/vis.js running in a browser could hit memory or speed limits for huge graphs. Also, PyVis itself doesn’t compute SNA metrics; you’d use Python libraries (like NetworkX) to do the analysis, then use PyVis purely for visualization. PyVis is good for creating interactive visuals with a few lines of code. This makes it a complementary tool: Gephi for point-and-click exploration and PyVis for scripted, shareable interactive diagrams.

Neo4j (with Datashare Plugin): Neo4j is fundamentally different from the above: it’s a graph database rather than a dedicated visualization tool. It's optimized for storing and querying graph data (nodes and relationships) and managing very large, complex networks. It allows the user to run complex queries (using its query language Cypher) to find patterns, shortest paths, sub-networks, etc., in the data. In practice, one might use Neo4j to crunch the data (find communities, run graph algorithms, handle millions of records), then use a visualization front-end (like Gephi, Neo4j’s own Bloom and Browser interfaces, or Linkurious) to visualize the result. Neo4j does come with basic visualization: the Neo4j Browser GUI can display query results as a node-link diagram, but these are not as customizable as Gephi’s visualizations. A key difference: Gephi works on static data you load into it (good for snapshot analysis and visual exploration), whereas Neo4j is a continuously running database that can be updated and queried in real time (good for dynamic or very large datasets where you need to sift through data systematically). In short, Neo4j vs Gephi is not an either-or; they often complement each other. Gephi is for visual interactive analysis, Neo4j is for data storage and algorithmic analysis. Also of note: Neo4j is not purely open-source in all its editions (the Community edition is open-source, and enterprise features are commercial), whereas Gephi is fully open-source. For an investigator, choosing Neo4j would depend on needing to handle huge networks or integrate the graph with other systems; choosing Gephi would be about interactive exploration and presentation-quality visuals.

Visualization of a network of Telegram actors. (by J.Weißer, M.Engel, C.Jelonek, M. Hallbach, 2024, used with permission)