Visualizing your LinkedIn graph

A few years ago, LinkedIn released a nice product called “InMap” that let you visualize your connections graph. This feature has since been removed but, as it was great, we are going to try to reproduce it (or at least a simplified version) in DSS.

Before you start

  • Make sure you have a working Internet connection, since we are going to get data from an external API.
  • In your DSS instance, create a new project, called “LinkedIn” for instance.

Getting your LinkedIn credentials

Since we are going to use the LinkedIn REST API, you first need credentials to make calls against it. The LinkedIn API requires OAuth 2 authentication, so follow the instructions on this LinkedIn page to get an access token.

Getting credentials at LinkedIn website

Once you have retrieved your credentials, you may want to store them as custom variables in DSS, since you will use them several times throughout the project. Go to the Administration panel from the top bar, and under Settings, click on Variables.

Defining Dataiku DSS variables for LinkedIn authentication

Define your variables in proper JSON format (replace the placeholders with your own values):

{
  "linkedin.consumer.key": "CONSUMER-KEY",
  "linkedin.consumer.secret": "CONSUMER-SECRET",
  "linkedin.oauth.token": "OAUTH-TOKEN",
  "linkedin.oauth.secret": "OAUTH-SECRET"
}

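Before using these variables in a recipe, it can help to check that all four are defined. The following is a minimal sketch, assuming the variables are exposed as a dict-like object (in a recipe this would be `dataiku.dku_custom_variables`). The helper name `load_linkedin_credentials` is hypothetical, and a plain dict stands in for DSS so that the snippet runs anywhere:

```python
REQUIRED_KEYS = (
    "linkedin.consumer.key",
    "linkedin.consumer.secret",
    "linkedin.oauth.token",
    "linkedin.oauth.secret",
)


def load_linkedin_credentials(variables):
    """Return the four LinkedIn credentials, or raise if any is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not variables.get(key)]
    if missing:
        raise KeyError("Missing DSS variables: %s" % ", ".join(missing))
    return {key: variables[key] for key in REQUIRED_KEYS}


# Stand-in for dataiku.dku_custom_variables (placeholder values only)
example_variables = {
    "linkedin.consumer.key": "CONSUMER-KEY",
    "linkedin.consumer.secret": "CONSUMER-SECRET",
    "linkedin.oauth.token": "OAUTH-TOKEN",
    "linkedin.oauth.secret": "OAUTH-SECRET",
}
credentials = load_linkedin_credentials(example_variables)
```

Failing early with an explicit message is usually easier to debug than an authentication error returned by the API later on.
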
Getting your first degree connections

Let’s start with fetching your first degree connections. Under your LinkedIn project Flow screen, create a new Python recipe:

Creating a new Python recipe

This recipe takes no input but outputs a dataset called “first_degree_connections”:

Python recipe; defining output dataset

Hit the “Create” button and you are taken to the recipe code editor.

  • Start with importing the required libraries:
import json
import dataiku
import pandas as pd
import oauth2 as oauth

  • Create an OAuth client to query the LinkedIn API (note that we use the DSS custom variables created above):
consumer = oauth.Consumer(
  key = dataiku.dku_custom_variables["linkedin.consumer.key"],
  secret = dataiku.dku_custom_variables["linkedin.consumer.secret"]
)

token = oauth.Token(
  key = dataiku.dku_custom_variables["linkedin.oauth.token"],
  secret = dataiku.dku_custom_variables["linkedin.oauth.secret"]
)

client = oauth.Client(consumer, token)

  • Actually fetch your first degree connections from the API:
URL = ''
resp, content = client.request(URL)
results = json.loads(content)

  • Finally, store the results in a Pandas dataframe, then write it to a DSS dataset:
df = pd.DataFrame(results['values'])
fdc = dataiku.Dataset("first_degree_connections")
fdc.write_with_schema(df)
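
The recipe above assumes the response always contains a `values` key. As a hedged sketch of a more defensive parsing step, the hypothetical helper `extract_connections` below returns an empty list when that key is absent (for instance on an error payload). The sample response body is made up for illustration and is not real API output:

```python
import json


def extract_connections(content):
    """Parse a raw API response body and return the list under 'values'."""
    if isinstance(content, bytes):
        content = content.decode("utf-8")
    payload = json.loads(content)
    # An error payload, or a profile with no connections, has no 'values' key
    return payload.get("values", [])


# Made-up response body, for illustration only
sample = b'{"_total": 2, "values": [{"id": "a1", "firstName": "Ada"}, {"id": "b2", "firstName": "Bob"}]}'
rows = extract_connections(sample)
```

The resulting list can be passed to `pd.DataFrame(rows)` exactly as in the recipe.
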

All in one, your recipe may look like this:

Python recipe with code

Click on “Run” and wait for your job to complete (this may take a while, as the recipe calls an external API). Once finished, you’ll have a dataset storing the list of your first degree connections, with 9 columns:

Dataset resulting from Python recipe

Getting relationships between your connections

Once your first degree connections are fetched, the most difficult part is getting the actual relationships between them. The LinkedIn API provides a somewhat hackish way to get this data.

Go back to your Flow screen and create another Python recipe, this time taking the “first_degree_connections” dataset as input and outputting a new dataset that we’ll call “related_connections”.

  • Load the required libraries, the DSS dataset into a Pandas dataframe, and create the OAuth client:
# -*- coding: utf-8 -*-
import json
import dataiku
import pandas as pd
import oauth2 as oauth

# Input datasets
df = dataiku.Dataset("first_degree_connections").get_dataframe()

# Oauth client
consumer = oauth.Consumer(
  key = dataiku.dku_custom_variables["linkedin.consumer.key"],
  secret = dataiku.dku_custom_variables["linkedin.consumer.secret"]
)

token = oauth.Token(
  key = dataiku.dku_custom_variables["linkedin.oauth.token"],
  secret = dataiku.dku_custom_variables["linkedin.oauth.secret"]
)

client = oauth.Client(consumer, token)

  • You can now define the set of functions to get the data
def make_url(user_id, start):
  # The actual API call to get relationships between your connections, paginated
  a = "" % user_id
  b = ":(relation-to-viewer:(related-connections))"
  c = "?format=json&count=20&start=%s" % start
  url = a + b + c
  return url

def get_count(user_id):
  # Get the total number of related connections for a given user id
  url = make_url(user_id, 0)
  resp, content = client.request(url)
  rels = json.loads(content)
  # Fall back to 0 when the response has no related connections,
  # instead of failing on a missing key
  related = rels.get('relationToViewer', {}).get('relatedConnections', {})
  return related.get('_total', 0)

def get_data(from_url):
  # Retrieves the list of related connections for a user id
  resp, content = client.request(from_url)
  rels = json.loads(content)
  res = pd.DataFrame(rels['relationToViewer']['relatedConnections']['values'])
  return res

def get_user_data(user_id):
  # Looping through the pages to get all the results
  results = pd.DataFrame()
  resultCount = get_count(user_id)
  offset = 0
  count = 20
  while offset < resultCount:
    url = make_url(user_id, offset)
    data = get_data(url)
    data['from_id'] = user_id
    results = pd.concat((results, data), axis=0)
    offset += count
  return results
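
The pagination in `get_user_data` boils down to generating the successive `start` values until the total is covered. A minimal sketch of that logic on its own, with no network calls (the generator name `page_offsets` is hypothetical, and the default page size of 20 mirrors the `count=20` used above):

```python
def page_offsets(total, page_size=20):
    """Yield the `start` value of each page needed to cover `total` results."""
    offset = 0
    while offset < total:
        yield offset
        offset += page_size


# 45 results with pages of 20 need three requests
offsets = list(page_offsets(45))
```

Isolating this piece makes it easy to check the edge cases (zero results, or a total that is an exact multiple of the page size) without calling the API.
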

  • Loop through your connections to get their related connections
m = 0
connections = pd.DataFrame()
for c, user_id in enumerate(df['id']):
  # Just a bit of progress tracking
  if c % 10 == 0:
    m = m + 1
    print("[+] Done %i..." % (10 * m))
  try:
    # Use a separate name so the input dataframe `df` is not overwritten
    user_df = get_user_data(user_id)
    connections = pd.concat((connections, user_df), axis=0)
  except Exception:
    print("No data for %s" % user_id)
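
Since some profiles return no usable data, it can be useful to keep track of which ones failed rather than only printing them. A hedged sketch of that idea follows; `collect_related` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for `get_user_data` by returning plain dictionaries so that the snippet runs without API access:

```python
def collect_related(user_ids, fetch):
    """Call fetch(user_id) for each id; return (rows, failed_ids)."""
    rows, failed = [], []
    for user_id in user_ids:
        try:
            rows.extend(fetch(user_id))
        except Exception:
            # Keep going: one failing profile should not abort the whole run
            failed.append(user_id)
    return rows, failed


def fake_fetch(user_id):
    # Hypothetical stand-in for get_user_data, returning plain dicts
    if user_id == "bad":
        raise ValueError("no data")
    return [{"from_id": user_id, "id": user_id + "-friend"}]


rows, failed = collect_related(["a", "bad", "b"], fake_fetch)
```

The list of failed ids can then be retried later, or simply logged at the end of the run.
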

  • And store the output in a DSS dataset:
related_connections = dataiku.Dataset("related_connections")
related_connections.write_with_schema(connections)

You’ll need to wait for a fair amount of time while the recipe runs, especially if you have a large number of connections. Hopefully, you will end up with the edges of your graph:

Dataset representing the edges of your connections graph

We are interested in the “from_id” - “to_id” pairs. These are the actual relationships between your connections, the ones we’ll use to build the graph.

Building the Graph dataset

It is now pretty straightforward to build the final dataset supporting our graph visualization application. It is basically the set of relationships (edges) between you and your connections, or among your connections (the nodes, each represented by both an ID and a label).

  • Start by creating a visual data preparation recipe on your “first_degree_connections” dataset that will create the graph of your direct relationships (simply done by creating an edge between you and each of your first degree connections):

Prepare recipe to create graph of direct relationships

  • Create a similar data structure with your “related_connections” dataset:

Prepare recipe to create graph of second degree relationships

  • Finally, concatenate these 2 datasets using a Stack recipe:

Stack recipe; concatenating first and second degree relationships

Your final workflow might look like this:

Flow to build the connections graph dataset

Build the “graph” dataset. You now have the complete list of nodes and edges of your LinkedIn graph, ready to be visualized.
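
For readers who prefer code to visual recipes, here is a rough Python equivalent of the three steps above. It is only a sketch under stated assumptions: the output columns `from_id`, `from_name`, `to_id` and `to_name` are the ones the webapp expects, while the input fields `id`, `firstName` and `lastName`, the helper names, and the sample data are assumptions made for illustration.

```python
def full_name(person):
    """Build a display label from the (assumed) firstName/lastName fields."""
    return ("%s %s" % (person.get("firstName", ""), person.get("lastName", ""))).strip()


def direct_edges(me, connections):
    """One edge from you to each first degree connection."""
    return [
        {"from_id": me["id"], "from_name": full_name(me),
         "to_id": c["id"], "to_name": full_name(c)}
        for c in connections
    ]


def related_edges(connections, related):
    """One edge per (connection, related connection) pair."""
    names = {c["id"]: full_name(c) for c in connections}
    return [
        {"from_id": r["from_id"], "from_name": names.get(r["from_id"], ""),
         "to_id": r["id"], "to_name": full_name(r)}
        for r in related
    ]


# Made-up sample data
me = {"id": "me", "firstName": "You", "lastName": ""}
connections = [{"id": "a", "firstName": "Ada", "lastName": "L"},
               {"id": "b", "firstName": "Bob", "lastName": "K"}]
related = [{"from_id": "a", "id": "b", "firstName": "Bob", "lastName": "K"}]

# Equivalent of the Stack recipe: concatenate both edge lists
graph = direct_edges(me, connections) + related_edges(connections, related)
```
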

Visualizing your LinkedIn graph

Although you may want to use a tool like Gephi to create a nice visualization of your graph, you can also create a custom webapp hosted directly in DSS, and hence shareable with other people.

Under Insights, create a new Python-enabled webapp. Also, you’ll want to make d3.js and Bootstrap available to the webapp using the menu from the top right. You can now fill the different components of the webapp:

import json
import dataiku
import pandas as pd
import networkx as nx
from flask import request
from community import best_partition
from networkx.readwrite import json_graph

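# Expose the function to the frontend (`app` is the Flask app provided by the DSS backend)
@app.route('/draw_graph')
def draw_graph():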

  # Get data
  df = dataiku.Dataset('LINKEDIN.graph').get_dataframe()
  print("Size of graph: %s" % (df.shape,))

  # Build graph  
  G = nx.Graph()
  # Select the columns by name rather than by position
  for from_id, to_id in zip(df['from_id'], df['to_id']):
    G.add_edge(from_id, to_id)
  print("Number of nodes: %i" % G.number_of_nodes())
  print("Number of edges: %i" % G.number_of_edges())

  # Set names
  n1 = df[['from_id', 'from_name']]\
        .rename(columns={'from_id': 'node', 'from_name':'name'})

  n2 = df[['to_id', 'to_name']]\
        .rename(columns={'to_id': 'node', 'to_name':'name'})

  nodes = pd.concat((n1, n2), axis=0)
  nodes = nodes.drop_duplicates()
  for node in nodes.itertuples():
    G.node[node[1]]['name'] = node[2]

  # Find clusters
  partition = best_partition(G)
  for node, cluster in partition.items():
    G.node[node]['community'] = cluster

  # Output
  data = json_graph.node_link_data(G)

  return json.dumps({"status" : "ok", "graph": data})
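
To make the shape of the returned payload more concrete, here is a small, dependency-free sketch that builds a node-link structure by hand from a few edge rows. It only illustrates the general layout (a list of nodes and a list of links pointing at them by position) and is not how `json_graph.node_link_data` works internally. The function name `to_node_link` and the sample edges are hypothetical, and the community attribute is left out for brevity:

```python
import json


def to_node_link(edges):
    """Build a node-link dict with index-based links from edge rows."""
    index = {}   # node id -> position in the nodes list
    nodes = []
    links = []
    for edge in edges:
        for node_id, name in ((edge["from_id"], edge["from_name"]),
                              (edge["to_id"], edge["to_name"])):
            if node_id not in index:
                index[node_id] = len(nodes)
                nodes.append({"id": node_id, "name": name})
        links.append({"source": index[edge["from_id"]],
                      "target": index[edge["to_id"]]})
    return {"nodes": nodes, "links": links}


# Made-up edges, for illustration only
edges = [
    {"from_id": "me", "from_name": "You", "to_id": "a", "to_name": "Ada"},
    {"from_id": "a", "from_name": "Ada", "to_id": "b", "to_name": "Bob"},
]
payload = json.dumps({"status": "ok", "graph": to_node_link(edges)})
```
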

  • The HTML just has a container for our graph:
<!-- Body of your app -->
<div class="container">
  <h3>LinkedIn Graph</h3>
  <div id="graph-container"></div>
</div>

  • The CSS is pretty basic
#graph-container {
  width: 900px;
  height: 900px;
}

.link {
  fill: none;
  stroke: #E5E4E2;
  /* stroke-width: 1px; */
}

.node circle {
  /* fill: steelblue; */
  stroke: #fff;
  stroke-width: 1.5px;
}

text {
  font: 10px sans-serif;
  pointer-events: none;
}

  • The most complex part is the Javascript code. We’ll make use of the famous D3.js library to build our graph, interacting with the Python backend:
// You'll need to change the APIKey and insight ID as well below

$("#graph-container").empty();

$.getJSON("/html-apps-backends/sjAeKN2/draw_graph", function(data) {

    var width = 900;
    var height = 900;
    var color = d3.scale.category20();

    var force = d3.layout.force()
                  .size([width, height]);

    var svg = d3.select("#graph-container").select("svg");

    if (svg.empty()) {
      svg = d3.select("#graph-container").append("svg")
              .attr("width", width)
              .attr("height", height);
    }

    // The backend wraps the node-link data in the "graph" key
    force.nodes(data.graph.nodes)
         .links(data.graph.links)
         .start();

    var link = svg.selectAll(".link")
                  .data(force.links())
                  .enter().append("path")
                  .attr("class", "link");

    var node = svg.selectAll(".node")
                  .data(force.nodes())
                  .enter().append("g")
                  .attr("class", "node")
                  .on("mouseover", mouseover)
                  .on("mouseout", mouseout);

    // One circle per node, colored by community
    node.append("circle")
        .style("fill", function(d) { return color(parseInt(d.community)); })
        .attr("r", function(d) { return 4; });

    // Empty label, filled in on mouseover
    node.append("text")
        .attr("x", 12)
        .attr("dy", ".35em");

    force.on("tick", function() {

      // Draw each link as an arc between its source and target
      link.attr("d", function(d) {
        var dx = d.target.x - d.source.x,
            dy = d.target.y - d.source.y,
            dr = Math.sqrt(dx * dx + dy * dy);
        return "M" + d.source.x + "," + d.source.y + "A" + dr + "," + dr + " 0 0,1 " + d.target.x + "," + d.target.y;
      });

      node.attr("transform", function(d) {
        return "translate(" + d.x + "," + d.y + ")";
      });
    });

    function mouseover() {
      d3.select(this).select("text")
                     .attr("x", 12)
                     .attr("dy", ".35em")
                     .style({'font-size': '16px'})
                     .text(function(d) { return d.name; });
    }

    function mouseout() {
      d3.select(this).select("text")
        .text(function(d) { return ""; });
    }
});

That’s it. Save your Insights, publish to the Dashboard, and you can now see your LinkedIn graph in a webapp running in your browser, and hosted on DSS:

LinkedIn connections graph


That’s a wrap! Building a visualization of your LinkedIn graph is not easy, but with this tutorial you should have the keys to reproduce it in your DSS instance.

The webapp is pretty basic: you’ll probably want to make it nicer and fine-tune the settings. Using the node color (reflecting communities) and the overall graph layout (force), you should be able to pinpoint some of the main clusters of your relationships, just like I did:

LinkedIn connections graph with clusters labeled

Hope you enjoyed this tutorial. Feel free to get in touch with us if you have questions or comments, or if you want to understand how DSS can be used to build other custom webapps.