A few years ago, LinkedIn released a nice product called “InMap” that let users visualize their connections graph. The feature has since been removed but,
as it was great, we are going to try to reproduce it (or at least a simplified version) in DSS.
Before you start
Make sure you have a working Internet connection, since we are going to get data from an external API.
In your DSS instance, create a new project, called “LinkedIn” for instance.
Getting your LinkedIn credentials
As we are going to use the LinkedIn REST API, you’ll first need to get proper credentials to be able to make calls against it.
The LinkedIn API requires OAuth 2 authentication, so you’ll need to follow the instructions on this LinkedIn page to get an access token.
Once you have retrieved your credentials, you may want to store them as custom variables in DSS, since you’ll use them several times throughout the project. Go to the Administration panel from the top bar,
and under Settings, click on Variables.
Define your variables in the proper JSON format (please replace with your own values):
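As an illustration of the expected shape, here is a minimal Python sketch that checks a variables definition is valid JSON before you paste it into DSS. The key name `linkedin_access_token` and the placeholder value are assumptions made for this example, not names required by DSS or LinkedIn.

```python
import json

# Hypothetical variables definition: the key name and the placeholder value
# are illustrative only; replace them with your own.
variables_text = """
{
  "linkedin_access_token": "YOUR_ACCESS_TOKEN"
}
"""

# json.loads raises a ValueError if the text is not valid JSON.
variables = json.loads(variables_text)
print(sorted(variables))
```

If the text contains a trailing comma or unquoted keys, the parse fails, which is a quick way to catch formatting mistakes early.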
Getting your first degree connections
Let’s start by fetching your first-degree connections. On the Flow screen of your LinkedIn project, create a new Python recipe:
This recipe takes no input but outputs a dataset called “first_degree_connections”:
Hit the “Create” button and you are taken to the recipe code editor.
Start with importing the required libraries:
Create an OAuth client to query the LinkedIn API (note that we use the DSS custom variables created above):
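One common way to authenticate calls with an OAuth 2 access token is to send it as a bearer token in the `Authorization` header. The sketch below uses only the standard library and builds the request without sending it. The helper name `build_authenticated_request`, the variable name `linkedin_access_token` and the URL are all assumptions made for illustration, and a dedicated OAuth library may be a better fit in a real recipe.

```python
from urllib.request import Request


def build_authenticated_request(url, access_token):
    """Return a GET request carrying the OAuth 2 token as a bearer token."""
    return Request(url, headers={"Authorization": "Bearer " + access_token})


# Inside a DSS recipe, the token could be read from the custom variables, e.g.:
#   import dataiku
#   access_token = dataiku.get_custom_variables()["linkedin_access_token"]
access_token = "YOUR_ACCESS_TOKEN"  # placeholder value

# Placeholder URL: substitute the LinkedIn endpoint you are allowed to call.
request = build_authenticated_request(
    "https://api.example.com/connections", access_token
)
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would actually perform the call; that step is left out here so the sketch runs without network access.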
Actually fetch your first degree connections from the API:
Finally, store the results in a Pandas dataframe then in a DSS dataset:
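As a rough sketch of this step, a list of records returned by an API can be turned into a Pandas dataframe directly. The field names and sample values below are hypothetical and do not reflect the actual LinkedIn response; the write to the DSS dataset is shown as a comment because it only works inside DSS.

```python
import pandas as pd

# Hypothetical API results: field names and values are illustrative only.
connections = [
    {"id": "a1", "firstName": "Ada", "lastName": "Lovelace"},
    {"id": "b2", "firstName": "Alan", "lastName": "Turing"},
]

# One row per connection, one column per field.
df = pd.DataFrame(connections)
print(df.shape)

# Inside a DSS recipe, the dataframe could then be written to the output dataset:
#   import dataiku
#   dataiku.Dataset("first_degree_connections").write_with_schema(df)
```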
Put together, your recipe may look like this:
Click on “Run” and wait for your job to complete (note that this may take a while, as the recipe makes calls to an external API). Once it has finished, you’ll have a dataset storing the list
of your first-degree connections, with 9 columns:
Getting relationships between your connections
Once your first-degree connections have been fetched, the most difficult part is getting the actual relationships between them. The LinkedIn API provides us with a
somewhat hackish way to get this data.
Go back to your Flow screen and create another Python recipe, this time taking the “first_degree_connections” dataset as input and outputting a new dataset that
we’ll call “related_connections”.
Load the required libraries, the DSS dataset into a Pandas dataframe, and create the OAuth client:
You can now define the set of functions to get the data:
Loop through your connections to get their related connections:
And store the output in a DSS dataset:
You’ll need to wait for a fair amount of time while the recipe runs, especially if you have a large number of connections. Hopefully, you will end up with the edges of your graph:
We are interested in the “from_id” - “to_id” pairs. These are the actual relationships between your connections, the ones we’ll use to build the graph.
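The looping logic that produces these pairs can be sketched independently of the API. In the example below, `build_edges` and `fetch_related` are hypothetical names: `fetch_related` stands in for whatever call returns the related connections of one person, and the filter that keeps only people who are themselves first-degree connections is an assumption about what the graph should contain.

```python
def build_edges(connection_ids, fetch_related):
    """Return from_id/to_id pairs linking first-degree connections to each other."""
    known = set(connection_ids)
    edges = []
    for from_id in connection_ids:
        for to_id in fetch_related(from_id):
            # Keep only edges whose target is also a first-degree connection.
            if to_id in known and to_id != from_id:
                edges.append({"from_id": from_id, "to_id": to_id})
    return edges


# Stand-in for the API: maps each connection to the IDs it is related to.
fake_api = {"a": ["b", "z"], "b": ["a", "c"], "c": []}

edges = build_edges(["a", "b", "c"], lambda cid: fake_api.get(cid, []))
print(edges)
```

Passing the fetch function as an argument keeps the loop testable without network access; in the recipe, it would wrap the real API call.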
Building the Graph dataset
It is now pretty straightforward to build the final dataset supporting our graph visualisation application. This is basically the set of relationships (edges) between
you and your connections, or among your connections (the nodes, represented by both an ID and a label).
Start by creating a visual data preparation recipe on your “first_degree_connections” dataset that will create the graph of your direct relationships
(which is simply done by creating an edge between you and each of your first-degree connections):
Create a similar data structure with your “related_connections” dataset:
Finally, concatenate these 2 datasets using a Stack recipe:
Your final workflow might look like this:
Build the “graph” dataset. You now have the complete list of nodes and edges of your LinkedIn graph, ready to be visualized.
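The tutorial builds this dataset with visual recipes, but the same logic can be sketched in Pandas to make the data flow explicit. This is an equivalent illustration, not the recipes themselves: the sample rows, the `name` column and the `ME` identifier for your own node are assumptions.

```python
import pandas as pd

# Hypothetical sample data standing in for the two datasets built earlier.
first_degree = pd.DataFrame({"id": ["a", "b"], "name": ["Ada", "Alan"]})
related = pd.DataFrame({"from_id": ["a"], "to_id": ["b"]})

ME = "me"  # hypothetical ID used for your own node

# One edge between you and each first-degree connection.
direct_edges = pd.DataFrame({"from_id": ME, "to_id": first_degree["id"]})

# Equivalent of the Stack step: append both edge lists into one dataset.
graph = pd.concat(
    [direct_edges, related[["from_id", "to_id"]]], ignore_index=True
)
print(graph)
```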
Visualizing your LinkedIn graph
Although you may want to use a tool like Gephi to create a nice visualization of your graph, you can also create a
custom webapp hosted directly in DSS, and hence shareable with other people.
Under Insights, create a new Python-enabled webapp. Also, you’ll want to make d3.js and Bootstrap available to the webapp using the menu from the top right.
You can now fill the different components of the webapp:
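On the Python side, one useful piece is the conversion of the edge list into a nodes-and-links structure, which is a shape commonly fed to d3 force layouts. The sketch below is an assumption about how such a backend could prepare its data: the function name `to_node_link`, the `label` field and the sample values are hypothetical, and depending on your d3 version the links may need node indices rather than IDs.

```python
import json


def to_node_link(edges, labels):
    """Convert from_id/to_id edges into a {"nodes": [...], "links": [...]} dict."""
    node_ids = []
    for edge in edges:
        for node_id in (edge["from_id"], edge["to_id"]):
            # Preserve first-seen order and avoid duplicate nodes.
            if node_id not in node_ids:
                node_ids.append(node_id)
    # Fall back to the ID when no label is known for a node.
    nodes = [{"id": n, "label": labels.get(n, n)} for n in node_ids]
    links = [{"source": e["from_id"], "target": e["to_id"]} for e in edges]
    return {"nodes": nodes, "links": links}


edges = [{"from_id": "me", "to_id": "a"}, {"from_id": "a", "to_id": "b"}]
payload = to_node_link(edges, {"me": "Me", "a": "Ada"})
print(json.dumps(payload))
```

The resulting dictionary can be serialized with `json.dumps` and returned to the JavaScript part of the webapp.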
That’s it. Save your Insights, publish to the Dashboard, and you can now see your LinkedIn graph in a webapp running in your browser, and hosted on DSS:
That’s a wrap! Building a visualization of our LinkedIn graph is not an easy thing, but with this tutorial you should have the keys to reproduce it
in your DSS instance.
The webapp is pretty basic: you’ll probably need to make it nicer and to fine-tune the settings.
Using the node color (reflecting communities) and the overall graph layout (force), you should be able to pinpoint some of the main clusters of your relationships, just like I did:
Hope you enjoyed this tutorial. Feel free to get in touch with us if you have questions or comments, or if you want to understand how
DSS can be used to build other custom webapps.