Getting Started with Salesforce Data Analysis in Python

I work at a nonprofit that uses Salesforce heavily. When I first started, I would use the Salesforce site to create reports and create dashboards to analyze the data. However, analyzing KPIs and looking at relationships between objects gets harder the more complex your data analysis needs are. And Python is just more fun 😉.

To get started, you will first need to create a Salesforce security token. Head over to Salesforce, click your profile picture, go to Settings, and under My Personal Information, click Reset My Security Token. You will get an email with your security token.

Now, onto the Python side. I will use the simple-salesforce Python package to access Salesforce. It is a simple Salesforce REST API client that returns dicts of API responses. As of the date this article is written, the version hasn’t been updated on pypi, so we will get the latest from GitHub.

This will install the latest version of simple-salesforce from GitHub. Once that is done, create your Python file and authenticate using your Salesforce.com credentials and the security token that was emailed to you.

Pro Tip: Put the authentication code in a separate file to allow for easier and more secure sharing.

Pulling Data

Now we can use the API to pull data from Salesforce. To do this we need to use Salesforce Object Query Language (SOQL). Here is an example query from Salesforce’s website:

To make a query using the simple_salesforce package, use the query_all method of the Salesforce object. This gets all records without the need to do follow up queries for pagination.

This will return the JSON from the API request which needs to be cleaned up. The parent key is called records, and has a key totalSize which shows the total number of results. Every record has an unnecessary key called attributes, which contains the Object type of the result and the REST URL for the lookup.

Using Pandas

If you haven’t already, install pandas through pip:

Now we can use pandas to take the result of the query_all call, which is a dict, and put it in a DataFrame. We want to get the the value of the records field and take out the attributes column.

Now you can use pandas to do some data analysis on the DataFrame.

Full code:

Next Up:

Opportunities Analysis Example

more…

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store