Getting Started with Salesforce Data Analysis in Python
I work at a nonprofit that uses Salesforce heavily. When I first started, I would use the Salesforce site to create reports and create dashboards to analyze the data. However, analyzing KPIs and looking at relationships between objects gets harder the more complex your data analysis needs are. And Python is just more fun 😉.
To get started, you will first need to create a Salesforce security token. Head over to Salesforce, click your profile picture, go to Settings, and under My Personal Information, click Reset My Security Token. You will get an email with your security token.
Now, onto the Python side. I will use the simple-salesforce Python package to access Salesforce. It is a simple Salesforce REST API client that returns dicts of API responses. As of the date this article is written, the version hasn’t been updated on pypi, so we will get the latest from GitHub.
pip install https://github.com/simple-salesforce/simple-salesforce/archive/master.zip
This will install the latest version of simple-salesforce from GitHub. Once that is done, create your Python file and authenticate using your Salesforce.com credentials and the security token that was emailed to you.
from simple_salesforce import Salesforcesf = Salesforce(firstname.lastname@example.org', password='$ecret', security_token='SfheSnvrWjd2SnFHE')
Pro Tip: Put the authentication code in a separate file to allow for easier and more secure sharing.
Now we can use the API to pull data from Salesforce. To do this we need to use Salesforce Object Query Language (SOQL). Here is an example query from Salesforce’s website:
SELECT Id, Name
WHERE Name = ‘Sandy’
To make a query using the
simple_salesforce package, use the
query_all method of the
Salesforce object. This gets all records without the need to do follow up queries for pagination.
from sf import sf # assuming you put the authentication code in a
# separate file called sf.pydata = sf.query_all(SOQL_HERE) # sf is the object we created earlier
This will return the JSON from the API request which needs to be cleaned up. The parent key is called
records, and has a key
totalSize which shows the total number of results. Every record has an unnecessary key called attributes, which contains the Object type of the result and the REST URL for the lookup.
If you haven’t already, install pandas through pip:
pip install pandas
Now we can use pandas to take the result of the
query_all call, which is a dict, and put it in a DataFrame. We want to get the the value of the
records field and take out the attributes column.
import pandas as pddf_data = pd.DataFrame(data['records']).drop(column='attributes')
Now you can use pandas to do some data analysis on the DataFrame.