Getting Started with Salesforce Data Analysis in Python
I work at a nonprofit that uses Salesforce heavily. When I first started, I would use the Salesforce site to create reports and create dashboards to analyze the data. However, analyzing KPIs and looking at relationships between objects gets harder the more complex your data analysis needs are. And Python is just more fun 😉.
To get started, you will first need to create a Salesforce security token. Head over to Salesforce, click your profile picture, go to Settings, and under My Personal Information, click Reset My Security Token. You will get an email with your security token.
Now, onto the Python side. I will use the simple-salesforce Python package to access Salesforce. It is a simple Salesforce REST API client that returns dicts of API responses. As of the date this article is written, the version hasn’t been updated on pypi, so we will get the latest from GitHub.
pip install https://github.com/simple-salesforce/simple-salesforce/archive/master.zip
This will install the latest version of simple-salesforce from GitHub. Once that is done, create your Python file and authenticate using your Salesforce.com credentials and the security token that was emailed to you.
from simple_salesforce import Salesforcesf = Salesforce(username='you@company.com', password='$ecret', security_token='SfheSnvrWjd2SnFHE')
Pro Tip: Put the authentication code in a separate file to allow for easier and more secure sharing.
Pulling Data
Now we can use the API to pull data from Salesforce. To do this we need to use Salesforce Object Query Language (SOQL). Here is an example query from Salesforce’s website:
SELECT Id, Name
FROM Account
WHERE Name = ‘Sandy’
To make a query using the simple_salesforce
package, use the query_all
method of the Salesforce
object. This gets all records without the need to do follow up queries for pagination.
from sf import sf # assuming you put the authentication code in a
# separate file called sf.pydata = sf.query_all(SOQL_HERE) # sf is the object we created earlier
This will return the JSON from the API request which needs to be cleaned up. The parent key is called records
, and has a key totalSize
which shows the total number of results. Every record has an unnecessary key called attributes, which contains the Object type of the result and the REST URL for the lookup.
Using Pandas
If you haven’t already, install pandas through pip:
pip install pandas
Now we can use pandas to take the result of the query_all
call, which is a dict, and put it in a DataFrame. We want to get the the value of the records
field and take out the attributes column.
import pandas as pddf_data = pd.DataFrame(data['records']).drop(column='attributes')
Now you can use pandas to do some data analysis on the DataFrame.
Full code:
Next Up:
Opportunities Analysis Example
more…