Getting Started with Salesforce Data Analysis in Python

I work at a nonprofit that uses Salesforce heavily. When I first started, I would use the Salesforce site to create reports and create dashboards to analyze the data. However, analyzing KPIs and looking at relationships between objects gets harder the more complex your data analysis needs are. And Python is just more fun 😉.

To get started, you will first need to create a Salesforce security token. Head over to Salesforce, click your profile picture, go to Settings, and under My Personal Information, click Reset My Security Token. You will get an email with your security token.

Now, onto the Python side. I will use the simple-salesforce Python package to access Salesforce. It is a simple Salesforce REST API client that returns dicts of API responses. As of the date this article is written, the version hasn’t been updated on pypi, so we will get the latest from GitHub.

pip install https://github.com/simple-salesforce/simple-salesforce/archive/master.zip

This will install the latest version of simple-salesforce from GitHub. Once that is done, create your Python file and authenticate using your Salesforce.com credentials and the security token that was emailed to you.

from simple_salesforce import Salesforcesf = Salesforce(username='you@company.com', password='$ecret', security_token='SfheSnvrWjd2SnFHE')

Pro Tip: Put the authentication code in a separate file to allow for easier and more secure sharing.

Pulling Data

Now we can use the API to pull data from Salesforce. To do this we need to use Salesforce Object Query Language (SOQL). Here is an example query from Salesforce’s website:

SELECT Id, Name
FROM Account
WHERE Name = ‘Sandy’

To make a query using the simple_salesforce package, use the query_all method of the Salesforce object. This gets all records without the need to do follow up queries for pagination.

from sf import sf # assuming you put the authentication code in a 
# separate file called sf.py
data = sf.query_all(SOQL_HERE) # sf is the object we created earlier

This will return the JSON from the API request which needs to be cleaned up. The parent key is called records, and has a key totalSize which shows the total number of results. Every record has an unnecessary key called attributes, which contains the Object type of the result and the REST URL for the lookup.

Using Pandas

If you haven’t already, install pandas through pip:

pip install pandas

Now we can use pandas to take the result of the query_all call, which is a dict, and put it in a DataFrame. We want to get the the value of the records field and take out the attributes column.

import pandas as pddf_data = pd.DataFrame(data['records']).drop(column='attributes')

Now you can use pandas to do some data analysis on the DataFrame.

Full code:

Next Up:

Opportunities Analysis Example

more…

--

--

--

Direct response copywriter and marketing strategist

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

How to Implement MultiTenancy with Spring Boot and Keycloak

Kubernetes(What is Kubernetes) — Part(1)

Code Challenge Prep: Creating Test Objects Efficiently

Demystifying Git: Stash, Basic workflow in the four areas

How Python Works For Two Variables Having Same ID

Show 0 Value When No Sales in Power BI #TAIK18 (14–41)

Explore CS: October Sessions

Umbrella — ARK delegate proposal

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Ahmed Roach

Ahmed Roach

Direct response copywriter and marketing strategist

More from Medium

Connect Salesforce to Python

Run Salesforce Report and Email Using Python

How to Create Power BI Calendar Table using DAX Functions

Extracting Data from Salesforce with Python