When you think about quickly writing a simple script for automating some boring stuff, python is pretty simple & easy to work with. Updating and processing google sheet content is another such boring or rather tedius task which can be automated & simplified(and made interesting ;)) using a python script.
Here are some use-cases where this kind of automation could greatly help:
- Creating organized inventory systems
- Streamlining repetitive data entry
- Producing detailed reports and interactive dashboards
- For using sheets as an intermediate store for a bigger automation
Luckily, Google provides APIs which can be used to automate google sheets. Let’s see how we can utilize google sheet libraries using python.
Setup
First, create a new project on Google Cloud or select an existing one. If you don’t have an existing project, you can follow the guidelines here.
Now, enable Google Sheets APIs for your project. Go to Google Cloud > Select your project > APIs & Services. For more help, refer this.
Authentication
There are two ways to authenticate for using google APIs:
Using OAuth:
OAuth 2.0 is a protocol that allows users to authorize applications to access their data. To use OAuth 2.0, you will need to create an OAuth 2.0 client ID and client secret. You can then use these credentials to obtain an access token, which you can use to authenticate to the Google Sheets API.
Read more about OAuth 2.0 here and here.
OAuth 2.0 is a user-based authentication protocol. This means that the user must log in to authorize the application to access their data. This type of authentication is well-suited for web applications, but it is less convenient for automated tasks that run in contained environments. For this reason, we will focus on using a service account to authenticate to the Google Sheets API.
Using Service Account:
A service account is a special type of Google account that can be used to authenticate to Google APIs on behalf of an application. Service accounts are perfect for automating tasks, because they do not require user interaction.
To use a service account, you will need to create a service account and obtain a JSON key file. You can then use this key file to authenticate to the Google Sheets API.
Learn more about service accounts here
In the next section, we will discuss how to create a service account and use it to authenticate to the Google Sheets API.
Create a service account
In Google Cloud console, Click the hamburger menu in the top left corner and select IAM & Admin & click Create service account under service account:
We do not need to add any roles to the service account as the access permissions will instead be given from the sheet. Leave the roles section empty as shown below and click on “Done”.
Generate the service account key
Generate a JSON service account key on the Service accounts page under Keys > Add key:
When you create the key, a json file will automatically be downloaded to your system. Save the downloaded JSON service account key file as service-account.json
in the root path of your project or wherever your Python script will exist or be run from.
💡Note: The JSON key file should be kept secret, as it contains the credentials to the service account. You should also add this file in your
.gitignore
file.
Setting up the spreadsheet
As the service account itself does not have any permissions or roles attached to it, we need some way to authenticate and edit the spreadsheets. This access can be handled at the spreadsheet level by giving our service account read and write permission. Let’s do just that now:
- Google assigns an email address to all the service accounts automatically and this will be something like - [email protected]. You can find this on the service accounts page on google cloud.
- Copy the email that you found for the service account we created.
- Now, we need to share our spreadsheet with this email and give it read/write permissions depending on the use-case (Open Spreadsheet > Share). Note: The service account must be given the appropriate permissions on the spreadsheet in order to read and write data.
Set up python environment
It’s a common practice for Python developers to create a project-specific virtual environment. When you use this environment, the packages you install are kept separate from other environments, like the global one, preventing conflicts and complications caused by different package versions.
Let’s now create a virtual environment for our project:
python3 -m venv venv
Now, we will go ahead and activate the virtual environment:
source venv/bin/activate
💡Tip: Don’t forget to add the
venv
directory to your .gitignore. Also, you can verify if your virtual env is active by runningwhich python
and verifying that the binary from yourvenv
directory is used.
Installing required google api libraries
To use the Google Sheets API client library, you will need to install it using pip:
pip install google-auth google-api-python-client
Once you have installed the library, you can import it into your Python code:
from googleapiclient.discovery
from googleapiclient.errors
Writing our python client
Setup
We could write our whole code for interacting with sheets in the main.py
but just to make it reusable, we will write our google sheet api related code in a separate class which will have different methods to support different functionalities like reading & writing data to sheets, styling the sheets, deleting or clearing data. Also, once we have this skeleton, we can keep adding more functions to our class depending on the requirement.
To organise the files better, we will keep our googleSheetLib.py
(this has our class) in a separate utils
directory. Here is how the structure for our google sheet automation project looks like now:
$ tree ./google-sheets-automation -I venv -I .git/ -a
google-sheets-automation
├── .gitignore
├── main.py
├── service-account.json
└── utils
└── googleSheetLib.py
-
.gitignore: This file tells Git which files and directories to ignore when tracking changes to the project.
-
main.py: This is the main Python file for the project. It will contain the code for consuming the google sheets library.
-
service-account.json: This file contains the JSON credentials for a Google service account.
-
utils: This directory contains common libraries that our project needs, right now, we only have the
googleSheetLib
inside.
Create GoogleSheetClient & add authentication
Let’s add our GoogleSheetClient
class and add the constructor(more about it later):
from google.oauth2.service_account import Credentials
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
class GoogleSheetClient:
def __init__(self, spreadsheet_id):
self.spreadsheet_id = spreadsheet_id
self.creds = Credentials.from_service_account_file("service-account.json", scopes=SCOPES)
The above code imports the Credentials
class from the google.oauth2.service_account
module and defines a GoogleSheetClient
class.
init function(Constructor): The constructor gets called whenever a new object is instantiated for the class. As we can see, it takes the new object as its first argument, which is typically referred to as self
. Any additional arguments that are passed to the constructor of the class are also passed to the __init__
function. Here, “spreadsheet ID” is an additional argument and will be used for all operations further.
Then we are creating an attribute creds
which uses Credentials.from_service_account_file()
method. We will need to pass creds
with every API call that we make to google sheets.
Note:
SCOPES =" ['https://www.googleapis.com/auth/spreadsheets']"
this can be use for both “read/write” values in spreadsheet but if you want to Automate read values from spreadsheet you have to use the scope, https://www.googleapis.com/auth/spreadsheets.readonly
Read Spreadsheet values
While making an API call to retrieve the data from a sheet, you must provide both, the spreadsheet ID and the “A1 notation” specifying the range. If you specify the range without the sheet ID (e.g., A1:B2), the request will operate on the first sheet within the spreadsheet.
💡A1 notations: This syntax is employed to specify a cell or a cell range using a string that incorporates the sheet name, along with the starting and ending cell positions denoted by column letters and row numbers. This approach is widely utilised and particularly valuable when referring to an unchanging range of cells. For more help, refer this
Read Range
To read values from a specific range in spreadsheet, we can use the spreadsheets.values.get
function, let’s add a read_values
function to our python library, the final content of our googleSheetLib
will look like this now:
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
class GoogleSheetClient:
def __init__(self, spreadsheet_id):
self.spreadsheet_id = spreadsheet_id
self.creds = Credentials.from_service_account_file("service-account.json", scopes=SCOPES)
def read_values(self, sheet_name, range):
read_range = f"{sheet_name}!{range}"
service = build('sheets', 'v4', credentials=self.creds)
result = service.spreadsheets().values().get(spreadsheetId=self.spreadsheet_id,
range=read_range).execute()
values = result.get('values', [])
if not values:
return None
else:
return values
Now, Let’s understand the above code -
In order to perform any API operation in a spreadsheet, we need to import build()
function and HttpError
class.
build()
is used to build a Python representation of a Google API service.
HttpError
class represents an HTTP error that occurred while making a request to a Google API service.
def read_values(self, sheet_name, range)
This line defines the function read_values
and its two parameters: sheet_name
and range
. The sheet_name
parameter is the name of the spreadsheet page that you want to read the data from. The range
parameter is the range of the cells that you want to read data from.
read_range = f"{sheet_name}!{range}" #A1 notation
This line creates a variable called read_range
and assigns it the value of thesheet_name
and range
parameters of function read_values
.
service = build('sheets', 'v4', credentials=self.creds)
This line creates a service object using the build() function which uses the credentials object that we obtained in the init function above.
result = service.spreadsheets().values().get(spreadsheetId=self.spreadsheet_id, range=read_range).execute()
This line calls the API spreadsheets.values.get
to read the data from the spreadsheet. This method takes two parameters: the spreadsheet ID and the range of cells that you want to read the data from.
Next, we handle the HTTP errors occured, if any.
Write values to the spreadsheet
Now that we have a function to read the spreadsheet values, let’s move forward and add another function to our library for writing values back to the sheet.
In order to add content to a sheet, you must provide the spreadsheet ID, the cell range specified in A1 notation, and the desired data within a suitable request body object. We will be writing data in bulk in a given range instead of writing to one particular cell or row.
Write a range
To write data to a single range, we can use spreadsheets.values.update
function. Our final google sheet client content becomes as follows:
from google.oauth2.service_account import Credentials
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
class GoogleSheetClient:
def __init__(self, spreadsheet_id):
self.spreadsheet_id = spreadsheet_id
self.creds = Credentials.from_service_account_file("service-account.json", scopes=SCOPES)
def read_values(self, sheet_name, range):
read_range = f"{sheet_name}!{range}"
service = build('sheets', 'v4', credentials=self.creds)
result = service.spreadsheets().values().get(spreadsheetId=self.spreadsheet_id,
range=read_range).execute()
values = result.get('values', [])
if not values:
return None
else:
return values
def update_values(self, sheet_name, range, values):
try:
write_range = f"{sheet_name}!{range}"
service = build('sheets', 'v4', credentials=self.creds)
body = {'values' : values, 'majorDimension': 'ROWS'}
result = service.spreadsheets().values().update(spreadsheetId=self.spreadsheet_id, range=write_range, valueInputOption='RAW', body=body).execute()
print(f"{result.get('updatedCells')} cells updated.")
except HttpError as error:
print(f"An error occurred: {error}")
💡RAW: The input is not parsed and is inserted as a string. For instance, if you input “=1+2” into a cell, it will be treated as the text “=1+2,” and not as a formula. Non-string values such as booleans or numbers are always processed as RAW.
Let’s understand the code written above:
def update_values(self, sheet_name, range, values)
This line defines the function update_values
and its two parameters: sheet_name
, range
and values
. The sheet_name
parameter is the name of the spreadsheet sheet that you want to read data from. The range parameter is the range of cells that you want to read data from. The values parameter is the values you need to write in the given range, it is expected to be an array of array. Each array in the array can be considered a row.
write_range = f"{sheet_name}!{range}"
This line creates a variable called write_range
and assigns it the value of the sheet_name
and range
parameters of function update_values
.
service = build('sheets', 'v4', credentials=self.creds)
This line creates a service object using the build() function which uses the credentials object that you obtained in the init function above.
body = {'values' : values, 'majorDimension': 'ROWS'}
The body contains the values that we want to update, the data type of values is expected to be an array of array.
result = service.spreadsheets().values().update(spreadsheetId=self.spreadsheet_id, range=write_range, valueInputOption='RAW', body=body).execute()
This line calls the API spreadsheets.values.update
to write data from the spreadsheet. This will be a POST request and we are passing the request body with values here in this function.
Using the GoogleSheetClient
Now, we are ready to use our GoogleSheetClient
class that we just created above.
Let’s add the required code to our main.py
file now:
def main():
google_sheet = GoogleSheetClient("add-your-spreadsheet-id-here")
values = google_sheet.read_values("Sheet1","A1:J4") #calling read_values function written above.
values_to_update = []
if values:
for row in values:
new_row = [int(cell_value)*4 for cell_value in row]
values_to_update.append(new_row)
google_sheet.update_values("Sheet1", "A1:J4", values_to_update) #calling update_values function written above.
else:
print("No values found in the sheet to be updated!")
if __name__ == '__main__':
main()
Let’s look at the code above in detail:
- Google sheet object: First, we have created an object of our class,
GoogleSheetClient
, here, we need to pass the id of the spreadsheet. If you remember, we had created the init function(constructor) above which expects the spreadsheet id as input. The spreadsheet id can be found in the spreadsheet link. - Now we can call our methods, i.e.
read_values
andupdate_values
using thegoogle_sheet
object. - In this code, we are reading the rows 1-4(colums A to J). Google sheet will give us an array of array in response which we are iterating over in the next steps.
- Essentially, we are reading our data on sheet, multiplying the values by 4, and then creating a new array of array,
values_to_update
. - We then write back the processed values back to the sheet by using the
update_values
function we wrote above.
In today’s blog, we discovered how we can streamline the mundane task of updating Google Sheets using Python. If you found this information valuable, kindly spread the word, and keep an eye out for future engaging technical blogs and guides. Should you require assistance or wish to offer any suggestions, don’t hesitate to contact us at [email protected].