Carmel Eve

Show & Tell

In this video, we provide a brief introduction to Streamlit, an open-source Python library designed for quickly creating data visualizations with minimal code.

We discuss its ability to facilitate interactive data exploration, making it a powerful tool for data science and machine learning projects. The video covers setting up a Streamlit project, importing necessary packages, and building an application using sample world cities data.

Additionally, we demonstrate how to create and edit visualizations within the app, showcasing Streamlit’s dynamic updating feature for real-time data manipulation. Join us to learn how Streamlit can enhance your data exploration capabilities.

This is the second part in our series on Streamlit Development; to follow along you may need to watch part one: Simplify your Streamlit Python Development Experience with Dev Containers.

  • 00:00 Introduction to Streamlit
  • 00:13 Project Overview and Goals
  • 00:39 Understanding Streamlit
  • 01:07 Setting Up the Development Environment
  • 01:49 Building the Streamlit Application
  • 04:15 Creating Visualizations
  • 04:41 Interactive Data Editing
  • 05:58 Conclusion and Next Steps

Transcript

Hello and welcome. In this video, I'll give a brief introduction to Streamlit and how it can be used to surface insight and allow data exploration. As part of a recent project, we wanted to give a client the ability to interact with their data and explore what-if scenarios. Instead of just surfacing insights in the form of a report, we wanted to give them the ability to actually interact with their data and perform their own data science experiments.

Many organizations don't understand the power in their data, and they need to be able to spend time experimenting and exploring in order to really understand what they have. That's where Streamlit comes in. Streamlit is an open-source Python library which is used to quickly surface graphs and other data visuals, whilst writing minimal code.

It is based on the principles of interactivity, allowing users to carry out exploratory analysis and supporting data science and machine learning workloads. How does it all work? One of the huge advantages of Streamlit is how easy it is to get started. So, let's reopen our VS Code project from the last video, which is running inside of a local dev container.

In this container, we've installed Poetry and the pandas and Streamlit packages. If you haven't watched the previous video or you don't want to run inside of a dev container, you'll just need to do a pip or Poetry install of the Streamlit and pandas packages. I have added some open source sample data to the project.

Here we have a CSV that contains the data about the cities of the world, their population, latitude, longitude, and what country they're in. Alongside that, we have a list of all the countries and their respective continents, which we'll use to enrich our data. So, if we switch back over to our Python file, we can start to build our application.

First, we need to import our Streamlit package, then we can start to quickly build up page elements. Having just added a title, let's get the app running. To do this, we open the terminal and use the Streamlit run command.

Streamlit then watches for changes, so anything we add to our Python file will be instantly served on the page.

Something to remember is that for Streamlit to successfully monitor for changes, you need to be running the terminal from inside the folder the app is in. It doesn't work with nested folders. We can now begin to surface our data in the application. We can run any Python code directly in the application, but for cleanliness I have created a separate Python file that cleans and then does the join on the cities data.

Here is the cities helper class, which we can call inside our app. If we call the getCitiesWithContinents method, passing in the relative path to the data, we can see that it will drop unnecessary columns from both files, rename some columns, drop any empty records, merge the two data frames together to produce a single data frame where each city also has a continent, and then return the resulting data frame.
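The helper's actual source isn't shown in this transcript, so the following is only a rough sketch of the steps just described. The class name, the snake_case method name, the CSV file names, and every column name are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


class CitiesHelper:
    """Hypothetical stand-in for the cities helper described in the video."""

    @staticmethod
    def get_cities_with_continents(data_dir: str) -> pd.DataFrame:
        data_path = Path(data_dir)

        # Assumed file names; the real project may use different ones.
        cities = pd.read_csv(data_path / "cities.csv")
        countries = pd.read_csv(data_path / "countries.csv")

        # Drop unnecessary columns by keeping only the ones we need
        # (column names here are assumptions).
        cities = cities[["city", "country", "population", "lat", "lng"]]
        countries = countries[["name", "continent"]]

        # Rename columns so both frames share a "country" join key.
        cities = cities.rename(columns={"lat": "latitude", "lng": "longitude"})
        countries = countries.rename(columns={"name": "country"})

        # Drop any records with missing values.
        cities = cities.dropna()
        countries = countries.dropna()

        # Inner join: every remaining city gains its country's continent.
        return cities.merge(countries, on="country", how="inner")
```

An inner join is used here so that cities without a matching country are left out; a left join would keep them with an empty continent instead.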

We can then call this function within our application, and create a Streamlit DataFrame object, setting the width to the page width, and we can display the data in the app.

Now, let's create a simple visualization using the data. Let's go with the bar chart, grouping the number of cities in each continent. And you can see how quickly we are able to start exploring our data.

Now the final thing I want to show you is what makes this so exciting for data exploration and data science scenarios. If, instead of a Streamlit DataFrame object, we create a DataEditor object and use the return value of that to populate our bar chart, we can now edit the DataFrame within the app. So we can currently see that there are 415 cities in Oceania.

If we update Tokyo to say that it is in fact in Oceania, we can see that the bar chart is now updated to count 416 cities in the continent. How this works is that each time a change is made on the application (this can be anything: pressing a button, updating a data value, or adjusting the filters), the entire Python script is rerun.

This allows you to dynamically update things, and see the changes propagate throughout the rest of the application.

This does have implications for how we retrieve data, and means that you need a way to store state so that things aren't reset each time the script reruns, but I'll go into that in much more detail in my next video. Overall, we can see that Streamlit is a powerful tool that can be leveraged for data exploration, allowing you to tweak and edit the data as you go.

This allows you to answer what-if questions, some of them likely more realistic than moving Japan's biggest city halfway around the world.

Thanks for watching. Look out for my next video in the series coming soon.