Lesson 39 of 50 By Ravikiran A S

The Complete Guide to Data Visualization in Python

While working with data, it can be difficult to truly understand your data when it’s just in tabular form. To understand what exactly our data conveys, and to better clean it and select suitable models for it, we need to visualize it or represent it in pictorial form. This helps expose patterns, correlations, and trends that cannot be obtained when data is in a table or CSV file.

The process of finding trends and correlations in our data by representing it pictorially is called data visualization. To perform data visualization in Python, we can use various Python data visualization modules such as Matplotlib, Seaborn, and Plotly. In this article, The Complete Guide to Data Visualization in Python, we will discuss how to work with some of these modules and cover line charts, bar graphs, histograms, scatter plots, and heat maps.

What is Data Visualization?

Data visualization is a field in data analysis that deals with visual representation of data. It graphically plots data and is an effective way to communicate inferences from data.

Using data visualization, we can get a visual summary of our data. With pictures, maps, and graphs, the human mind has an easier time processing and understanding any given data. Data visualization plays a significant role in the representation of both small and large data sets, but it is especially useful for large data sets, where it is impossible to see all of the data, let alone process and understand it manually.

Python offers several plotting libraries, such as Matplotlib and Seaborn, along with many other data visualization packages. Each has different features for creating informative, customized, and appealing plots that present data simply and effectively.

Figure 1: Data visualization

Matplotlib and Seaborn are Python libraries that are used for data visualization. They have built-in modules for plotting different graphs. While Matplotlib is used to embed graphs into applications, Seaborn is primarily used for statistical graphs.

But when should we use either of the two? Let's understand this with the help of a comparative analysis. The table below provides a comparison between Python's two well-known visualization packages, Matplotlib and Seaborn.

Table 1: Matplotlib vs Seaborn

A line chart is a graph that represents information as a series of data points connected by straight line segments. Each data point, or marker, is plotted and joined to the next one by a line or curve.

Let's consider the apple yield (tons per hectare) in Kanto. Let's plot a line graph using this data and see how the yield of apples changes over time. We start by importing Matplotlib and Seaborn.

Figure 2: Importing necessary modules

Using Matplotlib

We are using random data points to represent the yield of apples. 

Figure 3: Plotting apple yield

To better understand the graph and its purpose, we can add the x-axis values too.

Figure 4: Axis values

Let's add labels to the axes so that we can show what each axis represents.  

Figure 5: Axis with labels
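A minimal sketch of the steps so far: plotting the yield against explicit x-axis values and labelling both axes. The yield numbers and years are made up for illustration, and the `plt` alias is the conventional one rather than anything this page confirms.

```python
import matplotlib.pyplot as plt

# Illustrative numbers only; they are assumptions, not real yield data.
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

plt.plot(years, yield_apples)            # x values first, then y values
plt.xlabel("Year")                       # label for the x-axis
plt.ylabel("Yield (tons per hectare)")   # label for the y-axis
ax = plt.gca()                           # current Axes, handy for inspection
```

In a script, calling plt.show() afterwards opens the figure window; in a notebook the chart is rendered with the cell output.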

To plot multiple datasets on the same graph, just use the plt.plot function once for each dataset. Let's use this to compare the yields of apples vs. oranges on the same graph.

Figure 6: Plotting multiple graphs

We can add a legend which tells us what each line in our graph means. To understand what we are plotting, we can add a title to our graph.

Figure 7: Plotting multiple graphs

To show each data point on our graph, we can highlight them with markers using the marker argument. Many different marker shapes like a circle, cross, square, diamond, etc. are provided by Matplotlib.

Figure 8: Using markers

You can use the plt.figure function to change the size of the figure.

Figure 9: Changing graph size
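The pieces introduced above (two datasets, a legend, a title, markers, and a custom figure size) can be combined in one short sketch. All values, the title text, and the chosen figure size are assumptions picked for illustration.

```python
import matplotlib.pyplot as plt

years = range(2000, 2006)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]    # made-up values
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908]  # made-up values

plt.figure(figsize=(12, 6))            # width and height in inches
plt.plot(years, apples, marker="o")    # circle markers
plt.plot(years, oranges, marker="x")   # cross markers
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.title("Crop Yields in Kanto")
plt.legend(["Apples", "Oranges"])      # labels follow the order of the plot calls
ax = plt.gca()
```

Note that plt.figure has to be called before the plotting functions, otherwise the lines are drawn on a figure with the default size.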

Using Seaborn

An easy way to make your charts look beautiful is to use some default styles from the Seaborn library. These can be applied globally using the sns.set_style function.

Figure 10: Using Seaborn

We can also use the darkgrid option to change the background color to a darker shade.

Figure 11: Using darkgrid in Seaborn
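As a rough sketch of how the global styles are applied, the snippet below draws the same made-up line twice, once with the "whitegrid" style and once with "darkgrid". The data values are assumptions; `sns` is the conventional Seaborn alias.

```python
import matplotlib.pyplot as plt
import seaborn as sns

years = [2010, 2011, 2012]
yields = [0.895, 0.91, 0.919]          # made-up values

sns.set_style("whitegrid")             # white background with grid lines
plt.figure()
plt.plot(years, yields, marker="o")

sns.set_style("darkgrid")              # grey background with white grid lines
plt.figure()
plt.plot(years, yields, marker="o")
```

Because set_style changes global settings, every figure created after the call picks up the new look until the style is changed again.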

When you have categorical data, you can represent it with a bar graph. A bar graph plots data with the help of bars, which represent value on the y-axis and category on the x-axis. Bar graphs use bars with varying heights to show the data which belongs to a specific category.

Figure 12: Plotting Bar graphs

We can also stack bars on top of each other. Let's plot the data for apples and oranges.

Figure 13: Plotting stacked bar graphs

Let’s use the tips dataset in Seaborn next. The dataset consists of:

  • Information about the sex (gender)
  • Time of day
  • Tips given by customers visiting the restaurant for a week

Figure 14: Tips dataset

We can draw a bar chart to visualize how the average bill amount varies across different days of the week. We can do this by computing the day-wise averages and then using plt.bar. The Seaborn library also provides a barplot function that can automatically compute averages.

Figure 15: Plotting averages of each bar

If you want to compare bar plots side-by-side, you can use the hue argument. The comparison will be done based on the third feature specified in this argument.

Figure 16: Plotting multiple bar graphs

You can make the bars horizontal by switching the axes.

Figure 17: Plotting horizontal bar graphs
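The three Seaborn bar plots described above can be sketched as follows. To keep the example self-contained it uses a tiny hand-made DataFrame instead of the real tips dataset; the column names `day`, `sex`, and `total_bill` are assumed to mirror that dataset, and the numbers are invented.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Tiny stand-in for the tips data; values are invented.
tips_df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "sex": ["Male", "Male", "Female", "Female"] * 2,
    "total_bill": [20.0, 24.0, 16.0, 18.0, 22.0, 26.0, 14.0, 12.0],
})

plt.figure()
ax = sns.barplot(x="day", y="total_bill", data=tips_df)      # bar height = mean per day

plt.figure()
ax_hue = sns.barplot(x="day", y="total_bill", hue="sex", data=tips_df)  # side-by-side bars

plt.figure()
ax_h = sns.barplot(x="total_bill", y="day", data=tips_df)    # swapped axes: horizontal bars
```

The thin lines Seaborn draws on each bar show the uncertainty around the computed average.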

A histogram is a bar representation of how data is distributed over a range of values. The range is split into intervals (bins) along the x-axis, and the height of each bar along the y-axis shows how many data points fall into that bin. Let's use the ‘Iris’ dataset, which contains measurements of flowers, to plot histograms.

Figure 18: Iris dataset

Now, let’s plot a histogram using the hist() function.

Figure 19: Plotting histograms

We can control the number or size of bins too.

Figure 20: Changing number of bins

We can change the number and size of bins using numpy too.

Figure 21: Changing number and size of bins

We can create bins of unequal size too.

Figure 22: Bins of unequal size
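The binning options discussed above can be sketched with plt.hist. The sample below is synthetic (random numbers standing in for a sepal-width column), so the shapes will differ from the real Iris data; the specific bin edges are assumptions as well.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for a sepal-width column (cm); not the real Iris values.
rng = np.random.default_rng(0)
sepal_width = rng.normal(loc=3.0, scale=0.4, size=150).clip(2.0, 4.4)

plt.figure()
plt.hist(sepal_width)                                # default: 10 equal-width bins

plt.figure()
counts5, edges5, _ = plt.hist(sepal_width, bins=5)   # fixed number of bins

plt.figure()
plt.hist(sepal_width, bins=np.arange(2, 5, 0.25))    # fixed bin width via NumPy

plt.figure()
counts_u, edges_u, _ = plt.hist(sepal_width, bins=[1, 3, 4, 4.5])  # unequal bin edges
```

plt.hist returns the bin counts and bin edges, which is useful when the numbers behind the picture are needed.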

Similar to line charts, we can draw multiple histograms in a single chart. We can reduce each histogram's opacity so that one histogram's bars don't hide the others'. Let's draw separate histograms for each species of flowers.

Figure 23: Multiple histograms

Multiple histograms can be stacked on top of one another by setting the stacked parameter to True.

Figure 24: Stacking histograms
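A sketch of both approaches, overlapping translucent histograms and stacked histograms, using synthetic numbers in place of the per-species Iris measurements. The species names are used only as labels; the distributions are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Synthetic widths for three species; not the real Iris measurements.
setosa = rng.normal(3.4, 0.35, 50).clip(2.0, 4.5)
versicolor = rng.normal(2.8, 0.30, 50).clip(2.0, 4.5)
virginica = rng.normal(3.0, 0.30, 50).clip(2.0, 4.5)
bins = np.arange(1.5, 5.0, 0.25)

plt.figure()
plt.hist(setosa, alpha=0.4, bins=bins, label="Setosa")          # alpha < 1 is translucent
plt.hist(versicolor, alpha=0.4, bins=bins, label="Versicolor")
plt.legend()

plt.figure()
counts, _, _ = plt.hist([setosa, versicolor, virginica], bins=bins, stacked=True,
                        label=["Setosa", "Versicolor", "Virginica"])
plt.legend()
```

Sharing the same `bins` across datasets keeps the bars aligned, which makes the histograms directly comparable.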

Scatter plots are used to show the relationship between two variables: each observation is drawn as a point whose x and y coordinates are its values for those variables. The points are scattered across the graph rather than joined together, and a third, categorical variable can be shown by giving each category a different color. Let's use the ‘Iris’ dataset to plot a scatter plot.

Figure 25: Iris Dataset

First, let’s see how many different species of flowers we have.

Figure 26: Unique flower species

Let’s try plotting the data with the help of a line chart.

Figure 27: Plotting line chart

This is not very informative. We cannot figure out the relationship between different data points.

Figure 28: Scatter plot

This is much better. But we still cannot differentiate different data points belonging to different categories. We can color the dots using the flower species as a hue.

Figure 29: Scatter plot with multiple colors

Since Seaborn uses Matplotlib's plotting functions internally, we can use functions like plt.figure and plt.title to modify the figure.

Figure 30: Changing dimensions of scatter plot 
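The scatter plot steps can be sketched with sns.scatterplot. The DataFrame below is a handful of hand-written rows with Iris-style column names (`sepal_length`, `sepal_width`, `species`), which are assumptions about the dataset's layout; the title and point size are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# A few sample rows with Iris-style column names; not the full dataset.
flowers_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

species = flowers_df.species.unique()      # distinct species labels

plt.figure(figsize=(12, 6))                # Matplotlib call sizes the Seaborn plot
plt.title("Sepal Dimensions")
ax = sns.scatterplot(x="sepal_length", y="sepal_width",
                     hue="species", s=100, data=flowers_df)
```

Here `hue` picks the column used for coloring and `s` sets the size of the points.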

Heatmaps are used to see changes in behavior or gradual changes in data. A heatmap uses different colors to represent different values, and the way those colors vary in hue and intensity tells us how the phenomenon varies. Let's use heatmaps to visualize monthly passenger footfall at an airport over 12 years from the flights dataset in Seaborn.

Figure 31: Flights dataset 

The above dataset, flights_df, shows us the monthly footfall in an airport for each year, from 1949 to 1960. The values represent the number of passengers (in thousands) that passed through the airport. Let’s use a heatmap to visualize the above data.

Figure 32: Plotting heatmap

The brighter the color, the higher the footfall at the airport. By looking at the graph, we can infer that:

  • The footfall within any given year is highest around July and August.
  • The footfall grows annually. A given month generally has a higher footfall than the same month in previous years.

Let's display the actual values in our heatmap and change the hue to blue.

Figure 33: Plotting heatmap with values
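A minimal sketch of both heatmaps, assuming the data has already been arranged as a table with months as rows and years as columns. The small table below is illustrative only and much smaller than the flights dataset; the title text is an assumption.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small illustrative table: rows are months, columns are years.
flights_df = pd.DataFrame(
    {1949: [112, 148, 104], 1950: [115, 170, 114], 1951: [145, 199, 146]},
    index=["Jan", "Jul", "Nov"],
)

plt.figure()
plt.title("No. of Passengers (1000s)")
ax = sns.heatmap(flights_df)                       # default color map

plt.figure()
# annot=True writes each value in its cell, fmt="d" formats it as an integer,
# and cmap="Blues" switches to a blue color scale.
ax_annot = sns.heatmap(flights_df, fmt="d", annot=True, cmap="Blues")
```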

In this article, The Complete Guide to Data Visualization in Python, we gave an overview of data visualization in Python and discussed how to create line charts, bar graphs, histograms, scatter plots, and heat maps using data visualization packages offered by Python, such as Matplotlib and Seaborn.

If you need any further clarifications or want to learn more about data visualization in Python and want to understand how to perform data visualization, share your queries with us by mentioning them in this page's comments section. We will have our experts review them at the earliest!

Python offers multiple other visualization packages which can be used to create different types of visualizations and not just graphs and plots. It is, therefore, also important to understand the challenges and advantages of the different libraries and how to use them to their full potential.

About the author

Ravikiran A S

Ravikiran A S works with Simplilearn as a Research Analyst. He is an enthusiastic geek always on the hunt to learn the latest technologies. He is proficient with the Java programming language, Big Data, and powerful Big Data frameworks like Apache Hadoop and Apache Spark.

Python Pandas Tutorial: A Complete Introduction for Beginners

Learn some of the most important pandas features for exploring, cleaning, transforming, visualizing, and learning from data.

You should already know:

  • Python fundamentals – you should have beginner to intermediate-level knowledge, which can be learned from most entry-level Python courses

The pandas package is the most important tool at the disposal of Data Scientists and Analysts working in Python today. The powerful machine learning and glamorous visualization tools may get all the attention, but pandas is the backbone of most data projects.

[pandas] is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. — Wikipedia

If you're thinking about data science as a career, then it is imperative that one of the first things you do is learn pandas. In this post, we will go over the essential bits of information about pandas, including how to install it, its uses, and how it works with other common Python data analysis packages such as matplotlib and scikit-learn.

Article Resources

  • iPython notebook and data available on GitHub

Other articles in this series

  • Applied Introduction to NumPy

What's Pandas for?

Pandas has so many uses that it might make sense to list the things it can't do instead of what it can do.

This tool is essentially your data’s home. Through pandas, you get acquainted with your data by cleaning, transforming, and analyzing it.

For example, say you want to explore a dataset stored in a CSV on your computer. Pandas will extract the data from that CSV into a DataFrame (a table, basically), then let you answer questions and do things like:

  • What's the average, median, max, or min of each column?
  • Does column A correlate with column B?
  • What does the distribution of data in column C look like?
  • Clean the data by doing things like removing missing values and filtering rows or columns by some criteria
  • Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.
  • Store the cleaned, transformed data back into a CSV, other file or database

Before you jump into the modeling or the complex visualizations, you need to have a good understanding of the nature of your dataset, and pandas is the best avenue through which to do that.

How does pandas fit into the data science toolkit?

Not only is the pandas library a central component of the data science toolkit but it is used in conjunction with other libraries in that collection.

Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.

Jupyter Notebooks offer a good environment for using pandas to do data exploration and modeling, but pandas can also be used in text editors just as easily.

Jupyter Notebooks give us the ability to execute code in a particular cell as opposed to running the entire file. This saves a lot of time when working with large datasets and complex transformations. Notebooks also provide an easy way to visualize pandas’ DataFrames and plots. As a matter of fact, this article was created entirely in a Jupyter Notebook.

When should you start using pandas?

If you do not have any experience coding in Python, then you should stay away from learning pandas until you do. You don’t have to be at the level of a software engineer, but you should be adept at the basics, such as lists, tuples, dictionaries, functions, and iterations. I’d also recommend familiarizing yourself with NumPy due to the similarities mentioned above.

If you're looking for a good place to learn Python, Python for Everybody on Coursera is great (and free).

Moreover, for those of you looking to do a data science bootcamp or some other accelerated data science education program, it's highly recommended you start learning pandas on your own before you start the program.

Even though accelerated programs teach you pandas, better skills beforehand means you'll be able to maximize time for learning and mastering the more complicated material.

Pandas First Steps

Install and import

Pandas is an easy package to install. Open up your terminal program (for Mac users) or command line (for PC users) and install it using either of the following commands:

conda install pandas

pip install pandas

Alternatively, if you're currently viewing this article in a Jupyter notebook you can run this cell:

!pip install pandas

The ! at the beginning runs the command as if it were typed in a terminal.

To import pandas we usually give it a shorter name since it's used so much:

import pandas as pd

Now to the basic components of pandas.

Core components of pandas: Series and DataFrames

The two primary components of pandas are the Series and the DataFrame.

A Series is essentially a column, and a DataFrame is a multi-dimensional table made up of a collection of Series.

DataFrames and Series are quite similar in that many operations that you can do with one you can do with the other, such as filling in null values and calculating the mean.

You'll see how these components work when we start working with data below.

Creating DataFrames from scratch

Creating DataFrames right in Python is good to know and quite useful when testing new methods and functions you find in the pandas docs.

There are many ways to create a DataFrame from scratch, but a great option is to just use a simple dict.

Let's say we have a fruit stand that sells apples and oranges. We want to have a column for each fruit and a row for each customer purchase. To organize this as a dictionary for pandas we could do something like:

And then pass it to the pandas DataFrame constructor:
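A minimal sketch of both steps. The purchase quantities are invented for illustration, and `purchases` is just a variable name chosen here.

```python
import pandas as pd

# Quantities are invented for illustration; each list becomes one column.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

purchases = pd.DataFrame(data)   # dict keys become the column names
print(purchases)
```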

How did that work?

Each (key, value) item in data corresponds to a column in the resulting DataFrame.

The Index of this DataFrame was given to us on creation as the numbers 0-3, but we could also create our own when we initialize the DataFrame.

Let's have customer names as our index:

So now we could locate a customer's order by using their name with .loc:
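As a sketch, the index is supplied through the `index` argument of the constructor and rows are then looked up by label with `.loc`. The customer names and quantities are hypothetical.

```python
import pandas as pd

data = {"apples": [3, 2, 0, 1], "oranges": [0, 3, 7, 2]}   # invented quantities

# Hypothetical customer names used as row labels.
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

june_order = purchases.loc["June"]   # a Series holding June's row
print(june_order)
```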

There's more on locating and extracting data from the DataFrame later, but now you should be able to create a DataFrame with any random data to learn on.

Let's move on to some quick methods for creating DataFrames from various other sources.

How to read in data

It’s quite simple to load data from various file formats into a DataFrame. In the following examples we'll keep using our apples and oranges data, but this time it's coming from various files.

Reading data from CSVs

With CSV files all you need is a single line to load in the data:

CSVs don't have indexes like our DataFrames, so all we need to do is just designate the index_col when reading:

Here we're setting the index to be column zero.
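The difference `index_col` makes can be sketched without a file on disk by reading the CSV text from an in-memory buffer. The CSV contents below are an assumed stand-in for the purchases file; with a real file, a path string would be passed to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for a purchases CSV; the first (unnamed) column holds customer names.
csv_text = """,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

df_default = pd.read_csv(io.StringIO(csv_text))               # names land in a regular column
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)  # column zero becomes the index
```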

You'll find that most CSVs won't ever have an index column and so usually you don't have to worry about this step.

Reading data from JSON

If you have a JSON file — which is essentially a stored Python dict — pandas can read this just as easily:

Notice this time our index came with us correctly since using JSON allowed indexes to work through nesting. Feel free to open data_file.json in a notepad so you can see how it works.

Pandas will try to figure out how to create a DataFrame by analyzing the structure of your JSON, and sometimes it doesn't get it right. Often you'll need to set the orient keyword argument depending on the structure, so check out the read_json docs about that argument to see which orientation you're using.
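To illustrate how `orient` relates to the JSON layout, the sketch below reads the same small dataset stored in two different shapes. The JSON strings are assumptions made up for this example and are read from in-memory buffers rather than from a file.

```python
import io
import pandas as pd

# Default layout (orient="columns"): outer keys are columns, inner keys are row labels.
by_column = '{"apples": {"Robert": 2, "Lily": 0}, "oranges": {"Robert": 3, "Lily": 7}}'
df_cols = pd.read_json(io.StringIO(by_column))

# The same data stored row-first needs orient="index".
by_row = '{"Robert": {"apples": 2, "oranges": 3}, "Lily": {"apples": 0, "oranges": 7}}'
df_rows = pd.read_json(io.StringIO(by_row), orient="index")
```

Both calls produce a DataFrame with customers as the index and fruits as the columns.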

Reading data from a SQL database

If you’re working with data from a SQL database you need to first establish a connection using an appropriate Python library, then pass a query to pandas. Here we'll use SQLite to demonstrate.

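The sqlite3 module ships with Python's standard library, so in most environments nothing needs to be installed. The separate pysqlite3 package is an optional alternative; to install it, run this command in your terminal:

pip install pysqlite3

Or run this cell if you're in a notebook:

!pip install pysqlite3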

sqlite3 is used to create a connection to a database which we can then use to generate a DataFrame through a SELECT query.

So first we'll make a connection to a SQLite database file:

If you have data in PostgreSQL, MySQL, or some other SQL server, you'll need to obtain the right Python library to make a connection. For example, psycopg2 is a commonly used library for making connections to PostgreSQL. Furthermore, you would make a connection to a database URI instead of a file like we did here with SQLite.

For a great course on SQL check out The Complete SQL Bootcamp on Udemy

In this SQLite database we have a table called purchases, and our index is in a column called "index".

By passing a SELECT query and our con, we can read from the purchases table:

Just like with CSVs, we could pass index_col='index', but we can also set an index after the fact:

In fact, we could use set_index() on any DataFrame using any column at any time. Indexing Series and DataFrames is a very common task, and the different ways of doing it are worth remembering.
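The whole SQL round trip can be sketched as below. To stay self-contained it connects to an in-memory SQLite database rather than a file, and seeds a small purchases table first; the seeded values are invented.

```python
import sqlite3
import pandas as pd

# An in-memory database stands in for a SQLite file on disk.
con = sqlite3.connect(":memory:")

# Seed a purchases table; the unnamed DataFrame index is stored in a column called "index".
seed = pd.DataFrame({"apples": [3, 2], "oranges": [0, 3]}, index=["June", "Robert"])
seed.to_sql("purchases", con)

df = pd.read_sql_query("SELECT * FROM purchases", con)  # "index" comes back as a normal column
df = df.set_index("index")                               # promote it to the DataFrame index
con.close()
```

With a database file, the only change would be passing its path to sqlite3.connect.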

Converting back to a CSV, JSON, or SQL

So after extensive work on cleaning your data, you’re now ready to save it as a file of your choice. Similar to the ways we read in data, pandas provides intuitive commands to save it:

When we save JSON and CSV files, all we have to input into those functions is our desired filename with the appropriate file extension. With SQL, we’re not creating a new file but instead inserting a new table into the database using our con variable from before.
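A short sketch of the three save paths. The file names, table name, and scratch directory are hypothetical choices for this example.

```python
import sqlite3
import tempfile
from pathlib import Path
import pandas as pd

df = pd.DataFrame({"apples": [3, 2], "oranges": [0, 3]}, index=["June", "Robert"])

out_dir = Path(tempfile.mkdtemp())           # scratch folder so nothing is overwritten
df.to_csv(out_dir / "new_purchases.csv")     # file name with the matching extension
df.to_json(out_dir / "new_purchases.json")

con = sqlite3.connect(out_dir / "database.db")
df.to_sql("new_purchases", con)              # adds a table to the database
tables = con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
con.close()
```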

Let's move on to importing some real-world data and detailing a few of the operations you'll be using a lot.

Most important DataFrame operations

DataFrames possess hundreds of methods and other operations that are crucial to any analysis. As a beginner, you should know the operations that perform simple transformations of your data and those that provide fundamental statistical analysis.

Let's load in the IMDB movies dataset to begin:

We're loading this dataset from a CSV and designating the movie titles to be our index.

Viewing your data

The first thing to do when opening a new dataset is print out a few rows to keep as a visual reference. We accomplish this with .head():

.head() outputs the first five rows of your DataFrame by default, but we could also pass a number: movies_df.head(10) would output the top ten rows, for example.

To see the last five rows use .tail(). It also accepts a number, and in this case we are printing the bottom two rows:

Typically when we load in a dataset, we like to view the first five or so rows to see what's under the hood. Here we can see the names of each column, the index, and examples of values in each row.

You'll notice that the index in our DataFrame is the Title column, which you can tell by how the word Title is slightly lower than the rest of the columns.
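A sketch of these calls on a toy DataFrame. The titles and ratings are placeholders rather than the real movies data, which is assumed to be loaded from a CSV.

```python
import pandas as pd

# Toy stand-in for the movies data; titles and ratings are placeholders.
movies_df = pd.DataFrame(
    {"Title": [f"Movie {i}" for i in range(1, 9)],
     "Rating": [8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 6.4]}
).set_index("Title")

first_five = movies_df.head()     # first five rows by default
first_three = movies_df.head(3)   # pass a number for a different count
last_two = movies_df.tail(2)      # bottom two rows
```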

Getting info about your data

.info() should be one of the very first commands you run after loading your data:

.info() provides the essential details about your dataset, such as the number of rows and columns, the number of non-null values, what type of data is in each column, and how much memory your DataFrame is using.

Notice in our movies dataset we have some obvious missing values in the Revenue and Metascore columns. We'll look at how to handle those in a bit.

Seeing the datatype quickly is actually quite useful. Imagine you just imported some JSON and the integers were recorded as strings. You go to do some arithmetic and hit an "unsupported operand" exception because you can't do math with strings. Calling .info() will quickly point out that the column you thought was all integers actually holds string objects.

Another fast and useful attribute is .shape, which outputs just a tuple of (rows, columns):

Note that .shape has no parentheses and is a simple tuple of format (rows, columns). So we have 1000 rows and 11 columns in our movies DataFrame.

You'll be going to .shape a lot when cleaning and transforming data. For example, you might filter some rows based on some criteria and then want to know quickly how many rows were removed.
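Both tools can be tried on a toy DataFrame. The frame below is an invented example with one missing value and a numeric column accidentally stored as strings; the filter at the end is one assumed way of checking how many rows a condition removes.

```python
import pandas as pd

# Toy frame: one missing value, and years stored as strings.
df = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Year": ["2014", "2012", "2016"],      # strings, not integers
    "Metascore": [76.0, None, 59.0],
})

df.info()            # prints row count, non-null counts, dtypes, and memory use
print(df.shape)      # an attribute, so no parentheses

filtered = df[df["Metascore"] > 60]                # keep rows scoring above 60
rows_removed = df.shape[0] - filtered.shape[0]     # how many rows the filter dropped
```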

Handling duplicates

This dataset does not have duplicate rows, but it is always important to verify you aren't aggregating duplicate rows.

To demonstrate, let's double up our movies DataFrame by appending it to itself:

Appending returns a copy without affecting the original DataFrame. We are capturing this copy in temp_df so we aren't working with the real data. Note that the DataFrame.append() method was removed in pandas 2.0; in current versions pd.concat() does the same job.

Notice that calling .shape quickly proves our DataFrame rows have doubled.

Now we can try dropping duplicates:

Just like appending, the drop_duplicates() method will also return a copy of your DataFrame, but this time with duplicates removed. Calling .shape confirms we're back to the 1000 rows of our original dataset.

It's a little verbose to keep assigning DataFrames to the same variable like in this example. For this reason, pandas has the inplace keyword argument on many of its methods. Using inplace=True will modify the DataFrame object in place:

Now our temp_df will have the transformed data automatically.

Another important argument for drop_duplicates() is keep, which has three possible options:

  • first : (default) Drop duplicates except for the first occurrence.
  • last : Drop duplicates except for the last occurrence.
  • False : Drop all duplicates.

Since we didn't define the keep argument in the previous example, it defaulted to first. This means that if two rows are the same, pandas will drop the second row and keep the first row. Using last has the opposite effect: the first row is dropped.

keep=False, on the other hand, will drop all duplicates. If two rows are the same then both will be dropped. Watch what happens to temp_df:

Since all rows were duplicates, keep=False dropped them all resulting in zero rows being left over. If you're wondering why you would want to do this, one reason is that it allows you to locate all duplicates in your dataset. When conditional selections are shown below you'll see how to do that.
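The duplicate-handling steps above can be sketched on a three-row toy frame. It uses pd.concat to double the data, since DataFrame.append is no longer available in pandas 2.0 and later; the rows themselves are placeholders.

```python
import pandas as pd

movies_df = pd.DataFrame({"title": ["A", "B", "C"], "rating": [8.1, 7.0, 7.3]})

# pd.concat stacks the frame on top of itself and returns a new DataFrame.
temp_df = pd.concat([movies_df, movies_df])
doubled_shape = temp_df.shape                      # twice as many rows

deduped = temp_df.drop_duplicates()                # keep="first" by default
none_left = temp_df.drop_duplicates(keep=False)    # drops every duplicated row

temp_df.drop_duplicates(inplace=True)              # modifies temp_df itself
```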

Column cleanup

Many times datasets will have verbose column names with symbols, upper and lowercase words, spaces, and typos. To make selecting data by column name easier we can spend a little time cleaning up their names.

Here's how to print the column names of our dataset:

Not only does .columns come in handy if you want to rename columns by allowing for simple copy and paste, it's also useful if you need to understand why you are receiving a KeyError when selecting data by column.

We can use the .rename() method to rename certain or all columns via a dict. We don't want parentheses, so let's rename those:

Excellent. But what if we want to lowercase all names? Instead of using .rename() we could also set a list of names to the columns like so:

But that's too much work. Instead of just renaming each column manually we can do a list comprehension:

list (and dict) comprehensions come in handy a lot when working with pandas and data in general.

It's a good idea to lowercase, remove special characters, and replace spaces with underscores if you'll be working with a dataset for some time.
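A sketch of the cleanup steps on an empty frame that only carries column names. The original names with parentheses are assumptions made for this example; the target name revenue_millions matches the one used later in the text.

```python
import pandas as pd

# Only the column names matter here, so the frame has no rows.
movies_df = pd.DataFrame(columns=["Rank", "Runtime (Minutes)", "Revenue (Millions)"])
print(movies_df.columns)

# Rename only the columns that contain parentheses.
movies_df.rename(columns={
    "Runtime (Minutes)": "Runtime",
    "Revenue (Millions)": "Revenue_millions",
}, inplace=True)

# Lowercase every name in one pass with a list comprehension.
movies_df.columns = [col.lower() for col in movies_df.columns]
```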

How to work with missing values

When exploring data, you’ll most likely encounter missing or null values, which are essentially placeholders for non-existent values. Most commonly you'll see Python's None or NumPy's np.nan, each of which is handled differently in some situations.

There are two options in dealing with nulls:

  • Get rid of rows or columns with nulls
  • Replace nulls with non-null values, a technique known as imputation

Let's calculate the total number of nulls in each column of our dataset. The first step is to check which cells in our DataFrame are null:

Notice isnull() returns a DataFrame where each cell is either True or False depending on that cell's null status.

To count the number of nulls in each column we use an aggregate function for summing:

.isnull() just by itself isn't very useful, and is usually used in conjunction with other methods, like sum().

We can see now that our data has 128 missing values for revenue_millions and 64 missing values for metascore.
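The same two steps on a toy frame, where the counts are small enough to check by eye. The values are invented; only the column names follow the text.

```python
import numpy as np
import pandas as pd

movies_df = pd.DataFrame({
    "rating": [8.1, 7.0, 7.3, 6.2],
    "revenue_millions": [333.13, np.nan, 138.12, np.nan],
    "metascore": [76.0, 65.0, None, 40.0],
})

null_mask = movies_df.isnull()    # DataFrame of True/False with the same shape as the data
null_counts = null_mask.sum()     # True counts as 1, so this totals the nulls per column
print(null_counts)
```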

Removing null values

Data Scientists and Analysts regularly face the dilemma of dropping or imputing null values, a decision that requires intimate knowledge of your data and its context. Overall, removing null data is only suggested if you have a small amount of missing data.

Removing nulls is pretty simple:

This operation will delete any row with at least a single null value, but it will return a new DataFrame without altering the original one. You could specify inplace=True in this method as well.

So in the case of our dataset, this operation would remove 128 rows where revenue_millions is null and 64 rows where metascore is null. This obviously seems like a waste since there's perfectly good data in the other columns of those dropped rows. That's why we'll look at imputation next.

Other than just dropping rows, you can also drop columns with null values by setting axis=1 :

In our dataset, this operation would drop the revenue_millions and metascore columns.
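To make the two variants concrete, here is a small sketch on a made-up frame. The data is hypothetical; `dropna()` and its `axis` parameter are the pandas calls described above.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row sample; two rows each contain one null.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "revenue_millions": [1.0, np.nan, 3.0, 4.0],
    "metascore": [50.0, 60.0, np.nan, 80.0],
})

rows_dropped = df.dropna()        # drops every row that has at least one null
cols_dropped = df.dropna(axis=1)  # drops every column that has at least one null

print(rows_dropped)
print(cols_dropped)
print(df.shape)  # the original is untouched because inplace was not used
```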

What's with this axis=1 parameter?

It's not immediately obvious where axis comes from and why you need it to be 1 for it to affect columns. To see why, just look at the .shape output:

movies_df.shape

Out: (1000, 11)

As we learned above, this is a tuple that represents the shape of the DataFrame, i.e. 1000 rows and 11 columns. Note that the rows are at index zero of this tuple and columns are at index one of this tuple. This is why axis=1 affects columns. This comes from NumPy, and is a great example of why learning NumPy is worth your time.

Imputation is a conventional feature engineering technique used to keep valuable data that have null values.

There may be instances where dropping every row with a null value removes too big a chunk from your dataset, so instead we can impute that null with another value, usually the mean or the median of that column.

Let's look at imputing the missing values in the revenue_millions column. First we'll extract that column into its own variable:

Using square brackets is the general way we select columns in a DataFrame.

If you remember back to when we created DataFrames from scratch, the keys of the dict ended up as column names. Now when we select columns of a DataFrame, we use brackets just like if we were accessing a Python dictionary.

revenue now contains a Series:

Slightly different formatting than a DataFrame, but we still have our Title index.

We'll impute the missing values of revenue using the mean. Here's the mean value:

With the mean, let's fill the nulls using fillna() :

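We have now replaced all nulls in revenue with the mean of the column. Notice that by using inplace=True we have actually affected the original movies_df (this relies on older pandas behavior; with Copy-on-Write, the default from pandas 3.0, the original is left unchanged and you would assign the filled Series back to the column instead):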

Imputing an entire column with the same value like this is a basic example. It would be a better idea to try a more granular imputation by Genre or Director.

For example, you would find the mean of the revenue generated in each genre individually and impute the nulls in each genre with that genre's mean.
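A minimal sketch of both approaches follows, using a hypothetical four-row frame. Using `groupby(...).transform("mean")` for the per-genre version is one possible way to do it, not necessarily how the original notebook did.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing revenue value in each genre.
movies_df = pd.DataFrame({
    "genre": ["Action", "Action", "Drama", "Drama"],
    "revenue_millions": [100.0, np.nan, 20.0, np.nan],
})

# Basic imputation: fill every null with the overall column mean.
overall_mean = movies_df["revenue_millions"].mean()
filled_overall = movies_df["revenue_millions"].fillna(overall_mean)

# More granular imputation: fill each null with the mean of its own genre.
genre_means = movies_df.groupby("genre")["revenue_millions"].transform("mean")
filled_by_genre = movies_df["revenue_millions"].fillna(genre_means)

print(filled_overall.tolist())
print(filled_by_genre.tolist())
```

Assigning the result back with `movies_df["revenue_millions"] = filled_by_genre` avoids relying on `inplace=True`.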

Let's now look at more ways to examine and understand the dataset.

Understanding your variables

Using describe() on an entire DataFrame we can get a summary of the distribution of continuous variables:

Understanding which numbers are continuous also comes in handy when thinking about the type of plot to use to represent your data visually.

.describe() can also be used on a categorical variable to get the count of rows, unique count of categories, top category, and freq of top category:

This tells us that the genre column has 207 unique values, the top value is Action/Adventure/Sci-Fi, which shows up 50 times (freq).

.value_counts() can tell us the frequency of all values in a column:
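As a quick illustration on a made-up genre column (the values are placeholders), `describe()` summarises a categorical Series and `value_counts()` lists how often each value appears:

```python
import pandas as pd

# Hypothetical categorical column.
genres = pd.Series(["Drama", "Action", "Drama", "Comedy", "Drama"], name="genre")

summary = genres.describe()     # count, unique, top, freq
counts = genres.value_counts()  # frequency of each distinct value, most common first

print(summary)
print(counts)
```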

Relationships between continuous variables

By using the correlation method .corr() we can generate the relationship between each continuous variable:

Correlation tables are a numerical representation of the bivariate relationships in the dataset.

Positive numbers indicate a positive correlation — one goes up the other goes up — and negative numbers represent an inverse correlation — one goes up the other goes down. 1.0 indicates a perfect correlation.

So looking in the first row, first column we see rank has a perfect correlation with itself, which is obvious. On the other hand, the correlation between votes and revenue_millions is 0.6. A little more interesting.
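To see how the table reads, here is a sketch with deliberately artificial numbers (not the movie data): one pair of columns rises together and another moves in opposite directions.

```python
import pandas as pd

# Artificial data: revenue rises with votes, runtime falls as votes rise.
df = pd.DataFrame({
    "votes": [1, 2, 3, 4],
    "revenue_millions": [2, 4, 6, 8],
    "runtime": [4, 3, 2, 1],
})

corr = df.corr()  # pairwise Pearson correlation between numeric columns

print(corr)
```

The diagonal is always 1.0 because every column correlates perfectly with itself.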

Examining bivariate relationships comes in handy when you have an outcome or dependent variable in mind and would like to see the features most correlated to the increase or decrease of the outcome. You can visually represent bivariate relationships with scatterplots (seen below in the plotting section).

For a deeper look into data summarizations check out Essential Statistics for Data Science .

Let's now look more at manipulating DataFrames.

DataFrame slicing, selecting, extracting

Up until now we've focused on some basic summaries of our data. We've learned about simple column extraction using single brackets, and we imputed null values in a column using fillna() . Below are the other methods of slicing, selecting, and extracting you'll need to use constantly.

It's important to note that, although many methods are the same, DataFrames and Series have different attributes, so you'll need to be sure which type you are working with or else you will receive attribute errors.

Let's look at working with columns first.

You already saw how to extract a column using square brackets like this:

This will return a Series . To extract a column as a DataFrame , you need to pass a list of column names. In our case that's just a single column:

Since it's just a list, adding another column name is easy:

Now we'll look at getting data by rows.

For rows, we have two options:

  • .loc - locates by name
  • .iloc - locates by numerical index

Remember that we are still indexed by movie Title, so to use .loc we give it the Title of a movie:

On the other hand, with iloc we give it the numerical index of Prometheus:

loc and iloc can be thought of as similar to Python list slicing. To show this even further, let's select multiple rows.

How would you do it with a list? In Python, just slice with brackets like example_list[1:4] . It works the same way in pandas:

One important distinction between using .loc and .iloc to select multiple rows is that .loc includes the movie Sing in the result, but when using .iloc we're getting rows 1:4 but the movie at index 4 ( Suicide Squad ) is not included.

Slicing with .iloc follows the same rules as slicing with lists: the object at the end index is not included.
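The difference is easy to check on a five-row stand-in indexed by title. The frame below is hypothetical and the ratings are placeholder values; only the slicing behaviour matters.

```python
import pandas as pd

# Hypothetical frame indexed by movie title, in the order used in the text.
movies_df = pd.DataFrame(
    {"rating": [8.1, 7.0, 7.3, 7.2, 6.2]},
    index=["Guardians of the Galaxy", "Prometheus", "Split", "Sing", "Suicide Squad"],
)

by_label = movies_df.loc["Prometheus":"Sing"]  # label slice: end label IS included
by_position = movies_df.iloc[1:4]              # positional slice: position 4 is NOT included

print(by_label)
print(by_position)
```

Both selections return the same three rows, Prometheus through Sing.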

Conditional selections

We’ve gone over how to select columns and rows, but what if we want to make a conditional selection?

For example, what if we want to filter our movies DataFrame to show only films directed by Ridley Scott or films with a rating greater than or equal to 8.0?

To do that, we take a column from the DataFrame and apply a Boolean condition to it. Here's an example of a Boolean condition:

Similar to isnull() , this returns a Series of True and False values: True for films directed by Ridley Scott and False for ones not directed by him.

We want to filter out all movies not directed by Ridley Scott, in other words, we don’t want the False films. To return the rows where that condition is True we have to pass this operation into the DataFrame:

You can get used to looking at these conditionals by reading them like:

Select movies_df where movies_df director equals Ridley Scott.

Let's look at conditional selections using numerical values by filtering the DataFrame by ratings:

We can make some richer conditionals by using logical operators | for "or" and & for "and".

Let's filter the DataFrame to show only movies by Christopher Nolan OR Ridley Scott:

We need to make sure to group evaluations with parentheses so Python knows how to evaluate the conditional.

Using the isin() method we could make this more concise though:

Let's say we want all movies that were released between 2005 and 2010, have a rating above 8.0, but made below the 25th percentile in revenue.

Here's how we could do all of that:

If you recall, when we used .describe() the 25th percentile for revenue was about 17.4, and we can access this value directly by using the quantile() method with a float of 0.25.

So here we have only four movies that match those criteria.
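The filters from this section can be sketched end to end on a small invented frame. The titles, years, ratings, and revenues below are made up for illustration, so the matching rows differ from the real dataset.

```python
import pandas as pd

# Invented data; only the director names come from the text.
movies_df = pd.DataFrame(
    {
        "director": ["Ridley Scott", "Christopher Nolan", "Someone Else",
                     "Christopher Nolan", "Ridley Scott"],
        "year": [2012, 2008, 2014, 2010, 2007],
        "rating": [7.0, 9.0, 8.1, 8.8, 8.5],
        "revenue_millions": [100.0, 500.0, 300.0, 200.0, 10.0],
    },
    index=["Film A", "Film B", "Film C", "Film D", "Film E"],
)

# Single condition: the Boolean Series picks the rows to keep.
scott = movies_df[movies_df["director"] == "Ridley Scott"]

# OR of two conditions; each comparison needs its own parentheses.
either = movies_df[
    (movies_df["director"] == "Christopher Nolan")
    | (movies_df["director"] == "Ridley Scott")
]

# Same selection, written more concisely with isin().
either_isin = movies_df[movies_df["director"].isin(["Christopher Nolan", "Ridley Scott"])]

# Released 2005-2010, rated above 8.0, revenue below the 25th percentile.
combined = movies_df[
    ((movies_df["year"] >= 2005) & (movies_df["year"] <= 2010))
    & (movies_df["rating"] > 8.0)
    & (movies_df["revenue_millions"] < movies_df["revenue_millions"].quantile(0.25))
]

print(combined)
```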

Applying functions

It is possible to iterate over a DataFrame or Series as you would with a list, but doing so — especially on large datasets — is very slow.

An efficient alternative is to apply() a function to the dataset. For example, we could use a function to convert movies with an 8.0 or greater to a string value of "good" and the rest to "bad" and use these transformed values to create a new column.

First we would create a function that, when given a rating, determines if it's good or bad:

Now we want to send the entire rating column through this function, which is what apply() does:

The .apply() method passes every value in the rating column through the rating_function and then returns a new Series. This Series is then assigned to a new column called rating_category .

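You can also use anonymous functions. This lambda function achieves the same result as rating_function :

Overall, using apply() tends to be tidier and often faster than iterating manually over rows. Keep in mind that apply() still calls the function once per value; true vectorization, where an operation runs on the whole array at once, is faster again.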

Vectorization: a style of computer programming where operations are applied to whole arrays instead of individual elements — Wikipedia

A good example of high usage of apply() is during natural language processing (NLP) work. You'll need to apply all sorts of text cleaning functions to strings to prepare for machine learning.
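Here is a minimal sketch of the steps above on a made-up rating column. `rating_function` and `rating_category` are the names used in the text; the `np.where` line is an added comparison showing a vectorized way to get the same labels, not something the original tutorial used.

```python
import numpy as np
import pandas as pd

# Made-up ratings for illustration.
movies_df = pd.DataFrame({"rating": [8.1, 7.0, 8.0, 6.2]})

def rating_function(x):
    """Label a single rating as good (8.0 or higher) or bad."""
    return "good" if x >= 8.0 else "bad"

# apply() sends each value through the function and returns a new Series.
movies_df["rating_category"] = movies_df["rating"].apply(rating_function)

# The same result with an anonymous function.
via_lambda = movies_df["rating"].apply(lambda x: "good" if x >= 8.0 else "bad")

# A vectorized alternative: the comparison runs on the whole column at once.
via_where = np.where(movies_df["rating"] >= 8.0, "good", "bad")

print(movies_df)
```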

Brief Plotting

Another great thing about pandas is that it integrates with Matplotlib, so you get the ability to plot directly off DataFrames and Series. To get started we need to import Matplotlib ( pip install matplotlib ):

Now we can begin. There won't be a lot of coverage on plotting, but it should be enough to explore your data easily.

Plotting Tip

For categorical variables utilize Bar Charts and Boxplots.

For continuous variables utilize Histograms, Scatterplots, Line graphs, and Boxplots.

Let's plot the relationship between ratings and revenue. All we need to do is call .plot() on movies_df with some info about how to construct the plot:

What's with the semicolon? It's not a syntax error, just a way to hide the <matplotlib.axes._subplots.AxesSubplot at 0x26613b5cc18> output when plotting in Jupyter notebooks.

If we want to plot a simple Histogram based on a single column, we can call plot on a column:

Do you remember the .describe() example at the beginning of this tutorial? Well, there's a graphical representation of the interquartile range, called the Boxplot. Let's recall what describe() gives us on the ratings column:

Using a Boxplot we can visualize this data:

By combining categorical and continuous data, we can create a Boxplot of revenue that is grouped by the Rating Category we created above:

That's the general idea of plotting with pandas. There are too many plots to mention, so definitely take a look at the plot() docs for more information on what it can do.

Wrapping up

Exploring, cleaning, transforming, and visualizing data with pandas in Python is an essential skill in data science. Cleaning and wrangling data alone is often said to make up 80% of a Data Scientist's job. After a few projects and some practice, you should be very comfortable with most of the basics.

To keep improving, view the extensive tutorials offered by the official pandas docs, follow along with a few Kaggle kernels , and keep working on your own projects!

Applied Data Science with Python — Coursera

Covers an intro to Python, Visualization, Machine Learning, Text Mining, and Social Network Analysis in Python. Also provides many challenging quizzes and assignments to further enhance your learning.

Complete SQL Bootcamp — Udemy

An excellent course for learning SQL. The instructor explains everything from beginner to advanced SQL queries and techniques, and provides many exercises to help you learn.

Meet the Authors

George McIntire

Data Scientist and writer, currently working as a Data Visualization Analyst at Callisto Media

Brendan Martin

Chief Editor at LearnDataSci and software engineer

Lauren Washington

Lead data scientist and machine learning developer at smartQED, and mentor at the Thinkful Data Science program .

Purdue University

Data Visualization using Python - Matplotlib and Seaborn

Python has emerged as the most popular programming language in the data science community. In this workshop, we will go over the basics of Data Visualization using Python. We will look at the different types of plots that can be created using Matplotlib and Seaborn and go over available styling options.

  • Presentation This is the presentation that was used in the workshop.
  • Colab Notebook Link to colab notebook used during the workshop. This is an interactive notebook and you can run it by yourself in the browser to see the results of what was presented in the workshop. You can also edit it and work with it at your own pace. Please note you will need a Google account to be able to access it.
  • Last Edited: May 7, 2024 2:12 AM
  • URL: https://guides.lib.purdue.edu/d-velop

Create interactive slides with Python in 8 Jupyter Notebook cells

Creating presentations in Jupyter Notebook is a great alternative to manually updating slides in other presentation software. If your data changes, you just re-execute the cell and the chart on the slide is updated.

Jupyter Notebook uses Reveal.js to create slides from cells. The standard approach is to write the slide code and Markdown in the Jupyter Notebook. When the notebook is ready, it can be exported to a standalone HTML file containing the presentation.

What if you would like to update slides during the slide show? What is more, it would be fantastic to have interactive widgets in the presentation. You can do this with the Mercury framework.

In this tutorial, we will create an interactive presentation in Jupyter Notebook and serve it with Mercury.

Create presentation in notebook

Please enable Slideshow toolbar in Jupyter Notebook. It can be done by clicking View -> Cell Toolbar -> Slideshow . It is presented in the screenshot below:

Enable cell toolbar

We will need the following packages to create a presentation in a Python notebook:

Please make sure that they are installed in your environment.

1. Import packages and App setup

The first step is to import packages and set up the Mercury App :

We set the title and description for the App object.

Please note that we set Slide Type to Skip . This cell will not appear in the presentation.

2. Add title

The second cell is a Markdown with title:

The Slide Type is set to Slide . It is our first slide!

3. Add slide with Markdown

Add a new Markdown cell with the following content.

Please set Slide Type to Slide . It will be the second slide. I'm using ## for the slide title ( # produces too large a title in my opinion).

4. Add Mercury Widget

Please add a code cell with a Text widget. We will use it to ask users for their name.

We set Slide Type as Skip , so this cell will not appear in the presentation.

5. Display name

Let's use the name.value in the slide. Please add a code cell. We will display a Markdown text with Python variables by using Markdown function from Mercury package.

Please set the Slide Type to Slide .

You can display Markdown with Python variables by calling mr.Markdown() or mr.Md() functions. Both do the same.

The first five cells of the notebook:

Notebook code for presentation in Jupyter Notebook

You can enter your name in the widget during the notebook development. There will be no change in other cells. If you want to update the cell with new widget value, please execute it manually.

6. More widgets

We can add more widgets to the presentation. They will be used to control chart in the next slide.

We have used Slider and Select widgets. They are displayed in the notebook. This cell will not be displayed in the presentation, so set Slide Type to Skip .

7. Scatter plot

We will add a new code cell. It will have Slide Type set to Slide .

We used widgets values by accessing them with samples.value and color.value .

Screenshot of the notebook with scatter plot:

Notebook code for presentation in Jupyter Notebook

8. Final slide

Please add a last Markdown cell. Its Slide Type will be set to Slide :

Please notice that the link is added with HTML syntax. The target="_blank" attribute is used to open the link in a new tab.

Run presentation in Mercury

Please run Mercury local server in the same directory as notebook:

The above command will open a web browser at http://127.0.0.1:8000 . Please click on a card with presentation.

You can navigate between slides with arrows in the bottom right corner. You can enter the full screen mode by pressing F on the keyboard. Please use Esc to exit full screen mode.

You can change widgets values in the sidebar and presentation slides will be automatically recomputed:

You can export your slides as PDF or HTML by clicking Download button in the sidebar.

Practical Business Python

Taking care of business, one python script at a time

Creating Powerpoint Presentations with Python

Posted by Chris Moffitt in articles   

Introduction

Love it or loathe it, PowerPoint is widely used in most business settings. This article will not debate the merits of PowerPoint but will show you how to use python to remove some of the drudgery of PowerPoint by automating the creation of PowerPoint slides using python.

Fortunately for us, there is an excellent python library for creating and updating PowerPoint files: python-pptx . The API is very well documented so it is pretty easy to use. The only tricky part is understanding the PowerPoint document structure including the various master layouts and elements. Once you understand the basics, it is relatively simple to automate the creation of your own PowerPoint slides. This article will walk through an example of reading in and analyzing some Excel data with pandas, creating tables and building a graph that can be embedded in a PowerPoint file.

PowerPoint File Basics

Python-pptx can create blank PowerPoint files but most people are going to prefer working with a predefined template that you can customize with your own content. Python-pptx’s API supports this process quite simply as long as you know a few things about your template.

Before diving into some code samples, there are two key components you need to understand: Slide Layouts and Placeholders . In the images below you can see an example of two different layouts as well as the template’s placeholders where you can populate your content.

In the image below, you can see that we are using Layout 0 and there is one placeholder on the slide at index 1.

PowerPoint Layout 0

In this image, we use Layout 1 for a completely different look.

PowerPoint Layout 1

In order to make your life easier with your own templates, I created a simple standalone script that takes a template and marks it up with the various elements.

I won’t explain all the code line by line but you can see analyze_ppt.py on github. Here is the function that does the bulk of the work:

The basic flow of this function is to loop through and create an example of every layout included in the source PowerPoint file. Then on each slide, it will populate the title (if it exists). Finally, it will iterate through all of the placeholders included in the template and show the index of the placeholder as well as the type.

If you want to try it yourself:

Refer to the input and output files to see what you get.

Creating your own PowerPoint

For the dataset and analysis, I will be replicating the analysis in Generating Excel Reports from a Pandas Pivot Table . The article explains the pandas data manipulation in more detail so it will be helpful to make sure you are comfortable with it before going too much deeper into the code.

Let’s get things started with the inputs and basic shell of the program:

After we create our command line args, we read the source Excel file into a pandas DataFrame. Next, we use that DataFrame as an input to create the Pivot_table summary of the data:

Consult the Generating Excel Reports from a Pandas Pivot Table if this does not make sense to you.
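For readers who want a feel for the pivot step without the source Excel file, here is a minimal sketch on an in-memory frame. The column names (Manager, Rep, Quantity, Price), the values, and the choice of a plain sum are all assumptions for illustration; the actual script reads its data from Excel and may aggregate differently.

```python
import pandas as pd

# Hypothetical sales records standing in for the Excel input.
sales = pd.DataFrame({
    "Manager": ["Manager A", "Manager A", "Manager B", "Manager B"],
    "Rep": ["Rep x", "Rep x", "Rep y", "Rep z"],
    "Quantity": [1, 2, 3, 4],
    "Price": [100, 200, 50, 10],
})

# Summarise Quantity and Price for each Manager/Rep pair.
summary = pd.pivot_table(
    sales,
    index=["Manager", "Rep"],
    values=["Price", "Quantity"],
    aggfunc="sum",
    fill_value=0,
)

print(summary)
```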

The next piece of the analysis is creating a simple bar chart of sales performance by account:

Here is a scaled down version of the image:

PowerPoint Graph

We have a chart and a pivot table completed. Now we are going to embed that information into a new PowerPoint file based on a given PowerPoint template file.

Before I go any farther, there are a couple of things to note. You need to know what layout you would like to use as well as where you want to populate your content. In looking at the output of analyze_ppt.py we know that the title slide is layout 0 and that it has a title attribute and a subtitle at placeholder 1.

Here is the start of the function that we use to create our output PowerPoint:

This code creates a new presentation based on our input file, adds a single slide and populates the title and subtitle on the slide. It looks like this:

PowerPoint Title Slide

Pretty cool huh?

The next step is to embed our picture into a slide.

From our previous analysis, we know that the graph slide we want to use is layout index 8, so we create a new slide, add a title then add a picture into placeholder 1. The final step adds a subtitle at placeholder 2.

Here is our masterpiece:

PowerPoint Chart

For the final portion of the presentation, we will create a table for each manager with their sales performance.

Here is an image of what we’re going to achieve:

PowerPoint Table

Creating tables in PowerPoint is a good news / bad news story. The good news is that there is an API to create one. The bad news is that you can’t easily convert a pandas DataFrame to a table using the built in API . However, we are very fortunate that someone has already done all the hard work for us and created PandasToPowerPoint .

This excellent piece of code takes a DataFrame and converts it to a PowerPoint compatible table. I have taken the liberty of including a portion of it in my script. The original has more functionality that I am not using so I encourage you to check out the repo and use it in your own code.

The code takes each manager out of the pivot table and builds a simple DataFrame that contains the summary data. It then uses df_to_table to convert the DataFrame into a PowerPoint compatible table.
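The per-manager split can be sketched with pandas alone. The frame, column names, and the `manager_frames` dict below are hypothetical, and the `df_to_table` step is left out because it depends on python-pptx and the helper code from the repository.

```python
import pandas as pd

# Hypothetical pivot table with a (Manager, Rep) MultiIndex.
sales = pd.DataFrame({
    "Manager": ["Manager A", "Manager A", "Manager B", "Manager B"],
    "Rep": ["Rep x", "Rep x", "Rep y", "Rep z"],
    "Quantity": [1, 2, 3, 4],
    "Price": [100, 200, 50, 10],
})
summary = pd.pivot_table(
    sales, index=["Manager", "Rep"], values=["Price", "Quantity"], aggfunc="sum"
)

# Pull each manager's rows out into a flat DataFrame, one per future slide.
manager_frames = {}
for manager in summary.index.get_level_values("Manager").unique():
    manager_frames[manager] = summary.xs(manager, level="Manager").reset_index()

for manager, frame in manager_frames.items():
    print(manager)
    print(frame)
```

Each resulting frame has ordinary columns, which is the shape a table-building helper would typically expect.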

If you want to run this on your own, the full code would look something like this:

All of the relevant files are available in the github repository .

One of the things I really enjoy about using python to solve real world business problems is that I am frequently pleasantly surprised at the rich ecosystem of very well thought out python tools already available to help with my problems. In this specific case, PowerPoint is rarely a joy to use but it is a necessity in many environments.

After reading this article, you should know that there is some hope for you next time you are asked to create a bunch of reports in PowerPoint. Keep this article in mind and see if you can find a way to automate away some of the tedium!

Software Technology

Tips and tricks for software technology terms that include development methods, programming languages, and supporting tools in the development period

How to Create PowerPoint presentation from Excel Data with Python

We'll learn how to automate a PowerPoint presentation from Excel data using Python. The picture below illustrates what we'll do in this project:

data presentation in python

The input for this project is an excel file with 2 sheets:

  • First sheet name: KPI , which has 7 columns {Start Time, NE Name, Cell, RRC Connected User, Cell Traffic Volume DL(Gbits), DL Cell Throughput, and DL Resource Block Utilizing Rate (%)}. The 'Start Time' column will be used as the x-axis. The 'NE Name' column will be used to filter the data to one network element per slide. 'Cell' will be used as the legend (one data series per cell). The other 4 columns will each be plotted as the values on a chart.
  • Second sheet name: Exec , which has 5 columns {NE Name, Background, Action Plan, Action Date, Result}. 'NE Name' will be shown in the slide header. 'Background', 'Action Plan', and 'Result' will be shown in the left Text Box as information. 'Action Date' will be shown on every chart as a red vertical line to mark an action that was taken at a specific date and time.

You can download the excel file mentioned above for this project, here .

Steps to complete this project:

  • Load Excel File Data from KPI & Exec sheets to Pandas DataFrame
  • Filter DataFrame (based on your requirement)
  • Pivoting filtered DataFrame
  • Create charts from Pivot data above
  • Create PowerPoint presentation Slide
  • Add charts, header, and left-information to the Slide
  • Save PowerPoint presentation (.pptx)
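The filter and pivot steps can be sketched with pandas alone. In the snippet below an in-memory frame stands in for the KPI sheet (in the real project it would come from the Excel file), and the timestamps, NE names, and values are invented; only the column names follow the sheet description above.

```python
import pandas as pd

# Invented rows standing in for the KPI sheet.
kpi = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:00",
        "2024-01-01 01:00", "2024-01-01 01:00",
        "2024-01-01 00:00",
    ]),
    "NE Name": ["NE1", "NE1", "NE1", "NE1", "NE2"],
    "Cell": ["C1", "C2", "C1", "C2", "C9"],
    "RRC Connected User": [10, 20, 30, 40, 99],
})

# Filter: keep a single network element per slide.
filtered = kpi[kpi["NE Name"] == "NE1"]

# Pivot: time on the rows (x-axis), one column per cell (legend entries).
pivot = filtered.pivot_table(
    index="Start Time",
    columns="Cell",
    values="RRC Connected User",
    aggfunc="sum",
)

print(pivot)
```

A frame in this shape can be handed straight to a plotting call, with one line per cell.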

To keep the code readable and reusable, we'll wrap all the steps above in a class ( CMySlide ), as below:

data presentation in python

But, before we can create CMySlide class above, we need some classes from other posts below:

  • How to read excel (xls, xlsx) file in python using pandas . From this post we’ll use CReadExcel , you can copy the class to your project.
  • Create a chart from Excel data in Python with matplotlib . From this post we’ll use CPivot and CChart classes, please copy the classes to your project.
  • How to create PowerPoint Presentation with Python . From this post we need CPptx , CSlide , CTextBox , and CParagraph , please copy all the classes to your project.

You can copy all of the classes above to your Jupyter-notebook in one file so you don’t need to import every class to your project. So now, without wasting time, here is the class to create a PowerPoint presentation from Excel Data:

Below is the description of CMySlide functions above:

  • __init__(self, excelFilePath, kpiSheetName, execSheetName, neName): the inputs to this function are the Excel file, the 2 sheet names (KPI & Exec), and neName to filter the data frames. This function loads data from the 2 sheets (KPI & Exec) and filters it by neName. Finally, it creates the PowerPoint presentation and adds a slide.
  • chart(self, index, columns, values, title, xlabel, ylabel, isStackPlot=False): creates a chart from the KPI & Exec data frames by pivoting the data and then plotting it. This function returns the image as a BytesIO stream.
  • addTextBoxToSlide(self, left, top, width, height, margin_left, margin_right, margin_top, margin_bottom, rgb_color=None): as the function name suggests, adds a Text Box to the Slide.
  • addParagraphToTextBox(self, textbox, text, font_bold, font_size, rgb_color=None): as the function name suggests, adds a paragraph to a Text Box.
  • addTopHeader(self): adds a Text Box as the slide header and fills in some information.
  • addLeftInfo(self): adds a Text Box on the left side of the slide and fills in some information.
  • addCharts(self): adds 4 charts to the right side of the slide.
  • savePptx(self, filename): saves the PowerPoint presentation to a file (.pptx).

In this test, we'll load the Excel data ( sample data, here ), turn it into charts and text information, add these to a PowerPoint slide, and save the file (.pptx). Here is the code:

We combined some classes from the previous posts to create a PowerPoint presentation from Excel data. In this project, we wrapped all the steps in a class so the code is more readable and easier to organize (and don't forget that you must modify this class to fit your own requirements).

Code With C

The Way to Programming


Efficient Data Presentation: Using Print Format in Python

CodeLikeAGirl

Hey there, Techies! 👩‍💻 Today, we are diving into the fun and quirky world of print formatting in Python! 🎉 Let’s make our data presentation not just efficient, but also stylish! So grab your Python hats and let’s get started with these cool techniques! 🐍

Basics of Print Format

Ah, the heart of our Python programs – the trusty print() function! 🖨️ This function is like the town crier, shouting out our data for all to hear (or see)! Let’s unravel its mysteries and understand why formatting is the real MVP in printing data.

Understanding the print() function

The print() function is like a friendly chatterbox in Python that helps us display our data on the screen. From strings to numbers, it takes them all and gives them a voice! 🗣️

Importance of formatting in printing data

Imagine a world without formatting – messy, chaotic, and downright confusing! Formatting gives our data structure and makes it readable, just like adding punctuation to a long sentence! Let’s embrace the beauty of well-formatted data! 😌

Using String Formatting

String formatting is where the magic happens! ✨ Let’s explore how we can jazz up our output using f-strings and the trusty format() method.

String interpolation with f-strings

Ah, f-strings, the rockstars of string formatting! 💫 These little gems let us embed expressions inside string literals, making our lives easier and our code more readable! Say goodbye to clunky concatenations and hello to sleek f-strings! 🚀
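Here's a tiny sketch to make that concrete. The name, age, and languages values are made up for illustration; the point is that anything inside the curly braces is evaluated as a Python expression.

```python
name = "Ada"
age = 36
languages = ["Python", "C"]

# Variables and whole expressions can sit inside the braces.
message = f"{name} is {age} and knows {len(languages)} languages."
print(message)
```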

Formatting with the format() method

For those who like a classic touch, the format() method is here to save the day! 🎩 This method gives us more control over how we format our strings, allowing us to customize our output like a boss! Say hello to flexibility and elegance with the format() method! 💃
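A quick sketch of format() in action, with sample values invented for illustration. It shows two common styles: numbered placeholders (which can be reused) and named placeholders.

```python
# Numbered placeholders: {0} is reused twice in the same template.
template = "{0} scored {1} points; well done, {0}!"
positional = template.format("Grace", 95)

# Named placeholders make longer templates easier to read.
keyword = "{item} costs {price} dollars".format(item="Coffee", price=3)

print(positional)
print(keyword)
```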

Formatting Numeric Data

Numbers need love too! Let’s dive into the world of formatting numeric data – from precision to alignment, we’ve got you covered! 🔢

Specifying precision for floating-point numbers

Floats can be a handful, especially with all those decimal places! Fear not, for we can tame them with precision formatting! Let’s keep those floats in line and show only what truly matters! 🎯
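As a small sketch (the numbers are arbitrary examples), a format specifier after the colon controls how many decimal places are shown. The same mini-language also handles percentages and thousands separators.

```python
pi_estimate = 3.14159265
rate = 0.4567
revenue = 1234567.891

two_places = f"{pi_estimate:.2f}"   # fixed-point, 2 decimals -> 3.14
percent = f"{rate:.1%}"             # multiply by 100, add % -> 45.7%
with_commas = f"{revenue:,.2f}"     # thousands separator -> 1,234,567.89

print(two_places, percent, with_commas)
```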

Aligning numbers in columns

Ever wanted your numbers to line up perfectly in neat columns? Well, the alignment gods have heard you! With a few tricks up our sleeves, we can align numbers to create visually pleasing outputs! Say hello to organized data! 📊
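One way to sketch this is with the width and alignment flags of the format mini-language: < pads on the right (left-aligned) and > pads on the left (right-aligned). The shopping-list data and column widths below are just assumptions for the demo.

```python
rows = [("Apples", 3, 1.5), ("Bananas", 12, 0.25), ("Cherries", 150, 12.0)]

# Header: text left-aligned in 10 characters, numbers right-aligned.
lines = [f"{'Item':<10}{'Qty':>5}{'Price':>8}"]
for item, qty, price in rows:
    lines.append(f"{item:<10}{qty:>5}{price:>8.2f}")

print("\n".join(lines))
```

Because every field has a fixed width, each line comes out the same length and the columns line up in a monospaced terminal.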

Handling Multiple Data Types

Mixing and matching data types can be a challenge, but fear not, for Python is here to rescue the day! Let’s explore how we can print different data types together harmoniously.

Printing different data types in the same line

String, int, float – a trio that might sound like a disaster waiting to happen, but with Python, it’s a piece of cake! Let’s see how we can print diverse data types together without causing a ruckus! 🎭
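A minimal sketch with made-up values: print() happily accepts several arguments of different types, and an f-string lets us mix them in one sentence while still controlling the float's precision.

```python
name, age, height = "Sam", 30, 1.756

# print() converts each argument to text and joins them with spaces.
print(name, age, height)  # Sam 30 1.756

# An f-string mixes the same values and rounds the float for display.
line = f"{name} is {age} years old and {height:.2f} m tall"
print(line)
```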

Converting data types for proper printing

Sometimes data types just don’t get along, but with a little nudge in the right direction, we can make them play nice! Let’s dive into type conversions to ensure our data prints smoothly and without drama! 🎭
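Here is a short sketch of the usual trouble spot and its fix. The values are illustrative: joining a string and a number with + raises a TypeError until the number is converted with str(), and int() and float() go the other way.

```python
count = 7

# Concatenating str and int directly fails...
try:
    label = "Total: " + count
except TypeError:
    # ...so convert the number to a string first.
    label = "Total: " + str(count)

# Converting text back into numbers before doing arithmetic.
quantity = int("42") + 1
price = float("19.99")

print(label, quantity, price)
```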

Advanced Techniques

Now that we’ve covered the basics, it’s time to level up our print formatting game! Let’s explore some advanced techniques to make our print statements pop!

Using escape characters in print statements

Escape characters are like secret codes that add flair to our output! 🕵️‍♀️ From new lines to tabs, these characters help us control the appearance of our text. Let’s sprinkle some escape character magic in our print statements and watch them come to life! ✨
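A small sketch of the most common escapes, using sample text invented for the demo: \n starts a new line, \t inserts a tab, \" puts a quote inside a quoted string, and a raw string (prefix r) switches escapes off entirely.

```python
# \t separates columns, \n separates rows.
table = "Name\tScore\nAda\t95\nGrace\t98"
print(table)

# \" lets a double quote appear inside a double-quoted string.
quote = "She said \"hi\""
print(quote)

# In a raw string, backslashes are kept as-is (handy for Windows paths).
path = r"C:\new_folder\table.txt"
print(path)
```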

Customizing print output with sep and end parameters

Ah, the sep and end parameters – our hidden weapons in the battle for custom print output! 💥 Want to separate values by a specific character or end your print statement differently? These parameters have got your back! Let’s unleash their power and add that personal touch to our prints! 🤹‍♀️
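To see both parameters at work, here is a minimal sketch. The function name show_sep_and_end is just a label for this demo: sep replaces the default space between arguments, and end replaces the default newline at the end.

```python
def show_sep_and_end():
    # sep="-" joins the arguments with dashes instead of spaces.
    print("2024", "05", "09", sep="-")

    # end="..." keeps the cursor on the same line for the next print.
    print("Loading", end="...")
    print("done")


show_sep_and_end()
```

Run as-is, this should print the date on one line and "Loading...done" on the next.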

Overall, Coding Beauties! 🌟

With these print formatting techniques in your arsenal, you’re ready to jazz up your Python output like a pro! From elegant strings to perfectly aligned numbers, the world of print formatting is your oyster! 🌌

Thank you for joining me on this colorful journey through the art of print formatting in Python! 🎨 Keep coding, keep creating, and remember, style matters even in the world of data! Adios, amigos! 🚀👩‍💻


Program Code – Efficient Data Presentation: Using Print Format in Python

Code Output:

Code Explanation: The program starts by defining a dictionary named data_info containing sample data. This includes a name, age, profession, and a list of programming languages.

The display_header function prints out a simple header to make the output clearer and more organized, an often overlooked but crucial aspect in data presentation.

In display_data, multiple types of string formatting methods in Python for presenting the data are showcased. This progression effectively demonstrates how Python’s print formatting options have evolved, making code more understandable and maintainable.

  • Simple Display: Uses the most straightforward form of printing data, directly passing variables to the print function. While easy, it lacks flexibility in more complex scenarios.
  • String Concatenation: Shows another simple method, concatenating strings with the + operator. It’s a step up in terms of flexibility but requires explicit type conversion and can be error-prone with larger texts.
  • Old % Method: An older string formatting method using the % operator. It’s more concise than concatenation but less readable and flexible compared to newer methods.
  • Format Method: A more modern and versatile method, str.format(), available since Python 2.6 and 3.0, well before f-strings arrived. It greatly enhances readability and formatting options.
  • F-strings: The latest addition (from Python 3.6 onwards), making string formatting even more succinct and readable. F-strings support direct embedding of expressions within string literals, streamlining code.
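The program listing itself is missing from this page, so the following is only a sketch of what the walkthrough describes, not the original code. The names data_info, display_header, and display_data come from the explanation. The sample values, the header text, and the choice to return the formatted lines from display_data are assumptions made for illustration.

```python
# Sample data (values are invented for this sketch).
data_info = {
    "name": "Alex",
    "age": 29,
    "profession": "Developer",
    "languages": ["Python", "C", "Java"],
}


def display_header(title):
    """Print a simple banner so the output is easier to scan."""
    print("=" * 40)
    print(title)
    print("=" * 40)


def display_data(info):
    """Show the same data with each formatting style, oldest to newest."""
    languages = ", ".join(info["languages"])

    # Simple display: pass the values straight to print().
    print("Simple display:", info["name"], info["age"])

    lines = [
        # String concatenation needs an explicit str() for the number.
        "Name: " + info["name"] + ", Age: " + str(info["age"]),
        # Old % method.
        "%s works as a %s" % (info["name"], info["profession"]),
        # Format method.
        "{} knows {}".format(info["name"], languages),
        # F-string.
        f"{info['name']} ({info['age']}) - {info['profession']}",
    ]
    for line in lines:
        print(line)
    return lines


display_header("Data Presentation")
formatted = display_data(data_info)
```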

The program not only illustrates different techniques for presenting data but also serves as a mini-tutorial on the evolution of print formatting in Python. Adopting the most current methods (like f-strings) can significantly reduce code verbosity, boost readability, and hence maintainability.

Frequently Asked Questions (FAQ) on Efficient Data Presentation: Using Print Format in Python

  • What is print formatting in Python and why is it important?
  • How do I format strings and variables when using the print function in Python?
  • Can you provide examples of different ways to format output using print in Python?
  • Are there any specific formatting options available for numerical data in Python print statements?
  • How can I align text and numbers using print formatting in Python?
  • What are f-strings and how do they simplify print formatting in Python?
  • Are there any best practices to follow when it comes to choosing the right print formatting method in Python?
  • Can print formatting be customized for specific data types like dates, currencies, or percentages?
  • How does print formatting help improve the readability and aesthetics of output in Python programs?
  • Are there any performance considerations to keep in mind when using different print formatting techniques in Python?



Course Info

Instructors

  • Dr. Ana Bell
  • Prof. Eric Grimson
  • Prof. John Guttag

Departments

  • Electrical Engineering and Computer Science

As Taught In

  • Algorithms and Data Structures
  • Programming Languages

Learning Resource Types

Introduction to Computer Science and Programming in Python

Lecture Slides and Code

The slides and code from each lecture are available below.



Creating and updating PowerPoint Presentations in Python using python-pptx

python-pptx is a library used to create and edit PowerPoint (.pptx) files. It does not work with files from MS Office 2003 and earlier versions. We can add shapes, paragraphs, text, slides, and much more using this library.

Installation: Open the command prompt on your system and run the following command: pip install python-pptx

Let’s see some of its usage:

Example 1: Creating new PowerPoint file with title and subtitle slide.

Adding title and subtitle to the powerpoint

Example 2: Adding Text-Box in PowerPoint.

Adding text box to the powerpoint

Example 3: PowerPoint (.pptx) file to Text (.txt) file conversion.


Example 4: Inserting image into the PowerPoint file.

Adding images to the powerpoint

Example 5: Adding Charts to the PowerPoint file.

Adding charts to the powerpoint

Example 6: Adding tables to the PowerPoint file.

Adding table to the powerpoint



data-presentation

Here are 26 public repositories matching this topic...

datopian / datahub

🌀 Rapidly build rich data portals using a modern frontend framework

  • Updated May 9, 2024

whythawk / data-as-a-science

Lesson guide and textbook for "Data as a Science" course.

  • Updated Jun 5, 2021
  • Jupyter Notebook

datopian / portal.js.bak

🌀 The JS data presentation framework. For a single dataset to a full catalog.

  • Updated Jan 7, 2023

wp-xyz / Edimax

PC application to communicate with an Edimax SP2101W smart plug and to collect measured power data from it

  • Updated Jul 30, 2023

Daviedavie100 / Pisa2012_data_analysis

Udacity Nanodegree Program, a data analysis project using the Pisa2012 dataset.

  • Updated Feb 29, 2024

Andrey123815 / kasperski_testing_2022

📑 SPA application for convenient presentation and work with large arrays of data, sorting, search, editing data

  • Updated Nov 11, 2022

TelRich / Communicate_Data_Findings

I chose a dataset from Kaggle and performed an EDA; over 30 insights (visualizations) were produced. The key insights are presented as Jupyter Notebook slides.

  • Updated Sep 19, 2022

Papicodea / Prospers-Loan_data

This data contains 113,937 loans with 81 variables on each loan, I have provided 30+ visualizations of a selected few variable features that best explain and give insights to how loans are measured and to what factors that determine a loan interest

  • Updated Nov 16, 2022

DevExpress-Examples / winforms-grid-use-layoutview-as-master-view

Use the Layout View as a master View in master-detail mode.

  • Updated Dec 6, 2023
  • Visual Basic .NET

DevExpress-Examples / winforms-grid-master-detail-entity-framework

Visualize many-to-many relationship of Entity Framework objects.

  • Updated Oct 13, 2023

vithika-karan / KPMG-Virtual-Internship

KPMG Data Analytics Consulting Virtual Internship

  • Updated Dec 31, 2021

DevExpress-Examples / winforms-grid-show-text-along-with-progress-bar-in-print-preview

Show progress text along with a progress bar in the exported document.

  • Updated May 1, 2024

WiktorSusfal / football_tactics_viewer

Python app to visualize json data describing the course of a particular football match

  • Updated Sep 13, 2022

karthickr7 / KPMG-Virtual-Internship

Data Analysis of Sprocektly Central Private Limited

  • Updated Oct 14, 2023

ErolGelbul / movies_data

Analysing and presenting movie dataset.

  • Updated Apr 6, 2023

Mupaose / Excel

Data Analysis using Excel. This is part of my learning journey while also building my portfolio

  • Updated Feb 17, 2024

quzeem91 / Communicate_Data_Findings-Prosper-Loan-Dataset-Exploration

This repo seeks to Explore The Variables That are Pivotal In Predicting a Borrowers Loan outcome.

  • Updated Jul 14, 2023

gregschlitt / Google_Data_Analytics_Capstone-Cyclistic_Bike_Share

This repository contains Google Data Analytics Professional Certificate course's capstone project from Coursera.

  • Updated Oct 5, 2023

oshinrathor / datSci

Dive into my Data Science Projects Repository, featuring a Spam SMS Classifier, NIA Dashboard, H1N1 Vaccine Prediction, and NYC Taxi Fare Prediction. Each project showcases my skills in data cleaning, exploratory analysis, modeling, and visualization, offering valuable insights and methodologies for data enthusiasts and practitioners.

  • Updated May 2, 2024

DevExpress-Examples / winforms-grid-display-custom-rows

Display custom rows in the WinForms Data Grid control.

  • Updated Sep 28, 2023


IMAGES

  1. Data Visualization using Python

  2. Data visualization using python

  3. Python Tutorial: Nail PYTHON DATA TYPES in 5 min. Datatype trick

  4. Python Data Structures and Algorithms: A Beginner's Guide

  5. PPT

  6. Variables & Data Types In Python

VIDEO

  1. presentation.16025876 Python Based Classroom Management System

  2. Data analysis with python: Understanding the Data

  3. Python for Data Engineers-Presentation Part -1

  4. Python Project Presentation 👩‍💻 #python #programming #chuechuevlogs

  5. PYTHON PROGRAMMING (SESSION 4)#python #programmer #programming#developer#pythondeveloper

  6. PYTHON IDENTIFIERS AND COMMENTS (PY 3)#python #machinelearning #coding #programmer #developer

COMMENTS

  1. Introduction to Data Visualization in Python

    Data visualization is the discipline of trying to understand data by placing it in a visual context so that patterns, trends and correlations that might not otherwise be detected can be exposed. Python offers multiple great graphing libraries that come packed with lots of different features.

  2. Data Visualization with Python

    Matplotlib is an easy-to-use, low-level data visualization library that is built on NumPy arrays. It consists of various plots like scatter plot, line plot, histogram, etc. Matplotlib provides a lot of flexibility. To install this type the below command in the terminal. pip install matplotlib.

  3. Data Visualization in Python: Overview, Libraries & Graphs

    Data Visualization in Python. Python offers several plotting libraries, namely Matplotlib, Seaborn and many other such data visualization packages with different features for creating informative, customized, and appealing plots to present data in the most simple and effective way. Figure 1: Data visualization.

  4. Python Pandas Tutorial: A Complete Introduction for Beginners

    Author: George McIntire Data Scientist. Author: Brendan Martin Founder of LearnDataSci. Author: Lauren Washington Lead Data Scientist & ML Developer. Python Pandas Tutorial: A Complete Introduction for Beginners. Learn some of the most important pandas features for exploring, cleaning, transforming, visualizing, and learning from data.

  5. Using Python for Data Analysis

    When you need to analyze data, Python's pandas library is a popular option. To install pandas in a Jupyter Notebook, ... You could do this using a report or presentation. You'll likely discuss your data sources and analysis methodology before stating your conclusions. Having the data and methodology behind your conclusions gives them authority.

  6. PDF Data Visualization

    Con: Visual presentation tends to be simple compared to other tools. Matplotlib - Installation Installing Matplotlib should be straightforward. Sample code for installing packages: ... Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

  7. Data Visualization With Python (Learning Path)

    Applied Data Visualization. In this final section, apply your data visualization skills in Python on real world tasks. Learn to build interactive web applications with Dash, and interactive web maps using Folium. Then, explore the creative side of data visualization by drawing the Mandelbrot set, a famous fractal, using Matplotlib and Pillow.

  8. Data Visualization using Python (Matplotlib and Seaborn)

    Data Visualization using Python - Matplotlib and Seaborn. Python has emerged as the most popular programming language in the data science community. In this workshop, we will go over the basics of Data Visualization using Python. ... This is the presentation that was used in the workshop. Colab Notebook. Link to colab notebook used during the ...

  9. Create interactive slides with Python in 8 Jupyter Notebook cells

    Create interactive slides with Python in 8 Jupyter Notebook cells. Creating presentations in Jupyter Notebook is a great alternative to manually updating slides in other presentation creation software. If your data changes, you just re-execute the cell and slide chart is updated. Jupyter Notebook is using Reveal.js ...

  10. Using pandas and Python to Explore Your Dataset

    You can see how much data nba contains: Python. >>> len(nba) 126314 >>> nba.shape (126314, 23) You use the Python built-in function len() to determine the number of rows. You also use the .shape attribute of the DataFrame to see its dimensionality. The result is a tuple containing the number of rows and columns.

  11. Working with Presentations

    Opening a presentation ¶. The simplest way to get started is to open a new presentation without specifying a file to open: from pptx import Presentation prs = Presentation() prs.save('test.pptx') This creates a new presentation from the built-in default template and saves it unchanged to a file named 'test.pptx'. A couple things to note:

  12. Creating Powerpoint Presentations with Python

    Here is the start of the function that we use to create our output PowerPoint: def create_ppt(input, output, report_data, chart): """ Take the input powerpoint file and use it as the template for the output file. """ prs = Presentation(input) # Use the output from analyze_ppt to understand which layouts and placeholders # to use # Create a ...

  13. How to Create PowerPoint presentation from Excel Data with Python

    We'll learn how to automate a PowerPoint presentation from Excel data using Python. Below picture illustrates what we'll do in this project: excel_data_to_pptx_illustration The input for this project is an excel file with 2 sheets: First sheet name: KPI, that have 7 columns {Start Time, NE Name, Cell, RRC Connected User, Cell Traffic Volume DL(Gbits),…

  14. Python and Pandas for Data Engineering

    In this first course of the Python, Bash and SQL Essentials for Data Engineering Specialization, you will learn how to set up a version-controlled Python working environment which can utilize third party libraries. You will learn to use Python and the powerful Pandas library for data analysis and manipulation. Additionally, you will also be ...

  15. Efficient Data Presentation: Using Print Format in Python

    This progression effectively demonstrates how Python's print formatting options have evolved, making code more understandable and maintainable. Simple Display: Uses the most straightforward form of printing data, directly passing variables to the print function. While easy, it lacks flexibility in more complex scenarios.

  16. Lecture Slides and Code

    Python Classes and Inheritance Slides for Lecture 9 (PDF - 1.6MB) Code for Lecture 9 (PY) 10 Understanding Program Efficiency, Part 1 Slides for Lecture 10 (PDF) Code for Lecture 10 (PY) 11 Understanding Program Efficiency, Part 2 Slides for Lecture 11 (PDF) Code for Lecture 11 (PY) 12 Searching and Sorting Slides for Lecture 12 (PDF - 2.4MB)

  17. Creating and updating PowerPoint Presentations in Python using python

    pip install python-pptx. Let's see some of its usage: Example 1: Creating new PowerPoint file with title and subtitle slide. Python3. from pptx import Presentation . root = Presentation() first_slide_layout = root.slide_layouts[0] . 0 -> title and subtitle. 5 -> Title only .

  18. Working with tables

    Inserting a table. A table is inserted into the placeholder by calling its insert_table() method and providing the desired number of rows and columns: >>> shape = table_placeholder.insert_table(rows=3, cols=4) The return value is a GraphicFrame shape containing the new table, not the table object itself.

  19. Read From PowerPoint Table in Python?

    Text in the presentation outside of tables is omitted, but you can modify my code to capture text from non-table objects as well. import pptx as pptx. from pptx import *. def get_tables_from_presentation(pres): """. The input parameter `pres` should receive. an object returned by `pptx.Presentation()`.

  20. data-presentation · GitHub Topics · GitHub

    Python app to visualize json data describing the course of a particular football match. gui-application data-parsing data-presentation Updated Sep 13, 2022; ... To associate your repository with the data-presentation topic, visit your repo's landing page and select "manage topics."