The Trends of the Software Engineering Field in 2021

I’m interested in the trends in the software engineering field. Here are three basic questions.

Which Language will most people desire in 2021?

Which Database will most people desire in 2021?

Which Platform will most people desire in 2021?

The dataset I will use to answer these questions comes from Stack Overflow.

Overview of the Dataset

The name of the dataset is Stack Overflow Annual Developer Survey 2020. With nearly 65,000 responses fielded from over 180 countries and dependent territories, the 2020 Annual Developer Survey examines all aspects of the developer experience from career satisfaction and job search to education and opinions on open-source software. Each row represents the answer of a person.

I downloaded the dataset and put it in the data folder. The code below imports the packages and loads the dataset in a Jupyter Notebook.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('./data/survey_results_public.csv')
num_rows = df.shape[0]  # number of rows in the dataset
num_cols = df.shape[1]  # number of columns in the dataset
print("rows=", num_rows, "cols=", num_cols)
print("columns", df.columns.tolist())

The dataset has 61 columns and 64461 rows. Three of the columns can help me answer my questions.

‘LanguageDesireNextYear’: which programming, scripting, and markup languages do you want to work in over the next year?

‘DatabaseDesireNextYear’: which database environments do you want to work in over the next year?

‘PlatformDesireNextYear’: which platforms do you want to work in over the next year?
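Before preparing the data, it can help to check how many respondents actually answered each of these three questions, since a column may contain missing values. The following is a minimal sketch on a tiny hand-made DataFrame; the `sample` rows are invented for illustration and are not taken from the survey. With the real data, the `df` loaded above would take the place of `sample`.

```python
import pandas as pd

# Hypothetical stand-in for the survey data (invented rows, not real answers).
sample = pd.DataFrame({
    "LanguageDesireNextYear": ["Python;Rust", None, "Go"],
    "DatabaseDesireNextYear": ["PostgreSQL", "MySQL;MongoDB", None],
    "PlatformDesireNextYear": ["Linux", None, None],
})

desire_cols = [
    "LanguageDesireNextYear",
    "DatabaseDesireNextYear",
    "PlatformDesireNextYear",
]

# Count the non-missing answers in each of the three columns.
answered = sample[desire_cols].notna().sum()
print(answered)
```

Rows with a missing value in a column carry no information for that question, so they need to be dropped before counting.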

Data Preparation

First, let’s look at the values of ‘LanguageDesireNextYear’. The code below counts how often each value occurs.

desire_L = df.LanguageDesireNextYear.value_counts()#Provide a pandas series of the counts for each value

The top 5 values:

Python 1152
Rust 528
HTML/CSS;JavaScript;TypeScript 499
C# 461
Go 412

Each value can be a combination of several languages separated by semicolons, so it needs to be split before counting. After splitting, I add up the number of times each language appears. The helper class below does this.

class helper():

    def __init__(self, column, dataframe):
        self.column = column
        # Drop only rows with missing values in self.column
        self.dataframe = dataframe.dropna(subset=[self.column], how="all")
        self.dic = {}

    def split_sum_value(self):
        self.dataframe.reset_index(drop=True, inplace=True)

        for i in range(self.dataframe.shape[0]):
            # Split the combined value into individual choices
            temp_list = self.dataframe[self.column][i].split(";")

            for j in range(len(temp_list)):
                if temp_list[j] in self.dic:
                    self.dic[temp_list[j]] += 1
                else:
                    self.dic[temp_list[j]] = 1

        return self.dic
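The same split-and-count step can also be written with pandas built-ins instead of an explicit loop. This is only an alternative sketch, not the code used for the figures in this article: `count_choices` is a hypothetical name, the sample rows are invented, and `Series.explode` needs pandas 0.25 or later.

```python
import pandas as pd

def count_choices(dataframe, column, sep=";"):
    """Count how often each individual choice appears in a multi-select column."""
    return (
        dataframe[column]
        .dropna()          # ignore respondents who skipped the question
        .str.split(sep)    # "Python;Rust" -> ["Python", "Rust"]
        .explode()         # one row per individual choice
        .value_counts()    # counts, sorted from most to least frequent
    )

# Invented rows for illustration only.
sample = pd.DataFrame({
    "LanguageDesireNextYear": ["Python;Rust", "Python", None, "Go;Python"],
})
counts = count_choices(sample, "LanguageDesireNextYear")
print(counts.to_dict())
```

The result is a pandas Series rather than a dict, so `counts.to_dict()` is needed wherever a plain dictionary is expected.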

The dataset is ready. Let’s dive into the exploratory analysis.

Exploratory Analysis

I input ‘LanguageDesireNextYear’ as the column to the helper class. Then I create a bar chart that shows the resulting counts.

# Create a helper instance and call the split_sum_value() method
LanguageDesireNextYear_sum = helper("LanguageDesireNextYear", df)
dict_1 = LanguageDesireNextYear_sum.split_sum_value()
# Sort the languages by count, from least to most desired
dict_1 = dict(sorted(dict_1.items(), key=lambda item: item[1]))
# Create the bar chart
plt.bar(range(len(dict_1)), list(dict_1.values()), align='center')
plt.title("The rank of LanguageDesireNextYear")
plt.xticks(range(len(dict_1)), list(dict_1.keys()), rotation='vertical')
plt.tight_layout()  # make room for the labels
# Save the figure
plt.savefig('The_rank_of_LanguageDesireNextYear.png')
plt.show()
Figure 1–1

In Figure 1–1, the top two languages are Python and JavaScript, which matches my expectation. Python is the language most people use in the artificial intelligence field, which is a hot technology area.
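The chart ranks languages by raw counts. One way to make such counts easier to interpret is to express each one as a share of the respondents who answered the question. The sketch below uses invented numbers (both `counts` and `n_respondents` are hypothetical, not survey results). Because each respondent can pick several languages, the shares do not add up to 100%.

```python
import pandas as pd

# Hypothetical counts and respondent total, for illustration only.
counts = pd.Series({"Python": 3, "JavaScript": 2, "Rust": 1})
n_respondents = 4  # respondents with a non-missing answer in the column

# Percentage of answering respondents who selected each language.
share = (counts / n_respondents * 100).round(1)
print(share.sort_values(ascending=False))
```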

Second, I input ‘DatabaseDesireNextYear’ as the column to the helper class and got Figure 1–2 below.

Figure 1–2

The most desired database is PostgreSQL, which is a free and open-source relational database management system. The next two are MongoDB and MySQL, both of which are also free to download and have publicly available source code.

Third, I input ‘PlatformDesireNextYear’ as a column to the helper class and got Figure 1–3 below.

Figure 1–3

The plot shows that the most desired platform is Linux.
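Since the same chart is drawn for all three columns, the plotting steps can be wrapped in one reusable function. The following is a minimal sketch rather than the exact code behind the figures: `plot_rank`, the demo counts, and the output file name are all hypothetical, and the `Agg` backend line is an assumption for running outside a notebook (it can be dropped in Jupyter).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file without a display
import matplotlib.pyplot as plt

def plot_rank(counts, title, filename):
    """Draw a bar chart of counts sorted from smallest to largest and save it."""
    ordered = dict(sorted(counts.items(), key=lambda item: item[1]))
    fig, ax = plt.subplots()
    ax.bar(range(len(ordered)), list(ordered.values()), align="center")
    ax.set_title(title)
    ax.set_xticks(range(len(ordered)))
    ax.set_xticklabels(list(ordered.keys()), rotation="vertical")
    fig.tight_layout()  # make room for the rotated labels
    fig.savefig(filename)
    plt.close(fig)
    return ordered

# Invented counts for illustration only.
demo_counts = {"Linux": 5, "Windows": 3, "Docker": 4}
ordered = plot_rank(
    demo_counts,
    "The rank of PlatformDesireNextYear",
    "platform_rank_demo.png",
)
```

With the real data, the dictionary returned by `split_sum_value()` could be passed in place of `demo_counts`.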

Conclusion

The Stack Overflow dataset is very useful. With it, I found answers to my three questions and showed some visualizations.

Which Language will most people desire in 2021? Answer: Python

Which Database will most people desire in 2021? Answer: PostgreSQL

Which Platform will most people desire in 2021? Answer: Linux
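The three answers can also be read off programmatically instead of from the charts. The sketch below runs on a small invented DataFrame, so its output does not reproduce the survey results; `top_choice` is a hypothetical helper name.

```python
import pandas as pd

def top_choice(dataframe, column, sep=";"):
    """Return the most frequently selected option in a column and its count."""
    counts = dataframe[column].dropna().str.split(sep).explode().value_counts()
    return counts.idxmax(), int(counts.max())

# Invented rows for illustration only.
sample = pd.DataFrame({
    "LanguageDesireNextYear": ["Python;Rust", "Python", "Go"],
    "DatabaseDesireNextYear": ["PostgreSQL;MySQL", "PostgreSQL", None],
    "PlatformDesireNextYear": ["Linux", "Linux;Docker", "Windows"],
})

for col in sample.columns:
    name, n = top_choice(sample, col)
    print(f"{col}: {name} ({n})")
```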

Yangfan


Data Scientist
