How Do YOU Become an AirBnB Host?

A data-based approach using Kaggle’s Boston AirBnB dataset from 2017

Laila Shahreen
Dec 23, 2020
Boston City Center

When I first heard about AirBnB, I was really excited about the boundless adventures and experiences that come with renting your dream place to stay, be it a peaceful mountain house, a serene spot near a lake, or the active, cheerful and festive touch of a lively city. Woohoo! I am always thrilled to travel!

Being an AirBnB guest is fun, but how about being a host? Isn’t it more exciting? Or does it bring more responsibility and management tasks? Well, here is my viewpoint, based on an analysis of the Boston AirBnB dataset.

The Boston AirBnB dataset from here is a great resource for future hosts, and also guests, who want to get familiar with the AirBnB rental business. If you ever plan to visit Boston to enjoy nearby places and attractions, or to make a single-night stop for personal business, do NOT hesitate to do it.

The dataset has three files: listings, reviews and calendar. The listings file includes more than 100 descriptions of amenities, location, host type, property type, and price for each listing. This file is used for answering two of the main questions. The reviews file contains comments from guests, while the calendar file contains records of the availability of each listing.

Let’s jump into real business now.

Business Understanding

Whether or not you want to establish yourself as a host, a few things are worth keeping in mind. AirBnB pricing depends not only on the supply and demand cycle but also on amenities, property type and, most importantly, location. Can we show that with the analysis here, and can we predict a sensible price before you list your place? There are also legal questions about whether you are allowed to rent out your property at all.

To help you on your quest, let’s answer three questions and get you on the way to becoming a superhost.

Part I: What factors affect the price of an AirBnB rental? Let’s find out!

After assessing and cleaning the data, and analyzing the features for predicting price through univariate, bivariate and multivariate exploration, I obtained the following visuals, which show some important findings.
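
As an illustration of the cleaning step, here is a minimal sketch of turning price strings into numbers. It assumes the price column holds text such as "$1,250.00"; the helper name clean_price and the sample rows are made up for this example and are not taken from the original notebook.

```python
import pandas as pd

def clean_price(series: pd.Series) -> pd.Series:
    """Convert strings such as '$1,250.00' to floats such as 1250.0."""
    # Drop dollar signs and thousands separators before parsing.
    cleaned = series.astype(str).str.replace(r"[$,]", "", regex=True)
    # Anything that still cannot be parsed (e.g. missing values) becomes NaN.
    return pd.to_numeric(cleaned, errors="coerce")

# Tiny made-up sample standing in for the listings file.
listings = pd.DataFrame({"price": ["$250.00", "$1,250.00", None]})
listings["price"] = clean_price(listings["price"])
print(listings)
```

Coercing unparseable values to NaN keeps the step from failing on missing prices, so they can be dropped or imputed explicitly later.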

Figure 1: Price distribution against property type
Figure 2: Price distribution against number of beds

The correlation heat-map below indicates that price is moderately correlated with the number of beds, bedrooms and bathrooms, along with guests included. The other features of interest are negatively correlated with price, but those correlations are not significant.

Figure 3: Correlation depiction via Heat-map among variables
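
The numbers behind a heat-map like this can be sketched with pandas alone. The values below are made up, and the column names are assumptions chosen to resemble the listings file; only the technique (a Pearson correlation matrix) is the point.

```python
import pandas as pd

# Made-up numbers; the column names are assumed to resemble the listings file.
sample = pd.DataFrame({
    "price":     [60, 85, 120, 150, 210, 300],
    "beds":      [1, 1, 2, 2, 3, 4],
    "bedrooms":  [1, 1, 1, 2, 2, 3],
    "bathrooms": [1.0, 1.0, 1.0, 1.5, 2.0, 2.0],
})

# Pearson correlation of every numeric column with every other one.
corr = sample.corr()

# The "price" column of the matrix corresponds to the price row of a heat-map.
price_corr = corr["price"].drop("price").sort_values(ascending=False)
print(price_corr)
```

The full corr matrix is what a plotting library would colour in; sorting the price column is just a quick way to read off the strongest relationships.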

Now it’s time for modeling!

Modeling

Other categorical variables such as amenities, property type, bed type, reviews and neighborhood are missing from the heat-map above. They are presented in the plot below, which was obtained through modeling.

Since we have explored many features and their effects on pricing, we can also model pricing based on the features of interest (FOI) that seem instrumental according to our own judgment and the preliminary EDA. The FOIs are some of the columns from the original dataset, along with some new columns with categorical values created from the amenities.
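
One way to create such amenity columns is sketched below. It assumes the amenities field is a single string like {TV,"Cable TV",Gym}; the helper name amenities_to_columns and the amenity_ prefix are hypothetical choices for this example.

```python
import pandas as pd

def amenities_to_columns(amenities: pd.Series) -> pd.DataFrame:
    """Turn '{TV,"Cable TV",Gym}' style strings into one 0/1 column per amenity."""
    # Strip the braces and quotes so only comma-separated names remain.
    cleaned = amenities.fillna("").str.replace(r'[{}"]', "", regex=True)
    # str.get_dummies splits on the separator and one-hot encodes the pieces.
    return cleaned.str.get_dummies(sep=",").add_prefix("amenity_")

# Made-up sample rows; the last listing has no amenities at all.
sample = pd.Series(['{TV,"Cable TV",Gym}', "{TV,Washer}", "{}"])
flags = amenities_to_columns(sample)
print(flags)
```

Each resulting column can then be joined back onto the listings and used as a model feature.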

Of course, this step may not produce the most accurate pricing model, but we can start with our selected features anyway. It is important to note that we strictly limited our model features to property type, amenities and location, and we did not take into account many other crucial features, such as peak seasons, other pricing information and host type, to name a few.

However, our model was still able to account for about two-thirds of the price variation. The R squared value of our model is 0.64, which means that 64% of the variation in property price is explained by our model.
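
To make the R squared idea concrete, here is a small sketch on synthetic data. The helper r_squared, the made-up bedrooms and price arrays, and the plain least-squares fit are all assumptions for illustration; this is not the model from the analysis.

```python
import numpy as np

def r_squared(y_true, y_pred) -> float:
    """Share of the variance in y_true that the predictions account for."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up data: price driven by bedrooms plus noise the model cannot see.
rng = np.random.default_rng(0)
bedrooms = rng.integers(1, 5, size=200)
price = 80 + 40 * bedrooms + rng.normal(0, 30, size=200)

# Ordinary least squares fit of price on an intercept and bedrooms.
X = np.column_stack([np.ones(len(bedrooms)), bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
score = r_squared(price, X @ coef)
print(f"R squared: {score:.2f}")
```

A perfect fit gives 1, and always predicting the mean price gives 0, so a value of 0.64 sits roughly two-thirds of the way towards a perfect fit.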

Figure 4: How much does the relative price change with your features?

The model leaves little doubt that relative price is mostly driven by location: the closer the neighborhood is to the Boston city center, the more expensive it is. This is backed by the fact that most attractions, activities and business centers are concentrated around the city center.

So the minutes you save by staying closer to the city center can be worth it, even if you have to spend some extra dollars. The main findings from the price model are described below.

Neighborhood matters. In the Leather District, the model suggests you can ask for about 100 dollars extra per night. On the other hand, Jamaica Plain, a quiet residential area farther from the city center with decent public transportation and other facilities, could be a very good deal for guests.

To name other influential features, property types such as villa, guesthouse and boat (which is quite exotic and sounds premium) may offer unique experiences for a few additional bucks.

The number of bedrooms surely plays an important role in figuring out how to set your price. For each additional bedroom, you can ask for 39.2 dollars more, while each additional bathroom adds an extra 8.04 dollars.

The number of beds does not add that much to the cost. Bed type can change the pricing, though: a pull-out sofa, futon or couch can save a guest up to 53 dollars compared with the average price.

Whether the room type is private or shared makes little difference: pricing is on the lower side for both.

Cancellation policy also has an impact on pricing. If it is moderate, a slight positive change can occur in pricing. If it is strict, guests get the benefit of saving a few bucks.

Now look at the amenities. Wheelchair access, washer, wireless intercom, elevator in the building, gym, indoor fireplace, TV, breakfast, cable TV and dogs are the top amenities associated with higher prices. It is great to have accessibility for everyone.

Apparently, the number of reviews and the review score ratings do not play a significant role in pricing, which is a bit surprising.

Although this model does not reach more than 90% accuracy, it could be improved by increasing its complexity and adding other features. Still, the model speaks to many influential features, and for basic guidance we can use it to set a price for a particular listing.

Part II: Can You Describe the Vibe of Each Boston Neighborhood Using Listing Descriptions?

Here comes our second part, where a neighborhood analysis in several steps shows how pricing varies with location. We first built a graphical representation of mean price by neighborhood.

Figure 5: Which neighborhood is pricey?
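
The mean price per neighborhood behind a chart like this can be sketched with a simple group-by. The rows below are made up, and neighbourhood_cleansed is an assumed column name for the neighborhood field.

```python
import pandas as pd

# Made-up rows; "neighbourhood_cleansed" is an assumed column name.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Bay Village", "Bay Village", "Jamaica Plain",
                               "Jamaica Plain", "Back Bay"],
    "price": [270.0, 250.0, 90.0, 110.0, 240.0],
})

# Average price per neighborhood, most expensive first.
mean_price = (
    listings.groupby("neighbourhood_cleansed")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_price)
```

Sorting in descending order makes the top price zone easy to read off before plotting.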

Findings: Bay Village, Leather District and South Boston Waterfront are the top three locations by average price, which ranges between USD 250–260. Downtown, Chinatown and Back Bay are the next three in line and have a similar range of average prices. Let’s explore which amenities or other features influence these areas’ listings and popularity.

In this part of the analysis, we took the neighborhood_overview column and converted the text into a word cloud. This technique is really useful because it magnifies the most frequently used words.

Obviously, the two price zones offer different vibes. In the lower price group, Jamaica Plain could be considered a popular place to choose. This is also evident in our previous analysis of price vs total number of reviews_scores_rating.

A word cloud is a useful tool for figuring out which phrases are used most prominently in a set of comments or descriptions. It can tell you the vibe of a neighborhood: whether it is diverse, lively, quiet or busy.

Figure 6: Word cloud for lower price group
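
A word cloud is essentially a picture of word frequencies, so the idea can be sketched without any plotting library by counting words directly. This is a plain frequency count, not the word cloud code itself; the stop-word list, the helper top_words and the overview snippets are all made up for illustration.

```python
import re
from collections import Counter

# A short illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "with", "this", "from"}

def top_words(texts, n=5):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up overview snippets standing in for neighborhood_overview.
overviews = [
    "Quiet and diverse neighborhood with great restaurants.",
    "Diverse community, restaurants and parks within walking distance.",
    "Restaurants, shops and the subway are a short walk away.",
]
print(top_words(overviews, n=3))
```

The words with the highest counts are the ones a word cloud would draw largest.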

Jamaica Plain has a diverse range of available listings and affordable pricing. This area is also safe, and the community has a well-balanced cultural blend. In both word clouds, “restaurant” is an extensively used word to attract travelers, which is quite understandable. You may have to advertise your listing to get customers and better earnings too.

In the higher-price word cloud, Bay Village is definitely expensive, along with Fenway Park, South End, Back Bay and other areas, and this cloud confirms it again. Another important point is that the listings there are mostly apartments, whereas in the lower price zones they are mostly houses.

Both neighborhood groups also offer facilities within walking distance, be it public transportation, shopping or restaurants.

Figure 7: Word cloud for higher price group

Part III: What are the busiest times of the year to visit Boston? By how much do prices spike?

To check how the AirBnB business and its popularity among guests changed over time with the available data, we used a data frame created from the calendar.csv file. We plotted the total number of homes available on each date and the average price for each date over 2016–2017.

Figure 8: Timeline of availability of properties and pricing trend
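
The two series in a timeline like this can be sketched as a per-date aggregation. The layout below (listing_id, date, an available flag of "t" or "f", and a price string) is an assumption about calendar.csv, and the rows are made up.

```python
import pandas as pd

# Made-up rows in the assumed calendar.csv layout: one row per listing per date.
calendar = pd.DataFrame({
    "listing_id": [1, 2, 3, 1, 2, 3],
    "date": ["2017-04-14"] * 3 + ["2017-04-15"] * 3,
    "available": ["t", "t", "f", "t", "f", "f"],
    "price": ["$100.00", "$150.00", None, "$240.00", None, None],
})

calendar["date"] = pd.to_datetime(calendar["date"])
calendar["price"] = pd.to_numeric(
    calendar["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

# Keep only the nights that are still bookable, then summarise per date.
open_nights = calendar[calendar["available"] == "t"]
daily = open_nights.groupby("date").agg(
    available_listings=("listing_id", "count"),
    average_price=("price", "mean"),
)
print(daily)
```

Plotting available_listings and average_price against the date index would give the two lines of the timeline.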

Findings

Rental availability is not stable all year round. From September to November, availability increases. It remains relatively stable from 01/2017 to 09/2017, although the reason for the two sudden drops in home supply is unclear.

Price drops as home supply increases from 09/2016 to 12/2016. The sudden drop in supply in 03/2017 does not drive the price up. However, the sudden drop in supply before 05/2017 sends the price rocketing up, so I guess it has something to do with a change in demand.

There seems to be a small periodic price cycle, which may correspond to weekends.
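
One way to check the weekend hunch is to average the daily prices by day of the week. The sketch below reuses the names average_price and average_prices from the snippet further down, but the numbers are made up and the weekend bump is built in on purpose.

```python
import pandas as pd

# Made-up daily averages over two weeks, higher on Fridays and Saturdays.
dates = pd.date_range("2017-01-02", periods=14, freq="D")  # starts on a Monday
prices = [180, 180, 180, 180, 200, 200, 180] * 2
average_price = pd.DataFrame({"average_prices": prices}, index=dates)

# Average the daily figure by day of week to expose a weekly pattern.
by_weekday = average_price.groupby(average_price.index.day_name())["average_prices"].mean()
print(by_weekday)
```

If the real data showed clearly higher means for Friday and Saturday, that would support reading the small cycle as a weekend effect.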

The sudden drop in supply and the spike in price may indicate a special event or a popular activity going on in town. Let’s take a closer look at the date.

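# Keep only the dates after 1 March 2017.
average_price_sub = average_price[average_price.index > '2017-03-01']

# Show the row with the highest average price in that window.
average_price_sub[average_price_sub.average_prices == average_price_sub.average_prices.max()]

            average_prices
date
2017-04-15      235.501618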

After googling, I found that this date fell on the weekend of the Boston Marathon, which was run on Monday, April 17, 2017, and certainly drew a lot of tourists to town. So, no surprise that the price went up.

Conclusions

Since this is a large dataset, it offers scope to answer many more questions than we did in this analysis. But since we could predict pricing and understand the neighborhoods well, along with the AirBnB business trend (demand and supply over a year), we can call this the finish line. The main findings include:

  1. Neighborhood matters most. Neighborhoods near Downtown are easy to access and a short walk from public transport lines, so they are the most wanted, while localities a bit farther from the city center offer quiet but active and diverse communities at lower prices.
  2. Number of bedrooms and property type are also important for setting prices.
  3. Availability is not stable all year round. It varies from September to November, and from January to September it remains pretty stable.
  4. A few amenities, such as wheelchair accessibility, washer, dryer, TV and elevator, add value to the property and its pricing.
  5. Number of reviews and review score rating don’t affect pricing that much, which is quite surprising.
  6. The R squared value is 0.64, which means our linear model explains 64% of the price variability. That is not bad, but there is still room for improvement for better prediction.

With the above insights, I am pretty comfortable with becoming an Airbnb host. Are you? I hope you got a little bit of inspiration to do so!

References

The libraries used, dataset, and detailed code breakdown are available on my GitHub: https://github.com/lailashahreen17/Boston-AirBnB-Data-Exploration-and-Modeling

Other inspiration came from the following:

  1. https://www.geeksforgeeks.org/generating-word-cloud-python/
  2. https://www.kaggle.com/residentmario/modeling-prices
  3. https://stackoverflow.com/questions/38516481/trying-to-remove-commas-and-dollars-signs-with-pandas-in-python
