This project is about gaining insights from Brazilian real estate data.
Analysis was conducted using visualizations to uncover trends and patterns.
The following questions guided the project:
- Which regions top the Brazilian real estate market?
- Is there any relationship between property size and prices in Brazil?
- Which states top the Brazilian real estate market?
There were two (2) datasets with similar columns.
After pre-processing, both datasets contained the following columns:
- property_type: The type of property, such as "apartment" and "house".
- state: The state where the property is located, e.g., "São Paulo" or "Pernambuco".
- region: The macro-region of Brazil in which the property's state lies, such as "North East" or "South".
- lat (latitude): The geographic coordinate indicating how far north or south the property is located.
- lon (longitude): The geographic coordinate indicating how far east or west the property is located.
- area_m2: The size of the property in square meters (m²).
- price_brl: The price of the property in Brazilian real (BRL).
- price_m2: The price of the property per square meter, calculated as price_brl / area_m2.
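The price_m2 formula can be illustrated with a minimal pandas sketch. It assumes the data sit in a dataframe with numeric 'price_brl' and 'area_m2' columns; the two rows are made-up values for illustration, not figures from the project's datasets.

```python
import pandas as pd

# Made-up rows for illustration only; not values from the project's datasets.
df = pd.DataFrame({"area_m2": [50.0, 120.0], "price_brl": [250000.0, 480000.0]})

# Price per square meter = price_brl / area_m2, computed element-wise per row.
df["price_m2"] = (df["price_brl"] / df["area_m2"]).round(2)
print(df["price_m2"].tolist())  # [5000.0, 4000.0]
```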
The datasets were downloaded to a local repository dedicated to this project.
After the required library packages were loaded, each dataset was loaded into a separate dataframe.
The first dataset (in the dataframe 'dn1') was pre-processed by extracting each row's state from
the original 'place_with_parent_names' column.
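The state extraction can be sketched in pandas as follows. This is a minimal sketch, not the project's exact code: it assumes 'place_with_parent_names' holds pipe-delimited strings of the form "|Brasil|<state>|<city>|", and the two sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real column is assumed to be pipe-delimited.
dn1 = pd.DataFrame({
    "place_with_parent_names": [
        "|Brasil|Alagoas|Maceió|",
        "|Brasil|Pernambuco|Recife|",
    ]
})

# Splitting on "|" yields ["", "Brasil", <state>, <city>, ""], so the state is column 2.
dn1["state"] = dn1["place_with_parent_names"].str.split("|", expand=True)[2]
print(dn1["state"].tolist())  # ['Alagoas', 'Pernambuco']
```

If some rows carry fewer levels (for example, no city), the state still sits at position 2 as long as the leading "|Brasil|" prefix is present.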
Contracted records in the 'region' column were split into separate words with a space character to enhance readability.
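One way to insert the space is a regular-expression replacement. The sketch below assumes the contracted names are written in camel case (e.g. "NorthEast"); the sample values are hypothetical, and names that are not camel-cased would need an explicit mapping instead.

```python
import pandas as pd

# Hypothetical contracted region values; the dataset's real spellings may differ.
dn1 = pd.DataFrame({"region": ["NorthEast", "CentralWest", "South"]})

# Insert a space wherever a lowercase letter is immediately followed by an uppercase one.
dn1["region"] = dn1["region"].str.replace(r"(?<=[a-z])(?=[A-Z])", " ", regex=True)
print(dn1["region"].tolist())  # ['North East', 'Central West', 'South']
```

The same replacement would apply unchanged to the 'region' column of the second dataset.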
The 'lat' and 'lon' columns were created by splitting the original 'lat-lon' column.
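A minimal sketch of that split, assuming each 'lat-lon' value is a single "latitude,longitude" string separated by a comma; the sample coordinates are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the original values are assumed to look like "lat,lon".
dn1 = pd.DataFrame({"lat-lon": ["-9.6443051,-35.7088142", "-8.0476,-34.8770"]})

# Split on the comma into two columns and convert both to float.
parts = dn1["lat-lon"].str.split(",", expand=True).astype(float)
dn1["lat"] = parts[0]
dn1["lon"] = parts[1]

# The combined column is no longer needed once lat and lon exist.
dn1 = dn1.drop(columns="lat-lon")
print(dn1)
```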
The original 'price_usd' column, stored as the 'object' data type, was first stripped of special
characters such as trailing white space, the dollar ($) symbol, and commas (,). The result was
converted to the 'float' data type, which is appropriate for currency records, multiplied by the
approximate USD/BRL exchange rate of 6.20 to obtain the price in BRL, and rounded to two (2)
decimal places.
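The cleaning and conversion steps above can be chained in pandas as sketched below. The raw strings are hypothetical examples of an 'object' column, and the output column name 'price_brl' is taken from the column list rather than confirmed as the project's exact code.

```python
import pandas as pd

USD_TO_BRL = 6.20  # approximate USD/BRL exchange rate stated in the text

# Hypothetical raw values stored as strings (object dtype).
dn1 = pd.DataFrame({"price_usd": ["$67,331.18 ", "$1,250,000.00 "]})

dn1["price_brl"] = (
    dn1["price_usd"]
    .str.strip()                        # remove surrounding white space
    .str.replace("$", "", regex=False)  # remove the dollar symbol
    .str.replace(",", "", regex=False)  # remove thousands separators
    .astype(float)                      # object -> float64
    .mul(USD_TO_BRL)                    # convert USD to BRL
    .round(2)                           # keep two decimal places
)
print(dn1["price_brl"].tolist())
```

Passing `regex=False` makes "$" a literal character; as a regular expression it would match only the end of the string.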
The only pre-processing applied to the second dataset (in the dataframe 'dn2') was splitting
contracted records with a space character for readability.
After the above steps were completed, copies of 'dn1' and 'dn2' containing only the selected columns
were created as the dataframes 'df1' and 'df2', which were then concatenated into a single dataframe, 'dn'.
The 'dn' dataframe was cleaned by dropping duplicate records.
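The select, concatenate, and de-duplicate sequence can be sketched as follows. The column subset and all rows are hypothetical; the text does not list exactly which columns were kept.

```python
import pandas as pd

# Hypothetical column subset; the project's actual selection is not stated.
COLUMNS = ["property_type", "state", "region", "area_m2", "price_brl"]

# Made-up rows; dn1 still carries a raw column that the selection discards.
dn1 = pd.DataFrame({
    "property_type": ["apartment", "house"],
    "state": ["State A", "State B"],
    "region": ["Region X", "Region Y"],
    "area_m2": [70.0, 150.0],
    "price_brl": [300000.0, 450000.0],
    "place_with_parent_names": ["|Brasil|State A|", "|Brasil|State B|"],
})
dn2 = pd.DataFrame({
    "property_type": ["house", "apartment"],
    "state": ["State B", "State C"],
    "region": ["Region Y", "Region Y"],
    "area_m2": [150.0, 60.0],
    "price_brl": [450000.0, 280000.0],
})

# Copies restricted to the shared columns, so later edits don't touch dn1/dn2.
df1 = dn1[COLUMNS].copy()
df2 = dn2[COLUMNS].copy()

# Stack the two frames, then drop rows that are identical across all columns.
dn = pd.concat([df1, df2], ignore_index=True).drop_duplicates().reset_index(drop=True)
print(len(dn))  # 3
```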
The following trends were uncovered from the datasets.
It was discovered that Distrito Federal, Minas Gerais, Ceara, Para, and Amapa are
the five (5) states leading the Brazilian real estate market.
The Central West, North, and North East are the regions leading the real estate market.
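The text does not say which metric defines a "leading" state or region. The sketch below assumes it is the mean 'price_brl' per group (mean price_m2 or listing counts would be alternatives); the rows and labels are made up, so the ranking it prints is not the project's result.

```python
import pandas as pd

# Made-up rows for illustration only; these are not the project's results.
dn = pd.DataFrame({
    "state": ["State A", "State A", "State B", "State C"],
    "region": ["Region X", "Region X", "Region Y", "Region Y"],
    "price_brl": [900000.0, 700000.0, 300000.0, 500000.0],
})

# Mean price per state, highest first, keeping at most the top five.
top_states = dn.groupby("state")["price_brl"].mean().sort_values(ascending=False).head(5)

# Mean price per region, highest first.
top_regions = dn.groupby("region")["price_brl"].mean().sort_values(ascending=False)

print(top_states)
print(top_regions)
```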
From a combination of the scatter plot and the correlation matrix, the relationship between property
size and property price is positive and moderately strong, with a correlation coefficient of about 0.50.
The data points are concentrated in the lower-left region of the scatter plot, which implies that most
Brazilian real estate properties are small to medium-sized and typically priced in the low to mid range.
This is the relatively affordable segment of the market.
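A size/price correlation like the one described can be computed with pandas' `DataFrame.corr`, which uses the Pearson coefficient on a -1 to 1 scale by default. The values below are made up for illustration, so the coefficient they produce says nothing about the real data.

```python
import pandas as pd

# Made-up values for illustration; not the project's data.
dn = pd.DataFrame({
    "area_m2": [50.0, 80.0, 120.0, 200.0],
    "price_brl": [200000.0, 260000.0, 250000.0, 600000.0],
})

# Pearson correlation matrix of size vs. price (pandas' default method).
corr_matrix = dn[["area_m2", "price_brl"]].corr()
r = corr_matrix.loc["area_m2", "price_brl"]

print(corr_matrix)
print(f"r = {r:.2f}")
```

A matching scatter plot can be drawn with `dn.plot.scatter(x="area_m2", y="price_brl")`, which requires matplotlib to be installed.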
Of the two (2) property types, apartments are usually more expensive than houses.
States like Amazonas, Distrito Federal, Ceara, Minas Gerais, Para, and Parana are Brazilian states with
leading real estate markets. This is largely due to their political and socio-economic significance.
They contain major urban centers, which means their populations are higher. States like Amazonas,
Minas Gerais, Para, and Ceara are hubs for export, agriculture, and mining, among other economic
activities. As these economic activities attract more people, property prices keep rising.
Distrito Federal and Amazonas host national and regional capitals of Brazil: Brasília, the national
capital, lies in Distrito Federal, and Manaus, the capital of Amazonas, is a major regional center. This
reflects the decentralization strategy of the country. States like these usually receive substantial
infrastructure investment, which indirectly pushes property prices up.
The Central West, North, and North East are the leading regions in the real estate market.
These regions were once overlooked but are now experiencing rapid growth and attracting investment.
With the Central West now a leading agricultural hub, the North East an emerging tourist destination, and
the North leading in industry and mining, property values are rising to meet demand from new settlers in these regions.
Overall, we found that property prices tend to increase with property size, and that political and
socio-economic factors influence property values across Brazil's regions and states.
I would especially like to thank Mr Dada Dayo for making this project possible.






