
How to filter PhilGEPS data with R and RStudio

PRE-REQUISITE: You must have BOTH R and RStudio installed on your computer. If you don’t, download the installers from the following links and install them!

R - https://cran.stat.upd.edu.ph
RStudio - https://www.rstudio.com/

Once you have both R and RStudio up and running, we can proceed to the filtering!

STEP 1 – Go to www.PHILGEPS.GOV.PH and open the “Open Data” section. Try downloading the Excel files available there. We often use the Invitations to Bid and Notices of Award datasets.

STEP 2 – The files are in XLSX format. We prefer to export them to CSV because it’s easier to ingest and because CSV is an #OpenData format. So yeah. ALSO! Make sure that there are no blank rows at the top. The top row automatically becomes the ‘header’ once the file is ingested in RStudio, so make sure the top row is the one that contains the column labels (unless you write extra lines of code to handle that, of course, so let’s keep it simple).

STEP 3 – Ingest the CSV file in RStudio as a dataframe. Normally, we do the following:

DATA_FRAME_NAME <- read.csv("PATH/TO/FILE.csv")
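
To check that the file came in the way you expect, here is a minimal sketch. The file and its columns are made up for illustration (a tiny stand-in for a PhilGEPS export, not the real layout), so replace the path with your own. It also shows why column labels with spaces end up with dots in R.

```r
# Build a tiny stand-in CSV so the example runs on its own.
# The columns and values are made up; a real PhilGEPS export will differ.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Organization Name,Notice Title,Approved Budget",
  "DEPARTMENT OF HEALTH - REGIONAL OFFICE V,Supply of Medicines,150000",
  "CITY GOVERNMENT OF NAGA,Road Repair,900000"
), csv_path)

# Ingest the CSV; the first row becomes the column names.
notices <- read.csv(csv_path)

names(notices)  # column labels; spaces are turned into dots
str(notices)    # column types and a preview of the values
head(notices)   # first few rows
```

If the column names look wrong here, the usual cause is a blank or title row above the real header in the CSV.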

STEP 4 – Now that you have a dataframe in R, you can perform basic operations on it. For example, we normally filter by the name of the agency we are interested in, with something like:

DATA_FRAME_NAME_NEW <- subset(DATA_FRAME_NAME, ColumnName == "ParameterHere")
Example:

JUL_SEP_2018_sub <- subset(JUL_SEP_2018, Organization.Name=="DEPARTMENT OF HEALTH - REGIONAL OFFICE V")

We now have a NEW dataframe, with only the data from Department of Health Regional Office V. 🙂 But remember! The strings are case sensitive, so you have to make sure that you are typing the name exactly as it appears in the data. You can do a quick search in the raw table and copy-paste the parameter just to be sure 🙂
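
If you would rather check the spelling inside R, the sketch below is one way to do it. The sample rows are made up for illustration, and the relaxed match (upper-casing and trimming spaces before comparing) is our suggestion here, not something PhilGEPS data requires.

```r
# Made-up sample standing in for the ingested PhilGEPS dataframe.
JUL_SEP_2018 <- data.frame(
  Organization.Name = c(
    "DEPARTMENT OF HEALTH - REGIONAL OFFICE V",
    "Department of Health - Regional Office V ",  # different case, trailing space
    "CITY GOVERNMENT OF NAGA"
  ),
  Approved.Budget = c(150000, 80000, 900000),
  stringsAsFactors = FALSE
)

# List the distinct agency names exactly as they appear in the data.
sort(unique(JUL_SEP_2018$Organization.Name))

# An exact match finds only the row whose spelling is identical.
exact <- subset(JUL_SEP_2018,
                Organization.Name == "DEPARTMENT OF HEALTH - REGIONAL OFFICE V")

# Normalising case and surrounding spaces first catches the variant as well.
clean_name <- toupper(trimws(JUL_SEP_2018$Organization.Name))
relaxed <- JUL_SEP_2018[clean_name == "DEPARTMENT OF HEALTH - REGIONAL OFFICE V", ]

nrow(exact)    # 1
nrow(relaxed)  # 2
```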

You can perform several other operations on the dataframe! You can remove columns, count occurrences, join two or more dataframes, and more. To find out more operations, you can google for “R Cheatsheets” for commands and examples.
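
To give a feel for those operations, here is a small sketch using base R only. The two dataframes and their column names are made up for illustration, so adjust them to match your own data.

```r
# Made-up sample data; the column names are assumptions for illustration.
awards <- data.frame(
  Organization.Name = c("AGENCY A", "AGENCY A", "AGENCY B"),
  Award.Title = c("Medicines", "Office Supplies", "Road Repair"),
  Contract.Amount = c(150000, 20000, 900000),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  Organization.Name = c("AGENCY A", "AGENCY B"),
  Region = c("Region V", "Region V"),
  stringsAsFactors = FALSE
)

# Remove a column by keeping every column except the one named.
awards_slim <- awards[, names(awards) != "Award.Title"]

# Count how many rows each agency has.
counts <- table(awards$Organization.Name)

# Join two dataframes on a shared column.
joined <- merge(awards, regions, by = "Organization.Name")
```

Printing `counts` shows one number per agency, and `joined` carries the columns of both dataframes for every matching row.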

STEP 5 – Once you are satisfied with your final dataframe, we request that you save it as a CSV file with the following command:

write.csv(DATA_FRAME_NAME, file = "PREFERRED_FILE_NAME.csv")
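
One optional tweak, sketched below with a made-up dataframe: by default write.csv also writes R's row numbers as an extra first column, which can confuse people who open the file in a spreadsheet. Passing row.names = FALSE leaves that column out. The temporary file path is only there so the example runs on its own.

```r
# Made-up stand-in for the filtered dataframe.
JUL_SEP_2018_sub <- data.frame(
  Organization.Name = "DEPARTMENT OF HEALTH - REGIONAL OFFICE V",
  Approved.Budget = 150000,
  stringsAsFactors = FALSE
)

# Write to a temporary file here; use your preferred file name instead.
out_path <- tempfile(fileext = ".csv")

# row.names = FALSE leaves out the extra column of row numbers.
write.csv(JUL_SEP_2018_sub, file = out_path, row.names = FALSE)

# Read it back to confirm the file has the same columns and rows.
check <- read.csv(out_path)
names(check)
nrow(check)
```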

This is because, chances are, some other researcher, or concerned citizen (who isn’t familiar with R) would need the dataset you just made. Sharing is Caring! 🙂

STEP 6 – Finally, you can share your cleaned datasets in our repository! We will make sure to credit you with your preferred name/nickname. For now, send us your dataset and we will upload it for you. We are working on a way for you to upload it yourself :)

All data uploaded will be available as open datasets. They are free for all, so that we can encourage more and more researchers and advocates to use data and be data-driven in their decision making.

Thank you very much! For more information, kindly email support@layertechlab.com. From time to time, we conduct hands-on trainings! Please let us know if you are interested!