Layertech Leads Data Scraping Session using R and R Studio

Supported by Hivos, Layertech led a data scraping session with R and R Studio, using official Philippine procurement datasets from the Open Data Portal of Philgeps.gov.ph.

The session attendees were faculty members of Bicol University College of Science IT Department, and faculty and graduating students of Southern Luzon Technological College Foundation, Inc.

While procurement datasets are available in open data formats, the large PhilGEPS datasets still need to be pre-processed, filtered, visualized and analyzed. Only then can researchers, advocates, and concerned citizens draw useful insights to aid them in their advocacy and decision making.

What’s the objective?

Layertech and partners advocate for #DataDrivenGovernance. We aim to encourage researchers to study and innovate on government procurement. For them to do that, they must be equipped with the necessary skills and tools to draw insights from procurement information that is available.

For this first session, the academe was specifically invited because the team also needs their insights, comments, and suggestions on the training design, so that we can improve the upcoming tech training sessions.

The team and partners will soon be deploying training sessions for young researchers, students, innovators, and faculty, on various topics such as:

Data Science
Data Analytics
Data Visualization
Python Programming
Machine Learning
Cybersecurity
Data Privacy

What are the results?

For the first training session with R, the participants were able to publish a total of 13 datasets (downloadable from our Open Data Portal) under various categories such as Health, Local Government, and Education.

The participants also actively presented their outputs and how these cleaned datasets can help increase transparency and efficiency in public procurement in the Philippines.

Want to know more about HOW to filter PhilGEPS datasets? Check out our quick, general guide, “How to filter PhilGEPS data with R and R Studio”.

Keep on visiting our “References” section for more procurement and data scraping, visualization, and analytics guides!

How to filter PhilGEPS data with R and R Studio

PRE-REQUISITE: You must have BOTH R and R Studio installed on your computer. If you don’t, download the installers from the following links and install them!

R - https://cran.stat.upd.edu.ph
R Studio - https://www.rstudio.com/

Once you have R and R Studio both up and running, we can proceed to the filtering!

STEP 1 – Go to www.PHILGEPS.GOV.PH and open the “Open Data” section. Try downloading the Excel files they have over there. We often use the datasets of the Invitations to Bid and Notices of Award.

STEP 2 – The files are in XLSX format. We prefer to export them to CSV because it’s easier to ingest and because CSV is an #OPENData format. So yeah. ALSO! Make sure there are no blank rows at the top of the sheet. The top row automatically becomes the ‘header’ once the file is ingested in R Studio, so make sure the top row is the one that contains the column labels (unless you key in extra arguments when reading the file, of course, but let’s keep it simple).
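If you would rather not edit the file by hand, here is a minimal sketch of the "extra arguments" route, using the skip argument of read.csv. The sample rows, the column labels, and the name bids are made up for illustration; they are not actual PhilGEPS data.

```r
# Hypothetical sample file: two title rows sit above the real header row.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Sample export,,",
  ",,",
  "Organization Name,Notice Title,Approved Budget",
  "DEPARTMENT OF HEALTH - REGIONAL OFFICE V,Medical Supplies,150000",
  "CITY OF LEGAZPI,Office Supplies,50000"
), csv_path)

# skip = 2 makes read.csv ignore the first two lines,
# so the third line is used as the header.
bids <- read.csv(csv_path, skip = 2)

names(bids)  # column labels taken from the third line of the file
nrow(bids)   # number of data rows below the header
```

You have to count the rows to skip yourself, so for a one-off file, deleting the extra rows before exporting to CSV is usually simpler.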

STEP 3 – Ingest the CSV file in RStudio as a dataframe. Normally, we do the following:

DATA_FRAME_NAME <- read.csv("PATH/TO/FILE.csv")
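After ingesting, it helps to check what R actually loaded. The sketch below writes a tiny made-up CSV to a temporary file (the rows and the name bids are assumptions, not real PhilGEPS records) and then inspects the resulting dataframe.

```r
# Hypothetical sample standing in for an exported PhilGEPS CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Organization Name,Notice Title,Approved Budget",
  "DEPARTMENT OF HEALTH - REGIONAL OFFICE V,Medical Supplies,150000",
  "CITY OF LEGAZPI,Office Supplies,50000"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as plain strings.
# It is already the default in R 4.0 and later; older versions need it spelled out.
bids <- read.csv(csv_path, stringsAsFactors = FALSE)

str(bids)    # column names and types
head(bids)   # first few rows
dim(bids)    # number of rows and columns
names(bids)  # spaces in the labels are converted to dots
```

Note how "Organization Name" in the file becomes Organization.Name in the dataframe. That is why the filter example in Step 4 uses a dot in the column name.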

STEP 4 – Now that you have a dataframe in R, you can perform basic operations on it. We normally start by filtering on the name of the agency that we are interested in, with something like:

DATA_FRAME_NAME_NEW <- subset(DATA_FRAME_NAME, ColumnName=="ParameterHere")
Example:

JUL_SEP_2018_sub <- subset(JUL_SEP_2018, Organization.Name=="DEPARTMENT OF HEALTH - REGIONAL OFFICE V")

We now have a NEW dataframe, with only the data from Department of Health Regional Office 5. 🙂 But remember! The strings are case sensitive, so you have to make sure that you are inputting the exact name. You can do a quick search in the raw table and copy-paste the parameter just to be sure 🙂
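If the agency name is typed inconsistently in the raw data, an exact match will miss some rows. Here is a minimal sketch of two ways to loosen the match using base R only. The dataframe bids and its rows are made up for illustration.

```r
# Hypothetical sample: the same agency typed in two different ways.
bids <- data.frame(
  Organization.Name = c(
    "DEPARTMENT OF HEALTH - REGIONAL OFFICE V",
    "Department of Health - Regional Office V ",  # mixed case, trailing space
    "CITY OF LEGAZPI"
  ),
  Approved.Budget = c(150000, 80000, 50000),
  stringsAsFactors = FALSE
)

# Exact match: finds only the row typed exactly like the parameter.
exact <- subset(bids, Organization.Name == "DEPARTMENT OF HEALTH - REGIONAL OFFICE V")

# Normalized match: trim spaces and upper-case the column before comparing.
normalized <- subset(
  bids,
  toupper(trimws(Organization.Name)) == "DEPARTMENT OF HEALTH - REGIONAL OFFICE V"
)

# Partial match: any name containing the phrase, ignoring case.
partial <- subset(bids, grepl("department of health", Organization.Name, ignore.case = TRUE))

nrow(exact)       # rows found by the exact match
nrow(normalized)  # rows found after normalizing
nrow(partial)     # rows found by the partial match
```

The partial match is handy for exploring, but it can also pull in agencies you did not intend, so check the result before saving it.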

You can perform several other operations on the dataframe! You can remove columns, count occurrences, join two or more dataframes, and more. To find out more operations, you can google for “R Cheatsheets” for commands and examples.
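To give a feel for those operations, here is a small sketch in base R that removes a column, counts occurrences, and joins two dataframes. All names and values (bids, awards, Reference.ID, Remarks, Awardee) are hypothetical and only stand in for the real PhilGEPS columns.

```r
# Hypothetical invitations to bid.
bids <- data.frame(
  Reference.ID = c(101, 102, 103, 104),
  Organization.Name = c(
    "DEPARTMENT OF HEALTH - REGIONAL OFFICE V",
    "CITY OF LEGAZPI",
    "DEPARTMENT OF HEALTH - REGIONAL OFFICE V",
    "BICOL UNIVERSITY"
  ),
  Approved.Budget = c(150000, 50000, 80000, 20000),
  Remarks = c("", "", "rebid", ""),
  stringsAsFactors = FALSE
)

# Hypothetical notices of award; only some bids have one.
awards <- data.frame(
  Reference.ID = c(101, 103),
  Awardee = c("SUPPLIER A", "SUPPLIER B"),
  stringsAsFactors = FALSE
)

# Remove a column: keep every column except "Remarks".
bids_slim <- bids[, setdiff(names(bids), "Remarks")]

# Count occurrences: number of notices per organization.
notice_counts <- table(bids_slim$Organization.Name)

# Join: attach award details by the shared ID column.
# all.x = TRUE keeps bids that have no matching award (Awardee becomes NA).
bids_with_awards <- merge(bids_slim, awards, by = "Reference.ID", all.x = TRUE)

notice_counts
bids_with_awards
```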

STEP 5 – If you are satisfied with your final dataframe, we request that you save it as a CSV file with the following command:

write.csv(DATA_FRAME_NAME, file="PREFERRED_FILE_NAME.csv")

This is because, chances are, some other researcher, or concerned citizen (who isn’t familiar with R) would need the dataset you just made. Sharing is Caring! 🙂
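Before sharing, it is worth reading the saved file back to confirm it opens cleanly. The sketch below uses a made-up dataframe (doh_r5) and writes to a temporary folder. The row.names = FALSE argument is our suggestion, not a requirement; it stops R from adding an extra column of row numbers to the file.

```r
# Hypothetical cleaned dataframe.
doh_r5 <- data.frame(
  Organization.Name = rep("DEPARTMENT OF HEALTH - REGIONAL OFFICE V", 2),
  Notice.Title = c("Medical Supplies", "Laboratory Reagents"),
  Approved.Budget = c(150000, 80000),
  stringsAsFactors = FALSE
)

# Save to a temporary folder for this demo; use your own file name in practice.
out_path <- file.path(tempdir(), "doh_r5_sample.csv")
write.csv(doh_r5, file = out_path, row.names = FALSE)

# Read the file back and compare it with the original.
check <- read.csv(out_path, stringsAsFactors = FALSE)
identical(names(check), names(doh_r5))  # same columns?
nrow(check) == nrow(doh_r5)             # same number of rows?
```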

STEP 6 – Finally, you can share your cleaned datasets in our repository! We will make sure to credit you with your preferred name/nickname. Send us your dataset and we will upload it for you (for now; we are working on a way for you to upload it yourself :) ).

All data uploaded will be available as open datasets. They are free for all, so that we can encourage more and more researchers and advocates to use data and be data-driven in their decision making.

Thank you very much! For more information, kindly email support@layertechlab.com. From time to time, we conduct hands-on trainings! Please let us know if you are interested!