Datasets
Access to Venture Capital
To test whether Start-Up Chile has had an impact on access to venture capital since its founding in 2010, I decided to measure venture capital activity longitudinally from 2000 to 2019. I also decided to compile this data for the rest of Latin America so that I could conduct comparative analyses. I therefore needed a dataset of venture capital transactions made in Chile and Latin America in this timeframe. A venture capital transaction occurs when a private or public entity funds an early-stage start-up. Of the databases made accessible to Harvard affiliates, I chose the one that returned the most exhaustive results for venture capital transactions in Chile and Latin America, which was Standard & Poor's CapitalIQ database (see Figure 1).
Having identified the database, I used CapitalIQ's screenbuilder for "Venture Capital Private Placement Transactions" (see Figure 2). To obtain the data I needed, I added only two filters: Geographic Location of Company and Date of Transaction. For geographic location, I kept only countries that belong to Latin America. For transaction date, I kept all transactions made on or after 1/1/2000.
Once I built the screen for the data I needed, I exported the results to an Excel file (see Figure 3). The dataset consisted of 3,868 unique transactions in Latin America and had 17 columns for each transaction, such as date, transaction value (in $USD mm), and geographic location. It would have been interesting to run an analysis based on the gross amount of venture capital invested in Chile over time, but many of the transaction values were missing from the dataset. I assume this is due to the often confidential and competitive nature of start-ups and the venture capital industry.
After exploring my dataset in Microsoft Excel, I began to aggregate the transactions by year in a separate spreadsheet (see Figure 4 and Gist below). For example, I used a formula such as "=SUMIFS('LatAm VC Data Raw'!$I$9:$I$3876,'LatAm VC Data Raw'!$B$9:$B$3876,'LatAm VC Data Aggregates'!$A2)" to pull information from the sheet holding my raw data and sum the values that pertained to the variable I was measuring in a given year. This particular formula calculates the first value listed in the gist below, which is the amount of venture capital invested in Latin America in 2000. Using COUNTIFS and similar SUMIFS formulas, I aggregated the number of venture capital transactions and the sum of their total value (in $USD mm) by year, from 2000 to 2019, for Chile and for the rest of Latin America (without Chile). I did not include 2020 because the year is not yet over, and I did not include Chile in the Latin America aggregate because I wanted to see whether the trends in Chile mirror or diverge from trends in the rest of the region. These aggregates will be used as inputs for my visualizations made with Tableau and for regression analyses conducted in R.
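As a rough cross-check on the spreadsheet aggregation described above, the same per-year count and sum can be sketched in a few lines of Python. This is a minimal sketch, not the workflow actually used: the row layout (year, country, value) and the sample rows are hypothetical and do not come from the CapitalIQ export. It assumes that transactions with a missing value should still be counted but should add nothing to the yearly total, which is roughly how COUNTIFS and SUMIFS treat blank cells.

```python
from collections import defaultdict

# Hypothetical row layout: (year, country, value_usd_mm or None).
# These fields are illustrative, not CapitalIQ's actual export columns.


def aggregate_by_year(rows, include):
    """Return {year: (count, total_value)} for rows where include(country) is true.

    Rows with a missing value (None) are still counted as transactions
    but contribute nothing to the yearly total.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for year, country, value in rows:
        if not include(country):
            continue
        counts[year] += 1
        if value is not None:
            totals[year] += value
    return {year: (counts[year], totals[year]) for year in sorted(counts)}


# Made-up sample rows for illustration only (not real transactions).
sample = [
    (2000, "Chile", 1.5),
    (2000, "Brazil", None),
    (2000, "Brazil", 2.0),
    (2001, "Chile", None),
]

# Chile on its own, and the rest of the region with Chile excluded.
chile = aggregate_by_year(sample, lambda c: c == "Chile")
rest = aggregate_by_year(sample, lambda c: c != "Chile")
```

Passing the inclusion rule as a function keeps the Chile and rest-of-region aggregates on exactly the same code path, so any difference between them comes from the data rather than from two separately written formulas.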
Entrepreneurial Activity
To test my second hypothesis, that Start-Up Chile positively shifted Chileans' propensity to pursue entrepreneurship, I needed data about entrepreneurial activity and intentions in Chile over time. In line with this hypothesis, I postulate that the influx of successful entrepreneurs into the country through Start-Up Chile produced peer effects that led more Chileans to engage in entrepreneurship. In an attempt to find data to test this, I came across the Global Entrepreneurship Monitor (GEM), an international consortium based at Babson College that conducts surveys regarding entrepreneurship in over 100 countries around the world.
The GEM website has extensive data on Chile and other countries in Latin America, from entrepreneurial behaviors and attitudes to the frameworks and conditions for entrepreneurship in a given nation. For the purposes of my research, I was interested in two indicators that GEM measures annually in Chile:
- Entrepreneurial Intentions: Percentage of 18-64 population (individuals involved in any stage of entrepreneurial activity excluded) who are latent entrepreneurs and who intend to start a business within three years
- Total early-stage Entrepreneurial Activity (TEA): Percentage of 18-64 population who are either a nascent entrepreneur or owner-manager of a new business
Using GEM's custom data table tool (see Figure 5), I compiled the data points for these two variables for the years 2002 to 2019. I then exported the custom dataset to Microsoft Excel (see Figure 6 and Gist below). I plan to visualize this data using Tableau, which I hope will support my hypothesis by showing positive trends after 2010. Similarly, I plan to analyze this data in R by running a regression that compares the mean percentages for 2002-2009 and 2010-2019.
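One simple way to frame the planned comparison of period means as a regression is to regress each yearly percentage on a dummy variable that equals 1 for years from 2010 onward. With a single 0/1 regressor, the intercept is the pre-period mean and the slope is the difference between the two period means. The sketch below illustrates this in Python rather than R; the function name, the cutoff parameter, and the toy numbers are assumptions made for illustration and are not GEM figures or results.

```python
from statistics import mean


def post_period_regression(series, cutoff=2010):
    """Ordinary least squares of value on a dummy that is 1 for years >= cutoff.

    series is a list of (year, value) pairs containing years on both sides
    of the cutoff. Returns (intercept, slope): the intercept is the
    pre-period mean and the slope is the post-period mean minus it.
    """
    x = [1.0 if year >= cutoff else 0.0 for year, _ in series]
    y = [value for _, value in series]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Placeholder numbers for illustration only, NOT GEM data.
toy = [(2008, 10.0), (2009, 12.0), (2010, 20.0), (2011, 22.0)]
intercept, slope = post_period_regression(toy)
```

A positive slope would be consistent with higher average values after 2010, although on its own it would not separate the effect of Start-Up Chile from other changes over the same period.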
