You Are What It Says On Your Business Card

476 words • reading time 2 minutes

Human relationships have degraded into transactions – “you help me and I’ll help you” encapsulates how most relationships operate today. Perhaps I’m too young to reminisce about the days when people invested time and energy to learn from one another and share life experiences, but I’ve noticed that people are subconsciously prejudiced when they encounter someone they’ve never met before.

Alain de Botton, a philosopher, gives an insightful short talk about how ‘status anxiety’ has found its way into modern society. This was my favorite part of his talk:

Look, as soon as you’ve finished college, what starts to matter is what you do in life. The first question becomes, what do you do? And according to how you answer that question, people are either incredibly pleased to see you or make a run for it. In other words, how clever and nice and friendly and, you know, sympathetic you are ceases to matter in most social occasions. We live in a world surrounded by snobs. What is a snob?—A snob is someone who takes a small part of you and uses that to judge the whole of you. And the dominant snobbery nowadays is job snobbery.

We’ve all been in situations at networking events where someone approaches us, introduces themselves, asks what we do, then makes it obvious through their facial expressions and body language whether they want to keep talking or find a way to escape the conversation. The fact that people judge a person’s worth based on such superficial criteria is disappointing.

Even more dangerous is that status anxiety leads people to be dishonest in order to gain favor. Someone may slightly exaggerate their job title or the work they do because they’re a little ashamed of it. Dan Ariely explains in his book The (Honest) Truth About Dishonesty how these little lies eventually lead to bigger ones. Just look at the number of executives who have been caught lying about their education (Scott Thompson, the ex-CEO of Yahoo, comes to mind).

In an ideal world, we’d like our true identity and perceived identity to be aligned. This is becoming increasingly difficult with the implicit knowledge that we’ll be treated differently depending on what our perceived identity is. There’s a conflict of interest of sorts: we want to be honest and truthful, but there’s an incentive to be otherwise. I don’t have a general solution in mind to curb this narrow-minded behavior, since I think the issue of status anxiety is deeply rooted in our culture. I do, however, think that on an individual level we can make a more conscious effort to surround ourselves with people who have a lower tendency to lie and exaggerate. And if you ever go to networking events, my best advice is to be yourself and not care what other people think of you.

College Delusion

546 words • reading time 3 minutes

Truth, I believe, is one of the most important foundations of a happy life. Above all else, seek the truth even if it hurts.

Herd mentality is quite common in college. Many people don’t reflect on what they really want to do as a career, but blindly follow what other people want. We have a perverted sense of what it’s like to work at company X. We’re not completely to blame here, since companies – and recruiters in particular – are very good at convincing students to want to work for them (and rightfully so). Naive students with little industry experience get excited when they hear pitches about the endless perks, pay, and prestige of working at company X. These things all matter, of course, but what matters even more are the people and the work you’ll be doing. The only way to find that out is to talk to people you trust who work at the company or who’ve worked there before. They’re far more likely to give you honest feedback than another pitch.

Students come out of college with the expectation that they’ve learned something and gained enough independence to make their own decisions in life. The problem is that many colleges are getting progressively worse at providing affordable education, and a lot of students aren’t coming out with a strong sense of what they really want to do. Young people decide to strive for Wall Street or become entrepreneurs because they see the end goal – prestige and money. What they don’t see is everything that comes before it. They fail to realize that becoming an investment banker actually means working long hours doing work that, in reality, no normal human being can enjoy. Some argue that after 2-3 years of toiling away they’ll be able to relax with a salary large enough to feed more than 10,000 people living in poverty, but the question then becomes, “now what?”. You’re in your late 20s or early 30s, you’ve spent 2-3 years plus school striving for prestige and pay (don’t lie), and you’re probably married with kids. If you decide to do something else, there are already people ahead of you who chose earlier on to pursue a career that aligns with their interests and values. People in general, but younger people especially, lack the hindsight and perspective to think about the future.

The regular parties at college are a symptom that students, to some extent, are seeking an escape from the real world. Sure, going out and socializing is important for our mental well-being, but when partying becomes routine, something you look forward to, it becomes a problem. There are enough things in life to make you naturally happy when you’re sober.

If you made it here, you’ve probably realized this post was just a long rant. I don’t have much perspective on the industry myself, so perhaps I don’t have much credibility behind the comments I’ve made, but the point I wanted to make is that we should always think ahead and question whether we’re making the right decisions for ourselves. Stop following what other people are doing and try things. That’s the only way to really find out what you love.

News is flawed

501 words • reading time 2 minutes

We live in an era where there’s too much information to make sense of it all. The way news is delivered online is fundamentally flawed. With the ease of sharing and creating content, it’s become increasingly difficult to find quality news. On top of that, the central news outlets aren’t entirely incentivized to produce content that educates and promotes awareness of the most important global events. This will have profound effects on the population, since a country that isn’t properly informed will make poor decisions.

News outlets justified moving online because it would save money (eliminating printing costs) and because more and more people were coming online. There was one catch though: they had to monetize differently, which led to ad-based revenue models. With this shift, news companies no longer had to produce great content, only content that generated the most traffic. But isn’t content that receives a lot of traffic indicative of its importance? Clearly not. Just because an article with Edward Snowden somewhere in its title gets a million social shares doesn’t mean the content itself is great. Sharing a post on Twitter or Facebook is cheap.

Freedom of speech is one of the great hallmarks of a modern democracy. The alternative – a single monopoly delivering the news – would obviously be worse than what we have now, but there’s a clear need for high-quality, accurate news. What’s key is a news company driven by a clear mission to educate, tied to a business model that captures value based on how well it delivers on that mission. A subscription-based online model could be one option, since content creation would then be focused on educating rather than simply generating traffic. With a strong readership, the company could sustain itself on higher margins and fewer readers.

Media companies are at the whim of advertisers (if you watch the show The Newsroom, you’ll get a better idea why). A single daily update on the Zimmerman debacle is all that’s needed, as opposed to hourly ‘breaking updates’. Even though people may argue that this news is important, people rarely know what they want or what’s best for them. We’re human, we’re susceptible to manipulation, and, as Dan Ariely put it, ‘predictably irrational’. News shouldn’t be entertainment, but sadly, that’s the direction it’s heading in.

Fixing this isn’t easy, of course. A new company looking to compete with already established, ‘Too Big To Fail’-esque news networks will have to attract readers who are already bombarded with link-bait from every direction. I’m an optimist though, confident that news will someday be used properly to help people understand what’s going on outside our tiny bubbles.

Shameless plug: a few weeks ago I started a news site that curates content at the intersection of technology, humans, and democracy. These are topics that my co-founder and I have a deep interest in. You can visit us at www.thegrandsignal.com.

MapReduce Basics

516 words • reading time 3 minutes

MapReduce (MR) is one of the fundamental tools in the domain of data analysis. For many people outside the data community, MapReduce is seen as some voodoo technique that analysts/data scientists (or whatever they’re called today) use for data processing. What I hope to do in this post is go over the basics of MapReduce and its associated implementations. I don’t have a lot of experience with MR myself, so this post is mainly meant to point you towards further study and experimentation.

What is MapReduce

MapReduce is a programming model for parallel data processing. The idea is simple – you take a large data set in the form of key-value pairs, break it into separate chunks, and distribute the chunks to several servers, each of which runs a mapper() function. The mapper function processes each key-value pair and itself outputs key-value pairs, which are then shuffled and distributed to a different set of servers running a reducer() function. The reducer function takes in a key along with all the values emitted for it and outputs a single key-value pair.

If you found that confusing, don’t worry! Hopefully a simple example clears things up. Here’s a diagram that gives you an idea of how MapReduce works:

Counting the Frequency of Words

Say you wanted to count the frequency of words in a large text corpus. Let’s say the corpus is terabytes in size, so processing it on one computer would take extremely long. To implement MapReduce, your mapper() and reducer() functions would look (in Python) as follows:

# mr is the small MapReduce framework object from the uwescience course assignment:
# emit_intermediate() sends a (key, value) pair to the shuffle phase,
# and emit() writes a final output pair.

def mapper(record):
    # key: document identifier
    # value: document contents
    key = record[0]
    value = record[1]
    words = value.split()
    for w in words:
        mr.emit_intermediate(w, 1)

def reducer(key, list_of_values):
    # key: a word
    # list_of_values: the occurrence counts (all 1s) emitted for that word
    total = 0
    for v in list_of_values:
        total += v
    mr.emit((key, total))

Code courtesy of uwescience
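
If you want to run something like this without the course framework, here’s a minimal in-memory sketch of the same word count. The wc_mapper, wc_reducer, and run_mapreduce names are my own, and the “shuffle” is just a dictionary grouping values by key – it’s meant to make the data flow concrete, not to be a real MapReduce implementation.

from collections import defaultdict

def wc_mapper(record):
    key, value = record              # (document identifier, document contents)
    for w in value.split():
        yield (w, 1)                 # intermediate (word, 1) pair

def wc_reducer(key, list_of_values):
    return (key, sum(list_of_values))  # total occurrences of the word

def run_mapreduce(records):
    # Shuffle phase: group all intermediate values by key
    groups = defaultdict(list)
    for record in records:
        for key, value in wc_mapper(record):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key
    return [wc_reducer(key, values) for key, values in groups.items()]

docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print(run_mapreduce(docs))  # [('the', 2), ('quick', 1), ('brown', 1), ...]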

Hadoop, Pig, and Hive

MapReduce originated at Google sometime in the early 2000s. It wasn’t until around 2005 that Doug Cutting and Mike Cafarella developed an open-source implementation of MapReduce, which became Hadoop (Cutting later joined Yahoo, which backed the project heavily). Eight years later, Hadoop has become a staple of modern-day data processing. Other tools have been built on top of it, like Pig, which uses a higher-level procedural language called Pig Latin that compiles down to MapReduce programs. More recently, Facebook developed a tool called Hive, which provides a SQL-like syntax that is also executed as MapReduce. There are some decent answers on Stack Overflow that explain the differences.

Where to go from here

Now that you have a basic understanding of MapReduce, I suggest you read up on the topic and get your hands dirty by writing some mapper() and reducer() functions. Some other things worth getting acquainted with are NoSQL databases (since MapReduce works by manipulating key-value pairs) and Amazon EMR, which lets you run Pig or Hive on Amazon’s servers (note: if you’re a student you can apply for a free $100 grant). The awesome people at Yelp have also developed a tool called mrjob for running jobs on Amazon EMR through Python.
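
For a taste of what that looks like, here’s roughly the same word count written with mrjob – a sketch based on mrjob’s documented MRJob class, so double-check the current docs before relying on the exact invocation:

from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # mrjob feeds the mapper one line of input at a time
        for word in line.split():
            yield word, 1

    def reducer(self, word, counts):
        # counts is an iterator over all the 1s emitted for this word
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordCount.run()

Running python mr_word_count.py input.txt executes the job locally, and (assuming your AWS credentials are configured) adding -r emr should submit the same job to Amazon EMR.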

Writers Live Forever

463 words • reading time 2 minutes

I realized the other day that most of the books on my bookshelf were written by authors who have passed away – Asimov, Adams, Bradbury, Vonnegut. It doesn’t feel like they’re dead, of course, since I can always grab any of their novels and start reading at any moment in time. And oddly enough, I probably know more about these authors than about some of the people I work and go to school with. More importantly though, old ideas are the link to our past and the key to our tomorrow. Writers live forever, and the work they leave behind will inspire the new minds who will shape our future.

What really fascinates me is the durability of the written word, the fact that the novel in my hands was written fifty to a hundred years ago. Not only that, but much of who I am and my perspective on the world was shaped by the novels I’ve read. New ideas and innovations are the result of understanding previous ideas and innovations. There are many things I wouldn’t even have considered had I not read about them. To quote Larry Summers, “… what came out of considering every argument debating every question, looking at every kind of evidence was a closer approximation to truth. And out of a closer approximation to truth, came better understanding of our world. And out of better understanding of our world, came a better world” 1. Authors take the central arguments and questions of their time and capture them in a novel. The beauty of reading a great novel is that it forces you to think and to question, and by doing so, you develop a better understanding of the world.

Living in an information-rich world means that not everything people write will get read. What gets read by the general public today is mostly controlled by a few hundred media outlets and publishing companies. The problem with this is obvious – people aren’t necessarily considering every argument, question, and piece of evidence on the major issues surrounding our society. Instead of reading important and informative literature, people are reading extremely biased work. The result is an uninformed public, and an uninformed public is one that makes poor decisions.

Regardless of whether we choose to ignore or pay attention to an author, their work will continue to live on, either physically or digitally. The choice falls on us to seek out new ideas (even though they might be hard to find) and push for more informative news. Making the world a better place comes from understanding the problem – the facts, the people involved, the history. Only if we truly comprehend these things will we be able to solve our problems.

Links

1Larry Summers Talk

K-Means Clustering

491 words • reading time 2 minutes

The first unsupervised learning algorithm that I’m going to cover is called k-means clustering. Recall that an unsupervised learning algorithm differs from a supervised learning algorithm in that no ‘right’ answers are provided in the training set. Instead of being given a training set of (x, y) pairs, we’re only given a training set of x (where x can be a vector). The idea behind k-means is to find some sort of structure within the data set.

Algorithm

Step 1 – Initialize the centroids (μ_k)

Let’s say we’re given the following data set:

Here, we’re not interested in fitting a regression line, but instead want to group people with similar heights and weights into clusters (this could actually be useful for clothing companies trying to decide how many different sizes to produce).

The first thing to do is define the cluster centroids, μ_k. When you first start k-means, you can initialize the centroids to random points from the training set. Typically, k-means is run several times with different initial centroids, and the run that ends up with the lowest cost is the one you keep.

The number of centroids to define can sometimes be tricky – in the example above, it’s easy to see that we need 3 clusters since the data points are very distinct and we’re only dealing with two dimensions. In more complex situations, where visualization isn’t possible, one option is to plot the cost function with respect to the number of centroids and look for an ‘elbow’ – the point where adding more clusters stops reducing the cost by much (the cost keeps dropping as k grows, so simply picking the k with the lowest cost doesn’t work).
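
Here’s a rough sketch of that elbow plot using scikit-learn’s KMeans (the range of k values and n_init are arbitrary choices for illustration):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def plot_elbow(X, max_k=10):
    costs = []
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10).fit(X)
        costs.append(km.inertia_)  # sum of squared distances to the closest centroid
    plt.plot(range(1, max_k + 1), costs, marker='o')
    plt.xlabel('number of clusters k')
    plt.ylabel('cost (within-cluster sum of squares)')
    plt.show()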

Step 2 – Assign each point to the closest centroid

After defining the cluster centroids, the next step is to assign each point to the nearest centroid. Mathematically, we assign each point to the cluster whose centroid has the smallest squared Euclidean distance to it.

Minimize the squared Euclidean distance
min_k ‖x^(i) − μ_k‖²
where k indexes the clusters and i indexes the data points in the training set

Step 3 – Calculate the mean for each cluster

Now that every point has been assigned to a cluster, we can calculate the new mean (centroid) of each cluster. With the new centroids, we go back and repeat steps 2 and 3 until convergence.

Overview of heuristic algorithm

Repeat until convergence {
    for i = 1 to m
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)
    for k = 1 to K
        μ_k := average (mean) of the points assigned to cluster k
}
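
Here’s a minimal NumPy sketch of that loop – purely illustrative, with no handling of empty clusters or multiple random restarts:

import numpy as np

def kmeans(X, k, n_iters=100):
    # Step 1: initialize centroids to k random points from the training set
    centroids = X[np.random.choice(len(X), k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to the closest centroid (squared Euclidean distance)
        distances = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels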

Weaknesses

K-means is a relatively simple unsupervised learning algorithm, but it doesn’t always produce the intended clusters. In the example below, another unsupervised learning algorithm called DBSCAN clusters the data set properly, whereas k-means wouldn’t have produced the correct clusters. The main advantage of DBSCAN over k-means is that the number of clusters doesn’t need to be specified in advance.
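
If you want to play with the comparison yourself, scikit-learn ships a DBSCAN implementation. A quick usage sketch, assuming X is a data matrix like the height/weight example above (the eps and min_samples values are arbitrary placeholders and very much data-dependent):

from sklearn.cluster import DBSCAN

# eps is the neighbourhood radius, min_samples the density threshold
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
# a label of -1 marks points DBSCAN considers noise rather than cluster members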

The Importance of Memory

679 words • reading time 4 minutes

Here are some of my thoughts after reading the first half of the book Moonwalking With Einstein.

There was once a time when being smart meant having a good memory. People with great memories were revered because memory “represented the internalization of a universe of external knowledge” 1. Teaching mnemonic techniques to young minds used to be a hallmark of education, something everyone was expected to learn. But the advent of books, and later computers and machines, changed everything. Storing information in bits and bytes reduced the need to remember, and with it went the societal value placed on memory.

Today, we live in an information- and computer-rich world, with endless streams of data at our fingertips waiting to be processed and analyzed. Being smart today means being a good problem solver, having the uncanny ability to sort through puzzles and contrived mathematical formulas. But isn’t one’s ability to solve problems still correlated with one’s memory? Certainly, being good at chess relies on remembering patterns and positions from experience. Our ability to solve problems depends heavily on our experience with similar problems and on our creative capacity, which is itself influenced by our experiences. Fundamentally, a prerequisite to becoming an expert in a specific domain is having an above-average memory in that domain. So perhaps to be smart today you still need a good memory; what’s changed is simply a matter of semantics.

I’ve written previously about how memory and psychological time are closely intertwined. Our perception of time is shaped by our everyday lives and decisions. Monotony makes time contract, while novelty stretches it. In July 1962, a French chronobiologist named Michel Siffre lived in a cave for two months in isolation to explore how our perception of time is affected when we’re cut off from external reality. When he came out of the cave on September 14, he thought it was August 20, suggesting that time had appeared to contract. In physics, time is the fourth dimension, fusing with space to form space-time, whose curvature Einstein showed gives rise to gravity. In our day-to-day lives though, time is simply the chronology of events spread across a fixed period. Our perception of who we are is the result of what we’ve done and what we’ve remembered. Living a long life isn’t just physical, it’s also mental. The more experiences and memories we store in our minds, the more landmarks there are spread across chronological time, and the longer our lives seem.

So how do we improve our memories? Our minds aren’t great at remembering words or numbers, but they are good at remembering images. A mnemonic technique called the memory palace, or method of loci, is based on this fact. The idea is to engage our visual-spatial memory and transform the things we’re bad at remembering (numbers, words) into images placed within mental locations – hence the name memory palace. For example, say I wanted to remember the words apple, pizza, baseball, and 10 other items. What I’d do is place an image of each item within some location I know well, like the house I grew up in. Creating that structure in your mind lets you remember things sequentially, since all you have to do is retrace your steps through the mental location. Making the images more memorable (picture a dancing apple doing the Macarena) makes them even easier to recall.

We often forget how important memory is in our everyday lives. It isn’t a simple database that you can query for a near-instant result (assuming the database isn’t close to capacity). It’s much more complex, with constant reads, writes, and deletes, which makes it extremely error-prone. The key is to understand the importance of memory, not only in the context of learning, but also in the role it plays in how we perceive our lives. When we look back at the many years we’ve spent in this world, what matters isn’t simply what we’ve done, but what we remember.

1Moonwalking with Einstein

Applying Machine Learning Techniques

636 words • reading time 3 minutes

My last two posts on machine learning covered linear and logistic regression. There are probably a lot of questions about implementing the two algorithms with real data, so I’ll go over some of the techniques Andrew Ng suggests in his Coursera course (see week 6).

Model Selection

Selecting a model can sometimes be tricky from the outset. What degree of polynomial should I choose? How many parameters? A good strategy is to try several models, each with a different degree, while splitting your data into three sets – a training set, a cross validation set, and a test set. You then fit the parameters θ on the (now smaller) training set for each candidate model and choose the model with the lowest cost (error) on the cross validation set (see this post on how to calculate the error). The reason you split the data into three sets is so you can report a generalization error measured on the untouched test set. If there were only two sets, you’d essentially be ‘fitting’ the test set (since you’d be choosing the model with the lowest error on it).
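
As a concrete (if simplified) sketch of the procedure using scikit-learn – the 60/20/20 split and the degree range are just common conventions, and the data is assumed to be shuffled already:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

def select_degree(X, y, max_degree=10):
    # X is an (m, n) feature matrix, y an (m,) target vector, already shuffled
    m = len(X)
    X_train, y_train = X[:int(0.6 * m)], y[:int(0.6 * m)]
    X_cv, y_cv = X[int(0.6 * m):int(0.8 * m)], y[int(0.6 * m):int(0.8 * m)]
    X_test, y_test = X[int(0.8 * m):], y[int(0.8 * m):]

    best = None  # (cv_error, degree, model, poly)
    for degree in range(1, max_degree + 1):
        poly = PolynomialFeatures(degree)
        model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
        cv_error = mean_squared_error(y_cv, model.predict(poly.transform(X_cv)))
        if best is None or cv_error < best[0]:
            best = (cv_error, degree, model, poly)

    # Report the generalization error on the untouched test set
    cv_error, degree, model, poly = best
    test_error = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    return degree, test_error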

If you’re regularizing your models, the procedure is nearly the same, except that the parameters are optimized on the training set using the regularized cost function. The cost functions used to evaluate the cross validation set and the test set do not include the regularization term.

Bias and Variance

There are two general problems to look for after fitting a model to a training set – bias and variance. Think of bias as underfitting (fitting a line when the fit should be a higher-order polynomial) and variance as overfitting. In most cases, when there’s high bias, the error is high on both the training set and the cross validation set. Conversely, when the training error is low and the cross validation error is high, the model most likely has high variance. Knowing which category your model falls into is important because it determines what you should do to fix it. If you have high bias, increasing the size of your training set won’t help much. For a model with high variance, on the other hand, getting more data is likely to help.

A good method for determining whether your model suffers from bias or variance is to plot a learning curve. A learning curve plots the error with respect to the size of the training set. Here’s an example for a model with high bias (from Andrew Ng’s lecture):

Notice how the cross validation error and the training error converge to a high value as the size of the training set increases. Here’s a learning curve for a model with high variance (note how the cross validation error stays well above the training error):
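
Here’s a rough sketch of how you might generate such a curve yourself; model is assumed to be any scikit-learn-style estimator, and the subset sizes are arbitrary:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import mean_squared_error

def plot_learning_curve(model, X_train, y_train, X_cv, y_cv):
    sizes = np.linspace(10, len(X_train), 20, dtype=int)
    train_errors, cv_errors = [], []
    for m in sizes:
        model.fit(X_train[:m], y_train[:m])
        # Training error is measured on the same m examples the model just saw
        train_errors.append(mean_squared_error(y_train[:m], model.predict(X_train[:m])))
        # Cross validation error is always measured on the full CV set
        cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))
    plt.plot(sizes, train_errors, label='training error')
    plt.plot(sizes, cv_errors, label='cross validation error')
    plt.xlabel('training set size')
    plt.ylabel('error')
    plt.legend()
    plt.show()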

Precision and Recall (Logistic Regression)

If you recall from my post on logistic regression, we used a pretty naive approach to making predictions: if the hypothesis was greater than 0.5 we set the predicted value (y) to 1, and if it was less we set it to 0. For skewed classes, where one outcome is much rarer than the other (cancer diagnoses, for instance), there needs to be a better way to choose the threshold on the hypothesis function. One way is to compare the F1-score for different thresholds and choose the threshold with the highest score. The F1-score is calculated as follows:

F1-score
F1-score = 2 × (Precision × Recall)/(Precision + Recall)

Where precision and recall are equal to:

Precision
Precision = True Positives/(True Positives + False Positives)

Recall
Recall = True Positives/(True Positives + False Negatives)
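
And here’s a minimal sketch of computing those quantities in code, given the true labels and the probabilities output by the hypothesis (note that precision is undefined if nothing is predicted positive; that edge case is ignored here for brevity):

import numpy as np

def f1_score(y_true, probabilities, threshold=0.5):
    y_pred = (probabilities >= threshold).astype(int)
    true_pos = np.sum((y_pred == 1) & (y_true == 1))
    false_pos = np.sum((y_pred == 1) & (y_true == 0))
    false_neg = np.sum((y_pred == 0) & (y_true == 1))
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Try a range of thresholds on the cross validation set and keep the best one
# thresholds = np.arange(0.1, 0.95, 0.05)
# best_threshold = max(thresholds, key=lambda t: f1_score(y_cv, probabilities, t))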

My next post on ML is going to be on neural networks, as I’m still trying to wrap my head around them. Stay tuned!

The Myth of Hard Work

425 words • reading time 2 minutes

One of the things I remember most as a child was being told by my parents, teachers, and elders that I could accomplish anything if I worked hard enough. Success in any discipline was attainable as long as I put in the countless hours of dedication and devotion. It wasn’t until my teens that I began to realize that there were certain things that I most likely wouldn’t be able to do professionally, like play in the NBA or become a professional actor.

As the years moved along I could see my career options slowly converge. No longer could I accomplish anything if I worked hard enough, only certain things. I was a harsh realist and still am to some extent today. Yet one thing I noticed was that many people weren’t quite as harsh as me. People seemed to still hold on to the maxim that hard work guarantees success. As a culture, we are taught to revere the underdog who overcomes his deficits. The media pounces on any opportunity to celebrate the success of an underdog and ignores the countless others who fail. Being the emotional creatures we are, we lose rationality when it comes to evaluating what’s realistic.

I don’t think it’s impossible for someone with no knowledge of programming to learn how to program, or for an introvert to become a good salesperson – these things are within the realm of possibility. But people who generally aren’t patient most likely won’t become great programmers. I’m not a strong believer in natural talent either – a person’s set of skills is a reflection of their upbringing and a host of other factors. So a good way to put it is that your current strengths largely determine whether you’ll be successful at X or Y in the short term (say 1-2 years).

Knowing your strengths is just as important as knowing your weaknesses. Your strengths, unlike your weaknesses, tell you what you can do. If you’re someone like me who hasn’t really thought about strengths until recently, spend some time thinking about what you’re good at and how that ties in with where you are in your career. I was lucky enough to receive a copy of StrengthsFinder 2.0 from my awesome mentor at Mozilla. If you have some cash to spare, pick up a copy and fill out the questionnaire (it takes around 15-20 minutes) to find out your top five talents. Here were mine:

  • Input
  • Futuristic
  • Intellection
  • Learner
  • Restorative

Matrix Multiplication Using SQL

137 words • reading time less than a minute

Something pretty interesting you can do in SQL is multiply two matrices. In fact, it’s actually quite efficient if the matrices are sparse, meaning they’re mostly populated with zeros. Here’s the SQL code:

-- A and B each store their non-zero entries as (row_number, column_number, value)
SELECT A.row_number, B.column_number, SUM(A.value * B.value) AS value
FROM A, B
WHERE A.column_number = B.row_number
GROUP BY A.row_number, B.column_number;

Each table only contains entries whose value isn’t zero, since storing the zeros would be wasteful. The two tables are joined where the column number of A equals the row number of B (this filter makes sure each element of matrix A is multiplied by the matching element of matrix B). The GROUP BY clause is a little tricky, but it sums the products for each (row, column) pair of the output – which is exactly the dot product of a row of A with a column of B that defines each entry of the result matrix.
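
If you want to try it out, here’s a quick, hypothetical sqlite3 setup in Python that stores two small matrices in sparse form and runs the query above (table and column names mirror the post):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE A (row_number INT, column_number INT, value REAL);
    CREATE TABLE B (row_number INT, column_number INT, value REAL);
""")
# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], storing only the non-zero entries
conn.executemany("INSERT INTO A VALUES (?, ?, ?)", [(0, 0, 1), (0, 1, 2), (1, 1, 3)])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)", [(0, 0, 4), (1, 0, 5), (1, 1, 6)])

result = conn.execute("""
    SELECT A.row_number, B.column_number, SUM(A.value * B.value)
    FROM A, B
    WHERE A.column_number = B.row_number
    GROUP BY A.row_number, B.column_number
""").fetchall()
print(result)  # e.g. [(0, 0, 14.0), (0, 1, 12.0), (1, 0, 15.0), (1, 1, 18.0)]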