Monday, July 11, 2011

New Discoveries

Oof it's going to be difficult remembering everything that happened these past two weeks. That's what I get for not writing thoughts down. :P The first thing that comes to mind is Bitcoins!

Logo is still changing

I came across it a month or so ago on Reddit, and by chance I bumped into it again through another link; that was when I began to look into it. In simple terms, Bitcoin is a digital currency, and its most notable characteristic is that no central authority intervenes in it, unlike other currencies such as the American dollar. Bitcoins are stored in a 'wallet.dat' file on your computer and transferred electronically from one account to another through 'keys', which are represented by strings of numbers and letters. Additionally, Bitcoins are generated through a hashing algorithm by users such as you and me on our own computers. This is called mining. The algorithm is designed so that a block of 50 BTC (Bitcoins) is generated roughly every 10 minutes regardless of the number of miners; this prevents Bitcoin inflation or excess 'printing'. Bitcoin is still very young, with an estimated total of fewer than 100,000 users. However, Bitcoins are already starting to make their way into people's everyday lives, including coffee shop sites and third-party transactions for sites like Newegg, eBay and Amazon. Probably the largest objective for the Bitcoin community today is to get more people interested in Bitcoins and to get businesses to begin accepting Bitcoins for their products and services. The current conversion rate is 1 BTC to $14.40. Here are some further resources on Bitcoin:

  1. Introductory video and website
  2. Bitcoin Forum and Bitcoin on Reddit
  3. Bitcoin Mining Guide
  4. Building a Bitcoin Mining Rig
  5. Bitcoin Prices
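
To make the mining idea above a bit more concrete, here is a toy sketch in Python. It is only an illustration, not the real Bitcoin client: the function names (block_hash, mine, retarget) and the string-plus-nonce "block" are my own assumptions, and the real network hashes a binary block header and only adjusts its difficulty every 2016 blocks. What it does show is the basic idea: keep hashing with different nonces until the hash falls below a target, and shrink or grow that target when blocks arrive too quickly or too slowly.

```python
import hashlib


def block_hash(block_data: str, nonce: int) -> int:
    """Double SHA-256 of the data plus a nonce, read as one big integer."""
    payload = f"{block_data}{nonce}".encode("utf-8")
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: str, target: int, max_nonce: int = 2_000_000):
    """Try nonces in order until the hash falls below the target."""
    for nonce in range(max_nonce):
        if block_hash(block_data, nonce) < target:
            return nonce
    return None  # gave up: no winning nonce within max_nonce tries


def retarget(old_target: int, actual_seconds: int, expected_seconds: int = 600) -> int:
    """Scale the target so block spacing drifts back toward the expected time."""
    # Hypothetical simplification: faster blocks -> smaller (harder) target.
    return old_target * actual_seconds // expected_seconds


if __name__ == "__main__":
    easy_target = 1 << (256 - 12)  # roughly 1 hash in 4,096 wins
    print("winning nonce:", mine("toy block", easy_target))
    # Blocks arriving in 300 s instead of 600 s would halve the target.
    print("harder target:", retarget(easy_target, actual_seconds=300))
```

The retarget step is why adding more miners does not speed up the 10-minute pace for long: more hashing power finds blocks faster, so the target gets smaller and each block becomes harder to find.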

.... and in the midst of the entire Bitcoin fascination, I decided to learn how to build a computer!!! That was the biggest accomplishment of the past couple of days - I must have spent hours at a time watching videos and reading guides on forums and websites about the process and the components. I wanted to learn how to build a computer since I am majoring in C.S., and I'm very glad to have at least the basics and the overall concept in mind. I'd like to write a guide in the near future to help anyone else interested in building computers (seriously, whether you game or not, knowing how to build a computer gives you a huge advantage in customizing what sort of features you want - this is something you'll have to do yourself or pay a comp-savvy person to do, and believe me, it pays off). Here are some short notes that I took on the process:

  • ATX - Advanced Technology eXtended (motherboard)
  • AMD - Advanced Micro Devices
  • PCI Express 2.0 x16 - For GPUs
  • Asus vs MSI -> MSI
  • overclocking
  • SDRAM vs RAM

Yeah, it looks like a bunch of mumbo jumbo at first, but no fear, this is just plain English, not some foreign language. I'm getting excited to write the guide now. :D

The reason for going through all this trouble is that in order to mine Bitcoins, you need the proper hardware, especially GPUs (Graphics Processing Units), also known as video cards, since they will be doing the actual hashing/mining. Yep, this is for mining Bitcoins! It might be fun to get into. :)

The Sapphire Radeon HD 5830 (an example GPU)

Monday was the 4th of July, and I got to see fireworks with my family on Sunday and with Wayne and friends on Monday. :) You should have seen the smiles during the entire half hour the fireworks went off in Inner Harbor. Later that Monday evening, we discovered that there was a report of violence during the fireworks while we were still there!! I've been lucky to grow up in safe neighborhoods, and this is the first time the lightbulb has gone off that it's truly necessary to be aware and safe. Be safe!! Seriously :P

I hopped on the Amtrak train on Saturday July 9th to visit my family in Pennsylvania and my best friend Kathie in New Jersey for her birthday party!! We played Cranium, ate lots of pesto pizza and cake, and then played a few rounds of mahjong. I have no idea how the time went by so quickly. When I took a second to look at the clock, four hours had already passed! It was a simple get-together, and I really loved hanging out with Kathie and also got to meet her awesome college friends, Regina and Deep. My parents and I then drove back around 9pm, and I chatted with Wayne. He gifted me my first game on Steam, Worms Reloaded, which has a similar structure to Gunbound (you and the opponent are on a 2D, hilly map and you can choose from a variety of weapons to damage the opponent), and I played that for two hours before falling asleep. I returned to Johns Hopkins and peeled crabs with Wayne for dinner, battled him in Worms Reloaded and WON, and fell asleep.

Natural Language Processing
The CLSP summer school began two weeks ago and ended last Friday. Each day consisted of two lectures from 9am - 12pm on topics in computing such as computer vision and, of course, natural language processing. It was a great mix of introductory to advanced content presented by passionate and well-known lecturers, including Ben Van Durme, one of my supervisors at the lab, and Jason Eisner, my Declarative Methods (and future Natural Language Processing) professor. When I look back at the experience, I wish I could have recorded the lectures because some of the material definitely flew past me and it would have been nice to rewatch and piece together the lectures better. I'm looking forward to the new contact lenses being developed, which may conveniently possess this recording ability. :)

The second half of the day was devoted to mini-projects such as jQuery cookies, capturing references on Wikipedia pages (more info on this later), and researching the Twitter Firehose.

Here are my notes about the Twitter Firehose, a service which provides a constant stream of tweets to developers (so yes, if you tweeted about your boyfriend or your cat, some researcher or developer probably has it).

Twitter Streams

Firehose (100%): Google paid $15million, Microsoft paid $10million (released to a select 15-20 developers)
Halfhose (50%): $360k/year (estimates)
Decahose (10%): $60k/year (estimates)
Gardenhose (10%): Free
  - No longer available, as indicated here:
    - http://captico.com/the-twitter-firehose-is-up-for-grabs-if-you-have-the-cash/2010/11
    - http://www.thegurureview.net/tag/garden-hose
  - Gardenhose users were then transferred to Decahose on Gnip
Spritzer (2%): Free

More information found here: http://gnip.com/twitter

'Contact info@gnip.com for pricing and details'

But most of all, the second half of each day was devoted to the 'Sentence Summary HIT'. There were two main revisions:

1st revision

2nd revision

Just to give a quick overview of what the 'Sentence Summary HIT' is, we developed this HIT to ask actual people to help us isolate the sentences of a Wikipedia article that explain why that Wikipedia page received so many page views - was it a newly premiered video game? Was it the death of a celebrity? Here is an example:

This is an example of what a submission for the 'Sentence Summary HIT' should look like. Sentences can also be anywhere in the article and do not need to be in the same place (though they usually are).

The first revision was made to accommodate five articles per HIT, while the second revision was made for three (so that the Turkers don't get bored!). Additionally, the second revision had a few 'upgrades' and tweaks to the features. The smaller ones included a [clean] button to clear the highlighted sentences, extra detail in the instructions, adjustments to the number of sentences allowed to be highlighted (a limit which will be removed), and separation of sentences with tabs instead of bullet points (standardization). The largest addition was a 'Recent Sentences' box, which showed the three sentences containing the most recent dates (taken from the 'Generate Newsworthy HIT'). This is because sentences containing dates usually describe an event on that date in a condensed format, which is just what we need for this HIT. Moreover, the recent sentences are clickable, such that clicking on a sentence will scroll the text box containing the article on the right to that exact sentence as well as bold it. These sentences were placed to give users a place to look for these 'summarizing sentences' without needing to skim through the entire article, as some of them can be quite long.
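
For the curious, the 'Recent Sentences' idea can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the code behind the HIT: the function names (latest_date, recent_sentences) are hypothetical, the article is assumed to arrive as one string with tab-separated sentences, and only dates written like "July 4, 2011" are recognized.

```python
import re
from datetime import datetime

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates such as "July 4, 2011" (an assumed format; others are ignored).
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def latest_date(sentence):
    """Return the most recent date mentioned in a sentence, or None."""
    dates = []
    for text in DATE_RE.findall(sentence):
        try:
            dates.append(datetime.strptime(text, "%B %d, %Y"))
        except ValueError:
            pass  # skip impossible dates such as "February 30, 2011"
    return max(dates) if dates else None


def recent_sentences(article_text, limit=3):
    """Pick the sentences that mention the most recent dates."""
    sentences = [s.strip() for s in article_text.split("\t") if s.strip()]
    dated = [(latest_date(s), s) for s in sentences]
    dated = [(d, s) for d, s in dated if d is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)  # newest first
    return [s for _, s in dated[:limit]]
```

Sentences without a recognizable date are simply left out, so an article with fewer than three dated sentences gives a shorter list.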

To further explain this HIT, I will also be creating a video complete with instructions, examples, and plenty of information on how we would like the users to approach the HIT and, even more importantly, why. A deep understanding of this HIT will prevent any misunderstandings and allow Turkers to complete the HIT efficiently and accurately.

Here is the current dialogue of the video:

Hello, this is an instructional video for the Sentence Summary HIT. My name is Katherine, one of Chris' research students, and I will be walking you through this simple and easy HIT. The page I'm currently on displays an actual HIT from the batch of 100 HITs. My goal today is to provide you with plenty of information, along with examples, to be able to complete this HIT. I've also provided a table of contents for the video so you can skip to the parts that best appeal to your interests.

The first thing we'll do is just read out the background information and short guidelines on the Sentence Summary HIT in this paragraph.

[Read Instructions]

In summary, the goal of this HIT is to distinguish the sentences that summarize a current event relating to a Wikipedia article for three Wikipedia articles. Simple enough, right?

Let's go over the interface of the HIT and then do some quick practice runs.

You'll notice that if I scroll down to the first article, you see two text boxes. The first text box contains the title of the article, and the article itself. That's what this large box of text is right here. The second text box contains clickable sentences from the article that contain the 3 most recent dates. Clicking each sentence takes you to the section in this article with the corresponding sentence.

Once again, the purpose of this HIT is to distinguish the sentences that tell us why this topic was newsworthy or popular around this date, which happens to be ____. To do that, you click on the sentences, which will become highlighted in sea blue. You can also click on highlighted sentences or this [clean] button to unselect one or all of them respectively. Initially, the tricky part is deciding which sentences to highlight. But believe me, you'll get the hang of it once you know what you're looking for.

The first step you should take is to skim over the first paragraph or two. Usually, the summary sentence is staring at you right there, no need to skip all the way to the bottom. The second step is to look at this text box of recent sentences. When it comes to current events, summary sentences are inclined to contain dates though that is not always the case. The third step is just 'thinking on your feet' and making logical conclusions based on the characteristics of the topic. You'll see what I mean in a bit - let's take a look at the first article.
          Note* remember to provide transcript for video

This is a look into the data we'll be uploading for this HIT:

The columns are offset because Excel cells hold a maximum of about 32k (32,767) characters per cell, so longer articles spill over. It looks wrong, but the data is fine and in the right format - yep, the data is visually misleading.
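
Rather than eyeballing the spreadsheet, a quick check like the one below can list which cells are over the Excel limit. It is a minimal sketch: the function name oversized_cells is hypothetical, and the rows are assumed to be already loaded as lists of strings (for example, one title and one article per row).

```python
EXCEL_CELL_LIMIT = 32_767  # maximum number of characters Excel keeps in one cell


def oversized_cells(rows, limit=EXCEL_CELL_LIMIT):
    """Return (row, column, length) for every cell longer than the limit."""
    found = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            if len(cell) > limit:
                found.append((r, c, len(cell)))
    return found
```

Any row reported here is one that will look offset in Excel even though the underlying data is intact.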

These are more notes that I took regarding the current goals and to-do list:

- (check) \t -> sentence separator
- (check) alternate design for HIT (one article per HIT)
  - (change to infinite) let them click on any sentences they want (not just one)
- post the current events one (download the current date)
- ask byung gyu how the clustering algorithm works -> see if can have another clustering algorithm that can work on all 1000
- record groups of clusters in a new column called cluster number
- try to keep real trending scores (for wikitopics.org)
- (in progress) refining the instructions
- (check) point out recent sentences somehow in Sentence Summary HIT

Information
-----------
- nohup [command] &
  - screen (look up if curious)

Over the past two weeks, I've definitely gotten more attached to the entire Wikitopics project. The possible website designs include extra features such as forums and commenting. There's just so much we can do with it!! The main focus right now is just to get the nitty-gritty down to collect good data for the website.

1 comment:

  1. Hi Katherine,

    Your wish is granted. We actually do record the CLSP summer school videos. Last year's videos are posted here:
    http://www.clsp.jhu.edu/workshops/ws10/summerschool.php

    Once this year's videos come online they'll be here:
    http://www.clsp.jhu.edu/workshops/ws11/summerschool.php

    --CCB
