Saturday, June 11, 2011

The Drawing Board

I'm on the train back home to Pennsylvania now; it's my first time taking Amtrak. I'm looking through the window at the outside scenery, as Wayne suggested. It's completely different from a regular car ride, where you're surrounded by other cars, houses and cities. Even though there are hundreds of strange faces around you, it is completely silent, and you feel like you're by yourself, level with the infinite trees and passing green. Surreal - that's what you call it.

I'm coming back home to take care of a few things: (We're riding through an enormous lake now, and there's nothing but trees surrounding both sides of us) groceries, furniture, living essentials, as well as creating an ideal schedule for the rest of the summer. It's about time I set things right; our apartment, even after promises from the landlord to clean up the place by last weekend, is still covered in the previous tenants' trash and spoils. The kitchen repairs are an unfinished mess; the locks are broken, and there are stains covering the walls, especially in the bathroom. We can't even bring in furniture or unpack because the repairs and cleaning have yet to be done. You can imagine that my mom was furious when I told her what sort of living conditions we were in; it seemed like she wanted to give the landlord a piece of her mind. She's a tough mom, and I've learned a lot from her about responsibility and taking things into your own hands. That's why we're not going to wait anymore for someone to clean up the place; we'll do it ourselves. ;)

Things I need to bring back
+ (check) vacuum cleaner
+ (will buy tomorrow, Friday at IKEA) beds, mattresses, sheets
+ (will buy) desks, (check) chairs, (check) lamps
+ (check) dust wipes and (check) stain remover
+ (check) air freshener
+ (check) sponges and mops
+ posters, (check) lights, plants (you wouldn't believe how much of a difference plants make)
+ (check) laundry detergent, (check) towels
+ pencils, pens, paper
+ (check) cute things ^.^ (just because I like :D)
+ (check) movies and video games! (I find that just having these around adds to the quality of living .. somehow)

Additionally, now that the kitchen should soon be refurbished (and if not, there's a stove and oven in the lounge room), I'd like to start cooking. We've been living on cold cut sandwiches and ramen for way too long now (not that I'm complaining about the ramen).

Kitchen things to bring back
+ (check) saucepan
+ (check) pot
+ (check) rice cooker
+ (check) kettle
+ (check) spatula, (check) ladles, (check) forks/spoons/knives, (check) chopsticks
+ (check) regular plates and cups
+ (check) strainers
+ (check) cutting boards
+ (check) KNIVES muahahhaa (jk o_o)
+ (check) oven mitts (and apron? dunno)
+ (check) trays, (check) waxed sheets, (check) pan
+ (check) measuring cups
+ (check) other assorted baking containers for cuter and sweeter things :)

Food to bring back
+ (check) flour, (check) (brown, white, and confectioner) sugar, (check) eggs
+ (check) vanilla/almond extract
+ (check) butter, cream, (check) milk
+ chocolate, cocoa powder, (check) chocolate chips
+ (check) baking powder, (check) custard powder
+ mint leaves and pineapple (In grade school, I remember I made mint pineapple juice for my classmates and it turned out pretty good! Time to test it again)
+ (check) rice
+ (check) soy sauce, (check) rice vinegar, (check) sesame oil, (check) oil
+ (check) salt, (check) pepper, cinnamon, spices
+ (check) dried noodles, tomato sauce, basil, cheeses
+ (check) raw beef, (check) chicken, pork
+ (check) spinach, bok choy, green beans, other asian vegetables I don't know the names of x)
+ onion, garlic, (check) mushroom, sweet potato
+ (check) blueberries, (check) strawberries, (check) bananas, (check) apples, (check) oranges
+ (check) bread (lots!), (check) cereal, (check) orange juice
+ chicken broth (can make by boiling chicken bones and meat in hot water :))


(closes laptop to get off the train)

Recent work on the Wikitopics project:

With work on improving the Generate Newsworthy HIT underway, our most recent addition is a column that displays the k-sentences containing the most recent dates from the corresponding Wikipedia page. We added it because we found that Wikipedia pages are regularly updated when a notable event directly related to the page occurs. The updates do not happen for all Wikipedia pages, but for the ones where they do, Turkers are able to distinguish the newsworthy articles on the spot. Here is what the 'Generate Newsworthy HIT' looks like now:

The recent changes are in gold (Missing sentences for positive and negative controls)

The first step towards generating the k-sentences containing the most recent dates was to understand Byung Gyu's pick_recent_xml.py script. After installing a few Python modules and rearranging data:


This script outputs the sentence containing the most recent date: what we want is to change the script to output the k-most recent.

Error (start)
I actually had a misunderstanding with the instructions, so what I had originally done was output the sentence with the most recent date in addition to the k-surrounding sentences (oops sorry Chris!) Here's a peek into that. >.>

Outputting the line number and the sentences

Here's how that script worked

Error (finished)

Okay! So you can ignore everything in that section. :D Here's how I extracted the k-sentences containing the most recent dates.


blue: confirms the proper format to run the script
pink: sets the paths and prepares iteration over multiple files
gold: initializes variables containing sentences with dates
orange: writes the k-sentences containing the most recent dates to the variable 'result'
purple: writes 'result' to a file called [ARTICLE].sentences and closes the file
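
For anyone who can't see the screenshot, here is a rough Python 3 sketch of the same idea, not the actual pick_recent_xml2.py. It assumes the sentences are already split and that dates are written like "June 11, 2011"; the names DATE_PATTERN, latest_date and k_most_recent are made up for this example.

```python
import re
from datetime import datetime

# Assumption: dates look like "June 11, 2011". The real script may
# recognize other formats; this pattern is only for illustration.
DATE_PATTERN = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{1,2}), (\d{4})\b"
)


def latest_date(sentence):
    """Return the most recent date mentioned in the sentence, or None."""
    dates = []
    for month, day, year in DATE_PATTERN.findall(sentence):
        try:
            dates.append(datetime.strptime(f"{month} {day} {year}", "%B %d %Y"))
        except ValueError:
            continue  # skip impossible dates such as "February 30, 2011"
    return max(dates) if dates else None


def k_most_recent(sentences, k):
    """Return up to k sentences, ordered from most recent date to oldest."""
    dated = [(latest_date(s), s) for s in sentences]
    dated = [(d, s) for d, s in dated if d is not None]  # drop undated sentences
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in dated[:k]]


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from a real article.
    sample = [
        "The bridge opened on May 3, 1998.",
        "It has no date at all.",
        "Repairs finished on June 9, 2011.",
        "A storm hit on March 1, 2011 and again on June 2, 2011.",
    ]
    for sentence in k_most_recent(sample, 2):
        print(sentence)
```

A sentence with several dates is ranked by its latest one, and sentences without any date are left out.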

Directory where the files are stored

pick_recent_xml2.py now runs as part of the generate_newsworthy.sh shell script. We also modified generate_newsworthy.pl so that it reads each article's .sentences file and writes the sentences containing recent dates to the .csv file.
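
The real change lives in the Perl script, but the step itself can be sketched in Python 3 like this. The column names, the " | " separator and the function name write_sentences_csv are all assumptions for illustration; the actual .csv layout of the HIT is not shown here.

```python
import csv
from pathlib import Path


def write_sentences_csv(sentence_dir, csv_path):
    """Write one CSV row per [ARTICLE].sentences file found in sentence_dir."""
    rows = []
    for path in sorted(Path(sentence_dir).glob("*.sentences")):
        text = path.read_text(encoding="utf-8")
        # One sentence per line is assumed; blank lines are skipped.
        sentences = [line.strip() for line in text.splitlines() if line.strip()]
        rows.append([path.stem, " | ".join(sentences)])

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["article", "recent_sentences"])  # hypothetical header
        writer.writerows(rows)
    return len(rows)
```

Calling write_sentences_csv with the directory of .sentences files and an output path returns the number of articles written.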

Works for just a few articles

Works for all articles
 
Side note: generate_newsworthy.sh takes 40 minutes to run o_o ... we'll have to see if we can fix that

Here's another run of the graded-wikitopics.py script now that we have a few more submissions from Turkers:

Turkers who failed the evaluation

Shows the Approve/Reject decisions

With the recent date sentences extracted for the sentences needing labeling, here is my progress so far, which includes the next steps. (Chris and a few other researchers will be leaving for Oregon for the next two weeks, so we created a huge list of possible things to do)

Todo-List
+ Complete Generate Newsworthy HIT
   - (check) combine the 'Google News' and 'Recent sentences' column
   - (check) change the width of the lower table to 1000px instead of 800px
   - (check) change the width of the last two columns to 300px each
   - generate the recent date sentences for the positive and negative controls
     - changed get_positives_article.py script to print the date of the article as well
   - create bullet points for every recent date sentence extracted
   - add google news link at the bottom (so that there is no ambiguity) instead of linking the entire recent date sentences
   - go back to just one column of drop-downs
   - instead of random articles on HIT, make HIT display articles from clusters (email BA)
   - sentences HIT (Wikipedia interface, highlight/click sentences to tell why current)
+ manage DropBox for all the previous HIT batches (because Amazon is deleting them after a certain number of days!!)
+ jQuery Cookies
   - cookie to automatically fill out the 'Age, Location' items
   - cookie to collapse the instructions if done once
+ create a third parallel file with citations -> name, date, link citation (wpextractor parses xml/wikimarkup) (interesting sentences with references to articles)
    - wget to get the link citation, 'beautiful soup' pulls out the text from the html (python), and you would run nltk to split it into sentences -> serif on the results (marks the dates + coreference resolution, marks up names of people and organizations and generates parse trees)

Miscellaneous
+ (check) Download DropBox and accept Chris' past MTurk files
  - For some reason I happened upon a case of the 'Malware Protection' virus right after this .. o_o All my windows closed, and tray popups were flying onto the screen like crazy. Of course, a window of the 'System Scan' popped up, so I tried to access the internet, and then realized that the virus (it's called a rogue I believe) would not let me open any other applications besides itself - this was also the case even after restarting. So after restarting in safe mode for the first time in a thousand years (really, I've never used it before so I gave it a shot), the virus surprisingly wasn't making any active attacks. I ran Malwarebytes' Anti-Malware and Spy++ ASAP and flushed it out of the system - everything's perfect now! Best free software ever.
    I looked on the internet to see if downloading DropBox caused this, but it doesn't seem like there's a connection - DropBox should be completely safe. It's weird ... I'll have to run system scans more often now - you should go run one now (and download Malwarebytes' Anti-Malware/Spy++ if you haven't already. I swear they look fishy, but they're on your side. :))

Python
+ variable = [{FORMAT} for {EVERY_ELEMENT} in {THIS_SET} {CONDITIONS}] (list comprehension, a quick way to write lists)
+ dict({LIST}) (creates a python dictionary from a list of (key, value) pairs)
+ utils.convert_date({STRING}) (converts string that represents a date to a datetime object)
+ for i, a in enumerate({LIST}) (i stands for the index and a stands for {LIST}[i])
+ for a, b in zip({LIST_A}, {LIST_B}) (iterates over two lists in parallel)
+ for i, (a, b) in enumerate(zip({LIST_A}, {LIST_B})) (i stands for the index, a stands for {LIST_A}[i], and b stands for {LIST_B}[i]; the parentheses around a, b are required)
   + supposedly faster way: for i, a, b in izip(count(), {LIST_A}, {LIST_B})
      
   from itertools import izip, count
   alist = ['a1', 'a2', 'a3']
   blist = ['b1', 'b2', 'b3']

   for i, a, b in izip(count(), alist, blist):
      print i, a, b
   ------------------------------------------
   >>> def foo():
   ...  for i, x, y in izip(count(), a, b):
   ...   pass
   ...
   >>> def bar():
   ...  for i, (x, y) in enumerate(zip(a, b)):
   ...   pass
   ...
   >>> delta(foo)
   0.0213768482208
   >>> delta(bar)
   0.180979013443 
   (source)  


+ lambda (creating anonymous functions (not bound to a name))
+ min({LIST}, key={ARBITRARY_FUNCTION}) (finding the minimum element of a list, or the minimum key of a dictionary, where key determines how the elements are compared)
+ {LIST}.remove({ELEMENT}) (removes an element from a list)
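
To keep these notes in one place, here is a small Python 3 sketch that tries each of the idioms above. The sample data (pairs, alist, blist) is made up, and note that in Python 3 the built-in zip is already lazy, so izip is no longer needed.

```python
# Hypothetical sample data: (item, quantity) pairs.
pairs = [("flour", 2), ("sugar", 5), ("eggs", 12)]

# List comprehension with a condition: names whose quantity is above 2.
big = [name for name, qty in pairs if qty > 2]

# dict() builds a dictionary from a list of (key, value) pairs.
stock = dict(pairs)

alist = ["a1", "a2", "a3"]
blist = ["b1", "b2", "b3"]

# zip walks two lists in parallel.
joined = [a + b for a, b in zip(alist, blist)]

# enumerate(zip(...)) yields (index, (a, b)), so (a, b) needs parentheses.
triples = [(i, a, b) for i, (a, b) in enumerate(zip(alist, blist))]

# min with a lambda as key: the item with the smallest quantity.
scarcest = min(stock, key=lambda name: stock[name])

# remove deletes the first matching element from the list in place.
alist.remove("a2")

print(big, stock, joined, triples, scarcest, alist)
```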

Perl
+ counter++; (just like Java, has '++' trick and needs a semicolon)

SERIF
+ marks the dates + coreference resolution (meaning that SERIF can match pronouns to their actual subjects, superscript references ...)
+ marks up names of people and organizations and generates parse trees

History behind the 'Hello World' tradition (from Wikipedia)
The first known instance of the usage of the words "hello" and "world" together in computer literature occurred earlier, in Kernighan's 1972 Tutorial Introduction to the Language B,[1] with the following code:
main( ) {
  extrn a, b, c;
  putchar(a); putchar(b); putchar(c); putchar('!*n');
}
a 'hell';
b 'o, w';
c 'orld';
 

4 comments:

  1. do you use those kind of super awesome and big Chinese knives? :D
    -Xuchen

  2. haha yes! But my parents gave me a half-sized one for safe measure :) i'll post a picture sooner or later xD

  3. not sure if you know this, but my sister rented out a place her senior year down in MD. most of the places are like that, unfortunately, so you're not alone

    and about the food.. good choices! speaking from experience though: if you take all of that in one batch you're bound to waste some. i'd just bring staple pantry items. (eg. flour, EVOO, etc.) - that means... take it easy on the raw meats, fruits, and veggies; you can buy that stuff at a supermarket.

    good luck with living on your own! i'm sure you'll have loads of fun with it.

    -g

  4. hey =) yeah, I have some packaged raw chicken and beef sitting in the freezer. Might have bought too much. x)

    thanks, I'm sure I'll learn a thing or two about cooking and taking care of the place. Maybe put some of my own recipes up! haha

    take care ^^
