Before my parents sent us back to Johns Hopkins, we made a quick trip to the King of Prussia Mall and Costco for clothes and food. I'll have to wait until next time to cook because of apartment renovations, but you betcha I'm going to learn. :) Here's what we bought!
Clothes
<.jpg of the pictures of clothes when I get my camera>
Food
1 bag of oranges
1 bundle of bananas
2 french baguettes
1 jar of strawberry jam
1 jar of peanut butter
1 roll of aluminum foil
1 jar of spicy tofu
1 jar of spicy, pickled green beans
1 case of mandarin orange cups
1 loaf of whole wheat bread
1 case of yoghurt
2 heads of lettuce
1 case of canned fruit
2 bottles of mango juice
1 package of salsa additions
1 cup of hot spiced olives
1 case of blueberries
1 cup of sliced cucumbers
1 package of cold cut meat
1 package of frozen sweet potato fries
4 tv dinners
1 carton of milk
4 cooked sausages
When we got back, Wayne and I finished playing Portal 2! It's really an amazing game, and I was surprised to find that it has an adventure element to it. For sure there's an initial learning curve, but it's not long before using portals feels natural! I'm hooked on the ending song now, and I'm definitely going to go back and play the original Portal. I think I kind of forgot what it's like to play games after I quit in junior year of high school. I've still been buying games, but I couldn't get into them like I used to with so many other real-world responsibilities in mind. I'm glad Portal 2 changed that for me, especially since I think the best games have the ability to evoke emotion as Portal 2 did - I'm sure it will be a blast getting back into gaming. That being said, my sister, Wayne and I are eagerly awaiting the release of the Dragon Nest BETA. ;)
Back to Johns Hopkins University
The big goal of this week is to get our 'Generate Newsworthy HIT' out on Amazon Mechanical Turk. Before we could release the HIT publicly, though, there were still a few things I had to take care of, so Chris gave me this list of things to do:
Pre-release
1. Incorporate the positive and negative control outputs into generating the .csv file (a comma-separated spreadsheet file where each column is a variable; we upload it to MTurk to work with our HIT template)
2. Fix the bugs in the spreadsheet, including the following (I sent this list to Chris last night):
a. The script prints outside of the columns <-- this should be a quick fix; the for loop is probably iterating more times than needed
b. The script only calls one article for the positive control and one article for the negative control for all 100 <-- I can fix this by generating more control articles up front, putting them into a directory, and then using a for loop to iterate over each one
c. The csv files had bugs from the earlier, original script (I attached the old csv file we used): sometimes it doesn't print the lead_section variable, so the columns get messed up. I'm not sure how this error came to be, but I'm guessing that it came from within the .pl script
3. Check for any bugs in the actual HIT template (the JavaScript/CSS/HTML I wrote to make the HIT look like what it looks like)
a. Spend 45-60 minutes researching geolocation for the Turkers
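The column problems in items 2a and 2c above can be sketched in Python. This is only a minimal illustration, not the actual script (the real one is a .pl script that isn't shown here). Apart from lead_section, the column names and the sample rows are made up. The idea is to let the csv module fill in a blank when a field is missing, so the later columns never shift.

```python
import csv
import io

# Hypothetical column names; only lead_section appears in the post.
FIELDS = ["article_title", "lead_section", "control_type"]

def write_hit_rows(rows, out):
    """Write one CSV row per article, always with the same columns."""
    # restval fills any missing field with an empty string,
    # so a missing lead_section cannot shift the later columns.
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Made-up sample data: the second article has no lead_section.
sample_rows = [
    {"article_title": "Article A",
     "lead_section": "First sentence, with a comma.",
     "control_type": "positive"},
    {"article_title": "Article B", "control_type": "negative"},
]

buffer = io.StringIO()
write_hit_rows(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The writer also quotes any field that contains a comma, which is another common way for text from an article to spill into the wrong column.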
Getting the sentence/section extractor for Wikipedia articles working
I ran into some trouble Tuesday morning figuring out Python modules (packs of code that you can conveniently use rather than writing your own functions, because some of them are very advanced) and how Byung Gyu's programs relied on them. Questions that came up included:
1. How do you install Python modules if you aren't logged into an administrator account?
2. Is it more efficient to install your own Python modules or utilize someone else's? Is either necessary?
3. What is the PYTHONPATH?
4. What does it mean to write the line 'import ...' at the top of your code?
Unfound modules: wpTextExtractor, mwlib, pkg_resource, _uscan, splitting
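One way to check which modules Python can and cannot find is sketched below. This is a small illustration under a big assumption: it uses importlib from current Python 3, which the Python 2.5 setup described in this post would not have. The module names in the example are placeholders ('json' ships with Python, and the other name is deliberately fake).

```python
import importlib.util
import sys

def find_missing(module_names):
    """Return the names Python cannot locate on its current search path."""
    missing = []
    for name in module_names:
        # find_spec returns None when no top-level module
        # with this name is found on sys.path.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# sys.path is the list of places that 'import' searches, in order.
print(sys.path)

# 'json' is in the standard library; the second name is made up.
print(find_missing(["json", "module_that_does_not_exist_xyz"]))
```

Writing 'import json' at the top of a script asks Python to walk that same list until it finds a module with that name.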
-------------------------------------------------------------------------------------------
[Neat Tricks 1] I began by installing modules as needed, and learned quite a few Linux commands along the way, as I figured there must be a faster way than tediously copying files and fumbling here and there:
wget [url] (downloads the file into the current directory)
find [path] | grep [name] (lists everything under [path] and keeps only the entries whose path contains [name], such as a specific file or directory)
[Neat Tricks 2] Additionally, I learned about the tilde in the context of paths (such as ~/wikitopics/src/):
~ (expands to the value of the $HOME environment variable, e.g. /home/yourusername)
From StackOverflow.com:
"Unless you're writing a shell script or using some other language that knows to substitute the value of $HOME for ~, tildes in file paths have no special meaning and will be treated as any other non-special character."
Another advantage of the tilde is that if the home directory changes, the path does not need to change, because the '~' character already accounts for that. I'm planning to research this more.
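The quote above applies to Python too. Here is a minimal sketch, assuming a Linux-style system where $HOME is set, showing that Python treats '~' as an ordinary character until you ask for it to be expanded. The path is the example one from this post and does not need to exist.

```python
import os

path = "~/wikitopics/src/"

# Python's file functions treat '~' as a literal character,
# so this looks for a directory that is actually named '~'.
print(os.path.exists(path))

# Expand the tilde explicitly before using the path.
expanded = os.path.expanduser(path)
print(expanded)
```

The shell does this expansion for you on the command line, which is why the same path works in a terminal but not inside a script.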
[Neat Tricks 3] The last little thing that I forgot to mention (Chris taught me this on the first day) is that pressing 'tab' finishes off a file/directory name. For instance, if you've typed:
emacs wiki-
And the full filename is wiki-02.1-mylib (and no other names have 'wiki-' as a prefix), then pressing tab will finish it for you! Meh, I used to write it out completely. :P I'm really getting the hang of it now, as well as CTRL+e (goes to the end of the line), CTRL+a (goes to the beginning of the line) and CTRL+k (cuts everything after the cursor on the line). They're little things that make using the Linux environment much easier.
-------------------------------------------------------------------------------------------
Then, halfway through, Chris showed me that instead of installing everything ourselves, we could simply use Byung Gyu's installed modules. We link to the directories using 'ln -s [path]', and also set the PYTHONPATH environment variable by typing export PYTHONPATH=/[path]:[path], where the colons separate the paths that Python searches.
This is the message that Chris sent me with more details on setting PYTHONPATH:
You can set PYTHONPATH in your ".bash_profile" (or ".bashrc") file, which is loaded every time you log in (assuming you are using bash as your default shell instead of csh or something).
Here's how you can set it to include NLTK
export PYTHONPATH="/usr/local/bin/python:/usr/local/lib/python2.5/site-packages/nltk-2.0b7-py2.5.egg/"
You can also point to specific modules in your own directory space or in ByungGyu's by adding a colon and then the path. For instance:
export PYTHONPATH="/usr/local/bin/python:/usr/local/lib/python2.5/site-packages/nltk-2.0b7-py2.5.egg/:/home/hltcoe/bahn/wikipydia"
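To see what a colon-separated PYTHONPATH actually does, here is a rough sketch that starts a fresh Python with the variable set and prints where that Python will look for modules. The two directories are hypothetical stand-ins (they do not need to exist), and the code assumes a current Python 3 rather than the Python 2.5 in the paths above.

```python
import os
import subprocess
import sys

def sys_path_with(pythonpath):
    """Start a fresh Python with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Two hypothetical directories joined with the platform's
# separator (a colon on Linux).
dirs = ["/tmp/example_modules_a", "/tmp/example_modules_b"]
search_path = sys_path_with(os.pathsep.join(dirs))

for entry in search_path:
    print(entry)
```

Each directory listed in PYTHONPATH shows up as its own entry in sys.path, ahead of the standard library locations, so modules placed there are found by 'import'.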
I know, I know. It was a pretty back-and-forth time.
The overall lesson that we learned from all this mumbo jumbo was that Python looks for modules according to the PYTHONPATH, which can point to multiple paths. Using this, we can use Byung Gyu's installed modules rather than installing our own and figuring out the write permissions that come along with a non-administrator account. By setting the PYTHONPATH, we were able to make the script run successfully with the necessary Python modules, and this was the output:
It basically gets sentences from the respective Wikipedia pages of the randomly generated articles from the negative control I worked on last Friday. Woot! Just to have it output something distinguishable is a remarkable feeling. Chris was very patient with me, and I was glad he took the time to help me work through this situation. I definitely got a handle on many Python fundamentals, and the next goal is to upload this HIT - I'm working to make that happen soon!