Author: webgeek

  • MySQL Cheat Sheet

    Found this today. Not too bad for a quick reference sheet: MySQL Cheat Sheet

  • Hortonworks Releases Its (Conservative) Hadoop Platform

    Check this article out.

    “Hortonworks sets itself apart from Cloudera and MapR by sticking with standards-based software and Apache’s proven 1.0 code line.”

  • Configuring MySQL as a Hive Metastore

    Hive uses Derby to store its metadata by default. If you have ever tried having more than one person, or multiple services such as the Hive Web UI or the Hive service, connect at the same time, it doesn’t work very well and can cause problems. To improve this, I recommend using MySQL as the metastore database.

     

    Install a MySQL server and client either on the same server as Hive or a remote server.

    Log into MySQL and create the metastore database and user permissions.

     

    shell> mysql -u root -p

     

    mysql> create database hive_metastore;

    mysql> CREATE USER 'hive'@'[localhost|remotehost]' IDENTIFIED BY 'metastore';

    mysql> GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hive'@'[localhost|remotehost]';

     

    Now download the MySQL JDBC driver (Connector/J) from http://www.mysql.com/downloads/connector/j/ and install it into your Hive library path “install_path/hive/lib”.
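
    Assuming you grabbed the tarball from that page, installing it is just a matter of unpacking it and dropping the connector jar into Hive’s lib directory. The exact jar name depends on the Connector/J version you downloaded, so treat these paths as placeholders:

    shell> tar -zxf mysql-connector-java-5.x.x.tar.gz

    shell> cp mysql-connector-java-5.x.x/mysql-connector-java-5.x.x-bin.jar install_path/hive/lib/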

     

    Now configure your hive-site.xml file, which should be in “install_path/hive/conf/hive-site.xml”.

     

    <?xml version="1.0"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>

     

    <property>

    <name>javax.jdo.option.ConnectionURL</name>

    <value>jdbc:mysql://[localhost|remotehost]/hive_metastore</value>

    <description>JDBC connect string for a JDBC metastore</description>

    </property>

     

    <property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

    </property>

     

    <property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

    <description>username to use against metastore database</description>

    </property>

     

    <property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>metastore</value>

    <description>password to use against metastore database</description>

    </property>

     

    <property>

    <name>datanucleus.autoCreateSchema</name>

    <value>false</value>

    <description>creates necessary schema on a startup if one doesn’t exist. Set this to false, after creating it once</description>

    </property>

     

    </configuration>

     

    More information at: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin
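
    One way to sanity-check the setup is to create a throwaway table in Hive and then look for it in the metastore database in MySQL. The TBLS table and TBL_NAME column are what my metastore schema uses; the names can vary between Hive versions, so take this as a rough sketch:

    shell> hive -e "CREATE TABLE metastore_test (id INT);"

    shell> mysql -u hive -p hive_metastore -e "SELECT TBL_NAME FROM TBLS;"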

  • Hadoop, Hive, Oh my.

    Sorry, I haven’t really written anything in the last couple of months, but after a year or more of reading about Hadoop and poking at it, I have actually gotten the chance to do something with it.

    A year or so ago I helped build a MySQL database to help analyze user data. A large amount of user data: billions of rows of CSV files. Like any good DBA and/or DB developer would do, we started importing and normalizing the data so we could aggregate it in a relational database. This worked great, except a few problems started to show up.

    The first is that the bigger your tables get, the slower the database gets; even with table partitioning and several other tricks of the trade, it gets slow. Let’s be honest here, a single query summing up billions of rows is going to take a little while no matter what the database is, unless it’s vertically partitioned or you are using some type of shard architecture.

    The second problem we had was that we were expected to store the original CSV files for up to two years, for business reasons, and no matter what the SAN storage vendors tell you, SAN storage is not as cheap as they would like you to think it is, even if you compress the files. On top of that, there was the fact that if we needed to reload the files for some reason, such as the ‘powers that be’ deciding they needed this other column now, we would have to decompress the files and reload the data.

    Then there is the last problem, which most of us know all too well: little or no budget. Well, there goes most of the commercial products, okay, all of them.

    Now, this is where I saw my big chance to bring up Hadoop. First, billions of rows and columns: check. Second, cheap commodity hardware: check. And last but not least, open source: check.

    The next thing that I realized is that we have way more people who know SQL than we do Java and Python, which are the programming languages of choice for writing MapReduce. Good thing I am not the only one to have this problem. Facebook developers have written an application which sits on top of Hadoop called Hive. Hive is great: it allows users who are already familiar with SQL to write similar queries in HiveQL, and Hive then transforms those queries into MapReduce jobs.
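
    To give a feel for it, here is a rough sketch of the kind of thing I mean: point an external table at a directory of CSV files and sum a column with plain SQL-looking syntax. The table, columns, and HDFS path are made up for illustration:

    shell> hive -e "CREATE EXTERNAL TABLE user_events (user_id INT, bytes BIGINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data/user_events';"

    shell> hive -e "SELECT user_id, SUM(bytes) FROM user_events GROUP BY user_id;"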

    It was proof-of-concept time. I was able to spin up a 14-VM Hadoop cluster in about two days and copy over a sizable amount of test data in another. I spent a couple of weeks playing around with HiveQL and converting a few SQL queries over to it, with the result being that I am able to process five days of data in the same time it was taking to process one day. Not bad for 14 VMs.

    So stay tuned for more blog entries on Hadoop, Hive, and the crew.

  • Using Iozone for Filesystem Benchmarking

    If you have been around computer systems long enough, you know how important disk performance is, especially with database systems. There’s the standard hdparm -tT and dd test that everyone does, but it really doesn’t give you the whole picture. What you really want is to test read, write, re-read, re-write, read backwards, read strided, fread, fwrite, random read, pread, mmap, aio_read, and aio_write. For that I would recommend using Iozone. It gives you a better idea of what’s going on.

    http://www.iozone.org
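
    As a starting point, a run like the one below is the sort of thing I mean: -a runs the automatic test suite over a range of record and file sizes, -g caps the maximum file size (make it bigger than your RAM so you are not just benchmarking the page cache), -f points it at the filesystem you care about, and -b dumps the results to a spreadsheet. The sizes and paths here are placeholders:

    shell> iozone -a -g 4G -f /data/iozone.tmp -b iozone-results.xls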

     

  • Using Mysqldump to Backup Your Stored Procedures, Functions and Triggers

    Now that MySQL has stored procedures, functions, and triggers, I have found myself needing to back them up or make copies to move to a different MySQL server. It’s actually pretty easy to do with mysqldump. Here is the command line for it.

    shell> mysqldump --routines --no-create-info --no-data --no-create-db --skip-opt <database> > outputfile.sql

    Then if you need to restore it.

    shell> mysql <database> < outputfile.sql
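
    And if the goal is just to copy them straight to another MySQL server, you can skip the intermediate file and pipe the dump into the other server; the host name here is just a placeholder:

    shell> mysqldump --routines --no-create-info --no-data --no-create-db --skip-opt <database> | mysql -h remotehost <database>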

    more info: http://dev.mysql.com/doc/refman/5.5/en/mysqldump.html

  • Install Git From Source On Linux

    If you are like me and want to install git-core from source instead of one of the many binary packages out there, or you just have a distro that doesn’t have a binary for it, here is what you will need to get it installed (see the package-manager sketch after the list for one way to pull most of these in).

    • POSIX-compliant shell
    • GCC – gnu c compiler
    • GNU Interactive Tools
    • Perl 5.8 or Later
    • Openssl
    • Openssh
    • zlib
    • libcurl
    • expat
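
    On a Debian/Ubuntu-style distro, something along these lines should cover most of the build dependencies; package names differ between distros and releases, so treat this as a sketch rather than a definitive list:

    shell$ sudo apt-get install gcc make perl libssl-dev libcurl4-openssl-dev libexpat1-dev zlib1g-dev gettext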

    Once you have verified or installed all of the required packages, you can download the source from the Git homepage.

    shell$ wget http://git-core.googlecode.com/files/git-1.x.x.tar.gz

    shell$ tar -zxf git-1.x.x.tar.gz

    shell$ cd git-1.x.x

    shell$ ./configure --prefix=[install_path]

    shell$ make all

    shell$ sudo make install
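
    A quick way to check that everything landed where you expected, given the prefix you passed to configure:

    shell$ [install_path]/bin/git --version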

    A little old school and not that hard.

    Resources:
    http://git-scm.com/
    http://ruby.about.com/od/git/a/2.htm

  • Here is what happens when I recover from surgery.

    After being told that I would need to take five days to recover from a recent surgery, and being informed that they would prefer me not to work on any of the systems at work while under the influence of painkillers, I started looking for things to kill time.

    While trying to find something to do for five days and not having much luck, I was surfing the web using Google’s Chrome web browser and started looking at their “New Tab Apps” feature, or whatever it’s called. I started thinking that it would be really simple to create a web-based version of the application that did roughly the same thing, but wasn’t dependent on the web browser. It would also be handy for all those tablet devices that have web browsers; it’s kind of a pain to hit those little links with fat fingers like mine. After doing a quick design in my head, I figured what the heck; I’ve got nothing better to do.

    I decided to go old-school LAMP on it with Linux, Apache, MySQL, and Perl. I decided to use mod_perl and Mason as my framework, Mason being a throwback from my short stint at Amazon.com. I went with the old MVC architecture, since it’s the easiest for a one-node system and because of the state-of-the-art dual Pentium Pro 180 MHz, with 256MB of RAM and a 14GB 5400 RPM hard drive, that I was building it on.

    After five days, plus a few additional weeknights and Saturdays, here is what I came up with: http://www.myapplinks.com. It is still in the alpha/beta stage, but not a bad start.

  • Using CURL to manage Tomcat

    The other day a few of my colleagues and I were talking about an easy way to deploy and undeploy WAR files from the command line, like you can through the Tomcat Web Application Manager portal, and being on a Python kick, I started writing it in Python. After an hour or two I realized that I had made this way more complex than I needed to. I had been reading the Apache Tomcat 6.0 Manager App HOW-TO and was using curl to test all the commands from localhost.

    shell> curl --anyauth -u admin:password http://localhost:8080/manager/start?path=/myapp

    So now, after slapping myself in the forehead and saying “duh!”, I decided I could write this as a shell script and have it knocked out in 20 minutes.

    So here is what I came up with: tomcat-cli.sh.
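
    For the curious, a minimal sketch of that kind of wrapper, built on the same manager URLs from the HOW-TO, might look something like the following. The host, credentials, and the set of commands covered are just illustrative:

    #!/bin/bash
    # tomcat-cli.sh (sketch) - thin wrapper around the Tomcat 6 manager URLs
    # Usage: tomcat-cli.sh <list|start|stop|deploy|undeploy> [/app-path] [war-file]
    HOST="http://localhost:8080"   # manager host, adjust as needed
    AUTH="admin:password"          # manager credentials, adjust as needed
    CMD=$1
    APP=$2
    WAR=$3
    case "$CMD" in
      list)     curl --anyauth -u "$AUTH" "$HOST/manager/list" ;;
      start)    curl --anyauth -u "$AUTH" "$HOST/manager/start?path=$APP" ;;
      stop)     curl --anyauth -u "$AUTH" "$HOST/manager/stop?path=$APP" ;;
      undeploy) curl --anyauth -u "$AUTH" "$HOST/manager/undeploy?path=$APP" ;;
      deploy)   curl --anyauth -u "$AUTH" "$HOST/manager/deploy?path=$APP&war=file:$WAR" ;;
      *)        echo "Usage: $0 <list|start|stop|deploy|undeploy> [/app-path] [war-file]" ;;
    esac

    Then stopping an app is as simple as:

    shell> ./tomcat-cli.sh stop /myapp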

    — Cheers

     

  • Simple HTTP Server with Python

    Ever needed a quick web server to share something with a Windows user from your Linux box? Python has a really easy-to-use embedded HTTP server. Just try the following:

    shell> python -m SimpleHTTPServer 9001

    Then point your web browser at http://localhost:9001 and see what happens.
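
    If you are on Python 3, the module was renamed, so the equivalent is:

    shell> python3 -m http.server 9001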

    — Cheers