IOzone for Arch Linux ARM
For different benchmarks I will later publish on this blog, I wanted to use IOzone. Unfortunately, this package is not available on Arch Linux ARM via the pacman package manager.
This tool is not difficult to compile, but still I decided to create an Arch Linux ARM package for it and share it with you: iozone-3_414-1-armv6h.pkg.tar.xz
You can install it using the following command:
pacman -U iozone-3_414-1-armv6h.pkg.tar.xz
Please note that I also tried to submit the package to the AUR (Arch User Repository). To my surprise, it was already present (https://aur.archlinux.org/packages/iozone/) but not available for ARM. 🙁
Retrieving UTF-8 values from Cassandra
I recently tried out Apache Cassandra which is a NoSQL solution that was initially developed by Facebook and designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure.
In order to populate the database, I used Apache Flume and Flume NG Apache Cassandra Sink which helped me to inject logs into it. But let’s focus on Cassandra here, I will write posts about Flume later on.
This is the Cassandra schema I was using (which is the one suggested by the Cassandra sink):
create keyspace logs with strategy_options = {datacenter1:1} ; use logs; create column family records with comparator = UTF8Type and gc_grace = 86400 ;
After adding the data into the database, I wanted to fetch them to make sure everything went well.
I tried by three different ways:
- Cassandra CLI
To return the first 100 rows (and all associated columns) from the records column family, I ran the following command:LIST records;
However, the rows were looking like:
RowKey: 6c6f67696b6465763a32653630306661362d633664652d346336612d386561352d323636326533353661616332 => (column=data, value=39312e36362e3233392e323530202d202d205b32392f4465632f323031323a30393a31383a3433202d303730305d2022 474554202f77702d636f6e74656e742f706c7567696e732f73796e746178686967686c6967687465722f73796e746178686967686c6967687465723 22f736372697074732f636c6970626f6172642e73776620485454502f312e31222032303020313635392022687474703a2f2f7777772e6c6f67696b 6465762e636f6d2f323031302f30372f30372f7573652d736572766572786d6c687474702d7468726f7567682d70726f78792f2220224d6f7a696c6 c612f352e30202857696e646f7773204e5420362e313b20574f5736343b2072763a31372e3029204765636b6f2f3230313030313031204669726566 6f782f31372e3022, timestamp=1357169131235135) => (column=host, value=39312e36362e3233392e323530, timestamp=1357169131235134) => (column=src, value=6c6f67696b646576, timestamp=1357169131235133) => (column=ts, value=323031322d31322d32395431363a31383a34332e3030305a, timestamp=1357169131235132)
This behavior is clearly explained on the DataStax page Getting Started Using the Cassandra CLI:
Cassandra stores all data internally as hex byte arrays by default. If you do not specify a default row key validation class, column comparator and column validation class when you define the column family, Cassandra CLI will expect input data for row keys, column names, and column values to be in hex format (and data will be returned in hex format).
To pass and return data in human-readable format, you can pass a value through an encoding function. Available encodings are:
* ascii
* bytes
* integer (a generic variable-length integer type)
* lexicalUUID
* long
* utf8Which means that we need to specify the encoding in which column family data should be returned. We can do it for the entire client session using the following commands:
ASSUME records KEYS AS utf8; ASSUME records COMPARATOR AS utf8; ASSUME records VALIDATOR AS utf8;
So if we now run the previous command, the rows are looking like:
RowKey: logikdev:2e600fa6-c6de-4c6a-8ea5-2662e356aac2 => (column=data, value=91.66.239.250 - - [29/Dec/2012:09:18:43 -0700] "GET /wp-content/plugins/syntaxhighlighter/syntax highlighter2/scripts/clipboard.swf HTTP/1.1" 200 1659 "http://www.logikdev.com/2010/07/07/use-serverxmlhttp-through-pro xy/" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0", timestamp=1357169131235135) => (column=host, value=91.66.239.250, timestamp=1357169131235134) => (column=src, value=logikdev, timestamp=1357169131235133) => (column=ts, value=2012-12-29T16:18:43.000Z, timestamp=1357169131235132)
Much better! 😉
- CQL
In order to retrieve columns in the records column family, we can use the following SELECT command:SELECT * FROM records LIMIT 1;
However, the output looks like:
key | column1 | value --------------------------------------------------------------------------------------------+---------+--------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------- ------------- 6c6f67696b6465763a31656438616166362d386438322d343330372d386133612d336438363664633133323636 | data | 3131352e313131 2e33322e313730202d202d205b32392f4465632f323031323a30353a34303a3338202d303730305d2022474554202f323031302f30382f32362f76 616c69646174652d66696c652d73697a652d7072696f722d75706c6f61642f20485454502f312e31222032303020393931302022687474703a2f2f 7777772e676f6f676c652e636f2e696e2f75726c3f73613d74267263743d6a26713d26657372633d7326736f757263653d7765622663643d392676 65643d3043484951466a41492675726c3d687474702533412532462532467777772e6c6f67696b6465762e636f6d25324632303130253246303825 3246323625324676616c69646174652d66696c652d73697a652d7072696f722d75706c6f61642532462665693d764f5465554c7157443457717241 665f6d4948774177267573673d4146516a434e4839385070715373655375304e4a6a46484f534d5576515642704e41266361643d726a612220224d 6f7a696c6c612f352e30202857696e646f7773204e5420362e313b2072763a31372e3029204765636b6f2f32303130303130312046697265666f78 2f31372e3022
As you can see, we have a similar problem than in Cassandra CLI. And funny enough, the resolution is also similar except that the syntax is a bit different.
Indeed, to treat the column values in the table records as being of the type UTF-8, we need to run the following commands:ASSUME records(key) VALUES ARE text; ASSUME records(value) VALUES ARE text
First of all, note that we overrode the type as ‘text’ instead of UTF-8 as explained on this page.
Secondly, the cqlsh help page says we can simply run the commandASSUME records VALUES are text;
which is supposed to treat all column values in the table records as being of the type text. But, for some reason, this is not working and we have to run it for each column value. - twitter/cassandra
Finally, I tried this Ruby client (which can be found here) directly with the Interactive Ruby Shell (IRB).
The first thing we have to do is to connect to Cassandra using the following commands:require 'cassandra' client = Cassandra.new('logs', '127.0.0.1:9160')
We then can run the command below to get one of the rows:
client.get(:records, 'logikdev:69a2ef29-599f-48f0-99ef-d4c2ad1f4169')
This would return something like the following:
=> #
"85.168.229.99 - - [29/Dec/2012:09:48:22 -0700] \"GET /wp-content/plugins/syntaxhighlighter/s yntaxhighlighter2/styles/shThemeDefault.css?ver=2.1.364 HTTP/1.1\" 200 3936 \"http://www.logikdev.com/2010/12/02/update -your-twitter-status-with-java/\" \"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0\"", "host" =>"85.168.229.99", "src"=>"logikdev", "ts"=>"2012-12-29T16:48:22.000Z"} {"data"=>1357169131284005, "host"=>1357169131284004, "src"=>1357169131284003, "ts"=>1357169131284002}> And miracle!
As you can see, there is no need to specify any type this time, the data are already decoded. 🙂
EDIT:
After some more research, I found a better solution to our problem: you can define the validation class in the create statement of the column family. It means that, for our example, the Cassandra schema would be:
create keyspace logs with strategy_options = {datacenter1:1} ; use logs; create column family records with comparator = UTF8Type and gc_grace = 86400 and key_validation_class = UTF8Type and default_validation_class = UTF8Type ;
With this solution, there is no need to use the ASSUME
command, the data will be displayed properly.
MD5 generates 31 bytes instead of 32
I wrote the following method to generate MD5 hashes:
private static String md5(String input) throws NoSuchAlgorithmException { MessageDigest md = MessageDigest.getInstance("MD5"); byte[] messageDigest = md.digest(input.getBytes()); BigInteger number = new BigInteger(1, messageDigest); return number.toString(16); }
This method takes a String
and outputs a 32 character hash.
However, depending on the value of the parameter, this method outputs a 31 character hash instead of 32.
Why is that?
It appears that it is because the leading zero is missing. But the question remains! Why?
This is simply because the toString
method doesn’t know that we are expecting a leading zero. BigInteger
stores just a number; it doesn’t know how many leading zeroes we need.
For example, how many leading zero would you expect with the following code?
new BigInteger("33").toString()
In order to be sure MD5 generates 32 bytes, we could replace the number.toString(16)
by:
String.format("%032x", number)
So, the method will look like the following:
private static String md5(String input) throws NoSuchAlgorithmException { MessageDigest md = MessageDigest.getInstance("MD5"); byte[] messageDigest = md.digest(input.getBytes()); BigInteger number = new BigInteger(1, messageDigest); return String.format("%032x", number); }
Remove DOS carriage return
After writing a shell script on Windows and trying to execute it on Linux, I got the following error message:
/bin/sh^M: bad interpreter: No such file or directory
The problem is obviously because the return line character on Windows and Linux are different. But how to fix it in order to execute it on Linux?
I firstly tried to open it, copy the content of the file and paste it in a new file using only Linux command line, but it didn’t work. 🙁
However, I found the following thread which fixed the problem:
http://www.reachdba.com/showthread.php?335-bin-sh-M-bad-interpreter-No-such-file-or-directory-apps11i-Instalation
Here is what you need to do:
- Open your file using vi
- Write the command
:set fileformat=unix
- Save the file using
:wq
The file should now run on Linux. 🙂