Scribe: an introduction

09 Mar 2011


...

See also the later page scribe_packaging.

Overview

https://github.com/facebook/scribe/wiki/
https://github.com/facebook/scribe/wiki/Scribe-Overview
http://incubator.apache.org/thrift/
https://github.com/facebook/scribe/wiki/Logging-Messages

http://www.facebook.com/note.php?note_id=32008268919 – Facebook’s Scribe technology now open source, October 24, 2008
http://axonflux.com/how-facebook-uses-scribe-hadoop-and-hive-for – How Facebook uses Scribe, Hadoop, and Hive for Analytics, Ad hoc analysis, Spam detection and Ad Optimization

Scribe for bringing together the logs. Hadoop for map/reduce. Hive for querying. Ultimately SQL for storing final reporting info.

On reliability
Scribe spools data to disk on any node to handle intermittent connectivity or node failure, but it doesn’t sync a log file for every message, so there’s a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. Basically, this is more reliability than you get with most logging systems, but not something you should use for database transactions.

When data can be lost
These error cases will lead to loss of data:

  • If a client can’t connect to either the local or central scribe server the message will be lost
  • If a scribe server crashes it could lose a small amount of data that’s in memory but not on disk
  • Some multiple component failure cases, such as a resender can’t connect to any central server and its local disk fills up
  • Some rare timeout conditions can lead to duplicate messages
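The spooling behaviour and the loss cases above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Scribe's actual implementation: `SpoolingSender`, its `send` callable, and the spool file layout are all hypothetical names invented for illustration.

```python
import os


class SpoolingSender:
    """Toy model: forward messages, spooling to disk while upstream is down."""

    def __init__(self, send, spool_path):
        self.send = send              # hypothetical callable; raises ConnectionError when down
        self.spool_path = spool_path  # local file used as the spool

    def log(self, message):
        try:
            self.send(message)
        except ConnectionError:
            # Upstream unreachable: append to the local spool instead.
            # There is no fsync here, so a crash at this point could lose
            # the line, and a full disk would make this write fail as well.
            with open(self.spool_path, "a", encoding="utf-8") as spool:
                spool.write(message + "\n")

    def replay(self):
        """Resend spooled messages in order; return how many were delivered."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as spool:
            pending = spool.read().splitlines()
        delivered = 0
        for message in pending:
            try:
                self.send(message)
            except ConnectionError:
                break  # still down; keep the rest for the next attempt
            delivered += 1
        # Rewrite the spool with only the undelivered messages.
        with open(self.spool_path, "w", encoding="utf-8") as spool:
            spool.writelines(m + "\n" for m in pending[delivered:])
        return delivered
```

In this model, a send that actually succeeded but was reported as a failure would be spooled and replayed, which is one way duplicate messages can appear.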

Configuration

https://github.com/facebook/scribe/wiki/Scribe-Configuration

Installing Thrift / Scribe on Ubuntu Lucid (10.04)

http://vccv.posterous.com/installing-thrift-scribe-on-ubuntu-lucid-1004
Thrift does not build from the latest revision; you have to roll back to “around August 9, 2010”.
The commit id given in the cheat sheet does not work; find the right one with git log.
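One possible way to look up a commit by date is `git rev-list -n 1 --before=<date>`. The sketch below wraps that command; the helper names, the cutoff date `2010-08-10`, and the use of `HEAD` as the starting ref are assumptions for illustration, not values taken from the cheat sheet.

```python
import subprocess


def rev_list_command(before, ref="HEAD"):
    """Build the git command that prints the last commit made before a date."""
    return ["git", "rev-list", "-n", "1", f"--before={before}", ref]


def last_commit_before(repo_dir, before, ref="HEAD"):
    """Run the command inside repo_dir and return the commit hash as a string."""
    result = subprocess.run(
        rev_list_command(before, ref),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumes a thrift checkout in ./thrift; the date is a guess at "around Aug. 9th".
    print(last_commit_before("thrift", "2010-08-10"))
```

The printed hash can then be passed to `git reset --hard` in place of the id from the cheat sheet.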

### Installation Notes for Thrift / Scribe on Ubuntu Lucid (v10.04)

### Step 1: Install required tools

sudo apt-get install libboost-dev libevent-dev python-dev automake pkg-config libtool flex bison
sudo apt-get install php5-dev
sudo apt-get install ant
sudo apt-get install openjdk-6-jdk
sudo apt-get install bjam
sudo apt-get install libboost-all-dev
sudo apt-get install libbit-vector-perl      # if the Perl libraries are needed

### Step 2: Building / Installing Thrift
git clone git://git.apache.org/thrift.git

# Note: The fb303 library doesn't compile on the newest snapshots
# so revert to around Aug. 9th, 2010
cd thrift
git reset 720186cd456a2cd97f606202c46fefb9efb68ddb --hard

./bootstrap.sh
./configure
make
sudo make install

# Build python libs
cd lib/py
sudo python setup.py install

### Step 3: Build fb303 libs
cd ../../contrib/fb303   # go back up from lib/py to the thrift root first
./bootstrap.sh
./configure
make
sudo make install

### Step 4: Building / Installing Scribe
# Clone outside the thrift tree
cd ../../..
git clone http://github.com/facebook/scribe.git
cd scribe

./bootstrap.sh
make
sudo make install
cd lib/py
sudo python setup.py install

### Step 5: Housekeeping
# Make sure that the python libraries are in your PYTHONPATH environment variable -- otherwise
# you will not be able to run the scribe_ctrl (Scribe Control) script. After installation,
# the scribed binary should be in your path
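A quick way to check the PYTHONPATH setup is to ask Python whether the modules can be found. This is a minimal sketch; the module names `thrift`, `fb303`, and `scribe` are assumed to be the ones the install steps above put in place.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names installed by the Thrift, fb303 and Scribe builds.
    missing = missing_modules(["thrift", "fb303", "scribe"])
    if missing:
        print("Not importable, check PYTHONPATH:", ", ".join(missing))
    else:
        print("All modules found")
```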

Perl client

Perl client for Facebook’s scribe logging software
http://search.cpan.org/jjschutz/Log-Dispatch-Scribe-0.05/lib/Log/Dispatch/Scribe.pm – CPAN only; there is no Ubuntu package
http://search.cpan.org/stbey/Bit-Vector-7.1/Vector.pod , libbit-vector-perl – required

http://notes.jschutz.net/2009/04/perl-client-for-facebooks-scribe-logging-software/

Scribe is a log aggregator, developed at Facebook and released as open source. Scribe is built on Thrift, a cross-language RPC type platform, and therefore it is possible to use scribe with any of the Thrift-supported languages. Whilst Perl is one of the supported languages, there is little in the way of working examples, so here’s how I did it:

  1. Install Thrift.
  
  2. Build and install FB303 perl modules
  cd thrift/contrib/fb303
  # Edit if/fb303.thrift and add the line 'namespace perl Facebook.FB303' after the other namespace declarations
  thrift --gen perl if/fb303.thrift
  sudo cp -a gen-perl/ /usr/local/lib/perl5/site_perl/5.10.0 # or wherever you keep your site perl
  This creates the modules Facebook::FB303::Constants, Facebook::FB303::FacebookService and Facebook::FB303::Types.
  
  3. Install Scribe.
  
  4. Build and install Scribe perl modules
  cd scribe
  # Edit if/scribe.thrift and add 'namespace perl Scribe.Thrift' after the other namespace declarations
  thrift -I /path/to/thrift/contrib/ --gen perl if/scribe.thrift
  sudo cp -a gen-perl/Scribe /usr/local/lib/perl5/site_perl/5.10.0/ # or wherever
  This creates the modules Scribe::Thrift::Constants, Scribe::Thrift::scribe, Scribe::Thrift::Types.
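The manual edit in steps 2 and 4 (adding a `namespace perl ...` line after the other namespace declarations) could be scripted. The following is a sketch only; `add_namespace` is a hypothetical helper, and it assumes each namespace declaration sits on its own line.

```python
def add_namespace(idl_text, language, name):
    """Insert 'namespace <language> <name>' after the last namespace line.

    Returns the text unchanged if a namespace for that language already exists.
    """
    lines = idl_text.splitlines()
    last_namespace = -1
    for index, line in enumerate(lines):
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "namespace":
            if parts[1] == language:
                return idl_text  # already declared, nothing to do
            last_namespace = index
    # With no namespace lines at all, this inserts at the top of the file.
    lines.insert(last_namespace + 1, f"namespace {language} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = "if/scribe.thrift"  # run from the scribe checkout
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(add_namespace(original, "perl", "Scribe.Thrift"))
```

The same call with `"Facebook.FB303"` would cover `if/fb303.thrift`.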

From the same post, an example without Log::Dispatch::Scribe:
Here is an example program that uses the client (reading one line at a time from stdin and sending to a scribe instance running locally on port 1465):

#! /usr/bin/perl

use Scribe::Thrift::scribe;
use Thrift::Socket;
use Thrift::FramedTransport;
use Thrift::BinaryProtocol;
use strict;
use warnings;

my $host = 'localhost';
my $port = 1465;
my $cat = $ARGV[0] || 'test';

my $socket = Thrift::Socket->new($host, $port);
my $transport = Thrift::FramedTransport->new($socket);
my $proto = Thrift::BinaryProtocol->new($transport);

my $client = Scribe::Thrift::scribeClient->new($proto, $proto);
my $le = Scribe::Thrift::LogEntry->new({ category => $cat });

$transport->open();

while (my $line = <>) {
    $le->message($line);
    my $result = $client->Log([ $le ]);
    if ($result == Scribe::Thrift::ResultCode::TRY_LATER) {
        print STDERR "TRY_LATER\n";
    }
    elsif ($result != Scribe::Thrift::ResultCode::OK) {
        print STDERR "Unknown result code: $result\n";
    }
}

$transport->close();
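The example above only reports TRY_LATER. A client might instead retry with a growing delay, roughly as sketched below. The `send` callable stands in for a real client's Log call and is hypothetical; the numeric values of `OK` and `TRY_LATER` are assumed to mirror the ResultCode enum in scribe.thrift.

```python
import time

# Assumed to mirror ResultCode in scribe.thrift.
OK = 0
TRY_LATER = 1


def log_with_retry(send, entries, attempts=3, delay=0.5, sleep=time.sleep):
    """Call send(entries) until it stops returning TRY_LATER or attempts run out.

    The wait doubles after each TRY_LATER: delay, 2*delay, 4*delay, ...
    Returns the last result code.
    """
    for attempt in range(attempts):
        result = send(entries)
        if result != TRY_LATER:
            return result
        if attempt < attempts - 1:
            sleep(delay * (2 ** attempt))
    return TRY_LATER
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting.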

Running

cd /scribe/scribe
vim examples/README
export PYTHONPATH='/usr/lib/python2.6/site-packages:/usr/lib/python2.6:/usr/lib/pymodules/python2.6'
cd /scribe/perl
./Log-Dispatch-Scribe_example.pl

See Also

http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg00916.html – timeouts
http://snippets.notmyidea.org/2009/07/29/howto-install-scribe-the-facebook-log-system-on-debian – installing on Debian
http://www.debian-administration.org/article/337/Rolling_your_own_Debian_packages_part_2 – generally useful notes on building deb packages

Installing on FreeBSD

# pkg_add -r scribe
# pkg_add -r p5-Log-Dispatch-Scribe
# pkg_add -r fb303

# pkg_add -r py26-thrift  # if Python clients are needed

Installing into a local directory

###thrift###
> cd thrift
> ./bootstrap.sh # or make clean
> ./configure --prefix '/home/lena-san/scribe/thrift_installation' PY_PREFIX=/home/lena-san/scribe/thrift_installation PERL_PREFIX=/home/lena-san/scribe/thrift_installation/ 
# there may be problems with java and memory: java.io.IOException: error=12, Cannot allocate memory
# reportedly the fix is to increase swap. Or disable java:
> ./configure --prefix '/home/lena-san/scribe/thrift_installation' PY_PREFIX=/home/lena-san/scribe/thrift_installation PERL_PREFIX=/home/lena-san/scribe/thrift_installation/ --disable-gen-java --without-java
> make 
# make breaks if java is disabled. Commenting out the "namespace java" lines in several files helps (make complains about each file in turn):
> vim /home/lena-san/scribe/thrift/test/DebugProtoTest.thrift
> make install
> cd -
> cd thrift/contrib/fb303
> ./bootstrap.sh # or make clean
> ./configure --prefix '/home/lena-san/scribe/thrift_installation' PY_PREFIX=/home/lena-san/scribe/thrift_installation PERL_PREFIX=/home/lena-san/scribe/thrift_installation/ --with-thriftpath='/home/lena-san/scribe/thrift_installation' 
# or, if java has to be disabled:
> ./configure --prefix '/home/lena-san/scribe/thrift_installation' PY_PREFIX=/home/lena-san/scribe/thrift_installation PERL_PREFIX=/home/lena-san/scribe/thrift_installation/ --with-thriftpath='/home/lena-san/scribe/thrift_installation' --disable-gen-java --without-java
# you may need to remove namespace java:
> vim /home/lena-san/scribe/thrift/contrib/fb303/if/fb303.thrift
> make 
> make install
> cd -

###scribe###
> cd scribe
> ./bootstrap.sh # or make clean
> ./configure --prefix '/home/lena-san/scribe/scribe_installation' PY_PREFIX=/home/lena-san/scribe/scribe_installation --with-thriftpath=/home/lena-san/scribe/thrift_installation --with-fb303path=/home/lena-san/scribe/thrift_installation
> make 
# you may need to remove namespace java
# (symptom: [FAILURE:/home/lena-san/scribe/scribe/if/bucketupdater.thrift:21] No generator named 'java' could be found!):
> vim /home/lena-san/scribe/scribe/if/bucketupdater.thrift
> make install
> cd -
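Commenting out the `namespace java` lines by hand in each file that make complains about can be tedious. A small script could do it instead; this is a sketch, `comment_out_namespace` is a hypothetical helper, and it assumes each namespace declaration sits on its own line.

```python
import sys


def comment_out_namespace(idl_text, language="java"):
    """Prefix every 'namespace <language> ...' line with a // comment marker."""
    result = []
    for line in idl_text.splitlines(keepends=True):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "namespace" and parts[1] == language:
            result.append("// " + line)
        else:
            result.append(line)  # already-commented lines fall through untouched
    return "".join(result)


if __name__ == "__main__":
    # Usage: python comment_java.py file1.thrift file2.thrift ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            original = handle.read()
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(comment_out_namespace(original))
```

Running it twice on the same file is harmless, because a line that already starts with `//` no longer matches.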

# copy to another machine, install the required libraries (boost, libevent, etc., per the list above)
# run:
# server
> mkdir /tmp/scribetest
> export LD_LIBRARY_PATH='/home/lena/scribe/thrift_installation/lib'
> ./scribe_installation/bin/scribed scribe_examples/example1.conf
# client
> export PYTHONPATH='/home/lena/scribe/scribe_installation/lib/python2.6/site-packages:/home/lena/scribe/thrift_installation/lib/python2.6/site-packages'
> echo "hello!" |./scribe_examples/scribe_cat test

Building .deb packages

###thrift
> mkdir -p package_build/thrift
> cd package_build/thrift
> wget http://www.sai.msu.su/apache//incubator/thrift/0.5.0-incubating/thrift-0.5.0.tar.gz
> tar -xzf thrift-0.5.0.tar.gz
> cd thrift-0.5.0
# commented out namespace java everywhere
> dh_make --single --email lena-san@yandex-team.ru -f ../thrift-0.5.0.tar.gz
# wrote debian/rules
> dch -i -D unstable "first build"    ## ???
> dpkg-buildpackage -rfakeroot
>ls ../*.deb

###fb303
>cd contrib
>cp -r fb303 fb303-0.5.0
>cd fb303-0.5.0
>./bootstrap.sh
>dh_make --single --email lena-san@yandex-team.ru --createorig
# wrote debian/rules
>dch -i -D unstable "first build"   ## ???
>dpkg-buildpackage -rfakeroot

###scribe-1, failed attempt
mkdir scribe 
cd scribe
wget --no-check-certificate 'https://github.com/downloads/facebook/scribe/scribe-2.2.tar.gz'
tar -xzf scribe-2.2.tar.gz
mv scribe scribe-2.2
cd scribe-2.2
./bootstrap.sh
dh_make --single --email lena-san@yandex-team.ru --createorig
# wrote debian/rules
dpkg-buildpackage -rfakeroot
FAIL: scribe_server.h:45: error: conflicting return type specified for ‘virtual scribe::thrift::ResultCode scribeHandler::Log(const std::vector<scribe::thrift::LogEntry, std::allocator<scribe::thrift::LogEntry> >&)’

###scribe
mkdir scribe 
cd scribe
git clone https://github.com/facebook/scribe.git
mv scribe scribe-2.2.1
cd scribe-2.2.1
# commented out namespace java everywhere
./bootstrap.sh
vim if/scribe.thrift  # confirmed that namespace perl is present
dh_make --single --email lena-san@yandex-team.ru --createorig
# wrote debian/rules
dpkg-buildpackage -rfakeroot

Log::Dispatch::Scribe packaging

sudo apt-get install libtest-tester-perl
cpan2deb Test::Timer

cpan2deb Log::Dispatch::Scribe  #FAIL
wget 'http://search.cpan.org/CPAN/authors/id/J/JJ/JJSCHUTZ/Log-Dispatch-Scribe-0.05.tar.gz'
tar -xzf Log-Dispatch-Scribe-0.05.tar.gz
mv Log-Dispatch-Scribe-0.05 liblog-dispatch-scribe-perl-0.05
cd liblog-dispatch-scribe-perl-0.05
dh_make --single --email lena-san@yandex-team.ru -f ../Log-Dispatch-Scribe-0.05.tar.gz
# write debian/control: copy it from the one auto-generated by cpan2deb and fix the dependencies of the form 'perl/'
dpkg-buildpackage -rfakeroot
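The rename in the steps above follows the usual Debian naming pattern for Perl modules (lib, the lowercased distribution name, then -perl). A small helper can derive the name; this is a sketch, and both function names are hypothetical.

```python
def debian_perl_package(dist_name):
    """Map a CPAN distribution or module name to a Debian-style package name."""
    return "lib" + dist_name.replace("::", "-").lower() + "-perl"


def debian_source_dir(dist_name, version):
    """Directory name of the form <package>-<version>, as used with dh_make."""
    return f"{debian_perl_package(dist_name)}-{version}"


if __name__ == "__main__":
    print(debian_source_dir("Log-Dispatch-Scribe", "0.05"))
```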