GraphicsMagick OpenMP performance comparison (icc+iomp vs gcc+gomp)

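GraphicsMagick (GM) is an alternative to ImageMagick (IM) for image processing.
Compared with ImageMagick, GraphicsMagick has the advantages of higher performance and better stability, and whatever IM can do, GM can do as well.
IM's biggest problem is that its code changes too much and is not stable enough. GM is far more stable by comparison, and it is also less bloated than IM.
Flickr dropped ImageMagick in favor of GraphicsMagick from 2004 onward, which arguably makes it GM's best success story.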

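One highlight of GraphicsMagick's performance work is its OpenMP support; with OpenMP enabled, throughput improves several times over.
IM can support OpenMP too, but even so it is still much slower than GM.

ImageMagick also cannot be built with OpenMP support using icc, whereas GraphicsMagick can.

To see how much OpenMP affects performance, and how large the gap between icc and gcc is, I ran the following simple test: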

1. Test environment
* CentOS 5.4
* GCC v4.1.2-46.el5_4.1
* PowerEdge R710(Intel(R) Xeon(TM) CPU 3.00GHz *2)

2. Build script
build_icc() {
    # Intel compiler build: -openmp enables OpenMP, libiomp5 is linked via LIBS
    OPENMP='-openmp'
    CC='icc' \
    CXX='icpc' \
    LD='xild' \
    CFLAGS="-std=gnu99 $OPENMP -O3 -ip -restrict -xSSE3 -axSSE3,SSSE3,SSE4.1,SSE4.2" \
    CXXFLAGS=" $OPENMP -O3 -ip -restrict -xSSE3 -axSSE3,SSSE3,SSE4.1,SSE4.2" \
    CPPFLAGS='-I/opt/local/include' \
    LDFLAGS=' -L/opt/local/lib -L/usr/lib64 ' \
    LIBS='-liomp5 -ltcmalloc_minimal ' \
    ./configure --prefix=/opt/GraphicsMagick \
        --disable-static \
        --enable-openmp \
        --enable-shared
}
build_gcc() {
    # GCC build: -fopenmp enables OpenMP through libgomp
    OPENMP='-fopenmp'
    CFLAGS="$OPENMP -O3 -msse3 -mssse3" \
    CXXFLAGS="$OPENMP -O3 -msse3 -mssse3" \
    CPPFLAGS='-I/opt/local/include' \
    LDFLAGS=' -L/opt/local/lib -L/usr/lib64 ' \
    ./configure --prefix=/opt/GraphicsMagick \
        --disable-static \
        --enable-openmp \
        --enable-shared
}
make distclean
# Uncomment one of the following to choose the compiler:
#build_icc
#build_gcc

build_gcc compiles with gcc and uses GNU's OpenMP library, libgomp;
build_icc compiles with icc and links against icc's more efficient OpenMP library, iomp5.
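To confirm which OpenMP runtime a finished build actually picked up, the output of ldd can be inspected. The helper below is a minimal sketch: the function name openmp_runtime is made up for illustration, and it only looks for the library names libgomp and libiomp5 in whatever text it receives on stdin.

```shell
#!/bin/sh
# openmp_runtime (hypothetical helper): read `ldd` output on stdin and
# print "gomp", "iomp5", or "none" depending on which runtime is listed.
openmp_runtime() {
    awk '
        /libiomp5/ { found = "iomp5" }
        /libgomp/  { found = "gomp" }
        END {
            if (found == "") found = "none"
            print found
        }
    '
}

# Usage on a real build (path taken from --prefix in the build script):
#   ldd /opt/GraphicsMagick/bin/gm | openmp_runtime
```

Because the build uses --enable-shared, the runtime is pulled in through the GraphicsMagick shared library, but ldd lists indirect dependencies as well, so checking the gm binary should be enough.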

3. Test script
# cat bench.sh
for threads in 1 2 3 4
do
    env OMP_NUM_THREADS=$threads /opt/GraphicsMagick/bin/gm benchmark -duration 10 convert \
        -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:
done

The script sets the OMP_NUM_THREADS environment variable to run with 1 to 4 threads in turn (the R710 has 8 cores in total, but I only tested up to 4).

4. Test results
------------------gcc(gomp+O3)--------------
Results: 5 iter 11.05s user 11.07s total 0.452 iter/s (0.452 iter/s cpu)
Results: 10 iter 22.14s user 11.07s total 0.903 iter/s (0.452 iter/s cpu)
Results: 12 iter 31.26s user 10.42s total 1.152 iter/s (0.384 iter/s cpu)
Results: 16 iter 41.50s user 10.38s total 1.541 iter/s (0.386 iter/s cpu)

------------------icc(iomp5+O3)--------------
Results: 16 iter 10.39s user 10.39s total 1.540 iter/s (1.540 iter/s cpu)
Results: 27 iter 20.53s user 10.35s total 2.609 iter/s (1.315 iter/s cpu)
Results: 40 iter 30.37s user 10.23s total 3.910 iter/s (1.317 iter/s cpu)
Results: 60 iter 40.41s user 10.12s total 5.929 iter/s (1.485 iter/s cpu)

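In these results, iter/s is the number of iterations completed per second of elapsed time (iterations divided by the "total" column), and the figure in parentheses is iterations per second of CPU time (iterations divided by the "user" column). The four rows in each group correspond to 1, 2, 3 and 4 threads. Higher is better.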
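For comparing runs it helps to pull just the iter/s column out of the benchmark output. This is a small sketch, not part of the original test: the function name iter_per_sec is invented here, and it assumes the "Results:" line layout shown above, where iter/s is the 8th whitespace-separated field.

```shell
#!/bin/sh
# iter_per_sec (hypothetical helper): read `gm benchmark` output on stdin and
# print "<run> <iter/s>" for every Results line. The run number is just the
# position of the line; it equals the thread count only because bench.sh
# loops over 1 2 3 4 in that order.
iter_per_sec() {
    awk '/^Results:/ { n++; print n, $8 }'
}

# Sample input copied from the gcc results
iter_per_sec <<'EOF'
Results: 5 iter 11.05s user 11.07s total 0.452 iter/s (0.452 iter/s cpu)
Results: 10 iter 22.14s user 11.07s total 0.903 iter/s (0.452 iter/s cpu)
EOF
```

On a live run the same filter can be attached directly: sh bench.sh | iter_per_sec.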

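The numbers are somewhat noisy and fluctuate with the current load, but the effect of OpenMP is obvious: with 4 threads, throughput is more than 3 times the single-thread figure (1.541 vs 0.452 iter/s for gcc, 5.929 vs 1.540 iter/s for icc). The icc build also runs more than 3 times as fast as the gcc build (1.540 vs 0.452 iter/s on one thread, 5.929 vs 1.541 iter/s on four)!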
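The ratios behind these claims can be recomputed from the iter/s figures in the result tables. The two helpers below are illustrative only (the names speedup and efficiency are not from the original post); they use awk for the floating-point arithmetic.

```shell
#!/bin/sh
# speedup BASE VALUE: print VALUE / BASE rounded to two decimals
speedup() {
    awk -v base="$1" -v val="$2" 'BEGIN { printf "%.2f\n", val / base }'
}

# efficiency BASE VALUE THREADS: speedup divided by the thread count, in percent
efficiency() {
    awk -v base="$1" -v val="$2" -v n="$3" \
        'BEGIN { printf "%.0f%%\n", 100 * val / (base * n) }'
}

speedup 0.452 1.541        # gcc: 4 threads vs 1 thread
speedup 1.540 5.929        # icc: 4 threads vs 1 thread
speedup 0.452 1.540        # icc vs gcc, 1 thread each
efficiency 1.540 5.929 4   # icc parallel efficiency at 4 threads
```

Since each run lasts only about 10 seconds, these ratios are indicative rather than precise.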

Building the MySQL Percona branch with ICC optimizations (Compile mysql-percona v5.0.87)

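Production runs MySQL with Google's mysql-patch v4 applied, and it has been working well. The patch set provided by Percona is also good;
in particular it adds a lot of useful information that helps when analyzing performance bottlenecks at run time. Google's v3/v4 patches offer somewhat less in that respect.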

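After the latest 5.0.87-b20 came out, I decided to replace the slave with it, hoping to make run-time statistics easier to analyze.
As with Google v4, I did an optimized build with the new icc v11.1.x.
The steps are as follows: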

1. Build libunwind
CC=icc \
CXX=icpc \
LD=xild \
AR=xiar \
CFLAGS='-O3 -ipo -no-prec-div -xSSE3 -axSSE4.2,SSE4.1,SSE3,SSE2' \
CXXFLAGS='-O3 -ipo -no-prec-div -xSSE3 -axSSE4.2,SSE4.1,SSE3,SSE2' \
./configure --prefix=/opt/local
make install

2. Build google-perftools-1.4
CC=icc \
CXX=icpc \
LD=xild \
AR=xiar \
CPPFLAGS=" -I/opt/local/include " \
CXXFLAGS='-xSSE3 -axSSE4.2,SSE4.1,SSE3,SSE2 -O3 -ip -no-prec-div ' \
LDFLAGS=' -L/opt/local/lib ' \
./configure --prefix=/opt/local
make install

3. Build mysql-percona 5.0.87b20

#!/bin/bash
ICC_FLAGS='-O3 -no-prec-div -ip -unroll2 -restrict -fno-implicit-templates -fno-exceptions -fno-rtti -static-intel -static-libgcc -xSSE3 -axSSE2,SSE3,SSE4.1,SSE4.2'
MYSQL_ROOT=/opt/mysql-percona
BUILD_VERSION='ICC v11.1.059/Percona v5.0.87-b20'
ICC=icc
ICPC=icpc
build_client() {
    # Client-only build (--without-server), statically linked client tools
    CFLAGS="$ICC_FLAGS" \
    CXXFLAGS="$ICC_FLAGS" \
    CPPFLAGS='-I/opt/local/include' \
    LDFLAGS='-L/opt/local/lib' \
    LD=xild \
    AR=xiar \
    CC=$ICC \
    CXX=$ICPC \
    ./configure \
        --prefix=$MYSQL_ROOT \
        --with-server-suffix='-cv-mysql' \
        --with-comment="$BUILD_VERSION" \
        --with-collation=utf8_general_ci \
        --with-charset=utf8 \
        --with-extra-charsets=complex \
        --with-client-ldflags='-all-static' \
        --enable-thread-safe-client \
        --enable-assembler \
        --with-fast-mutexes \
        --with-innodb \
        --with-pic \
        --enable-local-infile \
        --without-server \
        --without-ndbcluster \
        --without-embedded-server \
        --without-example-storage-engine \
        --without-archive-storage-engine \
        --without-blackhole-storage-engine \
        --without-csv-storage-engine \
        --without-federated-storage-engine \
        --with-zlib-dir=bundled \
        --without-debug \
        --with-readline
    make -j8
    make install
}
build_server() {
    # Server build: static mysqld linked against tcmalloc_minimal
    CFLAGS="$ICC_FLAGS" \
    CXXFLAGS="$ICC_FLAGS" \
    CPPFLAGS='-I/opt/local/include' \
    LDFLAGS='-L/opt/local/lib' \
    CC=$ICC \
    CXX=$ICPC \
    LD=xild \
    AR=xiar \
    ./configure \
        --disable-shared \
        --prefix=/opt/mysql-percona \
        --with-server-suffix='-cv-mysql' \
        --with-comment="$BUILD_VERSION" \
        --with-collation=utf8_general_ci \
        --with-charset=utf8 \
        --with-extra-charsets=complex \
        --with-mysqld-ldflags='-all-static -ltcmalloc_minimal' \
        --enable-thread-safe-client \
        --enable-assembler \
        --with-innodb \
        --with-pic \
        --with-fast-mutexes \
        --enable-local-infile \
        --without-bench \
        --without-extra-tools \
        --without-docs \
        --without-man \
        --without-ndbcluster \
        --without-embedded-server \
        --without-example-storage-engine \
        --without-archive-storage-engine \
        --without-blackhole-storage-engine \
        --without-csv-storage-engine \
        --without-federated-storage-engine \
        --with-zlib-dir=bundled \
        --without-debug \
        --with-readline
    make -j8
    # Only the stripped mysqld binary is installed from the server build
    install -s -D sql/mysqld "$MYSQL_ROOT/libexec/mysqld"
}
make clean distclean
build_client
make clean distclean
build_server

The client and the server are built separately; the server is static.
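Whether the installed mysqld really came out static can be checked from the description printed by the file utility. A minimal sketch follows; the helper name is_static is hypothetical, and it relies only on the phrase "statically linked" that file reports for static ELF binaries.

```shell
#!/bin/sh
# is_static (hypothetical helper): succeed if the `file` description read
# from stdin says the binary is statically linked.
is_static() {
    grep -q 'statically linked'
}

# Usage against the binary installed by build_server:
#   file /opt/mysql-percona/libexec/mysqld | is_static && echo "static"
```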

Tuning the MySQL configuration
cat /etc/my.cnf

[mysqld]

# generic configuration options
port = 3306
socket = /tmp/mysql.sock
datadir = /db/data

back_log = 50
max_connections = 500
max_connect_errors = 100
table_cache = 2048
max_allowed_packet = 16M
binlog_cache_size = 1M
max_heap_table_size = 64M
sort_buffer_size = 8M
join_buffer_size = 8M
thread_cache_size = 8
thread_concurrency = 8
query_cache_size = 64M
query_cache_limit = 2M
ft_min_word_len = 4
default_table_type = InnoDB
thread_stack = 192K
transaction_isolation = REPEATABLE-READ
tmp_table_size = 64M
log-bin=mysql-bin
long_query_time = 3
log_long_format

replicate-same-server-id
server-id = 100
binlog-ignore-db=mysql
binlog-ignore-db=test
key_buffer_size = 32M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 128M
myisam_max_sort_file_size = 10G
myisam_max_extra_sort_file_size = 10G
myisam_repair_threads = 1
myisam_recover

innodb_additional_mem_pool_size = 16M
innodb_buffer_pool_size = 2G
innodb_data_file_path = ibdata1:5G;idbdata2:10G;idbdata3:30G;idbdata4:40G
innodb_data_home_dir = /db/tb
innodb_file_io_threads = 4
innodb_thread_concurrency = 0
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_log_group_home_dir= /db/tlog
innodb_max_dirty_pages_pct = 80
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout = 120
auto_increment_increment=2
auto_increment_offset=1
expire_logs_days=3
allow_view_trigger_sp_subquery
#google patch
innodb_adaptive_checkpoint
innodb_adaptive_checkpoint=1
innodb_write_io_threads=4
innodb_io_capacity=200
#percona only
rpl_transaction_enabled=1
rpl_mirror_binlog_enabled
sync-mirror-binlog
#slow log
#sql_log_filename=/db/slowlog/s2.log
#log_slow_queries=/db/slowlog/s2.log
#log_queries_not_using_indexes

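A few important parameters:
innodb_adaptive_checkpoint=1
    must be enabled
innodb_max_dirty_pages_pct
    needs fine-tuning based on run-time information
innodb_io_capacity=200 or 300
    the value is the number of striped sets in the RAID array * 100
    e.g. for RAID 10, 2*2 gives 200, and 2*3 can be set to 300

rpl_transaction_enabled=1
rpl_mirror_binlog_enabled
sync-mirror-binlog
    are replication-related and require applying a patch by hand:
    mirror-binlog.patch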
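The innodb_io_capacity rule of thumb above is simple arithmetic. As a sketch (the function name io_capacity is made up here, and the factor of 100 per striped set is taken directly from the note above, not from any MySQL documentation):

```shell
#!/bin/sh
# io_capacity STRIPES (hypothetical helper): apply the rule of thumb from the
# notes, 100 per striped set in the array.
io_capacity() {
    echo $(( $1 * 100 ))
}

io_capacity 2   # RAID 10 laid out as 2*2
io_capacity 3   # RAID 10 laid out as 2*3
```

Treat the result as a starting point and adjust it together with innodb_max_dirty_pages_pct while watching the running server.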

update: (current patch list; if you apply the patches yourself, follow this order):

show_patches.patch
microslow_innodb.patch
profiling_slow.patch
userstatv2.patch
microsec_process.patch
innodb_io_patches.patch
mysqld_safe_syslog.patch
innodb_locks_held.patch
innodb_show_bp.patch
innodb_check_fragmentation.patch
innodb_io_pattern.patch
innodb_fsync_source.patch
innodb_show_hashed_memory.patch
innodb_dict_size_limit.patch
innodb_extra_rseg.patch
innodb_thread_concurrency_timer_based.patch
innodb_use_sys_malloc.patch
innodb_recovery_patches.patch
innodb_misc_patch.patch
innodb_split_buf_pool_mutex.patch
innodb_rw_lock.patch
mysql-test.patch
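Since the order matters, the list can be saved to a file and applied in a loop that stops at the first failure. This is only a sketch under assumptions: the function name apply_series, the series-file layout (one patch name per line), the PATCH_CMD override and the -p1 strip level are all illustrative and may need adjusting for the actual patch set.

```shell
#!/bin/sh
# apply_series SERIES_FILE PATCH_DIR (hypothetical helper): apply every patch
# named in SERIES_FILE, in order, from PATCH_DIR; stop at the first failure.
# PATCH_CMD defaults to `patch`; -p1 is an assumed strip level.
apply_series() {
    series=$1
    dir=$2
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # skip blank lines
        echo "applying $name"
        ${PATCH_CMD:-patch} -p1 < "$dir/$name" || {
            echo "failed: $name" >&2
            return 1
        }
    done < "$series"
}

# Usage from the top of the MySQL source tree (paths are examples):
#   apply_series ../patches/series ../patches
```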

Compile gearmand with icc (ICC v11.x)

Already installed on the system:
1. tcmalloc (google-perftools-1.4)
2. libmemcached v0.35 (v0.30+)

Build gearmand-0.10:
tar zxvf gearmand-0.10.tar.gz
./compile-gearman.sh

=========compile-gearman.sh=====
make distclean
CC=icc \
CXX=icpc \
CFLAGS=" -O3 -ip -std=gnu99 -no-prec-div -xSSE3 -axSSE4.2,SSE4.1,SSE3 -static-intel -no-gcc" \
CPPFLAGS='-I/opt/local/include -Wno-error' \
LDFLAGS='-L/opt/local/lib' \
./configure \
    --prefix=/opt/gearmand \
    --enable-tcmalloc \
    --disable-libsqlite3 \
    --disable-libdrizzle \
    --with-libevent-prefix=/opt/local \
    --with-libmemcached-prefix=/opt/local
make install
=======end===

note:
1. Turn off the predefined gcc macros (-no-gcc)
2. Turn on gnu99 support (-std=gnu99)
3. Building with -ipo failed

Compile php+php-fpm with ICC v11.1

I planned to upgrade production PHP to 5.2.11, so I rebuilt PHP-5.2.11 + PHP-FPM-0.6 with ICC.
The build failed with the following error:

fpm_atomic.h(116): catastrophic error: #error directive: unsupported architecture. please write a patch and send it in
#error unsupported architecture. please write a patch and send it in

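At first I thought it was caused by using the standalone install, but I tried the integrated install and hit the same problem.
So I looked at line 116 of fpm_atomic.h: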
#else

#error unsupported architecture. please write a patch and send it in

#endif

It turned out that none of the macro branches matched the current architecture: on x86_64, icc defines __x86_64__ rather than __amd64__.
I changed the check to:
#elif ( __amd64__ || __amd64 || __x86_64__ )
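One way to see which of these architecture macros a compiler predefines is to dump its macro table and filter it. The sketch below assumes the compiler accepts the gcc-style -dM -E options (gcc does; icc on Linux is expected to, but that is an assumption here), and the helper name arch_macros is invented for illustration.

```shell
#!/bin/sh
# arch_macros (hypothetical helper): read a predefined-macro dump on stdin
# and print only the x86-64 macro names that fpm_atomic.h could test for.
arch_macros() {
    grep -E '^#define (__amd64__|__amd64|__x86_64__|__x86_64) ' \
        | awk '{ print $2 }' \
        | LC_ALL=C sort
}

# Usage:
#   icc -dM -E - < /dev/null | arch_macros
#   gcc -dM -E - < /dev/null | arch_macros
```

Comparing the two outputs shows directly which names are missing under icc and therefore which ones the #elif has to test.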

Patch (php-fpm-icc.patch) :
=====================================
@@ -37,7 +37,7 @@
return res;
}
-#elif ( __amd64__ || __amd64 )
+#elif ( __amd64__ || __amd64 || __x86_64__ )
typedef int64_t atomic_int_t;
typedef uint64_t atomic_uint_t;

======================================
I submitted the patch to the php-fpm mailing list.

Appendix: updated build procedure for PHP-5.2.11 + php-fpm 0.6 on x86_64:

Prerequisites:
* icc v11.1.059 EM64T
* MySQL: mysql-percona branch, ICC-optimized
* google-perftools-1.4 (I use tcmalloc_minimal to optimize PHP)
* php-5.2.11.tar.bz2
* php-fpm-0.6~5.2.11.tar.gz

1. Generate the php-fpm patch

tar zxvf php-fpm-0.6~5.2.11.tar.gz
cd php-fpm-0.6-5.2.11
cd fpm
patch -p0 < php-fpm-icc.patch    # this fixes fpm_atomic.h
cd ../..
./php-fpm-0.6-5.2.11/generate-fpm-patch
This generates a file named fpm.patch

2. Prepare the PHP source
tar jxvf php-5.2.11.tar.bz2
cd php-5.2.11
patch -p1 < ../fpm.patch
./buildconf --force

3. Build:
./compile-php.sh
=======compile-php.sh============
#!/bin/bash
make distclean
VERSION=5.2.11
DEST=/opt/php-$VERSION
CFLAGS=' -O3 -ip -unroll2 -no-prec-div -fp-model source -restrict -static-intel -xSSE2,SSE3,SSE4.1,SSE4.2 -axSSE2,SSE3,SSE4.1,SSE4.2 ' \
CXXFLAGS=' -O3 -ip -unroll2 -no-prec-div -fp-model source -restrict -static-intel -xSSE2,SSE3,SSE4.1,SSE4.2 -axSSE2,SSE3,SSE4.1,SSE4.2 -fno-implicit-templates -fno-exceptions -fno-rtti' \
LDFLAGS=' -ltcmalloc_minimal -L/opt/local/lib' \
CC=icc \
CXX=icpc \
LD=xild \
AR=xiar \
./configure \
    --prefix=$DEST \
    --with-libdir=lib64 \
    --with-fpm \
    --enable-force-cgi-redirect \
    --enable-fastcgi \
    --enable-mbstring \
    --enable-mbregex \
    --enable-pcntl \
    --enable-exif \
    --enable-sockets \
    --enable-sysvsem \
    --enable-sysvshm \
    --enable-inline-optimization \
    --enable-zend-multibyte \
    --disable-ipv6 \
    --disable-debug \
    --with-mysql=/opt/mysql \
    --with-mysqli \
    --with-config-file-path=$DEST/etc \
    --with-config-file-scan-dir=$DEST/etc/php.d \
    --with-zlib \
    --with-curl \
    --with-gettext \
    --with-jpeg-dir=/usr \
    --with-png-dir=/usr \
    --with-freetype-dir=/usr \
    --with-iconv \
    --with-pcre-regex
make install
# Install a default php.ini only if none exists yet
[ -e $DEST/etc/php.ini ] || cp -u php.ini-recommended $DEST/etc/php.ini

I habitually install all optimized libs under /opt/local, so change the paths to match your own setup (usually /usr/local).

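Notes:
1. The script above installs php-fpm the integrated way. It is less work and is also the officially recommended approach.
2. To use mysqlnd, change the options to:
--with-mysql=mysqlnd \
--with-mysqli=mysqlnd \

That said, mysqlnd only exists in the official 5.3, so apart from my own custom PHP source I doubt anyone uses mysqlnd on 5.2.x.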