Openstack Kolla

List of Hosts:

10.240.169.3: "MAAS, docker, openstack-kolla, kolla" are all installed on this host

Local Docker Registry

To install multi-node Docker OpenStack, we need a local registry service. Nexus3 is an easy-to-use registry server with a web GUI.
Install it via Docker:
create ./nexus3/data/docker-compose.yml

=====================================
nexus:
  image: sonatype/nexus3:latest
  ports:
    - "8081:8081"
    - "5000:5000"
  volumes:
    - ./data:/nexus-data
=======================================

and then run "docker-compose up -d" to create the container. You may need to "pip install docker-compose" first.

Launch a web browser to http://10.240.169.3:8081 (the Docker host). The default account is admin/admin123. Then create a new repository of type docker (hosted), set its connector port to 5000, and enable the Docker V1 API.

Verify that the docker hosts can log in to this private registry: docker login -p admin123 -u admin 10.240.169.3:5000

To build the Kolla images and push them to the local registry

on 10.240.169.3

pip install kolla
kolla-build --base ubuntu --type source --registry 10.240.169.3:5000 --push

This builds all available Kolla images (pulling sources from the internet) and pushes them to the local registry.
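
To sanity-check that the images actually landed in the local registry, the Docker registry v2 API can be queried (a quick sketch; the image name below is just an example, and -u admin:admin123 may be needed depending on the repo's security settings):

curl -s http://10.240.169.3:5000/v2/_catalog
curl -s http://10.240.169.3:5000/v2/kolla/ubuntu-source-swift-base/tags/list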

Prepare hosts for ceph osd

Partition and label the disks on each host:
parted /dev/sdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sdc -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sdd -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sde -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sdf -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sdg -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP 1 -1
parted /dev/sdh -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_1 1 -1
parted /dev/sdi -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_2 1 -1
parted /dev/sdj -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_1_J 1 -1
parted /dev/sdk -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_2_J 1 -1
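
To double-check the labels afterwards (a small sketch; the disk list simply mirrors the commands above):

for d in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk; do
  parted /dev/$d -s print | grep KOLLA_CEPH_OSD_BOOTSTRAP || echo "/dev/$d has no bootstrap label"
done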

Each host needs the following installed:

apt install python-pip -y
pip install -U docker-py

apt-get install bridge-utils debootstrap ifenslave ifenslave-2.6 lsof lvm2 ntp ntpdate openssh-server sudo tcpdump python-dev vlan -y

There is no need to install docker.io manually, as a kolla-ansible bootstrap command does this job: kolla-ansible -i multinode bootstrap-servers

If a deployment fails, copy /usr/local/share/kolla-ansible/tools/cleanup-containers to each host and run it to clean up the containers, then redo the deployment.
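
For example, something along these lines pushes and runs the cleanup script on every host (a sketch; hosts.txt is a hypothetical file listing the deployment hosts):

for h in $(cat hosts.txt); do
  scp /usr/local/share/kolla-ansible/tools/cleanup-containers $h:/tmp/
  ssh $h 'bash /tmp/cleanup-containers'
done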

"kolla-ansible -i multinode destroy" removes all deployed containers on all nodes, but the Ceph partitions are kept. To erase the partitioned disks, run the following on each host:

umount /dev/sdb1
umount /dev/sdc1
umount /dev/sdd1
umount /dev/sde1
umount /dev/sdf1
umount /dev/sdg1
umount /dev/sdh1
umount /dev/sdi1
dd if=/dev/zero of=/dev/sdb bs=512 count=1
dd if=/dev/zero of=/dev/sdc bs=512 count=1
dd if=/dev/zero of=/dev/sdd bs=512 count=1
dd if=/dev/zero of=/dev/sde bs=512 count=1
dd if=/dev/zero of=/dev/sdf bs=512 count=1
dd if=/dev/zero of=/dev/sdg bs=512 count=1
dd if=/dev/zero of=/dev/sdh bs=512 count=1
dd if=/dev/zero of=/dev/sdi bs=512 count=1
dd if=/dev/zero of=/dev/sdj bs=512 count=1
dd if=/dev/zero of=/dev/sdk bs=512 count=1
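
The same wipe can be written as a loop (equivalent to the commands above; the extra umount attempts on the journal disks are harmless and silenced):

for d in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk; do
  umount /dev/${d}1 2>/dev/null
  dd if=/dev/zero of=/dev/$d bs=512 count=1
done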

Swift

Here's a guide to calculating the number to pass to "swift-ring-builder create". The first step is to determine the number of partitions that will be in the ring. We recommend a minimum of 100 partitions per drive to ensure even distribution across the drives. A good starting point is to figure out the maximum number of drives the cluster will contain, multiply by 100, and then round up to the nearest power of two.

For example, imagine we are building a cluster that will have no more than 5,000 drives. That would mean that we would have a total number of 500,000 partitions, which is pretty close to 2^19, rounded up.
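
As a quick sanity check of that arithmetic (a sketch; awk is only used for the log2 rounding):

DRIVES=5000
PARTS=$((DRIVES * 100))
PART_POWER=$(awk -v p="$PARTS" 'BEGIN { e = int(log(p)/log(2)); if (2^e < p) e++; print e }')
echo "part_power = $PART_POWER"   # 500,000 partitions rounds up to 2^19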

It is also a good idea to keep the number of partitions small (relatively). The more partitions there are, the more work that has to be done by the replicators and other backend jobs and the more memory the rings consume in process. The goal is to find a good balance between small rings and maximum cluster size.

The next step is to determine the number of replicas of the data to store. Currently 3 is recommended (as this is the only value that has been tested). The higher the number, the more storage is used, but the less likely you are to lose data.

It is also important to determine how many zones the cluster should have. It is recommended to start with a minimum of 5 zones. You can start with fewer, but our testing has shown that having at least five zones is optimal when failures occur. We also recommend trying to configure the zones at as high a level as possible to create as much isolation as possible. Some example things to take into consideration can include physical location, power availability, and network connectivity. For example, in a small cluster you might decide to split the zones up by cabinet, with each cabinet having its own power and network connectivity. The zone concept is very abstract, so feel free to use it in whatever way best isolates your data from failure. Each zone exists in a region.

A region is also an abstract concept that may be used to distinguish between geographically separated areas, and can also be used within the same datacenter. Regions and zones are referenced by a positive integer.

Run the following script on any host first to create the Swift ring files for kolla to use.

########################################################################################

export KOLLA_INTERNAL_ADDRESS=10.240.101.10 # not really needed for multinode
export KOLLA_SWIFT_BASE_IMAGE="10.240.100.4:5000/kolla/ubuntu-source-swift-base:4.0.1"

mkdir -p /etc/kolla/config/swift

# Object ring
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/object.builder create 8 3 1

for i in {1..8}; do
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/object.builder add r1z1-10.240.103.1${i}:6000/d0 1;
done

# Account ring
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/account.builder create 8 3 1

for i in {1..8}; do
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/account.builder add r1z1-10.240.103.1${i}:6001/d0 1;
done

# Container ring
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/container.builder create 8 3 1

for i in {1..8}; do
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/container.builder add r1z1-10.240.103.1${i}:6002/d0 1;
done

for ring in object account container; do
docker run \
--rm \
-v /etc/kolla/config/swift/:/etc/kolla/config/swift/ \
$KOLLA_SWIFT_BASE_IMAGE \
swift-ring-builder \
/etc/kolla/config/swift/${ring}.builder rebalance;
done

#######################################################

Then copy everything generated by this script into /etc/kolla/config/swift on the kolla deployment host.
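
If the rings were built on a different machine than the kolla deployment host, something like this copies them over (a sketch; "deployer" is a placeholder hostname):

scp -r /etc/kolla/config/swift/* deployer:/etc/kolla/config/swift/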

Neutron

By default, kolla uses a flat network and only enables the VLAN provider network when Ironic is enabled, so you'll see this in ml2_conf.ini:

[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vxlan
mechanism_drivers = linuxbridge,l2population
extension_drivers = qos,port_security,dns

[ml2_type_vlan]
network_vlan_ranges =

[ml2_type_flat]
flat_networks = physnet1

[ml2_type_vxlan]
vni_ranges = 1:1000
vxlan_group = 239.1.1.1

[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.IptablesFirewallDriver

[linux_bridge]
physical_interface_mappings = physnet1:br_vlan

[vxlan]
l2_population = true
local_ip = 10.240.102.14

Moving physnet1 from flat_networks to network_vlan_ranges enables the VLAN provider feature.
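
A minimal sketch of that change (the 100:200 VLAN range is only a placeholder):

[ml2_type_vlan]
network_vlan_ranges = physnet1:100:200

[ml2_type_flat]
flat_networks =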

Ironic

Ironic is the Bare Metal service for OpenStack. A few pieces need to be prepared before it is deployed by kolla.

  1. apt-get install qemu-utils
  2. sudo pip install -U "diskimage-builder>=1.1.2"
  3. disk-image-create ironic-agent ubuntu -o ironic-agent (this cannot be done under LXC)
  4. copy the generated ironic-agent.kernel and ironic-agent.initramfs to /etc/kolla/config/ironic on the kolla-ansible host

Enable OVS, and then run kolla-ansible deploy.
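
A minimal globals.yml sketch for that step (option names as in kolla-ansible's defaults; double-check against your version):

enable_ironic: "yes"
neutron_plugin_agent: "openvswitch"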

When deploying Ironic, iscsid is required and may fail with "iscsid container: mkdir /sys/kernel/config: operation not permitted". The fix is to run "modprobe configfs" on each host.

iscsid may also fail to start; removing open-iscsi on all hosts fixes this.
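
Both fixes can be pushed to all hosts in one go (a sketch; hosts.txt is the same hypothetical host list as before):

for h in $(cat hosts.txt); do
  ssh $h 'modprobe configfs; echo configfs >> /etc/modules; apt-get remove -y open-iscsi'
done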

Magnum

pip install python-magnumclient (version 2.6.0).

The (sourced) user who wants to use Magnum needs to have a role in Heat.
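
For example (a sketch; the user and project names are placeholders, heat_stack_owner is the usual Heat role):

openstack role add --user demo --project demo heat_stack_owner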

Make sure magnum.conf has the following values; otherwise Barbican will complain about not being able to create certs.

[certificates]
cert_manager_type = barbican
cert_manager_type = x509keypair

Current COEs and their supported distros; the image has to match this table, otherwise Magnum will complain that the VM type is not supported.

COE          distro
Kubernetes   Fedora Atomic
Kubernetes   CoreOS
Swarm        Fedora Atomic
Mesos        Ubuntu

Example of creating a Docker Swarm cluster:

wget https://fedorapeople.org/groups/magnum/fedora-atomic-newton.qcow2
openstack image create \
--disk-format=qcow2 \
--container-format=bare \
--file=fedora-atomic-newton.qcow2 \
--property os_distro='fedora-atomic' \
fedora-atomic-newton
magnum cluster-template-create swarm-cluster-template \
--image fedora-atomic-newton \
--keypair mykey \
--external-network public \
--dns-nameserver 8.8.8.8 \
--master-flavor m1.small \
--flavor m1.small \
--coe swarm
magnum cluster-create swarm-cluster \
--cluster-template swarm-cluster-template \
--master-count 1 \
--node-count 1

Collectd Influxdb and Grafana

This combination can be a really nice tool for monitoring OpenStack activity.

A few things need to be changed from the default kolla deployment config:

collectd:

FQDNLookup false
LoadPlugin network
LoadPlugin syslog
LoadPlugin cpu
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
Server "10.240.101.11" "25826"

influxdb:

[[collectd]]
enabled = true
bind-address = "10.240.101.11:25826"
database = "collectd"
typesdb = "/usr/share/collectd/types.db"

Be careful: the influxdb config needs the double brackets [[ ]] around collectd. Also, types.db will not be created automatically! Even though influxdb shows UDP traffic arriving from collectd, nothing is stored until you manually copy types.db over from the collectd host/folder. This is critical!

Then Grafana is much simpler: just add influxdb as a datasource and build the graphs in the dashboard. If you want to show interface traffic in bit/s, use derivative with if_octets.
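
For reference, a query along these lines should work from the influx CLI or as a Grafana query (a sketch; the measurement name depends on how the collectd input splits if_octets, so verify with SHOW MEASUREMENTS first):

influx -database collectd -execute \
  'SELECT derivative(mean("value"), 1s) * 8 FROM "interface_rx" WHERE time > now() - 1h GROUP BY time(30s), "host"'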

Rally with Tempest testing benchmark

Create the Tempest verifier; this will automatically download it from the GitHub repo.

rally verify create-verifier --type tempest --name tempest-verifier

Configure this Tempest verifier for the current deployment with the modified options in options.conf:

rally verify configure-verifier --extend extra_options.conf

cat options.conf

[compute]
image_ref = acc51ecc-ee27-4b3a-ae2a-f0b1c1196918
image_ref_alt = acc51ecc-ee27-4b3a-ae2a-f0b1c1196918
flavor_ref = 7a8394f1-056b-41b3-b422-b5195d5a379f
flavor_ref_alt = 7a8394f1-056b-41b3-b422-b5195d5a379f
fixed_network_name = External

Then just run the tests, e.g. "rally verify start --pattern set=compute" to test a specific part of OpenStack.

Mount cdrom along with disk drive

Sometimes we'd like to have a CD-ROM mounted along with a bootable disk to install an OS, instead of booting from an image. In this case we tell OpenStack that volume a will be a bootable cdrom, and volume b will be secondary, kept as the vdb disk. After the OS is installed, we can kill the whole VM and recreate it with volume b only, assigning it as the bootable vda; it will then work as a regular VM.

Here's how to create the VM with a cdrom:

nova boot --flavor m1.small --nic net-id=e3fa6e8f-5ae9-4da6-84ba-e52d85a272bb \
  --block-device id=e513a39b-36a1-49df-a528-0ccdb0f8515b,source=volume,dest=volume,bus=ide,device=/dev/vdb,type=cdrom,bootindex=1 \
  --block-device source=volume,id=87ae535a-984d-4ceb-87e9-e48fa109c81a,dest=volume,device=/dev/vda,bootindex=0 \
  --key-name fuel fuel
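
Once the OS is installed, the second step described above recreates the VM from volume b alone (a sketch; <volume-b-id> is a placeholder for the ID of volume b):

nova boot --flavor m1.small --nic net-id=e3fa6e8f-5ae9-4da6-84ba-e52d85a272bb \
  --block-device source=volume,id=<volume-b-id>,dest=volume,device=/dev/vda,bootindex=0 \
  --key-name fuel fuel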

Create PXE boot image

OpenStack doesn't support instance PXE boot out of the box. To make it work, we need to create our own PXE-bootable image.

Here's how (this only works on a non-container host):

https://kimizhang.wordpress.com/2013/08/26/create-pxe-boot-image-for-openstack/

1. Create a small empty disk file and make a DOS filesystem on it.
dd if=/dev/zero of=pxeboot.img bs=1M count=4
mkdosfs pxeboot.img
2. Make it bootable with syslinux.
losetup /dev/loop0 pxeboot.img
mount /dev/loop0 /mnt
syslinux --install /dev/loop0
3. Install the iPXE kernel and create syslinux.cfg to load it at bootup.
wget http://boot.ipxe.org/ipxe.iso
mount -o loop ipxe.iso /media
cp /media/ipxe.krn /mnt
cat > /mnt/syslinux.cfg <<EOF
DEFAULT ipxe
LABEL ipxe
KERNEL ipxe.krn
EOF
umount /media/
umount /mnt
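
To actually use it, the resulting image still has to be uploaded to Glance (a sketch):

openstack image create --disk-format raw --container-format bare \
  --file pxeboot.img pxeboot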

iSCSI on MDS

A very good blog post on this topic.

Cisco's guide on iSCSI.

The MDS plays an intermediary role in iSCSI: one side connects to the storage over FC, the other side connects to the server through a Gigabit Ethernet port. From the FC side it looks like it is talking to the server's pWWN (actually a fake pWWN assigned by the MDS); from the server side it looks like an iSCSI target IP or IQN.

A big difference between iSCSI and iSLB is that the iscsi initiator command is only used to modify the VSAN and CHAP settings, while the target has to be defined separately with the iscsi virtual-target command. iSLB merges the two into the islb initiator command, and islb virtual-target also provides more detailed access control.

If no iSCSI initiator is configured on the MDS and a server comes to connect, the MDS reacts according to whether dynamic pWWN assignment is enabled on the iSCSI interface. If it is enabled, an address is assigned automatically.

MDS1(config)# show iscsi initiator 
iSCSI Node name is iqn.1998-01.com.vmware:53de1d20-106c-8c14-070d-0025b500010d-612838b7 
    Initiator ip addr (s): 10.150.150.10 
    iSCSI alias name:  
    Auto-created node (iSCSI)
    Node WWN is 21:09:00:0d:ec:54:63:82 (dynamic) 
    Member of vsans: 101
    Number of Virtual n_ports: 1
    Virtual Port WWN is 21:0a:00:0d:ec:54:63:82 (dynamic)
      Interface iSCSI 1/2, Portal group tag: 0x3001 
      VSAN ID 101, FCID 0x010104

iSCSI configuration example:

feature iscsi
iscsi enable module 1
Enable iSCSI on the Gigabit Ethernet ports of module 1; the corresponding iSCSI interfaces are created automatically but stay in the shut state.

int iscsi 1/2
no shut

switchport proxy-initiator is optional; the proxy merges multiple FLOGIs and FCIDs into one.

vsan database
vsan 101 interface iscsi 1/2
Assign the new iSCSI interface to VSAN 101 so that it can later be zoned together with the FC pWWNs in VSAN 101.

iscsi import target fc
Import every pWWN attached over FC as an automatically created iSCSI target.

zoneset name VSAN101 vsan 101
zone name ESXi-JBOD1-D2
member pwwn 21:00:00:1d:38:1c:6f:24
pWWN of the FC-attached storage
member ip-address 10.150.150.10
IP address of the server
member pwwn 21:0a:00:0d:ec:54:63:82
pWWN automatically generated by the MDS for iscsi 1/2; to the storage it looks like the server's pWWN
member symbolic-nodename iqn.1998-01.com.vmware:53de1d20-106c-8c14-070d-0025b500010d-612838b7
IQN of the server
zoneset activate name VSAN101 vsan 101

iscsi save-initiator
Save the dynamically assigned server pWWNs as static pWWNs so they do not change after a reboot.

======================================================================================
The above is the dynamic-assignment configuration; the following describes the static one.
iscsi import target fc automatically associates every IQN request that connects to the MDS. To restrict access to a specific target IQN, an access list has to be built for that target.
UCS boot-from-iSCSI requires specifying the target IQN, which is where this is used.

iscsi virtual-target name iqn.2014-08.lab.mds1:jbod1-d3
pwwn 21:00:00:1d:38:1c:78:fa
pWWN of the FC-attached storage
initiator ip address 10.150.150.10 255.255.255.255 permit
server IP that is allowed to connect to this storage
initiator iqn.1998-01.com.vmware:53de1d20-106c-8c14-070d-0025b500010d-612838b7 permit
server IQN that is allowed to connect to this storage
advertise interface g1/2
Restrict the server to connect only through g1/2 (optional).
=======================================================================================
CHAP configuration
username iscsiuser password abc123 iscsi
Create a user dedicated to iSCSI
iscsi initiator name iqn.1998-01.com.vmware:53de1d20-106c-8c14-070d-0025b500010d-612838b7
username iscsiuser
Restrict which user can access this IQN
vsan 101
A VSAN can be assigned to an individual initiator; beyond CHAP, restricting initiators this way is of limited value.

iSLB

iSCSI Server Load Balancing is an advanced application of iSCSI that relies on VRRP to provide load balancing. In the design, each VRRP device is given a load value (metric, counted from 0), and the device with the smaller load gets the next assignment. The VRRP master defaults to (0+), so the first assignment never goes to the master.

Configuration Sample:

islb distribute
Enable iSLB globally
int iscsi 1/2
no shut
Enable iSCSI on g1/2
interface g1/2
ip add 10.150.150.5 255.255.255.0
no shut
vrrp 150
ip 10.150.150.254
no shut
Configure VRRP on g1/2
islb vrrp 150 load-balance
Enable load balancing for VRRP 150
islb commit
Commit after configuring iSLB
islb initiator name iqn.1998-01.com.vmware:53de1d20-106c-8c14-070d-0025b500010d-612838b7
vsan 101
static nwwn system-assign
static pwwn system-assign 1
The islb keyword replaces iscsi; the function is the same
target pwwn 22:00:00:1d:38:1c:76:db iqn-name iqn.2014-08.lab.mds.jbod1-d8-b
In iSLB the target is defined under the initiator, unlike iSCSI

CCIE DC TIPs

On the Nexus side, the uplink ports facing the FI should be configured with spanning-tree port type edge trunk for fast convergence.
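
For example, on the Nexus uplink toward the FI (a sketch; the port-channel number is a placeholder):

interface port-channel11
  switchport mode trunk
  spanning-tree port type edge trunk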

Gen 1 hardware does not support a port channel between the UCS chassis and the FI; both sides must be Gen 2.

Traffic received on one FI uplink port will not go out through another uplink port.

Each IOM in a chassis connects to only one FI; IOM A cannot connect to both FI A and FI B.

The VIC cards on the host come as 1280 and 1240. The 1280 has 4 paths of 10G each to IOM A and to IOM B, for 80G of bandwidth in total; the 1240 has 2 paths of 10G each to IOM A and to IOM B, for 40G in total. IOMs come in two models, 2208 and 2204: the 2208 has 8 ports for 80G of bandwidth, the 2204 has 4 ports for 40G. So a chassis with 2208s has 160G of external bandwidth available, while a chassis full of 1280 cards running at full load would generate 8 x 80G = 640G of traffic.

Storage traffic from the host leaves through the FC ports of the designated FI via a VSAN pin group; the path from the host to the FI is internal FCoE. The FI's end-host mode automatically puts the FI into NPV mode, so the upstream FC switch must support NPIV; in other words, NPIV has to be enabled on the MDS.

To switch a unified port on the Nexus between Ethernet and FC, use the slot command, enter each port to define its type, and then reload. Once FCoE is enabled, the new FC ports can be brought up.
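
A sketch of the unified-port conversion on a Nexus 5500 (slot and port numbers are placeholders; the FC ports have to be the last ports in the slot):

slot 1
  port 31-32 type fc
copy running-config startup-config
reload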

VDC ha-policy:

The default VDC's ha-policy cannot be modified; it is reset = reload.

reload: reboots the box, keeps the VDC configuration unchanged, copy start run.

restart: does not reboot the box; no vdc, then copy start run.

bringdown: does nothing except shut down the failed supervisor.

switchover: switches over between supervisors.

NPV-NPIV

A switch in NPV mode has no E ports, only F, NP, and SD ports.

Between two NPIV switches the link should presumably be an E port (to be verified).

qos/mtu

You can only apply input to a qos policy; you can apply both input and output to a queuing policy.

The qos policy type classifies and tags input traffic, queuing handles bandwidth, and network-qos reshapes the traffic in each qos class (class-default carries no CoS marking, so an MTU change should be applied to class-default).

The 1000v needs the MTU to be applied under the port-profile and qos to be in a service-policy under the port-profile; the 5k needs it applied in a network-qos policy under "system qos - service-policy".
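
For example, on the 5k a jumbo-MTU network-qos policy looks roughly like this (a sketch):

policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo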