Networks
========
192.168.11.0/24 (18-port IB switch): Legacy network, non-production systems including storage
192.168.12.0/24 (12-port IB switch): KATRIN Storage network
192.168.13.0/24 (12-port IB switch): HPC Cloud & Computing network
192.168.26.0/24 (Ethernet): Infrastructure network (OpenShift nodes and everything else)
192.168.16.0/22: External IPs for testing and production
192.168.111.0/24 (OpenVPN): Gateway to Katrin network using Master1 tunnel
192.168.112.0/24 (OpenVPN): Gateway to Katrin network using Master2 tunnel
192.168.212.0/24: Staging counterpart of the KATRIN storage network?
192.168.213.0/24: Staging counterpart of the HPC cloud & computing network?
192.168.226.0/24 (Ethernet): Staging network (Virtual OpenShift and other nodes)
192.168.216.0/22: External IPs for staging
192.168.221.0/24 (OpenVPN): Gateway to Katrin network using staging Master1 tunnel
192.168.222.0/24 (OpenVPN): Gateway to Katrin network using staging Master2 tunnel
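A minimal sketch (Python; the subnets are the ones listed above, with abbreviated descriptions) for checking which of these networks an address belongs to:

    import ipaddress

    # Subnets from the list above (descriptions abbreviated)
    NETWORKS = {
        "192.168.11.0/24": "legacy IB (non-production, incl. storage)",
        "192.168.12.0/24": "KATRIN storage IB",
        "192.168.13.0/24": "HPC cloud & computing IB",
        "192.168.16.0/22": "external IPs (testing/production)",
        "192.168.26.0/24": "infrastructure Ethernet",
        "192.168.216.0/22": "external IPs (staging)",
        "192.168.226.0/24": "staging Ethernet",
    }

    def classify(ip: str) -> str:
        # Return the role of the first network containing the address
        addr = ipaddress.ip_address(ip)
        for net, role in NETWORKS.items():
            if addr in ipaddress.ip_network(net):
                return role
        return "unknown network"

    print(classify("192.168.26.10"))  # -> infrastructure Ethernet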
KIT resources
=============
- ipekatrin*.ipe.kit.edu Cluster nodes
- ipekatrin[1:2].ipe.kit.edu Master nodes with fixed IPs (either one may be down)
+ katrin[1:2].ipe.kit.edu Virtual IPs assigned to master nodes (HA)
+ kaas.kit.edu (katrin.ipe.kit.edu) DNS-based load balancer between katrin[1:2].ipe.kit.edu
+ *.kaas.kit.edu (*.katrin.ipe.kit.edu) Default application domain?
- katrin.kit.edu Apache/mod_proxy pod (in DNS, point a CNAME to katrin.ipe.kit.edu)
+ openshift.ipe.kit.edu Gateway (VIP) to the staging cluster (a single IP migrating between 2 nodes)
- *.openshift.ipe.kit.edu Default application domain for staging cluster
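A small sketch (Python; assumes the DNS records above are already in place) to check that kaas.kit.edu only ever resolves within the two master VIPs:

    import socket

    def resolve(host: str) -> set:
        # Collect all IPv4 addresses the name currently resolves to
        infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
        return {info[4][0] for info in infos}

    # VIPs assigned to the master nodes (HA)
    masters = resolve("katrin1.ipe.kit.edu") | resolve("katrin2.ipe.kit.edu")
    # The DNS-based load balancer should stay within that set
    print("kaas.kit.edu within master VIPs:", resolve("kaas.kit.edu") <= masters)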
Storage
=======
LVM VGs
  VolGroup00
    -> LogVol*: System partitions
    -> docker-pool: Docker storage
  Katrin
    -> Heketi PD (we reserve space but do not configure Heketi yet)
       -> vg_*
          -> Heketi-managed Gluster Volumes
    -> Katrin (mounted at '/mnt/ands')
       -> Space for manually-managed Gluster Bricks
       -> Storage for Galera / Cassandra / etc.?
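A rough sketch of the corresponding LVM commands (Python over the standard LVM CLI; the device path /dev/sdb and all sizes are placeholders, and the Heketi space is only reserved, not configured):

    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    # Thin pool for Docker storage inside VolGroup00
    run("lvcreate", "--thinpool", "docker-pool", "-L", "100G", "VolGroup00")

    # Dedicated VG for Gluster bricks; /dev/sdb is a placeholder device
    run("vgcreate", "Katrin", "/dev/sdb")

    # LV for the manually-managed bricks, mounted at /mnt/ands;
    # the remaining free space in the VG stays reserved for Heketi
    run("lvcreate", "-n", "katrin", "-L", "500G", "Katrin")
    run("mkfs.xfs", "/dev/Katrin/katrin")
    run("mount", "/dev/Katrin/katrin", "/mnt/ands")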
Gluster Volume Types:
  tmp:   distribute?      Various data which should be preserved, but which is not critical if lost or temporarily inaccessible (logs, etc.) [ check if we can still write when one brick is gone ]
  cfg:   replica=3        Small and critical data sets (configs, sources, etc.)
  cache: replica+arbiter  Large regenerable data which should nevertheless always be available [ potentially we can use disperse to save space ]
  data:  replica+arbiter  Very large and critical data
  db:    disperse         A few very large files, e.g. a large single-table database (ADEI has many tables)
Scaling storage:
  cfg:        3 nodes is enough
  cache/data: [d][d][a] => [da][d ][ad][ d] => [d ][d ][ d][ d][aa] => further increase in pairs ([d] = data brick, [a] = arbiter); at some point add a second arbiter node (see the sketch below)
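A sketch of the corresponding gluster CLI calls (hostnames and brick paths are placeholders), creating a replica+arbiter volume and then growing it by one pair, as in the diagram above:

    import subprocess

    def gluster(*args):
        subprocess.run(["gluster", "volume", *args], check=True)

    # [d][d][a]: two data bricks plus one arbiter brick
    gluster("create", "data", "replica", "3", "arbiter", "1",
            "node1:/mnt/ands/bricks/data", "node2:/mnt/ands/bricks/data",
            "node3:/mnt/ands/bricks/data-arbiter")
    gluster("start", "data")

    # Grow in pairs: one more data pair plus its arbiter brick
    gluster("add-brick", "data", "replica", "3", "arbiter", "1",
            "node3:/mnt/ands/bricks/data2", "node4:/mnt/ands/bricks/data2",
            "node1:/mnt/ands/bricks/data2-arbiter")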
Gluster Volumes:
  name          type   mountpoint              notes
  provision     cfg    /mnt/provision          Provisioning volume which is not expected to be mounted in the containers (temporarily may contain secret information, etc.)
  openshift     cfg    /mnt/openshift          Multi-purpose: various small size configurations (adei, apache, etc.)
  temporary     tmp    /mnt/temporary          Multi-purpose: various logs & temporary files
  ?adei         cfg    /mnt/adei/adei
  adei-db       cache  /mnt/adei/db
  adei-tmp      tmp    /mnt/adei/tmp
  katrin-mysql  data   /mnt/katrin/mysql
  katrin-data   cfg    /mnt/katrin/archive
  katrin-kali   cache  /mnt/katrin/storage
  katrin-tmp    tmp    /mnt/katrin/workspace
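A sketch for mounting these volumes at the paths from the table (master1 as the mount server is a placeholder; only a few rows shown):

    import os
    import subprocess

    # Volume name -> mount point, per the table above (subset)
    MOUNTS = {
        "provision": "/mnt/provision",
        "openshift": "/mnt/openshift",
        "temporary": "/mnt/temporary",
        "adei-db": "/mnt/adei/db",
        "katrin-mysql": "/mnt/katrin/mysql",
    }

    for volume, path in MOUNTS.items():
        os.makedirs(path, exist_ok=True)
        subprocess.run(["mount", "-t", "glusterfs",
                        f"master1:/{volume}", path], check=True)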
OpenShift Volumes:
  name          type/mode  volume        notes
  etc           cfg/ro     openshift     Various configurations (ADEI & Apache configs, other stuff in etc.)
  src           cfg/ro     openshift     Interpreted source files
  log           tmp/rw     tmp           Stuff in /var/log
  tmp           tmp/rw     tmp           Various temporary files
  adei-db       data/rw    adei-db       ADEI cache database and a few primary sources [ will take ages to regenerate, so we can't really consider it a dispensable cache ]
  adei-tmp      tmp/rw     adei-tmp      ADEI, Apache, and Cron logs [ technically we also have downloads here, which are more cache than tmp... but I think it is fine for now ]
  adei-cfg      cfg/ro     adei?         ADEI & Apache configs
  adei-src      cfg/ro     adei?         ADEI sources
  katrin-mysql  cfg/rw     katrin-mysql  KATRIN database with configurations, etc.
  katrin-data   data/rw    katrin-data   KATRIN data archives, all primary raw data from Orca, etc.
  katrin-kali   cache/rw   katrin-kali   Generated ROOT files [ can we make this separation? Marco uses hardlinks ]
  katrin-proc   tmp/rw     katrin-proc   Data processing volume (inbox, etc.)
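A sketch generating an OpenShift PersistentVolume for one row of this table via the in-tree glusterfs plugin (the endpoints object name 'gluster' and the size are placeholders):

    import json
    import subprocess

    def gluster_pv(name: str, volume: str, size: str, readonly: bool) -> dict:
        # ro rows map to ReadOnlyMany, rw rows to ReadWriteMany
        mode = "ReadOnlyMany" if readonly else "ReadWriteMany"
        return {
            "apiVersion": "v1",
            "kind": "PersistentVolume",
            "metadata": {"name": name},
            "spec": {
                "capacity": {"storage": size},
                "accessModes": [mode],
                "glusterfs": {"endpoints": "gluster",
                              "path": volume,
                              "readOnly": readonly},
            },
        }

    pv = gluster_pv("katrin-data", "katrin-data", "10Ti", readonly=False)
    subprocess.run(["oc", "create", "-f", "-"],
                   input=json.dumps(pv).encode(), check=True)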
Services
========
- Keepalived
- OpenVPN
- Gluster
- MySQL Galera (?)
- Cassandra (?)
- oVirt (?)
- OpenShift Master / Node
- Heketi
- Apache Router
- ADEI Services
- Apache Spark, etc.
Inventories
===========
- staging & production will operate in parallel (staging in Vagrant and production on bare metal)
- testing is just for pre-production tests and will be removed once production is running
Labels
======
- We specify whether a node is a master and whether it provides fat storage for GlusterFS (see the sketch after this list)
- All nodes are currently in the 'infra' region (student computers, for example, would be non-infra nodes; so would nodes outside of KIT)
- The servers in the cellar are in the 'default' zone (if we put something in the 4th-floor server room, we would define a new zone there)
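A sketch of applying this scheme with oc (the node name is an example; the exact label keys for the master and fat-storage flags are not fixed yet and are assumptions here):

    import subprocess

    def label_node(node: str, **labels):
        args = [f"{key}={value}" for key, value in labels.items()]
        subprocess.run(["oc", "label", "node", node, *args], check=True)

    # region/zone per the notes above; 'master' and 'fat_storage'
    # are assumed label names for the master / fat-storage flags
    label_node("ipekatrin1.ipe.kit.edu",
               region="infra", zone="default",
               master="1", fat_storage="1")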
Computing
=========
- Define CUDA nodes and OpenCL nodes
- The Intel Xeon Phi in ipepdvcompute2 is replaced by a new Tesla
- Gen1 UFO servers do not support "Above 64G decoding" and can't run a Xeon Phi. Maybe we can put it in the new Phi server.