g5k-campaign-tutorial.rb

Jump To …

curl-tutorial.sh g5k-campaign-tutorial.rb resources-status-2.0.rb restfully-tutorial.rb resources-status-sid.rb

g5k-campaign-tutorial.rb
¶
¶ This tutorial will show you how to use the g5k-campaign tool to easily build a whole experiment workflow. This tool is built on the Restfully library, and provide a higher level of abstraction, with support for parallel execution of reservations, deployments, and resource configuration. You can download the source file for this tutorial from here: https://github.com/grid5000/tutorials/blob/master/api/2.0/g5k-campaign-tutorial.rb.
¶ Prerequisites
¶ You need to install the `g5k-campaign` library. Assuming you have `Ruby` and `rubygems` (1.3.5+) already installed on your system, this can be done with: `gem install g5k-campaign \ —source http://g5k-campaign.gforge.inria.fr/pkg` The library comes with an executable. You can get a better feel of what it can do by displaying the usage help: `g5k-campaign -h` If you do not provide any option, it will launch a campaign using the default parameters, which are: submit a 1-hour job on one node of the `rennes` site, and deploy the `lenny-x64-base` environment on the reserved node. Note that you must have an SSH key installed on your machine, with the public part of that key in the authorized keys of the `access.grid5000.fr` machine. If this is not the case, follow the first steps as described at http://pkeck.myweb.uga.edu/ssh/, or in the Grid'5000 wiki.
¶ Building your own engine
¶ You can play a little with the various options of the default engine, but it is probably more useful to create your own engine that will execute specific actions before, after or at any state of your experiment.
¶ Let’s start with a simple engine:	class SimpleCustomEngine < Grid5000::Campaign::Engine
¶ We override some of the parameters. The complete list of options can be found here. Note that every parameter given on the command-line will always overwrite those defined in the engine.	set :environment, "lenny-x64-base" set :resources, "nodes=2" set :walltime, 7200 set :site, "nancy"
¶ By default, all the reserved nodes are released when the engine terminates. Here we want to keep the nodes available after the end of the workflow, so that we can still use them.	set :no_cleanup, true
¶ Define an action to execute before reserving resources:	before :reserve! do \|env, *args\| logger.info "Executed before reservation!"
¶ For all hooks, we must always return `env`. Otherwise the hook is considered to have failed.	env end
¶ Define an action to execute after deploying resources:	after :deployment! do \|env, *args\| logger.info "[#{env[:site]}] Nodes have been deployed: #{env[:nodes].inspect}" env end
¶ Define an action to execute on the installation phase (i.e. after reservation and deployment are done):	on :install! do \|env, *args\| logger.info "[#{env[:site]}] Installing additional software on the nodes..."
¶ We SSH to each node to install additional software. Note that this is a naive approach that only works for small numbers of nodes.	env[:nodes].each do \|node\| ssh(node, "root", :timeout => 10) do \|ssh\| output = ssh.exec!("apt-get update && apt-get install -y ganglia-monitor bonnie++") logger.debug output
¶ You can easily copy files on your nodes if you wish. As an example, here we just send a file containing the list of reserved nodes.	nodes_file = "/tmp/#{env[:job]['uid']}" ssh.scp.upload!(StringIO.new(env[:nodes].join("\n")), nodes_file) end end env end
¶ Define an action to execute after the nodes have been reserved, deployed, and installed:	on :execute! do \|env, *args\|
¶ Use the :multi option if you want to run SSH commands in parallel. This is better than sequentially SSHing to nodes, but for large number of nodes, you should probably connect to the frontend and launch a `taktuk` process for efficient execution.	ssh(env[:nodes], "root", :multi => true, :timeout => 10) do \|ssh\|
¶ Run the `bonnie++` benchmark on each node, and publish a custom metric every 5 secs (in a real experiment you’d want to send something else than $RANDOM):	cmd = %Q{nohup sh -c '(while true; do gmetric --name custom_metric_#{env[:user]} --type uint16 --value $RANDOM; sleep 20; done &) && bonnie++ -u root -d /tmp 1>/dev/null 2>&1' >/dev/null &} logger.info "[#{env[:site]}] Executing command: #{cmd}" ssh.exec(cmd)
¶ In the multi version, we must explicitly tell when the commands are ready to be launched in parallel on all nodes. See http://net-ssh.github.com/multi/v1/api/index.html for more info.	ssh.loop end env end
¶ The following is just an example of what we can do once everything is setup and running. Here we’ll just poll the values for two timeseries (our custom metric and `cpu_idle`). We use the Metrology API for that.	after :execute! do \|env, *args\| from = env[:job]['submitted_at'] resolution = 15 10.times do to = Time.now.to_i-resolution ["custom_metric_#{env[:user]}", "cpu_idle"].each do \|metric\| begin logger.info "[#{env[:site]}] Fetching timeseries for #{metric} metric..."
¶ In all engines you have access to a `connection` handler, which is a `Restfully::Session` object that you can use to access the Grid'5000 API. See the Restfully tutorial for more details.	connection.root.sites[env[:site].to_sym].metrics[metric.to_sym]. timeseries( :query => { :only => env[:nodes].join(","), :resolution => resolution, :from => from, :to => to } ).reload.each do \|timeseries\| logger.info [env[:site], timeseries['uid'], metric].join(" - ") logger.info timeseries['values'].inspect end rescue => e logger.warn "[#{env[:site]}] Error when fetching #{metric} metric: #{e.inspect}" end end sleep 15 end env end end
¶ Running your engines
¶ To execute this engine, you have two solutions. Either you copy the source file on your machine, and then launch it as followsi (please replace `login` with your login): `g5k-campaign -i path/to/file —gateway access.grid5000.fr \ -u login SimpleCustomEngine` or you directly pass the source file URI to `g5k-campaign` (but you can’t make changes): `g5k-campaign -i https://github.com/grid5000/tutorials\ /raw/master/api/2.0/g5k-campaign-tutorial.rb \ —gateway access.grid5000.fr \ -u login SimpleCustomEngine` As a side-note, if you are in the process of developing or modyfing an engine, the `—dev` option of `g5k-campaign` is very useful ;–)
¶ Advanced example
¶ One of the interesting feature of `g5k-campaign` is that you can reuse existing engines by creating a subclass. Let’s say you’d like to execute the previous workflow on more than one site, here is a way to do it.
¶ Note how we inherit from `SimpleCustomEngine`:	class Grid < SimpleCustomEngine set :site, :all # :all or :rennes or [:rennes, :nancy] or... before :reserve! do \|env, *args\| logger.info "Reserving nodes on #{site} sites..." env end
¶ Here we change what is done by the default `:reserve!` hook, so that we can launch the reservation process on more than one site at a time.	on :reserve! do \|env, block\|
¶ We make use of the `how_many?` helper function which returns the number of available nodes on each site (see http://g5k-campaign.gforge.inria.fr/Grid5000/Campaign/Engine.html#how_many%3F-instance_method).	status = how_many? logger.info "Status=#{status.inspect}" case env[:site].to_s when "all" sites = status.keys when "any"
¶ If any site will do, take the one with the most nodes available:	sites = [status.sort_by{\|k,v\| v}.last[0]] else sites = [env[:site]].flatten end
¶ `g5k-campaign` comes with helper methods for parallel execution (see http://g5k-campaign.gforge.inria.fr/Grid5000/Campaign/Parallel.html), whose usage is demonstrated here.	env[:parallel_reserve!] = parallel(:ignore_thread_exceptions => true) envs = [] sites.each do \|uid\| if status[uid].nil? \|\| status[uid] < 5 logger.info "Skipped #{uid} since it has only #{status[uid]} nodes that match our requirements." else new_env = env.merge(:site => uid) env[:parallel_reserve!].add(new_env) do \|env\| reserve!(env, &block) end envs.push(new_env) end end
¶ Master thread must wait for all other threads termination:	env[:parallel_reserve!].loop!
¶ At the end of the whole workflow, automatically display the URL at which a graphical view of the metrics can be seen. Skip the sites where the reservation failed.	metrics_query = envs.reject{\|e\| e[:job].nil?}.map do \|e\| [e[:site], e[:job]['uid']].join(":") end.join(",") logger.info "You can get a graph of your metrics at https://api.grid5000.fr/sid/ui/metrics.html?jobs=#{metrics_query}" env end before :execute! do \|env, *args\|
¶ Call `#wait!` on the parallel object if you want to synchronize all threads at some point. In this example, the execution phase will be launched only after all the other threads are arrived here.	logger.info "[#{env[:site]}] Waiting for other deployments to finish..." env[:parallel_reserve!].wait! env end after :execute! do \|env, *args\| logger.info "[#{env[:site]}] Done!" env end end
¶ Running your engines (with inheritance)
¶ If all your engines are declared in the same file, just use the same command as before but replace `SimpleCustomEngine` with `Grid` in the command (please replace `login` with your login): `g5k-campaign -i https://github.com/grid5000/tutorials\ /raw/master/api/2.0/g5k-campaign-tutorial.rb \ —gateway access.grid5000.fr \ -u login Grid` If you have engines declared in more than one file, just use multiple `-i` flags to include them all.
¶ Conclusion
¶ This concludes our tutorial, please see the documentation and examples for more advanced usages, including grid reservation, multiple deployments, notifications, etc.

g5k-campaign-tutorial.rb

Prerequisites

Building your own engine

Running your engines

Advanced example

Running your engines (with inheritance)

Conclusion