Amazon Braket Hybrid Jobs

Amazon Braket Hybrid Jobs allow you to run hybrid classical-quantum workflows on AWS managed infrastructure by submitting your own scripts, which run in a Docker container, either one provided by Amazon Braket or one you make available through Amazon ECR. To learn more about Amazon Braket Hybrid Jobs, see the Developer Guide; to learn how to provide your own Docker images, see the Bring Your Own Container (BYOC) guide.

You can also run a LocalJob, which runs the container and your script on your own compute hardware (your laptop or an EC2 instance, for example), using PyBraket.jl. This can be useful for debugging and performance tuning.
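
As a minimal sketch of the local workflow, assuming PyBraket.jl exposes a LocalQuantumJob constructor taking the same (device, source_module) arguments as AwsQuantumJob below (treat the exact name and signature as an assumption, and the device ARN and script path as placeholders):

using PyBraket

# Assumed API: a LocalQuantumJob constructor mirroring AwsQuantumJob.
# Runs the container and script on this machine rather than on AWS
# managed infrastructure.
job = LocalQuantumJob(
    "arn:aws:braket:::device/quantum-simulator/amazon/braket/sv1",
    "algorithm_script.jl",
)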

Braket.AwsQuantumJob — Method
AwsQuantumJob(device::String, source_module::String; kwargs...)

Create and launch an AwsQuantumJob which will use the device device (a managed simulator, a QPU, or an embedded simulator) to run the code (a single file, a Julia package, or a Python module) located at source_module. The keyword arguments kwargs control the launch configuration of the job.

Keyword Arguments

  • entry_point::String - the function to run in source_module if source_module is a Python module/Julia package. Defaults to an empty string, in which case the behavior depends on the code language. In Python, the job will attempt to find a function called main in source_module and run it. In Julia, source_module will be loaded and run with Julia's include.
  • image_uri::String - the URI of the Docker image in ECR to run the Job on. Defaults to an empty string, in which case the base container is used.
  • job_name::String - the name for the job, which will be displayed in the jobs console. The default is a combination of the container image name and the current time.
  • code_location::String - the S3 prefix URI to which code will be uploaded. The default is default_bucket()/jobs/<job_name>/script
  • role_arn::String - the IAM role ARN to use to run the job. The default is to use the default jobs role.
  • wait_until_complete::Bool - whether to block until the job is complete, displaying log information as it arrives (true) or to run the job asynchronously (false, default).
  • hyperparameters::Dict{String, Any} - hyperparameters to provide to the job which will be available from an environment variable when the job is run. See the Amazon Braket documentation for more.
  • input_data::Union{String, Dict} - information about the training/input data to provide to the job. A Dict should map channel names to local paths or S3 URIs. Contents found at any local paths encoded as Strings will be uploaded to S3 at s3://{default_bucket_name}/jobs/{job_name}/data/{channel_name}. If a bare local path or S3 URI is provided rather than a Dict, it is assigned the default channel name "input". The default is Dict().
  • instance_config::InstanceConfig - the instance configuration to use to run the job. See the Amazon Braket documentation for more information about available instance types. The default is InstanceConfig("ml.m5.large", 1, 30).
  • distribution::String - specifies how the job should be distributed. If set to "data_parallel", the hyperparameters for the job will be set to use data parallelism features for PyTorch or TensorFlow.
  • stopping_condition::StoppingCondition - the maximum length of time, in seconds, that a job can run before being forcefully stopped. The default is StoppingCondition(5 * 24 * 60 * 60).
  • output_data_config::OutputDataConfig - specifies the location for the output of the job. Any data stored here will be available to download_result and results. The default is OutputDataConfig("s3://{default_bucket_name}/jobs/{job_name}/data").
  • copy_checkpoints_from_job::String - specifies the job ARN whose checkpoint is to be used in the current job. Specifying this value will copy over the checkpoint data from copy_checkpoints_from_job's checkpoint_config S3 URI to the current job's checkpoint_config S3 URI, making it available at checkpoint_config.localPath during the job execution. The default is not to copy any checkpoints (an empty string).
  • checkpoint_config::CheckpointConfig - specifies the location where checkpoint data for this job is to be stored. The default is CheckpointConfig("/opt/jobs/checkpoints", "s3://{default_bucket_name}/jobs/{job_name}/checkpoints").
  • tags::Dict{String, String} - specifies the key-value pairs for tagging this job.
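
As a minimal illustration of launching a job with the signature above (the device ARN, script name, job name, and hyperparameter are placeholders):

using Braket

# Launch a hybrid job on the SV1 managed simulator and block until it
# completes, streaming log output as it arrives.
job = AwsQuantumJob(
    "arn:aws:braket:::device/quantum-simulator/amazon/braket/sv1",
    "algorithm_script.py";
    job_name="my-hybrid-job",
    hyperparameters=Dict{String, Any}("nshots" => 100),  # exposed to the script via an environment variable
    wait_until_complete=true,
)
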
Braket.log_metric — Function
log_metric(metric_name::String, value::Union{Float64, Int}; timestamp=time(), iteration_number=nothing)

Within a job script, log a metric with name metric_name and value value, which can later be fetched outside the job with metrics. A metric might be, for example, the loss of a training algorithm at each epoch.

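For example, within a job script (run_training_epoch here is a hypothetical user-defined training step):

using Braket

# Log the training loss once per epoch so it can later be retrieved
# outside the job with metrics(job).
for epoch in 1:10
    loss = run_training_epoch()   # hypothetical training step returning a Float64
    log_metric("loss", loss; iteration_number=epoch)
end
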
Braket.metrics — Function
metrics(j::AwsQuantumJob; metric_type="timestamp", statistic="max")

Fetches the metrics for job j. Metrics are generated by log_metric within the job script.

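For example, assuming job is an existing AwsQuantumJob handle:

# Fetch everything the job script logged with log_metric.
m = metrics(job)
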
Braket.logs — Function
logs(j::AwsQuantumJob; wait::Bool=false, poll_interval_seconds::Int=5)

Fetches the logs of job j. If wait is true, blocks until j has entered a terminal state ("COMPLETED", "FAILED", or "CANCELLED"). Polls every poll_interval_seconds for new log data.

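For example, to stream log output until the job reaches a terminal state:

# Block, checking for new log data every 10 seconds.
logs(job; wait=true, poll_interval_seconds=10)
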
Braket.download_result — Function
download_result(j::AwsQuantumJob; kwargs...)

Download and extract the results of job j. Valid kwargs are:

  • extract_to::String - the local folder to extract the results to. Default is the current working directory.
  • poll_timeout_seconds::Int - the maximum number of seconds to wait while polling for results. Default: 864000 (10 days)
  • poll_interval_seconds::Int - how many seconds to wait between download attempts. Default: 5
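For example (the destination folder is a placeholder):

# Download the job's output from S3 and extract it into ./my_results.
download_result(job; extract_to="my_results")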