Operational Processes

Managing domain entities

Domain bootstrapping

If you’re running a domain node in its default configuration, it will have a sequencer and mediator embedded, and these components will be automatically bootstrapped for you.

However, if your domain operates with external sequencers and mediators for improved availability and performance, you instead need to configure a domain manager node (which only runs topology management) and bootstrap your domain with at least one external sequencer node and one external mediator node, as illustrated below:

        domainManager1.setup.bootstrap_domain(Seq(sequencer1), Seq(mediator1))

Domain managers are configured as domain-managers under the canton configuration. They are configured similarly to domain nodes, except that there are no sequencer, mediator, public api, or service agreement configs.

Please note that if your sequencer is database-based and you’re horizontally scaling it as described under sequencer high availability, you do not need to pass all sequencer nodes into the command above. Since they all share the same relational database, you only need to run this initialization step on one of them.

For non-database-based sequencers such as Ethereum or Fabric sequencers, you need to initialize each node individually. For these kinds of sequencers, you can either initialize them as part of the initial domain bootstrap shown above or dynamically add a new sequencer at a later point as follows:

        domainManager1.setup.onboard_new_sequencer(initialSequencer = sequencer1, newSequencer = sequencer2)

Importing existing Contracts

You may have existing contracts, parties, and DARs in other Daml Connect Participant Nodes (such as the Daml sandbox) that you want to import into your Canton-based participant node. To address this need, you can extract contracts and associated parties via the ledger api, modify contracts, parties, and Daml archives as needed, and upload the data to Canton using the Canton Console.

You can also import existing contracts from Canton, which is useful for Canton upgrades across major versions with incompatible internal storage.

Figure: Importing ledger contracts from other Daml Connect Participant Nodes or from instances of Canton based on previous major versions.

Preparation

As contracts (1) “belong to” parties and (2) are instances of Daml templates defined in Daml Archives (DARs), importing contracts to Canton also requires creating corresponding parties and uploading DARs.

  • Contracts are often interdependent, requiring care to honor dependencies such that the set of imported contracts is internally consistent. This requires particular attention if you choose to modify contracts prior to their import.

  • Additionally, use of divulgence in the original ledger has likely introduced non-obvious dependencies that may impede exercising contract choices after import. As a result, such divulged contracts need to be re-divulged as part of the import (by exercising existing choices or, if there are no side-effect-free choices that re-divulge the necessary contracts, by extending your Daml models with new choices).

  • Party ids have a stricter format on Canton than on non-Canton ledgers: they end with a required “fingerprint” suffix, so at a minimum you will need to “remap” party ids.

  • Canton contract keys do not have to be unique, so if your Daml models rely on uniqueness, consider extending the models using these strategies or limit your Canton Participants to connect to a single Canton domain with unique contract key semantics.

  • Canton does not support implicit party creation, so be sure to create all needed parties explicitly.

  • In addition you could choose to spread contracts, parties, and DARs across multiple Canton Participants.
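The first requirement above (internal consistency of the imported set) can be checked before the import. The following is a minimal sketch in plain Scala, not part of Canton: `ExportedContract`, `danglingReferences`, and the contract ids are hypothetical names introduced for illustration, and the model assumes you have already extracted which contract ids each contract refers to.

```scala
// Hypothetical, simplified model: a contract and the ids of the contracts it refers to.
final case class ExportedContract(id: String, references: Set[String])

// For each contract, collect the referenced ids that are absent from the import set.
def danglingReferences(contracts: Seq[ExportedContract]): Map[String, Set[String]] = {
  val known = contracts.map(_.id).toSet
  contracts
    .map(c => c.id -> c.references.diff(known))
    .filter { case (_, missing) => missing.nonEmpty }
    .toMap
}

// Illustrative data: "iou-9" was dropped while modifying the exported contracts.
val toImport = Seq(
  ExportedContract("offer-1", references = Set("iou-1")),
  ExportedContract("iou-1", references = Set.empty),
  ExportedContract("offer-2", references = Set("iou-9"))
)

val dangling = danglingReferences(toImport)
```

An empty result suggests that the set is closed under references. A non-empty result points at contracts that were modified or filtered out without adjusting their dependents.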

With the above requirements in mind, you are ready to plan and execute the following three-step process:

  1. Download parties and contracts from the existing Daml Connect Participant Node and locate the DAR files that the contracts are based on.

  2. Modify the parties and contracts (at the minimum assigning Canton-conformant party ids).

  3. Provision Canton Participants along with at least one Canton Domain. Then upload DARs, create parties, and import the contracts to the Canton participants. Finally, connect the participants to the domain(s).
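The party id remapping in step 2 can be sketched in plain Scala as follows. This is an illustration under stated assumptions rather than Canton’s implementation: the `::` delimiter and the fingerprint values are assumed for the example (the helper functions in this guide use `SafeSimpleString.delimiter` rather than a hard-coded value), and `stripSuffix` and `remapParty` are hypothetical helper names.

```scala
// Assumed delimiter between the party name and its fingerprint suffix (illustrative only).
val delimiter = "::"

// Strip a suffix from a party id; ids without a suffix are returned unchanged.
def stripSuffix(partyId: String): String =
  partyId.split(delimiter).head

// Remap a plain party name to the id allocated on the target participant.
// Fails loudly if the party was not created explicitly beforehand.
def remapParty(mapping: Map[String, String])(plain: String): String =
  mapping.getOrElse(plain, sys.error(s"No Canton party allocated for '$plain'"))

// Hypothetical mapping; real fingerprints are assigned when the parties are allocated.
val toCantonParty = Map("Alice" -> "Alice::1220aaaa", "Bob" -> "Bob::1220bbbb")

val exported = Seq("Alice::0000ffff", "Bob")
val remapped = exported.map(stripSuffix).map(remapParty(toCantonParty))
```

Failing on an unknown party, rather than passing the plain name through, reflects the requirement that Canton does not create parties implicitly.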

Importing an actual Ledger

To follow along with this guide, ensure you have installed and unpacked the Canton release bundle and run the following commands from the “canton-X.Y.Z” directory to set up the initial topology.

export CANTON=`pwd`
export CONF="$CANTON/examples/03-advanced-configuration"
export IMPORT="$CANTON/examples/07-repair"
bin/canton \
  -c $IMPORT/participant1.conf,$IMPORT/participant2.conf,$IMPORT/participant3.conf,$IMPORT/participant4.conf \
  -c $IMPORT/domain-export-ledger.conf,$IMPORT/domain-import-ledger.conf \
  -c $CONF/storage/h2.conf,$IMPORT/enable-preview-commands.conf \
  --bootstrap $IMPORT/import-ledger-init.canton

This sets up an “exportLedger” with a set of parties consisting of painters, house owners, and banks along with a handful of paint offer contracts and IOUs.

Define the following helper functions useful to extract parties and contracts via the ledger api:

  def queryActiveContractsFromDamlLedger(
      hostname: String,
      port: Port,
      tls: Option[TlsClientConfig],
      token: Option[String] = None)(implicit consoleEnvironment: ConsoleEnvironment): Seq[CreatedEvent] = {

    // Helper to query the ledger api using the specified command.
    def queryLedgerApi[Svc <: AbstractStub[Svc], Result](
        command: GrpcAdminCommand[_, _, Result]): Either[String, Result] =
      consoleEnvironment.grpcAdminCommandRunner
        .runCommand("sourceLedger", command, ClientConfig(hostname, port, tls), token)
        .toEither

    (for {
      // Identify all the parties on the ledger and narrow down the list to local parties.
      allParties <- queryLedgerApi(LedgerApiCommands.PartyManagementService.ListKnownParties())
      localParties = allParties.collect {
        case PartyDetails(party, _, isLocal) if isLocal => LfPartyId.assertFromString(party)
      }

      // Look up the ledger id needed next to query for the contracts.
      ledgerId <- queryLedgerApi(LedgerApiCommands.LedgerIdentityService.GetLedgerIdentity())

      // Query the ActiveContractsService for the actual contracts
      acs <- queryLedgerApi(LedgerApiCommands.AcsService.GetActiveContracts(ledgerId, localParties.toSet))
    } yield acs.map(_.event)).valueOr(err =>
      throw new IllegalStateException(s"Failed to query parties, ledger id, or acs: $err"))
  }

  def removeCantonSpecifics(acs: Seq[CreatedEvent]): Seq[CreatedEvent] = {
    def stripPartyIdSuffix(suffixedPartyId: String): String = suffixedPartyId.split(SafeSimpleString.delimiter).head

    acs.map { event =>
      ValueRemapper.convertEvent(identity, stripPartyIdSuffix)(event)
    }
  }

  def lookUpPartyId(participant: ParticipantReference, party: String): PartyId =
    participant.parties.list(filterParty = party + SafeSimpleString.delimiter).map(_.party).head

As the first step, export the active contract set (ACS). To illustrate how to import data from non-Canton ledgers, strip the Canton-specifics by making the party ids generic (stripping the Canton-specific suffix).

    val acs =
      queryActiveContractsFromDamlLedger(
        exportLedger.config.ledgerApi.address,
        exportLedger.config.ledgerApi.port,
        exportLedger.config.ledgerApi.tls.map(_.clientConfig)
      )

    val acsExported = removeCantonSpecifics(acs).toList

Step number two involves preparing the Canton participants and domain by uploading DARs and creating parties. Here we choose to place the house owners, painters, and banks on different participants.

Figure: Placing contracts on all the correct Canton Participants.

Also modify the events to be based on the newly created party ids.

    // Decide on which canton participants to host which parties along with their contracts.
    // We place house owners, painters, and banks on separate participants.
    val participants     = Seq(participant1, participant2, participant3)
    val partyAssignments = Seq(participant1 -> houseOwners, participant2 -> painters, participant3 -> banks)

    // Connect to domain prior to uploading dars and parties.
    participants.foreach { participant =>
      participant.domains.connect_local(importLedgerDomain)
      participant.dars.upload(darPath)
    }

    // Create canton party ids and remember mapping of plain to canton party ids.
    val toCantonParty: Map[String, String] =
      partyAssignments.flatMap {
        case (participant, parties) =>
          val partyMappingOnParticipant = parties.map { party =>
            participant.ledger_api.parties.allocate(party, party)
            party -> lookUpPartyId(participant, party).toLf
          }
          partyMappingOnParticipant
      }.toMap

    // Create traffic on all participants so that the repair commands will pick an identity snapshot that is aware of
    // all party allocations
    participants.foreach { participant =>
      participant.health.ping(participant, workflowId = importLedgerDomain.name)
    }

    // Switch the ACS to be based on canton party ids.
    val acsToImportToCanton = acsExported.map(ValueRemapper.convertEvent(identity, toCantonParty(_)))

As the third step, perform the actual import to each participant, filtering the contracts based on the location of contract stakeholders and witnesses.

    // Disconnect from domain temporarily to allow import to be performed.
    participants.foreach(_.domains.disconnect(importLedgerDomain.name))

    // Pick a ledger create time according to the domain's clock.
    val ledgerCreateTime =
      consoleEnvironment.environment.domains.getRunning(importLedgerDomain.name).get.clock.now.toInstant

    // Filter active contracts based on participant parties and upload.
    partyAssignments.foreach {
      case (participant, rawParties) =>
        val parties = rawParties.map(toCantonParty(_))
        val participantAcs = acsToImportToCanton
          .collect {
            case event
                if event.signatories.intersect(parties).nonEmpty
                  || event.observers.intersect(parties).nonEmpty
                  || event.witnessParties.intersect(parties).nonEmpty =>
              val wrappedCreatedEvent = WrappedCreatedEvent(event)

              SerializableContractWithWitnesses(utils.contract_data_to_instance(wrappedCreatedEvent.toContractData,
                                                                                ledgerCreateTime),
                                                Set.empty)
          }

        participant.repair.add(importLedgerDomain.name, participantAcs, ignoreAlreadyAdded = false)
    }

    def verifyActiveContractCounts() = {
      Map[LocalParticipantReference, (Boolean, Boolean)](
        participant1 -> ((true, true)),
        participant2 -> ((true, false)),
        participant3 -> ((false, true))
      ).foreach {
        case (participant, (hostsPaintOfferStakeholder, hostsIouStakeholder)) =>
          val expectedCounts =
            (houseOwners.map { houseOwner =>
              houseOwner.toPartyId(participant) ->
                ((if (hostsPaintOfferStakeholder) paintOffersPerHouseOwner else 0)
                  + (if (hostsIouStakeholder) 1 else 0))
            }
              ++ painters.map { painter =>
                painter.toPartyId(participant) -> (if (hostsPaintOfferStakeholder) paintOffersPerPainter else 0)
              }
              ++ banks.map { bank =>
                bank.toPartyId(participant) -> (if (hostsIouStakeholder) iousPerBank else 0)
              }).toMap[PartyId, Int]

          assertAcsCounts((participant, expectedCounts))
      }
    }

    // Finally reconnect to the domain.
    participants.foreach(_.domains.reconnect(importLedgerDomain.name))

To demonstrate that the imported ledger works, let’s have each of the house owners accept one of the painters’ offers to paint their house.

    def yesYouMayPaintMyHouse(houseOwner: PartyId, painter: PartyId, participant: ParticipantReference): Unit = {
      val iou  = participant.ledger_api.acs.await[Iou.Iou](houseOwner, Iou.Iou)
      val bank = iou.value.payer
      val paintProposal = participant.ledger_api.acs
        .await[Paint.OfferToPaintHouseByPainter](houseOwner,
                                                 Paint.OfferToPaintHouseByPainter,
                                                 pp => pp.value.painter == painter.toPrim && pp.value.bank == bank)
      val cmd = paintProposal.contractId
        .exerciseAcceptByOwner(houseOwner.toPrim, iou.contractId)
        .command
      val _ = clue(s"$houseOwner accepts paint proposal by $painter financing through ${bank.toString}")(
        participant.ledger_api.commands.submit(Seq(houseOwner), Seq(cmd)))
    }

    // Have each house owner accept one of the paint offers to illustrate use of the imported ledger.
    houseOwners.zip(painters).foreach {
      case (houseOwner, painter) =>
        yesYouMayPaintMyHouse(lookUpPartyId(participant1, houseOwner),
                              lookUpPartyId(participant1, painter),
                              participant1)
    }

    // Illustrate that accepting the paint offers has resulted in PaintHouse contracts.
    {
      val paintHouseContracts = painters.map { painter =>
        participant2.ledger_api.acs.await[Paint.PaintHouse](lookUpPartyId(participant2, painter), Paint.PaintHouse)
      }
      assert(paintHouseContracts.size == 4)
      paintHouseContracts
    }

This guide has demonstrated how to import data from non-Canton Daml Connect Participant Nodes or from a Canton Participant of a lower major version as part of a Canton upgrade.

Backup and Restore

It is recommended that your database is frequently backed up so that the data can be restored in case of a disaster.

In the case of a restore, a participant can replay missing data from the domain, provided that the domain’s backup is more recent than the participant’s. It is important that the participant’s backup is not more recent than the domain’s, as that would constitute a ledger fork. Therefore, if you back up both the participant and the domain, always back up the participant database before the domain.
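The ordering rule above can be expressed as a small check in plain Scala. This is only a sketch: `participantCanReplay` is a hypothetical helper, and it assumes that the time at which each backup was taken is a good enough proxy for the state it contains.

```scala
import java.time.Instant

// A participant backup can be restored against a domain backup only if it is
// not more recent than the domain backup; otherwise the participant is "ahead".
def participantCanReplay(participantBackup: Instant, domainBackup: Instant): Boolean =
  !participantBackup.isAfter(domainBackup)

// Illustrative timestamps: the participant was backed up five minutes before the domain.
val participantBackupTime = Instant.parse("2021-06-01T01:00:00Z")
val domainBackupTime      = Instant.parse("2021-06-01T01:05:00Z")

// Recommended order (participant first): the participant can replay from the domain.
val recommendedOrderIsSafe = participantCanReplay(participantBackupTime, domainBackupTime)

// Reversed order (domain first): the participant would be ahead of the domain.
val reversedOrderIsSafe = participantCanReplay(domainBackupTime, participantBackupTime)
```

Running such a check as part of a backup job is one way to detect a schedule that accidentally backs up the domain first.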

In case of a domain restore from a backup, if a participant is ahead of the domain, the participant will refuse to connect to the domain and you must either:

  • restore the participant’s state to a backup before the disaster of the domain,

  • or roll out a new domain as a repair strategy in order to recover from a lost domain

We recommend that in production, a domain should be run with offsite synchronous replication to assure the most crucial data is always safely backed up and as up-to-date as possible.

Postgres Example

If you are using Postgres to persist the participant or domain node data, you can create backups to a file and restore it using Postgres’s utility commands pg_dump and pg_restore as shown below:

Backing up Postgres database to a file:

pg_dump -U <user> -h <host> -p <port> -w -F tar -f <fileName> <dbName>

Restoring Postgres database data from a file:

pg_restore -U <user> -h <host> -p <port> -w -d <dbName> <fileName>

Although the approach shown above works for small deployments, it is not recommended for larger deployments. For those, we suggest looking into incremental backups and referring to the resources below:

Repairing Participants

Canton enables interoperability of distributed participants and domains. Particularly in distributed settings without trust assumptions, faults in one part of the system should ideally produce minimal irrecoverable damage to other parts. For example if a domain is irreparably lost, the participants previously connected to that domain need to recover and be empowered to continue their workflows on a new domain.

This guide will illustrate how to replace a lost domain with a new domain providing business continuity to affected participants.

Recovering from a Lost Domain

Suppose that a set of participants have been conducting workflows via a domain that runs into trouble. In fact consider that the domain has gotten into such a disastrous state that the domain is beyond repair, for example:

  • The domain has experienced data loss and is unable to be restored from backups or the backups are missing crucial recent history.

  • The domain data is found to be corrupt causing participants to lose trust in the domain as a mediator.

Next the participant operators each examine their local state, and upon coordinating conclude that their participants’ active contracts are “mostly the same”. This domain-recovery repair demo illustrates how the participants can

  • coordinate to agree on a set of contracts to use moving forward, serving as a new consistent state,

  • copy over the agreed-upon set of contracts to a brand new domain,

  • “fail over” to the new domain,

  • and finally continue running workflows on the new domain having recovered from the permanent loss of the old domain.

Repairing an actual Topology

To follow along with this guide, ensure you have installed and unpacked the Canton release bundle and run the following commands from the “canton-X.Y.Z” directory to set up the initial topology.

export CANTON=`pwd`
export CONF="$CANTON/examples/03-advanced-configuration"
export REPAIR="$CANTON/examples/07-repair"
bin/canton \
  -c $REPAIR/participant1.conf,$REPAIR/participant2.conf,$REPAIR/domain-repair-lost.conf,$REPAIR/domain-repair-new.conf \
  -c $CONF/storage/h2.conf,$REPAIR/enable-preview-commands.conf \
  --bootstrap $REPAIR/domain-repair-init.canton

To simplify the demonstration, this not only sets up the starting topology of

  • two participants, “participant1” and “participant2”, along with

  • one domain “lostDomain” that is about to become permanently unavailable leaving “participant1” and “participant2” unable to continue executing workflows,

but also already includes the ingredients needed to recover:

  • The setup includes “newDomain” that we will rely on as a replacement domain, and

  • we already enable the “enable-preview-commands” configuration needed to make available the “repair.change_domain” command.

In practice you would only add the new domain once you have the need to recover from domain loss and also only then enable the repair commands.

We simulate “lostDomain” permanently disappearing by stopping the domain and never bringing it up again, to emphasize the point that the participants no longer have access to any state from “lostDomain”. We also disconnect “participant1” and “participant2” from “lostDomain” to reflect that the participants have “given up” on the domain and recognize the need for a replacement for business continuity. The fact that we disconnect the participants “at the same time” is somewhat artificial, as in practice the participants might have lost connectivity to the domain at different times (more on reconciling contracts below).

          lostDomain.stop()
          Seq(participant1, participant2).foreach { p =>
            p.domains.disconnect(lostDomain.name)
            // Also let the participant know not to attempt to reconnect to lostDomain
            p.domains.modify(lostDomain.name, _.copy(manualConnect = true))
          }

Figure: “lostDomain” has become unavailable and neither participant can connect anymore.

Even though the domain is “the node that has broken”, recovering entails repairing the participants using the “newDomain” already set up. As of now, participant repairs have to be performed in an offline fashion, requiring the participants being repaired to be disconnected from the new domain. However, we temporarily connect to the domain to let the topology state initialize, and disconnect only once the parties can be used on the new domain.

      Seq(participant1, participant2).foreach(_.domains.connect_local(newDomain))

      // Wait for topology state to appear before disconnecting again.
      utils.retry_until_true()(
        participant1.domains.active(newDomain.name) && participant2.domains.active(newDomain.name),
        "newDomain initialization timed out"
      )

      Seq(participant1, participant2).foreach(_.domains.disconnect(newDomain.name))

With the participants connected neither to “lostDomain” nor “newDomain”, each participant can

  • locally look up the active contracts assigned to the lost domain using the “testing.pcs_search” command made available via the “features.enable-testing-commands” configuration,

  • and invoke “repair.change_domain” (enabled via the “features.enable-preview-commands” configuration) in order to “move” the contracts to the new domain.

      // Extract participant contracts from "lostDomain".
      val contracts1 = participant1.testing.pcs_search(lostDomain.name, filterTemplate = "^Iou", activeSet = true)
      val contracts2 = participant2.testing.pcs_search(lostDomain.name, filterTemplate = "^Iou", activeSet = true)

      // Ensure that shared contracts match.
      val Seq(sharedContracts1, sharedContracts2) = Seq(contracts1, contracts2).map(
        _.filter {
          case (_isActive, contract)
              if contract.metadata.stakeholders.contains(Alice.toLf) && contract.metadata.stakeholders.contains(
                Bob.toLf) =>
            true
          case _ => false
        }.toSet
      )
      utils.retry_until_true(timeout = java.time.Duration.ZERO)(
        sharedContracts1.equals(sharedContracts2),
        s"Contracts don't match: Participant1 and participant2 operators need to coordinate to agree on a common set of contracts"
      )

      // Finally change the contracts from "lostDomain" to "newDomain"
      participant1.repair.change_domain(contracts1.map(_._2.contractId), lostDomain.name, newDomain.name)
      participant2.repair.change_domain(contracts2.map(_._2.contractId),
                                        lostDomain.name,
                                        newDomain.name,
                                        skipInactive = false)

Note

The code snippet above includes a check that the contracts shared among the participants match (as determined by each participant, “sharedContracts1” by “participant1” and “sharedContracts2” by “participant2”). Should the contracts not match (as could happen if the participants had lost connectivity to the domain at different times), this check fails, prompting the participant operators to reach an agreement on the set of contracts. The agreed-upon set of active contracts may for example be

  • the intersection of the active contracts among the participants

  • or perhaps the union (for which the operators can use the “repair.add” command to create the contracts missing from one participant).

Also note that both the repair commands and the “testing.pcs_search” command are currently “preview” features, and therefore their names may change.
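The two reconciliation options mentioned in the note (intersection or union) can be illustrated with plain Scala sets. The contract ids below are made up for the example; in the demo, the inputs would be derived from “sharedContracts1” and “sharedContracts2”.

```scala
// Shared contracts as seen by each participant (illustrative ids only).
val seenByParticipant1 = Set("iou-1", "iou-2", "iou-3")
val seenByParticipant2 = Set("iou-2", "iou-3", "iou-4")

// Option 1: agree on the contracts that both participants consider active.
val agreedIntersection = seenByParticipant1.intersect(seenByParticipant2)

// Option 2: agree on the union; each participant then has to add what it is missing.
val agreedUnion           = seenByParticipant1.union(seenByParticipant2)
val missingOnParticipant1 = agreedUnion.diff(seenByParticipant1)
val missingOnParticipant2 = agreedUnion.diff(seenByParticipant2)
```

With the union, the `missingOn...` sets are the contracts that the operators would need to create on the respective participant. With the intersection, the contracts outside the agreed set are left behind.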

Once each participant has associated the contracts with “newDomain”, let’s have them reconnect, and we should be able to confirm that the new domain is able to execute workflows from where the lost domain disappeared.

    Seq(participant1, participant2).foreach(_.domains.reconnect(newDomain.name))

    // Look up a couple of contracts moved from lostDomain
    val Seq(iouAlice, iouBob) = Seq(participant1 -> Alice, participant2 -> Bob).map {
      case (participant, party) =>
        participant.ledger_api.acs.await[Iou.Iou](party, Iou.Iou, _.value.owner == party.toPrim)
    }

    // Ensure that we can create new contracts
    Seq(participant1 -> ((Alice, Bob)), participant2 -> ((Bob, Alice))).foreach {
      case (participant, (payer, owner)) =>
        participant.ledger_api.commands.submit_flat(
          Seq(payer),
          Seq(Iou.Iou(payer.toPrim, owner.toPrim, Iou.Amount(value = 200, currency = "USD"), List.empty).create.command)
        )
    }

    // Even better: Confirm that we can exercise choices on the moved contracts
    Seq(participant2 -> ((Bob, iouBob)), participant1 -> ((Alice, iouAlice))).foreach {
      case (participant, (owner, iou)) =>
        participant.ledger_api.commands
          .submit_flat(Seq(owner), Seq(iou.contractId.exerciseCall(owner.toPrim).command))
    }

Figure: “newDomain” has replaced “lostDomain”.

In practice, we would now be in a position to remove the “lostDomain” from both participants and to disable the repair commands again to prevent accidental use of these “dangerously powerful” tools.

This guide has demonstrated how participants can recover from a domain that has been permanently lost or has somehow become irreparably corrupted.