Getting Started

Haetae is an incremental task runner.
A task can be a test, lint, build, or anything else. Haetae can be used in any project, no matter what language, framework, test runner, linter/formatter, build system, or CI you use.

This Getting Started article begins with an example of incremental testing.

Why?

Let's say you're building a calculator project, named 'my-calculator'.

my-calculator
├── package.json
├── src
│   ├── add.js
│   ├── exponent.js
│   ├── multiply.js
│   └── subtract.js
└── test
    ├── add.test.js
    ├── exponent.test.js
    ├── multiply.test.js
    └── subtract.test.js

The dependency graph is like this.


Dependency graph of 'my-calculator'

exponent.js depends on multiply.js, which depends on add.js and so on.

When testing, we should take the dependency graph into account.
We do NOT have to run every test file (*.test.js) for every single tiny change; that wastes your CI resources and time.
Rather, we should test incrementally, which means testing only the files affected by the changes.

For example, when multiply.js is changed, test only exponent.test.js and multiply.test.js.
When add.js is changed, test all files (exponent.test.js, multiply.test.js, subtract.test.js and add.test.js).
When a test file (e.g. add.test.js) is changed, execute only that test file (e.g. add.test.js).
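The rules above amount to a reverse lookup on the dependency graph. The following is a minimal sketch in plain JavaScript, not Haetae's implementation. The `graph` object is written by hand from the description above (it assumes each test file imports its matching source file, and that subtract.js imports add.js), and `dependsOn` and `affectedTests` are hypothetical helpers.

```javascript
// Hand-written dependency graph of 'my-calculator' (an assumption for illustration).
// Each key depends on the files listed in its array.
const graph = {
  'src/add.js': [],
  'src/subtract.js': ['src/add.js'],
  'src/multiply.js': ['src/add.js'],
  'src/exponent.js': ['src/multiply.js'],
  'test/add.test.js': ['src/add.js'],
  'test/subtract.test.js': ['src/subtract.js'],
  'test/multiply.test.js': ['src/multiply.js'],
  'test/exponent.test.js': ['src/exponent.js'],
}

// Returns true if `file` is, or transitively depends on, one of `changed`.
function dependsOn(file, changed, seen = new Set()) {
  if (changed.includes(file)) return true // every file depends on itself
  if (seen.has(file)) return false // already visited (also guards against cycles)
  seen.add(file)
  return (graph[file] ?? []).some((dep) => dependsOn(dep, changed, seen))
}

// Hypothetical helper: the test files affected by the changed files.
function affectedTests(changed) {
  return Object.keys(graph)
    .filter((file) => file.endsWith('.test.js'))
    .filter((file) => dependsOn(file, changed))
    .sort()
}

console.log(affectedTests(['src/multiply.js']))
// → ['test/exponent.test.js', 'test/multiply.test.js']
console.log(affectedTests(['test/add.test.js']))
// → ['test/add.test.js']
```

Passing ['src/add.js'] returns all four test files, matching the second rule above.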

Then how can we do this automatically?
Here's where Haetae comes in.
With just a simple config, Haetae can automatically detect the dependency graph and test only the affected files.
(In this article, Jest is used just as an example. You can use any test runner.)

Installation

So, let's install Haetae. (Node 16 or higher is required.)
It doesn't matter whether your project is new or existing (Haetae can be adopted incrementally).
It works well for monorepos too (covered in another part of the docs).
Literally any project is suitable.


npm install --save-dev haetae
💡

Are you developing a library (e.g. plugin) for Haetae?
You can depend on @haetae/core, @haetae/utils, @haetae/git, @haetae/javascript, @haetae/cli independently. Note that the package haetae includes all of them.

Basic configuration

Now, we are ready to configure Haetae.
Let's create a config file haetae.config.js.

my-calculator
├── haetae.config.js # <--- Haetae config file
├── package.json
├── src # contents are omitted for brevity
└── test # contents are omitted for brevity
💡

TypeScript Support
If you want to write the config in TypeScript, name it haetae.config.ts. Then install ts-node, which is an optional peer dependency of haetae (from @haetae/core).

💡

CJS/ESM
Haetae supports both CJS and ESM projects.
Haetae is written in ESM, but it can be used in CJS projects as well, as long as the config file is ESM. If your project is CJS, name the config file haetae.config.mjs or haetae.config.mts. If your project is ESM, name the config file haetae.config.js or haetae.config.ts.

We can write it like this.
Make sure you have initialized git. Haetae can be used with other version control systems, but git is assumed in this article.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  // Other options are omitted for brevity.
  commands: {
    myTest: {
      run: async () => {
        // An array of changed files
        const changedFiles = await git.changedFiles()
        // An array of test files that (transitively) depend on changed files
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'], // glob pattern
          dependencies: changedFiles,
        })
 
        if (affectedTestFiles.length > 0) {
          // Equivalent to "pnpm jest /path/to/foo.test.js /path/to/bar.test.js ..."
          // Change 'pnpm' and 'jest' to your package manager and test runner.
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})

Multiple APIs are used in the config file above.
They all have various options (Check out API docs). But we are going to use their sensible defaults for now.

The Tagged Template Literal $ used in the config above (await $`pnpm jest ${affectedTestFiles}`) can run arbitrary shell commands. If a placeholder (${...}) is an array, its elements are automatically joined with a space (' '). It has other traits and options as well. Check out the API docs for more detail.

import { $, utils } from 'haetae'
 
// The following three statements have the same effect
await $`pnpm jest ${affectedTestFiles}`
await $`pnpm jest ${affectedTestFiles.join(' ')}`
// $ is a wrapper of utils.exec.
// Use utils.exec if you need a function.
// utils.exec may be easier to pass non-default options
await utils.exec(`pnpm jest ${affectedTestFiles.join(' ')}`)

In the above config, pnpm jest is used in $. Just change them to your package manager and test runner.
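To make the array-joining behavior concrete, here is a minimal sketch of a tag function. It is not Haetae's $: the hypothetical `cmd` below only builds the command string and never executes anything, but it shows how a tagged template can receive an array placeholder and join it with spaces.

```javascript
// Hypothetical tag function: builds a command string,
// joining array placeholders with a single space.
function cmd(strings, ...values) {
  return strings.reduce((result, chunk, i) => {
    if (i >= values.length) return result + chunk // the last chunk has no placeholder after it
    const value = values[i]
    const text = Array.isArray(value) ? value.join(' ') : String(value)
    return result + chunk + text
  }, '')
}

const affectedTestFiles = ['test/add.test.js', 'test/multiply.test.js']
console.log(cmd`pnpm jest ${affectedTestFiles}`)
// prints "pnpm jest test/add.test.js test/multiply.test.js"
```

A non-array placeholder is simply converted to a string, so cmd`pnpm ${'jest'}` yields "pnpm jest".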

💡

Credit to google/zx
$ as a Tagged Template Literal is inspired by google/zx. Thanks!

Then run haetae like below.

$ haetae myTest
(Unless you installed haetae globally, you should execute it through your package manager, e.g. pnpm haetae myTest.)

Note that myTest in the command above is the name of the command we defined in the config file. You can name it whatever you want. And as you might guess, you can define multiple commands (e.g. myLint, myBuild, myIntegrationTest, etc) in the config file.

It will print the result like this.

terminal
✔  success   Command myTest is successfully executed.
 
⎡ 🕗 time: 2023 May 28 11:06:06 Asia/Seoul (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜      "@haetae/git":
⎜        commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜        branch: main
⎣        specVersion: 1

As this is the first run of the command haetae myTest, git.changedFiles() in the config returns every file tracked by git in your project as changed (there are options; check out the API docs after reading this article). As a result, all of the tests are run.

js.dependOn() understands direct or transitive dependencies between files, by parsing import or require(), etc. So it can be used to detect which test files (transitively) depend on at least one of the changed files.

💡

js.dependOn can detect multiple formats
ES6+, CJS, TypeScript, JSX, Webpack, CSS Preprocessors (Sass, Scss, Stylus, Less), and PostCSS are supported. For Node.js, Subpath Imports and Subpath Exports are also supported. For TypeScript, Path Mapping is also supported. If you use TypeScript or Webpack, check out the API docs and pass additional options like options.tsConfig and/or options.webpackConfig.

💡

js.dependOn vs js.dependsOn vs utils.dependOn vs utils.dependsOn
There are several APIs with similar purposes.

Check out the API docs later for more detail.

Note that js.dependOn cannot parse dynamic imports (import()). Dynamic or extra dependencies can be specified with the additionalGraph option, explained later in this article.

my-calculator
├── .haetae/store.json # <--- Generated. Haetae store file
├── haetae.config.js
├── package.json
├── src
└── test

As you may have noticed, the store file .haetae/store.json has been generated. It stores the history of Haetae executions, which makes incremental tasks possible. For example, the commit ID 979f3c6 printed by our first execution above is the git HEAD that haetae myTest ran on. This information is logged in the store file to be used later.

Detecting the last commit Haetae ran on successfully

Let's say we made some changes and added 2 commits.


Commit history after the first running of Haetae

979f3c6 is the last commit Haetae ran on successfully.
0c3b3cc and 1d17a2f are new commits after that.
What will happen when we run Haetae again?

$ haetae myTest

This time, only exponent.test.js and multiply.test.js are executed. That's because git.changedFiles() automatically returns only the files changed since the last successful execution of Haetae.

For another example, if you modify add.js, then all tests will be executed, because js.dependOn() detects dependency transitively.

If you modify add.test.js, only the test file itself add.test.js will be executed, as every file is treated as depending on itself.

terminal
✔  success   Command myTest is successfully executed.
 
⎡ 🕗 time: 2023 May 28 19:03:25 Asia/Seoul (timestamp: 1685268205443)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜      "@haetae/git":
⎜        commit: 1d17a2f2d75e2ac94f31e53376c549751dca85fb
⎜        branch: main
⎣        specVersion: 1

Accordingly, the new commit 1d17a2f is logged in the store file.

The output above is an example of a successful task. Conversely, if the test fails, pnpm jest <...>, which we gave to $ in the config, exits with a non-zero exit code. This makes $ throw an error. So myTest.run() does not complete successfully, and the store file is not renewed.

This behavior is useful for incremental tasks. The failed test (or any incremental task) will be re-executed later again until the problem is fixed.
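The "record only on success" rule can be sketched as follows. This is a simplified model, not Haetae's code: `store` is an in-memory stand-in for the store file, and `runIncremental` is a hypothetical runner kept synchronous for brevity (a real run function is async).

```javascript
// In-memory stand-in for the store file (hypothetical shape).
const store = { lastSuccessfulCommit: null }

// Hypothetical runner, synchronous for brevity.
// The store is renewed only when `task` returns without throwing.
function runIncremental(currentCommit, task) {
  try {
    task(store.lastSuccessfulCommit)
  } catch (error) {
    return false // failed: the store keeps pointing at the older commit
  }
  store.lastSuccessfulCommit = currentCommit
  return true
}

// Succeeds, so 979f3c6 is recorded.
runIncremental('979f3c6', () => {})

// Fails, so the record is not renewed.
runIncremental('1d17a2f', () => {
  throw new Error('tests failed')
})

console.log(store.lastSuccessfulCommit) // prints "979f3c6"
```

Because the failed run leaves the old commit in place, the next run still compares against 979f3c6 and picks up the same changes again.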

env configuration

Sometimes we need to separate several environments.

Simple environment variable example

For example, the logic of your project might behave differently depending on the environment variable $NODE_ENV. So the history of an incremental task should also be recorded separately for each environment. Let's add env to the config file to achieve this.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { // <--- Add this
        NODE_ENV: process.env.NODE_ENV,
      },
      run: async () => { /* ... */ },
    },
  },
})

The key name NODE_ENV is just an example. You can name it as you want.

From now on, the store file will manage the history of each environment separately. For example, if $NODE_ENV can have two values, 'development' or 'production', then Haetae will manage two incremental histories for each environment.

You don't have to care about the past history of myTest executed without env. When a command is configured without env, it's treated as if configured with env: {}, which is totally fine. So there will be 3 envs to be recorded in the store file:

  • {}
  • { NODE_ENV: 'production' }
  • { NODE_ENV: 'development' }

Though we changed the schema of env in the config from {} to { NODE_ENV: 'development' | 'production' }, the history of env: {} already recorded in the store file is NOT automatically deleted. It just stays in the store file. This behavior is safe because incremental histories are managed per env, so leftover records do no harm. If you care about disk space, configuring the auto-removal of obsolete history is covered later in this article.
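One way to picture per-env histories is a map keyed by a deterministic serialization of the env object. The sketch below is an illustration only, not Haetae's store: `serialize`, `saveRecord`, and `getRecord` are hypothetical helpers, the serialized string stands in for a real hash, and the shortened commit IDs are sample values.

```javascript
// Deterministic serialization: object keys are sorted at every depth,
// so key order in the env object does not matter.
function serialize(value) {
  if (Array.isArray(value)) return `[${value.map(serialize).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${serialize(value[key])}`)
    return `{${entries.join(',')}}`
  }
  return JSON.stringify(value)
}

// Hypothetical in-memory history: one latest record per environment.
const history = new Map()

function saveRecord(env, data) {
  history.set(serialize(env), { env, data })
}

function getRecord(env) {
  return history.get(serialize(env))
}

saveRecord({}, { commit: '1d17a2f' })
saveRecord({ NODE_ENV: 'production' }, { commit: 'a4f4e7e' })
saveRecord({ NODE_ENV: 'development' }, { commit: '442fefc' })

console.log(history.size) // prints 3
console.log(getRecord({ NODE_ENV: 'production' }).data.commit) // prints "a4f4e7e"
```

Saving a record for one env never touches the records of the others, which is why the old env: {} history can safely stay in place.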

Multiple keys

You can add more keys to the env object.
For instance, let's change the config to this.

haetae.config.js
import assert from 'node:assert/strict' // `node:` protocol is optional
import { $, configure, git, utils, js, pkg } from 'haetae'
import semver from 'semver'
 
export default configure({
  commands: {
    myTest: {
      env: async () => { // <--- Changed to async function from object
        assert(['development', 'production'].includes(process.env.NODE_ENV))
        return {
          NODE_ENV: process.env.NODE_ENV,
          jestConfig: await utils.hash(['jest.config.js']),
          jest: (await js.version('jest')).major,
          branch: await git.branch(),
          os: process.platform,
          node: semver.major(process.version),
          haetae: pkg.version.major,
        }
      },
      run: async () => { /* ... */ },
    },
  },
})

The object has more keys than before, named jestConfig, jest, branch and so on. In this example, if any of $NODE_ENV, Jest config file, major version of Jest, git branch, OS platform, major version of Node.js, or major version of the package haetae is changed, it's treated as a different environment.

And now env has become a function. You can freely write any additional code in it, like the assertion (assert()) at the top of the function above. myTest.env() is executed before myTest.run(). When an error is thrown in myTest.env(), myTest.run() is not executed, and the store file is not renewed. This is the intended design for incremental tasks.

If you just want to check the value the env function returns, you can use the -e, --env option. It does not write to the store file, but just prints the value.

terminal
$ haetae myTest --env
 
✔  success   Current environment is successfully evaluated for the command myTest
 
⎡ env:
⎜   NODE_ENV: development
⎜   jestConfig: 642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb
⎜   jest: 29
⎜   branch: main
⎜   os: darwin
⎜   node: 18
⎜   haetae: 0
⎣ envHash: 203ceac1714279231e82d91614f2ebe50f5b1a7a

Additional dependency graph

So far, js.dependOn() has been used to detect the dependency graph automatically. But sometimes you need to specify some dependencies manually.

Simple integration test

For example, let's say you're developing a project communicating with a database.

your-project
├── haetae.config.js
├── package.json
├── src
│   ├── external.js
│   ├── logic.js
│   └── index.js
└── test
    ├── data.sql
    ├── external.test.js
    ├── logic.test.js
    └── index.test.js

The explicit dependency graph looks like this.
logic.js contains business logic, including communicating with a database.
external.js communicates with a certain external service and has nothing to do with the database.


Dependency graph

But there is an SQL file named data.sql for an integration test. Obviously, it is not (and can't be) imported (e.g. by import or require()) by any source code file.

Let's make Haetae treat logic.js as depending on data.sql, using additionalGraph.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        // A graph of additional dependencies specified manually
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/logic.js'],
              dependencies: ['test/data.sql'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph, // <--- New option
        })
        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})

Then the implicit dependency graph becomes explicit.


Dependency graph with data.sql

From now on, when the file data.sql is changed, index.test.js and logic.test.js are executed. As external.test.js doesn't transitively depend on data.sql, it's not executed.

If, contrary to this general and natural flow, you decide that index.test.js should never be affected by data.sql, you can change the config.

haetae.config.js
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['test/logic.test.js'], // 'src/logic.js' to 'test/logic.test.js'
      dependencies: ['test/data.sql'],
    },
  ],
})

With this, data.sql no longer affects index.test.js.
But this practice is recommended only when you're sure that index.test.js will never be related to data.sql. Otherwise, you'll have to update the config again whenever the relation changes.

env vs additionalGraph

The effect of additionalGraph is different from env. env is like defining parallel universes, where history is recorded separately.

If you place data.sql in env (e.g. with utils.hash()) instead of additionalGraph, every test file will be executed when data.sql changes, unless the change is a rollback to past content that matches a past value of env logged in the store file (.haetae/store.json).

external.js and external.test.js have nothing to do with the database. That's why data.sql is applied as additionalGraph, not as env.

But that depends on the case. In many situations, env is beneficial. env is a good place for data.sql if any of the following applies:

  1. data.sql affects most of your integration test files.
  2. It's unclear which test files do and don't depend on data.sql, or the relations change frequently.
  3. data.sql is not changed frequently.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: async () => ({
        testData: await utils.hash(['test/data.sql']),
      }),
      run: async () => { /* ... */ }, // without `additionalGraph`
    },
  },
})

Cartesian product

You can specify dependencies from one group of files to another group.

haetae.config.js
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['test/db/*.test.js'],
      dependencies: [
        'test/docker-compose.yml',
        'test/db/*.sql',
      ],
    },
  ],
})

This means that any test file under test/db/ depends on any SQL file under test/db/ and test/docker-compose.yml.


Additional Dependency Graph Cartesian Product

Distributed notation

You don't have to specify a dependent's dependencies all at once. It can be done in a distributed manner.

haetae.config.js
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['foo', 'bar'],
      dependencies: ['one', 'two'],
    },
    {
      dependents: ['foo', 'qux'], // 'foo' appears again, and it's fine
      dependencies: ['two', 'three', 'bar'], // 'two' and 'bar' appear again, and it's fine
    },
    {
      dependents: ['one', 'two', 'three'],
      dependencies: ['two'], // 'two' depends on itself, and it's fine
    },
    {
      dependents: ['foo'],
      dependencies: ['one'], // 'foo' -> 'one' appears again, and it's fine
    },
  ],
})

In the third edge, we marked two as depending on two itself. That's OK, as every file is treated as depending on itself. So foo depends on foo, bar depends on bar, and so on.
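The Cartesian product and the distributed notation can both be understood as expanding each edge into individual dependent-to-dependency pairs and merging the results. The sketch below is an illustration under that assumption, not the implementation of utils.graph: `toAdjacency` is a hypothetical helper, and it treats the names as plain strings without expanding glob patterns.

```javascript
// Hypothetical helper: expands edges into a map of dependent -> Set of dependencies.
function toAdjacency(edges) {
  const adjacency = new Map()
  for (const { dependents, dependencies } of edges) {
    // Cartesian product: every dependent depends on every dependency of the edge.
    for (const dependent of dependents) {
      if (!adjacency.has(dependent)) adjacency.set(dependent, new Set())
      for (const dependency of dependencies) {
        adjacency.get(dependent).add(dependency) // a Set ignores repeated pairs
      }
    }
  }
  return adjacency
}

const adjacency = toAdjacency([
  { dependents: ['foo', 'bar'], dependencies: ['one', 'two'] },
  { dependents: ['foo', 'qux'], dependencies: ['two', 'three', 'bar'] },
  { dependents: ['one', 'two', 'three'], dependencies: ['two'] },
  { dependents: ['foo'], dependencies: ['one'] },
])

console.log([...adjacency.get('foo')].sort())
// → ['bar', 'one', 'three', 'two']
```

Repeating a pair such as foo -> one changes nothing, because each dependent's dependencies are collected in a Set.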


Additional Dependency Graph Distributed Notation

Circular dependency

Haetae supports circular dependency as well. Although circular dependency is, in general, considered not a good practice, it's fully up to you to decide whether to define it. Haetae does not prevent you from defining it.

haetae.config.js
// Other content is omitted for brevity
const additionalGraph = await utils.graph({
  edges: [
    {
      dependents: ['index.js'],
      dependencies: ['foo'],
    },
    {
      dependents: ['foo'],
      dependencies: ['bar'],
    },
    {
      dependents: ['bar'],
      dependencies: ['index.js'],
    },
  ],
})

Circular dependency graph

Assume the relations between index.js, foo, and bar are given by additionalGraph, and the rest are automatically detected.

In this situation, index.test.js is executed when any file except utils.test.js is changed, including foo and bar.
On the other hand, utils.test.js is executed only when utils.js or utils.test.js itself is changed.

💡

More APIs not covered
There are more APIs related to the dependency graph, like js.graph, js.deps, utils.deps, utils.mergeGraph, etc. This article doesn't cover them all. Check out the API docs for more detail.

Record Data

Haetae has a concept of 'Record' (type: core.HaetaeRecord) and 'Record Data' (type: core.HaetaeRecord.data).

In the previous sections, we've already seen terminal outputs like this.

terminal
$ haetae myTest
 
✔  success   Command myTest is successfully executed.
 
⎡ 🕗 time: 2023 May 28 11:06:06 Asia/Seoul (timestamp: 1685239566483)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜      "@haetae/git":
⎜        commit: 979f3c6bcafe9f0b81611139823382d615f415fd
⎜        branch: main
⎣        specVersion: 1

This information is logged in the store file (.haetae/store.json), and called 'Record'. The data field is called 'Record Data'. Let's check them out.

$ cat .haetae/store.json

The output is like this.

terminal
{
  "specVersion": 1,
  "commands": {
    "myTest": [
      {
        "data": {
          "@haetae/git": {
            "commit": "1d17a2f2d75e2ac94f31e53376c549751dca85fb",
            "branch": "main",
            "specVersion": 1
          }
        },
        "env": {},
        "envHash": "bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f",
        "time": 1685239566483
      },
      {
        "data": {
          "@haetae/git": {
            "commit": "a4f4e7e83eedbf2269fbf29d91f08289bdeece91",
            "branch": "main",
            "specVersion": 1
          }
        },
        "env": {
          "NODE_ENV": "production"
        },
        "envHash": "4ed28f8415aeb22c021e588c70d821cb604c7ae0",
        "time": 1685458529856
      },
      {
        "data": {
          "@haetae/git": {
            "commit": "442fefc582889bdaee5ec2bd8b74804680fc30ee",
            "branch": "main",
            "specVersion": 1
          }
        },
        "env": {
          "NODE_ENV": "development"
        },
        "envHash": "2b580e42012efb489cdea43194c9dd6aed6b77d8",
        "time": 1685452061199
      },
      {
        "data": {
          "@haetae/git": {
            "commit": "ef3fdf88e9fad90396080335096a88633fbe893f",
            "branch": "main",
            "specVersion": 1
          }
        },
        "env": {
          "jestConfig": "642645d6bc72ab14a26eeae881a0fc58e0fb4a25af31e55aa9b0d134160436eb",
          "jest": 29,
          "branch": "main",
          "os": "darwin",
          "node": 18,
          "haetae": 0
        },
        "envHash": "62517924fb2c6adb38b4f30ba75a513066f5ac80",
        "time": 1685455507556
      },
      {
        "data": {
          "@haetae/git": {
            "commit": "7e3b332f0657272cb277c312ff25d4e1145f895c",
            "branch": "main",
            "specVersion": 1
          }
        },
        "env": {
          "testData": "b87b8be8df58976ee7da391635a7f45d8dc808357ff63fdcda699df937910227"
        },
        "envHash": "7ea1923c8bad940a97e1347ab85abd4811e82531",
        "time": 1685451151035
      }
    ]
  }
}
 
💡

Env Hash
The field envHash is the SHA-1 hash of the env object. The env object is serialized deterministically, no matter how deep it is, and then hashed. The hash is used to match the current env with previous records. SHA-1 is considered insecure for hiding information, but it is good enough at avoiding collisions for history comparison. For example, git also uses SHA-1 for commit IDs. If your Env or Record Data contains a confidential field and you're worried about the store being leaked, you can preprocess secret fields with a stronger cryptographic hash algorithm, like SHA-256 or SHA-512. A practical guide using utils.hash() is given in the next section.

💡

recordRemoval.leaveOnlyLastestPerEnv of localFileStore
By default, you're using localFileStore as a 'Store Connector'. localFileStore stores records in a file (.haetae/store.json). The option recordRemoval.leaveOnlyLastestPerEnv is true by default, so only the latest record per env exists in the store file. This is useful when you only depend on the latest Records. To utilize older Records, you can set the option to false. Changing or configuring the 'Store Connector' is covered later.

5 Records are found in total. They come from what we've done in this article so far. Each of them is the latest Record for its env. For example, the command myTest was executed with env: {} on several commits, and 1d17a2f is the last of them.

Custom Record Data

Configuration files for your application are a good example of the usefulness of Record Data. This means a config file not for Haetae, but for your project itself: for example, dotenv (.env), .yaml, .properties, .json, etc.

Usually, an application config file satisfies these 2 conditions.

  1. It's not explicitly imported (e.g. import, require()) in the source code. Rather, the source code 'reads' it at runtime. ---> additionalGraph or env is useful.
  2. It's ignored by git. ---> 'Record Data' is useful.

Let's see how it works, with a simple example project using .env as the application config.

💡

dotenv
.env is a configuration file for environment variables, and NOT related to Haetae's env at all.

your-project
├── .env # <--- dotenv file
├── .gitignore # <--- ignores '.env' file
├── haetae.config.js
├── package.json
├── src
│   ├── config.js
│   ├── utils.js
│   ├── logic.js
│   └── index.js
└── test
    ├── utils.test.js
    ├── logic.test.js
    └── index.test.js

src/config.js reads the file .env, using a library such as dotenv.

src/config.js
import { config } from 'dotenv'
 
config()
 
export default {
  port: process.env.PORT,
  secretKey: process.env.SECRET_KEY,
}

Let's assume logic.js gets the value of environment variables through config.js, not directly reading from .env or process.env. The explicit source code dependency graph is like this.


Dependency graph

Let's make Haetae treat config.js as depending on .env.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })
        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})

Then the implicit dependency graph becomes explicit.


Dependency graph with .env

But that's not enough, because .env is ignored by git. git.changedFiles() cannot detect if .env changed or not.

Let's use 'Record Data' to solve this problem. Change the config file like this.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async ({ store }) => {
        const changedFiles = await git.changedFiles()
        const previousRecord = await store.getRecord()
        const dotenvHash = await utils.hash(['.env'])
        if (previousRecord?.data?.dotenv !== dotenvHash) {
          changedFiles.push('.env')
        }
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })
 
        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
        return {
          dotenv: dotenvHash
        }
      },
    },
  },
})

Now, we return an object from myTest.run. Let's execute it.

terminal
$ haetae myTest
 
✔  success   Command myTest is successfully executed.
 
⎡ 🕗 time: 2023 Jun 08 09:23:07 Asia/Seoul (timestamp: 1686183787453)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜      "@haetae/git":
⎜        commit: ac127da6531efa487b8ee35451f24a70dc58aeea
⎜        branch: main
⎜        specVersion: 1
⎣      dotenv: 7f39224e335994886c26ba8c241fcbe1d474aadaa2bd0a8e842983b098cea894

Do you see the last line? The value we returned from myTest.run is recorded in the store file, as part of Record Data.

💡

Hash confidential
utils.hash() is good for secrets like a dotenv file. By default, it hashes with SHA-256, and you can change the cryptographic hash algorithm through its options, for example to SHA-512. Thus, you do not need to worry about the store file being leaked.

This time, .env was treated as a changed file, as the key dotenv did not exist in previousRecord.

haetae.config.js
// Other content is omitted for brevity
if (previousRecord?.data?.dotenv !== dotenvHash) {
  changedFiles.push('.env')
}

Therefore, index.test.js and logic.test.js, which transitively depend on .env, are executed.

If you run Haetae again immediately,

terminal
$ haetae myTest

This time, no test is executed, as nothing is considered changed. .env is treated as not changed, thanks to the Record Data.

From now on, though the file .env is ignored by git, changes to it are recorded by custom Record Data. So it can be used in incremental tasks.

Reserved Record Data

We can enhance the workflow further.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const changedFilesByHash = await utils.changedFiles(['.env'])
        changedFiles.push(...changedFilesByHash)
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: ['.env'],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })
 
        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
        // No return value
      },
    },
  },
})

We return nothing here.
We do not calculate hash by ourselves.
But this has the same effect as what we've done in the previous section.

terminal
$ haetae myTest
 
✔  success   Command myTest is successfully executed.
 
⎡ 🕗 time: 2023 Jun 11 00:27:40 Asia/Seoul (timestamp: 1686410860187)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎜      "@haetae/git":
⎜        commit: 018dd7e0c65c3a9d405485df7949ef75ff96e757
⎜        branch: main
⎜        specVersion: 1
⎜      "@haetae/utils":
⎜        files:
⎜          .env: 7f39224e335994886c26ba8c241fcbe1d474aadaa2bd0a8e842983b098cea894
⎣        specVersion: 1

You can see the hash of .env is recorded. utils.changedFiles automatically writes hash in Record Data, and compares the current hash with the previous one.

How is this possible? There's a concept of Reserved Record Data. If you call core.reserveRecordData, you can 'reserve' Record Data without directly returning custom Record Data from the command's run function. git.changedFiles and utils.changedFiles call core.reserveRecordData internally.

This mechanism can be especially useful for sharable generic features, like a 3rd-party library for Haetae. For that, it's important to avoid naming collision. Record Data can have arbitrary fields. So Haetae uses a package name as a namespace by convention. '@haetae/git' and '@haetae/utils' keys in Record Data are namespaces to avoid such a collision.

💡

Multiple Reserved Record Data
All Reserved Record Data are saved in the list reservedRecordDataList. The list is then merged by deepmerge.
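To illustrate why namespacing and deep merging work well together, here is a minimal sketch. It does not use the deepmerge package: `mergeDeep` is a small hand-written stand-in for plain objects only, so its handling of arrays and other edge cases may differ from the real library. The sample reservations are hypothetical values shaped like the Record Data printed above.

```javascript
// Minimal recursive merge of plain objects (a stand-in for a deep-merge library).
function mergeDeep(target, source) {
  const isPlainObject = (value) =>
    value !== null && typeof value === 'object' && !Array.isArray(value)
  const result = { ...target }
  for (const [key, value] of Object.entries(source)) {
    const existing = result[key]
    // Objects on both sides are merged recursively; otherwise the later value wins.
    result[key] =
      isPlainObject(existing) && isPlainObject(value) ? mergeDeep(existing, value) : value
  }
  return result
}

// Sample reservations, each under its package namespace (hypothetical values).
const reservedRecordDataList = [
  { '@haetae/git': { commit: '018dd7e', branch: 'main', specVersion: 1 } },
  { '@haetae/utils': { files: { '.env': '7f39224e' }, specVersion: 1 } },
  { '@haetae/utils': { files: { '.env.local': 'sample-hash' } } },
]

const recordData = reservedRecordDataList.reduce((merged, item) => mergeDeep(merged, item), {})

console.log(Object.keys(recordData)) // → ['@haetae/git', '@haetae/utils']
console.log(Object.keys(recordData['@haetae/utils'].files)) // → ['.env', '.env.local']
```

Reservations under different namespaces never overwrite each other, and repeated reservations under the same namespace are combined field by field.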

utils.changedFiles is more useful for multiple files. Let's say you have multiple dotenv files per environment, unlike the previous assumption. For example, .env.local, .env.development, and .env.staging are targets to test. Now, config.js reads .env.${process.env.ENV}, where $ENV is an indicator of the environment: 'local', 'development' or 'staging'.

Then we can modify the config file like this.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  commands: {
    myTest: {
      env: { /* ... */ },
      run: async () => {
        const changedFiles = await git.changedFiles()
        const changedFilesByHash = await utils.changedFiles(
          ['.env.*'], // or explicit glob pattern ['.env.{local,development,staging}']
          {
            renew: [`.env.${process.env.ENV}`],
          },
        )
        changedFiles.push(...changedFilesByHash)
        const additionalGraph = await utils.graph({
          edges: [
            {
              dependents: ['src/config.js'],
              dependencies: [`.env.${process.env.ENV}`],
            },
          ],
        })
        const affectedTestFiles = await js.dependOn({
          dependents: ['**/*.test.js'],
          dependencies: changedFiles,
          additionalGraph,
        })
 
        if (affectedTestFiles.length > 0) {
          await $`pnpm jest ${affectedTestFiles}`
        }
      },
    },
  },
})

renew is a list of files (or glob patterns) whose recorded hashes will be renewed with their current hashes (if changed). By default, renew is equal to all the files (['.env.*']) given as the first argument. In our config, by limiting it to .env.${process.env.ENV}, only the single dotenv file is renewed.

Let's say $ENV is currently 'local'. .env.local, .env.development, and .env.staging are all compared to their previous hashes. Any file whose hash has changed is included in the result array.

Regardless of the result, however, .env.development and .env.staging are not renewed in the new Record Data. Their previous hashes, not their current ones, are written to the new Record.

This behavior can be good for our test in many scenarios.

For instance, you may modify .env.development while $ENV is 'local'. As it's not in the renew list, the hash of .env.development is not updated. When $ENV later becomes 'development', utils.changedFiles still treats .env.development as a changed file, because the current hash and the previously recorded hash differ. This ensures the test files are executed when $ENV becomes 'development'. renew exists to bridge the gap between when a physical change actually happens and when detecting that change is needed.
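The scenario above can be traced with a simplified model of the compare-then-renew behavior. This is only a sketch of the semantics described in this section, not the code of utils.changedFiles. The function `detectAndRenew` and the short hash values ('a1', 'h1', 'h2') are hypothetical.

```javascript
// Simplified, hypothetical model: `previous` and `current` map file paths to hashes.
const detectAndRenew = ({ previous, current, renew }) => {
  // A file counts as changed when its current hash differs from the recorded one.
  const changed = Object.keys(current).filter(
    (file) => previous[file] !== current[file],
  )
  // Only files in `renew` get their current hash recorded; the others keep the old one.
  const nextRecord = { ...previous }
  for (const file of renew) {
    if (file in current) nextRecord[file] = current[file]
  }
  return { changed, nextRecord }
}

// Run 1: $ENV is 'local', and .env.development has been edited (h1 -> h2).
const recorded = { '.env.local': 'a1', '.env.development': 'h1' }
const onDisk = { '.env.local': 'a1', '.env.development': 'h2' }
const run1 = detectAndRenew({
  previous: recorded,
  current: onDisk,
  renew: ['.env.local'],
})

// Run 2: $ENV is now 'development'. Nothing was edited since run 1,
// but the old hash is still on record, so the change is detected again.
const run2 = detectAndRenew({
  previous: run1.nextRecord,
  current: onDisk,
  renew: ['.env.development'],
})

// Run 3: $ENV is still 'development'. The hash was renewed in run 2.
const run3 = detectAndRenew({
  previous: run2.nextRecord,
  current: onDisk,
  renew: ['.env.development'],
})

console.log(run1.changed) // [ '.env.development' ]
console.log(run2.changed) // [ '.env.development' ]
console.log(run3.changed) // []
```

In this model the change stays "pending" until a run that actually renews .env.development, which matches the intent of deferring detection to the environment that depends on the file.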

utils.changedFiles has many more options and handles more sophisticated cases.

For example, the keepRemovedFiles option, which is not introduced above, handles cases where not all of the files exist on the filesystem at the same time and only a few of them are used dynamically in incremental tasks. For instance, a CI workflow might have access to only .env.development at one time and only .env.staging at another. You may still want the incremental history to be shared between the two cases rather than separated. That's where keepRemovedFiles comes in.

utils.changedFiles is not covered thoroughly here. Check out the API docs for more detail.

There's one more thing to be careful about with utils.changedFiles. You should NOT give it a dynamic files argument. Otherwise, a file would be treated as changed every time the dynamic argument changes.

haetae.config.js
// Other content is omitted for brevity
const changedFilesByHash = await utils.changedFiles(
  [`.env.${process.env.ENV}`] // <--- Anti-pattern
)

The snippet above lets only a single file be recorded. So, if $ENV changes, the previous file is no longer recorded. This causes no safety problem, but reduces incrementality.

Therefore, you should list all of the candidates, like ['.env.*'].

Root Env and Root Record Data

Haetae has concepts of 'Root Env' (type: core.RootEnv) and 'Root Record Data' (type: core.RootRecordData). They are decorator-like transformers for the return values of env and run of every command.

haetae.config.js
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  recordData: async (data) => ({ // <--- 'Root Record Data'
    hello: data.hello.toUpperCase(),
  }),
  commands: {
    myGreeting: {
      run: () => ({ hello: 'world' }),
    },
  },
})

terminal
$ haetae myGreeting
 
✔  success   Command myGreeting is successfully executed.
 
⎡ 🕗 time: 2023 Jun 14 15:49:52 Asia/Seoul (timestamp: 1686725392672)
⎜ 🌱 env: {}
⎜ #️⃣ envHash: bf21a9e8fbc5a3846fb05b4fa0859e0917b2202f
⎜ 💾 data:
⎣      hello: WORLD # <--- capitalized
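To make the "transformer" idea concrete, here is a minimal sketch of how a root-level function could be applied to what every command's run returns. It models the concept only and does not reflect Haetae's internals. The helper `applyRootRecordData` and the second command `myFarewell` are hypothetical.

```javascript
// Hypothetical model of a root transformer, not Haetae's internals:
// wrap each command's run so its return value passes through rootRecordData.
const applyRootRecordData = (rootRecordData, commands) =>
  Object.fromEntries(
    Object.entries(commands).map(([name, command]) => [
      name,
      { ...command, run: async () => rootRecordData(await command.run()) },
    ]),
  )

const commands = applyRootRecordData(
  // The same transformer as in the config above.
  async (data) => ({ hello: data.hello.toUpperCase() }),
  {
    myGreeting: { run: () => ({ hello: 'world' }) },
    myFarewell: { run: async () => ({ hello: 'goodbye' }) }, // illustrative only
  },
)

commands.myGreeting.run().then((data) => console.log(data)) // { hello: 'WORLD' }
```

The point of the model is that the transformer is declared once and applies to every command, whether its run is synchronous or asynchronous.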

Let's get into a more practical example.
You may want the config file's hash to be automatically recorded into every command's env.

haetae.config.js
import * as url from 'node:url'
import { $, configure, git, utils, js } from 'haetae'
 
export default configure({
  env: async (env) => ({ // <--- 'Root Env'
    ...env,
    // Equivalent to => await utils.hash(['haetae.config.js']),
    haetaeConfig: await utils.hash([url.fileURLToPath(import.meta.url)]),
  }),
  commands: {
    myGreeting: {
      env: {
        NODE_ENV: process.env.NODE_ENV
      },
      run: () => { /* ... */ }
    },
  },
})

With Root Env, this is done in a single place.

terminal
$ haetae myGreeting --env
 
✔  success   Current environment is successfully evaluated for the command myGreeting
 
⎡ env:
⎜   NODE_ENV: development
⎜   haetaeConfig: f7c12d5131846a5db496b87cda59d3e07766ed1bde8ed159538e85f42f3a8dae
⎣ envHash: e9422335258f9338b7205d11aafdb329bb008f7a

By the way, you can be even more thorough. js.deps lists every direct and transitive dependency.

haetae.config.js
// Other content is omitted for brevity
haetaeConfig: await utils.hash(
  await js.deps({ entrypoint: url.fileURLToPath(import.meta.url) }),
),

This snippet calculates a hash of the config file and its dependencies. For example, if you import a.js into haetae.config.js, and a.js depends on b.js, then the hash is calculated from the three files: haetae.config.js, a.js, and b.js. When multiple files are hashed, a single-depth Sorted Merkle Tree is used. Check out the API docs for more detail. If you don't import other modules in the config, this is not necessary.